Hello, Thanks all for your feedback. I started voting procedure for this KIP. If there’re any other concerns about this KIP, please let me know.
Regards, Levani > On Jul 20, 2019, at 8:39 PM, Levani Kokhreidze <levani.co...@gmail.com> wrote: > > Hi Matthias, > > Thanks for the suggestion, makes sense. > I’ve updated KIP > (https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>). > > Regards, > Levani > > >> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax <matth...@confluent.io >> <mailto:matth...@confluent.io>> wrote: >> >> Thanks for driving the KIP. >> >> I agree that users need to be able to specify a partitioning strategy. >> >> Sophie raises a fair point about topic configs and producer configs. My >> take is, that consider `Repartitioned` as an "extension" to `Produced`, >> that adds topic configuration, is a good way to think about it and helps >> to keep the API "clean". >> >> >> With regard to method names. I would prefer to avoid abbreviations. Can >> we rename: >> >> `withNumOfPartitions` -> `withNumberOfPartitions` >> >> Furthermore, it might be good to add some more `static` methods: >> >> - Repartitioned.with(Serde<K>, Serde<V>) >> - Repartitioned.withNumberOfPartitions(int) >> - Repartitioned.streamPartitioner(StreamPartitioner) >> >> >> -Matthias >> >> On 7/19/19 3:33 PM, Levani Kokhreidze wrote: >>> Totally agree. I think in KStream interface it makes sense to have some >>> duplicate configurations between operators in order to keep API simple and >>> usable. >>> Also, as more surface API has, harder it is to have proper backward >>> compatibility. >>> While initial idea of keeping topic level configs separate was exciting, >>> having Repartitioned class encapsulate some producer level configs makes >>> API more readable. >>> >>> Regards, >>> Levani >>> >>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman <sop...@confluent.io >>>> <mailto:sop...@confluent.io>> wrote: >>>> >>>> I think that is a good point about trying to keep producer level >>>> configurations and (repartition) topic level considerations separate. >>>> Number of partitions is definitely purely a topic level configuration. But >>>> on some level, serdes and partitioners are just as much a topic >>>> configuration as a producer one. You could have two producers configured >>>> with different serdes and/or partitioners, but if they are writing to the >>>> same topic the result would be very difficult to part. So in a sense, these >>>> are configurations of topics in Streams, not just producers. >>>> >>>> Another way to think of it: while the Streams API is not always true to >>>> this, ideally all the relevant configs for an operator are wrapped into a >>>> single object (in this case, Repartitioned). We could instead split out the >>>> fields in common with Produced into a separate parameter to keep topic and >>>> producer level configurations separate, but this increases the API surface >>>> area by a lot. It's much more straightforward to just say "this is >>>> everything that this particular operator needs" without worrying about what >>>> exactly you're specifying. >>>> >>>> I suppose you could alternatively make Produced a field of Repartitioned, >>>> but I don't think we do this kind of composition elsewhere in Streams at >>>> the moment >>>> >>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze <levani.co...@gmail.com >>>> <mailto:levani.co...@gmail.com>> >>>> wrote: >>>> >>>>> Hi Bill, >>>>> >>>>> Thanks a lot for the feedback. >>>>> Yes, that makes sense. I’ve updated KIP with `Repartitioned#partitioner` >>>>> configuration. >>>>> In the beginning, I wanted to introduce a class for topic level >>>>> configuration and keep topic level and producer level configurations (such >>>>> as Produced) separately (see my second email in this thread). >>>>> But while looking at the semantics of KStream interface, I couldn’t really >>>>> figure out good operation name for Topic level configuration class and >>>>> just >>>>> introducing `Topic` config class was kinda breaking the semantics. >>>>> So I think having Repartitioned class which encapsulates topic and >>>>> producer level configurations for internal topics is viable thing to do. >>>>> >>>>> Regards, >>>>> Levani >>>>> >>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <bbej...@gmail.com >>>>>> <mailto:bbej...@gmail.com>> wrote: >>>>>> >>>>>> Hi Lavani, >>>>>> >>>>>> Thanks for resurrecting this KIP. >>>>>> >>>>>> I'm also a +1 for adding a partition option. In addition to the reason >>>>>> provided by John, my reasoning is: >>>>>> >>>>>> 1. Users may want to use something other than hash-based partitioning >>>>>> 2. Users may wish to partition on something different than the key >>>>>> without having to change the key. For example: >>>>>> 1. A combination of fields in the value in conjunction with the key >>>>>> 2. Something other than the key >>>>>> 3. We allow users to specify a partitioner on Produced hence in >>>>>> KStream.to and KStream.through, so it makes sense for API consistency. >>>>>> >>>>>> Just my 2 cents. >>>>>> >>>>>> Thanks, >>>>>> Bill >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze < >>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>>> wrote: >>>>>> >>>>>>> Hi John, >>>>>>> >>>>>>> In my mind it makes sense. >>>>>>> If we add partitioner configuration to Repartitioned class, with the >>>>>>> combination of specifying number of partitions for internal topics, user >>>>>>> will have opportunity to ensure co-partitioning before join operation. >>>>>>> I think this can be quite powerful feature. >>>>>>> Wondering what others think about this? >>>>>>> >>>>>>> Regards, >>>>>>> Levani >>>>>>> >>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler <j...@confluent.io >>>>>>>> <mailto:j...@confluent.io>> wrote: >>>>>>>> >>>>>>>> Yes, I believe that's what I had in mind. Again, not totally sure it >>>>>>>> makes sense, but I believe something similar is the rationale for >>>>>>>> having the partitioner option in Produced. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> -John >>>>>>>> >>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze >>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>> >>>>>>>>> Hey John, >>>>>>>>> >>>>>>>>> Oh that’s interesting use-case. >>>>>>>>> Do I understand this correctly, in your example I would first issue >>>>>>> repartition(Repartitioned) with proper partitioner that essentially >>>>> would >>>>>>> be the same as the topic I want to join with and then do the >>>>> KStream#join >>>>>>> with DSL? >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Levani >>>>>>>>> >>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler <j...@confluent.io >>>>>>>>>> <mailto:j...@confluent.io>> >>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hey, all, just to chime in, >>>>>>>>>> >>>>>>>>>> I think it might be useful to have an option to specify the >>>>>>>>>> partitioner. The case I have in mind is that some data may get >>>>>>>>>> repartitioned and then joined with an input topic. If the right-side >>>>>>>>>> input topic uses a custom partitioning strategy, then the >>>>>>>>>> repartitioned stream also needs to be partitioned with the same >>>>>>>>>> strategy. >>>>>>>>>> >>>>>>>>>> Does that make sense, or did I maybe miss something important? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> -John >>>>>>>>>> >>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze >>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>> >>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m not sure >>>>> about >>>>>>> it yet. >>>>>>>>>>> As Kafka Streams DSL user, I don’t really think I would need control >>>>>>> over partitioner for internal topics. >>>>>>>>>>> As a user, I would assume that Kafka Streams knows best how to >>>>>>> partition data for internal topics. >>>>>>>>>>> In this KIP I wrote that Produced should be used only for topics >>>>> that >>>>>>> are created by user In advance. >>>>>>>>>>> In those cases maybe it make sense to have possibility to specify >>>>> the >>>>>>> partitioner. >>>>>>>>>>> I don’t have clear answer on that yet, but I guess specifying the >>>>>>> partitioner can be added as well if there’s agreement on this. >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Levani >>>>>>>>>>> >>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman < >>>>>>> sop...@confluent.io <mailto:sop...@confluent.io>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned would be a >>>>>>> useful >>>>>>>>>>>> addition. I'm wondering if it might also need to have >>>>>>>>>>>> a withStreamPartitioner method/field, similar to Produced? I'm not >>>>>>> sure how >>>>>>>>>>>> widely this feature is really used, but seems it should be >>>>> available >>>>>>> for >>>>>>>>>>>> repartition topics. >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze < >>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hey Sophie, >>>>>>>>>>>>> >>>>>>>>>>>>> In both cases KStream#repartition and >>>>>>> KStream#repartition(Repartitioned) >>>>>>>>>>>>> topic will be created and managed by Kafka Streams. >>>>>>>>>>>>> Idea of Repartitioned is to give user more control over the topic >>>>>>> such as >>>>>>>>>>>>> num of partitions. >>>>>>>>>>>>> I feel like Repartitioned parameter is something that is missing >>>>> in >>>>>>>>>>>>> current DSL design. >>>>>>>>>>>>> Essentially giving user control over parallelism by configuring >>>>> num >>>>>>> of >>>>>>>>>>>>> partitions for internal topics. >>>>>>>>>>>>> >>>>>>>>>>>>> Hope this answers your question. >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Levani >>>>>>>>>>>>> >>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman < >>>>>>> sop...@confluent.io <mailto:sop...@confluent.io>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hey Levani, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me -- for the >>>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned, will the >>>>>>> topic be >>>>>>>>>>>>>> auto-created by Streams (which seems to be the case for the >>>>>>> signature >>>>>>>>>>>>>> without a Repartitioned) or does it have to be pre-created? The >>>>>>> wording >>>>>>>>>>>>> in >>>>>>>>>>>>>> the KIP makes it seem like one version of the method will >>>>>>> auto-create >>>>>>>>>>>>>> topics while the other will not. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> Sophie >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze < >>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> One more bump about KIP-221 ( >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>> >>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint> >>>>>>>>>>>>>>> < >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>> ) >>>>>>>>>>>>>>> so it doesn’t get lost in mailing list :) >>>>>>>>>>>>>>> Would love to hear communities opinions/concerns about this KIP. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze < >>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Kind reminder about this KIP: >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>> < >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze < >>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated confluence >>>>> page >>>>>>> with >>>>>>>>>>>>>>> the new proposal >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>> < >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I’ve also filled “Rejected Alternatives” section. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Looking forward to discuss this KIP :) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> King regards, >>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze < >>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hello Matthias, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas. >>>>>>>>>>>>>>>>>> I like the idea of introducing dedicated `Topic` class for >>>>>>> topic >>>>>>>>>>>>>>> configuration for internal operators like `groupedBy`. >>>>>>>>>>>>>>>>>> Would be great to hear others opinion about this as well. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax < >>>>>>> matth...@confluent.io >>>>>>>>>>>>>>> <mailto:matth...@confluent.io>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Levani, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for summarizing >>>>>>>>>>>>> everything. >>>>>>>>>>>>>>>>>>> Even if some points may have been discussed already (can't >>>>>>> really >>>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to refresh the >>>>>>>>>>>>>>> discussion. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard to the >>>>>>> distinction >>>>>>>>>>>>>>>>>>> between operators that manage topics and operators that use >>>>>>>>>>>>>>> user-created >>>>>>>>>>>>>>>>>>> topics: Following this argument, it might indicate that >>>>>>> leaving >>>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses use-defined >>>>>>> topics) and >>>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an operator that >>>>>>> manages >>>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is one >>>>> operator >>>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but sometimes >>>>> not; a >>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>>>> name, ie, new operator would make the distinction clearer. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`. I am wondering >>>>>>> if the >>>>>>>>>>>>>>> same >>>>>>>>>>>>>>>>>>> argument as for `Produced` does apply and adding it is >>>>>>> semantically >>>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of others on >>>>>>> this, too. >>>>>>>>>>>>> I >>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow to >>>>> configure >>>>>>> a >>>>>>>>>>>>>>> certain >>>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, store. If we assume that >>>>>>> a topic >>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>> a similar entity, I am wonder if we should have a >>>>>>>>>>>>>>>>>>> `Topic#withNumberOfPartitions()` class and method instead of >>>>>>> a plain >>>>>>>>>>>>>>>>>>> integer? This would allow us to add other configs, like >>>>>>> replication >>>>>>>>>>>>>>>>>>> factor, retention-time etc, easily, without the need to >>>>>>> change the >>>>>>>>>>>>>>> "main >>>>>>>>>>>>>>>>>>> API". >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like them >>>>> myself. >>>>>>> :) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote: >>>>>>>>>>>>>>>>>>>> Actually, giving it more though - maybe enhancing Produced >>>>>>> with num >>>>>>>>>>>>>>> of partitions configuration is not the best approach. Let me >>>>>>> explain >>>>>>>>>>>>> why: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 1) If we enhance Produced class with this configuration, >>>>>>> this will >>>>>>>>>>>>>>> also affect KStream#to operation. Since KStream#to is the final >>>>>>> sink of >>>>>>>>>>>>> the >>>>>>>>>>>>>>> topology, for me, it seems to be reasonable assumption that user >>>>>>> needs >>>>>>>>>>>>> to >>>>>>>>>>>>>>> manually create sink topic in advance. And in that case, having >>>>>>> num of >>>>>>>>>>>>>>> partitions configuration doesn’t make much sense. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 2) Looking at Produced class, based on API contract, seems >>>>>>> like >>>>>>>>>>>>>>> Produced is designed to be something that is explicitly for >>>>>>> producer >>>>>>>>>>>>> (key >>>>>>>>>>>>>>> serializer, value serializer, partitioner those all are producer >>>>>>>>>>>>> specific >>>>>>>>>>>>>>> configurations) and num of partitions is topic level >>>>>>> configuration. And >>>>>>>>>>>>> I >>>>>>>>>>>>>>> don’t think mixing topic and producer level configurations >>>>>>> together in >>>>>>>>>>>>> one >>>>>>>>>>>>>>> class is the good approach. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 3) Looking at KStream interface, seems like Produced >>>>>>> parameter is >>>>>>>>>>>>>>> for operations that work with non-internal (e.g topics created >>>>> and >>>>>>>>>>>>> managed >>>>>>>>>>>>>>> internally by Kafka Streams) topics and I think we should leave >>>>>>> it as >>>>>>>>>>>>> it is >>>>>>>>>>>>>>> in that case. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Taking all this things into account, I think we should >>>>>>> distinguish >>>>>>>>>>>>>>> between DSL operations, where Kafka Streams should create and >>>>>>> manage >>>>>>>>>>>>>>> internal topics (KStream#groupBy) vs topics that should be >>>>>>> created in >>>>>>>>>>>>>>> advance (e.g KStream#to). >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> To sum it up, I think adding numPartitions configuration in >>>>>>>>>>>>> Produced >>>>>>>>>>>>>>> will result in mixing topic and producer level configuration in >>>>>>> one >>>>>>>>>>>>> class >>>>>>>>>>>>>>> and it’s gonna break existing API semantics. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Regarding making topic name optional in KStream#through - I >>>>>>> think >>>>>>>>>>>>>>> underline idea is very useful and giving users possibility to >>>>>>> specify >>>>>>>>>>>>> num >>>>>>>>>>>>>>> of partitions there is even more useful :) Considering arguments >>>>>>> against >>>>>>>>>>>>>>> adding num of partitions in Produced class, I see two options >>>>>>> here: >>>>>>>>>>>>>>>>>>>> 1) Add following method overloads >>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and num of >>>>>>> partitions >>>>>>>>>>>>>>> will be taken from source topic >>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto >>>>>>>>>>>>>>> generated with specified num of partitions >>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V> >>>>>>>>>>>>>>> produced) - topic will be with generated with specified num of >>>>>>>>>>>>> partitions >>>>>>>>>>>>>>> and configuration taken from produced parameter. >>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce new method >>>>> - >>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this in one of >>>>> the >>>>>>>>>>>>> threads) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Considering all mentioned above I propose the following >>>>> plan: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Option A: >>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is >>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped class (as >>>>>>>>>>>>>>> mentioned in the KIP) >>>>>>>>>>>>>>>>>>>> 3) Add following method overloads to KStream#through >>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and num of >>>>>>> partitions >>>>>>>>>>>>>>> will be taken from source topic >>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto >>>>>>>>>>>>>>> generated with specified num of partitions >>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V> >>>>>>>>>>>>>>> produced) - topic will be with generated with specified num of >>>>>>>>>>>>> partitions >>>>>>>>>>>>>>> and configuration taken from produced parameter. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Option B: >>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is >>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped class (as >>>>>>>>>>>>>>> mentioned in the KIP) >>>>>>>>>>>>>>>>>>>> 3) Add new operator KStream#repartition for creating and >>>>>>> managing >>>>>>>>>>>>>>> internal repartition topics >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already discussed in the >>>>>>> mailing >>>>>>>>>>>>>>> list, but I kinda got with all the threads that were about this >>>>>>> KIP :( >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze < >>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I would like to resurrect discussion around KIP-221. Going >>>>>>> through >>>>>>>>>>>>>>> the discussion thread, there’s seems to agreement around >>>>>>> usefulness of >>>>>>>>>>>>> this >>>>>>>>>>>>>>> feature. >>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I understood, the >>>>>>> most >>>>>>>>>>>>>>> optimal solution for me seems the following: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to KStream#through method >>>>>>> (essentially >>>>>>>>>>>>>>> making topic name optional) >>>>>>>>>>>>>>>>>>>>> 2) Enhance Produced class with numOfPartitions >>>>> configuration >>>>>>>>>>>>> field. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to control >>>>>>> parallelism and >>>>>>>>>>>>>>> trigger re-partition without doing stateful operations. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I will update KIP with interface changes around >>>>>>> KStream#through if >>>>>>>>>>>>>>> this changes sound sensible. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>> >>> >> >