Hi Matthias, Thanks for the suggestion. I Don’t have strong opinion on that one. Agree that avoiding unnecessary method overloads is a good idea.
Updated KIP Regards, Levani > On Jul 24, 2019, at 8:50 PM, Matthias J. Sax <matth...@confluent.io> wrote: > > One question: > > Why do we add > >> Repartitioned#with(final String name, final int numberOfPartitions) > > It seems that `#with(String name)`, `#numberOfPartitions(int)` in > combination with `withName()` and `withNumberOfPartitions()` should be > sufficient. Users can chain the method calls. > > (I think it's valuable to keep the number of overload small if possible.) > > Otherwise LGTM. > > > -Matthias > > > On 7/23/19 2:18 PM, Levani Kokhreidze wrote: >> Hello, >> >> Thanks all for your feedback. >> I started voting procedure for this KIP. If there’re any other concerns >> about this KIP, please let me know. >> >> Regards, >> Levani >> >>> On Jul 20, 2019, at 8:39 PM, Levani Kokhreidze <levani.co...@gmail.com> >>> wrote: >>> >>> Hi Matthias, >>> >>> Thanks for the suggestion, makes sense. >>> I’ve updated KIP >>> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> >>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>). >>> >>> Regards, >>> Levani >>> >>> >>>> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax <matth...@confluent.io >>>> <mailto:matth...@confluent.io>> wrote: >>>> >>>> Thanks for driving the KIP. >>>> >>>> I agree that users need to be able to specify a partitioning strategy. >>>> >>>> Sophie raises a fair point about topic configs and producer configs. My >>>> take is, that consider `Repartitioned` as an "extension" to `Produced`, >>>> that adds topic configuration, is a good way to think about it and helps >>>> to keep the API "clean". >>>> >>>> >>>> With regard to method names. I would prefer to avoid abbreviations. Can >>>> we rename: >>>> >>>> `withNumOfPartitions` -> `withNumberOfPartitions` >>>> >>>> Furthermore, it might be good to add some more `static` methods: >>>> >>>> - Repartitioned.with(Serde<K>, Serde<V>) >>>> - Repartitioned.withNumberOfPartitions(int) >>>> - Repartitioned.streamPartitioner(StreamPartitioner) >>>> >>>> >>>> -Matthias >>>> >>>> On 7/19/19 3:33 PM, Levani Kokhreidze wrote: >>>>> Totally agree. I think in KStream interface it makes sense to have some >>>>> duplicate configurations between operators in order to keep API simple >>>>> and usable. >>>>> Also, as more surface API has, harder it is to have proper backward >>>>> compatibility. >>>>> While initial idea of keeping topic level configs separate was exciting, >>>>> having Repartitioned class encapsulate some producer level configs makes >>>>> API more readable. >>>>> >>>>> Regards, >>>>> Levani >>>>> >>>>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman <sop...@confluent.io >>>>>> <mailto:sop...@confluent.io>> wrote: >>>>>> >>>>>> I think that is a good point about trying to keep producer level >>>>>> configurations and (repartition) topic level considerations separate. >>>>>> Number of partitions is definitely purely a topic level configuration. >>>>>> But >>>>>> on some level, serdes and partitioners are just as much a topic >>>>>> configuration as a producer one. You could have two producers configured >>>>>> with different serdes and/or partitioners, but if they are writing to the >>>>>> same topic the result would be very difficult to part. So in a sense, >>>>>> these >>>>>> are configurations of topics in Streams, not just producers. >>>>>> >>>>>> Another way to think of it: while the Streams API is not always true to >>>>>> this, ideally all the relevant configs for an operator are wrapped into a >>>>>> single object (in this case, Repartitioned). We could instead split out >>>>>> the >>>>>> fields in common with Produced into a separate parameter to keep topic >>>>>> and >>>>>> producer level configurations separate, but this increases the API >>>>>> surface >>>>>> area by a lot. It's much more straightforward to just say "this is >>>>>> everything that this particular operator needs" without worrying about >>>>>> what >>>>>> exactly you're specifying. >>>>>> >>>>>> I suppose you could alternatively make Produced a field of Repartitioned, >>>>>> but I don't think we do this kind of composition elsewhere in Streams at >>>>>> the moment >>>>>> >>>>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze >>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>>> wrote: >>>>>> >>>>>>> Hi Bill, >>>>>>> >>>>>>> Thanks a lot for the feedback. >>>>>>> Yes, that makes sense. I’ve updated KIP with `Repartitioned#partitioner` >>>>>>> configuration. >>>>>>> In the beginning, I wanted to introduce a class for topic level >>>>>>> configuration and keep topic level and producer level configurations >>>>>>> (such >>>>>>> as Produced) separately (see my second email in this thread). >>>>>>> But while looking at the semantics of KStream interface, I couldn’t >>>>>>> really >>>>>>> figure out good operation name for Topic level configuration class and >>>>>>> just >>>>>>> introducing `Topic` config class was kinda breaking the semantics. >>>>>>> So I think having Repartitioned class which encapsulates topic and >>>>>>> producer level configurations for internal topics is viable thing to do. >>>>>>> >>>>>>> Regards, >>>>>>> Levani >>>>>>> >>>>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <bbej...@gmail.com >>>>>>>> <mailto:bbej...@gmail.com>> wrote: >>>>>>>> >>>>>>>> Hi Lavani, >>>>>>>> >>>>>>>> Thanks for resurrecting this KIP. >>>>>>>> >>>>>>>> I'm also a +1 for adding a partition option. In addition to the reason >>>>>>>> provided by John, my reasoning is: >>>>>>>> >>>>>>>> 1. Users may want to use something other than hash-based partitioning >>>>>>>> 2. Users may wish to partition on something different than the key >>>>>>>> without having to change the key. For example: >>>>>>>> 1. A combination of fields in the value in conjunction with the key >>>>>>>> 2. Something other than the key >>>>>>>> 3. We allow users to specify a partitioner on Produced hence in >>>>>>>> KStream.to and KStream.through, so it makes sense for API consistency. >>>>>>>> >>>>>>>> Just my 2 cents. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Bill >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze < >>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi John, >>>>>>>>> >>>>>>>>> In my mind it makes sense. >>>>>>>>> If we add partitioner configuration to Repartitioned class, with the >>>>>>>>> combination of specifying number of partitions for internal topics, >>>>>>>>> user >>>>>>>>> will have opportunity to ensure co-partitioning before join operation. >>>>>>>>> I think this can be quite powerful feature. >>>>>>>>> Wondering what others think about this? >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Levani >>>>>>>>> >>>>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler <j...@confluent.io >>>>>>>>>> <mailto:j...@confluent.io>> wrote: >>>>>>>>>> >>>>>>>>>> Yes, I believe that's what I had in mind. Again, not totally sure it >>>>>>>>>> makes sense, but I believe something similar is the rationale for >>>>>>>>>> having the partitioner option in Produced. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> -John >>>>>>>>>> >>>>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze >>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hey John, >>>>>>>>>>> >>>>>>>>>>> Oh that’s interesting use-case. >>>>>>>>>>> Do I understand this correctly, in your example I would first issue >>>>>>>>> repartition(Repartitioned) with proper partitioner that essentially >>>>>>> would >>>>>>>>> be the same as the topic I want to join with and then do the >>>>>>> KStream#join >>>>>>>>> with DSL? >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Levani >>>>>>>>>>> >>>>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler <j...@confluent.io >>>>>>>>>>>> <mailto:j...@confluent.io>> >>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hey, all, just to chime in, >>>>>>>>>>>> >>>>>>>>>>>> I think it might be useful to have an option to specify the >>>>>>>>>>>> partitioner. The case I have in mind is that some data may get >>>>>>>>>>>> repartitioned and then joined with an input topic. If the >>>>>>>>>>>> right-side >>>>>>>>>>>> input topic uses a custom partitioning strategy, then the >>>>>>>>>>>> repartitioned stream also needs to be partitioned with the same >>>>>>>>>>>> strategy. >>>>>>>>>>>> >>>>>>>>>>>> Does that make sense, or did I maybe miss something important? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> -John >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze >>>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m not sure >>>>>>> about >>>>>>>>> it yet. >>>>>>>>>>>>> As Kafka Streams DSL user, I don’t really think I would need >>>>>>>>>>>>> control >>>>>>>>> over partitioner for internal topics. >>>>>>>>>>>>> As a user, I would assume that Kafka Streams knows best how to >>>>>>>>> partition data for internal topics. >>>>>>>>>>>>> In this KIP I wrote that Produced should be used only for topics >>>>>>> that >>>>>>>>> are created by user In advance. >>>>>>>>>>>>> In those cases maybe it make sense to have possibility to specify >>>>>>> the >>>>>>>>> partitioner. >>>>>>>>>>>>> I don’t have clear answer on that yet, but I guess specifying the >>>>>>>>> partitioner can be added as well if there’s agreement on this. >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Levani >>>>>>>>>>>>> >>>>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman < >>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned would be >>>>>>>>>>>>>> a >>>>>>>>> useful >>>>>>>>>>>>>> addition. I'm wondering if it might also need to have >>>>>>>>>>>>>> a withStreamPartitioner method/field, similar to Produced? I'm >>>>>>>>>>>>>> not >>>>>>>>> sure how >>>>>>>>>>>>>> widely this feature is really used, but seems it should be >>>>>>> available >>>>>>>>> for >>>>>>>>>>>>>> repartition topics. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze < >>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hey Sophie, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In both cases KStream#repartition and >>>>>>>>> KStream#repartition(Repartitioned) >>>>>>>>>>>>>>> topic will be created and managed by Kafka Streams. >>>>>>>>>>>>>>> Idea of Repartitioned is to give user more control over the >>>>>>>>>>>>>>> topic >>>>>>>>> such as >>>>>>>>>>>>>>> num of partitions. >>>>>>>>>>>>>>> I feel like Repartitioned parameter is something that is missing >>>>>>> in >>>>>>>>>>>>>>> current DSL design. >>>>>>>>>>>>>>> Essentially giving user control over parallelism by configuring >>>>>>> num >>>>>>>>> of >>>>>>>>>>>>>>> partitions for internal topics. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hope this answers your question. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman < >>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hey Levani, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me -- for the >>>>>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned, will the >>>>>>>>> topic be >>>>>>>>>>>>>>>> auto-created by Streams (which seems to be the case for the >>>>>>>>> signature >>>>>>>>>>>>>>>> without a Repartitioned) or does it have to be pre-created? The >>>>>>>>> wording >>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>> the KIP makes it seem like one version of the method will >>>>>>>>> auto-create >>>>>>>>>>>>>>>> topics while the other will not. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>> Sophie >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze < >>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> One more bump about KIP-221 ( >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>> >>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint> >>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>>>> so it doesn’t get lost in mailing list :) >>>>>>>>>>>>>>>>> Would love to hear communities opinions/concerns about this >>>>>>>>>>>>>>>>> KIP. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze < >>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Kind reminder about this KIP: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze < >>>>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated confluence >>>>>>> page >>>>>>>>> with >>>>>>>>>>>>>>>>> the new proposal >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I’ve also filled “Rejected Alternatives” section. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Looking forward to discuss this KIP :) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> King regards, >>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze < >>>>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hello Matthias, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas. >>>>>>>>>>>>>>>>>>>> I like the idea of introducing dedicated `Topic` class for >>>>>>>>> topic >>>>>>>>>>>>>>>>> configuration for internal operators like `groupedBy`. >>>>>>>>>>>>>>>>>>>> Would be great to hear others opinion about this as well. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax < >>>>>>>>> matth...@confluent.io >>>>>>>>>>>>>>>>> <mailto:matth...@confluent.io>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Levani, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for summarizing >>>>>>>>>>>>>>> everything. >>>>>>>>>>>>>>>>>>>>> Even if some points may have been discussed already (can't >>>>>>>>> really >>>>>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to refresh >>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> discussion. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard to the >>>>>>>>> distinction >>>>>>>>>>>>>>>>>>>>> between operators that manage topics and operators that >>>>>>>>>>>>>>>>>>>>> use >>>>>>>>>>>>>>>>> user-created >>>>>>>>>>>>>>>>>>>>> topics: Following this argument, it might indicate that >>>>>>>>> leaving >>>>>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses use-defined >>>>>>>>> topics) and >>>>>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an operator >>>>>>>>>>>>>>>>>>>>> that >>>>>>>>> manages >>>>>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is one >>>>>>> operator >>>>>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but sometimes >>>>>>> not; a >>>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>>>>>> name, ie, new operator would make the distinction clearer. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`. I am >>>>>>>>>>>>>>>>>>>>> wondering >>>>>>>>> if the >>>>>>>>>>>>>>>>> same >>>>>>>>>>>>>>>>>>>>> argument as for `Produced` does apply and adding it is >>>>>>>>> semantically >>>>>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of others on >>>>>>>>> this, too. >>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow to >>>>>>> configure >>>>>>>>> a >>>>>>>>>>>>>>>>> certain >>>>>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, store. If we assume >>>>>>>>>>>>>>>>>>>>> that >>>>>>>>> a topic >>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>>> a similar entity, I am wonder if we should have a >>>>>>>>>>>>>>>>>>>>> `Topic#withNumberOfPartitions()` class and method instead >>>>>>>>>>>>>>>>>>>>> of >>>>>>>>> a plain >>>>>>>>>>>>>>>>>>>>> integer? This would allow us to add other configs, like >>>>>>>>> replication >>>>>>>>>>>>>>>>>>>>> factor, retention-time etc, easily, without the need to >>>>>>>>> change the >>>>>>>>>>>>>>>>> "main >>>>>>>>>>>>>>>>>>>>> API". >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like them >>>>>>> myself. >>>>>>>>> :) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote: >>>>>>>>>>>>>>>>>>>>>> Actually, giving it more though - maybe enhancing >>>>>>>>>>>>>>>>>>>>>> Produced >>>>>>>>> with num >>>>>>>>>>>>>>>>> of partitions configuration is not the best approach. Let me >>>>>>>>> explain >>>>>>>>>>>>>>> why: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 1) If we enhance Produced class with this configuration, >>>>>>>>> this will >>>>>>>>>>>>>>>>> also affect KStream#to operation. Since KStream#to is the >>>>>>>>>>>>>>>>> final >>>>>>>>> sink of >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> topology, for me, it seems to be reasonable assumption that >>>>>>>>>>>>>>>>> user >>>>>>>>> needs >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> manually create sink topic in advance. And in that case, >>>>>>>>>>>>>>>>> having >>>>>>>>> num of >>>>>>>>>>>>>>>>> partitions configuration doesn’t make much sense. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 2) Looking at Produced class, based on API contract, >>>>>>>>>>>>>>>>>>>>>> seems >>>>>>>>> like >>>>>>>>>>>>>>>>> Produced is designed to be something that is explicitly for >>>>>>>>> producer >>>>>>>>>>>>>>> (key >>>>>>>>>>>>>>>>> serializer, value serializer, partitioner those all are >>>>>>>>>>>>>>>>> producer >>>>>>>>>>>>>>> specific >>>>>>>>>>>>>>>>> configurations) and num of partitions is topic level >>>>>>>>> configuration. And >>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>> don’t think mixing topic and producer level configurations >>>>>>>>> together in >>>>>>>>>>>>>>> one >>>>>>>>>>>>>>>>> class is the good approach. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 3) Looking at KStream interface, seems like Produced >>>>>>>>> parameter is >>>>>>>>>>>>>>>>> for operations that work with non-internal (e.g topics created >>>>>>> and >>>>>>>>>>>>>>> managed >>>>>>>>>>>>>>>>> internally by Kafka Streams) topics and I think we should >>>>>>>>>>>>>>>>> leave >>>>>>>>> it as >>>>>>>>>>>>>>> it is >>>>>>>>>>>>>>>>> in that case. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Taking all this things into account, I think we should >>>>>>>>> distinguish >>>>>>>>>>>>>>>>> between DSL operations, where Kafka Streams should create and >>>>>>>>> manage >>>>>>>>>>>>>>>>> internal topics (KStream#groupBy) vs topics that should be >>>>>>>>> created in >>>>>>>>>>>>>>>>> advance (e.g KStream#to). >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> To sum it up, I think adding numPartitions configuration >>>>>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>> Produced >>>>>>>>>>>>>>>>> will result in mixing topic and producer level configuration >>>>>>>>>>>>>>>>> in >>>>>>>>> one >>>>>>>>>>>>>>> class >>>>>>>>>>>>>>>>> and it’s gonna break existing API semantics. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Regarding making topic name optional in KStream#through >>>>>>>>>>>>>>>>>>>>>> - I >>>>>>>>> think >>>>>>>>>>>>>>>>> underline idea is very useful and giving users possibility to >>>>>>>>> specify >>>>>>>>>>>>>>> num >>>>>>>>>>>>>>>>> of partitions there is even more useful :) Considering >>>>>>>>>>>>>>>>> arguments >>>>>>>>> against >>>>>>>>>>>>>>>>> adding num of partitions in Produced class, I see two options >>>>>>>>> here: >>>>>>>>>>>>>>>>>>>>>> 1) Add following method overloads >>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and num of >>>>>>>>> partitions >>>>>>>>>>>>>>>>> will be taken from source topic >>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto >>>>>>>>>>>>>>>>> generated with specified num of partitions >>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V> >>>>>>>>>>>>>>>>> produced) - topic will be with generated with specified num of >>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>> and configuration taken from produced parameter. >>>>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce new >>>>>>>>>>>>>>>>>>>>>> method >>>>>>> - >>>>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this in one of >>>>>>> the >>>>>>>>>>>>>>> threads) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Considering all mentioned above I propose the following >>>>>>> plan: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Option A: >>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is >>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped class >>>>>>>>>>>>>>>>>>>>>> (as >>>>>>>>>>>>>>>>> mentioned in the KIP) >>>>>>>>>>>>>>>>>>>>>> 3) Add following method overloads to KStream#through >>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and num of >>>>>>>>> partitions >>>>>>>>>>>>>>>>> will be taken from source topic >>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto >>>>>>>>>>>>>>>>> generated with specified num of partitions >>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V> >>>>>>>>>>>>>>>>> produced) - topic will be with generated with specified num of >>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>> and configuration taken from produced parameter. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Option B: >>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is >>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped class >>>>>>>>>>>>>>>>>>>>>> (as >>>>>>>>>>>>>>>>> mentioned in the KIP) >>>>>>>>>>>>>>>>>>>>>> 3) Add new operator KStream#repartition for creating and >>>>>>>>> managing >>>>>>>>>>>>>>>>> internal repartition topics >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already discussed in >>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>> mailing >>>>>>>>>>>>>>>>> list, but I kinda got with all the threads that were about >>>>>>>>>>>>>>>>> this >>>>>>>>> KIP :( >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze < >>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I would like to resurrect discussion around KIP-221. >>>>>>>>>>>>>>>>>>>>>>> Going >>>>>>>>> through >>>>>>>>>>>>>>>>> the discussion thread, there’s seems to agreement around >>>>>>>>> usefulness of >>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>> feature. >>>>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I understood, >>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>> most >>>>>>>>>>>>>>>>> optimal solution for me seems the following: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to KStream#through method >>>>>>>>> (essentially >>>>>>>>>>>>>>>>> making topic name optional) >>>>>>>>>>>>>>>>>>>>>>> 2) Enhance Produced class with numOfPartitions >>>>>>> configuration >>>>>>>>>>>>>>> field. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to control >>>>>>>>> parallelism and >>>>>>>>>>>>>>>>> trigger re-partition without doing stateful operations. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I will update KIP with interface changes around >>>>>>>>> KStream#through if >>>>>>>>>>>>>>>>> this changes sound sensible. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>>> >>> >> >> >