Yes, I was thinking about it as well. To be honest, I’m not sure about it yet. As a Kafka Streams DSL user, I don’t really think I would need control over the partitioner for internal topics. As a user, I would assume that Kafka Streams knows best how to partition data for internal topics. In this KIP I wrote that Produced should be used only for topics that are created by the user in advance. In those cases maybe it makes sense to have the possibility to specify the partitioner. I don’t have a clear answer on that yet, but I guess specifying the partitioner can be added as well if there’s agreement on this.
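To make the question a bit more concrete, here is a rough sketch of what a Repartitioned-style configuration with an optional partitioner could look like. It is plain Java and deliberately not the Kafka Streams API: `RepartitionedSketch`, its nested `Partitioner` interface, and the key-hash fallback are all hypothetical names and behavior that I made up for illustration. The only point is that the partitioner stays optional, so users who trust Kafka Streams to partition internal topics never have to touch it.

```java
import java.util.Objects;

// Hypothetical sketch only: none of these names are the Kafka Streams API.
final class RepartitionedSketch<K, V> {

    /** Hypothetical stand-in for a stream partitioner. */
    interface Partitioner<K, V> {
        int partition(K key, V value, int numPartitions);
    }

    private final int numOfPartitions;
    // null means "no custom partitioner, let the library decide".
    private final Partitioner<K, V> partitioner;

    private RepartitionedSketch(int numOfPartitions, Partitioner<K, V> partitioner) {
        if (numOfPartitions <= 0) {
            throw new IllegalArgumentException("numOfPartitions must be positive");
        }
        this.numOfPartitions = numOfPartitions;
        this.partitioner = partitioner;
    }

    /** Entry point: only the number of partitions is mandatory. */
    static <K, V> RepartitionedSketch<K, V> numberOfPartitions(int numOfPartitions) {
        return new RepartitionedSketch<>(numOfPartitions, null);
    }

    /** Optional override, mirroring the withStreamPartitioner idea from the thread. */
    RepartitionedSketch<K, V> withStreamPartitioner(Partitioner<K, V> custom) {
        return new RepartitionedSketch<>(numOfPartitions, Objects.requireNonNull(custom));
    }

    int numOfPartitions() {
        return numOfPartitions;
    }

    /** Picks a partition for one record. */
    int partitionFor(K key, V value) {
        if (partitioner != null) {
            return partitioner.partition(key, value, numOfPartitions);
        }
        // Assumption: a simple key hash stands in for whatever default
        // Kafka Streams would actually apply to an internal topic.
        return Math.floorMod(Objects.hashCode(key), numOfPartitions);
    }

    public static void main(String[] args) {
        RepartitionedSketch<String, Integer> byDefault =
                RepartitionedSketch.numberOfPartitions(4);
        RepartitionedSketch<String, Integer> custom =
                byDefault.withStreamPartitioner((key, value, n) -> value % n);

        System.out.println(byDefault.partitionFor("user-1", 42));
        System.out.println(custom.partitionFor("user-1", 42));
    }
}
```

With this shape the partitioner is purely additive: leaving it out keeps the "Kafka Streams knows best" behavior, and adding it later would not change the signature of a repartition operation that takes such an object.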
Regards,
Levani

> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman <sop...@confluent.io> wrote:
>
> Thanks for clearing that up. I agree that Repartitioned would be a useful
> addition. I'm wondering if it might also need to have
> a withStreamPartitioner method/field, similar to Produced? I'm not sure how
> widely this feature is really used, but seems it should be available for
> repartition topics.
>
> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze <levani.co...@gmail.com>
> wrote:
>
>> Hey Sophie,
>>
>> In both cases KStream#repartition and KStream#repartition(Repartitioned)
>> topic will be created and managed by Kafka Streams.
>> Idea of Repartitioned is to give user more control over the topic such as
>> num of partitions.
>> I feel like Repartitioned parameter is something that is missing in
>> current DSL design.
>> Essentially giving user control over parallelism by configuring num of
>> partitions for internal topics.
>>
>> Hope this answers your question.
>>
>> Regards,
>> Levani
>>
>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman <sop...@confluent.io> wrote:
>>>
>>> Hey Levani,
>>>
>>> Thanks for the KIP! Can you clarify one thing for me -- for the
>>> KStream#repartition signature taking a Repartitioned, will the topic be
>>> auto-created by Streams (which seems to be the case for the signature
>>> without a Repartitioned) or does it have to be pre-created? The wording in
>>> the KIP makes it seem like one version of the method will auto-create
>>> topics while the other will not.
>>>
>>> Cheers,
>>> Sophie
>>>
>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> One more bump about KIP-221 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint)
>>>> so it doesn’t get lost in the mailing list :)
>>>> Would love to hear the community’s opinions/concerns about this KIP.
>>>>
>>>> Regards,
>>>> Levani
>>>>
>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> Kind reminder about this KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>
>>>>> Regards,
>>>>> Levani
>>>>>
>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> In order to move this KIP forward, I’ve updated the Confluence page with the new proposal: https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>> I’ve also filled in the “Rejected Alternatives” section.
>>>>>>
>>>>>> Looking forward to discussing this KIP :)
>>>>>>
>>>>>> Kind regards,
>>>>>> Levani
>>>>>>
>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hello Matthias,
>>>>>>>
>>>>>>> Thanks for the feedback and ideas.
>>>>>>> I like the idea of introducing a dedicated `Topic` class for topic configuration for internal operators like `groupedBy`.
>>>>>>> It would be great to hear others’ opinions about this as well.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Levani
>>>>>>>
>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax <matth...@confluent.io> wrote:
>>>>>>>>
>>>>>>>> Levani,
>>>>>>>>
>>>>>>>> Thanks for picking up this KIP! And thanks for summarizing everything.
>>>>>>>> Even if some points may have been discussed already (can't really
>>>>>>>> remember), it's helpful to get a good summary to refresh the discussion.
>>>>>>>>
>>>>>>>> I think your reasoning makes sense. With regard to the distinction
>>>>>>>> between operators that manage topics and operators that use user-created
>>>>>>>> topics: Following this argument, it might indicate that leaving
>>>>>>>> `through()` as-is (as an operator that uses user-defined topics) and
>>>>>>>> introducing a new `repartition()` operator (an operator that manages
>>>>>>>> topics itself) might be good. Otherwise, there is one operator
>>>>>>>> `through()` that sometimes manages topics but sometimes not; a different
>>>>>>>> name, i.e., a new operator, would make the distinction clearer.
>>>>>>>>
>>>>>>>> About adding `numOfPartitions` to `Grouped`. I am wondering if the same
>>>>>>>> argument as for `Produced` does apply and adding it is semantically
>>>>>>>> questionable? Might be good to get opinions of others on this, too. I am
>>>>>>>> not sure myself what solution I prefer atm.
>>>>>>>>
>>>>>>>> So far, KS uses configuration objects that allow configuring a certain
>>>>>>>> "entity" like a consumer, producer, store. If we assume that a topic is
>>>>>>>> a similar entity, I am wondering if we should have a
>>>>>>>> `Topic#withNumberOfPartitions()` class and method instead of a plain
>>>>>>>> integer? This would allow us to add other configs, like replication
>>>>>>>> factor, retention-time etc, easily, without the need to change the "main
>>>>>>>> API".
>>>>>>>>
>>>>>>>> Just want to give some ideas. Not sure if I like them myself. :)
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote:
>>>>>>>>> Actually, giving it more thought - maybe enhancing Produced with num of partitions configuration is not the best approach. Let me explain why:
>>>>>>>>>
>>>>>>>>> 1) If we enhance Produced class with this configuration, this will also affect KStream#to operation. Since KStream#to is the final sink of the topology, for me, it seems to be a reasonable assumption that the user needs to manually create the sink topic in advance. And in that case, having num of partitions configuration doesn’t make much sense.
>>>>>>>>>
>>>>>>>>> 2) Looking at Produced class, based on API contract, seems like Produced is designed to be something that is explicitly for the producer (key serializer, value serializer, partitioner are all producer-specific configurations) and num of partitions is a topic-level configuration. And I don’t think mixing topic and producer level configurations together in one class is a good approach.
>>>>>>>>>
>>>>>>>>> 3) Looking at KStream interface, seems like Produced parameter is for operations that work with non-internal topics (i.e. topics that are not created and managed internally by Kafka Streams) and I think we should leave it as it is in that case.
>>>>>>>>>
>>>>>>>>> Taking all these things into account, I think we should distinguish between DSL operations where Kafka Streams should create and manage internal topics (KStream#groupBy) vs topics that should be created in advance (e.g. KStream#to).
>>>>>>>>>
>>>>>>>>> To sum it up, I think adding numPartitions configuration in Produced will result in mixing topic and producer level configuration in one class and it’s gonna break existing API semantics.
>>>>>>>>>
>>>>>>>>> Regarding making topic name optional in KStream#through - I think the underlying idea is very useful and giving users the possibility to specify num of partitions there is even more useful :) Considering the arguments against adding num of partitions in Produced class, I see two options here:
>>>>>>>>> 1) Add following method overloads
>>>>>>>>>    * through() - topic will be auto-generated and num of partitions will be taken from source topic
>>>>>>>>>    * through(final int numOfPartitions) - topic will be auto-generated with specified num of partitions
>>>>>>>>>    * through(final int numOfPartitions, final Produced<K, V> produced) - topic will be auto-generated with specified num of partitions and configuration taken from produced parameter.
>>>>>>>>> 2) Leave KStream#through as it is and introduce new method - KStream#repartition (I think Matthias suggested this in one of the threads)
>>>>>>>>>
>>>>>>>>> Considering all mentioned above I propose the following plan:
>>>>>>>>>
>>>>>>>>> Option A:
>>>>>>>>> 1) Leave Produced as it is
>>>>>>>>> 2) Add num of partitions configuration to Grouped class (as mentioned in the KIP)
>>>>>>>>> 3) Add following method overloads to KStream#through
>>>>>>>>>    * through() - topic will be auto-generated and num of partitions will be taken from source topic
>>>>>>>>>    * through(final int numOfPartitions) - topic will be auto-generated with specified num of partitions
>>>>>>>>>    * through(final int numOfPartitions, final Produced<K, V> produced) - topic will be auto-generated with specified num of partitions and configuration taken from produced parameter.
>>>>>>>>>
>>>>>>>>> Option B:
>>>>>>>>> 1) Leave Produced as it is
>>>>>>>>> 2) Add num of partitions configuration to Grouped class (as mentioned in the KIP)
>>>>>>>>> 3) Add new operator KStream#repartition for creating and managing internal repartition topics
>>>>>>>>>
>>>>>>>>> P.S. I’m sorry if all of this was already discussed in the mailing list, but I kinda got lost with all the threads that were about this KIP :(
>>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>> Levani
>>>>>>>>>
>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I would like to resurrect discussion around KIP-221. Going through the discussion thread, there seems to be agreement around the usefulness of this feature.
>>>>>>>>>> Regarding the implementation, as far as I understood, the best solution seems to me to be the following:
>>>>>>>>>>
>>>>>>>>>> 1) Add two method overloads to KStream#through method (essentially making topic name optional)
>>>>>>>>>> 2) Enhance Produced class with numOfPartitions configuration field.
>>>>>>>>>>
>>>>>>>>>> Those two changes will allow DSL users to control parallelism and trigger re-partition without doing stateful operations.
>>>>>>>>>>
>>>>>>>>>> I will update KIP with interface changes around KStream#through if these changes sound sensible.
>>>>>>>>>>
>>>>>>>>>> Kind regards,
>>>>>>>>>> Levani