Thanks for clearing that up. I agree that Repartitioned would be a useful addition. I'm wondering if it might also need to have a withStreamPartitioner method/field, similar to Produced? I'm not sure how widely this feature is really used, but seems it should be available for repartition topics.
On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze <levani.co...@gmail.com> wrote: > Hey Sophie, > > In both cases KStream#repartition and KStream#repartition(Repartitioned) > topic will be created and managed by Kafka Streams. > Idea of Repartitioned is to give user more control over the topic such as > num of partitions. > I feel like Repartitioned parameter is something that is missing in > current DSL design. > Essentially giving user control over parallelism by configuring num of > partitions for internal topics. > > Hope this answers your question. > > Regards, > Levani > > > On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman <sop...@confluent.io> > wrote: > > > > Hey Levani, > > > > Thanks for the KIP! Can you clarify one thing for me -- for the > > KStream#repartition signature taking a Repartitioned, will the topic be > > auto-created by Streams (which seems to be the case for the signature > > without a Repartitioned) or does it have to be pre-created? The wording > in > > the KIP makes it seem like one version of the method will auto-create > > topics while the other will not. > > > > Cheers, > > Sophie > > > > On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze < > levani.co...@gmail.com> > > wrote: > > > >> Hello, > >> > >> One more bump about KIP-221 ( > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > >> < > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > >) > >> so it doesn’t get lost in mailing list :) > >> Would love to hear communities opinions/concerns about this KIP. > >> > >> Regards, > >> Levani > >> > >> > >>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze <levani.co...@gmail.com > > > >> wrote: > >>> > >>> Hello, > >>> > >>> Kind reminder about this KIP: > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > >> < > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > >>> > >>> > >>> Regards, > >>> Levani > >>> > >>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze < > levani.co...@gmail.com > >> <mailto:levani.co...@gmail.com>> wrote: > >>>> > >>>> Hello, > >>>> > >>>> In order to move this KIP forward, I’ve updated confluence page with > >> the new proposal > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > >> < > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > >>> > >>>> I’ve also filled “Rejected Alternatives” section. > >>>> > >>>> Looking forward to discuss this KIP :) > >>>> > >>>> King regards, > >>>> Levani > >>>> > >>>> > >>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze < > levani.co...@gmail.com > >> <mailto:levani.co...@gmail.com>> wrote: > >>>>> > >>>>> Hello Matthias, > >>>>> > >>>>> Thanks for the feedback and ideas. > >>>>> I like the idea of introducing dedicated `Topic` class for topic > >> configuration for internal operators like `groupedBy`. > >>>>> Would be great to hear others opinion about this as well. > >>>>> > >>>>> Kind regards, > >>>>> Levani > >>>>> > >>>>> > >>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax <matth...@confluent.io > >> <mailto:matth...@confluent.io>> wrote: > >>>>>> > >>>>>> Levani, > >>>>>> > >>>>>> Thanks for picking up this KIP! And thanks for summarizing > everything. > >>>>>> Even if some points may have been discussed already (can't really > >>>>>> remember), it's helpful to get a good summary to refresh the > >> discussion. > >>>>>> > >>>>>> I think your reasoning makes sense. With regard to the distinction > >>>>>> between operators that manage topics and operators that use > >> user-created > >>>>>> topics: Following this argument, it might indicate that leaving > >>>>>> `through()` as-is (as an operator that uses use-defined topics) and > >>>>>> introducing a new `repartition()` operator (an operator that manages > >>>>>> topics itself) might be good. Otherwise, there is one operator > >>>>>> `through()` that sometimes manages topics but sometimes not; a > >> different > >>>>>> name, ie, new operator would make the distinction clearer. > >>>>>> > >>>>>> About adding `numOfPartitions` to `Grouped`. I am wondering if the > >> same > >>>>>> argument as for `Produced` does apply and adding it is semantically > >>>>>> questionable? Might be good to get opinions of others on this, too. > I > >> am > >>>>>> not sure myself what solution I prefer atm. > >>>>>> > >>>>>> So far, KS uses configuration objects that allow to configure a > >> certain > >>>>>> "entity" like a consumer, producer, store. If we assume that a topic > >> is > >>>>>> a similar entity, I am wonder if we should have a > >>>>>> `Topic#withNumberOfPartitions()` class and method instead of a plain > >>>>>> integer? This would allow us to add other configs, like replication > >>>>>> factor, retention-time etc, easily, without the need to change the > >> "main > >>>>>> API". > >>>>>> > >>>>>> Just want to give some ideas. Not sure if I like them myself. :) > >>>>>> > >>>>>> > >>>>>> -Matthias > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote: > >>>>>>> Actually, giving it more though - maybe enhancing Produced with num > >> of partitions configuration is not the best approach. Let me explain > why: > >>>>>>> > >>>>>>> 1) If we enhance Produced class with this configuration, this will > >> also affect KStream#to operation. Since KStream#to is the final sink of > the > >> topology, for me, it seems to be reasonable assumption that user needs > to > >> manually create sink topic in advance. And in that case, having num of > >> partitions configuration doesn’t make much sense. > >>>>>>> > >>>>>>> 2) Looking at Produced class, based on API contract, seems like > >> Produced is designed to be something that is explicitly for producer > (key > >> serializer, value serializer, partitioner those all are producer > specific > >> configurations) and num of partitions is topic level configuration. And > I > >> don’t think mixing topic and producer level configurations together in > one > >> class is the good approach. > >>>>>>> > >>>>>>> 3) Looking at KStream interface, seems like Produced parameter is > >> for operations that work with non-internal (e.g topics created and > managed > >> internally by Kafka Streams) topics and I think we should leave it as > it is > >> in that case. > >>>>>>> > >>>>>>> Taking all this things into account, I think we should distinguish > >> between DSL operations, where Kafka Streams should create and manage > >> internal topics (KStream#groupBy) vs topics that should be created in > >> advance (e.g KStream#to). > >>>>>>> > >>>>>>> To sum it up, I think adding numPartitions configuration in > Produced > >> will result in mixing topic and producer level configuration in one > class > >> and it’s gonna break existing API semantics. > >>>>>>> > >>>>>>> Regarding making topic name optional in KStream#through - I think > >> underline idea is very useful and giving users possibility to specify > num > >> of partitions there is even more useful :) Considering arguments against > >> adding num of partitions in Produced class, I see two options here: > >>>>>>> 1) Add following method overloads > >>>>>>> * through() - topic will be auto-generated and num of partitions > >> will be taken from source topic > >>>>>>> * through(final int numOfPartitions) - topic will be auto > >> generated with specified num of partitions > >>>>>>> * through(final int numOfPartitions, final Produced<K, V> > >> produced) - topic will be with generated with specified num of > partitions > >> and configuration taken from produced parameter. > >>>>>>> 2) Leave KStream#through as it is and introduce new method - > >> KStream#repartition (I think Matthias suggested this in one of the > threads) > >>>>>>> > >>>>>>> Considering all mentioned above I propose the following plan: > >>>>>>> > >>>>>>> Option A: > >>>>>>> 1) Leave Produced as it is > >>>>>>> 2) Add num of partitions configuration to Grouped class (as > >> mentioned in the KIP) > >>>>>>> 3) Add following method overloads to KStream#through > >>>>>>> * through() - topic will be auto-generated and num of partitions > >> will be taken from source topic > >>>>>>> * through(final int numOfPartitions) - topic will be auto > >> generated with specified num of partitions > >>>>>>> * through(final int numOfPartitions, final Produced<K, V> > >> produced) - topic will be with generated with specified num of > partitions > >> and configuration taken from produced parameter. > >>>>>>> > >>>>>>> Option B: > >>>>>>> 1) Leave Produced as it is > >>>>>>> 2) Add num of partitions configuration to Grouped class (as > >> mentioned in the KIP) > >>>>>>> 3) Add new operator KStream#repartition for creating and managing > >> internal repartition topics > >>>>>>> > >>>>>>> P.S. I’m sorry if all of this was already discussed in the mailing > >> list, but I kinda got with all the threads that were about this KIP :( > >>>>>>> > >>>>>>> Kind regards, > >>>>>>> Levani > >>>>>>> > >>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze < > >> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote: > >>>>>>>> > >>>>>>>> Hello, > >>>>>>>> > >>>>>>>> I would like to resurrect discussion around KIP-221. Going through > >> the discussion thread, there’s seems to agreement around usefulness of > this > >> feature. > >>>>>>>> Regarding the implementation, as far as I understood, the most > >> optimal solution for me seems the following: > >>>>>>>> > >>>>>>>> 1) Add two method overloads to KStream#through method (essentially > >> making topic name optional) > >>>>>>>> 2) Enhance Produced class with numOfPartitions configuration > field. > >>>>>>>> > >>>>>>>> Those two changes will allow DSL users to control parallelism and > >> trigger re-partition without doing stateful operations. > >>>>>>>> > >>>>>>>> I will update KIP with interface changes around KStream#through if > >> this changes sound sensible. > >>>>>>>> > >>>>>>>> Kind regards, > >>>>>>>> Levani > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > >> > >