Hey Levani, Thanks for the KIP! Can you clarify one thing for me -- for the KStream#repartition signature taking a Repartitioned, will the topic be auto-created by Streams (which seems to be the case for the signature without a Repartitioned) or does it have to be pre-created? The wording in the KIP makes it seem like one version of the method will auto-create topics while the other will not.
Cheers, Sophie On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <levani.co...@gmail.com> wrote: > Hello, > > One more bump about KIP-221 ( > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>) > so it doesn’t get lost in mailing list :) > Would love to hear communities opinions/concerns about this KIP. > > Regards, > Levani > > > > On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze <levani.co...@gmail.com> > wrote: > > > > Hello, > > > > Kind reminder about this KIP: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > > > > > > Regards, > > Levani > > > >> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <levani.co...@gmail.com > <mailto:levani.co...@gmail.com>> wrote: > >> > >> Hello, > >> > >> In order to move this KIP forward, I’ve updated confluence page with > the new proposal > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > > > >> I’ve also filled “Rejected Alternatives” section. > >> > >> Looking forward to discuss this KIP :) > >> > >> King regards, > >> Levani > >> > >> > >>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <levani.co...@gmail.com > <mailto:levani.co...@gmail.com>> wrote: > >>> > >>> Hello Matthias, > >>> > >>> Thanks for the feedback and ideas. > >>> I like the idea of introducing dedicated `Topic` class for topic > configuration for internal operators like `groupedBy`. > >>> Would be great to hear others opinion about this as well. > >>> > >>> Kind regards, > >>> Levani > >>> > >>> > >>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax <matth...@confluent.io > <mailto:matth...@confluent.io>> wrote: > >>>> > >>>> Levani, > >>>> > >>>> Thanks for picking up this KIP! And thanks for summarizing everything. > >>>> Even if some points may have been discussed already (can't really > >>>> remember), it's helpful to get a good summary to refresh the > discussion. > >>>> > >>>> I think your reasoning makes sense. With regard to the distinction > >>>> between operators that manage topics and operators that use > user-created > >>>> topics: Following this argument, it might indicate that leaving > >>>> `through()` as-is (as an operator that uses use-defined topics) and > >>>> introducing a new `repartition()` operator (an operator that manages > >>>> topics itself) might be good. Otherwise, there is one operator > >>>> `through()` that sometimes manages topics but sometimes not; a > different > >>>> name, ie, new operator would make the distinction clearer. > >>>> > >>>> About adding `numOfPartitions` to `Grouped`. I am wondering if the > same > >>>> argument as for `Produced` does apply and adding it is semantically > >>>> questionable? Might be good to get opinions of others on this, too. I > am > >>>> not sure myself what solution I prefer atm. > >>>> > >>>> So far, KS uses configuration objects that allow to configure a > certain > >>>> "entity" like a consumer, producer, store. If we assume that a topic > is > >>>> a similar entity, I am wonder if we should have a > >>>> `Topic#withNumberOfPartitions()` class and method instead of a plain > >>>> integer? This would allow us to add other configs, like replication > >>>> factor, retention-time etc, easily, without the need to change the > "main > >>>> API". > >>>> > >>>> Just want to give some ideas. Not sure if I like them myself. :) > >>>> > >>>> > >>>> -Matthias > >>>> > >>>> > >>>> > >>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote: > >>>>> Actually, giving it more though - maybe enhancing Produced with num > of partitions configuration is not the best approach. Let me explain why: > >>>>> > >>>>> 1) If we enhance Produced class with this configuration, this will > also affect KStream#to operation. Since KStream#to is the final sink of the > topology, for me, it seems to be reasonable assumption that user needs to > manually create sink topic in advance. And in that case, having num of > partitions configuration doesn’t make much sense. > >>>>> > >>>>> 2) Looking at Produced class, based on API contract, seems like > Produced is designed to be something that is explicitly for producer (key > serializer, value serializer, partitioner those all are producer specific > configurations) and num of partitions is topic level configuration. And I > don’t think mixing topic and producer level configurations together in one > class is the good approach. > >>>>> > >>>>> 3) Looking at KStream interface, seems like Produced parameter is > for operations that work with non-internal (e.g topics created and managed > internally by Kafka Streams) topics and I think we should leave it as it is > in that case. > >>>>> > >>>>> Taking all this things into account, I think we should distinguish > between DSL operations, where Kafka Streams should create and manage > internal topics (KStream#groupBy) vs topics that should be created in > advance (e.g KStream#to). > >>>>> > >>>>> To sum it up, I think adding numPartitions configuration in Produced > will result in mixing topic and producer level configuration in one class > and it’s gonna break existing API semantics. > >>>>> > >>>>> Regarding making topic name optional in KStream#through - I think > underline idea is very useful and giving users possibility to specify num > of partitions there is even more useful :) Considering arguments against > adding num of partitions in Produced class, I see two options here: > >>>>> 1) Add following method overloads > >>>>> * through() - topic will be auto-generated and num of partitions > will be taken from source topic > >>>>> * through(final int numOfPartitions) - topic will be auto > generated with specified num of partitions > >>>>> * through(final int numOfPartitions, final Produced<K, V> > produced) - topic will be with generated with specified num of partitions > and configuration taken from produced parameter. > >>>>> 2) Leave KStream#through as it is and introduce new method - > KStream#repartition (I think Matthias suggested this in one of the threads) > >>>>> > >>>>> Considering all mentioned above I propose the following plan: > >>>>> > >>>>> Option A: > >>>>> 1) Leave Produced as it is > >>>>> 2) Add num of partitions configuration to Grouped class (as > mentioned in the KIP) > >>>>> 3) Add following method overloads to KStream#through > >>>>> * through() - topic will be auto-generated and num of partitions > will be taken from source topic > >>>>> * through(final int numOfPartitions) - topic will be auto > generated with specified num of partitions > >>>>> * through(final int numOfPartitions, final Produced<K, V> > produced) - topic will be with generated with specified num of partitions > and configuration taken from produced parameter. > >>>>> > >>>>> Option B: > >>>>> 1) Leave Produced as it is > >>>>> 2) Add num of partitions configuration to Grouped class (as > mentioned in the KIP) > >>>>> 3) Add new operator KStream#repartition for creating and managing > internal repartition topics > >>>>> > >>>>> P.S. I’m sorry if all of this was already discussed in the mailing > list, but I kinda got with all the threads that were about this KIP :( > >>>>> > >>>>> Kind regards, > >>>>> Levani > >>>>> > >>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze < > levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote: > >>>>>> > >>>>>> Hello, > >>>>>> > >>>>>> I would like to resurrect discussion around KIP-221. Going through > the discussion thread, there’s seems to agreement around usefulness of this > feature. > >>>>>> Regarding the implementation, as far as I understood, the most > optimal solution for me seems the following: > >>>>>> > >>>>>> 1) Add two method overloads to KStream#through method (essentially > making topic name optional) > >>>>>> 2) Enhance Produced class with numOfPartitions configuration field. > >>>>>> > >>>>>> Those two changes will allow DSL users to control parallelism and > trigger re-partition without doing stateful operations. > >>>>>> > >>>>>> I will update KIP with interface changes around KStream#through if > this changes sound sensible. > >>>>>> > >>>>>> Kind regards, > >>>>>> Levani > >>>>> > >>>>> > >>>> > >>> > >> > > > >