Hello, One more bump about KIP-221 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>) so it doesn’t get lost in mailing list :) Would love to hear communities opinions/concerns about this KIP.
Regards, Levani > On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze <levani.co...@gmail.com> wrote: > > Hello, > > Kind reminder about this KIP: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint> > > Regards, > Levani > >> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <levani.co...@gmail.com >> <mailto:levani.co...@gmail.com>> wrote: >> >> Hello, >> >> In order to move this KIP forward, I’ve updated confluence page with the new >> proposal >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >> >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint> >> I’ve also filled “Rejected Alternatives” section. >> >> Looking forward to discuss this KIP :) >> >> King regards, >> Levani >> >> >>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <levani.co...@gmail.com >>> <mailto:levani.co...@gmail.com>> wrote: >>> >>> Hello Matthias, >>> >>> Thanks for the feedback and ideas. >>> I like the idea of introducing dedicated `Topic` class for topic >>> configuration for internal operators like `groupedBy`. >>> Would be great to hear others opinion about this as well. >>> >>> Kind regards, >>> Levani >>> >>> >>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax <matth...@confluent.io >>>> <mailto:matth...@confluent.io>> wrote: >>>> >>>> Levani, >>>> >>>> Thanks for picking up this KIP! And thanks for summarizing everything. >>>> Even if some points may have been discussed already (can't really >>>> remember), it's helpful to get a good summary to refresh the discussion. >>>> >>>> I think your reasoning makes sense. With regard to the distinction >>>> between operators that manage topics and operators that use user-created >>>> topics: Following this argument, it might indicate that leaving >>>> `through()` as-is (as an operator that uses use-defined topics) and >>>> introducing a new `repartition()` operator (an operator that manages >>>> topics itself) might be good. Otherwise, there is one operator >>>> `through()` that sometimes manages topics but sometimes not; a different >>>> name, ie, new operator would make the distinction clearer. >>>> >>>> About adding `numOfPartitions` to `Grouped`. I am wondering if the same >>>> argument as for `Produced` does apply and adding it is semantically >>>> questionable? Might be good to get opinions of others on this, too. I am >>>> not sure myself what solution I prefer atm. >>>> >>>> So far, KS uses configuration objects that allow to configure a certain >>>> "entity" like a consumer, producer, store. If we assume that a topic is >>>> a similar entity, I am wonder if we should have a >>>> `Topic#withNumberOfPartitions()` class and method instead of a plain >>>> integer? This would allow us to add other configs, like replication >>>> factor, retention-time etc, easily, without the need to change the "main >>>> API". >>>> >>>> Just want to give some ideas. Not sure if I like them myself. :) >>>> >>>> >>>> -Matthias >>>> >>>> >>>> >>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote: >>>>> Actually, giving it more though - maybe enhancing Produced with num of >>>>> partitions configuration is not the best approach. Let me explain why: >>>>> >>>>> 1) If we enhance Produced class with this configuration, this will also >>>>> affect KStream#to operation. Since KStream#to is the final sink of the >>>>> topology, for me, it seems to be reasonable assumption that user needs to >>>>> manually create sink topic in advance. And in that case, having num of >>>>> partitions configuration doesn’t make much sense. >>>>> >>>>> 2) Looking at Produced class, based on API contract, seems like Produced >>>>> is designed to be something that is explicitly for producer (key >>>>> serializer, value serializer, partitioner those all are producer specific >>>>> configurations) and num of partitions is topic level configuration. And I >>>>> don’t think mixing topic and producer level configurations together in >>>>> one class is the good approach. >>>>> >>>>> 3) Looking at KStream interface, seems like Produced parameter is for >>>>> operations that work with non-internal (e.g topics created and managed >>>>> internally by Kafka Streams) topics and I think we should leave it as it >>>>> is in that case. >>>>> >>>>> Taking all this things into account, I think we should distinguish >>>>> between DSL operations, where Kafka Streams should create and manage >>>>> internal topics (KStream#groupBy) vs topics that should be created in >>>>> advance (e.g KStream#to). >>>>> >>>>> To sum it up, I think adding numPartitions configuration in Produced will >>>>> result in mixing topic and producer level configuration in one class and >>>>> it’s gonna break existing API semantics. >>>>> >>>>> Regarding making topic name optional in KStream#through - I think >>>>> underline idea is very useful and giving users possibility to specify num >>>>> of partitions there is even more useful :) Considering arguments against >>>>> adding num of partitions in Produced class, I see two options here: >>>>> 1) Add following method overloads >>>>> * through() - topic will be auto-generated and num of partitions will >>>>> be taken from source topic >>>>> * through(final int numOfPartitions) - topic will be auto generated >>>>> with specified num of partitions >>>>> * through(final int numOfPartitions, final Produced<K, V> produced) - >>>>> topic will be with generated with specified num of partitions and >>>>> configuration taken from produced parameter. >>>>> 2) Leave KStream#through as it is and introduce new method - >>>>> KStream#repartition (I think Matthias suggested this in one of the >>>>> threads) >>>>> >>>>> Considering all mentioned above I propose the following plan: >>>>> >>>>> Option A: >>>>> 1) Leave Produced as it is >>>>> 2) Add num of partitions configuration to Grouped class (as mentioned in >>>>> the KIP) >>>>> 3) Add following method overloads to KStream#through >>>>> * through() - topic will be auto-generated and num of partitions will >>>>> be taken from source topic >>>>> * through(final int numOfPartitions) - topic will be auto generated >>>>> with specified num of partitions >>>>> * through(final int numOfPartitions, final Produced<K, V> produced) - >>>>> topic will be with generated with specified num of partitions and >>>>> configuration taken from produced parameter. >>>>> >>>>> Option B: >>>>> 1) Leave Produced as it is >>>>> 2) Add num of partitions configuration to Grouped class (as mentioned in >>>>> the KIP) >>>>> 3) Add new operator KStream#repartition for creating and managing >>>>> internal repartition topics >>>>> >>>>> P.S. I’m sorry if all of this was already discussed in the mailing list, >>>>> but I kinda got with all the threads that were about this KIP :( >>>>> >>>>> Kind regards, >>>>> Levani >>>>> >>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze <levani.co...@gmail.com >>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>> >>>>>> Hello, >>>>>> >>>>>> I would like to resurrect discussion around KIP-221. Going through the >>>>>> discussion thread, there’s seems to agreement around usefulness of this >>>>>> feature. >>>>>> Regarding the implementation, as far as I understood, the most optimal >>>>>> solution for me seems the following: >>>>>> >>>>>> 1) Add two method overloads to KStream#through method (essentially >>>>>> making topic name optional) >>>>>> 2) Enhance Produced class with numOfPartitions configuration field. >>>>>> >>>>>> Those two changes will allow DSL users to control parallelism and >>>>>> trigger re-partition without doing stateful operations. >>>>>> >>>>>> I will update KIP with interface changes around KStream#through if this >>>>>> changes sound sensible. >>>>>> >>>>>> Kind regards, >>>>>> Levani >>>>> >>>>> >>>> >>> >> >