Hi all,

Here’s the voting thread for this KIP: https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html
Regards,
Levani

> On Jul 24, 2019, at 11:15 PM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
>
> Hi Matthias,
>
> Thanks for the suggestion. I don’t have a strong opinion on that one.
> Agree that avoiding unnecessary method overloads is a good idea.
>
> Updated the KIP.
>
> Regards,
> Levani
>
>> On Jul 24, 2019, at 8:50 PM, Matthias J. Sax <matth...@confluent.io> wrote:
>>
>> One question:
>>
>> Why do we add
>>
>>> Repartitioned#with(final String name, final int numberOfPartitions)
>>
>> It seems that `#with(String name)`, `#numberOfPartitions(int)` in combination with `withName()` and `withNumberOfPartitions()` should be sufficient. Users can chain the method calls.
>>
>> (I think it's valuable to keep the number of overloads small if possible.)
>>
>> Otherwise LGTM.
>>
>> -Matthias
>>
>> On 7/23/19 2:18 PM, Levani Kokhreidze wrote:
>>> Hello,
>>>
>>> Thanks all for your feedback.
>>> I started the voting procedure for this KIP. If there are any other concerns about this KIP, please let me know.
>>>
>>> Regards,
>>> Levani
>>>
>>>> On Jul 20, 2019, at 8:39 PM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>
>>>> Hi Matthias,
>>>>
>>>> Thanks for the suggestion, makes sense.
>>>> I’ve updated the KIP (https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint).
>>>>
>>>> Regards,
>>>> Levani
>>>>
>>>>> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax <matth...@confluent.io> wrote:
>>>>>
>>>>> Thanks for driving the KIP.
>>>>>
>>>>> I agree that users need to be able to specify a partitioning strategy.
>>>>>
>>>>> Sophie raises a fair point about topic configs and producer configs.
>>>>> My take is that considering `Repartitioned` as an "extension" to `Produced` that adds topic configuration is a good way to think about it and helps to keep the API "clean".
>>>>>
>>>>> With regard to method names: I would prefer to avoid abbreviations. Can we rename:
>>>>>
>>>>> `withNumOfPartitions` -> `withNumberOfPartitions`
>>>>>
>>>>> Furthermore, it might be good to add some more `static` methods:
>>>>>
>>>>> - Repartitioned.with(Serde<K>, Serde<V>)
>>>>> - Repartitioned.withNumberOfPartitions(int)
>>>>> - Repartitioned.streamPartitioner(StreamPartitioner)
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 7/19/19 3:33 PM, Levani Kokhreidze wrote:
>>>>>> Totally agree. I think in the KStream interface it makes sense to have some duplicate configurations between operators in order to keep the API simple and usable.
>>>>>> Also, the more surface an API has, the harder it is to maintain proper backward compatibility.
>>>>>> While the initial idea of keeping topic-level configs separate was exciting, having the Repartitioned class encapsulate some producer-level configs makes the API more readable.
>>>>>>
>>>>>> Regards,
>>>>>> Levani
>>>>>>
>>>>>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman <sop...@confluent.io> wrote:
>>>>>>>
>>>>>>> I think that is a good point about trying to keep producer-level configurations and (repartition) topic-level considerations separate. Number of partitions is definitely purely a topic-level configuration. But on some level, serdes and partitioners are just as much a topic configuration as a producer one. You could have two producers configured with different serdes and/or partitioners, but if they are writing to the same topic the result would be very difficult to parse. So in a sense, these are configurations of topics in Streams, not just producers.
>>>>>>>
>>>>>>> Another way to think of it: while the Streams API is not always true to this, ideally all the relevant configs for an operator are wrapped into a single object (in this case, Repartitioned). We could instead split out the fields in common with Produced into a separate parameter to keep topic-level and producer-level configurations separate, but this increases the API surface area by a lot. It's much more straightforward to just say "this is everything that this particular operator needs" without worrying about what exactly you're specifying.
>>>>>>>
>>>>>>> I suppose you could alternatively make Produced a field of Repartitioned, but I don't think we do this kind of composition elsewhere in Streams at the moment.
>>>>>>>
>>>>>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Bill,
>>>>>>>>
>>>>>>>> Thanks a lot for the feedback.
>>>>>>>> Yes, that makes sense. I’ve updated the KIP with the `Repartitioned#partitioner` configuration.
>>>>>>>> In the beginning, I wanted to introduce a class for topic-level configuration and keep topic-level and producer-level configurations (such as Produced) separate (see my second email in this thread).
>>>>>>>> But while looking at the semantics of the KStream interface, I couldn’t really figure out a good operation name for a topic-level configuration class, and just introducing a `Topic` config class was kind of breaking the semantics.
>>>>>>>> So I think having a Repartitioned class which encapsulates topic-level and producer-level configurations for internal topics is a viable thing to do.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Levani
>>>>>>>>
>>>>>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <bbej...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Levani,
>>>>>>>>>
>>>>>>>>> Thanks for resurrecting this KIP.
>>>>>>>>>
>>>>>>>>> I'm also a +1 for adding a partition option. In addition to the reason provided by John, my reasoning is:
>>>>>>>>>
>>>>>>>>> 1. Users may want to use something other than hash-based partitioning
>>>>>>>>> 2. Users may wish to partition on something different than the key without having to change the key. For example:
>>>>>>>>>    1. A combination of fields in the value in conjunction with the key
>>>>>>>>>    2. Something other than the key
>>>>>>>>> 3. We allow users to specify a partitioner on Produced, hence in KStream.to and KStream.through, so it makes sense for API consistency.
>>>>>>>>>
>>>>>>>>> Just my 2 cents.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Bill
>>>>>>>>>
>>>>>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> In my mind it makes sense.
>>>>>>>>>> If we add a partitioner configuration to the Repartitioned class, then in combination with specifying the number of partitions for internal topics, the user will have the opportunity to ensure co-partitioning before a join operation.
>>>>>>>>>> I think this can be quite a powerful feature.
>>>>>>>>>> Wondering what others think about this?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Levani
>>>>>>>>>>
>>>>>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler <j...@confluent.io> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Yes, I believe that's what I had in mind.
>>>>>>>>>>> Again, not totally sure it makes sense, but I believe something similar is the rationale for having the partitioner option in Produced.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -John
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>
>>>>>>>>>>>> Oh, that’s an interesting use case.
>>>>>>>>>>>> Do I understand this correctly: in your example I would first issue repartition(Repartitioned) with a proper partitioner that essentially would be the same as that of the topic I want to join with, and then do the KStream#join with the DSL?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Levani
>>>>>>>>>>>>
>>>>>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler <j...@confluent.io> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hey, all, just to chime in,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think it might be useful to have an option to specify the partitioner. The case I have in mind is that some data may get repartitioned and then joined with an input topic. If the right-side input topic uses a custom partitioning strategy, then the repartitioned stream also needs to be partitioned with the same strategy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does that make sense, or did I maybe miss something important?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -John
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m not sure about it yet.
>>>>>>>>>>>>>> As a Kafka Streams DSL user, I don’t really think I would need control over the partitioner for internal topics.
>>>>>>>>>>>>>> As a user, I would assume that Kafka Streams knows best how to partition data for internal topics.
>>>>>>>>>>>>>> In this KIP I wrote that Produced should be used only for topics that are created by the user in advance.
>>>>>>>>>>>>>> In those cases maybe it makes sense to have the possibility to specify the partitioner.
>>>>>>>>>>>>>> I don’t have a clear answer on that yet, but I guess specifying the partitioner can be added as well if there’s agreement on this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman <sop...@confluent.io> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned would be a useful addition. I'm wondering if it might also need to have a withStreamPartitioner method/field, similar to Produced? I'm not sure how widely this feature is really used, but it seems it should be available for repartition topics.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey Sophie,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In both cases, KStream#repartition and KStream#repartition(Repartitioned), the topic will be created and managed by Kafka Streams.
>>>>>>>>>>>>>>>> The idea of Repartitioned is to give the user more control over the topic, such as the number of partitions.
>>>>>>>>>>>>>>>> I feel like a Repartitioned parameter is something that is missing in the current DSL design.
>>>>>>>>>>>>>>>> Essentially, it gives the user control over parallelism by configuring the number of partitions for internal topics.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hope this answers your question.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman <sop...@confluent.io> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey Levani,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me -- for the KStream#repartition signature taking a Repartitioned, will the topic be auto-created by Streams (which seems to be the case for the signature without a Repartitioned) or does it have to be pre-created? The wording in the KIP makes it seem like one version of the method will auto-create topics while the other will not.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Sophie
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> One more bump about KIP-221 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint) so it doesn’t get lost in the mailing list :)
>>>>>>>>>>>>>>>>>> Would love to hear the community's opinions/concerns about this KIP.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Kind reminder about this KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated the Confluence page with the new proposal: https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>>>>>>> I’ve also filled in the “Rejected Alternatives” section.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Looking forward to discussing this KIP :)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello Matthias,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas.
>>>>>>>>>>>>>>>>>>>>> I like the idea of introducing a dedicated `Topic` class for topic configuration for internal operators like `groupedBy`.
>>>>>>>>>>>>>>>>>>>>> It would be great to hear others' opinions about this as well.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax <matth...@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Levani,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for summarizing everything. Even if some points may have been discussed already (can't really remember), it's helpful to get a good summary to refresh the discussion.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I think your reasoning makes sense.
>>>>>>>>>>>>>>>>>>>>>> With regard to the distinction between operators that manage topics and operators that use user-created topics: following this argument, it might indicate that leaving `through()` as-is (as an operator that uses user-defined topics) and introducing a new `repartition()` operator (an operator that manages topics itself) might be good. Otherwise, there is one operator `through()` that sometimes manages topics but sometimes does not; a different name, i.e., a new operator, would make the distinction clearer.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`: I am wondering if the same argument as for `Produced` applies, and adding it is semantically questionable? It might be good to get the opinions of others on this, too. I am not sure myself which solution I prefer atm.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow configuring a certain "entity" like a consumer, producer, or store. If we assume that a topic is a similar entity, I wonder whether we should have a `Topic` class with a `Topic#withNumberOfPartitions()` method instead of a plain integer?
>>>>>>>>>>>>>>>>>>>>>> This would allow us to add other configs, like replication factor, retention time, etc., easily, without the need to change the "main API".
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like them myself. :)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote:
>>>>>>>>>>>>>>>>>>>>>>> Actually, giving it more thought - maybe enhancing Produced with a number-of-partitions configuration is not the best approach. Let me explain why:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> 1) If we enhance the Produced class with this configuration, this will also affect the KStream#to operation. Since KStream#to is the final sink of the topology, it seems to me a reasonable assumption that the user needs to manually create the sink topic in advance. And in that case, having a number-of-partitions configuration doesn’t make much sense.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> 2) Looking at the Produced class, based on the API contract, it seems like Produced is designed to be something that is explicitly for the producer (key serializer, value serializer, and partitioner are all producer-specific configurations), and number of partitions is a topic-level configuration.
>>>>>>>>>>>>>>>>>>>>>>> And I don’t think mixing topic-level and producer-level configurations together in one class is a good approach.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> 3) Looking at the KStream interface, it seems like the Produced parameter is for operations that work with non-internal topics (i.e. topics that are not created and managed internally by Kafka Streams), and I think we should leave it as it is in that case.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Taking all these things into account, I think we should distinguish between DSL operations where Kafka Streams should create and manage internal topics (KStream#groupBy) and operations on topics that should be created in advance (e.g. KStream#to).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> To sum it up, I think adding a numPartitions configuration to Produced will result in mixing topic-level and producer-level configuration in one class, and it’s going to break existing API semantics.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Regarding making the topic name optional in KStream#through - I think the underlying idea is very useful, and giving users the possibility to specify the number of partitions there is even more useful :) Considering the arguments against adding number of partitions to the Produced class, I see two options here:
>>>>>>>>>>>>>>>>>>>>>>> 1) Add the following method overloads
>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and the number of partitions will be taken from the source topic
>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto-generated with the specified number of partitions
>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V> produced) - topic will be auto-generated with the specified number of partitions and configuration taken from the produced parameter.
>>>>>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce a new method - KStream#repartition (I think Matthias suggested this in one of the threads)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Considering everything mentioned above, I propose the following plan:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Option A:
>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
>>>>>>>>>>>>>>>>>>>>>>> 2) Add a number-of-partitions configuration to the Grouped class (as mentioned in the KIP)
>>>>>>>>>>>>>>>>>>>>>>> 3) Add the following method overloads to KStream#through
>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and the number of partitions will be taken from the source topic
>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto-generated with the specified number of partitions
>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V> produced) - topic will be auto-generated with the specified number of partitions and configuration taken from the produced parameter.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Option B:
>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
>>>>>>>>>>>>>>>>>>>>>>> 2) Add a number-of-partitions configuration to the Grouped class (as mentioned in the KIP)
>>>>>>>>>>>>>>>>>>>>>>> 3) Add a new operator KStream#repartition for creating and managing internal repartition topics
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> P.S.
>>>>>>>>>>>>>>>>>>>>>>> I’m sorry if all of this was already discussed on the mailing list, but I kind of got lost with all the threads that were about this KIP :(
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I would like to resurrect the discussion around KIP-221. Going through the discussion thread, there seems to be agreement about the usefulness of this feature.
>>>>>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I understood, the best solution seems to me to be the following:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to the KStream#through method (essentially making the topic name optional)
>>>>>>>>>>>>>>>>>>>>>>>> 2) Enhance the Produced class with a numOfPartitions configuration field.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to control parallelism and trigger re-partitioning without doing stateful operations.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I will update the KIP with interface changes around KStream#through if these changes sound sensible.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>>>> Levani
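
For readers skimming the thread: the API shape discussed above (static entry points such as `with(String name)` and `numberOfPartitions(int)`, chained with `withName()` / `withNumberOfPartitions()`, plus an optional partitioner so that a repartitioned stream can match the partitioning of a topic it is joined with) can be sketched roughly as follows. This is a minimal, self-contained illustration in plain Java and not the Kafka Streams implementation. `RepartitionedSketch`, its nested `Partitioner` interface, the accessor names, and the "tenant:orderId" key format are all hypothetical and exist only for this example.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in for the Repartitioned config object discussed in the thread.
 * NOT the Kafka Streams class; it only illustrates the chained, immutable config shape.
 */
final class RepartitionedSketch<K, V> {

    /** Hypothetical, simplified partitioner: picks a partition index for a record. */
    interface Partitioner<K, V> {
        int partition(K key, V value, int numPartitions);
    }

    private final String name;                    // null means "auto-generate a topic name"
    private final Integer numberOfPartitions;     // null means "let the runtime decide"
    private final Partitioner<K, V> partitioner;  // null means "default partitioning"

    private RepartitionedSketch(String name, Integer numberOfPartitions, Partitioner<K, V> partitioner) {
        this.name = name;
        this.numberOfPartitions = numberOfPartitions;
        this.partitioner = partitioner;
    }

    // Static entry points, mirroring the names suggested in the thread.
    static <K, V> RepartitionedSketch<K, V> with(String name) {
        return new RepartitionedSketch<K, V>(null, null, null).withName(name);
    }

    static <K, V> RepartitionedSketch<K, V> numberOfPartitions(int numberOfPartitions) {
        return new RepartitionedSketch<K, V>(null, null, null).withNumberOfPartitions(numberOfPartitions);
    }

    static <K, V> RepartitionedSketch<K, V> streamPartitioner(Partitioner<K, V> partitioner) {
        return new RepartitionedSketch<K, V>(null, null, null).withStreamPartitioner(partitioner);
    }

    // Chained "with" methods return a new instance, so a config can be shared safely.
    RepartitionedSketch<K, V> withName(String name) {
        return new RepartitionedSketch<>(Objects.requireNonNull(name, "name"), numberOfPartitions, partitioner);
    }

    RepartitionedSketch<K, V> withNumberOfPartitions(int numberOfPartitions) {
        if (numberOfPartitions < 1) {
            throw new IllegalArgumentException("numberOfPartitions must be >= 1");
        }
        return new RepartitionedSketch<>(name, numberOfPartitions, partitioner);
    }

    RepartitionedSketch<K, V> withStreamPartitioner(Partitioner<K, V> partitioner) {
        return new RepartitionedSketch<>(name, numberOfPartitions, Objects.requireNonNull(partitioner, "partitioner"));
    }

    String topicName() { return name; }
    Integer partitionCount() { return numberOfPartitions; }
    Partitioner<K, V> partitioner() { return partitioner; }

    public static void main(String[] args) {
        // Partition on the tenant prefix of the key ("tenant:orderId") rather than the whole key.
        Partitioner<String, String> byTenant = (key, value, n) -> {
            int i = key.indexOf(':');
            String tenant = i < 0 ? key : key.substring(0, i);
            return Math.floorMod(tenant.hashCode(), n);
        };

        RepartitionedSketch<String, String> config = RepartitionedSketch
                .<String, String>with("orders-by-tenant")
                .withNumberOfPartitions(6)
                .withStreamPartitioner(byTenant);

        // Two keys of the same tenant are assigned the same partition index.
        int p1 = config.partitioner().partition("acme:order-1", "v1", config.partitionCount());
        int p2 = config.partitioner().partition("acme:order-2", "v2", config.partitionCount());
        System.out.println(config.topicName() + ": " + p1 + " == " + p2);
    }
}
```

The point of the chained style is the one made in the thread: a handful of static entry points plus `with...` methods cover every combination without a separate overload per combination. Using the same partitioner and the same partition count on both sides of a join is what would make co-partitioning possible in this sketch.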