One question: Why do we add
> Repartitioned#with(final String name, final int numberOfPartitions) It seems that `#with(String name)`, `#numberOfPartitions(int)` in combination with `withName()` and `withNumberOfPartitions()` should be sufficient. Users can chain the method calls. (I think it's valuable to keep the number of overload small if possible.) Otherwise LGTM. -Matthias On 7/23/19 2:18 PM, Levani Kokhreidze wrote: > Hello, > > Thanks all for your feedback. > I started voting procedure for this KIP. If there’re any other concerns about > this KIP, please let me know. > > Regards, > Levani > >> On Jul 20, 2019, at 8:39 PM, Levani Kokhreidze <levani.co...@gmail.com> >> wrote: >> >> Hi Matthias, >> >> Thanks for the suggestion, makes sense. >> I’ve updated KIP >> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >> >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>). >> >> Regards, >> Levani >> >> >>> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax <matth...@confluent.io >>> <mailto:matth...@confluent.io>> wrote: >>> >>> Thanks for driving the KIP. >>> >>> I agree that users need to be able to specify a partitioning strategy. >>> >>> Sophie raises a fair point about topic configs and producer configs. My >>> take is, that consider `Repartitioned` as an "extension" to `Produced`, >>> that adds topic configuration, is a good way to think about it and helps >>> to keep the API "clean". >>> >>> >>> With regard to method names. I would prefer to avoid abbreviations. Can >>> we rename: >>> >>> `withNumOfPartitions` -> `withNumberOfPartitions` >>> >>> Furthermore, it might be good to add some more `static` methods: >>> >>> - Repartitioned.with(Serde<K>, Serde<V>) >>> - Repartitioned.withNumberOfPartitions(int) >>> - Repartitioned.streamPartitioner(StreamPartitioner) >>> >>> >>> -Matthias >>> >>> On 7/19/19 3:33 PM, Levani Kokhreidze wrote: >>>> Totally agree. I think in KStream interface it makes sense to have some >>>> duplicate configurations between operators in order to keep API simple and >>>> usable. >>>> Also, as more surface API has, harder it is to have proper backward >>>> compatibility. >>>> While initial idea of keeping topic level configs separate was exciting, >>>> having Repartitioned class encapsulate some producer level configs makes >>>> API more readable. >>>> >>>> Regards, >>>> Levani >>>> >>>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman <sop...@confluent.io >>>>> <mailto:sop...@confluent.io>> wrote: >>>>> >>>>> I think that is a good point about trying to keep producer level >>>>> configurations and (repartition) topic level considerations separate. >>>>> Number of partitions is definitely purely a topic level configuration. But >>>>> on some level, serdes and partitioners are just as much a topic >>>>> configuration as a producer one. You could have two producers configured >>>>> with different serdes and/or partitioners, but if they are writing to the >>>>> same topic the result would be very difficult to part. So in a sense, >>>>> these >>>>> are configurations of topics in Streams, not just producers. >>>>> >>>>> Another way to think of it: while the Streams API is not always true to >>>>> this, ideally all the relevant configs for an operator are wrapped into a >>>>> single object (in this case, Repartitioned). We could instead split out >>>>> the >>>>> fields in common with Produced into a separate parameter to keep topic and >>>>> producer level configurations separate, but this increases the API surface >>>>> area by a lot. It's much more straightforward to just say "this is >>>>> everything that this particular operator needs" without worrying about >>>>> what >>>>> exactly you're specifying. >>>>> >>>>> I suppose you could alternatively make Produced a field of Repartitioned, >>>>> but I don't think we do this kind of composition elsewhere in Streams at >>>>> the moment >>>>> >>>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze <levani.co...@gmail.com >>>>> <mailto:levani.co...@gmail.com>> >>>>> wrote: >>>>> >>>>>> Hi Bill, >>>>>> >>>>>> Thanks a lot for the feedback. >>>>>> Yes, that makes sense. I’ve updated KIP with `Repartitioned#partitioner` >>>>>> configuration. >>>>>> In the beginning, I wanted to introduce a class for topic level >>>>>> configuration and keep topic level and producer level configurations >>>>>> (such >>>>>> as Produced) separately (see my second email in this thread). >>>>>> But while looking at the semantics of KStream interface, I couldn’t >>>>>> really >>>>>> figure out good operation name for Topic level configuration class and >>>>>> just >>>>>> introducing `Topic` config class was kinda breaking the semantics. >>>>>> So I think having Repartitioned class which encapsulates topic and >>>>>> producer level configurations for internal topics is viable thing to do. >>>>>> >>>>>> Regards, >>>>>> Levani >>>>>> >>>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <bbej...@gmail.com >>>>>>> <mailto:bbej...@gmail.com>> wrote: >>>>>>> >>>>>>> Hi Lavani, >>>>>>> >>>>>>> Thanks for resurrecting this KIP. >>>>>>> >>>>>>> I'm also a +1 for adding a partition option. In addition to the reason >>>>>>> provided by John, my reasoning is: >>>>>>> >>>>>>> 1. Users may want to use something other than hash-based partitioning >>>>>>> 2. Users may wish to partition on something different than the key >>>>>>> without having to change the key. For example: >>>>>>> 1. A combination of fields in the value in conjunction with the key >>>>>>> 2. Something other than the key >>>>>>> 3. We allow users to specify a partitioner on Produced hence in >>>>>>> KStream.to and KStream.through, so it makes sense for API consistency. >>>>>>> >>>>>>> Just my 2 cents. >>>>>>> >>>>>>> Thanks, >>>>>>> Bill >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze < >>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi John, >>>>>>>> >>>>>>>> In my mind it makes sense. >>>>>>>> If we add partitioner configuration to Repartitioned class, with the >>>>>>>> combination of specifying number of partitions for internal topics, >>>>>>>> user >>>>>>>> will have opportunity to ensure co-partitioning before join operation. >>>>>>>> I think this can be quite powerful feature. >>>>>>>> Wondering what others think about this? >>>>>>>> >>>>>>>> Regards, >>>>>>>> Levani >>>>>>>> >>>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler <j...@confluent.io >>>>>>>>> <mailto:j...@confluent.io>> wrote: >>>>>>>>> >>>>>>>>> Yes, I believe that's what I had in mind. Again, not totally sure it >>>>>>>>> makes sense, but I believe something similar is the rationale for >>>>>>>>> having the partitioner option in Produced. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> -John >>>>>>>>> >>>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze >>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>> >>>>>>>>>> Hey John, >>>>>>>>>> >>>>>>>>>> Oh that’s interesting use-case. >>>>>>>>>> Do I understand this correctly, in your example I would first issue >>>>>>>> repartition(Repartitioned) with proper partitioner that essentially >>>>>> would >>>>>>>> be the same as the topic I want to join with and then do the >>>>>> KStream#join >>>>>>>> with DSL? >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Levani >>>>>>>>>> >>>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler <j...@confluent.io >>>>>>>>>>> <mailto:j...@confluent.io>> >>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hey, all, just to chime in, >>>>>>>>>>> >>>>>>>>>>> I think it might be useful to have an option to specify the >>>>>>>>>>> partitioner. The case I have in mind is that some data may get >>>>>>>>>>> repartitioned and then joined with an input topic. If the right-side >>>>>>>>>>> input topic uses a custom partitioning strategy, then the >>>>>>>>>>> repartitioned stream also needs to be partitioned with the same >>>>>>>>>>> strategy. >>>>>>>>>>> >>>>>>>>>>> Does that make sense, or did I maybe miss something important? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> -John >>>>>>>>>>> >>>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze >>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m not sure >>>>>> about >>>>>>>> it yet. >>>>>>>>>>>> As Kafka Streams DSL user, I don’t really think I would need >>>>>>>>>>>> control >>>>>>>> over partitioner for internal topics. >>>>>>>>>>>> As a user, I would assume that Kafka Streams knows best how to >>>>>>>> partition data for internal topics. >>>>>>>>>>>> In this KIP I wrote that Produced should be used only for topics >>>>>> that >>>>>>>> are created by user In advance. >>>>>>>>>>>> In those cases maybe it make sense to have possibility to specify >>>>>> the >>>>>>>> partitioner. >>>>>>>>>>>> I don’t have clear answer on that yet, but I guess specifying the >>>>>>>> partitioner can be added as well if there’s agreement on this. >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Levani >>>>>>>>>>>> >>>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman < >>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned would be a >>>>>>>> useful >>>>>>>>>>>>> addition. I'm wondering if it might also need to have >>>>>>>>>>>>> a withStreamPartitioner method/field, similar to Produced? I'm not >>>>>>>> sure how >>>>>>>>>>>>> widely this feature is really used, but seems it should be >>>>>> available >>>>>>>> for >>>>>>>>>>>>> repartition topics. >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze < >>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hey Sophie, >>>>>>>>>>>>>> >>>>>>>>>>>>>> In both cases KStream#repartition and >>>>>>>> KStream#repartition(Repartitioned) >>>>>>>>>>>>>> topic will be created and managed by Kafka Streams. >>>>>>>>>>>>>> Idea of Repartitioned is to give user more control over the topic >>>>>>>> such as >>>>>>>>>>>>>> num of partitions. >>>>>>>>>>>>>> I feel like Repartitioned parameter is something that is missing >>>>>> in >>>>>>>>>>>>>> current DSL design. >>>>>>>>>>>>>> Essentially giving user control over parallelism by configuring >>>>>> num >>>>>>>> of >>>>>>>>>>>>>> partitions for internal topics. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hope this answers your question. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Levani >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman < >>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hey Levani, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me -- for the >>>>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned, will the >>>>>>>> topic be >>>>>>>>>>>>>>> auto-created by Streams (which seems to be the case for the >>>>>>>> signature >>>>>>>>>>>>>>> without a Repartitioned) or does it have to be pre-created? The >>>>>>>> wording >>>>>>>>>>>>>> in >>>>>>>>>>>>>>> the KIP makes it seem like one version of the method will >>>>>>>> auto-create >>>>>>>>>>>>>>> topics while the other will not. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> Sophie >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze < >>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> One more bump about KIP-221 ( >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>> >>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint> >>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>>> so it doesn’t get lost in mailing list :) >>>>>>>>>>>>>>>> Would love to hear communities opinions/concerns about this >>>>>>>>>>>>>>>> KIP. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze < >>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Kind reminder about this KIP: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze < >>>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated confluence >>>>>> page >>>>>>>> with >>>>>>>>>>>>>>>> the new proposal >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I’ve also filled “Rejected Alternatives” section. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Looking forward to discuss this KIP :) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> King regards, >>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze < >>>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hello Matthias, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas. >>>>>>>>>>>>>>>>>>> I like the idea of introducing dedicated `Topic` class for >>>>>>>> topic >>>>>>>>>>>>>>>> configuration for internal operators like `groupedBy`. >>>>>>>>>>>>>>>>>>> Would be great to hear others opinion about this as well. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax < >>>>>>>> matth...@confluent.io >>>>>>>>>>>>>>>> <mailto:matth...@confluent.io>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Levani, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for summarizing >>>>>>>>>>>>>> everything. >>>>>>>>>>>>>>>>>>>> Even if some points may have been discussed already (can't >>>>>>>> really >>>>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to refresh >>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> discussion. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard to the >>>>>>>> distinction >>>>>>>>>>>>>>>>>>>> between operators that manage topics and operators that use >>>>>>>>>>>>>>>> user-created >>>>>>>>>>>>>>>>>>>> topics: Following this argument, it might indicate that >>>>>>>> leaving >>>>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses use-defined >>>>>>>> topics) and >>>>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an operator >>>>>>>>>>>>>>>>>>>> that >>>>>>>> manages >>>>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is one >>>>>> operator >>>>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but sometimes >>>>>> not; a >>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>>>>> name, ie, new operator would make the distinction clearer. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`. I am wondering >>>>>>>> if the >>>>>>>>>>>>>>>> same >>>>>>>>>>>>>>>>>>>> argument as for `Produced` does apply and adding it is >>>>>>>> semantically >>>>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of others on >>>>>>>> this, too. >>>>>>>>>>>>>> I >>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow to >>>>>> configure >>>>>>>> a >>>>>>>>>>>>>>>> certain >>>>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, store. If we assume >>>>>>>>>>>>>>>>>>>> that >>>>>>>> a topic >>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>> a similar entity, I am wonder if we should have a >>>>>>>>>>>>>>>>>>>> `Topic#withNumberOfPartitions()` class and method instead >>>>>>>>>>>>>>>>>>>> of >>>>>>>> a plain >>>>>>>>>>>>>>>>>>>> integer? This would allow us to add other configs, like >>>>>>>> replication >>>>>>>>>>>>>>>>>>>> factor, retention-time etc, easily, without the need to >>>>>>>> change the >>>>>>>>>>>>>>>> "main >>>>>>>>>>>>>>>>>>>> API". >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like them >>>>>> myself. >>>>>>>> :) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote: >>>>>>>>>>>>>>>>>>>>> Actually, giving it more though - maybe enhancing Produced >>>>>>>> with num >>>>>>>>>>>>>>>> of partitions configuration is not the best approach. Let me >>>>>>>> explain >>>>>>>>>>>>>> why: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1) If we enhance Produced class with this configuration, >>>>>>>> this will >>>>>>>>>>>>>>>> also affect KStream#to operation. Since KStream#to is the final >>>>>>>> sink of >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> topology, for me, it seems to be reasonable assumption that >>>>>>>>>>>>>>>> user >>>>>>>> needs >>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> manually create sink topic in advance. And in that case, having >>>>>>>> num of >>>>>>>>>>>>>>>> partitions configuration doesn’t make much sense. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2) Looking at Produced class, based on API contract, seems >>>>>>>> like >>>>>>>>>>>>>>>> Produced is designed to be something that is explicitly for >>>>>>>> producer >>>>>>>>>>>>>> (key >>>>>>>>>>>>>>>> serializer, value serializer, partitioner those all are >>>>>>>>>>>>>>>> producer >>>>>>>>>>>>>> specific >>>>>>>>>>>>>>>> configurations) and num of partitions is topic level >>>>>>>> configuration. And >>>>>>>>>>>>>> I >>>>>>>>>>>>>>>> don’t think mixing topic and producer level configurations >>>>>>>> together in >>>>>>>>>>>>>> one >>>>>>>>>>>>>>>> class is the good approach. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 3) Looking at KStream interface, seems like Produced >>>>>>>> parameter is >>>>>>>>>>>>>>>> for operations that work with non-internal (e.g topics created >>>>>> and >>>>>>>>>>>>>> managed >>>>>>>>>>>>>>>> internally by Kafka Streams) topics and I think we should leave >>>>>>>> it as >>>>>>>>>>>>>> it is >>>>>>>>>>>>>>>> in that case. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Taking all this things into account, I think we should >>>>>>>> distinguish >>>>>>>>>>>>>>>> between DSL operations, where Kafka Streams should create and >>>>>>>> manage >>>>>>>>>>>>>>>> internal topics (KStream#groupBy) vs topics that should be >>>>>>>> created in >>>>>>>>>>>>>>>> advance (e.g KStream#to). >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> To sum it up, I think adding numPartitions configuration >>>>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>> Produced >>>>>>>>>>>>>>>> will result in mixing topic and producer level configuration in >>>>>>>> one >>>>>>>>>>>>>> class >>>>>>>>>>>>>>>> and it’s gonna break existing API semantics. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Regarding making topic name optional in KStream#through - >>>>>>>>>>>>>>>>>>>>> I >>>>>>>> think >>>>>>>>>>>>>>>> underline idea is very useful and giving users possibility to >>>>>>>> specify >>>>>>>>>>>>>> num >>>>>>>>>>>>>>>> of partitions there is even more useful :) Considering >>>>>>>>>>>>>>>> arguments >>>>>>>> against >>>>>>>>>>>>>>>> adding num of partitions in Produced class, I see two options >>>>>>>> here: >>>>>>>>>>>>>>>>>>>>> 1) Add following method overloads >>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and num of >>>>>>>> partitions >>>>>>>>>>>>>>>> will be taken from source topic >>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto >>>>>>>>>>>>>>>> generated with specified num of partitions >>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V> >>>>>>>>>>>>>>>> produced) - topic will be with generated with specified num of >>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>> and configuration taken from produced parameter. >>>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce new method >>>>>> - >>>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this in one of >>>>>> the >>>>>>>>>>>>>> threads) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Considering all mentioned above I propose the following >>>>>> plan: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Option A: >>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is >>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped class >>>>>>>>>>>>>>>>>>>>> (as >>>>>>>>>>>>>>>> mentioned in the KIP) >>>>>>>>>>>>>>>>>>>>> 3) Add following method overloads to KStream#through >>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and num of >>>>>>>> partitions >>>>>>>>>>>>>>>> will be taken from source topic >>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto >>>>>>>>>>>>>>>> generated with specified num of partitions >>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V> >>>>>>>>>>>>>>>> produced) - topic will be with generated with specified num of >>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>> and configuration taken from produced parameter. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Option B: >>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is >>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped class >>>>>>>>>>>>>>>>>>>>> (as >>>>>>>>>>>>>>>> mentioned in the KIP) >>>>>>>>>>>>>>>>>>>>> 3) Add new operator KStream#repartition for creating and >>>>>>>> managing >>>>>>>>>>>>>>>> internal repartition topics >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already discussed in the >>>>>>>> mailing >>>>>>>>>>>>>>>> list, but I kinda got with all the threads that were about this >>>>>>>> KIP :( >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze < >>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I would like to resurrect discussion around KIP-221. >>>>>>>>>>>>>>>>>>>>>> Going >>>>>>>> through >>>>>>>>>>>>>>>> the discussion thread, there’s seems to agreement around >>>>>>>> usefulness of >>>>>>>>>>>>>> this >>>>>>>>>>>>>>>> feature. >>>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I understood, the >>>>>>>> most >>>>>>>>>>>>>>>> optimal solution for me seems the following: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to KStream#through method >>>>>>>> (essentially >>>>>>>>>>>>>>>> making topic name optional) >>>>>>>>>>>>>>>>>>>>>> 2) Enhance Produced class with numOfPartitions >>>>>> configuration >>>>>>>>>>>>>> field. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to control >>>>>>>> parallelism and >>>>>>>>>>>>>>>> trigger re-partition without doing stateful operations. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I will update KIP with interface changes around >>>>>>>> KStream#through if >>>>>>>>>>>>>>>> this changes sound sensible. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> >>> >> > >
signature.asc
Description: OpenPGP digital signature