Hey Levani, I think people are busy with the upcoming 2.4 release, and don't have much spare time at the moment. It's kind of a difficult time to get attention on things, but feel free to pick up something else to work on in the meantime until things have calmed down a bit!
Cheers, Sophie On Wed, Oct 16, 2019 at 11:26 AM Levani Kokhreidze <levani.co...@gmail.com> wrote: > Hello all, > > Sorry for bringing this thread again, but I would like to get some > attention on this PR: https://github.com/apache/kafka/pull/7170 < > https://github.com/apache/kafka/pull/7170> > It's been a while now and I would love to move on to other KIPs as well. > Please let me know if you have any concerns. > > Regards, > Levani > > > > On Jul 26, 2019, at 11:25 AM, Levani Kokhreidze <levani.co...@gmail.com> > wrote: > > > > Hi all, > > > > Here’s voting thread for this KIP: > https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html < > https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html> > > > > Regards, > > Levani > > > >> On Jul 24, 2019, at 11:15 PM, Levani Kokhreidze <levani.co...@gmail.com > <mailto:levani.co...@gmail.com>> wrote: > >> > >> Hi Matthias, > >> > >> Thanks for the suggestion. I Don’t have strong opinion on that one. > >> Agree that avoiding unnecessary method overloads is a good idea. > >> > >> Updated KIP > >> > >> Regards, > >> Levani > >> > >> > >>> On Jul 24, 2019, at 8:50 PM, Matthias J. Sax <matth...@confluent.io > <mailto:matth...@confluent.io>> wrote: > >>> > >>> One question: > >>> > >>> Why do we add > >>> > >>>> Repartitioned#with(final String name, final int numberOfPartitions) > >>> > >>> It seems that `#with(String name)`, `#numberOfPartitions(int)` in > >>> combination with `withName()` and `withNumberOfPartitions()` should be > >>> sufficient. Users can chain the method calls. > >>> > >>> (I think it's valuable to keep the number of overload small if > possible.) > >>> > >>> Otherwise LGTM. > >>> > >>> > >>> -Matthias > >>> > >>> > >>> On 7/23/19 2:18 PM, Levani Kokhreidze wrote: > >>>> Hello, > >>>> > >>>> Thanks all for your feedback. > >>>> I started voting procedure for this KIP. If there’re any other > concerns about this KIP, please let me know. > >>>> > >>>> Regards, > >>>> Levani > >>>> > >>>>> On Jul 20, 2019, at 8:39 PM, Levani Kokhreidze < > levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote: > >>>>> > >>>>> Hi Matthias, > >>>>> > >>>>> Thanks for the suggestion, makes sense. > >>>>> I’ve updated KIP ( > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint> > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > >>). > >>>>> > >>>>> Regards, > >>>>> Levani > >>>>> > >>>>> > >>>>>> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax <matth...@confluent.io > <mailto:matth...@confluent.io> <mailto:matth...@confluent.io <mailto: > matth...@confluent.io>>> wrote: > >>>>>> > >>>>>> Thanks for driving the KIP. > >>>>>> > >>>>>> I agree that users need to be able to specify a partitioning > strategy. > >>>>>> > >>>>>> Sophie raises a fair point about topic configs and producer > configs. My > >>>>>> take is, that consider `Repartitioned` as an "extension" to > `Produced`, > >>>>>> that adds topic configuration, is a good way to think about it and > helps > >>>>>> to keep the API "clean". > >>>>>> > >>>>>> > >>>>>> With regard to method names. I would prefer to avoid abbreviations. > Can > >>>>>> we rename: > >>>>>> > >>>>>> `withNumOfPartitions` -> `withNumberOfPartitions` > >>>>>> > >>>>>> Furthermore, it might be good to add some more `static` methods: > >>>>>> > >>>>>> - Repartitioned.with(Serde<K>, Serde<V>) > >>>>>> - Repartitioned.withNumberOfPartitions(int) > >>>>>> - Repartitioned.streamPartitioner(StreamPartitioner) > >>>>>> > >>>>>> > >>>>>> -Matthias > >>>>>> > >>>>>> On 7/19/19 3:33 PM, Levani Kokhreidze wrote: > >>>>>>> Totally agree. I think in KStream interface it makes sense to have > some duplicate configurations between operators in order to keep API simple > and usable. > >>>>>>> Also, as more surface API has, harder it is to have proper > backward compatibility. > >>>>>>> While initial idea of keeping topic level configs separate was > exciting, having Repartitioned class encapsulate some producer level > configs makes API more readable. > >>>>>>> > >>>>>>> Regards, > >>>>>>> Levani > >>>>>>> > >>>>>>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman < > sop...@confluent.io <mailto:sop...@confluent.io> <mailto: > sop...@confluent.io <mailto:sop...@confluent.io>>> wrote: > >>>>>>>> > >>>>>>>> I think that is a good point about trying to keep producer level > >>>>>>>> configurations and (repartition) topic level considerations > separate. > >>>>>>>> Number of partitions is definitely purely a topic level > configuration. But > >>>>>>>> on some level, serdes and partitioners are just as much a topic > >>>>>>>> configuration as a producer one. You could have two producers > configured > >>>>>>>> with different serdes and/or partitioners, but if they are > writing to the > >>>>>>>> same topic the result would be very difficult to part. So in a > sense, these > >>>>>>>> are configurations of topics in Streams, not just producers. > >>>>>>>> > >>>>>>>> Another way to think of it: while the Streams API is not always > true to > >>>>>>>> this, ideally all the relevant configs for an operator are > wrapped into a > >>>>>>>> single object (in this case, Repartitioned). We could instead > split out the > >>>>>>>> fields in common with Produced into a separate parameter to keep > topic and > >>>>>>>> producer level configurations separate, but this increases the > API surface > >>>>>>>> area by a lot. It's much more straightforward to just say "this is > >>>>>>>> everything that this particular operator needs" without worrying > about what > >>>>>>>> exactly you're specifying. > >>>>>>>> > >>>>>>>> I suppose you could alternatively make Produced a field of > Repartitioned, > >>>>>>>> but I don't think we do this kind of composition elsewhere in > Streams at > >>>>>>>> the moment > >>>>>>>> > >>>>>>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze < > levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: > levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Hi Bill, > >>>>>>>>> > >>>>>>>>> Thanks a lot for the feedback. > >>>>>>>>> Yes, that makes sense. I’ve updated KIP with > `Repartitioned#partitioner` > >>>>>>>>> configuration. > >>>>>>>>> In the beginning, I wanted to introduce a class for topic level > >>>>>>>>> configuration and keep topic level and producer level > configurations (such > >>>>>>>>> as Produced) separately (see my second email in this thread). > >>>>>>>>> But while looking at the semantics of KStream interface, I > couldn’t really > >>>>>>>>> figure out good operation name for Topic level configuration > class and just > >>>>>>>>> introducing `Topic` config class was kinda breaking the > semantics. > >>>>>>>>> So I think having Repartitioned class which encapsulates topic > and > >>>>>>>>> producer level configurations for internal topics is viable > thing to do. > >>>>>>>>> > >>>>>>>>> Regards, > >>>>>>>>> Levani > >>>>>>>>> > >>>>>>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <bbej...@gmail.com > <mailto:bbej...@gmail.com> <mailto:bbej...@gmail.com <mailto: > bbej...@gmail.com>>> wrote: > >>>>>>>>>> > >>>>>>>>>> Hi Lavani, > >>>>>>>>>> > >>>>>>>>>> Thanks for resurrecting this KIP. > >>>>>>>>>> > >>>>>>>>>> I'm also a +1 for adding a partition option. In addition to > the reason > >>>>>>>>>> provided by John, my reasoning is: > >>>>>>>>>> > >>>>>>>>>> 1. Users may want to use something other than hash-based > partitioning > >>>>>>>>>> 2. Users may wish to partition on something different than the > key > >>>>>>>>>> without having to change the key. For example: > >>>>>>>>>> 1. A combination of fields in the value in conjunction with > the key > >>>>>>>>>> 2. Something other than the key > >>>>>>>>>> 3. We allow users to specify a partitioner on Produced hence in > >>>>>>>>>> KStream.to and KStream.through, so it makes sense for API > consistency. > >>>>>>>>>> > >>>>>>>>>> Just my 2 cents. > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Bill > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze < > >>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: > levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Hi John, > >>>>>>>>>>> > >>>>>>>>>>> In my mind it makes sense. > >>>>>>>>>>> If we add partitioner configuration to Repartitioned class, > with the > >>>>>>>>>>> combination of specifying number of partitions for internal > topics, user > >>>>>>>>>>> will have opportunity to ensure co-partitioning before join > operation. > >>>>>>>>>>> I think this can be quite powerful feature. > >>>>>>>>>>> Wondering what others think about this? > >>>>>>>>>>> > >>>>>>>>>>> Regards, > >>>>>>>>>>> Levani > >>>>>>>>>>> > >>>>>>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler <j...@confluent.io > <mailto:j...@confluent.io> <mailto:j...@confluent.io <mailto: > j...@confluent.io>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> Yes, I believe that's what I had in mind. Again, not totally > sure it > >>>>>>>>>>>> makes sense, but I believe something similar is the rationale > for > >>>>>>>>>>>> having the partitioner option in Produced. > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> -John > >>>>>>>>>>>> > >>>>>>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze > >>>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com> > <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Hey John, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Oh that’s interesting use-case. > >>>>>>>>>>>>> Do I understand this correctly, in your example I would > first issue > >>>>>>>>>>> repartition(Repartitioned) with proper partitioner that > essentially > >>>>>>>>> would > >>>>>>>>>>> be the same as the topic I want to join with and then do the > >>>>>>>>> KStream#join > >>>>>>>>>>> with DSL? > >>>>>>>>>>>>> > >>>>>>>>>>>>> Regards, > >>>>>>>>>>>>> Levani > >>>>>>>>>>>>> > >>>>>>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler < > j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io > <mailto:j...@confluent.io>>> > >>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hey, all, just to chime in, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I think it might be useful to have an option to specify the > >>>>>>>>>>>>>> partitioner. The case I have in mind is that some data may > get > >>>>>>>>>>>>>> repartitioned and then joined with an input topic. If the > right-side > >>>>>>>>>>>>>> input topic uses a custom partitioning strategy, then the > >>>>>>>>>>>>>> repartitioned stream also needs to be partitioned with the > same > >>>>>>>>>>>>>> strategy. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Does that make sense, or did I maybe miss something > important? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>> -John > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze > >>>>>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com> > <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m not > sure > >>>>>>>>> about > >>>>>>>>>>> it yet. > >>>>>>>>>>>>>>> As Kafka Streams DSL user, I don’t really think I would > need control > >>>>>>>>>>> over partitioner for internal topics. > >>>>>>>>>>>>>>> As a user, I would assume that Kafka Streams knows best > how to > >>>>>>>>>>> partition data for internal topics. > >>>>>>>>>>>>>>> In this KIP I wrote that Produced should be used only for > topics > >>>>>>>>> that > >>>>>>>>>>> are created by user In advance. > >>>>>>>>>>>>>>> In those cases maybe it make sense to have possibility to > specify > >>>>>>>>> the > >>>>>>>>>>> partitioner. > >>>>>>>>>>>>>>> I don’t have clear answer on that yet, but I guess > specifying the > >>>>>>>>>>> partitioner can be added as well if there’s agreement on this. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>> Levani > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman < > >>>>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io> <mailto: > sop...@confluent.io <mailto:sop...@confluent.io>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned > would be a > >>>>>>>>>>> useful > >>>>>>>>>>>>>>>> addition. I'm wondering if it might also need to have > >>>>>>>>>>>>>>>> a withStreamPartitioner method/field, similar to > Produced? I'm not > >>>>>>>>>>> sure how > >>>>>>>>>>>>>>>> widely this feature is really used, but seems it should be > >>>>>>>>> available > >>>>>>>>>>> for > >>>>>>>>>>>>>>>> repartition topics. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze < > >>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> > <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Hey Sophie, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> In both cases KStream#repartition and > >>>>>>>>>>> KStream#repartition(Repartitioned) > >>>>>>>>>>>>>>>>> topic will be created and managed by Kafka Streams. > >>>>>>>>>>>>>>>>> Idea of Repartitioned is to give user more control over > the topic > >>>>>>>>>>> such as > >>>>>>>>>>>>>>>>> num of partitions. > >>>>>>>>>>>>>>>>> I feel like Repartitioned parameter is something that is > missing > >>>>>>>>> in > >>>>>>>>>>>>>>>>> current DSL design. > >>>>>>>>>>>>>>>>> Essentially giving user control over parallelism by > configuring > >>>>>>>>> num > >>>>>>>>>>> of > >>>>>>>>>>>>>>>>> partitions for internal topics. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Hope this answers your question. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>>> Levani > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman < > >>>>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io> <mailto: > sop...@confluent.io <mailto:sop...@confluent.io>>> > >>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Hey Levani, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me -- > for the > >>>>>>>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned, > will the > >>>>>>>>>>> topic be > >>>>>>>>>>>>>>>>>> auto-created by Streams (which seems to be the case for > the > >>>>>>>>>>> signature > >>>>>>>>>>>>>>>>>> without a Repartitioned) or does it have to be > pre-created? The > >>>>>>>>>>> wording > >>>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>> the KIP makes it seem like one version of the method > will > >>>>>>>>>>> auto-create > >>>>>>>>>>>>>>>>>> topics while the other will not. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>>> Sophie > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze < > >>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> > <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> > >>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> One more bump about KIP-221 ( > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint> > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > >> > >>>>>>>>>>>>>>>>>>> < > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > > > >>>>>>>>>>>>>>>>>> ) > >>>>>>>>>>>>>>>>>>> so it doesn’t get lost in mailing list :) > >>>>>>>>>>>>>>>>>>> Would love to hear communities opinions/concerns about > this KIP. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>>>>> Levani > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze < > >>>>>>>>>>> levani.co...@gmail.com > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Kind reminder about this KIP: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > >>>>>>>>>>>>>>>>>>> < > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>>>>>> Levani > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze < > >>>>>>>>>>>>>>>>> levani.co...@gmail.com > >>>>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated > confluence > >>>>>>>>> page > >>>>>>>>>>> with > >>>>>>>>>>>>>>>>>>> the new proposal > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > >>>>>>>>>>>>>>>>>>> < > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I’ve also filled “Rejected Alternatives” section. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Looking forward to discuss this KIP :) > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> King regards, > >>>>>>>>>>>>>>>>>>>>> Levani > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze < > >>>>>>>>>>>>>>>>> levani.co...@gmail.com > >>>>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Hello Matthias, > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas. > >>>>>>>>>>>>>>>>>>>>>> I like the idea of introducing dedicated `Topic` > class for > >>>>>>>>>>> topic > >>>>>>>>>>>>>>>>>>> configuration for internal operators like `groupedBy`. > >>>>>>>>>>>>>>>>>>>>>> Would be great to hear others opinion about this as > well. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Kind regards, > >>>>>>>>>>>>>>>>>>>>>> Levani > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax < > >>>>>>>>>>> matth...@confluent.io > >>>>>>>>>>>>>>>>>>> <mailto:matth...@confluent.io>> wrote: > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Levani, > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for > summarizing > >>>>>>>>>>>>>>>>> everything. > >>>>>>>>>>>>>>>>>>>>>>> Even if some points may have been discussed > already (can't > >>>>>>>>>>> really > >>>>>>>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to > refresh the > >>>>>>>>>>>>>>>>>>> discussion. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard to > the > >>>>>>>>>>> distinction > >>>>>>>>>>>>>>>>>>>>>>> between operators that manage topics and operators > that use > >>>>>>>>>>>>>>>>>>> user-created > >>>>>>>>>>>>>>>>>>>>>>> topics: Following this argument, it might indicate > that > >>>>>>>>>>> leaving > >>>>>>>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses > use-defined > >>>>>>>>>>> topics) and > >>>>>>>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an > operator that > >>>>>>>>>>> manages > >>>>>>>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is > one > >>>>>>>>> operator > >>>>>>>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but > sometimes > >>>>>>>>> not; a > >>>>>>>>>>>>>>>>>>> different > >>>>>>>>>>>>>>>>>>>>>>> name, ie, new operator would make the distinction > clearer. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`. I am > wondering > >>>>>>>>>>> if the > >>>>>>>>>>>>>>>>>>> same > >>>>>>>>>>>>>>>>>>>>>>> argument as for `Produced` does apply and adding > it is > >>>>>>>>>>> semantically > >>>>>>>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of > others on > >>>>>>>>>>> this, too. > >>>>>>>>>>>>>>>>> I > >>>>>>>>>>>>>>>>>>> am > >>>>>>>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow to > >>>>>>>>> configure > >>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>> certain > >>>>>>>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, store. If we > assume that > >>>>>>>>>>> a topic > >>>>>>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>>>>>>> a similar entity, I am wonder if we should have a > >>>>>>>>>>>>>>>>>>>>>>> `Topic#withNumberOfPartitions()` class and method > instead of > >>>>>>>>>>> a plain > >>>>>>>>>>>>>>>>>>>>>>> integer? This would allow us to add other configs, > like > >>>>>>>>>>> replication > >>>>>>>>>>>>>>>>>>>>>>> factor, retention-time etc, easily, without the > need to > >>>>>>>>>>> change the > >>>>>>>>>>>>>>>>>>> "main > >>>>>>>>>>>>>>>>>>>>>>> API". > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like > them > >>>>>>>>> myself. > >>>>>>>>>>> :) > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote: > >>>>>>>>>>>>>>>>>>>>>>>> Actually, giving it more though - maybe enhancing > Produced > >>>>>>>>>>> with num > >>>>>>>>>>>>>>>>>>> of partitions configuration is not the best approach. > Let me > >>>>>>>>>>> explain > >>>>>>>>>>>>>>>>> why: > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> 1) If we enhance Produced class with this > configuration, > >>>>>>>>>>> this will > >>>>>>>>>>>>>>>>>>> also affect KStream#to operation. Since KStream#to is > the final > >>>>>>>>>>> sink of > >>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>> topology, for me, it seems to be reasonable assumption > that user > >>>>>>>>>>> needs > >>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>> manually create sink topic in advance. And in that > case, having > >>>>>>>>>>> num of > >>>>>>>>>>>>>>>>>>> partitions configuration doesn’t make much sense. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> 2) Looking at Produced class, based on API > contract, seems > >>>>>>>>>>> like > >>>>>>>>>>>>>>>>>>> Produced is designed to be something that is > explicitly for > >>>>>>>>>>> producer > >>>>>>>>>>>>>>>>> (key > >>>>>>>>>>>>>>>>>>> serializer, value serializer, partitioner those all > are producer > >>>>>>>>>>>>>>>>> specific > >>>>>>>>>>>>>>>>>>> configurations) and num of partitions is topic level > >>>>>>>>>>> configuration. And > >>>>>>>>>>>>>>>>> I > >>>>>>>>>>>>>>>>>>> don’t think mixing topic and producer level > configurations > >>>>>>>>>>> together in > >>>>>>>>>>>>>>>>> one > >>>>>>>>>>>>>>>>>>> class is the good approach. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> 3) Looking at KStream interface, seems like > Produced > >>>>>>>>>>> parameter is > >>>>>>>>>>>>>>>>>>> for operations that work with non-internal (e.g topics > created > >>>>>>>>> and > >>>>>>>>>>>>>>>>> managed > >>>>>>>>>>>>>>>>>>> internally by Kafka Streams) topics and I think we > should leave > >>>>>>>>>>> it as > >>>>>>>>>>>>>>>>> it is > >>>>>>>>>>>>>>>>>>> in that case. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Taking all this things into account, I think we > should > >>>>>>>>>>> distinguish > >>>>>>>>>>>>>>>>>>> between DSL operations, where Kafka Streams should > create and > >>>>>>>>>>> manage > >>>>>>>>>>>>>>>>>>> internal topics (KStream#groupBy) vs topics that > should be > >>>>>>>>>>> created in > >>>>>>>>>>>>>>>>>>> advance (e.g KStream#to). > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> To sum it up, I think adding numPartitions > configuration in > >>>>>>>>>>>>>>>>> Produced > >>>>>>>>>>>>>>>>>>> will result in mixing topic and producer level > configuration in > >>>>>>>>>>> one > >>>>>>>>>>>>>>>>> class > >>>>>>>>>>>>>>>>>>> and it’s gonna break existing API semantics. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Regarding making topic name optional in > KStream#through - I > >>>>>>>>>>> think > >>>>>>>>>>>>>>>>>>> underline idea is very useful and giving users > possibility to > >>>>>>>>>>> specify > >>>>>>>>>>>>>>>>> num > >>>>>>>>>>>>>>>>>>> of partitions there is even more useful :) Considering > arguments > >>>>>>>>>>> against > >>>>>>>>>>>>>>>>>>> adding num of partitions in Produced class, I see two > options > >>>>>>>>>>> here: > >>>>>>>>>>>>>>>>>>>>>>>> 1) Add following method overloads > >>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and > num of > >>>>>>>>>>> partitions > >>>>>>>>>>>>>>>>>>> will be taken from source topic > >>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will > be auto > >>>>>>>>>>>>>>>>>>> generated with specified num of partitions > >>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final > Produced<K, V> > >>>>>>>>>>>>>>>>>>> produced) - topic will be with generated with > specified num of > >>>>>>>>>>>>>>>>> partitions > >>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter. > >>>>>>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce > new method > >>>>>>>>> - > >>>>>>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this > in one of > >>>>>>>>> the > >>>>>>>>>>>>>>>>> threads) > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Considering all mentioned above I propose the > following > >>>>>>>>> plan: > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Option A: > >>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is > >>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped > class (as > >>>>>>>>>>>>>>>>>>> mentioned in the KIP) > >>>>>>>>>>>>>>>>>>>>>>>> 3) Add following method overloads to > KStream#through > >>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and > num of > >>>>>>>>>>> partitions > >>>>>>>>>>>>>>>>>>> will be taken from source topic > >>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will > be auto > >>>>>>>>>>>>>>>>>>> generated with specified num of partitions > >>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final > Produced<K, V> > >>>>>>>>>>>>>>>>>>> produced) - topic will be with generated with > specified num of > >>>>>>>>>>>>>>>>> partitions > >>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Option B: > >>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is > >>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped > class (as > >>>>>>>>>>>>>>>>>>> mentioned in the KIP) > >>>>>>>>>>>>>>>>>>>>>>>> 3) Add new operator KStream#repartition for > creating and > >>>>>>>>>>> managing > >>>>>>>>>>>>>>>>>>> internal repartition topics > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already > discussed in the > >>>>>>>>>>> mailing > >>>>>>>>>>>>>>>>>>> list, but I kinda got with all the threads that were > about this > >>>>>>>>>>> KIP :( > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Kind regards, > >>>>>>>>>>>>>>>>>>>>>>>> Levani > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze < > >>>>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> > wrote: > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> I would like to resurrect discussion around > KIP-221. Going > >>>>>>>>>>> through > >>>>>>>>>>>>>>>>>>> the discussion thread, there’s seems to agreement > around > >>>>>>>>>>> usefulness of > >>>>>>>>>>>>>>>>> this > >>>>>>>>>>>>>>>>>>> feature. > >>>>>>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I > understood, the > >>>>>>>>>>> most > >>>>>>>>>>>>>>>>>>> optimal solution for me seems the following: > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to KStream#through > method > >>>>>>>>>>> (essentially > >>>>>>>>>>>>>>>>>>> making topic name optional) > >>>>>>>>>>>>>>>>>>>>>>>>> 2) Enhance Produced class with numOfPartitions > >>>>>>>>> configuration > >>>>>>>>>>>>>>>>> field. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to control > >>>>>>>>>>> parallelism and > >>>>>>>>>>>>>>>>>>> trigger re-partition without doing stateful operations. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> I will update KIP with interface changes around > >>>>>>>>>>> KStream#through if > >>>>>>>>>>>>>>>>>>> this changes sound sensible. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Kind regards, > >>>>>>>>>>>>>>>>>>>>>>>>> Levani > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>>> > >>> > >> > > > >