Hey Levani,

I think people are busy with the upcoming 2.4 release, and don't have much
spare time at the
moment. It's kind of a difficult time to get attention on things, but feel
free to pick up something else
to work on in the meantime until things have calmed down a bit!

Cheers,
Sophie


On Wed, Oct 16, 2019 at 11:26 AM Levani Kokhreidze <levani.co...@gmail.com>
wrote:

> Hello all,
>
> Sorry for bringing this thread again, but I would like to get some
> attention on this PR: https://github.com/apache/kafka/pull/7170 <
> https://github.com/apache/kafka/pull/7170>
> It's been a while now and I would love to move on to other KIPs as well.
> Please let me know if you have any concerns.
>
> Regards,
> Levani
>
>
> > On Jul 26, 2019, at 11:25 AM, Levani Kokhreidze <levani.co...@gmail.com>
> wrote:
> >
> > Hi all,
> >
> > Here’s voting thread for this KIP:
> https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html <
> https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html>
> >
> > Regards,
> > Levani
> >
> >> On Jul 24, 2019, at 11:15 PM, Levani Kokhreidze <levani.co...@gmail.com
> <mailto:levani.co...@gmail.com>> wrote:
> >>
> >> Hi Matthias,
> >>
> >> Thanks for the suggestion. I Don’t have strong opinion on that one.
> >> Agree that avoiding unnecessary method overloads is a good idea.
> >>
> >> Updated KIP
> >>
> >> Regards,
> >> Levani
> >>
> >>
> >>> On Jul 24, 2019, at 8:50 PM, Matthias J. Sax <matth...@confluent.io
> <mailto:matth...@confluent.io>> wrote:
> >>>
> >>> One question:
> >>>
> >>> Why do we add
> >>>
> >>>> Repartitioned#with(final String name, final int numberOfPartitions)
> >>>
> >>> It seems that `#with(String name)`, `#numberOfPartitions(int)` in
> >>> combination with `withName()` and `withNumberOfPartitions()` should be
> >>> sufficient. Users can chain the method calls.
> >>>
> >>> (I think it's valuable to keep the number of overload small if
> possible.)
> >>>
> >>> Otherwise LGTM.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>>
> >>> On 7/23/19 2:18 PM, Levani Kokhreidze wrote:
> >>>> Hello,
> >>>>
> >>>> Thanks all for your feedback.
> >>>> I started voting procedure for this KIP. If there’re any other
> concerns about this KIP, please let me know.
> >>>>
> >>>> Regards,
> >>>> Levani
> >>>>
> >>>>> On Jul 20, 2019, at 8:39 PM, Levani Kokhreidze <
> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote:
> >>>>>
> >>>>> Hi Matthias,
> >>>>>
> >>>>> Thanks for the suggestion, makes sense.
> >>>>> I’ve updated KIP (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>).
> >>>>>
> >>>>> Regards,
> >>>>> Levani
> >>>>>
> >>>>>
> >>>>>> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax <matth...@confluent.io
> <mailto:matth...@confluent.io> <mailto:matth...@confluent.io <mailto:
> matth...@confluent.io>>> wrote:
> >>>>>>
> >>>>>> Thanks for driving the KIP.
> >>>>>>
> >>>>>> I agree that users need to be able to specify a partitioning
> strategy.
> >>>>>>
> >>>>>> Sophie raises a fair point about topic configs and producer
> configs. My
> >>>>>> take is, that consider `Repartitioned` as an "extension" to
> `Produced`,
> >>>>>> that adds topic configuration, is a good way to think about it and
> helps
> >>>>>> to keep the API "clean".
> >>>>>>
> >>>>>>
> >>>>>> With regard to method names. I would prefer to avoid abbreviations.
> Can
> >>>>>> we rename:
> >>>>>>
> >>>>>> `withNumOfPartitions` -> `withNumberOfPartitions`
> >>>>>>
> >>>>>> Furthermore, it might be good to add some more `static` methods:
> >>>>>>
> >>>>>> - Repartitioned.with(Serde<K>, Serde<V>)
> >>>>>> - Repartitioned.withNumberOfPartitions(int)
> >>>>>> - Repartitioned.streamPartitioner(StreamPartitioner)
> >>>>>>
> >>>>>>
> >>>>>> -Matthias
> >>>>>>
> >>>>>> On 7/19/19 3:33 PM, Levani Kokhreidze wrote:
> >>>>>>> Totally agree. I think in KStream interface it makes sense to have
> some duplicate configurations between operators in order to keep API simple
> and usable.
> >>>>>>> Also, as more surface API has, harder it is to have proper
> backward compatibility.
> >>>>>>> While initial idea of keeping topic level configs separate was
> exciting, having Repartitioned class encapsulate some producer level
> configs makes API more readable.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Levani
> >>>>>>>
> >>>>>>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman <
> sop...@confluent.io <mailto:sop...@confluent.io> <mailto:
> sop...@confluent.io <mailto:sop...@confluent.io>>> wrote:
> >>>>>>>>
> >>>>>>>> I think that is a good point about trying to keep producer level
> >>>>>>>> configurations and (repartition) topic level considerations
> separate.
> >>>>>>>> Number of partitions is definitely purely a topic level
> configuration. But
> >>>>>>>> on some level, serdes and partitioners are just as much a topic
> >>>>>>>> configuration as a producer one. You could have two producers
> configured
> >>>>>>>> with different serdes and/or partitioners, but if they are
> writing to the
> >>>>>>>> same topic the result would be very difficult to part. So in a
> sense, these
> >>>>>>>> are configurations of topics in Streams, not just producers.
> >>>>>>>>
> >>>>>>>> Another way to think of it: while the Streams API is not always
> true to
> >>>>>>>> this, ideally all the relevant configs for an operator are
> wrapped into a
> >>>>>>>> single object (in this case, Repartitioned). We could instead
> split out the
> >>>>>>>> fields in common with Produced into a separate parameter to keep
> topic and
> >>>>>>>> producer level configurations separate, but this increases the
> API surface
> >>>>>>>> area by a lot. It's much more straightforward to just say "this is
> >>>>>>>> everything that this particular operator needs" without worrying
> about what
> >>>>>>>> exactly you're specifying.
> >>>>>>>>
> >>>>>>>> I suppose you could alternatively make Produced a field of
> Repartitioned,
> >>>>>>>> but I don't think we do this kind of composition elsewhere in
> Streams at
> >>>>>>>> the moment
> >>>>>>>>
> >>>>>>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze <
> levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto:
> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Bill,
> >>>>>>>>>
> >>>>>>>>> Thanks a lot for the feedback.
> >>>>>>>>> Yes, that makes sense. I’ve updated KIP with
> `Repartitioned#partitioner`
> >>>>>>>>> configuration.
> >>>>>>>>> In the beginning, I wanted to introduce a class for topic level
> >>>>>>>>> configuration and keep topic level and producer level
> configurations (such
> >>>>>>>>> as Produced) separately (see my second email in this thread).
> >>>>>>>>> But while looking at the semantics of KStream interface, I
> couldn’t really
> >>>>>>>>> figure out good operation name for Topic level configuration
> class and just
> >>>>>>>>> introducing `Topic` config class was kinda breaking the
> semantics.
> >>>>>>>>> So I think having Repartitioned class which encapsulates topic
> and
> >>>>>>>>> producer level configurations for internal topics is viable
> thing to do.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Levani
> >>>>>>>>>
> >>>>>>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <bbej...@gmail.com
> <mailto:bbej...@gmail.com> <mailto:bbej...@gmail.com <mailto:
> bbej...@gmail.com>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Lavani,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for resurrecting this KIP.
> >>>>>>>>>>
> >>>>>>>>>> I'm also a +1 for adding a partition option.  In addition to
> the reason
> >>>>>>>>>> provided by John, my reasoning is:
> >>>>>>>>>>
> >>>>>>>>>> 1. Users may want to use something other than hash-based
> partitioning
> >>>>>>>>>> 2. Users may wish to partition on something different than the
> key
> >>>>>>>>>> without having to change the key.  For example:
> >>>>>>>>>>  1. A combination of fields in the value in conjunction with
> the key
> >>>>>>>>>>  2. Something other than the key
> >>>>>>>>>> 3. We allow users to specify a partitioner on Produced hence in
> >>>>>>>>>> KStream.to and KStream.through, so it makes sense for API
> consistency.
> >>>>>>>>>>
> >>>>>>>>>> Just my  2 cents.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Bill
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze <
> >>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto:
> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi John,
> >>>>>>>>>>>
> >>>>>>>>>>> In my mind it makes sense.
> >>>>>>>>>>> If we add partitioner configuration to Repartitioned class,
> with the
> >>>>>>>>>>> combination of specifying number of partitions for internal
> topics, user
> >>>>>>>>>>> will have opportunity to ensure co-partitioning before join
> operation.
> >>>>>>>>>>> I think this can be quite powerful feature.
> >>>>>>>>>>> Wondering what others think about this?
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Levani
> >>>>>>>>>>>
> >>>>>>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler <j...@confluent.io
> <mailto:j...@confluent.io> <mailto:j...@confluent.io <mailto:
> j...@confluent.io>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes, I believe that's what I had in mind. Again, not totally
> sure it
> >>>>>>>>>>>> makes sense, but I believe something similar is the rationale
> for
> >>>>>>>>>>>> having the partitioner option in Produced.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> -John
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze
> >>>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com>
> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hey John,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Oh that’s interesting use-case.
> >>>>>>>>>>>>> Do I understand this correctly, in your example I would
> first issue
> >>>>>>>>>>> repartition(Repartitioned) with proper partitioner that
> essentially
> >>>>>>>>> would
> >>>>>>>>>>> be the same as the topic I want to join with and then do the
> >>>>>>>>> KStream#join
> >>>>>>>>>>> with DSL?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler <
> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io
> <mailto:j...@confluent.io>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hey, all, just to chime in,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think it might be useful to have an option to specify the
> >>>>>>>>>>>>>> partitioner. The case I have in mind is that some data may
> get
> >>>>>>>>>>>>>> repartitioned and then joined with an input topic. If the
> right-side
> >>>>>>>>>>>>>> input topic uses a custom partitioning strategy, then the
> >>>>>>>>>>>>>> repartitioned stream also needs to be partitioned with the
> same
> >>>>>>>>>>>>>> strategy.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Does that make sense, or did I maybe miss something
> important?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze
> >>>>>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com>
> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m not
> sure
> >>>>>>>>> about
> >>>>>>>>>>> it yet.
> >>>>>>>>>>>>>>> As Kafka Streams DSL user, I don’t really think I would
> need control
> >>>>>>>>>>> over partitioner for internal topics.
> >>>>>>>>>>>>>>> As a user, I would assume that Kafka Streams knows best
> how to
> >>>>>>>>>>> partition data for internal topics.
> >>>>>>>>>>>>>>> In this KIP I wrote that Produced should be used only for
> topics
> >>>>>>>>> that
> >>>>>>>>>>> are created by user In advance.
> >>>>>>>>>>>>>>> In those cases maybe it make sense to have possibility to
> specify
> >>>>>>>>> the
> >>>>>>>>>>> partitioner.
> >>>>>>>>>>>>>>> I don’t have clear answer on that yet, but I guess
> specifying the
> >>>>>>>>>>> partitioner can be added as well if there’s agreement on this.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman <
> >>>>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io> <mailto:
> sop...@confluent.io <mailto:sop...@confluent.io>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned
> would be a
> >>>>>>>>>>> useful
> >>>>>>>>>>>>>>>> addition. I'm wondering if it might also need to have
> >>>>>>>>>>>>>>>> a withStreamPartitioner method/field, similar to
> Produced? I'm not
> >>>>>>>>>>> sure how
> >>>>>>>>>>>>>>>> widely this feature is really used, but seems it should be
> >>>>>>>>> available
> >>>>>>>>>>> for
> >>>>>>>>>>>>>>>> repartition topics.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze <
> >>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>
> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hey Sophie,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> In both cases KStream#repartition and
> >>>>>>>>>>> KStream#repartition(Repartitioned)
> >>>>>>>>>>>>>>>>> topic will be created and managed by Kafka Streams.
> >>>>>>>>>>>>>>>>> Idea of Repartitioned is to give user more control over
> the topic
> >>>>>>>>>>> such as
> >>>>>>>>>>>>>>>>> num of partitions.
> >>>>>>>>>>>>>>>>> I feel like Repartitioned parameter is something that is
> missing
> >>>>>>>>> in
> >>>>>>>>>>>>>>>>> current DSL design.
> >>>>>>>>>>>>>>>>> Essentially giving user control over parallelism by
> configuring
> >>>>>>>>> num
> >>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> partitions for internal topics.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hope this answers your question.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman <
> >>>>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io> <mailto:
> sop...@confluent.io <mailto:sop...@confluent.io>>>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hey Levani,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me --
> for the
> >>>>>>>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned,
> will the
> >>>>>>>>>>> topic be
> >>>>>>>>>>>>>>>>>> auto-created by Streams (which seems to be the case for
> the
> >>>>>>>>>>> signature
> >>>>>>>>>>>>>>>>>> without a Repartitioned) or does it have to be
> pre-created? The
> >>>>>>>>>>> wording
> >>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>> the KIP makes it seem like one version of the method
> will
> >>>>>>>>>>> auto-create
> >>>>>>>>>>>>>>>>>> topics while the other will not.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>> Sophie
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <
> >>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>
> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> One more bump about KIP-221 (
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>
> >>>>>>>>>>>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> >>>>>>>>>>>>>>>>>> )
> >>>>>>>>>>>>>>>>>>> so it doesn’t get lost in mailing list :)
> >>>>>>>>>>>>>>>>>>> Would love to hear communities opinions/concerns about
> this KIP.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze <
> >>>>>>>>>>> levani.co...@gmail.com
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Kind reminder about this KIP:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>>>>>>>>>>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <
> >>>>>>>>>>>>>>>>> levani.co...@gmail.com
> >>>>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated
> confluence
> >>>>>>>>> page
> >>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>> the new proposal
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>>>>>>>>>>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I’ve also filled “Rejected Alternatives” section.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Looking forward to discuss this KIP :)
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> King regards,
> >>>>>>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <
> >>>>>>>>>>>>>>>>> levani.co...@gmail.com
> >>>>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hello Matthias,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas.
> >>>>>>>>>>>>>>>>>>>>>> I like the idea of introducing dedicated `Topic`
> class for
> >>>>>>>>>>> topic
> >>>>>>>>>>>>>>>>>>> configuration for internal operators like `groupedBy`.
> >>>>>>>>>>>>>>>>>>>>>> Would be great to hear others opinion about this as
> well.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax <
> >>>>>>>>>>> matth...@confluent.io
> >>>>>>>>>>>>>>>>>>> <mailto:matth...@confluent.io>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Levani,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for
> summarizing
> >>>>>>>>>>>>>>>>> everything.
> >>>>>>>>>>>>>>>>>>>>>>> Even if some points may have been discussed
> already (can't
> >>>>>>>>>>> really
> >>>>>>>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to
> refresh the
> >>>>>>>>>>>>>>>>>>> discussion.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard to
> the
> >>>>>>>>>>> distinction
> >>>>>>>>>>>>>>>>>>>>>>> between operators that manage topics and operators
> that use
> >>>>>>>>>>>>>>>>>>> user-created
> >>>>>>>>>>>>>>>>>>>>>>> topics: Following this argument, it might indicate
> that
> >>>>>>>>>>> leaving
> >>>>>>>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses
> use-defined
> >>>>>>>>>>> topics) and
> >>>>>>>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an
> operator that
> >>>>>>>>>>> manages
> >>>>>>>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is
> one
> >>>>>>>>> operator
> >>>>>>>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but
> sometimes
> >>>>>>>>> not; a
> >>>>>>>>>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>>>>>>>> name, ie, new operator would make the distinction
> clearer.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`. I am
> wondering
> >>>>>>>>>>> if the
> >>>>>>>>>>>>>>>>>>> same
> >>>>>>>>>>>>>>>>>>>>>>> argument as for `Produced` does apply and adding
> it is
> >>>>>>>>>>> semantically
> >>>>>>>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of
> others on
> >>>>>>>>>>> this, too.
> >>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>> am
> >>>>>>>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow to
> >>>>>>>>> configure
> >>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>> certain
> >>>>>>>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, store. If we
> assume that
> >>>>>>>>>>> a topic
> >>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>>> a similar entity, I am wonder if we should have a
> >>>>>>>>>>>>>>>>>>>>>>> `Topic#withNumberOfPartitions()` class and method
> instead of
> >>>>>>>>>>> a plain
> >>>>>>>>>>>>>>>>>>>>>>> integer? This would allow us to add other configs,
> like
> >>>>>>>>>>> replication
> >>>>>>>>>>>>>>>>>>>>>>> factor, retention-time etc, easily, without the
> need to
> >>>>>>>>>>> change the
> >>>>>>>>>>>>>>>>>>> "main
> >>>>>>>>>>>>>>>>>>>>>>> API".
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like
> them
> >>>>>>>>> myself.
> >>>>>>>>>>> :)
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote:
> >>>>>>>>>>>>>>>>>>>>>>>> Actually, giving it more though - maybe enhancing
> Produced
> >>>>>>>>>>> with num
> >>>>>>>>>>>>>>>>>>> of partitions configuration is not the best approach.
> Let me
> >>>>>>>>>>> explain
> >>>>>>>>>>>>>>>>> why:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> 1) If we enhance Produced class with this
> configuration,
> >>>>>>>>>>> this will
> >>>>>>>>>>>>>>>>>>> also affect KStream#to operation. Since KStream#to is
> the final
> >>>>>>>>>>> sink of
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> topology, for me, it seems to be reasonable assumption
> that user
> >>>>>>>>>>> needs
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> manually create sink topic in advance. And in that
> case, having
> >>>>>>>>>>> num of
> >>>>>>>>>>>>>>>>>>> partitions configuration doesn’t make much sense.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> 2) Looking at Produced class, based on API
> contract, seems
> >>>>>>>>>>> like
> >>>>>>>>>>>>>>>>>>> Produced is designed to be something that is
> explicitly for
> >>>>>>>>>>> producer
> >>>>>>>>>>>>>>>>> (key
> >>>>>>>>>>>>>>>>>>> serializer, value serializer, partitioner those all
> are producer
> >>>>>>>>>>>>>>>>> specific
> >>>>>>>>>>>>>>>>>>> configurations) and num of partitions is topic level
> >>>>>>>>>>> configuration. And
> >>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>> don’t think mixing topic and producer level
> configurations
> >>>>>>>>>>> together in
> >>>>>>>>>>>>>>>>> one
> >>>>>>>>>>>>>>>>>>> class is the good approach.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> 3) Looking at KStream interface, seems like
> Produced
> >>>>>>>>>>> parameter is
> >>>>>>>>>>>>>>>>>>> for operations that work with non-internal (e.g topics
> created
> >>>>>>>>> and
> >>>>>>>>>>>>>>>>> managed
> >>>>>>>>>>>>>>>>>>> internally by Kafka Streams) topics and I think we
> should leave
> >>>>>>>>>>> it as
> >>>>>>>>>>>>>>>>> it is
> >>>>>>>>>>>>>>>>>>> in that case.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Taking all this things into account, I think we
> should
> >>>>>>>>>>> distinguish
> >>>>>>>>>>>>>>>>>>> between DSL operations, where Kafka Streams should
> create and
> >>>>>>>>>>> manage
> >>>>>>>>>>>>>>>>>>> internal topics (KStream#groupBy) vs topics that
> should be
> >>>>>>>>>>> created in
> >>>>>>>>>>>>>>>>>>> advance (e.g KStream#to).
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> To sum it up, I think adding numPartitions
> configuration in
> >>>>>>>>>>>>>>>>> Produced
> >>>>>>>>>>>>>>>>>>> will result in mixing topic and producer level
> configuration in
> >>>>>>>>>>> one
> >>>>>>>>>>>>>>>>> class
> >>>>>>>>>>>>>>>>>>> and it’s gonna break existing API semantics.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Regarding making topic name optional in
> KStream#through - I
> >>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>>> underline idea is very useful and giving users
> possibility to
> >>>>>>>>>>> specify
> >>>>>>>>>>>>>>>>> num
> >>>>>>>>>>>>>>>>>>> of partitions there is even more useful :) Considering
> arguments
> >>>>>>>>>>> against
> >>>>>>>>>>>>>>>>>>> adding num of partitions in Produced class, I see two
> options
> >>>>>>>>>>> here:
> >>>>>>>>>>>>>>>>>>>>>>>> 1) Add following method overloads
> >>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and
> num of
> >>>>>>>>>>> partitions
> >>>>>>>>>>>>>>>>>>> will be taken from source topic
> >>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will
> be auto
> >>>>>>>>>>>>>>>>>>> generated with specified num of partitions
> >>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final
> Produced<K, V>
> >>>>>>>>>>>>>>>>>>> produced) - topic will be with generated with
> specified num of
> >>>>>>>>>>>>>>>>> partitions
> >>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter.
> >>>>>>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce
> new method
> >>>>>>>>> -
> >>>>>>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this
> in one of
> >>>>>>>>> the
> >>>>>>>>>>>>>>>>> threads)
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Considering all mentioned above I propose the
> following
> >>>>>>>>> plan:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Option A:
> >>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
> >>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped
> class (as
> >>>>>>>>>>>>>>>>>>> mentioned in the KIP)
> >>>>>>>>>>>>>>>>>>>>>>>> 3) Add following method overloads to
> KStream#through
> >>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and
> num of
> >>>>>>>>>>> partitions
> >>>>>>>>>>>>>>>>>>> will be taken from source topic
> >>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will
> be auto
> >>>>>>>>>>>>>>>>>>> generated with specified num of partitions
> >>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final
> Produced<K, V>
> >>>>>>>>>>>>>>>>>>> produced) - topic will be with generated with
> specified num of
> >>>>>>>>>>>>>>>>> partitions
> >>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Option B:
> >>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
> >>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped
> class (as
> >>>>>>>>>>>>>>>>>>> mentioned in the KIP)
> >>>>>>>>>>>>>>>>>>>>>>>> 3) Add new operator KStream#repartition for
> creating and
> >>>>>>>>>>> managing
> >>>>>>>>>>>>>>>>>>> internal repartition topics
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already
> discussed in the
> >>>>>>>>>>> mailing
> >>>>>>>>>>>>>>>>>>> list, but I kinda got with all the threads that were
> about this
> >>>>>>>>>>> KIP :(
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze <
> >>>>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>
> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I would like to resurrect discussion around
> KIP-221. Going
> >>>>>>>>>>> through
> >>>>>>>>>>>>>>>>>>> the discussion thread, there’s seems to agreement
> around
> >>>>>>>>>>> usefulness of
> >>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>> feature.
> >>>>>>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I
> understood, the
> >>>>>>>>>>> most
> >>>>>>>>>>>>>>>>>>> optimal solution for me seems the following:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to KStream#through
> method
> >>>>>>>>>>> (essentially
> >>>>>>>>>>>>>>>>>>> making topic name optional)
> >>>>>>>>>>>>>>>>>>>>>>>>> 2) Enhance Produced class with numOfPartitions
> >>>>>>>>> configuration
> >>>>>>>>>>>>>>>>> field.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to control
> >>>>>>>>>>> parallelism and
> >>>>>>>>>>>>>>>>>>> trigger re-partition without doing stateful operations.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I will update KIP with interface changes around
> >>>>>>>>>>> KStream#through if
> >>>>>>>>>>>>>>>>>>> this changes sound sensible.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>
>

Reply via email to