Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

Sophie Blee-Goldman Tue, 05 Nov 2019 15:31:30 -0800

> Personally, I think Matthias’s concern is valid, but on the other hand
Kafka Streams has already
> optimizer in place which alters topology independently from user


I agree (with you) and think this is a good way to put it -- we currently
auto-repartition for the user so
that they don't have to walk through their entire topology and reason about
when and where to place a
`.through` (or the new `.repartition`), so why suddenly force this onto the
user? How certain are we that
users will always get this right? It's easy to imagine that during
development, you write your new app with
correctly placed repartitions in order to use this new feature. During the
course of development you end up
tweaking the topology, but don't remember to review or move the
repartitioning since you're used to Streams
doing this for you. If you use only single-partition topics for testing,
you might not even notice your app is
spitting out incorrect results!

Anyways, I feel pretty strongly that it would be weird to introduce a new
feature and say that to use it, you can't take
advantage of this other feature anymore. Also, is it possible our
optimization framework could ever include an
optimized repartitioning strategy that is better than what a user could
achieve by manually inserting repartitions?
Do we expect users to have a deep understanding of the best way to
repartition their particular topology, or is it
likely they will end up over-repartitioning either due to missed
optimizations or unnecessary extra repartitions?
I think many users would prefer to just say "if there *is* a repartition
required at this point in the topology, it should
have N partitions"

As to the idea of adding `numberOfPartitions` to Grouped rather than
adding a new parameter to groupBy, that does seem more in line with the
current syntax so +1 from me

On Tue, Nov 5, 2019 at 2:07 PM Levani Kokhreidze <[email protected]>
wrote:

> Hello all,
>
> While https://github.com/apache/kafka/pull/7170 <
> https://github.com/apache/kafka/pull/7170> is under review and it’s
> almost done, I want to resurrect discussion about this KIP to address
> couple of concerns raised by Matthias and John.
>
> As a reminder, idea of the KIP-221 was to allow DSL users control over
> repartitioning and parallelism of sub-topologies by:
> 1) Introducing new KStream#repartition operation which is done in
> https://github.com/apache/kafka/pull/7170 <
> https://github.com/apache/kafka/pull/7170>
> 2) Add new KStream#groupBy(Repartitioned) operation, which is planned to
> be separate PR.
>
> While all agree about general implementation and idea behind
> https://github.com/apache/kafka/pull/7170 <
> https://github.com/apache/kafka/pull/7170> PR, introducing new
> KStream#groupBy(Repartitioned) method overload raised some questions during
> the review.
> Matthias raised concern that there can be cases when user uses
> `KStream#groupBy(Repartitioned)` operation, but actual repartitioning may
> not required, thus configuration passed via `Repartitioned` would never be
> applied (Matthias, please correct me if I misinterpreted your comment).
> So instead, if user wants to control parallelism of sub-topologies, he or
> she should always use `KStream#repartition` operation before groupBy. Full
> comment can be seen here:
> https://github.com/apache/kafka/pull/7170#issuecomment-519303125 <
> https://github.com/apache/kafka/pull/7170#issuecomment-519303125>
>
> On the same topic, John pointed out that, from API design perspective, we
> shouldn’t intertwine configuration classes of different operators between
> one another. So instead of introducing new `KStream#groupBy(Repartitioned)`
> for specifying number of partitions for internal topic, we should update
> existing `Grouped` class with `numberOfPartitions` field.
>
> Personally, I think Matthias’s concern is valid, but on the other hand
> Kafka Streams has already optimizer in place which alters topology
> independently from user. So maybe it makes sense if Kafka Streams,
> internally would optimize topology in the best way possible, even if in
> some cases this means ignoring some operator configurations passed by the
> user. Also, I agree with John about API design semantics. If we go through
> with the changes for `KStream#groupBy` operation, it makes more sense to
> add `numberOfPartitions` field to `Grouped` class instead of introducing
> new `KStream#groupBy(Repartitioned)` method overload.
>
> I would really appreciate communities feedback on this.
>
> Kind regards,
> Levani
>
>
>
> > On Oct 17, 2019, at 12:57 AM, Sophie Blee-Goldman <[email protected]>
> wrote:
> >
> > Hey Levani,
> >
> > I think people are busy with the upcoming 2.4 release, and don't have
> much
> > spare time at the
> > moment. It's kind of a difficult time to get attention on things, but
> feel
> > free to pick up something else
> > to work on in the meantime until things have calmed down a bit!
> >
> > Cheers,
> > Sophie
> >
> >
> > On Wed, Oct 16, 2019 at 11:26 AM Levani Kokhreidze <
> [email protected] <mailto:[email protected]>>
> > wrote:
> >
> >> Hello all,
> >>
> >> Sorry for bringing this thread again, but I would like to get some
> >> attention on this PR: https://github.com/apache/kafka/pull/7170 <
> https://github.com/apache/kafka/pull/7170> <
> >> https://github.com/apache/kafka/pull/7170 <
> https://github.com/apache/kafka/pull/7170>>
> >> It's been a while now and I would love to move on to other KIPs as well.
> >> Please let me know if you have any concerns.
> >>
> >> Regards,
> >> Levani
> >>
> >>
> >>> On Jul 26, 2019, at 11:25 AM, Levani Kokhreidze <
> [email protected] <mailto:[email protected]>>
> >> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> Here’s voting thread for this KIP:
> >> https://www.mail-archive.com/[email protected]/msg99680.html <
> https://www.mail-archive.com/[email protected]/msg99680.html> <
> >> https://www.mail-archive.com/[email protected]/msg99680.html <
> https://www.mail-archive.com/[email protected]/msg99680.html>>
> >>>
> >>> Regards,
> >>> Levani
> >>>
> >>>> On Jul 24, 2019, at 11:15 PM, Levani Kokhreidze <
> [email protected] <mailto:[email protected]>
> >> <mailto:[email protected] <mailto:[email protected]>>> wrote:
> >>>>
> >>>> Hi Matthias,
> >>>>
> >>>> Thanks for the suggestion. I Don’t have strong opinion on that one.
> >>>> Agree that avoiding unnecessary method overloads is a good idea.
> >>>>
> >>>> Updated KIP
> >>>>
> >>>> Regards,
> >>>> Levani
> >>>>
> >>>>
> >>>>> On Jul 24, 2019, at 8:50 PM, Matthias J. Sax <[email protected]
> <mailto:[email protected]>
> >> <mailto:[email protected] <mailto:[email protected]>>> wrote:
> >>>>>
> >>>>> One question:
> >>>>>
> >>>>> Why do we add
> >>>>>
> >>>>>> Repartitioned#with(final String name, final int numberOfPartitions)
> >>>>>
> >>>>> It seems that `#with(String name)`, `#numberOfPartitions(int)` in
> >>>>> combination with `withName()` and `withNumberOfPartitions()` should
> be
> >>>>> sufficient. Users can chain the method calls.
> >>>>>
> >>>>> (I think it's valuable to keep the number of overload small if
> >> possible.)
> >>>>>
> >>>>> Otherwise LGTM.
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>>
> >>>>> On 7/23/19 2:18 PM, Levani Kokhreidze wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> Thanks all for your feedback.
> >>>>>> I started voting procedure for this KIP. If there’re any other
> >> concerns about this KIP, please let me know.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Levani
> >>>>>>
> >>>>>>> On Jul 20, 2019, at 8:39 PM, Levani Kokhreidze <
> >> [email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>>> wrote:
> >>>>>>>
> >>>>>>> Hi Matthias,
> >>>>>>>
> >>>>>>> Thanks for the suggestion, makes sense.
> >>>>>>> I’ve updated KIP (
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> >>>> ).
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Levani
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax <
> [email protected] <mailto:[email protected]>
> >> <mailto:[email protected] <mailto:[email protected]>> <mailto:
> [email protected] <mailto:[email protected]> <mailto:
> >> [email protected] <mailto:[email protected]>>>> wrote:
> >>>>>>>>
> >>>>>>>> Thanks for driving the KIP.
> >>>>>>>>
> >>>>>>>> I agree that users need to be able to specify a partitioning
> >> strategy.
> >>>>>>>>
> >>>>>>>> Sophie raises a fair point about topic configs and producer
> >> configs. My
> >>>>>>>> take is, that consider `Repartitioned` as an "extension" to
> >> `Produced`,
> >>>>>>>> that adds topic configuration, is a good way to think about it and
> >> helps
> >>>>>>>> to keep the API "clean".
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> With regard to method names. I would prefer to avoid
> abbreviations.
> >> Can
> >>>>>>>> we rename:
> >>>>>>>>
> >>>>>>>> `withNumOfPartitions` -> `withNumberOfPartitions`
> >>>>>>>>
> >>>>>>>> Furthermore, it might be good to add some more `static` methods:
> >>>>>>>>
> >>>>>>>> - Repartitioned.with(Serde<K>, Serde<V>)
> >>>>>>>> - Repartitioned.withNumberOfPartitions(int)
> >>>>>>>> - Repartitioned.streamPartitioner(StreamPartitioner)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> -Matthias
> >>>>>>>>
> >>>>>>>> On 7/19/19 3:33 PM, Levani Kokhreidze wrote:
> >>>>>>>>> Totally agree. I think in KStream interface it makes sense to
> have
> >> some duplicate configurations between operators in order to keep API
> simple
> >> and usable.
> >>>>>>>>> Also, as more surface API has, harder it is to have proper
> >> backward compatibility.
> >>>>>>>>> While initial idea of keeping topic level configs separate was
> >> exciting, having Repartitioned class encapsulate some producer level
> >> configs makes API more readable.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Levani
> >>>>>>>>>
> >>>>>>>>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman <
> >> [email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>> <mailto:
> >> [email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I think that is a good point about trying to keep producer level
> >>>>>>>>>> configurations and (repartition) topic level considerations
> >> separate.
> >>>>>>>>>> Number of partitions is definitely purely a topic level
> >> configuration. But
> >>>>>>>>>> on some level, serdes and partitioners are just as much a topic
> >>>>>>>>>> configuration as a producer one. You could have two producers
> >> configured
> >>>>>>>>>> with different serdes and/or partitioners, but if they are
> >> writing to the
> >>>>>>>>>> same topic the result would be very difficult to part. So in a
> >> sense, these
> >>>>>>>>>> are configurations of topics in Streams, not just producers.
> >>>>>>>>>>
> >>>>>>>>>> Another way to think of it: while the Streams API is not always
> >> true to
> >>>>>>>>>> this, ideally all the relevant configs for an operator are
> >> wrapped into a
> >>>>>>>>>> single object (in this case, Repartitioned). We could instead
> >> split out the
> >>>>>>>>>> fields in common with Produced into a separate parameter to keep
> >> topic and
> >>>>>>>>>> producer level configurations separate, but this increases the
> >> API surface
> >>>>>>>>>> area by a lot. It's much more straightforward to just say "this
> is
> >>>>>>>>>> everything that this particular operator needs" without worrying
> >> about what
> >>>>>>>>>> exactly you're specifying.
> >>>>>>>>>>
> >>>>>>>>>> I suppose you could alternatively make Produced a field of
> >> Repartitioned,
> >>>>>>>>>> but I don't think we do this kind of composition elsewhere in
> >> Streams at
> >>>>>>>>>> the moment
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze <
> >> [email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>> <mailto:
> >> [email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Bill,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks a lot for the feedback.
> >>>>>>>>>>> Yes, that makes sense. I’ve updated KIP with
> >> `Repartitioned#partitioner`
> >>>>>>>>>>> configuration.
> >>>>>>>>>>> In the beginning, I wanted to introduce a class for topic level
> >>>>>>>>>>> configuration and keep topic level and producer level
> >> configurations (such
> >>>>>>>>>>> as Produced) separately (see my second email in this thread).
> >>>>>>>>>>> But while looking at the semantics of KStream interface, I
> >> couldn’t really
> >>>>>>>>>>> figure out good operation name for Topic level configuration
> >> class and just
> >>>>>>>>>>> introducing `Topic` config class was kinda breaking the
> >> semantics.
> >>>>>>>>>>> So I think having Repartitioned class which encapsulates topic
> >> and
> >>>>>>>>>>> producer level configurations for internal topics is viable
> >> thing to do.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Levani
> >>>>>>>>>>>
> >>>>>>>>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <[email protected]
> <mailto:[email protected]>
> >> <mailto:[email protected] <mailto:[email protected]>> <mailto:
> [email protected] <mailto:[email protected]> <mailto:
> >> [email protected] <mailto:[email protected]>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Lavani,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for resurrecting this KIP.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm also a +1 for adding a partition option.  In addition to
> >> the reason
> >>>>>>>>>>>> provided by John, my reasoning is:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. Users may want to use something other than hash-based
> >> partitioning
> >>>>>>>>>>>> 2. Users may wish to partition on something different than the
> >> key
> >>>>>>>>>>>> without having to change the key.  For example:
> >>>>>>>>>>>> 1. A combination of fields in the value in conjunction with
> >> the key
> >>>>>>>>>>>> 2. Something other than the key
> >>>>>>>>>>>> 3. We allow users to specify a partitioner on Produced hence
> in
> >>>>>>>>>>>> KStream.to and KStream.through, so it makes sense for API
> >> consistency.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Just my  2 cents.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Bill
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze <
> >>>>>>>>>>> [email protected] <mailto:[email protected]>
> <mailto:[email protected] <mailto:[email protected]>> <mailto:
> >> [email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In my mind it makes sense.
> >>>>>>>>>>>>> If we add partitioner configuration to Repartitioned class,
> >> with the
> >>>>>>>>>>>>> combination of specifying number of partitions for internal
> >> topics, user
> >>>>>>>>>>>>> will have opportunity to ensure co-partitioning before join
> >> operation.
> >>>>>>>>>>>>> I think this can be quite powerful feature.
> >>>>>>>>>>>>> Wondering what others think about this?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler <
> [email protected] <mailto:[email protected]>
> >> <mailto:[email protected] <mailto:[email protected]>> <mailto:
> [email protected] <mailto:[email protected]> <mailto:
> >> [email protected] <mailto:[email protected]>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, I believe that's what I had in mind. Again, not totally
> >> sure it
> >>>>>>>>>>>>>> makes sense, but I believe something similar is the
> rationale
> >> for
> >>>>>>>>>>>>>> having the partitioner option in Produced.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze
> >>>>>>>>>>>>>> <[email protected] <mailto:[email protected]>
> <mailto:[email protected] <mailto:[email protected]>>
> >> <mailto:[email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hey John,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Oh that’s interesting use-case.
> >>>>>>>>>>>>>>> Do I understand this correctly, in your example I would
> >> first issue
> >>>>>>>>>>>>> repartition(Repartitioned) with proper partitioner that
> >> essentially
> >>>>>>>>>>> would
> >>>>>>>>>>>>> be the same as the topic I want to join with and then do the
> >>>>>>>>>>> KStream#join
> >>>>>>>>>>>>> with DSL?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler <
> >> [email protected] <mailto:[email protected]> <mailto:[email protected]
> <mailto:[email protected]>> <mailto:[email protected] <mailto:
> [email protected]>
> >> <mailto:[email protected] <mailto:[email protected]>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hey, all, just to chime in,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think it might be useful to have an option to specify
> the
> >>>>>>>>>>>>>>>> partitioner. The case I have in mind is that some data may
> >> get
> >>>>>>>>>>>>>>>> repartitioned and then joined with an input topic. If the
> >> right-side
> >>>>>>>>>>>>>>>> input topic uses a custom partitioning strategy, then the
> >>>>>>>>>>>>>>>> repartitioned stream also needs to be partitioned with the
> >> same
> >>>>>>>>>>>>>>>> strategy.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Does that make sense, or did I maybe miss something
> >> important?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze
> >>>>>>>>>>>>>>>> <[email protected] <mailto:[email protected]>
> <mailto:[email protected] <mailto:[email protected]>>
> >> <mailto:[email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m
> not
> >> sure
> >>>>>>>>>>> about
> >>>>>>>>>>>>> it yet.
> >>>>>>>>>>>>>>>>> As Kafka Streams DSL user, I don’t really think I would
> >> need control
> >>>>>>>>>>>>> over partitioner for internal topics.
> >>>>>>>>>>>>>>>>> As a user, I would assume that Kafka Streams knows best
> >> how to
> >>>>>>>>>>>>> partition data for internal topics.
> >>>>>>>>>>>>>>>>> In this KIP I wrote that Produced should be used only for
> >> topics
> >>>>>>>>>>> that
> >>>>>>>>>>>>> are created by user In advance.
> >>>>>>>>>>>>>>>>> In those cases maybe it make sense to have possibility to
> >> specify
> >>>>>>>>>>> the
> >>>>>>>>>>>>> partitioner.
> >>>>>>>>>>>>>>>>> I don’t have clear answer on that yet, but I guess
> >> specifying the
> >>>>>>>>>>>>> partitioner can be added as well if there’s agreement on
> this.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman <
> >>>>>>>>>>>>> [email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>> <mailto:
> >> [email protected] <mailto:[email protected]>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned
> >> would be a
> >>>>>>>>>>>>> useful
> >>>>>>>>>>>>>>>>>> addition. I'm wondering if it might also need to have
> >>>>>>>>>>>>>>>>>> a withStreamPartitioner method/field, similar to
> >> Produced? I'm not
> >>>>>>>>>>>>> sure how
> >>>>>>>>>>>>>>>>>> widely this feature is really used, but seems it should
> be
> >>>>>>>>>>> available
> >>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>> repartition topics.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze <
> >>>>>>>>>>>>> [email protected] <mailto:[email protected]>
> >> <mailto:[email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>>>>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hey Sophie,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> In both cases KStream#repartition and
> >>>>>>>>>>>>> KStream#repartition(Repartitioned)
> >>>>>>>>>>>>>>>>>>> topic will be created and managed by Kafka Streams.
> >>>>>>>>>>>>>>>>>>> Idea of Repartitioned is to give user more control over
> >> the topic
> >>>>>>>>>>>>> such as
> >>>>>>>>>>>>>>>>>>> num of partitions.
> >>>>>>>>>>>>>>>>>>> I feel like Repartitioned parameter is something that
> is
> >> missing
> >>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>> current DSL design.
> >>>>>>>>>>>>>>>>>>> Essentially giving user control over parallelism by
> >> configuring
> >>>>>>>>>>> num
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> partitions for internal topics.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hope this answers your question.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman <
> >>>>>>>>>>>>> [email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>> <mailto:
> >> [email protected] <mailto:[email protected]>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hey Levani,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me
> --
> >> for the
> >>>>>>>>>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned,
> >> will the
> >>>>>>>>>>>>> topic be
> >>>>>>>>>>>>>>>>>>>> auto-created by Streams (which seems to be the case
> for
> >> the
> >>>>>>>>>>>>> signature
> >>>>>>>>>>>>>>>>>>>> without a Repartitioned) or does it have to be
> >> pre-created? The
> >>>>>>>>>>>>> wording
> >>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>> the KIP makes it seem like one version of the method
> >> will
> >>>>>>>>>>>>> auto-create
> >>>>>>>>>>>>>>>>>>>> topics while the other will not.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>> Sophie
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <
> >>>>>>>>>>>>>>>>>>> [email protected] <mailto:[email protected]>
> >> <mailto:[email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>>>>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> One more bump about KIP-221 (
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> >>>>
> >>>>>>>>>>>>>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>>
> >>>>>>>>>>>>>>>>>>>> )
> >>>>>>>>>>>>>>>>>>>>> so it doesn’t get lost in mailing list :)
> >>>>>>>>>>>>>>>>>>>>> Would love to hear communities opinions/concerns
> about
> >> this KIP.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze <
> >>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Kind reminder about this KIP:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>>>>>>>>>>>>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <
> >>>>>>>>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated
> >> confluence
> >>>>>>>>>>> page
> >>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>> the new proposal
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>>>>>>>>>>>>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I’ve also filled “Rejected Alternatives” section.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Looking forward to discuss this KIP :)
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> King regards,
> >>>>>>>>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <
> >>>>>>>>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Hello Matthias,
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas.
> >>>>>>>>>>>>>>>>>>>>>>>> I like the idea of introducing dedicated `Topic`
> >> class for
> >>>>>>>>>>>>> topic
> >>>>>>>>>>>>>>>>>>>>> configuration for internal operators like
> `groupedBy`.
> >>>>>>>>>>>>>>>>>>>>>>>> Would be great to hear others opinion about this
> as
> >> well.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax <
> >>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Levani,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for
> >> summarizing
> >>>>>>>>>>>>>>>>>>> everything.
> >>>>>>>>>>>>>>>>>>>>>>>>> Even if some points may have been discussed
> >> already (can't
> >>>>>>>>>>>>> really
> >>>>>>>>>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to
> >> refresh the
> >>>>>>>>>>>>>>>>>>>>> discussion.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard
> to
> >> the
> >>>>>>>>>>>>> distinction
> >>>>>>>>>>>>>>>>>>>>>>>>> between operators that manage topics and
> operators
> >> that use
> >>>>>>>>>>>>>>>>>>>>> user-created
> >>>>>>>>>>>>>>>>>>>>>>>>> topics: Following this argument, it might
> indicate
> >> that
> >>>>>>>>>>>>> leaving
> >>>>>>>>>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses
> >> use-defined
> >>>>>>>>>>>>> topics) and
> >>>>>>>>>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an
> >> operator that
> >>>>>>>>>>>>> manages
> >>>>>>>>>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is
> >> one
> >>>>>>>>>>> operator
> >>>>>>>>>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but
> >> sometimes
> >>>>>>>>>>> not; a
> >>>>>>>>>>>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>>>>>>>>>> name, ie, new operator would make the distinction
> >> clearer.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`. I am
> >> wondering
> >>>>>>>>>>>>> if the
> >>>>>>>>>>>>>>>>>>>>> same
> >>>>>>>>>>>>>>>>>>>>>>>>> argument as for `Produced` does apply and adding
> >> it is
> >>>>>>>>>>>>> semantically
> >>>>>>>>>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of
> >> others on
> >>>>>>>>>>>>> this, too.
> >>>>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>>> am
> >>>>>>>>>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow
> to
> >>>>>>>>>>> configure
> >>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>> certain
> >>>>>>>>>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, store. If we
> >> assume that
> >>>>>>>>>>>>> a topic
> >>>>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>>>>> a similar entity, I am wonder if we should have a
> >>>>>>>>>>>>>>>>>>>>>>>>> `Topic#withNumberOfPartitions()` class and method
> >> instead of
> >>>>>>>>>>>>> a plain
> >>>>>>>>>>>>>>>>>>>>>>>>> integer? This would allow us to add other
> configs,
> >> like
> >>>>>>>>>>>>> replication
> >>>>>>>>>>>>>>>>>>>>>>>>> factor, retention-time etc, easily, without the
> >> need to
> >>>>>>>>>>>>> change the
> >>>>>>>>>>>>>>>>>>>>> "main
> >>>>>>>>>>>>>>>>>>>>>>>>> API".
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like
> >> them
> >>>>>>>>>>> myself.
> >>>>>>>>>>>>> :)
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>> Actually, giving it more though - maybe
> enhancing
> >> Produced
> >>>>>>>>>>>>> with num
> >>>>>>>>>>>>>>>>>>>>> of partitions configuration is not the best approach.
> >> Let me
> >>>>>>>>>>>>> explain
> >>>>>>>>>>>>>>>>>>> why:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> 1) If we enhance Produced class with this
> >> configuration,
> >>>>>>>>>>>>> this will
> >>>>>>>>>>>>>>>>>>>>> also affect KStream#to operation. Since KStream#to is
> >> the final
> >>>>>>>>>>>>> sink of
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> topology, for me, it seems to be reasonable
> assumption
> >> that user
> >>>>>>>>>>>>> needs
> >>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> manually create sink topic in advance. And in that
> >> case, having
> >>>>>>>>>>>>> num of
> >>>>>>>>>>>>>>>>>>>>> partitions configuration doesn’t make much sense.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> 2) Looking at Produced class, based on API
> >> contract, seems
> >>>>>>>>>>>>> like
> >>>>>>>>>>>>>>>>>>>>> Produced is designed to be something that is
> >> explicitly for
> >>>>>>>>>>>>> producer
> >>>>>>>>>>>>>>>>>>> (key
> >>>>>>>>>>>>>>>>>>>>> serializer, value serializer, partitioner those all
> >> are producer
> >>>>>>>>>>>>>>>>>>> specific
> >>>>>>>>>>>>>>>>>>>>> configurations) and num of partitions is topic level
> >>>>>>>>>>>>> configuration. And
> >>>>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>>> don’t think mixing topic and producer level
> >> configurations
> >>>>>>>>>>>>> together in
> >>>>>>>>>>>>>>>>>>> one
> >>>>>>>>>>>>>>>>>>>>> class is the good approach.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> 3) Looking at KStream interface, seems like
> >> Produced
> >>>>>>>>>>>>> parameter is
> >>>>>>>>>>>>>>>>>>>>> for operations that work with non-internal (e.g
> topics
> >> created
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> managed
> >>>>>>>>>>>>>>>>>>>>> internally by Kafka Streams) topics and I think we
> >> should leave
> >>>>>>>>>>>>> it as
> >>>>>>>>>>>>>>>>>>> it is
> >>>>>>>>>>>>>>>>>>>>> in that case.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Taking all this things into account, I think we
> >> should
> >>>>>>>>>>>>> distinguish
> >>>>>>>>>>>>>>>>>>>>> between DSL operations, where Kafka Streams should
> >> create and
> >>>>>>>>>>>>> manage
> >>>>>>>>>>>>>>>>>>>>> internal topics (KStream#groupBy) vs topics that
> >> should be
> >>>>>>>>>>>>> created in
> >>>>>>>>>>>>>>>>>>>>> advance (e.g KStream#to).
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> To sum it up, I think adding numPartitions
> >> configuration in
> >>>>>>>>>>>>>>>>>>> Produced
> >>>>>>>>>>>>>>>>>>>>> will result in mixing topic and producer level
> >> configuration in
> >>>>>>>>>>>>> one
> >>>>>>>>>>>>>>>>>>> class
> >>>>>>>>>>>>>>>>>>>>> and it’s gonna break existing API semantics.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Regarding making topic name optional in
> >> KStream#through - I
> >>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>>>>> underline idea is very useful and giving users
> >> possibility to
> >>>>>>>>>>>>> specify
> >>>>>>>>>>>>>>>>>>> num
> >>>>>>>>>>>>>>>>>>>>> of partitions there is even more useful :)
> Considering
> >> arguments
> >>>>>>>>>>>>> against
> >>>>>>>>>>>>>>>>>>>>> adding num of partitions in Produced class, I see two
> >> options
> >>>>>>>>>>>>> here:
> >>>>>>>>>>>>>>>>>>>>>>>>>> 1) Add following method overloads
> >>>>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and
> >> num of
> >>>>>>>>>>>>> partitions
> >>>>>>>>>>>>>>>>>>>>> will be taken from source topic
> >>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic
> will
> >> be auto
> >>>>>>>>>>>>>>>>>>>>> generated with specified num of partitions
> >>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final
> >> Produced<K, V>
> >>>>>>>>>>>>>>>>>>>>> produced) - topic will be with generated with
> >> specified num of
> >>>>>>>>>>>>>>>>>>> partitions
> >>>>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter.
> >>>>>>>>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce
> >> new method
> >>>>>>>>>>> -
> >>>>>>>>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this
> >> in one of
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> threads)
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Considering all mentioned above I propose the
> >> following
> >>>>>>>>>>> plan:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Option A:
> >>>>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
> >>>>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to
> Grouped
> >> class (as
> >>>>>>>>>>>>>>>>>>>>> mentioned in the KIP)
> >>>>>>>>>>>>>>>>>>>>>>>>>> 3) Add following method overloads to
> >> KStream#through
> >>>>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and
> >> num of
> >>>>>>>>>>>>> partitions
> >>>>>>>>>>>>>>>>>>>>> will be taken from source topic
> >>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic
> will
> >> be auto
> >>>>>>>>>>>>>>>>>>>>> generated with specified num of partitions
> >>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final
> >> Produced<K, V>
> >>>>>>>>>>>>>>>>>>>>> produced) - topic will be with generated with
> >> specified num of
> >>>>>>>>>>>>>>>>>>> partitions
> >>>>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Option B:
> >>>>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
> >>>>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to
> Grouped
> >> class (as
> >>>>>>>>>>>>>>>>>>>>> mentioned in the KIP)
> >>>>>>>>>>>>>>>>>>>>>>>>>> 3) Add new operator KStream#repartition for
> >> creating and
> >>>>>>>>>>>>> managing
> >>>>>>>>>>>>>>>>>>>>> internal repartition topics
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already
> >> discussed in the
> >>>>>>>>>>>>> mailing
> >>>>>>>>>>>>>>>>>>>>> list, but I kinda got with all the threads that were
> >> about this
> >>>>>>>>>>>>> KIP :(
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>> Levani
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze <
> >>>>>>>>>>>>>>>>>>>>> [email protected] <mailto:
> [email protected]>>
> >> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to resurrect discussion around
> >> KIP-221. Going
> >>>>>>>>>>>>> through
> >>>>>>>>>>>>>>>>>>>>> the discussion thread, there’s seems to agreement
> >> around
> >>>>>>>>>>>>> usefulness of
> >>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>> feature.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I
> >> understood, the
> >>>>>>>>>>>>> most
> >>>>>>>>>>>>>>>>>>>>> optimal solution for me seems the following:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to KStream#through
> >> method
> >>>>>>>>>>>>> (essentially
> >>>>>>>>>>>>>>>>>>>>> making topic name optional)
> >>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Enhance Produced class with numOfPartitions
> >>>>>>>>>>> configuration
> >>>>>>>>>>>>>>>>>>> field.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to
> control
> >>>>>>>>>>>>> parallelism and
> >>>>>>>>>>>>>>>>>>>>> trigger re-partition without doing stateful
> operations.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> I will update KIP with interface changes around
> >>>>>>>>>>>>> KStream#through if
> >>>>>>>>>>>>>>>>>>>>> this changes sound sensible.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Levani
>
>

Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

Reply via email to