Hey Levani,

Thanks for the KIP! Can you clarify one thing for me -- for the
KStream#repartition signature taking a Repartitioned, will the topic be
auto-created by Streams (which seems to be the case for the signature
without a Repartitioned) or does it have to be pre-created? The wording in
the KIP makes it seem like one version of the method will auto-create
topics while the other will not.

Cheers,
Sophie

On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <levani.co...@gmail.com>
wrote:

> Hello,
>
> One more bump about KIP-221 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> ) so it doesn’t get lost in the mailing list :)
> Would love to hear the community’s opinions/concerns about this KIP.
>
> Regards,
> Levani
>
>
> > On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze <levani.co...@gmail.com>
> wrote:
> >
> > Hello,
> >
> > Kind reminder about this KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> > Regards,
> > Levani
> >
> >> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> In order to move this KIP forward, I’ve updated the Confluence page with
> the new proposal:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >> I’ve also filled in the “Rejected Alternatives” section.
> >>
> >> Looking forward to discussing this KIP :)
> >>
> >> Kind regards,
> >> Levani
> >>
> >>
> >>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
> >>>
> >>> Hello Matthias,
> >>>
> >>> Thanks for the feedback and ideas.
> >>> I like the idea of introducing a dedicated `Topic` class for topic
> configuration for internal operators like `groupBy`.
> >>> Would be great to hear others’ opinions about this as well.
> >>>
> >>> Kind regards,
> >>> Levani
> >>>
> >>>
> >>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax <matth...@confluent.io> wrote:
> >>>>
> >>>> Levani,
> >>>>
> >>>> Thanks for picking up this KIP! And thanks for summarizing everything.
> >>>> Even if some points may have been discussed already (can't really
> >>>> remember), it's helpful to get a good summary to refresh the
> discussion.
> >>>>
> >>>> I think your reasoning makes sense. With regard to the distinction
> >>>> between operators that manage topics and operators that use
> user-created
> >>>> topics: Following this argument, it might indicate that leaving
> >>>> `through()` as-is (as an operator that uses user-defined topics) and
> >>>> introducing a new `repartition()` operator (an operator that manages
> >>>> topics itself) might be good. Otherwise, there is one operator
> >>>> `through()` that sometimes manages topics but sometimes not; a
> different
> >>>> name, i.e., a new operator, would make the distinction clearer.
> >>>>
> >>>> About adding `numOfPartitions` to `Grouped`. I am wondering if the
> same
> >>>> argument as for `Produced` does apply and adding it is semantically
> >>>> questionable? Might be good to get opinions of others on this, too. I
> am
> >>>> not sure myself what solution I prefer atm.
> >>>>
> >>>> So far, KS uses configuration objects that allow configuring a
> certain
> >>>> "entity" like a consumer, producer, store. If we assume that a topic
> is
> >>>> a similar entity, I am wondering if we should have a
> >>>> `Topic#withNumberOfPartitions()` class and method instead of a plain
> >>>> integer? This would allow us to add other configs, like replication
> >>>> factor, retention-time etc, easily, without the need to change the
> "main
> >>>> API".
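
To make the `Topic` idea above a bit more concrete, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration only: the class shape, the `withReplicationFactor` and `withRetentionMs` methods, and the use of -1 as "not set" are hypothetical and are not part of Kafka Streams or of the KIP text.

```java
// Hypothetical sketch only: this is NOT an existing Kafka Streams class.
// It shows how a topic-level config object could grow new settings
// without changing the signature of the DSL method that accepts it.
final class Topic {
    // -1 means "not set"; a runtime would then fall back to its own default.
    private static final int UNSET = -1;

    private final int numberOfPartitions;
    private final short replicationFactor;
    private final long retentionMs;

    private Topic(int numberOfPartitions, short replicationFactor, long retentionMs) {
        this.numberOfPartitions = numberOfPartitions;
        this.replicationFactor = replicationFactor;
        this.retentionMs = retentionMs;
    }

    // Entry point named after the method suggested in the thread.
    static Topic withNumberOfPartitions(int numberOfPartitions) {
        if (numberOfPartitions <= 0) {
            throw new IllegalArgumentException("numberOfPartitions must be positive");
        }
        return new Topic(numberOfPartitions, (short) UNSET, UNSET);
    }

    // Each "with" method returns a new immutable copy with one field changed.
    Topic withReplicationFactor(short replicationFactor) {
        return new Topic(numberOfPartitions, replicationFactor, retentionMs);
    }

    Topic withRetentionMs(long retentionMs) {
        return new Topic(numberOfPartitions, replicationFactor, retentionMs);
    }

    int numberOfPartitions() { return numberOfPartitions; }
    short replicationFactor() { return replicationFactor; }
    long retentionMs() { return retentionMs; }
}
```

With a shape like this, a call such as `Topic.withNumberOfPartitions(6).withReplicationFactor((short) 3)` could be passed wherever a plain integer would otherwise go, and further topic settings could be added later as extra `with...` methods.
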
> >>>>
> >>>> Just want to give some ideas. Not sure if I like them myself. :)
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>>
> >>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote:
> >>>>> Actually, giving it more thought - maybe enhancing Produced with num
> of partitions configuration is not the best approach. Let me explain why:
> >>>>>
> >>>>> 1) If we enhance the Produced class with this configuration, this will
> also affect the KStream#to operation. Since KStream#to is the final sink of the
> topology, it seems to me a reasonable assumption that the user needs to
> manually create the sink topic in advance. And in that case, having a num of
> partitions configuration doesn’t make much sense.
> >>>>>
> >>>>> 2) Looking at the Produced class, based on the API contract, it seems like
> Produced is designed to be something that is explicitly for the producer (key
> serializer, value serializer, and partitioner are all producer-specific
> configurations), whereas num of partitions is a topic-level configuration. And I
> don’t think mixing topic- and producer-level configurations together in one
> class is a good approach.
> >>>>>
> >>>>> 3) Looking at the KStream interface, it seems like the Produced parameter is
> for operations that work with non-internal topics (i.e., topics not created and
> managed internally by Kafka Streams), and I think we should leave it as it is
> in that case.
> >>>>>
> >>>>> Taking all these things into account, I think we should distinguish
> between DSL operations where Kafka Streams should create and manage
> internal topics (e.g. KStream#groupBy) and operations whose topics should be
> created in advance (e.g. KStream#to).
> >>>>>
> >>>>> To sum it up, I think adding numPartitions configuration in Produced
> will result in mixing topic and producer level configuration in one class
> and it’s gonna break existing API semantics.
> >>>>>
> >>>>> Regarding making the topic name optional in KStream#through - I think
> the underlying idea is very useful, and giving users the possibility to specify
> num of partitions there is even more useful :) Considering the arguments against
> adding num of partitions to the Produced class, I see two options here:
> >>>>> 1) Add the following method overloads
> >>>>>   * through() - topic will be auto-generated and num of partitions
> will be taken from source topic
> >>>>>   * through(final int numOfPartitions) - topic will be auto-generated
> with specified num of partitions
> >>>>>   * through(final int numOfPartitions, final Produced<K, V>
> produced) - topic will be generated with specified num of partitions
> and configuration taken from produced parameter.
> >>>>> 2) Leave KStream#through as it is and introduce new method -
> KStream#repartition (I think Matthias suggested this in one of the threads)
> >>>>>
> >>>>> Considering everything mentioned above, I propose the following plan:
> >>>>>
> >>>>> Option A:
> >>>>> 1) Leave Produced as it is
> >>>>> 2) Add num of partitions configuration to Grouped class (as
> mentioned in the KIP)
> >>>>> 3) Add the following method overloads to KStream#through
> >>>>>   * through() - topic will be auto-generated and num of partitions
> will be taken from source topic
> >>>>>   * through(final int numOfPartitions) - topic will be auto-generated
> with specified num of partitions
> >>>>>   * through(final int numOfPartitions, final Produced<K, V>
> produced) - topic will be generated with specified num of partitions
> and configuration taken from produced parameter.
> >>>>>
> >>>>> Option B:
> >>>>> 1) Leave Produced as it is
> >>>>> 2) Add num of partitions configuration to Grouped class (as
> mentioned in the KIP)
> >>>>> 3) Add new operator KStream#repartition for creating and managing
> internal repartition topics
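
The partition-count rule described for the `through()` overloads under Option A (use the requested number if one is given, otherwise inherit the source topic's count) can be sketched as a tiny standalone helper. This is only an illustration under stated assumptions: `RepartitionPartitions` and `resolve` are hypothetical names, and nothing here reflects how Kafka Streams actually resolves partition counts.

```java
import java.util.OptionalInt;

// Hypothetical helper for illustration; not part of Kafka Streams.
final class RepartitionPartitions {
    private RepartitionPartitions() {}

    // Returns the partition count for the auto-generated topic:
    // the requested count when present, else the source topic's count.
    static int resolve(int sourceTopicPartitions, OptionalInt requested) {
        if (sourceTopicPartitions <= 0) {
            throw new IllegalArgumentException("source topic must have at least one partition");
        }
        if (requested.isPresent()) {
            int n = requested.getAsInt();
            if (n <= 0) {
                throw new IllegalArgumentException("requested partitions must be positive");
            }
            return n;
        }
        return sourceTopicPartitions;
    }
}
```

Under this sketch, `through()` would correspond to calling `resolve` with `OptionalInt.empty()`, and `through(final int numOfPartitions)` to calling it with `OptionalInt.of(numOfPartitions)`.
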
> >>>>>
> >>>>> P.S. I’m sorry if all of this was already discussed in the mailing
> list, but I kinda got lost in all the threads that were about this KIP :(
> >>>>>
> >>>>> Kind regards,
> >>>>> Levani
> >>>>>
> >>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> I would like to resurrect the discussion around KIP-221. Going through
> the discussion thread, there seems to be agreement around the usefulness of this
> feature.
> >>>>>> Regarding the implementation, as far as I understand, the best
> solution seems to me to be the following:
> >>>>>>
> >>>>>> 1) Add two method overloads to KStream#through method (essentially
> making topic name optional)
> >>>>>> 2) Enhance Produced class with numOfPartitions configuration field.
> >>>>>>
> >>>>>> Those two changes will allow DSL users to control parallelism and
> trigger re-partition without doing stateful operations.
> >>>>>>
> >>>>>> I will update the KIP with interface changes around KStream#through if
> these changes sound sensible.
> >>>>>>
> >>>>>> Kind regards,
> >>>>>> Levani
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>
