One question:

Why do we add

> Repartitioned#with(final String name, final int numberOfPartitions)

It seems that `#with(String name)`, `#numberOfPartitions(int)` in
combination with `withName()` and `withNumberOfPartitions()` should be
sufficient. Users can chain the method calls.

(I think it's valuable to keep the number of overloads small if possible.)
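
For illustration, here is a minimal sketch of what chaining could look like.
`RepartitionedSketch` is a hypothetical, simplified stand-in (no serdes, no
partitioner, no generics), not the actual class from the KIP. Only the method
names `with(String)`, `numberOfPartitions(int)`, `withName()` and
`withNumberOfPartitions()` come from the discussion above; the accessors
`name()` and `partitions()` are made up for the example.

```java
// Simplified, hypothetical stand-in for the proposed config class.
// Not the actual KIP-221 code: serdes, partitioner and generics are omitted.
final class RepartitionedSketch {
    private final String name;                // null: let Streams generate a name
    private final Integer numberOfPartitions; // null: fall back to the default

    private RepartitionedSketch(final String name, final Integer numberOfPartitions) {
        this.name = name;
        this.numberOfPartitions = numberOfPartitions;
    }

    // Static entry points: one per setting, no combined two-argument overload.
    static RepartitionedSketch with(final String name) {
        return new RepartitionedSketch(name, null);
    }

    static RepartitionedSketch numberOfPartitions(final int numberOfPartitions) {
        return new RepartitionedSketch(null, numberOfPartitions);
    }

    // Instance methods return a new object, so calls can be chained.
    RepartitionedSketch withName(final String name) {
        return new RepartitionedSketch(name, this.numberOfPartitions);
    }

    RepartitionedSketch withNumberOfPartitions(final int numberOfPartitions) {
        return new RepartitionedSketch(this.name, numberOfPartitions);
    }

    // Accessors invented for this example only.
    String name() { return name; }
    Integer partitions() { return numberOfPartitions; }
}

class ChainingExample {
    public static void main(final String[] args) {
        // Both chains end up with the same name and partition count.
        final RepartitionedSketch a =
            RepartitionedSketch.with("my-repartition").withNumberOfPartitions(4);
        final RepartitionedSketch b =
            RepartitionedSketch.numberOfPartitions(4).withName("my-repartition");
        System.out.println(a.name() + " / " + a.partitions());
        System.out.println(b.name() + " / " + b.partitions());
    }
}
```

With this shape, either entry point already covers the combined case, so a
dedicated two-argument overload would not add any expressiveness.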

Otherwise LGTM.


-Matthias


On 7/23/19 2:18 PM, Levani Kokhreidze wrote:
> Hello,
> 
> Thanks all for your feedback.
> I started voting procedure for this KIP. If there’re any other concerns about 
> this KIP, please let me know.
> 
> Regards,
> Levani
> 
>> On Jul 20, 2019, at 8:39 PM, Levani Kokhreidze <levani.co...@gmail.com> 
>> wrote:
>>
>> Hi Matthias,
>>
>> Thanks for the suggestion, makes sense.
>> I’ve updated KIP 
>> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>  
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>).
>>
>> Regards,
>> Levani
>>
>>
>>> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax <matth...@confluent.io 
>>> <mailto:matth...@confluent.io>> wrote:
>>>
>>> Thanks for driving the KIP.
>>>
>>> I agree that users need to be able to specify a partitioning strategy.
>>>
>>> Sophie raises a fair point about topic configs and producer configs. My
>>> take is that considering `Repartitioned` as an "extension" to `Produced`
>>> that adds topic configuration is a good way to think about it and helps
>>> to keep the API "clean".
>>>
>>>
>>> With regard to method names. I would prefer to avoid abbreviations. Can
>>> we rename:
>>>
>>> `withNumOfPartitions` -> `withNumberOfPartitions`
>>>
>>> Furthermore, it might be good to add some more `static` methods:
>>>
>>> - Repartitioned.with(Serde<K>, Serde<V>)
>>> - Repartitioned.withNumberOfPartitions(int)
>>> - Repartitioned.streamPartitioner(StreamPartitioner)
>>>
>>>
>>> -Matthias
>>>
>>> On 7/19/19 3:33 PM, Levani Kokhreidze wrote:
>>>> Totally agree. I think in KStream interface it makes sense to have some 
>>>> duplicate configurations between operators in order to keep API simple and 
>>>> usable.
>>>> Also, the more surface area the API has, the harder it is to maintain
>>>> backward compatibility.
>>>> While initial idea of keeping topic level configs separate was exciting, 
>>>> having Repartitioned class encapsulate some producer level configs makes 
>>>> API more readable.
>>>>
>>>> Regards,
>>>> Levani
>>>>
>>>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman <sop...@confluent.io 
>>>>> <mailto:sop...@confluent.io>> wrote:
>>>>>
>>>>> I think that is a good point about trying to keep producer level
>>>>> configurations and (repartition) topic level considerations separate.
>>>>> Number of partitions is definitely purely a topic level configuration. But
>>>>> on some level, serdes and partitioners are just as much a topic
>>>>> configuration as a producer one. You could have two producers configured
>>>>> with different serdes and/or partitioners, but if they are writing to the
>>>>> same topic the result would be very difficult to parse. So in a sense,
>>>>> these are configurations of topics in Streams, not just producers.
>>>>>
>>>>> Another way to think of it: while the Streams API is not always true to
>>>>> this, ideally all the relevant configs for an operator are wrapped into a
>>>>> single object (in this case, Repartitioned). We could instead split out 
>>>>> the
>>>>> fields in common with Produced into a separate parameter to keep topic and
>>>>> producer level configurations separate, but this increases the API surface
>>>>> area by a lot. It's much more straightforward to just say "this is
>>>>> everything that this particular operator needs" without worrying about 
>>>>> what
>>>>> exactly you're specifying.
>>>>>
>>>>> I suppose you could alternatively make Produced a field of Repartitioned,
>>>>> but I don't think we do this kind of composition elsewhere in Streams at
>>>>> the moment
>>>>>
>>>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze <levani.co...@gmail.com 
>>>>> <mailto:levani.co...@gmail.com>>
>>>>> wrote:
>>>>>
>>>>>> Hi Bill,
>>>>>>
>>>>>> Thanks a lot for the feedback.
>>>>>> Yes, that makes sense. I’ve updated KIP with `Repartitioned#partitioner`
>>>>>> configuration.
>>>>>> In the beginning, I wanted to introduce a class for topic level
>>>>>> configuration and keep topic level and producer level configurations 
>>>>>> (such
>>>>>> as Produced) separately (see my second email in this thread).
>>>>>> But while looking at the semantics of KStream interface, I couldn’t 
>>>>>> really
>>>>>> figure out good operation name for Topic level configuration class and 
>>>>>> just
>>>>>> introducing `Topic` config class was kinda breaking the semantics.
>>>>>> So I think having Repartitioned class which encapsulates topic and
>>>>>> producer level configurations for internal topics is viable thing to do.
>>>>>>
>>>>>> Regards,
>>>>>> Levani
>>>>>>
>>>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <bbej...@gmail.com 
>>>>>>> <mailto:bbej...@gmail.com>> wrote:
>>>>>>>
>>>>>>> Hi Levani,
>>>>>>>
>>>>>>> Thanks for resurrecting this KIP.
>>>>>>>
>>>>>>> I'm also a +1 for adding a partition option.  In addition to the reason
>>>>>>> provided by John, my reasoning is:
>>>>>>>
>>>>>>> 1. Users may want to use something other than hash-based partitioning
>>>>>>> 2. Users may wish to partition on something different than the key
>>>>>>> without having to change the key.  For example:
>>>>>>>    1. A combination of fields in the value in conjunction with the key
>>>>>>>    2. Something other than the key
>>>>>>> 3. We allow users to specify a partitioner on Produced hence in
>>>>>>> KStream.to and KStream.through, so it makes sense for API consistency.
>>>>>>>
>>>>>>> Just my  2 cents.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bill
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze <
>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>> In my mind it makes sense.
>>>>>>>> If we add partitioner configuration to Repartitioned class, with the
>>>>>>>> combination of specifying number of partitions for internal topics, 
>>>>>>>> user
>>>>>>>> will have opportunity to ensure co-partitioning before join operation.
>>>>>>>> I think this can be quite powerful feature.
>>>>>>>> Wondering what others think about this?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Levani
>>>>>>>>
>>>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler <j...@confluent.io 
>>>>>>>>> <mailto:j...@confluent.io>> wrote:
>>>>>>>>>
>>>>>>>>> Yes, I believe that's what I had in mind. Again, not totally sure it
>>>>>>>>> makes sense, but I believe something similar is the rationale for
>>>>>>>>> having the partitioner option in Produced.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> -John
>>>>>>>>>
>>>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze
>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hey John,
>>>>>>>>>>
>>>>>>>>>> Oh, that’s an interesting use case.
>>>>>>>>>> Do I understand this correctly, in your example I would first issue
>>>>>>>> repartition(Repartitioned) with proper partitioner that essentially
>>>>>> would
>>>>>>>> be the same as the topic I want to join with and then do the
>>>>>> KStream#join
>>>>>>>> with DSL?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Levani
>>>>>>>>>>
>>>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler <j...@confluent.io 
>>>>>>>>>>> <mailto:j...@confluent.io>>
>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hey, all, just to chime in,
>>>>>>>>>>>
>>>>>>>>>>> I think it might be useful to have an option to specify the
>>>>>>>>>>> partitioner. The case I have in mind is that some data may get
>>>>>>>>>>> repartitioned and then joined with an input topic. If the right-side
>>>>>>>>>>> input topic uses a custom partitioning strategy, then the
>>>>>>>>>>> repartitioned stream also needs to be partitioned with the same
>>>>>>>>>>> strategy.
>>>>>>>>>>>
>>>>>>>>>>> Does that make sense, or did I maybe miss something important?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -John
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze
>>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m not sure
>>>>>> about
>>>>>>>> it yet.
>>>>>>>>>>>> As Kafka Streams DSL user, I don’t really think I would need 
>>>>>>>>>>>> control
>>>>>>>> over partitioner for internal topics.
>>>>>>>>>>>> As a user, I would assume that Kafka Streams knows best how to
>>>>>>>> partition data for internal topics.
>>>>>>>>>>>> In this KIP I wrote that Produced should be used only for topics
>>>>>> that
>>>>>>>> are created by the user in advance.
>>>>>>>>>>>> In those cases maybe it makes sense to have the possibility to specify
>>>>>> the
>>>>>>>> partitioner.
>>>>>>>>>>>> I don’t have clear answer on that yet, but I guess specifying the
>>>>>>>> partitioner can be added as well if there’s agreement on this.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Levani
>>>>>>>>>>>>
>>>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman <
>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned would be a
>>>>>>>> useful
>>>>>>>>>>>>> addition. I'm wondering if it might also need to have
>>>>>>>>>>>>> a withStreamPartitioner method/field, similar to Produced? I'm not
>>>>>>>> sure how
>>>>>>>>>>>>> widely this feature is really used, but seems it should be
>>>>>> available
>>>>>>>> for
>>>>>>>>>>>>> repartition topics.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze <
>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Sophie,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In both cases KStream#repartition and
>>>>>>>> KStream#repartition(Repartitioned)
>>>>>>>>>>>>>> topic will be created and managed by Kafka Streams.
>>>>>>>>>>>>>> Idea of Repartitioned is to give user more control over the topic
>>>>>>>> such as
>>>>>>>>>>>>>> num of partitions.
>>>>>>>>>>>>>> I feel like Repartitioned parameter is something that is missing
>>>>>> in
>>>>>>>>>>>>>> current DSL design.
>>>>>>>>>>>>>> Essentially giving user control over parallelism by configuring
>>>>>> num
>>>>>>>> of
>>>>>>>>>>>>>> partitions for internal topics.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hope this answers your question.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman <
>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey Levani,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me -- for the
>>>>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned, will the
>>>>>>>> topic be
>>>>>>>>>>>>>>> auto-created by Streams (which seems to be the case for the
>>>>>>>> signature
>>>>>>>>>>>>>>> without a Repartitioned) or does it have to be pre-created? The
>>>>>>>> wording
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>> the KIP makes it seem like one version of the method will
>>>>>>>> auto-create
>>>>>>>>>>>>>>> topics while the other will not.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Sophie
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <
>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One more bump about KIP-221 (
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>  
>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>
>>>>>>>>>>>>>>>> <
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>> so it doesn’t get lost in mailing list :)
>>>>>>>>>>>>>>>> Would love to hear the community's opinions/concerns about this
>>>>>>>>>>>>>>>> KIP.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze <
>>>>>>>> levani.co...@gmail.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Kind reminder about this KIP:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>>> <
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <
>>>>>>>>>>>>>> levani.co...@gmail.com
>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated confluence
>>>>>> page
>>>>>>>> with
>>>>>>>>>>>>>>>> the new proposal
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>>> <
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I’ve also filled “Rejected Alternatives” section.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Looking forward to discuss this KIP :)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <
>>>>>>>>>>>>>> levani.co...@gmail.com
>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello Matthias,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas.
>>>>>>>>>>>>>>>>>>> I like the idea of introducing dedicated `Topic` class for
>>>>>>>> topic
>>>>>>>>>>>>>>>> configuration for internal operators like `groupedBy`.
>>>>>>>>>>>>>>>>>>> Would be great to hear others' opinions about this as well.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax <
>>>>>>>> matth...@confluent.io
>>>>>>>>>>>>>>>> <mailto:matth...@confluent.io>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Levani,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for summarizing
>>>>>>>>>>>>>> everything.
>>>>>>>>>>>>>>>>>>>> Even if some points may have been discussed already (can't
>>>>>>>> really
>>>>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to refresh 
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> discussion.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard to the
>>>>>>>> distinction
>>>>>>>>>>>>>>>>>>>> between operators that manage topics and operators that use
>>>>>>>>>>>>>>>> user-created
>>>>>>>>>>>>>>>>>>>> topics: Following this argument, it might indicate that
>>>>>>>> leaving
>>>>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses user-defined
>>>>>>>> topics) and
>>>>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an operator 
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>> manages
>>>>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is one
>>>>>> operator
>>>>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but sometimes
>>>>>> not; a
>>>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>>>>> name, ie, new operator would make the distinction clearer.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`. I am wondering
>>>>>>>> if the
>>>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>>>>> argument as for `Produced` does apply and adding it is
>>>>>>>> semantically
>>>>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of others on
>>>>>>>> this, too.
>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow to
>>>>>> configure
>>>>>>>> a
>>>>>>>>>>>>>>>> certain
>>>>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, store. If we assume 
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>> a topic
>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>> a similar entity, I am wondering if we should have a
>>>>>>>>>>>>>>>>>>>> `Topic#withNumberOfPartitions()` class and method instead 
>>>>>>>>>>>>>>>>>>>> of
>>>>>>>> a plain
>>>>>>>>>>>>>>>>>>>> integer? This would allow us to add other configs, like
>>>>>>>> replication
>>>>>>>>>>>>>>>>>>>> factor, retention-time etc, easily, without the need to
>>>>>>>> change the
>>>>>>>>>>>>>>>> "main
>>>>>>>>>>>>>>>>>>>> API".
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like them
>>>>>> myself.
>>>>>>>> :)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote:
>>>>>>>>>>>>>>>>>>>>> Actually, giving it more thought - maybe enhancing Produced
>>>>>>>> with num
>>>>>>>>>>>>>>>> of partitions configuration is not the best approach. Let me
>>>>>>>> explain
>>>>>>>>>>>>>> why:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1) If we enhance Produced class with this configuration,
>>>>>>>> this will
>>>>>>>>>>>>>>>> also affect KStream#to operation. Since KStream#to is the final
>>>>>>>> sink of
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> topology, for me, it seems to be reasonable assumption that 
>>>>>>>>>>>>>>>> user
>>>>>>>> needs
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> manually create sink topic in advance. And in that case, having
>>>>>>>> num of
>>>>>>>>>>>>>>>> partitions configuration doesn’t make much sense.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2) Looking at Produced class, based on API contract, seems
>>>>>>>> like
>>>>>>>>>>>>>>>> Produced is designed to be something that is explicitly for
>>>>>>>> producer
>>>>>>>>>>>>>> (key
>>>>>>>>>>>>>>>> serializer, value serializer, partitioner those all are 
>>>>>>>>>>>>>>>> producer
>>>>>>>>>>>>>> specific
>>>>>>>>>>>>>>>> configurations) and num of partitions is topic level
>>>>>>>> configuration. And
>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>> don’t think mixing topic and producer level configurations
>>>>>>>> together in
>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>> class is the good approach.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 3) Looking at KStream interface, seems like Produced
>>>>>>>> parameter is
>>>>>>>>>>>>>>>> for operations that work with non-internal (i.e. topics not created
>>>>>> and
>>>>>>>>>>>>>> managed
>>>>>>>>>>>>>>>> internally by Kafka Streams) topics and I think we should leave
>>>>>>>> it as
>>>>>>>>>>>>>> it is
>>>>>>>>>>>>>>>> in that case.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Taking all these things into account, I think we should
>>>>>>>> distinguish
>>>>>>>>>>>>>>>> between DSL operations, where Kafka Streams should create and
>>>>>>>> manage
>>>>>>>>>>>>>>>> internal topics (KStream#groupBy) vs topics that should be
>>>>>>>> created in
>>>>>>>>>>>>>>>> advance (e.g KStream#to).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> To sum it up, I think adding numPartitions configuration 
>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>> Produced
>>>>>>>>>>>>>>>> will result in mixing topic and producer level configuration in
>>>>>>>> one
>>>>>>>>>>>>>> class
>>>>>>>>>>>>>>>> and it’s gonna break existing API semantics.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Regarding making topic name optional in KStream#through - 
>>>>>>>>>>>>>>>>>>>>> I
>>>>>>>> think
>>>>>>>>>>>>>>>> underlying idea is very useful and giving users the possibility to
>>>>>>>> specify
>>>>>>>>>>>>>> num
>>>>>>>>>>>>>>>> of partitions there is even more useful :) Considering 
>>>>>>>>>>>>>>>> arguments
>>>>>>>> against
>>>>>>>>>>>>>>>> adding num of partitions in Produced class, I see two options
>>>>>>>> here:
>>>>>>>>>>>>>>>>>>>>> 1) Add following method overloads
>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and num of
>>>>>>>> partitions
>>>>>>>>>>>>>>>> will be taken from source topic
>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto
>>>>>>>>>>>>>>>> generated with specified num of partitions
>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V>
>>>>>>>>>>>>>>>> produced) - topic will be generated with specified num of
>>>>>>>>>>>>>> partitions
>>>>>>>>>>>>>>>> and configuration taken from produced parameter.
>>>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce new method
>>>>>> -
>>>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this in one of
>>>>>> the
>>>>>>>>>>>>>> threads)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Considering all mentioned above I propose the following
>>>>>> plan:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Option A:
>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped class 
>>>>>>>>>>>>>>>>>>>>> (as
>>>>>>>>>>>>>>>> mentioned in the KIP)
>>>>>>>>>>>>>>>>>>>>> 3) Add following method overloads to KStream#through
>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and num of
>>>>>>>> partitions
>>>>>>>>>>>>>>>> will be taken from source topic
>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto
>>>>>>>>>>>>>>>> generated with specified num of partitions
>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V>
>>>>>>>>>>>>>>>> produced) - topic will be generated with specified num of
>>>>>>>>>>>>>> partitions
>>>>>>>>>>>>>>>> and configuration taken from produced parameter.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Option B:
>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped class 
>>>>>>>>>>>>>>>>>>>>> (as
>>>>>>>>>>>>>>>> mentioned in the KIP)
>>>>>>>>>>>>>>>>>>>>> 3) Add new operator KStream#repartition for creating and
>>>>>>>> managing
>>>>>>>>>>>>>>>> internal repartition topics
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already discussed in the
>>>>>>>> mailing
>>>>>>>>>>>>>>>> list, but I kinda got lost in all the threads that were about this
>>>>>>>> KIP :(
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze <
>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I would like to resurrect discussion around KIP-221. 
>>>>>>>>>>>>>>>>>>>>>> Going
>>>>>>>> through
>>>>>>>>>>>>>>>> the discussion thread, there seems to be agreement around the
>>>>>>>> usefulness of
>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>> feature.
>>>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I understood, the
>>>>>>>> most
>>>>>>>>>>>>>>>> optimal solution for me seems the following:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to KStream#through method
>>>>>>>> (essentially
>>>>>>>>>>>>>>>> making topic name optional)
>>>>>>>>>>>>>>>>>>>>>> 2) Enhance Produced class with numOfPartitions
>>>>>> configuration
>>>>>>>>>>>>>> field.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to control
>>>>>>>> parallelism and
>>>>>>>>>>>>>>>> trigger re-partition without doing stateful operations.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I will update KIP with interface changes around
>>>>>>>> KStream#through if
>>>>>>>>>>>>>>>> these changes sound sensible.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>
> 
> 
