Actually, giving it more though - maybe enhancing Produced with num of 
partitions configuration is not the best approach. Let me explain why:

1) If we enhance Produced class with this configuration, this will also affect 
KStream#to operation. Since KStream#to is the final sink of the topology, for 
me, it seems to be reasonable assumption that user needs to manually create 
sink topic in advance. And in that case, having num of partitions configuration 
doesn’t make much sense. 

2) Looking at Produced class, based on API contract, seems like Produced is 
designed to be something that is explicitly for producer (key serializer, value 
serializer, partitioner those all are producer specific configurations) and num 
of partitions is topic level configuration. And I don’t think mixing topic and 
producer level configurations together in one class is the good approach.

3) Looking at KStream interface, seems like Produced parameter is for 
operations that work with non-internal (e.g topics created and managed 
internally by Kafka Streams) topics and I think we should leave it as it is in 
that case.

Taking all this things into account, I think we should distinguish between DSL 
operations, where Kafka Streams should create and manage internal topics 
(KStream#groupBy) vs topics that should be created in advance (e.g KStream#to).

To sum it up, I think adding numPartitions configuration in Produced will 
result in mixing topic and producer level configuration in one class and it’s 
gonna break existing API semantics.

Regarding making topic name optional in KStream#through - I think underline 
idea is very useful and giving users possibility to specify num of partitions 
there is even more useful :) Considering arguments against adding num of 
partitions in Produced class, I see two options here:
1) Add following method overloads
        * through() - topic will be auto-generated and num of partitions will 
be taken from source topic
        * through(final int numOfPartitions) - topic will be auto generated 
with specified num of partitions
        * through(final int numOfPartitions, final Produced<K, V> produced) - 
topic will be with generated with specified num of partitions and configuration 
taken from produced parameter.
2) Leave KStream#through as it is and introduce new method - 
KStream#repartition (I think Matthias suggested this in one of the threads)

Considering all mentioned above I propose the following plan:

Option A:
1) Leave Produced as it is
2) Add num of partitions configuration to Grouped class (as mentioned in the 
KIP)
3) Add following method overloads to KStream#through
        * through() - topic will be auto-generated and num of partitions will 
be taken from source topic
        * through(final int numOfPartitions) - topic will be auto generated 
with specified num of partitions
        * through(final int numOfPartitions, final Produced<K, V> produced) - 
topic will be with generated with specified num of partitions and configuration 
taken from produced parameter.

Option B:
1) Leave Produced as it is
2) Add num of partitions configuration to Grouped class (as mentioned in the 
KIP)
3) Add new operator KStream#repartition for creating and managing internal 
repartition topics

P.S. I’m sorry if all of this was already discussed in the mailing list, but I 
kinda got with all the threads that were about this KIP :(

Kind regards,
Levani

> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze <levani.co...@gmail.com> wrote:
> 
> Hello,
> 
> I would like to resurrect discussion around KIP-221. Going through the 
> discussion thread, there’s seems to agreement around usefulness of this 
> feature. 
> Regarding the implementation, as far as I understood, the most optimal 
> solution for me seems the following:
> 
> 1) Add two method overloads to KStream#through method (essentially making 
> topic name optional)
> 2) Enhance Produced class with numOfPartitions configuration field.
> 
> Those two changes will allow DSL users to control parallelism and trigger 
> re-partition without doing stateful operations.
> 
> I will update KIP with interface changes around KStream#through if this 
> changes sound sensible.
> 
> Kind regards,
> Levani

Reply via email to