Is there any update for this KIP?

-Matthias

On 12/4/17 2:08 PM, Matthias J. Sax wrote:
> Jeyhun,
> 
> thanks for updating the KIP.
> 
> I am wondering if you intend to add a new class `Produced`? There is
> already `org.apache.kafka.streams.kstream.Produced`. So if we want to
> add a new class, it must have a different name -- or we might be able to
> merge both into one?
> 
> Also, for the KStream overlaods of `through()` and `to()`, can you add
> the different behavior using different overloads? It's not clear from
> the KIP what the semantics are.
> 
> 
> -Matthias
> 
> On 11/17/17 3:27 PM, Jeyhun Karimov wrote:
>> Hi,
>>
>> Thanks for your comments. I agree with Matthias partially.
>> I think we should relax some requirements related with to() and through()
>> methods.
>> IMHO, Produced class can cover (existing/to be created) topic information,
>> and which will ease our effort:
>>
>> KStream.to(Produced topicInfo)
>> KStream.through(Produced topicInfo)
>>
>> This will decrease the number of overloads but we will need to deprecate
>> the existing to() and through() methods, perhaps.
>> I updated the KIP accordingly.
>>
>>
>> Cheers,
>> Jeyhun
>>
>> On Thu, Nov 16, 2017 at 10:21 PM Matthias J. Sax <matth...@confluent.io>
>> wrote:
>>
>>> @Jan:
>>>
>>> The `Produced` class was introduced in 1.0 to specify key and valud
>>> Serdes (and partitioner) if data is written into a topic.
>>>
>>> Old API:
>>>
>>> KStream#to("topic", keySerde, valueSerde);
>>>
>>> New API:
>>>
>>> KStream#to("topic", Produced.with(keySerde, valueSerde));
>>>
>>>
>>> This allows to reduce the number of overloads for `to()` (and
>>> `through()` that follows the same pattern) -- the second parameter is
>>> used to cover all different variations of option parameters users can
>>> specify, while we only have 2 overload for `to()` itself.
>>>
>>> What is still unclear to me it, what you mean by this topic prefix
>>> thing? Either a user cares about the topic name and thus, must create
>>> and manage it manually. Or the user does not care, and Streams create
>>> it. How would this prefix idea fit in here?
>>>
>>>
>>>
>>> @Guozhang:
>>>
>>> My idea was to extend `Produced` with the hint we want to give for
>>> creating internal topic and pass a optional `Produced` parameter. There
>>> are multiple things we can do here:
>>>
>>> 1) stream.through(null, Produced...).groupBy().aggregate()
>>> -> just allow for `null` topic name indicating that Streams should
>>> create an internal topic
>>>
>>> 2) stream.through(Produced...).groupBy().aggregate()
>>> -> add one overload taking an mandatory `Produced`
>>>
>>> We use `Serialized` to picky back the information
>>>
>>> 3) stream.groupBy(Serialized...).aggregate()
>>> and stream.groupByKey(Serialized...).aggregate()
>>> -> we don't need new top level overloads
>>>
>>>
>>> There are different trade-offs for those alternatives and maybe there
>>> are other ways to change the API. It's just to push the discussion further.
>>>
>>>
>>> -Matthias
>>>
>>> On 11/12/17 1:22 PM, Jan Filipiak wrote:
>>>> Hi Gouzhang,
>>>>
>>>> this felt like these questions are supposed to be answered by me.
>>>> I do not understand the first one. I don't understand why the user
>>>> shouldn't be able to specify a suffix for the topic name.
>>>>
>>>>  For the third question I am not 100% familiar if the Produced class
>>>> came to existence
>>>> at all. I remember proposing it somewhere in our redo DSL discussion that
>>>> I dropped out of later. Finally any call that does:
>>>>
>>>> 1. create the internal topic
>>>> 2. register sink
>>>> 3. register source
>>>>
>>>> will always get the work done. If we have a Produced like class. putting
>>>> all the parameters
>>>> in there make sense. (Partitioner, serde, PartitionHint, internal, name
>>>> ... )
>>>>
>>>> Hope this helps?
>>>>
>>>>
>>>> On 10.11.2017 07:54, Guozhang Wang wrote:
>>>>> A few clarification questions on the proposal details.
>>>>>
>>>>> 1. API: although the repartition only happens at the final stateful
>>>>> operations like agg / join, the repartition flag info was actually
>>> passed
>>>>> from an earlier operator like map / groupBy. So what should be the new
>>>>> API
>>>>> look like? For example, if we do
>>>>>
>>>>> stream.groupBy().through("topic-name", Produced..).aggregate
>>>>>
>>>>> This would be add a bunch of APIs to GroupedKStream/KTable
>>>>>
>>>>> 2. Semantics: as Matthias mentioned, today any topics defined in
>>>>> "through()" call is considered a user topic, and hence users are
>>>>> responsible for managing them, including the topic name. For this KIP's
>>>>> purpose, though, users would not care about the topic name. I.e. as a
>>>>> user
>>>>> I still want to make it be an internal topic so that I do not need to
>>>>> worry
>>>>> about it at all, but only specify num.partitions.
>>>>>
>>>>> 3. Details: in Produced we do not have specs for specifying the
>>>>> num.partitions or should we repartition or not. So it is still not
>>>>> clear to
>>>>> me how we would make use of that to achieve what's in the old
>>>>> proposal's RepartitionHint class.
>>>>>
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Mon, Nov 6, 2017 at 1:21 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>
>>>>>> bq. enlarge the score of through()
>>>>>>
>>>>>> I guess you meant scope.
>>>>>>
>>>>>> On Mon, Nov 6, 2017 at 1:15 PM, Jeyhun Karimov <je.kari...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Sorry for the late reply. I am convinced that we should enlarge the
>>>>>>> score
>>>>>>> of through() (add more overloads) instead of introducing a separate
>>> set
>>>>>> of
>>>>>>> overloads to other methods.
>>>>>>> I will update the KIP soon based on the discussion and inform.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jeyhun
>>>>>>>
>>>>>>> On Mon, Nov 6, 2017 at 9:18 PM Jan Filipiak <jan.filip...@trivago.com
>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Sorry for not beeing 100% up to date.
>>>>>>>> Back then we had the discussion that when an operation puts a >Sink<
>>>>>>>> into the topology, a >Produced<
>>>>>>>> parameter is added. This produced parameter could have internal or
>>>>>>>> external. If internal I think the name would still make
>>>>>>>> a great suffix for the topic name
>>>>>>>>
>>>>>>>> Is this plan still around? Otherwise having the name as suffix is
>>>>>>>> probably always good it can help the user quicker to identify hot
>>>>>> topics
>>>>>>>> that need more
>>>>>>>> partitions if he has many of these internal repartitions
>>>>>>>>
>>>>>>>> Best Jan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06.11.2017 20:13, Matthias J. Sax wrote:
>>>>>>>>> I absolute agree with what you say. It's not a requirement to
>>>>>> specify a
>>>>>>>>> topic name -- and this was the idea -- if user does specify a name,
>>>>>> we
>>>>>>>>> treat as is -- if users does not specify a name, Streams create an
>>>>>>>>> internal topic.
>>>>>>>>>
>>>>>>>>> The goal of the Jira is to allow a simplified way to control
>>>>>>>>> repartitioning (atm, user needs to manually create a topic and use
>>>>>> via
>>>>>>>>> through()).
>>>>>>>>>
>>>>>>>>> Thus, the idea is to make the topic name parameter of through
>>>>>> optional.
>>>>>>>>> It's of course just an idea. Happy do have a other API design. The
>>>>>> goal
>>>>>>>>> was, to avoid to many new overloads.
>>>>>>>>>
>>>>>>>>>>> Could you clarify exactly what you mean by keeping the current
>>>>>>>> distinction?
>>>>>>>>> Current distinction is: user topics are created manually and user
>>>>>>>>> specifies the name -- internal topics are created by Kafka Streams
>>>>>> and
>>>>>>>>> an name is generated automatically.
>>>>>>>>>
>>>>>>>>> -> through("user-topic")
>>>>>>>>> -> through(TopicConfig.withNumberOfPartitions(5)) // Streams creates
>>>>>>> an
>>>>>>>>> internal topic
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/6/17 6:56 PM, Thomas Becker wrote:
>>>>>>>>>> Could you clarify exactly what you mean by keeping the current
>>>>>>>> distinction?
>>>>>>>>>> Actually, re-reading the KIP and JIRA, it's not clear that being
>>>>>> able
>>>>>>>> to specify a custom name is actually a requirement. If the goal is to
>>>>>>>> control repartitioning and tune parallelism, maybe we can just
>>>>>>>> sidestep
>>>>>>>> this issue altogether by removing the ability to set a different
>>> name.
>>>>>>>>>> On Mon, 2017-11-06 at 16:51 +0100, Matthias J. Sax wrote:
>>>>>>>>>>
>>>>>>>>>> That's a good point. In current design, we strictly distinguish
>>>>>> both.
>>>>>>>>>> For example, the reset tools deletes internal topics (starting with
>>>>>>>>>> prefix `<application.id>-` and ending with either `-repartition`
>>> or
>>>>>>>>>> `-changelog`.
>>>>>>>>>>
>>>>>>>>>> Thus, from my point of view, it would make sense to keep the
>>> current
>>>>>>>>>> distinction.
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>> On 11/6/17 4:45 PM, Thomas Becker wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think this sounds good as well. It's worth clarifying whether
>>>>>> topics
>>>>>>>> that are named by the user but created by streams are considered
>>>>>>> "internal"
>>>>>>>> topics also.
>>>>>>>>>> On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote:
>>>>>>>>>>
>>>>>>>>>> My idea was, to relax the requirement for through() that a topic
>>>>>> must
>>>>>>> be
>>>>>>>>>> created manually before startup.
>>>>>>>>>>
>>>>>>>>>> Thus, if no through() call is made, a (internal) topic is created
>>>>>> the
>>>>>>>>>> same way we do it currently.
>>>>>>>>>>
>>>>>>>>>> If one uses `through(String topicName)` we keep the current
>>> behavior
>>>>>>> and
>>>>>>>>>> require users to create the topic manually.
>>>>>>>>>>
>>>>>>>>>> The reasoning is as follows: if a user creates a topic manually, a
>>>>>>> user
>>>>>>>>>> can just use it for repartitioning. As the topic is already there,
>>>>>>> there
>>>>>>>>>> is no need to specify any topic configs.
>>>>>>>>>>
>>>>>>>>>> We add a new `through()` overload (details TBD) that allows to
>>>>>> specify
>>>>>>>>>> topic configs and Streams create the topic with those configs.
>>>>>>>>>>
>>>>>>>>>> Reasoning: user don't want to manage topic manually, thus, it's
>>>>>> still
>>>>>>> an
>>>>>>>>>> internal topic and Streams create the topic name automatically as
>>>>>> for
>>>>>>>>>> all other internal topics. However, users gets some more control
>>>>>> about
>>>>>>>>>> topic parameters like number of partitions (we should discuss what
>>>>>>> other
>>>>>>>>>> configs would be useful).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Does this make sense?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/5/17 1:21 AM, Jan Filipiak wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Im not 100 % up to date what version 1.0 DSL looks like ATM.
>>>>>>>>>> I just would argue that repartitioning should be an own API call
>>>>>> like
>>>>>>>>>> through or something.
>>>>>>>>>> One can use through or to already to get this. I would argue one
>>>>>>> should
>>>>>>>>>> look there instead of overloads
>>>>>>>>>>
>>>>>>>>>> Best Jan
>>>>>>>>>>
>>>>>>>>>> On 04.11.2017 16:01, Jeyhun Karimov wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Dear community,
>>>>>>>>>>
>>>>>>>>>> I would like to initiate discussion on KIP-221 [1] based on issue
>>>>>> [2].
>>>>>>>>>> Please feel free to comment.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>> 221%3A+Repartition+Topic+Hints+in+Streams
>>>>>>>>>> [2] https://issues.apache.org/jira/browse/KAFKA-6037
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Jeyhun
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________
>>>>>>>>>>
>>>>>>>>>> This email and any attachments may contain confidential and
>>>>>> privileged
>>>>>>>> material for the sole use of the intended recipient. Any review,
>>>>>> copying,
>>>>>>>> or distribution of this email (or any attachments) by others is
>>>>>>> prohibited.
>>>>>>>> If you are not the intended recipient, please contact the sender
>>>>>>>> immediately and permanently delete this email and any attachments. No
>>>>>>>> employee or agent of TiVo Inc. is authorized to conclude any binding
>>>>>>>> agreement on behalf of TiVo Inc. by email. Binding agreements with
>>>>>>>> TiVo
>>>>>>>> Inc. may only be made by a signed written agreement.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________
>>>>>>>>>>
>>>>>>>>>> This email and any attachments may contain confidential and
>>>>>> privileged
>>>>>>>> material for the sole use of the intended recipient. Any review,
>>>>>> copying,
>>>>>>>> or distribution of this email (or any attachments) by others is
>>>>>>> prohibited.
>>>>>>>> If you are not the intended recipient, please contact the sender
>>>>>>>> immediately and permanently delete this email and any attachments. No
>>>>>>>> employee or agent of TiVo Inc. is authorized to conclude any binding
>>>>>>>> agreement on behalf of TiVo Inc. by email. Binding agreements with
>>>>>>>> TiVo
>>>>>>>> Inc. may only be made by a signed written agreement.
>>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to