Hi all,
Just some updates. Below is the vote thread:
https://sematext.com/opensee/m/Kafka/uyzND1h1NPW1tLVQR?subj=+VOTE+KIP+557+Add+emit+on+change+support+for+Kafka+Streams
It would be great if we can include this change to Kafka. :)
Cheers,
Richard
On Thu, Feb 27, 2020 at 6:45 PM Richard Yu
Hi all,
@John Will add some notes accordingly.
To all: Thanks for all your input!
It looks like we can wrap up this discussion thread then.
I've started a vote thread, so please feel free to cast your vote there!
We should be pretty close. :)
Cheers,
Richard
On Thu, Feb 27, 2020 at 2:34 PM
Hi Richard,
Thanks for the update!
I read it over, and overall it looks good!
I have only a minor concern about the rate metric definition:
> The rate option indicates the ratio of records dropped to actual volume of
> records passing through the task
That's not the definition of a "rate". It
Hi all,
I might've made a minor mistake. The processor node level is level 3, not
level 1.
I will correct the KIP accordingly.
After looking over things, I decided to start the voting thread this
afternoon.
Cheers,
Richard
On Thu, Feb 27, 2020 at 12:29 PM Richard Yu
wrote:
> Hi Bruno, Hi
Hi Bruno, Hi John,
Thanks for your comments! I updated the KIP accordingly, and it looks like
for quite a few points. I was doing some beating around the bush which
could've been avoided.
Looks like we can reduce the metric to Level 1 (per processor node) then.
I've cleaned up most of the
Hi John,
I agree with you. It is better to measure the metric on processor node
level. The users can do the rollup to task-level by themselves.
Best,
Bruno
On Thu, Feb 27, 2020 at 12:09 AM John Roesler wrote:
>
> Hi Richard,
>
> I've been making a final pass over the KIP.
>
> Re: Proposed
Hi Richard,
I've been making a final pass over the KIP.
Re: Proposed Behavior Change:
I think this point is controversial and probably doesn't need to be there at
all:
> 2.b. In certain situations where there is a high volume of idempotent
> updates throughout the Streams DAG, it will be
Hi Richard,
1. Could you change "idempotent update operations will only be dropped
from KTables, not from other classes." -> idempotent update operations
will only be dropped from materialized KTables? For non-materialized
KTables -- as they can occur after optimization of the topology -- we
Hi John,
Sounds goods. It looks like we are close to wrapping things up. If there
isn't any other revisions which needs to be made. (If so, please comment in
the thread)
I will start the voting process this Thursday (Pacific Standard Time).
Cheers,
Richard
On Tue, Feb 25, 2020 at 11:59 AM John
Hi Richard,
Sorry for the slow reply. I actually think we should avoid checking
equals() for now. Your reasoning is good, but the truth is that
depending on the implementation of equals() is non-trivial,
semantically, and (though I proposed it before), I'm not convinced
it's worth the risk. Much
nd
>>> is:
>>> > > > > > > >
>>> > > > > > > >>>>>>>>> This way, we still don't have to rely on the
>>> existence of an
>>> > > > > > > >>>>>>>>> equals
> > > >>> those
>> > > > > > > > either. Emit-on-change is operator semantics and I don't
>> see why
>> > > > > > > >>> we
>> > > > > > > > would need to have a metric for it? It seems
gt; > > > considering
> > > > > > > >>>>>>>>>> these use cases together.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> To answer y
>>>>>>>>>> one from the Payroll table (salary), and these attributes
> > > > > > >>> change
> > > > > > >>>>>>>>>> rarely. On the other hand, there might be many other
> > >
>> schemas as we are (e.g. Avro), it seems like this approach
> > > > could
> > > > > >>>>>>>>> lead to a proliferation of "skinny" record types that just
> > > > > >>>>>>>>
; > > >>>>>>>>>> filtering out extra _data_, you have some extra _metadata_
> > > that
> > > > >>>>>>>>>> you still wish to pass down with the data when there is a
> > > > >>> "real"
out the API complexity. Maybe
> > > >>> you
> > > >>>>>>>>>> can help think though how to provide the same benefit while
> > > >>>>>>>>>> limiting user-facing complexity.
> > > >>>&g
teless operations, we'd probably need to rely
> > >>> on
> > >>>>>>>>>> equals() for no-op checking, otherwise we'd wind up requiring
> > >>>>>>>>>> serdes for stateless operations as well. Actually,
gt;>>>>>>>
> >>>>>>>>>> This way, we still don't have to rely on the existence of an
> >>>>>>>>>> equals() method, but if it is there, we can benefit from it.
> >>>>>>>>>> Also, we do
n't totally follow why the individual components
>>> (name,
>>>>>>>>> salary) would have to have serdes here. If Result has one, we
>>>>>>>>> compare bytes, and if Result additionally has an equals() method
>>>>>>>>
gt; ChangeDetector actually be registered? None of the operators in
> > >> > > >> my example have names or topics or any other identifiable
> > >> > > >> characteristic that could be passed to a ChangeDetector class
> > >> > > >
lement a ChangeDetector for every
> >> > > >> single operation in the topology, or you don't get the benefit of
> >> > > >> dropping non-changes internally 2b. Alternatively, you could just
> >> > > >> add the ChangeDetector to one operation toward the en
duplication was going to
>> > > > be opt-in, and it seemed very natural to say something like:
>> > > >
>> > > > employeeInfo.join(employeePayroll, (info, payroll) -> new
>> > > > Result(info.name(), payroll.salary()))
>> > >
s a metadata question, can we just
> > > >> plan to finish up the support for headers in Streams? I.e., give
> > > >> you a way to control the way that headers flow through the
> > > >> topology? Then, we could treat headers the same way we treat
&
fit for this particular case
> > > because it's mostly metadata, though to be honest we haven't looked
> > > at headers much (mostly because, and to your point, support seems
> > > to be lacking). I feel like there would be other cases where this
> >
t;
> >> Hi John, Can you describe how you'd use filtering/mapping to
> >> deduplicate records? To give some background on my suggestion we
> >> currently have a small stream processor that exists solely to
> >> deduplicate, which we do using a process that I assu
: This email was sent from outside TiVo. DO NOT
>> CLICK any links or attachments unless you expected them.
>>
>>
>> Hi Thomas and yuzhihong, That’s an interesting idea. Can you help
>> think of a use case that isn’t also served by filterin
2020 4:51 PM To: dev@kafka.apache.org<mailto:dev@kafka.apache.org>
mailto:dev@kafka.apache.org>> Subject: Re: [KAFKA-557]
Add emit on change support for Kafka Streams
[EXTERNAL EMAIL] Attention: This email was sent from outside TiVo. DO NOT CLICK
any links or attachments unless
about out-of-order handling and
> > > if/how it affects emit-on-change behavior. Also note, that KIP-280 is
> > > allowing a timestamp-based compaction that might allow us to fix a
> > > potential issue (in case there is one).
> > >
> > >
> > > -M
the skipped
> (duplicate) values.
>
> This way, it is easier to observe the effect when this feature is in
> production.
>
> Cheers
>
>
> >
> > -- Forwarded message -
> > From: Richard Yu
> > Date: Sun, Feb 2, 2020 at 10:21 AM
ble? Meaning use something like a hash by default, but
> > you could optionally provide an implementation of something to use instead,
> > like a ChangeDetector. This could be useful for example to ignore changes
> > to certain fields, which may not be relevant to the operation be
ache.org>
mailto:dev@kafka.apache.org>>
Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams
[EXTERNAL EMAIL] Attention: This email was sent from outside TiVo. DO NOT CLICK
any links or attachments unless you expected them.
Hel
is feature is in production.
Cheers
>
> -- Forwarded message -
> From: Richard Yu
> Date: Sun, Feb 2, 2020 at 10:21 AM
> Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams
> To:
>
>
> Hi Bruno,
>
> Thanks for the repl
n 1/31/20 5:30 PM, John Roesler wrote:
> > > > Hi Thomas and yuzhihong,
> > > >
> > > > That’s an interesting idea. Can you help think of a use case that
> isn’t
> > > also served by filtering or mapping before
@gmail.com wrote:
> > >> I think this is good idea.
> > >>
> > >>> On Jan 31, 2020, at 4:49 PM, Thomas Becker
> > wrote:
> > >>>
> > >>> How do folks feel about allowing the mechanism by which no-ops are
> > detected
ou could optionally provide an implementation of something to use instead,
> like a ChangeDetector. This could be useful for example to ignore changes
> to certain fields, which may not be relevant to the operation being
> performed.
> >>> _
e useful for example to ignore changes
>>> to certain fields, which may not be relevant to the operation being
>>> performed.
>>> ________________
>>> From: John Roesler
>>> Sent: Friday, January 31, 2020 4:51 PM
>>> To: dev@kafk
iday, January 31, 2020 4:51 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams
> >
> > [EXTERNAL EMAIL] Attention: This email was sent from outside TiVo. DO NOT
> > CLICK any links or attachments unless you expe
PM
> To: dev@kafka.apache.org
> Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams
>
> [EXTERNAL EMAIL] Attention: This email was sent from outside TiVo. DO NOT
> CLICK any links or attachments unless you expected them.
>
>
to certain
fields, which may not be relevant to the operation being performed.
From: John Roesler
Sent: Friday, January 31, 2020 4:51 PM
To: dev@kafka.apache.org
Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams
[EXTERNAL EMAIL] Attention
;> > >>>
> >> > >>> Hi Richard,
> >> > >>>
> >> > >>> Thanks for picking this up! I know of at least one large community
> >> member
> >> > >>> for which this feature is absolutely essential.
&
; > >>>
>> > >>> Given that any implementation of this feature would have some
>> performance
>> > >>> impact under some workloads, and also that we don't know if anyone
>> really
>> > >>> depends on emit-on-update time seman
t; >>> with the new default being "change"
> > >>>
> > >>> Thanks for pointing out the timestamp issue in particular. I agree
> that if
> > >>> we discard the latter update as a no-op, then we also have to
> discard its
> > >>
ore must remain consistent with what has been emitted).
> >>>
> >>> I have to confess that I disagree with your implementation proposal, but
> >>> it's also not necessary to discuss implementation in the KIP. Maybe it
> >>> would
> >>> be less contro
>> discussion can focus on the behavior change and config.
>>>
>>> Just for reference, there is some research into this domain. For example,
>>> see the "Report" section (3.2.3) of the SECRET paper:
>>> http://people.csail.mit.edu/t
take a brief survey of the
> > behaviors of other systems, along with pros and cons if any are reported.
> >
> > Thanks,
> > -John
> >
> >
> > On Fri, Jan 10, 2020, at 22:27, Richard Yu wrote:
> > > Hi everybody!
> > >
> &g
d be reduced traffic in Kafka Streams since
> > a lot of no-op results would no longer be sent downstream.
> > Here is the KIP for reference.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-557%3A+Add+emit+on+change+support+for+Kafka+Streams
> >
> >
ger be sent downstream.
> Here is the KIP for reference.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-557%3A+Add+emit+on+change+support+for+Kafka+Streams
>
> Currently, I seek to formalize our approach for this KIP first before we
> determine concrete API additions /
/confluence/display/KAFKA/KIP-557%3A+Add+emit+on+change+support+for+Kafka+Streams
Currently, I seek to formalize our approach for this KIP first before we
determine concrete API additions / configurations.
Some configs might warrant adding, whiles others are not necessary since
adding them would only
49 matches
Mail list logo