Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-28 Thread Richard Yu
Hi all, Just some updates. Below is the vote thread: https://sematext.com/opensee/m/Kafka/uyzND1h1NPW1tLVQR?subj=+VOTE+KIP+557+Add+emit+on+change+support+for+Kafka+Streams It would be great if we can include this change to Kafka. :) Cheers, Richard On Thu, Feb 27, 2020 at 6:45 PM Richard Yu

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-27 Thread Richard Yu
Hi all, @John Will add some notes accordingly. To all: Thanks for all your input! It looks like we can wrap up this discussion thread then. I've started a vote thread, so please feel free to cast your vote there! We should be pretty close. :) Cheers, Richard On Thu, Feb 27, 2020 at 2:34 PM

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-27 Thread John Roesler
Hi Richard, Thanks for the update! I read it over, and overall it looks good! I have only a minor concern about the rate metric definition: > The rate option indicates the ratio of records dropped to actual volume of > records passing through the task That's not the definition of a "rate". It

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-27 Thread Richard Yu
Hi all, I might've made a minor mistake. The processor node level is level 3, not level 1. I will correct the KIP accordingly. After looking over things, I decided to start the voting thread this afternoon. Cheers, Richard On Thu, Feb 27, 2020 at 12:29 PM Richard Yu wrote: > Hi Bruno, Hi

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-27 Thread Richard Yu
Hi Bruno, Hi John, Thanks for your comments! I updated the KIP accordingly, and it looks like for quite a few points. I was doing some beating around the bush which could've been avoided. Looks like we can reduce the metric to Level 1 (per processor node) then. I've cleaned up most of the

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-27 Thread Bruno Cadonna
Hi John, I agree with you. It is better to measure the metric on processor node level. The users can do the rollup to task-level by themselves. Best, Bruno On Thu, Feb 27, 2020 at 12:09 AM John Roesler wrote: > > Hi Richard, > > I've been making a final pass over the KIP. > > Re: Proposed

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-26 Thread John Roesler
Hi Richard, I've been making a final pass over the KIP. Re: Proposed Behavior Change: I think this point is controversial and probably doesn't need to be there at all: > 2.b. In certain situations where there is a high volume of idempotent > updates throughout the Streams DAG, it will be

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-26 Thread Bruno Cadonna
Hi Richard, 1. Could you change "idempotent update operations will only be dropped from KTables, not from other classes." -> idempotent update operations will only be dropped from materialized KTables? For non-materialized KTables -- as they can occur after optimization of the topology -- we

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-25 Thread Richard Yu
Hi John, Sounds goods. It looks like we are close to wrapping things up. If there isn't any other revisions which needs to be made. (If so, please comment in the thread) I will start the voting process this Thursday (Pacific Standard Time). Cheers, Richard On Tue, Feb 25, 2020 at 11:59 AM John

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-25 Thread John Roesler
Hi Richard, Sorry for the slow reply. I actually think we should avoid checking equals() for now. Your reasoning is good, but the truth is that depending on the implementation of equals() is non-trivial, semantically, and (though I proposed it before), I'm not convinced it's worth the risk. Much

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-22 Thread Richard Yu
nd >>> is: >>> > > > > > > > >>> > > > > > > >>>>>>>>> This way, we still don't have to rely on the >>> existence of an >>> > > > > > > >>>>>>>>> equals

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-21 Thread Richard Yu
> > > >>> those >> > > > > > > > either. Emit-on-change is operator semantics and I don't >> see why >> > > > > > > >>> we >> > > > > > > > would need to have a metric for it? It seems

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-21 Thread Richard Yu
gt; > > > considering > > > > > > > >>>>>>>>>> these use cases together. > > > > > > > >>>>>>>>>> > > > > > > > >>>>>>>>>> To answer y

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-21 Thread John Roesler
>>>>>>>>>> one from the Payroll table (salary), and these attributes > > > > > > >>> change > > > > > > >>>>>>>>>> rarely. On the other hand, there might be many other > > >

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-20 Thread Bruno Cadonna
>> schemas as we are (e.g. Avro), it seems like this approach > > > > could > > > > > >>>>>>>>> lead to a proliferation of "skinny" record types that just > > > > > >>>>>>>>

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-19 Thread John Roesler
; > > >>>>>>>>>> filtering out extra _data_, you have some extra _metadata_ > > > that > > > > >>>>>>>>>> you still wish to pass down with the data when there is a > > > > >>> "real"

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-19 Thread Bruno Cadonna
out the API complexity. Maybe > > > >>> you > > > >>>>>>>>>> can help think though how to provide the same benefit while > > > >>>>>>>>>> limiting user-facing complexity. > > > >>>&g

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-18 Thread Richard Yu
teless operations, we'd probably need to rely > > >>> on > > >>>>>>>>>> equals() for no-op checking, otherwise we'd wind up requiring > > >>>>>>>>>> serdes for stateless operations as well. Actually,

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-18 Thread John Roesler
gt;>>>>>>> > >>>>>>>>>> This way, we still don't have to rely on the existence of an > >>>>>>>>>> equals() method, but if it is there, we can benefit from it. > >>>>>>>>>> Also, we do

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-17 Thread Matthias J. Sax
n't totally follow why the individual components >>> (name, >>>>>>>>> salary) would have to have serdes here. If Result has one, we >>>>>>>>> compare bytes, and if Result additionally has an equals() method >>>>>>>>

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-17 Thread Richard Yu
gt; ChangeDetector actually be registered? None of the operators in > > >> > > >> my example have names or topics or any other identifiable > > >> > > >> characteristic that could be passed to a ChangeDetector class > > >> > > >

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-17 Thread John Roesler
lement a ChangeDetector for every > >> > > >> single operation in the topology, or you don't get the benefit of > >> > > >> dropping non-changes internally 2b. Alternatively, you could just > >> > > >> add the ChangeDetector to one operation toward the en

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-11 Thread Richard Yu
duplication was going to >> > > > be opt-in, and it seemed very natural to say something like: >> > > > >> > > > employeeInfo.join(employeePayroll, (info, payroll) -> new >> > > > Result(info.name(), payroll.salary())) >> > >

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-06 Thread Richard Yu
s a metadata question, can we just > > > >> plan to finish up the support for headers in Streams? I.e., give > > > >> you a way to control the way that headers flow through the > > > >> topology? Then, we could treat headers the same way we treat &

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-05 Thread John Roesler
fit for this particular case > > > because it's mostly metadata, though to be honest we haven't looked > > > at headers much (mostly because, and to your point, support seems > > > to be lacking). I feel like there would be other cases where this > >

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-05 Thread Ted Yu
t; > >> Hi John, Can you describe how you'd use filtering/mapping to > >> deduplicate records? To give some background on my suggestion we > >> currently have a small stream processor that exists solely to > >> deduplicate, which we do using a process that I assu

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-04 Thread Matthias J. Sax
: This email was sent from outside TiVo. DO NOT >> CLICK any links or attachments unless you expected them. >> >> >> Hi Thomas and yuzhihong, That’s an interesting idea. Can you help >> think of a use case that isn’t also served by filterin

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-04 Thread Thomas Becker
2020 4:51 PM To: dev@kafka.apache.org<mailto:dev@kafka.apache.org> mailto:dev@kafka.apache.org>> Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams [EXTERNAL EMAIL] Attention: This email was sent from outside TiVo. DO NOT CLICK any links or attachments unless

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-03 Thread John Roesler
about out-of-order handling and > > > if/how it affects emit-on-change behavior. Also note, that KIP-280 is > > > allowing a timestamp-based compaction that might allow us to fix a > > > potential issue (in case there is one). > > > > > > > > > -M

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-03 Thread John Roesler
the skipped > (duplicate) values. > > This way, it is easier to observe the effect when this feature is in > production. > > Cheers > > > > > > -- Forwarded message - > > From: Richard Yu > > Date: Sun, Feb 2, 2020 at 10:21 AM

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-03 Thread John Roesler
ble? Meaning use something like a hash by default, but > > you could optionally provide an implementation of something to use instead, > > like a ChangeDetector. This could be useful for example to ignore changes > > to certain fields, which may not be relevant to the operation be

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-03 Thread Thomas Becker
ache.org> mailto:dev@kafka.apache.org>> Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams [EXTERNAL EMAIL] Attention: This email was sent from outside TiVo. DO NOT CLICK any links or attachments unless you expected them. Hel

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-02 Thread Ted Yu
is feature is in production. Cheers > > -- Forwarded message - > From: Richard Yu > Date: Sun, Feb 2, 2020 at 10:21 AM > Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams > To: > > > Hi Bruno, > > Thanks for the repl

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-02 Thread Richard Yu
n 1/31/20 5:30 PM, John Roesler wrote: > > > > Hi Thomas and yuzhihong, > > > > > > > > That’s an interesting idea. Can you help think of a use case that > isn’t > > > also served by filtering or mapping before

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-02 Thread Bruno Cadonna
@gmail.com wrote: > > >> I think this is good idea. > > >> > > >>> On Jan 31, 2020, at 4:49 PM, Thomas Becker > > wrote: > > >>> > > >>> How do folks feel about allowing the mechanism by which no-ops are > > detected

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-01 Thread Richard Yu
ou could optionally provide an implementation of something to use instead, > like a ChangeDetector. This could be useful for example to ignore changes > to certain fields, which may not be relevant to the operation being > performed. > >>> _

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-31 Thread Matthias J. Sax
e useful for example to ignore changes >>> to certain fields, which may not be relevant to the operation being >>> performed. >>> ________________ >>> From: John Roesler >>> Sent: Friday, January 31, 2020 4:51 PM >>> To: dev@kafk

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-31 Thread John Roesler
iday, January 31, 2020 4:51 PM > > To: dev@kafka.apache.org > > Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams > > > > [EXTERNAL EMAIL] Attention: This email was sent from outside TiVo. DO NOT > > CLICK any links or attachments unless you expe

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-31 Thread yuzhihong
PM > To: dev@kafka.apache.org > Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams > > [EXTERNAL EMAIL] Attention: This email was sent from outside TiVo. DO NOT > CLICK any links or attachments unless you expected them. > >

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-31 Thread Thomas Becker
to certain fields, which may not be relevant to the operation being performed. From: John Roesler Sent: Friday, January 31, 2020 4:51 PM To: dev@kafka.apache.org Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams [EXTERNAL EMAIL] Attention

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-31 Thread John Roesler
;> > >>> > >> > >>> Hi Richard, > >> > >>> > >> > >>> Thanks for picking this up! I know of at least one large community > >> member > >> > >>> for which this feature is absolutely essential. &

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-27 Thread Richard Yu
; > >>> >> > >>> Given that any implementation of this feature would have some >> performance >> > >>> impact under some workloads, and also that we don't know if anyone >> really >> > >>> depends on emit-on-update time seman

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-25 Thread Richard Yu
t; >>> with the new default being "change" > > >>> > > >>> Thanks for pointing out the timestamp issue in particular. I agree > that if > > >>> we discard the latter update as a no-op, then we also have to > discard its > > >>

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-24 Thread Bruno Cadonna
ore must remain consistent with what has been emitted). > >>> > >>> I have to confess that I disagree with your implementation proposal, but > >>> it's also not necessary to discuss implementation in the KIP. Maybe it > >>> would > >>> be less contro

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-24 Thread Matthias J. Sax
>> discussion can focus on the behavior change and config. >>> >>> Just for reference, there is some research into this domain. For example, >>> see the "Report" section (3.2.3) of the SECRET paper: >>> http://people.csail.mit.edu/t

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-24 Thread John Roesler
take a brief survey of the > > behaviors of other systems, along with pros and cons if any are reported. > > > > Thanks, > > -John > > > > > > On Fri, Jan 10, 2020, at 22:27, Richard Yu wrote: > > > Hi everybody! > > > > &g

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-24 Thread Bruno Cadonna
d be reduced traffic in Kafka Streams since > > a lot of no-op results would no longer be sent downstream. > > Here is the KIP for reference. > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-557%3A+Add+emit+on+change+support+for+Kafka+Streams > > > >

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-24 Thread John Roesler
ger be sent downstream. > Here is the KIP for reference. > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-557%3A+Add+emit+on+change+support+for+Kafka+Streams > > Currently, I seek to formalize our approach for this KIP first before we > determine concrete API additions /

[KAFKA-557] Add emit on change support for Kafka Streams

2020-01-10 Thread Richard Yu
/confluence/display/KAFKA/KIP-557%3A+Add+emit+on+change+support+for+Kafka+Streams Currently, I seek to formalize our approach for this KIP first before we determine concrete API additions / configurations. Some configs might warrant adding, whiles others are not necessary since adding them would only