Re: [ANNOUNCE] New Committer: Bruno Cadonna

2021-04-07 Thread Konstantine Karantasis
Congratulations Bruno!

Konstantine

On Wed, Apr 7, 2021 at 8:08 PM Sophie Blee-Goldman  wrote:

> Congrats!
>
> On Wed, Apr 7, 2021 at 6:32 PM Luke Chen  wrote:
>
> > Congrats Bruno!!
> >
> > Luke
> >
> > On Thu, Apr 8, 2021 at 9:18 AM Matthias J. Sax  wrote:
> >
> > > Congrats Bruno! Very well deserved!
> > >
> > >
> > > -Matthias
> > >
> > > On 4/7/21 3:51 PM, Bill Bejeck wrote:
> > > > Congrats Bruno! Well deserved.
> > > >
> > > > Bill
> > > >
> > > > On Wed, Apr 7, 2021 at 6:34 PM Guozhang Wang  wrote:
> > > >
> > > >> Hello all,
> > > >>
> > > >> I'm happy to announce that Bruno Cadonna has accepted his invitation
> > to
> > > >> become an Apache Kafka committer.
> > > >>
> > > >> Bruno has been contributing to Kafka since Jan. 2019 and has made 99
> > > >> commits and more than 80 PR reviews so far:
> > > >>
> > > >> https://github.com/apache/kafka/commits?author=cadonna
> > > >>
> > > >> He worked on a few key KIPs on Kafka Streams:
> > > >>
> > > >> * KIP-471: Expose RocksDB Metrics in Kafka Streams
> > > >> * KIP-607: Add Metrics to Kafka Streams to Report Properties of
> > RocksDB
> > > >> * KIP-662: Throw Exception when Source Topics of a Streams App are
> > > Deleted
> > > >>
> > > >> Besides all the code contributions and reviews, he has also done a
> > > >> great deal for the community: multiple Kafka meetup talks in Berlin,
> > > >> Kafka Summit talks, an introductory class on Kafka in seminars at
> > > >> Humboldt-Universität zu Berlin, and a co-authored paper on Kafka's
> > > >> stream processing semantics at this year's SIGMOD conference
> > > >> (https://en.wikipedia.org/wiki/SIGMOD). Bruno has also been quite
> > > >> active on Stack Overflow and the Apache Kafka mailing lists.
> > > >>
> > > >> Please join me to congratulate Bruno for all the contributions!
> > > >>
> > > >> -- Guozhang
> > > >>
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Committer: Bruno Cadonna

2021-04-07 Thread Sophie Blee-Goldman
Congrats!

On Wed, Apr 7, 2021 at 6:32 PM Luke Chen  wrote:

> Congrats Bruno!!
>
> Luke
>
> On Thu, Apr 8, 2021 at 9:18 AM Matthias J. Sax  wrote:
>
> > Congrats Bruno! Very well deserved!
> >
> >
> > -Matthias
> >
> > On 4/7/21 3:51 PM, Bill Bejeck wrote:
> > > Congrats Bruno! Well deserved.
> > >
> > > Bill
> > >
> > > On Wed, Apr 7, 2021 at 6:34 PM Guozhang Wang  wrote:
> > >
> > >> Hello all,
> > >>
> > >> I'm happy to announce that Bruno Cadonna has accepted his invitation
> to
> > >> become an Apache Kafka committer.
> > >>
> > >> Bruno has been contributing to Kafka since Jan. 2019 and has made 99
> > >> commits and more than 80 PR reviews so far:
> > >>
> > >> https://github.com/apache/kafka/commits?author=cadonna
> > >>
> > >> He worked on a few key KIPs on Kafka Streams:
> > >>
> > >> * KIP-471: Expose RocksDB Metrics in Kafka Streams
> > >> * KIP-607: Add Metrics to Kafka Streams to Report Properties of
> RocksDB
> > >> * KIP-662: Throw Exception when Source Topics of a Streams App are
> > Deleted
> > >>
> > >> Besides all the code contributions and reviews, he has also done a
> > >> great deal for the community: multiple Kafka meetup talks in Berlin,
> > >> Kafka Summit talks, an introductory class on Kafka in seminars at
> > >> Humboldt-Universität zu Berlin, and a co-authored paper on Kafka's
> > >> stream processing semantics at this year's SIGMOD conference
> > >> (https://en.wikipedia.org/wiki/SIGMOD). Bruno has also been quite
> > >> active on Stack Overflow and the Apache Kafka mailing lists.
> > >>
> > >> Please join me to congratulate Bruno for all the contributions!
> > >>
> > >> -- Guozhang
> > >>
> > >
> >
>


Re: [ANNOUNCE] New Committer: Bruno Cadonna

2021-04-07 Thread Luke Chen
Congrats Bruno!!

Luke

On Thu, Apr 8, 2021 at 9:18 AM Matthias J. Sax  wrote:

> Congrats Bruno! Very well deserved!
>
>
> -Matthias
>
> On 4/7/21 3:51 PM, Bill Bejeck wrote:
> > Congrats Bruno! Well deserved.
> >
> > Bill
> >
> > On Wed, Apr 7, 2021 at 6:34 PM Guozhang Wang  wrote:
> >
> >> Hello all,
> >>
> >> I'm happy to announce that Bruno Cadonna has accepted his invitation to
> >> become an Apache Kafka committer.
> >>
> >> Bruno has been contributing to Kafka since Jan. 2019 and has made 99
> >> commits and more than 80 PR reviews so far:
> >>
> >> https://github.com/apache/kafka/commits?author=cadonna
> >>
> >> He worked on a few key KIPs on Kafka Streams:
> >>
> >> * KIP-471: Expose RocksDB Metrics in Kafka Streams
> >> * KIP-607: Add Metrics to Kafka Streams to Report Properties of RocksDB
> >> * KIP-662: Throw Exception when Source Topics of a Streams App are
> Deleted
> >>
> >> Besides all the code contributions and reviews, he has also done a great
> >> deal for the community: multiple Kafka meetup talks in Berlin, Kafka
> >> Summit talks, an introductory class on Kafka in seminars at
> >> Humboldt-Universität zu Berlin, and a co-authored paper on Kafka's stream
> >> processing semantics at this year's SIGMOD conference
> >> (https://en.wikipedia.org/wiki/SIGMOD). Bruno has also been quite active
> >> on Stack Overflow and the Apache Kafka mailing lists.
> >>
> >> Please join me to congratulate Bruno for all the contributions!
> >>
> >> -- Guozhang
> >>
> >
>


Re: [ANNOUNCE] New Kafka PMC Member: Bill Bejeck

2021-04-07 Thread Luke Chen
Congratulations Bill!

Luke

On Thu, Apr 8, 2021 at 9:17 AM Matthias J. Sax  wrote:

> Hi,
>
> It's my pleasure to announce that Bill Bejeck is now a member of the
> Kafka PMC.
>
> Bill has been a Kafka committer since Feb 2019. He has remained
> active in the community since becoming a committer.
>
>
>
> Congratulations Bill!
>
>  -Matthias, on behalf of Apache Kafka PMC
>


Re: [ANNOUNCE] New Committer: Bruno Cadonna

2021-04-07 Thread Matthias J. Sax
Congrats Bruno! Very well deserved!


-Matthias

On 4/7/21 3:51 PM, Bill Bejeck wrote:
> Congrats Bruno! Well deserved.
> 
> Bill
> 
> On Wed, Apr 7, 2021 at 6:34 PM Guozhang Wang  wrote:
> 
>> Hello all,
>>
>> I'm happy to announce that Bruno Cadonna has accepted his invitation to
>> become an Apache Kafka committer.
>>
>> Bruno has been contributing to Kafka since Jan. 2019 and has made 99
>> commits and more than 80 PR reviews so far:
>>
>> https://github.com/apache/kafka/commits?author=cadonna
>>
>> He worked on a few key KIPs on Kafka Streams:
>>
>> * KIP-471: Expose RocksDB Metrics in Kafka Streams
>> * KIP-607: Add Metrics to Kafka Streams to Report Properties of RocksDB
>> * KIP-662: Throw Exception when Source Topics of a Streams App are Deleted
>>
>> Besides all the code contributions and reviews, he has also done a great
>> deal for the community: multiple Kafka meetup talks in Berlin, Kafka Summit
>> talks, an introductory class on Kafka in seminars at Humboldt-Universität
>> zu Berlin, and a co-authored paper on Kafka's stream processing semantics
>> at this year's SIGMOD conference (https://en.wikipedia.org/wiki/SIGMOD).
>> Bruno has also been quite active on Stack Overflow and the Apache Kafka
>> mailing lists.
>>
>> Please join me to congratulate Bruno for all the contributions!
>>
>> -- Guozhang
>>
> 


[ANNOUNCE] New Kafka PMC Member: Bill Bejeck

2021-04-07 Thread Matthias J. Sax
Hi,

It's my pleasure to announce that Bill Bejeck is now a member of the
Kafka PMC.

Bill has been a Kafka committer since Feb 2019. He has remained
active in the community since becoming a committer.



Congratulations Bill!

 -Matthias, on behalf of Apache Kafka PMC


Re: [ANNOUNCE] New Committer: Bruno Cadonna

2021-04-07 Thread Bill Bejeck
Congrats Bruno! Well deserved.

Bill

On Wed, Apr 7, 2021 at 6:34 PM Guozhang Wang  wrote:

> Hello all,
>
> I'm happy to announce that Bruno Cadonna has accepted his invitation to
> become an Apache Kafka committer.
>
> Bruno has been contributing to Kafka since Jan. 2019 and has made 99
> commits and more than 80 PR reviews so far:
>
> https://github.com/apache/kafka/commits?author=cadonna
>
> He worked on a few key KIPs on Kafka Streams:
>
> * KIP-471: Expose RocksDB Metrics in Kafka Streams
> * KIP-607: Add Metrics to Kafka Streams to Report Properties of RocksDB
> * KIP-662: Throw Exception when Source Topics of a Streams App are Deleted
>
> Besides all the code contributions and reviews, he has also done a great
> deal for the community: multiple Kafka meetup talks in Berlin, Kafka Summit
> talks, an introductory class on Kafka in seminars at Humboldt-Universität
> zu Berlin, and a co-authored paper on Kafka's stream processing semantics
> at this year's SIGMOD conference (https://en.wikipedia.org/wiki/SIGMOD).
> Bruno has also been quite active on Stack Overflow and the Apache Kafka
> mailing lists.
>
> Please join me to congratulate Bruno for all the contributions!
>
> -- Guozhang
>


Re: [ANNOUNCE] New Committer: Bruno Cadonna

2021-04-07 Thread David Klein
He has also been a huge help to the community!  Congratulations Bruno!  And 
thank you!

> On Apr 7, 2021, at 5:34 PM, Guozhang Wang  wrote:
> 
> Hello all,
> 
> I'm happy to announce that Bruno Cadonna has accepted his invitation to
> become an Apache Kafka committer.
> 
> Bruno has been contributing to Kafka since Jan. 2019 and has made 99
> commits and more than 80 PR reviews so far:
> 
> https://github.com/apache/kafka/commits?author=cadonna
> 
> He worked on a few key KIPs on Kafka Streams:
> 
> * KIP-471: Expose RocksDB Metrics in Kafka Streams
> * KIP-607: Add Metrics to Kafka Streams to Report Properties of RocksDB
> * KIP-662: Throw Exception when Source Topics of a Streams App are Deleted
> 
> Besides all the code contributions and reviews, he has also done a great
> deal for the community: multiple Kafka meetup talks in Berlin, Kafka Summit
> talks, an introductory class on Kafka in seminars at Humboldt-Universität
> zu Berlin, and a co-authored paper on Kafka's stream processing semantics
> at this year's SIGMOD conference (https://en.wikipedia.org/wiki/SIGMOD).
> Bruno has also been quite active on Stack Overflow and the Apache Kafka
> mailing lists.
> 
> Please join me to congratulate Bruno for all the contributions!
> 
> -- Guozhang



[ANNOUNCE] New Committer: Bruno Cadonna

2021-04-07 Thread Guozhang Wang
Hello all,

I'm happy to announce that Bruno Cadonna has accepted his invitation to
become an Apache Kafka committer.

Bruno has been contributing to Kafka since Jan. 2019 and has made 99
commits and more than 80 PR reviews so far:

https://github.com/apache/kafka/commits?author=cadonna

He worked on a few key KIPs on Kafka Streams:

* KIP-471: Expose RocksDB Metrics in Kafka Streams
* KIP-607: Add Metrics to Kafka Streams to Report Properties of RocksDB
* KIP-662: Throw Exception when Source Topics of a Streams App are Deleted

Besides all the code contributions and reviews, he has also done a great
deal for the community: multiple Kafka meetup talks in Berlin, Kafka Summit
talks, an introductory class on Kafka in seminars at Humboldt-Universität zu
Berlin, and a co-authored paper on Kafka's stream processing semantics at
this year's SIGMOD conference (https://en.wikipedia.org/wiki/SIGMOD). Bruno
has also been quite active on Stack Overflow and the Apache Kafka mailing
lists.

Please join me in congratulating Bruno on all his contributions!

-- Guozhang


Re: Kafka Streams - Out of Order Handling

2021-04-07 Thread Matthias J. Sax
Sorry for the late reply...


> I only see issues of out of order data in my re-partitioned topic as a result 
> of a rebalance happening.

If you re-partition, you may actually see out-of-order data even if
there is no rebalance: during repartitioning you have multiple upstream
writers for the repartition topic, and thus interleaved writes per
partition.

Maybe it's not an issue during regular processing for your use case, as
your throughput seems to be tiny.
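(As an aside: if the downstream aggregation is windowed, the usual knob for
tolerating such interleaving is the grace period. Below is a minimal,
hypothetical sketch; the topic names, serdes, and durations are made up and
not taken from your application.)

```
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.TimeWindows;

public class GracePeriodSketch {
    public static void main(String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();
        builder.stream("events", Consumed.with(Serdes.String(), Serdes.Long()))
               // Re-keying forces a repartition topic, i.e. multiple upstream
               // writers and potentially interleaved (out-of-order) records.
               .selectKey((key, value) -> key.toLowerCase())
               .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
               // One-minute windows that keep accepting records arriving up
               // to 30 seconds late (by event time).
               .windowedBy(TimeWindows.of(Duration.ofMinutes(1))
                                      .grace(Duration.ofSeconds(30)))
               .count();
        System.out.println(builder.build().describe());
    }
}
```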


> I believe all stream threads across all app instances will pause consuming 
> whilst the rebalance is worked through.

Not really. (1) If a thread dies, it takes some time to detect that the
thread died, and thus all other threads continue to process until a
rebalance is even started. (2) With incremental rebalancing, partitions
that are not re-assigned are processed throughout a rebalance.


> but am I right in thinking that one streams app (or at least some of its 
> stream threads) will have to wait for state to be synced from the changelog 
> topic?

That is correct. If a thread gets a new task assigned and needs to
recover state, it pauses processing for all its partitions/tasks until
the restore has finished. However, this pausing of partitions/tasks
happens only on a per-thread basis. Threads, even from the same app
instance, are totally agnostic to each other.

So what you describe in your example makes sense.


> If this is the case, do you think increasing standby replicas will lessen the 
> issue? 

Standbys can reduce your recovery time, and for your low-throughput use
case may even reduce it to zero. Thus, if the rebalance itself happens
quickly enough, the issue with out-of-order data may go away (or at
least should be largely mitigated).
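For reference, a minimal sketch of how standbys are enabled; the application
id and bootstrap servers below are placeholders, not taken from your setup:

```
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StandbyConfigSketch {
    public static Properties streamsConfig() {
        final Properties props = new Properties();
        // Placeholder values; use your own application id and brokers.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Keep one warm replica of each state store on another instance, so a
        // task that moves during a rebalance can resume without first
        // restoring the whole store from its changelog topic.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        return props;
    }
}
```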


-Matthias


On 3/12/21 3:40 AM, Marcus Horsley-Rai wrote:
> Thanks Matthias - that's great to know.
> 
>> Increasing the grace period should not really affect throughput, but
>> latency.
> 
> Yes, a slip of the tongue on my part, you’re right :-)
> 
> One last question if I may? I only see issues of out of order data in my 
> re-partitioned topic as a result of a rebalance happening.
> My hypothesis is that when an instance of my streams app dies - the 
> consumption of data from the partitions it was responsible for falls behind 
> compared to others.
> I believe all stream threads across all app instances will pause consuming 
> whilst the rebalance is worked through.. but am I right in thinking that one 
> streams app (or at least some of its stream threads) will have to wait for 
> state to be synced from the changelog topic?
> In other words - when a rebalance happens - I assume the consumer group 
> doesn’t wait for the slowest member to be ready to consume?
> 
> To illustrate with an example:
>   If I have 3 partitions of a single topic and three streams app 
> instances (1 partition each)
>   I have a producer that produces to each partition each minute on the 
> minute
>   Normally the timestamp of the head record is roughly the same across 
> all three partitions. This assumes no lag ever builds up on the consumer 
> group, and also assumes data volume and size of messages is comparable.
> 
>   Now I kill streams app A. The rebalance protocol kicks in and gives 
> instance B an extra partition to consume from.
> Could there now be a bigger lag for one or both of the partitions app 
> B is consuming from because it had to sync state store state? (Assume B has 
> enough stream processing threads idle and the machine is specced to cope with 
> the extra load)
>…whereas app C, unhindered by state syncing, has potentially now 
> produced to the through topic a record from a newer batch/time window.
> 
> If this is the case, do you think increasing standby replicas will lessen the 
> issue?  I obviously don’t expect it to be a magic bullet, and grace period is 
> still required in general
> 
> 
> Best Regards,
> 
> Marcus
> 
> 
> 
> 
> On Thu, Mar 11, 2021 at 1:40 AM Matthias J. Sax  > wrote:
>> will it consider a timestamp in the body of the message, if we have 
>> implemented a custom TimeExtractor?
> 
> Yes.
> 
> 
>> Or, which I feel is more likely - does TimeExtractor stream time only apply 
>> later on once deserialisation has happened?
> 
> Well, the extractor does apply after deserialization, but we deserialize
> each partition head-record to be able to apply the timestamp extractor:
> ie, deserialization happens when a record becomes the "head record".
> 
> Cf
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordQueue.java
>  
> 
> 
> 
>> the accuracy of the aggregates may have to come second to the throughput.
> 
> Increasing the grace period should not really affect throughput, but
> latency.
> 
> 
> 
> -Matthias
> 
> 
> On 3/10

Re: [kafka-clients] Subject: [VOTE] 2.8.0 RC1

2021-04-07 Thread Israel Ekpo
Hello John,

Thanks for running the release for 2.8.0

I was reviewing the request for validation of 2.8.0 RC1, and it appears that
the deadline for the community to complete testing/validation is 6 April
2021.

Same day delivery :)

I think you meant to say Tuesday, April 13 2021 instead.

Could we assume that was just a carry-over from the previous solicitation
for 2.8.0 RC0 and that you actually meant to say 2021-04-13?

When you have a moment, please clarify.

I am running my tests shortly and will share my results by the end of the
week.

Thanks.
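On a related note, for anyone who wants to exercise the new Streams APIs in
this RC, here is a rough, hypothetical sketch of the new thread add/remove
calls and the new uncaught exception handler. The application id, broker
address, and topic names are made up, and this is not part of the official
validation steps:

```
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse;

public class Rc1StreamsApiSketch {
    public static void main(String[] args) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rc1-smoke-test");     // made up
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // made up
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);

        final StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic"); // trivial pass-through topology

        final KafkaStreams streams = new KafkaStreams(builder.build(), props);

        // New in 2.8 (KIP-671): replace a crashed stream thread instead of
        // letting the whole client die.
        streams.setUncaughtExceptionHandler(
                exception -> StreamThreadExceptionResponse.REPLACE_THREAD);

        streams.start();

        // New in 2.8 (KIP-663): add and remove stream threads at runtime.
        streams.addStreamThread().ifPresent(name -> System.out.println("Added " + name));
        streams.removeStreamThread().ifPresent(name -> System.out.println("Removed " + name));

        streams.close(Duration.ofSeconds(10));
    }
}
```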



On Tue, Apr 6, 2021 at 5:37 PM John Roesler  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second candidate for release of Apache Kafka
> 2.8.0. This is a major release that includes many new
> features, including:
>
> * Early-access release of replacing Zookeeper with a self-
> managed quorum
> * Add Describe Cluster API
> * Support mutual TLS authentication on SASL_SSL listeners
> * Ergonomic improvements to Streams TopologyTestDriver
> * Logger API improvement to respect the hierarchy
> * Request/response trace logs are now JSON-formatted
> * New API to add and remove Streams threads while running
> * New REST API to expose Connect task configurations
> * Fixed the TimeWindowDeserializer to be able to deserialize
> keys outside of Streams (such as in the console consumer)
> * Streams resilient improvement: new uncaught exception
> handler
> * Streams resilience improvement: automatically recover from
> transient timeout exceptions
>
>
>
>
> Release notes for the 2.8.0 release:
> https://home.apache.org/~vvcephei/kafka-2.8.0-rc1/RELEASE_NOTES.html
>
>
> *** Please download, test and vote by 6 April 2021 ***
>
> Kafka's KEYS file containing PGP keys we use to sign the
> release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
>
> https://home.apache.org/~vvcephei/kafka-2.8.0-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
>
> https://home.apache.org/~vvcephei/kafka-2.8.0-rc1/javadoc/
>
> * Tag to be voted upon (off 2.8 branch) is the 2.8.0 tag:
>
> https://github.com/apache/kafka/releases/tag/2.8.0-rc1
>
> * Documentation:
> https://kafka.apache.org/28/documentation.html
>
> * Protocol:
> https://kafka.apache.org/28/protocol.html
>
>
> /**
>
> Thanks,
> John
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/b0814d6acb8f37e0e729e3582bc5552fa30ca8e3.camel%40apache.org
> .
>


Re: Kafka Stream: State replication seems unpredictable.

2021-04-07 Thread Guozhang Wang
Hello Mangat,

With at-least-once, although some records may be processed multiple times,
their processing order should not be violated, so what you observed is not
expected. What caught my eye is this section in your output changelog
(highlighted):

Key1, V1
Key1, null
Key1, V1
Key1, null  (processed again)
Key1, V2
Key1, null

*Key1, V1*
*Key1,V2*
Key1, V2+V1 (I guess we didn't process V2 tombstone yet but reprocessed V1
again due to reassignment)

They seem to be the result of first receiving a tombstone which removes V1
and then a new record that adds V2. However, since caching is disabled you
should get

*Key1,V1*
*Key1,null*
*Key1,V2*

instead; without the actual code snippet I cannot tell more about what's
happening here. If you can look into the logs, you can record each time a
partition migrates, how many records from the changelog were replayed to
restore the store, and from which offset on the input topic Streams resumes
processing. You may also consider upgrading to 2.6.x or a higher version
and see if this issue goes away.
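If it helps, here is a minimal sketch of a restore listener that records
exactly that information; the class name and log format are just
illustrative, and it would be registered via
KafkaStreams#setGlobalStateRestoreListener before start():

```
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateRestoreListener;

public class LoggingRestoreListener implements StateRestoreListener {
    @Override
    public void onRestoreStart(final TopicPartition partition, final String storeName,
                               final long startingOffset, final long endingOffset) {
        System.out.printf("Restore of %s for %s starting: offsets %d..%d%n",
                storeName, partition, startingOffset, endingOffset);
    }

    @Override
    public void onBatchRestored(final TopicPartition partition, final String storeName,
                                final long batchEndOffset, final long numRestored) {
        System.out.printf("Restored %d records of %s for %s (up to offset %d)%n",
                numRestored, storeName, partition, batchEndOffset);
    }

    @Override
    public void onRestoreEnd(final TopicPartition partition, final String storeName,
                             final long totalRestored) {
        System.out.printf("Restore of %s for %s done: %d records replayed%n",
                storeName, partition, totalRestored);
    }
}
```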


Guozhang

On Tue, Apr 6, 2021 at 8:38 AM mangat rai  wrote:

> Hey,
>
> We have the following setup in our infrastructure.
>
>1. Kafka - 2.5.1
>2. Apps use kafka streams `org.apache.kafka` version 2.5.1 library
>3. Low-level Processor API is used with *at-least-once* semantics
>4. State stores are *in-memory* with *caching disabled* and *changelog
>enabled*
>
>
> Is it possible that during state replication and partition reassignment,
> the input data is not always applied to the state store?
>
> 1. Let's say the input topic is having records like following
>
> ```
> Key1, V1
> Key1, null (tombstone)
> Key1, V2
> Key1, null
> Key1, V3
> Key1, V4
> ```
> 2. The app has an aggregation function which takes these records and
> updates the state store so that the changelog shall be
>
> ```
> Key1, V1
> Key1, null (tombstone)
> Key1, V2
> Key1, null
> Key1, V3
> Key1, V3 + V4
> ```
> Let's say the partition responsible for processing the above key was
> reallocated several times to different threads due to some infra issues we
> are having (in Kubernetes, where we run the app, not the Kafka cluster).
>
> I see the following record in the changelogs
>
> ```
> Key1, V1
> Key1, null
> Key1, V1
> Key1, null  (processed again)
> Key1, V2
> Key1, null
> Key1, V1
> Key1,V2
> Key1, V2+V1 (I guess we didn't process V2 tombstone yet but reprocessed V1
> again due to reassignment)
> Key1,V1 (V2 is gone as there was a tombstone, but then V1 tombstone should
> have been applied also!!)
> Key1, V2+V1 (it is back!!!)
> Key1,V1
> Key1, V1 + V2 + V3 (This is the final state)!
> ```
>
> If you see this, it means several things:
> 1. The state is always correctly applied locally (on a developer laptop),
> where there were no reassignments.
> 2. The records are processed multiple times, which is understandable as we
> have at-least-once semantics here.
> 3. As long as we re-apply the same events in the same order we are golden,
> but it looks like some records are skipped; it is as if we have multiple
> consumers reading and updating the same topics, leading to race
> conditions.
>
> Is there any way Kafka Streams' state replication could lead to such a
> race condition?
>
> Regards,
> Mangat
>


-- 
-- Guozhang