Re: Redis as state store

2021-03-19 Thread Pushkar Deole
Thanks Sophie... that answers my question, however I'm still worried about
some other aspects:

1. If redis is to be restored from the changelog topic: what would happen
if I have 3 stream application instances and 1 instance goes down? Will the
other 2 instances halt until the entire existing state in redis is wiped
out and the entire state is restored from the changelog topic? If so, it
would be a significant performance hit, especially when this happens during
heavy traffic hours.

2. Would #1 be solved by the 2nd alternative that you mentioned in the
comment, i.e. 'An alternative is to just start buffering updates in-memory
(or in rocksdb, this could be configurable) and then avoid dirtying the
remote storage in the first place as we would only flush the data out to it
during a commit'? It looks to me that this won't need a rebuild of the
entire state store because the changelog is disabled, and this alternative
would avoid making the state store inconsistent in the first place, thus
saving the wipe-out and rebuild. If so, then this also doesn't need to halt
the other stream applications and would be a much better approach from a
performance point of view. Is that correct?
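For what it's worth, the "buffer updates and flush on commit" alternative can be modeled in a few lines. This is a sketch of the concept only, not actual Streams code; all names here are made up for illustration:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: writes go to an in-memory buffer and reach the remote store only
// on commit, so a crash before commit leaves the remote store clean and no
// wipe/rebuild is needed afterwards.
static Map<String, String> remoteStore = new HashMap<>();
static Map<String, String> buffer = new LinkedHashMap<>();

static void put(String key, String value) {
    buffer.put(key, value);           // dirty writes stay local
}

static void commit() {
    remoteStore.putAll(buffer);       // flush buffered writes on commit
    buffer.clear();
}

static void crashBeforeCommit() {
    buffer.clear();                   // uncommitted writes are simply dropped
}
```

Under this model the remote store only ever sees committed state, which is why no wipe-and-restore step (and hence no halt of the other instances) would be needed.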

On Sat, Mar 20, 2021 at 2:25 AM Sophie Blee-Goldman
 wrote:

> Hey Pushkar, yes, the data will still be backed by a changelog topic unless
> the
> user explicitly disables logging for that state store. The fault tolerance
> mechanism
> of Kafka Streams is based on changelogging, therefore there are no
> correctness
> guarantees if you decide to disable it.
>
> That said, I'm guessing many users do in fact disable the changelog when
> plugging
> in a remote store with its own fault tolerance guarantees -- is that what
> you're getting
> at? We could definitely build in better support for that case, as either an
> additional
> optimization on top of KAFKA-12475
> <https://issues.apache.org/jira/browse/KAFKA-12475> or as an alternative
> implementation to fix the
> underlying EOS problem. Check out my latest comment on the ticket here
> <
> https://issues.apache.org/jira/browse/KAFKA-12475?focusedCommentId=17305191&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17305191
> >
>
> Does that address your question?
>
> On Fri, Mar 19, 2021 at 10:14 AM Pushkar Deole 
> wrote:
>
> > Hello Sophie,
> >
> > maybe I am missing something here, however can you let me know how a
> > redis-based state store will be wiped of the inconsistent state in case
> > the stream application dies in the middle of processing, e.g. the stream
> > application consumed from the source topic, processed the source event,
> > and saved state to redis, and before producing the event on the
> > destination topic, the stream application had an error.
> > If this occurs with a rocksDB or in-memory state store, it will be
> > rebuilt from the changelog topic; however, for a redis state store, how
> > will it be wiped of the state? Are we saying here that the data stored in
> > redis will still be backed by a changelog topic and redis will be
> > restored from that topic in case of a stream application error?
> >
> > On Tue, Mar 16, 2021 at 12:18 AM Sophie Blee-Goldman
> >  wrote:
> >
> > > This certainly does seem like a flaw in the Streams API, although of
> > course
> > > Streams is just
> > > in general not designed for use with remote anything (API calls,
> stores,
> > > etc)
> > >
> > > That said, I don't see any reason why we *couldn't* have better support
> > for
> > > remote state stores.
> > > Note that there's currently no deleteAll method on the store interface,
> > and
> > > not all databases
> > > necessarily support that. But we could add a default implementation
> which
> > > just calls delete(key)
> > > on all keys in the state store, and for the RocksDB-based state stores
> we
> > > still wipe out the state
> > > as usual (and recommend the same for any custom StateStore which is
> local
> > > rather than remote).
> > > Obviously falling back on individual delete(key) operations for all the
> > > data in the entire store will
> > > have worse performance, but that's better than silently breaking EOS
> when
> > > deleteAll is not available
> > > on a remote store.
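
A default deleteAll along these lines might look like the following sketch. The interface here is a simplified stand-in, not the actual Streams StateStore API, and the names are illustrative only:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical store interface illustrating the proposed default:
// fall back to per-key delete when the store has no native deleteAll.
interface SimpleKeyValueStore<K, V> {
    Iterator<K> allKeys();
    void delete(K key);

    default void deleteAll() {
        List<K> keys = new ArrayList<>();
        allKeys().forEachRemaining(keys::add); // snapshot the keys first to
        for (K key : keys) {                   // avoid mutating while iterating
            delete(key);
        }
    }
}

// Toy in-memory implementation used to exercise the default method.
class MapStore implements SimpleKeyValueStore<String, String> {
    final Map<String, String> map = new HashMap<>();
    public Iterator<String> allKeys() {
        return new ArrayList<>(map.keySet()).iterator();
    }
    public void delete(String key) { map.remove(key); }
}
```

As noted above, a remote store would override deleteAll with a native bulk delete when one exists; the per-key fallback is O(n) round trips but keeps EOS intact.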
> > >
> > > Would you be interested in doing a KIP for this? We should also file a
> > JIRA
> > > with the above explanation,
> > > so that other users of remote storage are aware of this limitation. And
> > > definitely document this somewhere
> > >
> > >
> > > On Mon, Mar 15, 2021 at 8:04 AM Bruno Cadonna
>  > >
> > > wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > I guess wiping out the state directory is easier code-wise, faster,
> > > > and/or at the time of development the developers did not design for
> > > > remote state stores. But I actually do not know the exact reason.
> > > >
> > > > Off the top of my head, I do not know how to solve this for remote
> > state
> > > > stores. Using the uncaught exception handler is not good, because a
> > > > computing node could fail without giving the JVM the opportunity to
> > > > throw an exception.

Re: Kafka Streams And Partitioning

2021-03-19 Thread Sophie Blee-Goldman
Ah ok, I think I was envisioning a different use case from your initial
description of the problem.
If everything that you want to group together is already correctly
partitioned, then you won't need
a repartitioning step. If I understand correctly, you have something like
this in mind:

 builder
.stream("input-topic")
.selectKey((key, value) -> key + extractKeyData(value)) // eg A --> A:B
.groupByKey()
.aggregate(...)
.toStream()
.to("output-topic"); // keys are of the form A:B, but should be partitioned by A

If that looks about right, then there are two things to watch out for:
1) To keep partitioning on A instead of A:B, you'll need to provide a
custom Partitioner when writing
to the output topic. See Produced#streamPartitioner
2) Since you have a key-changing operation (selectKey) upstream of a
stateful operation (aggregate),
Streams will automatically infer that a repartitioning step is required
and insert one for you.
Unfortunately there's currently no way to force Streams not to do that,
even when you know the
data is going to be partitioned correctly -- there is a ticket for this
but it has yet to be picked up by
anyone. See https://issues.apache.org/jira/browse/KAFKA-4835
I think the only workaround at this point would be to implement the
aggregation step yourself
using the low-level Processor API. Streams will only handle the
repartitioning for DSL operators.

You can still use the DSL for everything else, and mix in the PAPI by
using a transformer. Streams
does not insert repartitions before a transformer since it can't infer
whether or not the operation is
stateful. I know re-implementing the aggregator is a hassle but you
should be able to just copy and
paste much of the existing aggregation code. Check out the
KStreamAggregateProcessor class.

This would look something like this:

 builder
.stream("input-topic")
.selectKey((key, value) -> key + extractKeyData(value)) // eg A --> A:B
.transform(myAggregateProcessorSupplier) // supplier returns a new processor which implements the aggregation
.to("output-topic", Produced.streamPartitioner(myStreamPartitioner)); // myStreamPartitioner extracts and partitions based on A

Obviously this situation is not ideal -- if you're interested in improving
things, feel free to pick up KAFKA-4835
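To make the partitioning requirement concrete, here is a self-contained sketch of the logic such a custom StreamPartitioner would implement. It is illustrative only -- the ':' separator is taken from the A:B example above, and Kafka's default partitioner actually hashes the serialized key bytes with murmur2 rather than using String.hashCode:

```java
// Sketch: route a composite key "A:B" to the partition that the original
// key "A" maps to, so all records for A land on the same partition.
static int partitionForCompositeKey(String compositeKey, int numPartitions) {
    String originalKey = compositeKey.split(":", 2)[0]; // "A:B" -> "A"
    // mask off the sign bit so negative hash codes still yield a valid index
    return (originalKey.hashCode() & 0x7fffffff) % numPartitions;
}
```

Because only the prefix feeds the hash, "A:B" and "A:C" always co-locate, which is exactly the property the output topic needs for the later per-A reads.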



On Wed, Mar 17, 2021 at 8:19 PM Gareth Collins 
wrote:

> Hi Sophie,
>
> Thanks very much for the response!
>
> So if I understand correctly it will be impossible to avoid the repartition
> topic?
>
> e.g. my original message may have key = A...and will be partitioned on A.
>
> But in my Kafka Streams app, I will want to aggregate on A:B or A:C or A:D
> (B, C or D come from extra key values in the data)...but continue to
> partition on A. Then later
> read via REST all values for A. So to make this work I have to have a
> repartition topic even though I am not really repartitioning (i.e. all
> records for A should still be processed
> together). Is my understanding correct?
>
> So WindowedStreamPartitioner is a special case for avoiding the repartition
> topic?
>
> thanks in advance,
> Gareth
>
> On Wed, Mar 17, 2021 at 7:59 PM Sophie Blee-Goldman
>  wrote:
>
> > Hey Gareth,
> >
> > Kafka Streams state store partitioning is based on the partitioning of
> the
> > upstream input topics.
> > If you want your RocksDB stores to be partitioned based on the prefix of
> a
> > key, then you should
> > make sure the input topic feeding into it uses whatever partitioning
> > strategy you had in mind.
> >
> > If the source topics are user input topics and you have control over the
> > production to these topics,
> > then just use a custom partitioner to produce to them. If you don't have
> > control over them, you can
> > insert an intermediate/repartition topic between the input topics and the
> > subtopology with the RocksDB.
> > Check out the KStream#repartitioned operator, it accepts a Repartitioned
> > which itself accepts a
> > StreamPartitioner that you can use to control the partitioning.
> >
> > You can check out the class WindowedStreamPartitioner for an example:
> this
> > is how we handle the
> > WindowStore case that you pointed out.
> >
> >
> >
> > On Mon, Mar 15, 2021 at 11:45 AM Gareth Collins <
> > gareth.o.coll...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > This may be a newbie question but is it possible to control the
> > > partitioning of a RocksDB KeyValueStore in Kafka Streams?
> > >
> > > For example, I perhaps only want to partition based on a prefix of a
> key
> > > rather than the full key. I assume something similar must be done for
> the
> > > WindowStore to partition without the window start time and sequence
> > number
> > > (otherwise window entries could be spread across partitions)?
> > >
> > > Sort of like the window store, I am wanting to be able to retrieve all
> > > values with a certain key prefix from the K

Re: [kafka-clients] [VOTE] 2.7.1 RC0

2021-03-19 Thread Sophie Blee-Goldman
Hey Mickael, I just merged the fix back to 2.7 so you should be good to go

Thanks for the PR Bruno!

On Fri, Mar 19, 2021 at 9:34 AM Mickael Maison 
wrote:

> Thanks Bruno,
>
> That indeed sounds like a blocker.
>
> I'm closing this vote, I'll build a new RC once a fix is merged into 2.7
>
> On Fri, Mar 19, 2021 at 2:04 PM Bruno Cadonna
>  wrote:
> >
> > Hi Mickael,
> >
> > Correction to my last e-mail: The bug does not break eos, but it breaks
> > at-least-once.
> >
> > Bruno
> >
> >
> > On 19.03.21 14:54, Bruno Cadonna wrote:
> > > Hi Mickael,
> > >
> > > Please have a look at the following bug report:
> > >
> > > https://issues.apache.org/jira/browse/KAFKA-12508
> > >
> > > I set its priority to blocker since the bug might break at-least-once
> > > and exactly-once processing guarantees.
> > >
> > > Feel free to set it back to major, if you think that it is not a
> blocker.
> > >
> > > Best,
> > > Bruno
> > >
> > >
> > > On 19.03.21 12:26, Mickael Maison wrote:
> > >> Hello Kafka users, developers and client-developers,
> > >>
> > >> This is the first candidate for release of Apache Kafka 2.7.1.
> > >>
> > >> Apache Kafka 2.7.1 is a bugfix release and 40 issues have been fixed
> > >> since 2.7.0.
> > >>
> > >> Release notes for the 2.7.1 release:
> > >> https://home.apache.org/~mimaison/kafka-2.7.1-rc0/RELEASE_NOTES.html
> > >>
> > >> *** Please download, test and vote by Friday, March 26, 5pm PST
> > >>
> > >> Kafka's KEYS file containing PGP keys we use to sign the release:
> > >> https://kafka.apache.org/KEYS
> > >>
> > >> * Release artifacts to be voted upon (source and binary):
> > >> https://home.apache.org/~mimaison/kafka-2.7.1-rc0/
> > >>
> > >> * Maven artifacts to be voted upon:
> > >>
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >>
> > >> * Javadoc:
> > >> https://home.apache.org/~mimaison/kafka-2.7.1-rc0/javadoc/
> > >>
> > >> * Tag to be voted upon (off 2.7 branch) is the 2.7.1 tag:
> > >> https://github.com/apache/kafka/releases/tag/2.7.1-rc0
> > >>
> > >> * Documentation:
> > >> https://kafka.apache.org/27/documentation.html
> > >>
> > >> * Protocol:
> > >> https://kafka.apache.org/27/protocol.html
> > >>
> > >> * Successful Jenkins builds for the 2.7 branch:
> > >> Unit/integration tests:
> > >> https://ci-builds.apache.org/job/Kafka/job/kafka-2.7-jdk8/135/
> > >>
> > >> /**
> > >>
> > >> Thanks,
> > >> Mickael
> > >>
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CA%2BOCqna8CaMZa%2BiK7u6nv1jNv9q9cXrHYfh5E%3DXLYQ41GWmheA%40mail.gmail.com
> .
>


Re: [kafka-clients] [VOTE] 2.6.2 RC0

2021-03-19 Thread Sophie Blee-Goldman
Thanks Bruno. I agree this qualifies as a blocker since it was a regression
in 2.6 and
may result in data loss. I'll roll a new RC with the fix

On Fri, Mar 19, 2021 at 7:03 AM 'Bruno Cadonna' via kafka-clients <
kafka-clie...@googlegroups.com> wrote:

> Hi Sophie,
>
> Correction to my last e-mail: The bug does not break eos, but it breaks
> at-least-once.
>
> Bruno
>
> On 19.03.21 14:54, Bruno Cadonna wrote:
> > Hi Sophie,
> >
> > Please have a look at the following bug report:
> >
> > https://issues.apache.org/jira/browse/KAFKA-12508
> >
> > I set its priority to blocker since the bug might break at-least-once
> > and exactly-once processing guarantees.
> >
> > Feel free to set it back to major, if you think that it is not a blocker.
> >
> > Best,
> > Bruno
> >
> > On 12.03.21 19:47, 'Sophie Blee-Goldman' via kafka-clients wrote:
> >> Hello Kafka users, developers and client-developers,
> >>
> >> This is the first candidate for release of Apache Kafka 2.6.2.
> >>
> >> Apache Kafka 2.6.2 is a bugfix release and fixes 30 issues since the
> >> 2.6.1 release. Please see the release notes for more information.
> >>
> >> Release notes for the 2.6.2 release:
> >>
> https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/RELEASE_NOTES.html
> >> <
> https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/RELEASE_NOTES.html>
> >>
> >>
> >> *** Please download, test and vote by Friday, March 19th, 9am PST
> >>
> >> Kafka's KEYS file containing PGP keys we use to sign the release:
> >> https://kafka.apache.org/KEYS 
> >>
> >> * Release artifacts to be voted upon (source and binary):
> >> https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/
> >> 
> >>
> >> * Maven artifacts to be voted upon:
> >> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >>  >
> >>
> >> * Javadoc:
> >> https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/javadoc/
> >> 
> >>
> >> * Tag to be voted upon (off 2.6 branch) is the 2.6.2 tag:
> >> https://github.com/apache/kafka/releases/tag/2.6.2-rc0
> >> 
> >>
> >> * Documentation:
> >> https://kafka.apache.org/26/documentation.html
> >> 
> >>
> >> * Protocol:
> >> https://kafka.apache.org/26/protocol.html
> >> 
> >>
> >> * Successful Jenkins builds for the 2.6 branch:
> >> Unit/integration tests:
> >> https://ci-builds.apache.org/job/Kafka/job/kafka-2.6-jdk8/105/
> >> 
> >> System tests:
> >> https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4435/
> 
> >>
> >>
> >> /**
> >>
> >> Thanks,
> >> Sophie
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> >> Groups "kafka-clients" group.
> >> To unsubscribe from this group and stop receiving emails from it, send
> >> an email to kafka-clients+unsubscr...@googlegroups.com
> >> .
> >> To view this discussion on the web visit
> >>
> https://groups.google.com/d/msgid/kafka-clients/CAFLS_9jkHGsj42DT7Og3%3Dov9RbO%3DbEAQX55h0L6YKHJQR9qJpw%40mail.gmail.com
> >> <
> https://groups.google.com/d/msgid/kafka-clients/CAFLS_9jkHGsj42DT7Og3%3Dov9RbO%3DbEAQX55h0L6YKHJQR9qJpw%40mail.gmail.com?utm_medium=email&utm_source=footer>.
>
> >>
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/b1ea7ea3-f7aa-1ccb-eedd-6f2b4c254d1e%40confluent.io
> .
>


Re: Redis as state store

2021-03-19 Thread Sophie Blee-Goldman
Hey Pushkar, yes, the data will still be backed by a changelog topic unless
the
user explicitly disables logging for that state store. The fault tolerance
mechanism
of Kafka Streams is based on changelogging, therefore there are no
correctness
guarantees if you decide to disable it.

That said, I'm guessing many users do in fact disable the changelog when
plugging
in a remote store with its own fault tolerance guarantees -- is that what
you're getting
at? We could definitely build in better support for that case, as either an
additional
optimization on top of KAFKA-12475
<https://issues.apache.org/jira/browse/KAFKA-12475> or as an alternative
implementation to fix the
underlying EOS problem. Check out my latest comment on the ticket here:
<https://issues.apache.org/jira/browse/KAFKA-12475?focusedCommentId=17305191&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17305191>

Does that address your question?
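
As a rough model of the changelog mechanism mentioned above (not Streams' actual restore code): restoring a store means replaying the changelog topic's records in offset order into an empty store, with the last write per key winning and a null value acting as a delete (tombstone):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Replay a changelog into an empty store. Each record is a {key, value}
// pair; a null value is a tombstone that deletes the key.
static Map<String, String> restoreFromChangelog(List<String[]> changelog) {
    Map<String, String> store = new HashMap<>();
    for (String[] record : changelog) {
        if (record[1] == null) {
            store.remove(record[0]);      // tombstone: delete the key
        } else {
            store.put(record[0], record[1]); // last write per key wins
        }
    }
    return store;
}
```

This is why disabling the changelog forfeits the correctness guarantee: with no log to replay, there is nothing to rebuild a consistent store from after a failure.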

On Fri, Mar 19, 2021 at 10:14 AM Pushkar Deole  wrote:

> Hello Sophie,
>
> maybe I am missing something here, however can you let me know how a
> redis-based state store will be wiped of the inconsistent state in case the
> stream application dies in the middle of processing, e.g. the stream
> application consumed from the source topic, processed the source event, and
> saved state to redis, and before producing the event on the destination
> topic, the stream application had an error.
> If this occurs with a rocksDB or in-memory state store, it will be rebuilt
> from the changelog topic; however, for a redis state store, how will it be
> wiped of the state? Are we saying here that the data stored in redis will
> still be backed by a changelog topic and redis will be restored from that
> topic in case of a stream application error?
>
> On Tue, Mar 16, 2021 at 12:18 AM Sophie Blee-Goldman
>  wrote:
>
> > This certainly does seem like a flaw in the Streams API, although of
> course
> > Streams is just
> > in general not designed for use with remote anything (API calls, stores,
> > etc)
> >
> > That said, I don't see any reason why we *couldn't* have better support
> for
> > remote state stores.
> > Note that there's currently no deleteAll method on the store interface,
> and
> > not all databases
> > necessarily support that. But we could add a default implementation which
> > just calls delete(key)
> > on all keys in the state store, and for the RocksDB-based state stores we
> > still wipe out the state
> > as usual (and recommend the same for any custom StateStore which is local
> > rather than remote).
> > Obviously falling back on individual delete(key) operations for all the
> > data in the entire store will
> > have worse performance, but that's better than silently breaking EOS when
> > deleteAll is not available
> > on a remote store.
> >
> > Would you be interested in doing a KIP for this? We should also file a
> JIRA
> > with the above explanation,
> > so that other users of remote storage are aware of this limitation. And
> > definitely document this somewhere
> >
> >
> > On Mon, Mar 15, 2021 at 8:04 AM Bruno Cadonna  >
> > wrote:
> >
> > > Hi Alex,
> > >
> > > I guess wiping out the state directory is easier code-wise, faster,
> > > and/or at the time of development the developers did not design for
> > > remote state stores. But I actually do not know the exact reason.
> > >
> > > Off the top of my head, I do not know how to solve this for remote
> state
> > > stores. Using the uncaught exception handler is not good, because a
> > > computing node could fail without giving the JVM the opportunity to
> > > throw an exception.
> > >
> > > In your tests, try to increase the commit interval to a high value and
> > > see if you get inconsistencies. You should get an inconsistency if the
> > > state store maintains counts for keys and after the last commit before
> > > the failure, the Streams app puts an event with a new key K with value
> 1
> > > into the state store. After failover, Streams would put the same event
> > > with key K again into the state store. If the state store deleted all
> of
> > > its data, Streams would put again value 1, but if the state store did
> > > not delete all data, Streams would put value 2 which is wrong because
> it
> > > would count the same event twice.
> > >
> > > Best,
> > > Bruno
> > >
> > >
> > > On 15.03.21 15:20, Alex Craig wrote:
> > > > Bruno,
> > > > Thanks for the info!  that makes sense.  Of course now I have more
> > > > questions.  :)  Do you know why this is being done outside of the
> state
> > > > store API?  I assume there are reasons why a "deleteAll()" type of
> > > function
> > > > wouldn't work, thereby allowing a state store to purge itself?  And
> > maybe
> > > > more importantly, is there a way to achieve a similar behavior with a
> > 3rd
> > > > party store?  I'm not sure if hooking into the uncaught exception
> > handler
> > > > might be a good way to purge/drop a state store in the event of a
> fatal
> > > > error?  I did setup a MongoDB sta

Re: Redis as state store

2021-03-19 Thread Pushkar Deole
Hello Sophie,

maybe I am missing something here, however can you let me know how a
redis-based state store will be wiped of the inconsistent state in case the
stream application dies in the middle of processing, e.g. the stream
application consumed from the source topic, processed the source event, and
saved state to redis, and before producing the event on the destination
topic, the stream application had an error.
If this occurs with a rocksDB or in-memory state store, it will be rebuilt
from the changelog topic; however, for a redis state store, how will it be
wiped of the state? Are we saying here that the data stored in redis will
still be backed by a changelog topic and redis will be restored from that
topic in case of a stream application error?
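
The reason the store must be wiped before replaying the changelog can be modeled concretely. In this toy sketch (not Streams code), the changelog has no record of the uncommitted write, so replaying it cannot undo that write -- only a wipe followed by replay restores a consistent count:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: a count store that either is or isn't wiped before the
// uncommitted event gets reprocessed after a failure.
static int countAfterFailover(boolean wipeStoreFirst) {
    Map<String, Integer> store = new HashMap<>();
    store.put("K", 1);                 // uncommitted write made before the crash
    if (wipeStoreFirst) {
        store.clear();                 // wipe; the changelog had no entry for K,
    }                                  // so replaying it restores an empty store
    store.merge("K", 1, Integer::sum); // failover: the event is reprocessed
    return store.get("K");             // 1 if wiped (correct), 2 if not
}
```

A remote store like redis survives the crash with the uncommitted write intact, which is exactly the case where skipping the wipe produces the wrong count of 2.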

On Tue, Mar 16, 2021 at 12:18 AM Sophie Blee-Goldman
 wrote:

> This certainly does seem like a flaw in the Streams API, although of course
> Streams is just
> in general not designed for use with remote anything (API calls, stores,
> etc)
>
> That said, I don't see any reason why we *couldn't* have better support for
> remote state stores.
> Note that there's currently no deleteAll method on the store interface, and
> not all databases
> necessarily support that. But we could add a default implementation which
> just calls delete(key)
> on all keys in the state store, and for the RocksDB-based state stores we
> still wipe out the state
> as usual (and recommend the same for any custom StateStore which is local
> rather than remote).
> Obviously falling back on individual delete(key) operations for all the
> data in the entire store will
> have worse performance, but that's better than silently breaking EOS when
> deleteAll is not available
> on a remote store.
>
> Would you be interested in doing a KIP for this? We should also file a JIRA
> with the above explanation,
> so that other users of remote storage are aware of this limitation. And
> definitely document this somewhere
>
>
> On Mon, Mar 15, 2021 at 8:04 AM Bruno Cadonna 
> wrote:
>
> > Hi Alex,
> >
> > I guess wiping out the state directory is easier code-wise, faster,
> > and/or at the time of development the developers did not design for
> > remote state stores. But I actually do not know the exact reason.
> >
> > Off the top of my head, I do not know how to solve this for remote state
> > stores. Using the uncaught exception handler is not good, because a
> > computing node could fail without giving the JVM the opportunity to
> > throw an exception.
> >
> > In your tests, try to increase the commit interval to a high value and
> > see if you get inconsistencies. You should get an inconsistency if the
> > state store maintains counts for keys and after the last commit before
> > the failure, the Streams app puts an event with a new key K with value 1
> > into the state store. After failover, Streams would put the same event
> > with key K again into the state store. If the state store deleted all of
> > its data, Streams would put again value 1, but if the state store did
> > not delete all data, Streams would put value 2 which is wrong because it
> > would count the same event twice.
> >
> > Best,
> > Bruno
> >
> >
> > On 15.03.21 15:20, Alex Craig wrote:
> > > Bruno,
> > > Thanks for the info!  that makes sense.  Of course now I have more
> > > questions.  :)  Do you know why this is being done outside of the state
> > > store API?  I assume there are reasons why a "deleteAll()" type of
> > function
> > > wouldn't work, thereby allowing a state store to purge itself?  And
> maybe
> > > more importantly, is there a way to achieve a similar behavior with a
> 3rd
> > > party store?  I'm not sure if hooking into the uncaught exception
> handler
> > > might be a good way to purge/drop a state store in the event of a fatal
> > > error?  I did setup a MongoDB state store recently as part of a POC and
> > was
> > > testing it with EOS enabled.  (forcing crashes to occur and checking
> that
> > > the result of my aggregation was still accurate)  I was unable to cause
> > > inconsistent data in the mongo store (which is good!), though of
> course I
> > > may just have been getting lucky.  Thanks again for your help,
> > >
> > > Alex
> > >
> > > On Mon, Mar 15, 2021 at 8:59 AM Pushkar Deole 
> > wrote:
> > >
> > >> Bruno,
> > >>
> > >> i tried to explain this in 'kafka user's language through above
> > mentioned
> > >> scenario, hope i put it properly -:) and correct me if i am wrong
> > >>
> > >> On Mon, Mar 15, 2021 at 7:23 PM Pushkar Deole 
> > >> wrote:
> > >>
> > >>> This is what I understand could be the issue with external state
> store:
> > >>>
> > >>> kafka stream application consumes source topic, does processing,
> stores
> > >>> state to kafka state store (this is backed by topic) and before
> > producing
> > >>> event on destination topic, the application fails with some issue. In
> > >>> this case, the transaction has failed, so kafka guarantees either all
> > or
> > >>> none, means offset written to source topic, state written

Re: Kafka SLIs/SLOs

2021-03-19 Thread rs vas
Any input on this is appreciated...

On Tue, Mar 9, 2021, 7:19 AM rs vas  wrote:

> I am looking to define Availability, Latency, ErrorRate, and Load for
> Kafka clusters using Prometheus metrics generated by Kafka and the JMX
> exporter.
>
> Does anyone have any initial dashboards to start with for defining these
> indicators?
>
> Thank you!
> Rs Vas
>


Re: does Kafka exactly-once guarantee also apply to kafka state stores?

2021-03-19 Thread Pushkar Deole
Matthias,

With reference to your response above, i came across the JIRA ticket
https://issues.apache.org/jira/browse/KAFKA-12475

For rocksDB or in-memory state stores, these are always backed by a
changelog topic, so they can be rebuilt from scratch from the changelog
topic. However, how can a remote state store be made consistent in case of
an error? E.g. the stream consumed an event from the source topic,
processed it and stored state to redis, and before producing the event to
the destination topic the application dies. In this case, the offset won't
be committed for the source topic, and the destination topic doesn't have
the processed event anyway; however, redis holds the new state. How can
redis be wiped of the state that was saved while processing the above
event(s)?
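
The failure window described here can be sketched as a toy model (illustrative only, not Streams internals): state reaches the remote store before the output record is produced and the offset committed, so a crash in between leaves the remote store ahead of the rest of the system, and reprocessing then double-counts:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of consume -> update remote state -> produce -> commit offset.
static Map<String, Integer> redis = new HashMap<>();
static List<String> destinationTopic = new ArrayList<>();
static int committedOffset = 0;

static void process(String event, boolean crashBeforeProduce) {
    redis.merge(event, 1, Integer::sum);  // state saved to the remote store
    if (crashBeforeProduce) {
        return;                           // application dies here
    }
    destinationTopic.add(event + "-out"); // produce to the destination topic
    committedOffset++;                    // commit the source offset
}
```

After a crash the source offset is unchanged, so the same event is consumed again and the remote count becomes 2 even though only one event was ever seen -- the inconsistency the wipe-and-rebuild step exists to prevent.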

On Wed, Jan 6, 2021 at 11:18 PM Matthias J. Sax  wrote:

> Well, that is exactly what I mean by "it depends on the state store
> implementation".
>
> For this case, you cannot get exactly-once.
>
> There are actually ideas to improve the implementation to support the
> case you describe, but there is no timeline for this change yet. Not
> even sure if there is already a Jira ticket...
>
>
> -Matthias
>
> On 1/6/21 2:32 AM, Pushkar Deole wrote:
> > The question is if we want to use state store of 3rd party, e.g. say
> Redis,
> > how can the store be consistent with rest of the system i.e. source and
> > destination topics...
> >
> > e.g. record is consumed from source, processed, state store updated with
> > some state, but before writing to destination there is failure
> > Now, in this case, with kafka state store, it will be wiped off the state
> > stored since the transaction failed.
> >
> > But with Redis, the state store is updated with the new state and there
> is
> > no way to revert back
> >
> > On Tue, Jan 5, 2021 at 11:11 PM Matthias J. Sax 
> wrote:
> >
> >> It depends on the store implementation. Atm, EOS for state store is
> >> achieved by re-creating the state store in case of failure from the
> >> changelog topic.
> >>
> >> For RocksDB stores, we wipe out the local state directories and create a
> >> new empty RocksDB and for in-memory stores the content is "lost" anyway
> >> when state is migrated, and we reply the changelog into an empty store
> >> before processing resumes.
> >>
> >>
> >> -Matthias
> >>
> >> On 1/5/21 6:27 AM, Alex Craig wrote:
> >>> I don't think he's asking about data-loss, but rather data consistency.
> >>> (in the event of an exception or application crash, will EOS ensure
> that
> >>> the state store data is consistent)  My understanding is that it DOES
> >> apply
> >>> to state stores as well, in the sense that a failure during processing
> >>> would mean that the commit wouldn't get flushed and therefore wouldn't
> >> get
> >>> double-counted once processing resumes and message is re-processed.
> >>> As far as using something other than RocksDB, I think as long as you
> are
> >>> implementing the state store API correctly you should be fine.  I did a
> >> POC
> >>> recently using Mongo state-stores with EOS enabled and it worked
> >> correctly,
> >>> even when I intentionally introduced failures and crashes.
> >>>
> >>> -alex
> >>>
> >>> On Tue, Jan 5, 2021 at 1:09 AM Ning Zhang 
> >> wrote:
> >>>
>  If there is a "change-log" topic to back up the state store, then it
> may
>  not lose data.
> 
>  Also, if the third party store is not "kafka community certified" (or
> >> not
>  well-maintained), it may have chances to lose data (in different
> ways).
> 
> 
> 
>  On 2021/01/05 04:56:12, Pushkar Deole  wrote:
> > In case we opt to choose some third party store instead of kafka's
> >> stores
> > for storing state (e.g. Redis cache or Ignite), then will we lose the
> > exactly-once guarantee provided by kafka and the state stores can be
> in
>  an
> > inconsistent state ?
> >
> > On Sat, Jan 2, 2021 at 4:56 AM Ning Zhang 
>  wrote:
> >
> >> The physical store behind "state store" is change-log kafka topic.
> In
> >> Kafka stream, if something fails in the middle, the "state store" is
> >> restored back to the state before the event happens at the first
> step
> >> /
> >> beginning of the stream.
> >>
> >>
> >>
> >> On 2020/12/31 08:48:16, Pushkar Deole  wrote:
> >>> Hi All,
> >>>
> >>> We use Kafka streams and may need to use exactly-once configuration
>  for
> >>> some of the use cases. Currently, the application uses either local
>  or
> >>> global state store to store state.
> >>>  So, the application will consume events from source kafka topic,
>  process
> >>> the events, for state stores it will use either local or global
> state
> >> store
> >>> of kafka, then produce events onto the destination topic.
> >>>
> >>> Question i have is: in the case of exactly-once setting, kafka
>  streams
> >>> guarantees that all actions happen or nothing happens. So, in this
>  case,

Re: [kafka-clients] [VOTE] 2.7.1 RC0

2021-03-19 Thread Mickael Maison
Thanks Bruno,

That indeed sounds like a blocker.

I'm closing this vote, I'll build a new RC once a fix is merged into 2.7

On Fri, Mar 19, 2021 at 2:04 PM Bruno Cadonna
 wrote:
>
> Hi Mickael,
>
> Correction to my last e-mail: The bug does not break eos, but it breaks
> at-least-once.
>
> Bruno
>
>
> On 19.03.21 14:54, Bruno Cadonna wrote:
> > Hi Mickael,
> >
> > Please have a look at the following bug report:
> >
> > https://issues.apache.org/jira/browse/KAFKA-12508
> >
> > I set its priority to blocker since the bug might break at-least-once
> > and exactly-once processing guarantees.
> >
> > Feel free to set it back to major, if you think that it is not a blocker.
> >
> > Best,
> > Bruno
> >
> >
> > On 19.03.21 12:26, Mickael Maison wrote:
> >> Hello Kafka users, developers and client-developers,
> >>
> >> This is the first candidate for release of Apache Kafka 2.7.1.
> >>
> >> Apache Kafka 2.7.1 is a bugfix release and 40 issues have been fixed
> >> since 2.7.0.
> >>
> >> Release notes for the 2.7.1 release:
> >> https://home.apache.org/~mimaison/kafka-2.7.1-rc0/RELEASE_NOTES.html
> >>
> >> *** Please download, test and vote by Friday, March 26, 5pm PST
> >>
> >> Kafka's KEYS file containing PGP keys we use to sign the release:
> >> https://kafka.apache.org/KEYS
> >>
> >> * Release artifacts to be voted upon (source and binary):
> >> https://home.apache.org/~mimaison/kafka-2.7.1-rc0/
> >>
> >> * Maven artifacts to be voted upon:
> >> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >>
> >> * Javadoc:
> >> https://home.apache.org/~mimaison/kafka-2.7.1-rc0/javadoc/
> >>
> >> * Tag to be voted upon (off 2.7 branch) is the 2.7.1 tag:
> >> https://github.com/apache/kafka/releases/tag/2.7.1-rc0
> >>
> >> * Documentation:
> >> https://kafka.apache.org/27/documentation.html
> >>
> >> * Protocol:
> >> https://kafka.apache.org/27/protocol.html
> >>
> >> * Successful Jenkins builds for the 2.7 branch:
> >> Unit/integration tests:
> >> https://ci-builds.apache.org/job/Kafka/job/kafka-2.7-jdk8/135/
> >>
> >> /**
> >>
> >> Thanks,
> >> Mickael
> >>


Re: Kafka connect replication using MirrorMaker 2.0

2021-03-19 Thread dandaniel97
Hello and thank you for the reply! 

My problem is not with consumption of messages, because as you said, 
MirrorMaker2 knows how to deal with the consumer offsets. Rather my problem is 
with source connectors and the topic connect-offsets. 

Because Kafka Connect tracks where it stopped reading from a source using that
topic, the second Connect cluster is not aware of the offsets and I can get
duplicates. I know MirrorMaker2 does not guarantee exactly-once delivery, but is
there a way to sync the topics so the Kafka Connect clusters will be aware of
each other's offsets?

Daniel 

> On 19 Mar 2021, at 9:09, Ning Zhang  wrote:
> Hi Daniel, MirrorMaker2 creates its own "offsets" topic to track the progress
> of consumption.
> 
> just my 2 cents - If you already have two Kafka connect clusters in two 
> different sites, it sounds practical to:
> (1) use "cluster" mode, instead of "dedicated" mode of MirrorMaker2
> (2) add one "MirrorMaker" connector on each Kafka connect cluster and do 
> "one-way" replication. As you have two "MirrorMaker" connectors, it should 
> behave like "active-active" deployment.
> 
> Just in case you only have one Kafka Connect cluster, do "active-active"
> replication rather than "one-way".
> 
> On 2021/03/18 08:15:40, Daniel Beilin  wrote: 
>> Hi everyone,
>> I'm trying to create an active-active deployment of a kafka cluster between
>> two data centers using MirrorMaker2, but I'm facing a problem.
>> In my deployment I have Kafka Connect in both sites which each of them
>> connect to different database using sink and source connectors (MongoDB
>> source connector , JDBC sink/source connector)
>> I’d like to know what’s the best practice for active-active is using Kafka
>> connect , since I noticed the “connect-offsets” topic is not replicated in
>> mm2.
>> 
>> Best regards,
>> Daniel
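[Editor's note] The "one MirrorMaker connector on each Connect cluster, one-way replication" setup suggested above is typically submitted to the Connect REST API as a MirrorSourceConnector config along these lines. This is a sketch only: the cluster aliases, bootstrap servers, and topic pattern are placeholders to adapt.

```json
{
  "name": "mm2-siteA-to-siteB",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "siteA",
    "target.cluster.alias": "siteB",
    "source.cluster.bootstrap.servers": "siteA-broker:9092",
    "target.cluster.bootstrap.servers": "siteB-broker:9092",
    "topics": ".*",
    "replication.factor": "3"
  }
}
```

A mirror-image connector submitted to the other Connect cluster (siteB to siteA) gives the "active-active"-like behavior described in the reply.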


Re: [kafka-clients] [VOTE] 2.7.1 RC0

2021-03-19 Thread Bruno Cadonna

Hi Mickael,

Correction to my last e-mail: The bug does not break eos, but it breaks 
at-least-once.


Bruno


On 19.03.21 14:54, Bruno Cadonna wrote:

Hi Mickael,

Please have a look at the following bug report:

https://issues.apache.org/jira/browse/KAFKA-12508

I set its priority to blocker since the bug might break at-least-once 
and exactly-once processing guarantees.


Feel free to set it back to major, if you think that it is not a blocker.

Best,
Bruno


On 19.03.21 12:26, Mickael Maison wrote:

Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 2.7.1.

Apache Kafka 2.7.1 is a bugfix release and 40 issues have been fixed
since 2.7.0.

Release notes for the 2.7.1 release:
https://home.apache.org/~mimaison/kafka-2.7.1-rc0/RELEASE_NOTES.html

*** Please download, test and vote by Friday, March 26, 5pm PST

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~mimaison/kafka-2.7.1-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~mimaison/kafka-2.7.1-rc0/javadoc/

* Tag to be voted upon (off 2.7 branch) is the 2.7.1 tag:
https://github.com/apache/kafka/releases/tag/2.7.1-rc0

* Documentation:
https://kafka.apache.org/27/documentation.html

* Protocol:
https://kafka.apache.org/27/protocol.html

* Successful Jenkins builds for the 2.7 branch:
Unit/integration tests:
https://ci-builds.apache.org/job/Kafka/job/kafka-2.7-jdk8/135/

/**

Thanks,
Mickael



Re: [kafka-clients] [VOTE] 2.6.2 RC0

2021-03-19 Thread Bruno Cadonna

Hi Sophie,

Correction to my last e-mail: The bug does not break eos, but it breaks 
at-least-once.


Bruno

On 19.03.21 14:54, Bruno Cadonna wrote:

Hi Sophie,

Please have a look at the following bug report:

https://issues.apache.org/jira/browse/KAFKA-12508

I set its priority to blocker since the bug might break at-least-once 
and exactly-once processing guarantees.


Feel free to set it back to major, if you think that it is not a blocker.

Best,
Bruno

On 12.03.21 19:47, 'Sophie Blee-Goldman' via kafka-clients wrote:

Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 2.6.2.

Apache Kafka 2.6.2 is a bugfix release and fixes 30 issues since the 
2.6.1 release. Please see the release notes for more information.


Release notes for the 2.6.2 release:
https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/RELEASE_NOTES.html 
 



*** Please download, test and vote by Friday, March 19th, 9am PST

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS 

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/ 



* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/ 



* Javadoc:
https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/javadoc/ 



* Tag to be voted upon (off 2.6 branch) is the 2.6.2 tag:
https://github.com/apache/kafka/releases/tag/2.6.2-rc0 



* Documentation:
https://kafka.apache.org/26/documentation.html 



* Protocol:
https://kafka.apache.org/26/protocol.html 



* Successful Jenkins builds for the 2.6 branch:
Unit/integration tests: 
https://ci-builds.apache.org/job/Kafka/job/kafka-2.6-jdk8/105/ 

System tests: 
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4435/  



/**

Thanks,
Sophie

--
You received this message because you are subscribed to the Google
Groups "kafka-clients" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to kafka-clients+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/kafka-clients/CAFLS_9jkHGsj42DT7Og3%3Dov9RbO%3DbEAQX55h0L6YKHJQR9qJpw%40mail.gmail.com.



Re: [kafka-clients] [VOTE] 2.6.2 RC0

2021-03-19 Thread Bruno Cadonna

Hi Sophie,

Please have a look at the following bug report:

https://issues.apache.org/jira/browse/KAFKA-12508

I set its priority to blocker since the bug might break at-least-once 
and exactly-once processing guarantees.


Feel free to set it back to major, if you think that it is not a blocker.

Best,
Bruno

On 12.03.21 19:47, 'Sophie Blee-Goldman' via kafka-clients wrote:

Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 2.6.2.

Apache Kafka 2.6.2 is a bugfix release and fixes 30 issues since the 
2.6.1 release. Please see the release notes for more information.


Release notes for the 2.6.2 release:
https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/RELEASE_NOTES.html 



*** Please download, test and vote by Friday, March 19th, 9am PST

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS 

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/ 



* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/ 



* Javadoc:
https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/javadoc/ 



* Tag to be voted upon (off 2.6 branch) is the 2.6.2 tag:
https://github.com/apache/kafka/releases/tag/2.6.2-rc0 



* Documentation:
https://kafka.apache.org/26/documentation.html 



* Protocol:
https://kafka.apache.org/26/protocol.html 



* Successful Jenkins builds for the 2.6 branch:
Unit/integration tests: 
https://ci-builds.apache.org/job/Kafka/job/kafka-2.6-jdk8/105/ 

System tests: 
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4435/ 



/**

Thanks,
Sophie



Re: [kafka-clients] [VOTE] 2.7.1 RC0

2021-03-19 Thread Bruno Cadonna

Hi Mickael,

Please have a look at the following bug report:

https://issues.apache.org/jira/browse/KAFKA-12508

I set its priority to blocker since the bug might break at-least-once 
and exactly-once processing guarantees.


Feel free to set it back to major, if you think that it is not a blocker.

Best,
Bruno


On 19.03.21 12:26, Mickael Maison wrote:

Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 2.7.1.

Apache Kafka 2.7.1 is a bugfix release and 40 issues have been fixed
since 2.7.0.

Release notes for the 2.7.1 release:
https://home.apache.org/~mimaison/kafka-2.7.1-rc0/RELEASE_NOTES.html

*** Please download, test and vote by Friday, March 26, 5pm PST

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~mimaison/kafka-2.7.1-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~mimaison/kafka-2.7.1-rc0/javadoc/

* Tag to be voted upon (off 2.7 branch) is the 2.7.1 tag:
https://github.com/apache/kafka/releases/tag/2.7.1-rc0

* Documentation:
https://kafka.apache.org/27/documentation.html

* Protocol:
https://kafka.apache.org/27/protocol.html

* Successful Jenkins builds for the 2.7 branch:
Unit/integration tests:
https://ci-builds.apache.org/job/Kafka/job/kafka-2.7-jdk8/135/

/**

Thanks,
Mickael



Re: [VOTE] 2.6.2 RC0

2021-03-19 Thread Mickael Maison
Hi Sophie,

+1 binding

- checked signatures
- built from source
- verified quickstart with Scala 2.13 binaries
- checked javadocs

Thanks for running the release

On Fri, Mar 19, 2021 at 10:13 AM Manikumar  wrote:
>
> Hi,
>
> +1 (binding)
>
> - verified the signatures
> - ran the tests on the source archive with Scala 2.13
> - verified the quickstart with Scala 2.12 binary
> - verified the artifacts, javadoc
>
> Thanks for running the release!
>
> Thanks,
> Manikumar
>
> On Tue, Mar 16, 2021 at 9:41 AM Sophie Blee-Goldman
>  wrote:
>
> > Thanks Luke. I'll make sure to bump this in the kafka-site repo once the
> > release is finalized.
> >
> > On Mon, Mar 15, 2021 at 8:58 PM Luke Chen  wrote:
> >
> > > Hi Sophie,
> > > A small doc update for 2.6.2. I think we missed it.
> > > https://github.com/apache/kafka/pull/10328
> > >
> > > Thanks.
> > > Luke
> > >
> > > On Tue, Mar 16, 2021 at 11:12 AM Guozhang Wang 
> > wrote:
> > >
> > > > Hi Sophie,
> > > >
> > > > I've reviewed the javadocs / release notes / documentations, and they
> > > LGTM.
> > > >
> > > > +1.
> > > >
> > > >
> > > > Guozhang
> > > >
> > > > On Fri, Mar 12, 2021 at 10:48 AM Sophie Blee-Goldman
> > > >  wrote:
> > > >
> > > > > Hello Kafka users, developers and client-developers,
> > > > >
> > > > > This is the first candidate for release of Apache Kafka 2.6.2.
> > > > >
> > > > > Apache Kafka 2.6.2 is a bugfix release and fixes 30 issues since the
> > > > 2.6.1
> > > > > release. Please see the release notes for more information.
> > > > >
> > > > > Release notes for the 2.6.2 release:
> > > > >
> > > https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/RELEASE_NOTES.html
> > > > >
> > > > > *** Please download, test and vote by Friday, March 19th, 9am PST
> > > > >
> > > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > > https://kafka.apache.org/KEYS
> > > > >
> > > > > * Release artifacts to be voted upon (source and binary):
> > > > > https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/
> > > > >
> > > > > * Maven artifacts to be voted upon:
> > > > >
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > > > >
> > > > > * Javadoc:
> > > > > https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/javadoc/
> > > > >
> > > > > * Tag to be voted upon (off 2.6 branch) is the 2.6.2 tag:
> > > > > https://github.com/apache/kafka/releases/tag/2.6.2-rc0
> > > > >
> > > > > * Documentation:
> > > > > https://kafka.apache.org/26/documentation.html
> > > > >
> > > > > * Protocol:
> > > > > https://kafka.apache.org/26/protocol.html
> > > > >
> > > > > * Successful Jenkins builds for the 2.6 branch:
> > > > > Unit/integration tests:
> > > > > https://ci-builds.apache.org/job/Kafka/job/kafka-2.6-jdk8/105/
> > > > > System tests:
> > > > >
> > > https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4435/
> > > > >
> > > > > /**
> > > > >
> > > > > Thanks,
> > > > > Sophie
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >


[VOTE] 2.7.1 RC0

2021-03-19 Thread Mickael Maison
Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 2.7.1.

Apache Kafka 2.7.1 is a bugfix release and 40 issues have been fixed
since 2.7.0.

Release notes for the 2.7.1 release:
https://home.apache.org/~mimaison/kafka-2.7.1-rc0/RELEASE_NOTES.html

*** Please download, test and vote by Friday, March 26, 5pm PST

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~mimaison/kafka-2.7.1-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~mimaison/kafka-2.7.1-rc0/javadoc/

* Tag to be voted upon (off 2.7 branch) is the 2.7.1 tag:
https://github.com/apache/kafka/releases/tag/2.7.1-rc0

* Documentation:
https://kafka.apache.org/27/documentation.html

* Protocol:
https://kafka.apache.org/27/protocol.html

* Successful Jenkins builds for the 2.7 branch:
Unit/integration tests:
https://ci-builds.apache.org/job/Kafka/job/kafka-2.7-jdk8/135/

/**

Thanks,
Mickael


Re: [VOTE] 2.6.2 RC0

2021-03-19 Thread Manikumar
Hi,

+1 (binding)

- verified the signatures
- ran the tests on the source archive with Scala 2.13
- verified the quickstart with Scala 2.12 binary
- verified the artifacts, javadoc

Thanks for running the release!

Thanks,
Manikumar

On Tue, Mar 16, 2021 at 9:41 AM Sophie Blee-Goldman
 wrote:

> Thanks Luke. I'll make sure to bump this in the kafka-site repo once the
> release is finalized.
>
> On Mon, Mar 15, 2021 at 8:58 PM Luke Chen  wrote:
>
> > Hi Sophie,
> > A small doc update for 2.6.2. I think we missed it.
> > https://github.com/apache/kafka/pull/10328
> >
> > Thanks.
> > Luke
> >
> > On Tue, Mar 16, 2021 at 11:12 AM Guozhang Wang 
> wrote:
> >
> > > Hi Sophie,
> > >
> > > I've reviewed the javadocs / release notes / documentations, and they
> > LGTM.
> > >
> > > +1.
> > >
> > >
> > > Guozhang
> > >
> > > On Fri, Mar 12, 2021 at 10:48 AM Sophie Blee-Goldman
> > >  wrote:
> > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > > This is the first candidate for release of Apache Kafka 2.6.2.
> > > >
> > > > Apache Kafka 2.6.2 is a bugfix release and fixes 30 issues since the
> > > 2.6.1
> > > > release. Please see the release notes for more information.
> > > >
> > > > Release notes for the 2.6.2 release:
> > > >
> > https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/RELEASE_NOTES.html
> > > >
> > > > *** Please download, test and vote by Friday, March 19th, 9am PST
> > > >
> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > https://kafka.apache.org/KEYS
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > > https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/
> > > >
> > > > * Maven artifacts to be voted upon:
> > > >
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > > >
> > > > * Javadoc:
> > > > https://home.apache.org/~ableegoldman/kafka-2.6.2-rc0/javadoc/
> > > >
> > > > * Tag to be voted upon (off 2.6 branch) is the 2.6.2 tag:
> > > > https://github.com/apache/kafka/releases/tag/2.6.2-rc0
> > > >
> > > > * Documentation:
> > > > https://kafka.apache.org/26/documentation.html
> > > >
> > > > * Protocol:
> > > > https://kafka.apache.org/26/protocol.html
> > > >
> > > > * Successful Jenkins builds for the 2.6 branch:
> > > > Unit/integration tests:
> > > > https://ci-builds.apache.org/job/Kafka/job/kafka-2.6-jdk8/105/
> > > > System tests:
> > > >
> > https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4435/
> > > >
> > > > /**
> > > >
> > > > Thanks,
> > > > Sophie
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>


Re: Right Use Case For Kafka Streams?

2021-03-19 Thread Ning Zhang
just my 2 cents

the best answer is always from the real-world practices :)

RocksDB (https://rocksdb.org/) is the implementation of the "state store" in
Kafka Streams, and it is an "embedded" kv store (as opposed to a distributed kv
store). The "state store" in Kafka Streams is also backed up by a "changelog"
topic, where the physical kv data is stored.

The performance hit may happen if:
(1) one of the application nodes (that runs Kafka Streams; Kafka Streams is a
library) goes down: the "state store" has to be rebuilt from the changelog
topic, and if the changelog topic is huge, the rebuild time could be long.
(2) the stream topology is complex, with multiple state stores / aggregation
("reduce") operations: the rebuild or recovery time after a failure could be
long.

`num.standby.replicas` should help to significantly reduce the rebuild time,
but it comes with a storage cost, since the "state store" is replicated at a
different node.
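[Editor's note] The standby-replica mitigation mentioned above is a single Streams setting; a minimal config sketch (the application id and bootstrap servers are placeholders):

```properties
# Kafka Streams application config: one warm standby per task trades
# extra storage and replication traffic for much faster failover.
application.id=metrics-aggregator
bootstrap.servers=broker:9092
num.standby.replicas=1
```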





On 2021/03/16 01:11:00, Gareth Collins  wrote: 
> Hi,
> 
> We have a requirement to calculate metrics on a huge number of keys (could
> be hundreds of millions, perhaps billions of keys - attempting caching on
> individual keys in many cases will have almost a 0% cache hit rate). Is
> Kafka Streams with RocksDB and compacting topics the right tool for a task
> like that?
> 
> As well, just from playing with Kafka Streams for a week it feels like it
> wants to create a lot of separate stores by default (if I want to calculate
> aggregates on five, ten and 30 days I will get three separate stores by
> default for this state data). Coming from a different distributed storage
> solution, I feel like I want to put them together in one store as I/O has
> always been my bottleneck (1 big read and 1 big write is better than three
> small separate reads and three small separate writes).
> 
> But am I perhaps missing something here? I don't want to avoid the DSL that
> Kafka Streams provides if I don't have to. Will the Kafka Streams RocksDB
> solution be so much faster than a distributed read that it won't be the
> bottleneck even with huge amounts of data?
> 
> Any info/opinions would be greatly appreciated.
> 
> thanks in advance,
> Gareth Collins
> 
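[Editor's note] The rebuild-from-changelog behavior described above can be modeled as a simple replay loop: records are applied in offset order and the last write per key wins, which is exactly what log compaction preserves. A toy sketch, not the actual Streams restore code:

```python
def restore(changelog):
    """Replay (key, value) changelog records into a fresh state store."""
    state = {}
    for key, value in changelog:      # records arrive in offset order
        if value is None:             # a tombstone deletes the key
            state.pop(key, None)
        else:
            state[key] = value        # later writes overwrite earlier ones
    return state


changelog = [("a", 1), ("b", 2), ("a", 3), ("b", None)]
print(restore(changelog))             # {'a': 3}
```

The loop makes the cost model visible: restore time is proportional to the number of changelog records that must be replayed, which is why a huge changelog means a long rebuild.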


Re: Kafka connect replication using MirrorMaker 2.0

2021-03-19 Thread Ning Zhang
Hi Daniel, MirrorMaker2 creates its own "offsets" topic to track the progress of
consumption.

just my 2 cents - If you already have two Kafka connect clusters in two 
different sites, it sounds practical to:
(1) use "cluster" mode, instead of "dedicated" mode of MirrorMaker2
(2) add one "MirrorMaker" connector on each Kafka connect cluster and do 
"one-way" replication. As you have two "MirrorMaker" connectors, it should 
behave like "active-active" deployment.

Just in case you only have one Kafka Connect cluster, do "active-active"
replication rather than "one-way".

On 2021/03/18 08:15:40, Daniel Beilin  wrote: 
> Hi everyone,
> I'm trying to create an active-active deployment of a kafka cluster between
> two data centers using MirrorMaker2, but I'm facing a problem.
> In my deployment I have Kafka Connect in both sites which each of them
> connect to different database using sink and source connectors (MongoDB
> source connector , JDBC sink/source connector)
> I’d like to know what’s the best practice for active-active is using Kafka
> connect , since I noticed the “connect-offsets” topic is not replicated in
> mm2.
> 
> Best regards,
> Daniel
>