Hello Adam, thanks for the questions. Yes, my organization uses Streams, and
yes you can use Streams with MM2/KIP-382, though perhaps not in the way you
are describing.

The architecture you mention is more "active/standby" than "active/active"
IMO. The "secondary" cluster is not being used until a failure, at which
point you migrate your app and expect the data to already be there. This
works for normal consumers, where you can seek() or use --reset-offsets.
Streams apps can be reset with the kafka-streams-application-reset tool,
but as you point out, that doesn't help with rebuilding an app's internal
state, which would be missing on the secondary cluster. (Granted, that may
be okay depending on your particular application.)
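For reference, an invocation of the reset tool looks roughly like this (broker address, application id, and topic name here are placeholders, not from your setup):

```shell
# Reset the app's committed offsets and delete its internal topics so it
# can reprocess input from scratch. Note: this does NOT restore state
# store contents -- state still has to be rebuilt by reprocessing.
kafka-streams-application-reset \
  --bootstrap-servers broker1:9092 \
  --application-id my-streams-app \
  --input-topics table
```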

A true "active/active" solution IMO would be to run your same Streams app
in _both_ clusters (primary, secondary), s.t. the entire application state
is available and continuously updated in both clusters. As with normal
consumers, the Streams app should subscribe to any remote topics, e.g. with
a regex, s.t. the application state will reflect input from either source
cluster.
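Concretely, the regex subscription can be done by passing a Pattern to the builder. A minimal sketch of the matching logic (topic names and the "primary." prefix follow MM2's default ReplicationPolicy; the surrounding Streams wiring is only indicated in comments):

```java
import java.util.regex.Pattern;

public class TopicPattern {
    // Matches both the local topic "table" and the MM2-replicated copy
    // "primary.table" (MM2's default ReplicationPolicy prefixes remote
    // topics with "<source-cluster>.").
    static final Pattern INPUT = Pattern.compile("(primary\\.)?table");

    public static void main(String[] args) {
        // In the Streams app, this pattern would be handed straight to
        // the topology, e.g. builder.stream(INPUT), so the same app
        // consumes records originating from either source cluster.
        System.out.println(INPUT.matcher("table").matches());          // true
        System.out.println(INPUT.matcher("primary.table").matches());  // true
        System.out.println(INPUT.matcher("orders").matches());         // false
    }
}
```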

This is essentially what Streams' "standby replicas" are -- extra copies of
application state to support quicker failover. Without these replicas,
Streams would need to start back at offset 0 and re-process everything in
order to rebuild state (which you don't want to do during a disaster,
especially!). The same logic applies to using Streams with MM2. You _could_
failover by resetting the app and rebuilding all the missing state, or you
could have a copy of everything sitting there ready when you need it. The
easiest way to do the latter is to run your app in both clusters.
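For completeness, standby replicas within a single cluster are just a config knob; something like the following sketch (app id and broker address are placeholders):

```java
import java.util.Properties;

public class StandbyConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder values -- substitute your own app id / brokers.
        props.put("application.id", "my-streams-app");
        props.put("bootstrap.servers", "broker1:9092");
        // Keep one extra, continuously-updated copy of each task's state
        // on another instance, so failover doesn't have to replay the
        // changelog from offset 0.
        props.put("num.standby.replicas", "1");
        System.out.println(props.getProperty("num.standby.replicas"));
    }
}
```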

Hope that helps.

Ryanne

On Mon, Jul 22, 2019 at 3:11 PM Adam Bellemare <adam.bellem...@gmail.com>
wrote:

> Hi Ryanne
>
> I have a quick question for you about Active+Active replication and Kafka
> Streams. First, does your org use / do you use Kafka Streams? If not, then
> I think this conversation can end here. ;)
>
> Secondly, and for the broader Kafka Dev group - what happens if I want to
> use Active+Active replication with my Kafka Streams app, say, to
> materialize a simple KTable? Based on my understanding, a topic "table" on
> the primary cluster will be replicated to the secondary cluster as
> "primary.table". In the case of a full cluster failure for primary, the
> producer to topic "table" on the primary switches over to the secondary
> cluster, creates its own "table" topic and continues to write to there. So
> now, assuming we have had no data loss, we end up with:
>
>
> *Primary Cluster: (Dead)*
>
>
> *Secondary Cluster: (Live)*
> Topic: "primary.table" (contains data from T = 0 to T = n)
> Topic: "table" (contains data from T = n+1 to now)
>
> If I want to materialize state using Kafka Streams, obviously I am
> now in a bit of a pickle since I need to consume "primary.table" before I
> consume "table". Have you encountered rebuilding state in Kafka Streams
> using Active-Active? For non-Kafka Streams I can see using a single
> consumer for "primary.table" and one for "table", interleaving the
> timestamps and performing basic event dispatching based on my own tracked
> stream-time, but for Kafka Streams I don't think there exists a solution to
> this.
>
> If you have any thoughts on this or some recommendations for Kafka Streams
> with Active-Active I would be very appreciative.
>
> Thanks
> Adam
>
>
>
