Hello Adam, thanks for the questions. Yes, my organization uses Streams, and yes, you can use Streams with MM2/KIP-382, though perhaps not in the way you are describing.
The architecture you mention is more "active/standby" than "active/active" IMO. The "secondary" cluster is not being used until a failure, at which point you migrate your app and expect the data to already be there. This works for normal consumers, where you can seek() and --reset-offsets. Streams apps can be reset with the kafka-streams-application-reset tool, but as you point out, that doesn't help with rebuilding an app's internal state, which would be missing on the secondary cluster. (Granted, that may be okay depending on your particular application.)

A true "active/active" solution IMO would be to run the same Streams app in _both_ clusters (primary and secondary), s.t. the entire application state is available and continuously updated in both clusters. As with normal consumers, the Streams app should subscribe to any remote topics, e.g. with a regex, s.t. the application state reflects input from either source cluster.

This is essentially what Streams' "standby replicas" are -- extra copies of application state to support quicker failover. Without these replicas, Streams would need to start back at offset 0 and re-process everything in order to rebuild state (which you especially don't want to do during a disaster!). The same logic applies to using Streams with MM2: you _could_ fail over by resetting the app and rebuilding all the missing state, or you could have a copy of everything sitting there, ready when you need it. The easiest way to do the latter is to run your app in both clusters.

Hope that helps.

Ryanne

On Mon, Jul 22, 2019 at 3:11 PM Adam Bellemare <adam.bellem...@gmail.com> wrote:

> Hi Ryanne,
>
> I have a quick question for you about Active+Active replication and Kafka
> Streams. First, does your org / do you use Kafka Streams? If not, then I
> think this conversation can end here. ;)
>
> Secondly, and for the broader Kafka dev group -- what happens if I want to
> use Active+Active replication with my Kafka Streams app, say, to
> materialize a simple KTable? Based on my understanding, a topic "table" on
> the primary cluster will be replicated to the secondary cluster as
> "primary.table". In the case of a full cluster failure for primary, the
> producer to topic "table" on the primary switches over to the secondary
> cluster, creates its own "table" topic, and continues to write there. So
> now, assuming we have had no data loss, we end up with:
>
> *Primary Cluster: (Dead)*
>
> *Secondary Cluster: (Live)*
> Topic: "primary.table" (contains data from T = 0 to T = n)
> Topic: "table" (contains data from T = n+1 to now)
>
> If I want to materialize state using Kafka Streams, obviously I am now in
> a bit of a pickle, since I need to consume "primary.table" before I
> consume "table". Have you encountered rebuilding state in Kafka Streams
> using Active-Active? For non-Kafka Streams apps I can see using one
> consumer for "primary.table" and one for "table", interleaving the
> timestamps and performing basic event dispatching based on my own tracked
> stream-time, but for Kafka Streams I don't think a solution to this
> exists.
>
> If you have any thoughts on this or some recommendations for Kafka Streams
> with Active-Active I would be very appreciative.
>
> Thanks,
> Adam
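As a concrete example of the "standby replicas" mentioned above: within a single cluster, Kafka Streams controls this with the `num.standby.replicas` config (0 by default), e.g. in the app's properties:

```properties
# Keep one warm copy of each state store on another app instance,
# so failover does not require replaying from offset 0.
num.standby.replicas=1
```

Running the same app in both clusters extends this idea across clusters rather than across instances.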
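For what it's worth, the "subscribe to any remote topics, e.g. with a regex" suggestion boils down to one topic pattern that matches both the local topic and its MM2 remote counterpart. This is a minimal sketch using plain Python `re` in place of an actual Kafka client subscription; the topic names come from the example in the thread, and "primary" is assumed to be the MM2 source-cluster alias:

```python
import re

# Hypothetical topic list, standing in for what a cluster might expose.
# "primary" is the MM2 source-cluster alias, so the remote copy of
# "table" shows up as "primary.table".
topics = ["table", "primary.table", "unrelated.topic"]

# One pattern covering both the local topic and its remote counterpart,
# analogous to subscribing a Streams app (or consumer) with a topic regex.
pattern = re.compile(r"(primary\.)?table")

subscribed = [t for t in topics if pattern.fullmatch(t)]
print(subscribed)  # ['table', 'primary.table']
```

With a subscription like this running in both clusters, each app instance materializes state from both the local and the replicated input, which is what makes the warm failover possible.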
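On the non-Streams workaround in the question -- two consumers interleaved by timestamp against your own tracked stream-time -- the dispatch logic amounts to a timestamp-ordered merge. A minimal sketch, with plain lists of (timestamp, value) records standing in for what the two consumers would poll (illustration only, not a Kafka API):

```python
import heapq

# Hypothetical timestamped records; these lists stand in for records
# polled from the two topics in the example.
primary_table = [(0, "a"), (1, "b"), (2, "c")]  # "primary.table": T = 0..n
table = [(3, "d"), (4, "e")]                    # "table": T = n+1 onward

# Dispatch records in event-time order across both topics, i.e. track a
# single stream-time and always process the earliest available record.
merged = list(heapq.merge(primary_table, table, key=lambda r: r[0]))
print([v for _, v in merged])  # ['a', 'b', 'c', 'd', 'e']
```

This rebuilds state in event-time order across the pre- and post-failover topics, which is exactly what Streams' single-topic subscription can't express on its own.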