[GitHub] [kafka-site] miguno commented on a change in pull request #324: KAFKA-8930: MirrorMaker v2 documentation
miguno commented on a change in pull request #324:
URL: https://github.com/apache/kafka-site/pull/324#discussion_r563562309

## File path: 27/ops.html

@@ -553,7 +539,558 @@ 6.3 Kafka Configuration
+ 6.3 Geo-Replication (Cross-Cluster Data Mirroring)
+
+ Geo-Replication Overview
+
+Kafka administrators can define data flows that cross the boundaries of individual Kafka clusters, data centers, or geo-regions. Such event streaming setups are often needed for organizational, technical, or legal requirements. Common scenarios include:
+
+Geo-replication
+Disaster recovery
+Feeding edge clusters into a central, aggregate cluster
+Physical isolation of clusters (such as production vs. testing)
+Cloud migration or hybrid cloud deployments
+Legal and compliance requirements
+
+Administrators can set up such inter-cluster data flows with Kafka's MirrorMaker (version 2), a tool to replicate data between different Kafka environments in a streaming manner. MirrorMaker is built on top of the Kafka Connect framework and supports features such as:
+
+Replicates topics (data plus configurations)
+Replicates consumer groups including offsets to migrate applications between clusters
+Replicates ACLs
+Preserves partitioning
+Automatically detects new topics and partitions
+Provides a wide range of metrics, such as end-to-end replication latency across multiple data centers/clusters
+Fault-tolerant and horizontally scalable operations
+
+ Note: Geo-replication with MirrorMaker replicates data across Kafka clusters. This inter-cluster replication is different from Kafka's intra-cluster replication, which replicates data within the same Kafka cluster.
+
+ What Are Replication Flows
+
+With MirrorMaker, Kafka administrators can replicate topics, topic configurations, consumer groups and their offsets, and ACLs from one or more source Kafka clusters to one or more target Kafka clusters, i.e., across cluster environments. In a nutshell, MirrorMaker consumes data from the source cluster with source connectors, and then replicates the data by producing to the target cluster with sink connectors.
+
+These directional flows from source to target clusters are called replication flows. They are defined with the format {source_cluster}->{target_cluster} in the MirrorMaker configuration file as described later. Administrators can create complex replication topologies based on these flows.
+
+Here are some example patterns:
+
+Active/Active high availability deployments: A->B, B->A
+Active/Passive or Active/Standby high availability deployments: A->B
+Aggregation (e.g., from many clusters to one): A->K, B->K, C->K
+Fan-out (e.g., from one to many clusters): K->A, K->B, K->C
+Forwarding: A->B, B->C, C->D
+
+By default, a flow replicates all topics and consumer groups. However, each replication flow can be configured independently. For instance, you can define that only specific topics or consumer groups are replicated from the source cluster to the target cluster.
+
+Here is a first example on how to configure data replication from a primary cluster to a secondary cluster (an active/passive setup):
+
+# Basic settings
+clusters = primary, secondary
+primary.bootstrap.servers = broker3-primary:9092
+secondary.bootstrap.servers = broker5-secondary:9092
+
+# Define replication flows
+primary->secondary.enable = true
+primary->secondary.topics = foobar-topic, quux-.*
+
+ Configuring Geo-Replication
+
+The following sections describe how to configure and run a dedicated MirrorMaker cluster. If you want to run MirrorMaker within an existing Kafka Connect cluster or other supported deployment setups, please refer to KIP-382: MirrorMaker 2.0 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0) and be aware that the names of configuration settings may vary between deployment modes.
+
+Beyond what's covered in the following sections, further examples and information on configuration settings are available at:
+
+ MirrorMakerConfig (https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorMakerConfig.java), MirrorConnectorConfig (https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorConnectorConfig.java)
+ DefaultTopicFilter (https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/DefaultTopicFilter.java) for topics, DefaultGroupFilter (https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/DefaultGroupFilter.java) for consumer groups
+ Example configuration settings in
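
For comparison with the active/passive example quoted in the diff, the Active/Active pattern (A->B, B->A) could be sketched as follows. This is only an illustration: the cluster aliases, broker host names, and topic patterns are hypothetical, and only settings that already appear in the quoted example (`clusters`, `bootstrap.servers`, `enable`, `topics`) are used.

```properties
# Hypothetical cluster aliases and broker addresses, for illustration only
clusters = A, B
A.bootstrap.servers = broker1-a:9092
B.bootstrap.servers = broker1-b:9092

# Active/active: one replication flow per direction
A->B.enable = true
B->A.enable = true

# Each flow can be restricted independently (hypothetical topic patterns)
A->B.topics = orders-.*
B->A.topics = payments-.*
```

Because each flow is configured on its own, removing either `topics` line should return that direction to the default of replicating all topics.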
[GitHub] [kafka-site] miguno commented on a change in pull request #324: KAFKA-8930: MirrorMaker v2 documentation
miguno commented on a change in pull request #324:
URL: https://github.com/apache/kafka-site/pull/324#discussion_r563562851

## File path: 27/ops.html

@@ -553,7 +539,558 @@ 6.3 Kafka Configuration
+ 6.3 Geo-Replication (Cross-Cluster Data Mirroring)
+
+ Geo-Replication Overview
+
+Kafka administrators can define data flows that cross the boundaries of individual Kafka clusters, data centers, or geo-regions. Such event streaming setups are often needed for organizational, technical, or legal requirements. Common scenarios include:
+
+Geo-replication
+Disaster recovery
+Feeding edge clusters into a central, aggregate cluster
+Physical isolation of clusters (such as production vs. testing)
+Cloud migration or hybrid cloud deployments
+Legal and compliance requirements
+
+Administrators can set up such inter-cluster data flows with Kafka's MirrorMaker (version 2), a tool to replicate data between different Kafka environments in a streaming manner. MirrorMaker is built on top of the Kafka Connect framework and supports features such as:
+
+Replicates topics (data plus configurations)
+Replicates consumer groups including offsets to migrate applications between clusters
+Replicates ACLs
+Preserves partitioning
+Automatically detects new topics and partitions
+Provides a wide range of metrics, such as end-to-end replication latency across multiple data centers/clusters
+Fault-tolerant and horizontally scalable operations
+
+ Note: Geo-replication with MirrorMaker replicates data across Kafka clusters. This inter-cluster replication is different from Kafka's intra-cluster replication, which replicates data within the same Kafka cluster.
+
+ What Are Replication Flows
+
+With MirrorMaker, Kafka administrators can replicate topics, topic configurations, consumer groups and their offsets, and ACLs from one or more source Kafka clusters to one or more target Kafka clusters, i.e., across cluster environments. In a nutshell, MirrorMaker consumes data from the source cluster with source connectors, and then replicates the data by producing to the target cluster with sink connectors.

Review comment:
Thanks, @ryannedolan. Text updated.