miguno commented on a change in pull request #324:
URL: https://github.com/apache/kafka-site/pull/324#discussion_r563562851
##########
File path: 27/ops.html
##########
@@ -553,7 +539,558 @@ <h3 class="anchor-heading"><a id="datacenters" class="anchor-link"></a><a href="
  <p>
  It is generally <i>not</i> advisable to run a <i>single</i> Kafka cluster that spans multiple datacenters over a high-latency link. This will incur very high replication latency both for Kafka writes and ZooKeeper writes, and neither Kafka nor ZooKeeper will remain available in all locations if the network between locations is unavailable.
- <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.3 Kafka Configuration</a></h3>
+ <h3 class="anchor-heading"><a id="georeplication" class="anchor-link"></a><a href="#georeplication">6.3 Geo-Replication (Cross-Cluster Data Mirroring)</a></h3>
+
+ <h4 class="anchor-heading"><a id="georeplication-overview" class="anchor-link"></a><a href="#georeplication-overview">Geo-Replication Overview</a></h4>
+
+ <p>
+ Kafka administrators can define data flows that cross the boundaries of individual Kafka clusters, data centers, or geo-regions. Such event streaming setups are often needed for organizational, technical, or legal requirements. Common scenarios include:
+ </p>
+
+ <ul>
+ <li>Geo-replication</li>
+ <li>Disaster recovery</li>
+ <li>Feeding edge clusters into a central, aggregate cluster</li>
+ <li>Physical isolation of clusters (such as production vs. testing)</li>
+ <li>Cloud migration or hybrid cloud deployments</li>
+ <li>Legal and compliance requirements</li>
+ </ul>
+
+ <p>
+ Administrators can set up such inter-cluster data flows with Kafka's MirrorMaker (version 2), a tool to replicate data between different Kafka environments in a streaming manner. MirrorMaker is built on top of the Kafka Connect framework and supports features such as:
+ </p>
+
+ <ul>
+ <li>Replicates topics (data plus configurations)</li>
+ <li>Replicates consumer groups including offsets to migrate applications between clusters</li>
+ <li>Replicates ACLs</li>
+ <li>Preserves partitioning</li>
+ <li>Automatically detects new topics and partitions</li>
+ <li>Provides a wide range of metrics, such as end-to-end replication latency across multiple data centers/clusters</li>
+ <li>Fault-tolerant and horizontally scalable operations</li>
+ </ul>
+
+ <p>
+ <em>Note: Geo-replication with MirrorMaker replicates data across Kafka clusters. This inter-cluster replication is different from Kafka's <a href="#replication">intra-cluster replication</a>, which replicates data within the same Kafka cluster.</em>
+ </p>
+
+ <h4 class="anchor-heading"><a id="georeplication-flows" class="anchor-link"></a><a href="#georeplication-flows">What Are Replication Flows</a></h4>
+
+ <p>
+ With MirrorMaker, Kafka administrators can replicate topics, topic configurations, consumer groups and their offsets, and ACLs from one or more source Kafka clusters to one or more target Kafka clusters, i.e., across cluster environments. In a nutshell, MirrorMaker consumes data from the source cluster with source connectors, and then replicates the data by producing to the target cluster with sink connectors.

Review comment:
Thanks, @ryannedolan. Text updated.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org