miguno commented on a change in pull request #324:
URL: https://github.com/apache/kafka-site/pull/324#discussion_r563562851
##########
File path: 27/ops.html
##########
@@ -553,7 +539,558 @@ <h3 class="anchor-heading"><a id="datacenters"
class="anchor-link"></a><a href="
<p>
It is generally <i>not</i> advisable to run a <i>single</i> Kafka cluster
that spans multiple datacenters over a high-latency link. This will incur very
high replication latency both for Kafka writes and ZooKeeper writes, and
neither Kafka nor ZooKeeper will remain available in all locations if the
network between locations is unavailable.
- <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a
href="#config">6.3 Kafka Configuration</a></h3>
+ <h3 class="anchor-heading"><a id="georeplication" class="anchor-link"></a><a
href="#georeplication">6.3 Geo-Replication (Cross-Cluster Data
Mirroring)</a></h3>
+
+ <h4 class="anchor-heading"><a id="georeplication-overview"
class="anchor-link"></a><a href="#georeplication-overview">Geo-Replication
Overview</a></h4>
+
+ <p>
+ Kafka administrators can define data flows that cross the boundaries of
individual Kafka clusters, data centers, or geo-regions. Such event streaming
setups are often needed for organizational, technical, or legal requirements.
Common scenarios include:
+ </p>
+
+ <ul>
+ <li>Geo-replication</li>
+ <li>Disaster recovery</li>
+ <li>Feeding edge clusters into a central, aggregate cluster</li>
+ <li>Physical isolation of clusters (such as production vs. testing)</li>
+ <li>Cloud migration or hybrid cloud deployments</li>
+ <li>Legal and compliance requirements</li>
+ </ul>
+
+ <p>
+ Administrators can set up such inter-cluster data flows with Kafka's
MirrorMaker (version 2), a tool to replicate data between different Kafka
environments in a streaming manner. MirrorMaker is built on top of the Kafka
Connect framework and supports features such as:
+ </p>
+
+ <ul>
+ <li>Replicates topics (data plus configurations)</li>
+ <li>Replicates consumer groups including offsets to migrate applications
between clusters</li>
+ <li>Replicates ACLs</li>
+ <li>Preserves partitioning</li>
+ <li>Automatically detects new topics and partitions</li>
+ <li>Provides a wide range of metrics, such as end-to-end replication
latency across multiple data centers/clusters</li>
+ <li>Fault-tolerant and horizontally scalable operations</li>
+ </ul>
+
+ <p>
+ <em>Note: Geo-replication with MirrorMaker replicates data across Kafka
clusters. This inter-cluster replication is different from Kafka's <a
href="#replication">intra-cluster replication</a>, which replicates data within
the same Kafka cluster.</em>
+ </p>
+
+ <h4 class="anchor-heading"><a id="georeplication-flows"
class="anchor-link"></a><a href="#georeplication-flows">What Are Replication
Flows</a></h4>
+
+ <p>
+ With MirrorMaker, Kafka administrators can replicate topics, topic
configurations, consumer groups and their offsets, and ACLs from one or more
source Kafka clusters to one or more target Kafka clusters, i.e., across
cluster environments. In a nutshell, MirrorMaker uses connectors to consume
data from the source clusters and produce it to the target clusters.
Review comment:
Thanks, @ryannedolan. Text updated.