kafka git commit: MINOR: kafka-site introduction section improvements

guozhang Wed, 15 Nov 2017 14:32:25 -0800

Repository: kafka
Updated Branches:
  refs/heads/trunk 54371e63d -> 48f5f048b



MINOR: kafka-site introduction section improvements

*Clarify multi-tenant support, geo-replication, and some grammar fixes.*

Author: Joel Hamill <[email protected]>

Reviewers: Guozhang Wang

Closes #4212 from joel-hamill/intro-cleanup


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/48f5f048
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/48f5f048
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/48f5f048

Branch: refs/heads/trunk
Commit: 48f5f048bc6fd5e059cd1311eb8428f0c1f088e8
Parents: 54371e6
Author: Joel Hamill <[email protected]>
Authored: Wed Nov 15 14:32:00 2017 -0800
Committer: Guozhang Wang <[email protected]>
Committed: Wed Nov 15 14:32:00 2017 -0800

----------------------------------------------------------------------
 docs/introduction.html | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/48f5f048/docs/introduction.html
----------------------------------------------------------------------
diff --git a/docs/introduction.html b/docs/introduction.html
index 5b3bb4a..7f4c3e2 100644
--- a/docs/introduction.html
+++ b/docs/introduction.html
@@ -19,22 +19,21 @@
 
 <script id="introduction-template" type="text/x-handlebars-template">
   <h3> Apache Kafka&reg; is <i>a distributed streaming platform</i>. What 
exactly does that mean?</h3>
-  <p>We think of a streaming platform as having three key capabilities:</p>
-  <ol>
-    <li>It lets you publish and subscribe to streams of records. In this 
respect it is similar to a message queue or enterprise messaging system.
-    <li>It lets you store streams of records in a fault-tolerant way.
-    <li>It lets you process streams of records as they occur.
-  </ol>
-  <p>What is Kafka good for?</p>
-  <p>It gets used for two broad classes of application:</p>
-  <ol>
+  <p>A streaming platform has three key capabilities:</p>
+  <ul>
+    <li>Publish and subscribe to streams of records, similar to a message 
queue or enterprise messaging system.
+    <li>Store streams of records in a fault-tolerant durable way.
+    <li>Process streams of records as they occur.
+  </ul>
+  <p>Kafka is generally used for two broad classes of applications:</p>
+  <ul>
     <li>Building real-time streaming data pipelines that reliably get data 
between systems or applications
     <li>Building real-time streaming applications that transform or react to 
the streams of data
-  </ol>
+  </ul>
   <p>To understand how Kafka does these things, let's dive in and explore 
Kafka's capabilities from the bottom up.</p>
   <p>First a few concepts:</p>
   <ul>
-    <li>Kafka is run as a cluster on one or more servers.
+    <li>Kafka is run as a cluster on one or more servers that can span 
multiple datacenters.
       <li>The Kafka cluster stores streams of <i>records</i> in categories 
called <i>topics</i>.
     <li>Each record consists of a key, a value, and a timestamp.
   </ul>
@@ -60,7 +59,7 @@
   <p> Each partition is an ordered, immutable sequence of records that is 
continually appended to&mdash;a structured commit log. The records in the 
partitions are each assigned a sequential id number called the <i>offset</i> 
that uniquely identifies each record within the partition.
   </p>
   <p>
-  The Kafka cluster retains all published records&mdash;whether or not they 
have been consumed&mdash;using a configurable retention period. For example, if 
the retention policy is set to two days, then for the two days after a record 
is published, it is available for consumption, after which it will be discarded 
to free up space. Kafka's performance is effectively constant with respect to 
data size so storing data for a long time is not a problem.
+  The Kafka cluster durably persists all published records&mdash;whether or 
not they have been consumed&mdash;using a configurable retention period. For 
example, if the retention policy is set to two days, then for the two days 
after a record is published, it is available for consumption, after which it 
will be discarded to free up space. Kafka's performance is effectively constant 
with respect to data size so storing data for a long time is not a problem.
   </p>
   <img class="centered" src="/{{version}}/images/log_consumer.png" 
style="width:400px">
   <p>
@@ -82,6 +81,10 @@
   Each partition has one server which acts as the "leader" and zero or more 
servers which act as "followers". The leader handles all read and write 
requests for the partition while the followers passively replicate the leader. 
If the leader fails, one of the followers will automatically become the new 
leader. Each server acts as a leader for some of its partitions and a follower 
for others so load is well balanced within the cluster.
   </p>
 
+  <h4><a id="intro_geo-replication" 
href="#intro_geo-replication">Geo-Replication</a></h4>
+
+  <p>Kafka MirrorMaker provides geo-replication support for your clusters. 
With MirrorMaker, messages are replicated across multiple datacenters or cloud 
regions. You can use this in active/passive scenarios for backup and recovery; 
or in active/active scenarios to place data closer to your users, or support 
data locality requirements. </p>
+
   <h4><a id="intro_producers" href="#intro_producers">Producers</a></h4>
   <p>
   Producers publish data to the topics of their choice. The producer is 
responsible for choosing which record to assign to which partition within the 
topic. This can be done in a round-robin fashion simply to balance load or it 
can be done according to some semantic partition function (say based on some 
key in the record). More on the use of partitioning in a second!
@@ -111,6 +114,8 @@
   <p>
   Kafka only provides a total order over records <i>within</i> a partition, 
not between different partitions in a topic. Per-partition ordering combined 
with the ability to partition data by key is sufficient for most applications. 
However, if you require a total order over records this can be achieved with a 
topic that has only one partition, though this will mean only one consumer 
process per consumer group.
   </p>
+  <h4><a id="intro_multi-tenancy" 
href="#intro_multi-tenancy">Multi-tenancy</a></h4>
+  <p>You can deploy Kafka as a multi-tenant solution. Multi-tenancy is enabled 
by configuring which topics can produce or consume data. There is also 
operations support for quotas.  Administrators can define and enforce quotas on 
requests to control the broker resources that are used by clients.  For more 
information, see the <a 
href="https://kafka.apache.org/documentation/#security">security 
documentation</a>. </p>
   <h4><a id="intro_guarantees" href="#intro_guarantees">Guarantees</a></h4>
   <p>
   At a high-level Kafka gives the following guarantees:
