kafka git commit: KAFKA-4244; Fix formatting issues in documentation

jgus Mon, 10 Oct 2016 13:42:36 -0700

Repository: kafka
Updated Branches:
  refs/heads/trunk 7a5133d55 -> bf98c4738



KAFKA-4244; Fix formatting issues in documentation

Author: Gwen Shapira <[email protected]>

Reviewers: Jason Gustafson <[email protected]>

Closes #1966 from gwenshap/KAFKA-4244


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/bf98c473
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/bf98c473
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/bf98c473

Branch: refs/heads/trunk
Commit: bf98c47389baa735aab9cfbf513190a8205447f9
Parents: 7a5133d
Author: Gwen Shapira <[email protected]>
Authored: Mon Oct 10 13:35:11 2016 -0700
Committer: Jason Gustafson <[email protected]>
Committed: Mon Oct 10 13:39:05 2016 -0700

----------------------------------------------------------------------
 docs/documentation.html |   4 +-
 docs/introduction.html  | 112 +++++++++++++++++++++++++++----------------
 docs/protocol.html      |  12 ++++-
 docs/quickstart.html    |  84 +++++++++++++++++++-------------
 docs/uses.html          |   2 +-
 5 files changed, 134 insertions(+), 80 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/bf98c473/docs/documentation.html
----------------------------------------------------------------------
diff --git a/docs/documentation.html b/docs/documentation.html
index 07ffe84..2f4b8df 100644
--- a/docs/documentation.html
+++ b/docs/documentation.html
@@ -15,9 +15,7 @@
  limitations under the License.
 -->
 
-<!--#include virtual="../includes/header.html" -->
-
-<h1>Kafka 0.10.0 Documentation</h1>
+<h3>Kafka 0.10.0 Documentation</h3>
 Prior releases: <a href="/07/documentation.html">0.7.x</a>, <a 
href="/08/documentation.html">0.8.0</a>, <a 
href="/081/documentation.html">0.8.1.X</a>, <a 
href="/082/documentation.html">0.8.2.X</a>, <a 
href="/090/documentation.html">0.9.0.X</a>.
 </ul>
 

http://git-wip-us.apache.org/repos/asf/kafka/blob/bf98c473/docs/introduction.html
----------------------------------------------------------------------
diff --git a/docs/introduction.html b/docs/introduction.html
index 3f03fc1..484c0e7 100644
--- a/docs/introduction.html
+++ b/docs/introduction.html
@@ -14,156 +14,186 @@
  See the License for the specific language governing permissions and
  limitations under the License.
 -->
-Kafka is <i>a distributed streaming platform</i>. What exactly does that mean?
-<p>
-We think of a streaming platform as having three key capabilities:
+<h3> Kafka is <i>a distributed streaming platform</i>. What exactly does that 
mean?</h3>
+<p>We think of a streaming platform as having three key capabilities:</p>
 <ol>
        <li>It lets you publish and subscribe to streams of records. In this 
respect it is similar to a message queue or enterprise messaging system.
        <li>It lets you store streams of records in a fault-tolerant way.
        <li>It lets you process streams of records as they occur.
 </ol>
-<p>
-What is Kafka good for?
-<p>
-It gets used for two broad classes of application:
+<p>What is Kafka good for?</p>
+<p>It gets used for two broad classes of application:</p>
 <ol>
   <li>Building real-time streaming data pipelines that reliably get data 
between systems or applications
   <li>Building real-time streaming applications that transform or react to the 
streams of data
 </ol>
-<p>
-To understand how Kafka does these things, let's dive in and explore Kafka's 
capabilities from the bottom up.
-<p>
-First a few concepts:
+<p>To understand how Kafka does these things, let's dive in and explore 
Kafka's capabilities from the bottom up.</p>
+<p>First a few concepts:</p>
 <ul>
        <li>Kafka is run as a cluster on one or more servers.
     <li>The Kafka cluster stores streams of <i>records</i> in categories 
called <i>topics</i>.
        <li>Each record consists of a key, a value, and a timestamp.
 </ul>
-Kafka has four core APIs:
-<div style="float: right">
-  <img src="images/kafka-apis.png" style="width:400px">
-</div>
-<ul>
+<p>Kafka has four core APIs:</p>
+<div style="overflow: hidden;">
+    <ul style="float: left; width: 40%;">
     <li>The <a href="/documentation.html#producerapi">Producer API</a> allows 
an application to publish a stream of records to one or more Kafka topics.
     <li>The <a href="/documentation.html#consumerapi">Consumer API</a> allows 
an application to subscribe to one or more topics and process the stream of 
records produced to them.
        <li>The <a href="/documentation.html#streams">Streams API</a> allows an 
application to act as a <i>stream processor</i>, consuming an input stream from 
one or more topics and producing an output stream to one or more output topics, 
effectively transforming the input streams to output streams.
        <li>The <a href="/documentation.html#connect">Connector API</a> allows 
building and running reusable producers or consumers that connect Kafka topics 
to existing applications or data systems. For example, a connector to a 
relational database might capture every change to a table.
 </ul>
+    <img src="images/kafka-apis.png" style="float: right; width: 50%;">
+    </div>
 <p>
-In Kafka the communication between the clients and the servers is done with a 
simple, high-performance, language-agnostic <a 
href="https://kafka.apache.org/protocol.html">TCP protocol</a>. This protocol 
is versioned and maintains backwards compatibility with older versions. We 
provide a Java client for Kafka, but clients are available in <a 
href="https://cwiki.apache.org/confluence/display/KAFKA/Clients">many 
languages</a>.
+In Kafka the communication between the clients and the servers is done with a 
simple, high-performance, language-agnostic <a 
href="https://kafka.apache.org/protocol.html">TCP protocol</a>. This protocol 
is versioned and maintains backwards compatibility with older versions. We 
provide a Java client for Kafka, but clients are available in <a 
href="https://cwiki.apache.org/confluence/display/KAFKA/Clients">many 
languages</a>.</p>
 
 <h4><a id="intro_topics" href="#intro_topics">Topics and Logs</a></h4>
-Let's first dive into the core abstraction Kafka provides for a stream of 
records&mdash;the topic.
-<p>
-A topic is a category or feed name to which records are published. Topics in 
Kafka are always multi-subscriber; that is, a topic can have zero, one, or many 
consumers that subscribe to the data written to it.
-<p>
-For each topic, the Kafka cluster maintains a partitioned log that looks like 
this:
-<div style="text-align: center; width: 100%">
-  <img src="images/log_anatomy.png">
-</div>
-Each partition is an ordered, immutable sequence of records that is 
continually appended to&mdash;a structured commit log. The records in the 
partitions are each assigned a sequential id number called the <i>offset</i> 
that uniquely identifies each record within the partition.
+<p>Let's first dive into the core abstraction Kafka provides for a stream of 
records&mdash;the topic.</p>
+<p>A topic is a category or feed name to which records are published. Topics 
in Kafka are always multi-subscriber; that is, a topic can have zero, one, or 
many consumers that subscribe to the data written to it.</p>
+<p> For each topic, the Kafka cluster maintains a partitioned log that looks 
like this: </p>
+<img src="images/log_anatomy.png">
+
+<p> Each partition is an ordered, immutable sequence of records that is 
continually appended to&mdash;a structured commit log. The records in the 
partitions are each assigned a sequential id number called the <i>offset</i> 
that uniquely identifies each record within the partition.
+</p>
 <p>
 The Kafka cluster retains all published records&mdash;whether or not they have 
been consumed&mdash;using a configurable retention period. For example if the 
retention policy is set to two days, then for the two days after a record is 
published, it is available for consumption, after which it will be discarded to 
free up space. Kafka's performance is effectively constant with respect to data 
size so storing data for a long time is not a problem.
+</p>
+<img class="centered" src="images/log_consumer.png" style="width:400px">
 <p>
-<div style="float:right">
-  <img src="images/log_consumer.png" style="width:400px">
-</div>
 In fact, the only metadata retained on a per-consumer basis is the offset or 
position of that consumer in the log. This offset is controlled by the 
consumer: normally a consumer will advance its offset linearly as it reads 
records, but, in fact, since the position is controlled by the consumer it can 
consume records in any order it likes. For example a consumer can reset to an 
older offset to reprocess data from the past or skip ahead to the most recent 
record and start consuming from "now".
+</p>
 <p>
 This combination of features means that Kafka consumers are very 
cheap&mdash;they can come and go without much impact on the cluster or on other 
consumers. For example, you can use our command line tools to "tail" the 
contents of any topic without changing what is consumed by any existing 
consumers.
+</p>
 <p>
 The partitions in the log serve several purposes. First, they allow the log to 
scale beyond a size that will fit on a single server. Each individual partition 
must fit on the servers that host it, but a topic may have many partitions so 
it can handle an arbitrary amount of data. Second they act as the unit of 
parallelism&mdash;more on that in a bit.
+</p>
 
 <h4><a id="intro_distribution" href="#intro_distribution">Distribution</a></h4>
 
+<p>
 The partitions of the log are distributed over the servers in the Kafka 
cluster with each server handling data and requests for a share of the 
partitions. Each partition is replicated across a configurable number of 
servers for fault tolerance.
+</p>
 <p>
 Each partition has one server which acts as the "leader" and zero or more 
servers which act as "followers". The leader handles all read and write 
requests for the partition while the followers passively replicate the leader. 
If the leader fails, one of the followers will automatically become the new 
leader. Each server acts as a leader for some of its partitions and a follower 
for others so load is well balanced within the cluster.
+</p>
 
 <h4><a id="intro_producers" href="#intro_producers">Producers</a></h4>
-
+<p>
 Producers publish data to the topics of their choice. The producer is 
responsible for choosing which record to assign to which partition within the 
topic. This can be done in a round-robin fashion simply to balance load or it 
can be done according to some semantic partition function (say based on some 
key in the record). More on the use of partitioning in a second!
+</p>
 
 <h4><a id="intro_consumers" href="#intro_consumers">Consumers</a></h4>
 
+<p>
 Consumers label themselves with a <i>consumer group</i> name, and each record 
published to a topic is delivered to one consumer instance within each 
subscribing consumer group. Consumer instances can be in separate processes or 
on separate machines.
+</p>
 <p>
-If all the consumer instances have the same consumer group, then the records 
will effectively be load balanced over the consumer instances.
+If all the consumer instances have the same consumer group, then the records 
will effectively be load balanced over the consumer instances.</p>
 <p>
 If all the consumer instances have different consumer groups, then each record 
will be broadcast to all the consumer processes.
-<div style="float: right; margin: 20px; width: 500px" class="caption">
-  <img src="images/consumer-groups.png"><br>
+</p>
+<img class="centered" src="images/consumer-groups.png">
+<p>
   A two server Kafka cluster hosting four partitions (P0-P3) with two consumer 
groups. Consumer group A has two consumer instances and group B has four.
-</div>
+</p>
+
 <p>
 More commonly, however, we have found that topics have a small number of 
consumer groups, one for each "logical subscriber". Each group is composed of 
many consumer instances for scalability and fault tolerance. This is nothing 
more than publish-subscribe semantics where the subscriber is a cluster of 
consumers instead of a single process.
+</p>
 <p>
 The way consumption is implemented in Kafka is by dividing up the partitions 
in the log over the consumer instances so that each instance is the exclusive 
consumer of a "fair share" of partitions at any point in time. This process of 
maintaining membership in the group is handled by the Kafka protocol 
dynamically. If new instances join the group they will take over some 
partitions from other members of the group; if an instance dies, its partitions 
will be distributed to the remaining instances.
+</p>
 <p>
 Kafka only provides a total order over records <i>within</i> a partition, not 
between different partitions in a topic. Per-partition ordering combined with 
the ability to partition data by key is sufficient for most applications. 
However, if you require a total order over records this can be achieved with a 
topic that has only one partition, though this will mean only one consumer 
process per consumer group.
-
+</p>
 <h4><a id="intro_guarantees" href="#intro_guarantees">Guarantees</a></h4>
-
+<p>
 At a high level, Kafka gives the following guarantees:
+</p>
 <ul>
   <li>Messages sent by a producer to a particular topic partition will be 
appended in the order they are sent. That is, if a record M1 is sent by the 
same producer as a record M2, and M1 is sent first, then M1 will have a lower 
offset than M2 and appear earlier in the log.
   <li>A consumer instance sees records in the order they are stored in the log.
   <li>For a topic with replication factor N, we will tolerate up to N-1 server 
failures without losing any records committed to the log.
 </ul>
+<p>
 More details on these guarantees are given in the design section of the 
documentation.
-
+</p>
 <h4><a id="kafka_mq" href="#kafka_mq">Kafka as a Messaging System</a></h4>
-
+<p>
 How does Kafka's notion of streams compare to a traditional enterprise 
messaging system?
+</p>
 <p>
 Messaging traditionally has two models: <a 
href="http://en.wikipedia.org/wiki/Message_queue">queuing</a> and <a 
href="http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern">publish-subscribe</a>.
 In a queue, a pool of consumers may read from a server and each record goes to 
one of them; in publish-subscribe the record is broadcast to all consumers. 
Each of these two models has a strength and a weakness. The strength of queuing 
is that it allows you to divide up the processing of data over multiple 
consumer instances, which lets you scale your processing. Unfortunately queues 
aren't multi-subscriber&mdash;once one process reads the data it's gone. 
Publish-subscribe allows you to broadcast data to multiple processes, but has no 
way of scaling processing since every message goes to every subscriber.
+</p>
 <p>
 The consumer group concept in Kafka generalizes these two concepts. As with a 
queue the consumer group allows you to divide up processing over a collection 
of processes (the members of the consumer group). As with publish-subscribe, 
Kafka allows you to broadcast messages to multiple consumer groups.
+</p>
 <p>
 The advantage of Kafka's model is that every topic has both these 
properties&mdash;it can scale processing and is also 
multi-subscriber&mdash;there is no need to choose one or the other.
+</p>
 <p>
 Kafka has stronger ordering guarantees than a traditional messaging system, 
too.
+</p>
 <p>
 A traditional queue retains records in-order on the server, and if multiple 
consumers consume from the queue then the server hands out records in the order 
they are stored. However, although the server hands out records in order, the 
records are delivered asynchronously to consumers, so they may arrive out of 
order on different consumers. This effectively means the ordering of the 
records is lost in the presence of parallel consumption. Messaging systems 
often work around this by having a notion of "exclusive consumer" that allows 
only one process to consume from a queue, but of course this means that there 
is no parallelism in processing.
+</p>
 <p>
 Kafka does it better. By having a notion of parallelism&mdash;the 
partition&mdash;within the topics, Kafka is able to provide both ordering 
guarantees and load balancing over a pool of consumer processes. This is 
achieved by assigning the partitions in the topic to the consumers in the 
consumer group so that each partition is consumed by exactly one consumer in 
the group. By doing this we ensure that the consumer is the only reader of that 
partition and consumes the data in order. Since there are many partitions this 
still balances the load over many consumer instances. Note however that there 
cannot be more consumer instances in a consumer group than partitions.
+</p>
 
 <h4>Kafka as a Storage System</h4>
 
+<p>
 Any message queue that allows publishing messages decoupled from consuming 
them is effectively acting as a storage system for the in-flight messages. What 
is different about Kafka is that it is a very good storage system.
+</p>
 <p>
 Data written to Kafka is written to disk and replicated for fault-tolerance. 
Kafka allows producers to wait on acknowledgement so that a write isn't 
considered complete until it is fully replicated and guaranteed to persist even 
if the server written to fails.
+</p>
 <p>
 The disk structures Kafka uses scale well&mdash;Kafka will perform the same 
whether you have 50 KB or 50 TB of persistent data on the server.
+</p>
 <p>
 As a result of taking storage seriously and allowing the clients to control 
their read position, you can think of Kafka as a kind of special purpose 
distributed filesystem dedicated to high-performance, low-latency commit log 
storage, replication, and propagation.
-
+</p>
 <h4>Kafka for Stream Processing</h4>
 <p>
 It isn't enough to just read, write, and store streams of data; the purpose is 
to enable real-time processing of streams.
+</p>
 <p>
 In Kafka a stream processor is anything that takes continual streams of  data 
from input topics, performs some processing on this input, and produces 
continual streams of data to output topics.
+</p>
 <p>
 For example a retail application might take in input streams of sales and 
shipments, and output a stream of reorders and price adjustments computed off 
this data.
+</p>
 <p>
 It is possible to do simple processing directly using the producer and 
consumer APIs. However for more complex transformations Kafka provides a fully 
integrated <a href="/documentation.html#streams">Streams API</a>. This allows 
building applications that do non-trivial processing that compute aggregations 
off of streams or join streams together.
+</p>
 <p>
 This facility helps solve the hard problems this type of application faces: 
handling out-of-order data, reprocessing input as code changes, performing 
stateful computations, etc.
+</p>
 <p>
 The streams API builds on the core primitives Kafka provides: it uses the 
producer and consumer APIs for input, uses Kafka for stateful storage, and uses 
the same group mechanism for fault tolerance among the stream processor 
instances.
-
+</p>
 <h4>Putting the Pieces Together</h4>
-
+<p>
 This combination of messaging, storage, and stream processing may seem unusual 
but it is essential to Kafka's role as a streaming platform.
+</p>
 <p>
 A distributed file system like HDFS allows storing static files for batch 
processing. Effectively a system like this allows storing and processing 
<i>historical</i> data from the past.
+</p>
 <p>
 A traditional enterprise messaging system allows processing future messages 
that will arrive after you subscribe. Applications built in this way process 
future data as it arrives.
+</p>
 <p>
 Kafka combines both of these capabilities, and the combination is critical 
both for Kafka usage as a platform for streaming applications as well as for 
streaming data pipelines.
+</p>
 <p>
 By combining storage and low-latency subscriptions, streaming applications can 
treat both past and future data the same way. That is a single application can 
process historical, stored data but rather than ending when it reaches the last 
record it can keep processing as future data arrives. This is a generalized 
notion of stream processing that subsumes batch processing as well as 
message-driven applications.
+</p>
 <p>
 Likewise for streaming data pipelines the combination of subscription to 
real-time events makes it possible to use Kafka for very low-latency pipelines; 
but the ability to store data reliably makes it possible to use it for critical 
data where the delivery of data must be guaranteed or for integration with 
offline systems that load data only periodically or may go down for extended 
periods of time for maintenance. The stream processing facilities make it 
possible to transform data as it arrives.
+</p>
 <p>
 For more information on the guarantees, APIs, and capabilities Kafka provides 
see the rest of the <a href="/documentation.html">documentation</a>.
+</p>
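
Reviewer note on the introduction.html text above: the passages on offsets,
key-based partitioning, and consumer groups can be illustrated with a small
in-memory model. This is only a sketch under stated assumptions, not Kafka
code: the names PartitionedLog and assign_partitions are hypothetical, crc32
is an arbitrary stand-in for a partition function, and the round-robin split
is just one simple way to give each group member an exclusive share.

```python
import zlib
from collections import defaultdict


class PartitionedLog:
    """Toy in-memory stand-in for one topic's partitioned log (not Kafka code)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Semantic partitioning: hash the key so equal keys share a partition.
        # crc32 is an arbitrary choice made for this sketch.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        offset = len(self.partitions[p]) - 1  # sequential id within the partition
        return p, offset

    def read(self, partition, offset):
        # The reader picks the position, so it can rewind to any stored offset.
        return self.partitions[partition][offset:]


def assign_partitions(num_partitions, consumers):
    """Give each consumer in one group an exclusive share of the partitions."""
    assignment = defaultdict(list)
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return dict(assignment)


log = PartitionedLog(4)
first = log.append("user-1", "login")
second = log.append("user-1", "logout")

# Mirrors the figure caption: four partitions, group A with two instances
# and group B with four.
group_a = assign_partitions(4, ["a1", "a2"])
group_b = assign_partitions(4, ["b1", "b2", "b3", "b4"])
```

Both records for "user-1" land in the same partition with consecutive offsets,
which is the per-partition ordering the text relies on. A consumer beyond the
partition count would simply receive no partitions in this model.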

http://git-wip-us.apache.org/repos/asf/kafka/blob/bf98c473/docs/protocol.html
----------------------------------------------------------------------
diff --git a/docs/protocol.html b/docs/protocol.html
index e28b0a8..ae70971 100644
--- a/docs/protocol.html
+++ b/docs/protocol.html
@@ -16,8 +16,11 @@
 -->
 
 <!--#include virtual="../includes/header.html" -->
-
-<h3><a id="protocol" href="#protocol">Kafka Wire Protocol</a></h3>
+<!--#include virtual="../includes/top.html" -->
+<div class="content">
+    <!--#include virtual="../includes/nav.html" -->
+    <div class="right">
+        <h1>Kafka protocol guide</h1>
 
 <p>This document covers the wire protocol implemented in Kafka. It is meant to 
give a readable guide to the protocol that covers the available requests, their 
binary format, and the proper way to make use of them to implement a client. 
This document assumes you understand the basic design and terminology described 
<a href="https://kafka.apache.org/documentation.html#design">here</a></p>
 
@@ -220,4 +223,9 @@ Size => int32
 
 <p>A final question is why we don't use a system like Protocol Buffers or 
Thrift to define our request messages. These packages excel at helping you 
manage lots and lots of serialized messages. However we have only a few 
messages. Support across languages is somewhat spotty (depending on the 
package). Finally the mapping between binary log format and wire protocol is 
something we manage somewhat carefully and this would not be possible with 
these systems. Finally we prefer the style of versioning APIs explicitly and 
checking this to inferring new values as nulls as it allows more nuanced 
control of compatibility.</p>
 
+    <script>
+        // Show selected style on nav item
+        $(function() { $('.b-nav__project').addClass('selected'); });
+    </script>
+
 <!--#include virtual="../includes/footer.html" -->
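
Reviewer note on the protocol.html hunk above: its context line mentions
"Size => int32". Assuming this means each message on the wire is preceded by
a 4-byte big-endian signed length, the framing can be sketched as below. The
helper names frame and unframe are hypothetical, and the payload bytes are
placeholders rather than a real Kafka request.

```python
import struct


def frame(payload: bytes) -> bytes:
    """Prefix payload with its length as a big-endian signed int32."""
    return struct.pack(">i", len(payload)) + payload


def unframe(buffer: bytes):
    """Split one frame off the front of buffer.

    Returns (payload, rest), or (None, buffer) if the frame is incomplete.
    """
    if len(buffer) < 4:
        return None, buffer
    (size,) = struct.unpack(">i", buffer[:4])
    if len(buffer) < 4 + size:
        return None, buffer
    return buffer[4:4 + size], buffer[4 + size:]


framed = frame(b"abc")
payload, rest = unframe(framed + b"xy")
```

Returning the unread remainder lets a caller keep appending bytes from a
socket and call unframe again until a whole message is available.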

http://git-wip-us.apache.org/repos/asf/kafka/blob/bf98c473/docs/quickstart.html
----------------------------------------------------------------------
diff --git a/docs/quickstart.html b/docs/quickstart.html
index 3066960..1038bfc 100644
--- a/docs/quickstart.html
+++ b/docs/quickstart.html
@@ -17,8 +17,10 @@
 
 <h3><a id="quickstart" href="#quickstart">1.3 Quick Start</a></h3>
 
+<p>
 This tutorial assumes you are starting fresh and have no existing Kafka or 
ZooKeeper data.
 Since Kafka console scripts are different for Unix-based and Windows 
platforms, on Windows platforms use <code>bin\windows\</code> instead of 
<code>bin/</code>, and change the script extension to <code>.bat</code>.
+</p>
 
 <h4><a id="quickstart_download" href="#quickstart_download">Step 1: Download 
the code</a></h4>
 
@@ -33,6 +35,7 @@ Since Kafka console scripts are different for Unix-based and 
Windows platforms,
 
 <p>
 Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you 
don't already have one. You can use the convenience script packaged with Kafka 
to get a quick-and-dirty single-node ZooKeeper instance.
+</p>
 
 <pre>
 &gt; <b>bin/zookeeper-server-start.sh config/zookeeper.properties</b>
@@ -40,7 +43,7 @@ Kafka uses ZooKeeper so you need to first start a ZooKeeper 
server if you don't
 ...
 </pre>
 
-Now start the Kafka server:
+<p>Now start the Kafka server:</p>
 <pre>
 &gt; <b>bin/kafka-server-start.sh config/server.properties</b>
 [2013-04-22 15:01:47,028] INFO Verifying properties 
(kafka.utils.VerifiableProperties)
@@ -50,23 +53,23 @@ Now start the Kafka server:
 
 <h4><a id="quickstart_createtopic" href="#quickstart_createtopic">Step 3: 
Create a topic</a></h4>
 
-Let's create a topic named "test" with a single partition and only one replica:
+<p>Let's create a topic named "test" with a single partition and only one 
replica:</p>
 <pre>
 &gt; <b>bin/kafka-topics.sh --create --zookeeper localhost:2181 
--replication-factor 1 --partitions 1 --topic test</b>
 </pre>
 
-We can now see that topic if we run the list topic command:
+<p>We can now see that topic if we run the list topic command:</p>
 <pre>
 &gt; <b>bin/kafka-topics.sh --list --zookeeper localhost:2181</b>
 test
 </pre>
-Alternatively, instead of manually creating topics you can also configure your 
brokers to auto-create topics when a non-existent topic is published to.
+<p>Alternatively, instead of manually creating topics you can also configure 
your brokers to auto-create topics when a non-existent topic is published 
to.</p>
 
 <h4><a id="quickstart_send" href="#quickstart_send">Step 4: Send some 
messages</a></h4>
 
-Kafka comes with a command line client that will take input from a file or 
from standard input and send it out as messages to the Kafka cluster. By 
default each line will be sent as a separate message.
+<p>Kafka comes with a command line client that will take input from a file or 
from standard input and send it out as messages to the Kafka cluster. By 
default each line will be sent as a separate message.</p>
 <p>
-Run the producer and then type a few messages into the console to send to the 
server.
+Run the producer and then type a few messages into the console to send to the 
server.</p>
 
 <pre>
 &gt; <b>bin/kafka-console-producer.sh --broker-list localhost:9092 --topic 
test</b>
@@ -76,7 +79,7 @@ Run the producer and then type a few messages into the 
console to send to the se
 
 <h4><a id="quickstart_consume" href="#quickstart_consume">Step 5: Start a 
consumer</a></h4>
 
-Kafka also has a command line consumer that will dump out messages to standard 
output.
+<p>Kafka also has a command line consumer that will dump out messages to 
standard output.</p>
 
 <pre>
 &gt; <b>bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 
--topic test --from-beginning</b>
@@ -92,15 +95,18 @@ All of the command line tools have additional options; 
running the command with
 
 <h4><a id="quickstart_multibroker" href="#quickstart_multibroker">Step 6: 
Setting up a multi-broker cluster</a></h4>
 
-So far we have been running against a single broker, but that's no fun. For 
Kafka, a single broker is just a cluster of size one, so nothing much changes 
other than starting a few more broker instances. But just to get a feel for it, 
let's expand our cluster to three nodes (still all on our local machine).
+<p>So far we have been running against a single broker, but that's no fun. For 
Kafka, a single broker is just a cluster of size one, so nothing much changes 
other than starting a few more broker instances. But just to get a feel for it, 
let's expand our cluster to three nodes (still all on our local machine).</p>
 <p>
 First we make a config file for each of the brokers (on Windows use the 
<code>copy</code> command instead):
+</p>
 <pre>
 &gt; <b>cp config/server.properties config/server-1.properties</b>
 &gt; <b>cp config/server.properties config/server-2.properties</b>
 </pre>
 
+<p>
 Now edit these new files and set the following properties:
+</p>
 <pre>
 
 config/server-1.properties:
@@ -113,9 +119,10 @@ config/server-2.properties:
     listeners=PLAINTEXT://:9094
     log.dir=/tmp/kafka-logs-2
 </pre>
-The <code>broker.id</code> property is the unique and permanent name of each 
node in the cluster. We have to override the port and log directory only 
because we are running these all on the same machine and we want to keep the 
brokers from all trying to register on the same port or overwrite each other's 
data.
+<p>The <code>broker.id</code> property is the unique and permanent name of 
each node in the cluster. We have to override the port and log directory only 
because we are running these all on the same machine and we want to keep the 
brokers from all trying to register on the same port or overwrite each other's 
data.</p>
 <p>
 We already have Zookeeper and our single node started, so we just need to 
start the two new nodes:
+</p>
 <pre>
 &gt; <b>bin/kafka-server-start.sh config/server-1.properties &amp;</b>
 ...
@@ -123,34 +130,36 @@ We already have Zookeeper and our single node started, so 
we just need to start
 ...
 </pre>
 
-Now create a new topic with a replication factor of three:
+<p>Now create a new topic with a replication factor of three:</p>
 <pre>
 &gt; <b>bin/kafka-topics.sh --create --zookeeper localhost:2181 
--replication-factor 3 --partitions 1 --topic my-replicated-topic</b>
 </pre>
 
-Okay but now that we have a cluster how can we know which broker is doing 
what? To see that run the "describe topics" command:
+<p>Okay but now that we have a cluster how can we know which broker is doing 
what? To see that run the "describe topics" command:</p>
 <pre>
 &gt; <b>bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic 
my-replicated-topic</b>
 Topic:my-replicated-topic      PartitionCount:1        ReplicationFactor:3     
Configs:
        Topic: my-replicated-topic      Partition: 0    Leader: 1       
Replicas: 1,2,0 Isr: 1,2,0
 </pre>
-Here is an explanation of output. The first line gives a summary of all the 
partitions, each additional line gives information about one partition. Since 
we have only one partition for this topic there is only one line.
+<p>Here is an explanation of output. The first line gives a summary of all the 
partitions, each additional line gives information about one partition. Since 
we have only one partition for this topic there is only one line.</p>
 <ul>
   <li>"leader" is the node responsible for all reads and writes for the given 
partition. Each node will be the leader for a randomly selected portion of the 
partitions.
   <li>"replicas" is the list of nodes that replicate the log for this 
partition regardless of whether they are the leader or even if they are 
currently alive.
   <li>"isr" is the set of "in-sync" replicas. This is the subset of the 
replicas list that is currently alive and caught-up to the leader.
 </ul>
-Note that in my example node 1 is the leader for the only partition of the 
topic.
+<p>Note that in my example node 1 is the leader for the only partition of the 
topic.</p>
 <p>
 We can run the same command on the original topic we created to see where it 
is:
+</p>
 <pre>
 &gt; <b>bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic 
test</b>
 Topic:test     PartitionCount:1        ReplicationFactor:1     Configs:
        Topic: test     Partition: 0    Leader: 0       Replicas: 0     Isr: 0
 </pre>
-So there is no surprise there&mdash;the original topic has no replicas and is 
on server 0, the only server in our cluster when we created it.
+<p>So there is no surprise there&mdash;the original topic has no replicas and 
is on server 0, the only server in our cluster when we created it.</p>
 <p>
 Let's publish a few messages to our new topic:
+</p>
 <pre>
 &gt; <b>bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic</b>
 ...
@@ -158,7 +167,7 @@ Let's publish a few messages to our new topic:
 <b>my test message 2</b>
 <b>^C</b>
 </pre>
-Now let's consume these messages:
+<p>Now let's consume these messages:</p>
 <pre>
 &gt; <b>bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic</b>
 ...
@@ -167,7 +176,7 @@ my test message 2
 <b>^C</b>
 </pre>
 
-Now let's test out fault-tolerance. Broker 1 was acting as the leader so let's 
kill it:
+<p>Now let's test out fault-tolerance. Broker 1 was acting as the leader so 
let's kill it:</p>
 <pre>
 &gt; <b>ps aux | grep server-1.properties</b>
 <i>7564</i> ttys002    0:15.91 /System/Library/Frameworks/JavaVM.framework/Versions/1.8/Home/bin/java...
@@ -181,13 +190,14 @@ java.exe    java  -Xmx1G -Xms1G -server -XX:+UseG1GC ... build\libs\kafka_2.10-0
 &gt; <b>taskkill /pid 644 /f</b>
 </pre>
 
-Leadership has switched to one of the slaves and node 1 is no longer in the 
in-sync replica set:
+<p>Leadership has switched to one of the slaves and node 1 is no longer in the 
in-sync replica set:</p>
+
 <pre>
 &gt; <b>bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic</b>
 Topic:my-replicated-topic      PartitionCount:1        ReplicationFactor:3     Configs:
        Topic: my-replicated-topic      Partition: 0    Leader: 2       Replicas: 1,2,0 Isr: 2,0
 </pre>
-But the messages are still be available for consumption even though the leader 
that took the writes originally is down:
+<p>But the messages are still be available for consumption even though the 
leader that took the writes originally is down:</p>
 <pre>
 &gt; <b>bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic</b>
 ...
@@ -199,40 +209,45 @@ my test message 2
 
 <h4><a id="quickstart_kafkaconnect" href="#quickstart_kafkaconnect">Step 7: Use Kafka Connect to import/export data</a></h4>
 
-Writing data from the console and writing it back to the console is a 
convenient place to start, but you'll probably want
+<p>Writing data from the console and writing it back to the console is a 
convenient place to start, but you'll probably want
 to use data from other sources or export data from Kafka to other systems. For 
many systems, instead of writing custom
-integration code you can use Kafka Connect to import or export data.
+integration code you can use Kafka Connect to import or export data.</p>
 
-Kafka Connect is a tool included with Kafka that imports and exports data to 
Kafka. It is an extensible tool that runs
+<p>Kafka Connect is a tool included with Kafka that imports and exports data 
to Kafka. It is an extensible tool that runs
 <i>connectors</i>, which implement the custom logic for interacting with an 
external system. In this quickstart we'll see
 how to run Kafka Connect with simple connectors that import data from a file 
to a Kafka topic and export data from a
-Kafka topic to a file.
+Kafka topic to a file.</p>
 
-First, we'll start by creating some seed data to test with:
+<p>First, we'll start by creating some seed data to test with:</p>
 
 <pre>
 &gt; <b>echo -e "foo\nbar" > test.txt</b>
 </pre>
 
-Next, we'll start two connectors running in <i>standalone</i> mode, which 
means they run in a single, local, dedicated
+<p>Next, we'll start two connectors running in <i>standalone</i> mode, which 
means they run in a single, local, dedicated
 process. We provide three configuration files as parameters. The first is 
always the configuration for the Kafka Connect
 process, containing common configuration such as the Kafka brokers to connect 
to and the serialization format for data.
 The remaining configuration files each specify a connector to create. These 
files include a unique connector name, the connector
-class to instantiate, and any other configuration required by the connector.
+class to instantiate, and any other configuration required by the 
connector.</p>
 
 <pre>
 &gt; <b>bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties</b>
 </pre>
 
+<p>
 These sample configuration files, included with Kafka, use the default local 
cluster configuration you started earlier
 and create two connectors: the first is a source connector that reads lines 
from an input file and produces each to a Kafka topic
 and the second is a sink connector that reads messages from a Kafka topic and 
produces each as a line in an output file.
+</p>
 
+<p>
 During startup you'll see a number of log messages, including some indicating 
that the connectors are being instantiated.
-Once the Kafka Connect process has started, the source connector should start 
reading lines from <pre>test.txt</pre> and
-producing them to the topic <pre>connect-test</pre>, and the sink connector 
should start reading messages from the topic <pre>connect-test</pre>
-and write them to the file <pre>test.sink.txt</pre>. We can verify the data 
has been delivered through the entire pipeline
+Once the Kafka Connect process has started, the source connector should start 
reading lines from <code>test.txt</code> and
+producing them to the topic <code>connect-test</code>, and the sink connector 
should start reading messages from the topic <code>connect-test</code>
+and write them to the file <code>test.sink.txt</code>. We can verify the data 
has been delivered through the entire pipeline
 by examining the contents of the output file:
+</p>
+
 
 <pre>
 &gt; <b>cat test.sink.txt</b>
@@ -240,8 +255,11 @@ foo
 bar
 </pre>
 
-Note that the data is being stored in the Kafka topic <pre>connect-test</pre>, 
so we can also run a console consumer to see the
+<p>
+Note that the data is being stored in the Kafka topic 
<code>connect-test</code>, so we can also run a console consumer to see the
 data in the topic (or use custom consumer code to process it):
+</p>
+
 
 <pre>
 &gt; <b>bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning</b>
@@ -250,13 +268,13 @@ data in the topic (or use custom consumer code to process it):
 ...
 </pre>
 
-The connectors continue to process data, so we can add data to the file and 
see it move through the pipeline:
+<p>The connectors continue to process data, so we can add data to the file and 
see it move through the pipeline:</p>
 
 <pre>
 &gt; <b>echo "Another line" >> test.txt</b>
 </pre>
 
-You should see the line appear in the console consumer output and in the sink 
file.
+<p>You should see the line appear in the console consumer output and in the 
sink file.</p>
 
 <h4><a id="quickstart_kafkastreams" href="#quickstart_kafkastreams">Step 8: Use Kafka Streams to process data</a></h4>
 
@@ -379,8 +397,8 @@ an updated count of a single word, aka record key such as "kafka". For multiple
 </p>
 
 <p>
-Now you can write more input messages to the <b>streams-file-input</b> topic 
and observe additional messages added 
-to <b>streams-wordcount-output</b> topic, reflecting updated word counts 
(e.g., using the console producer and the 
+Now you can write more input messages to the <b>streams-file-input</b> topic 
and observe additional messages added
+to <b>streams-wordcount-output</b> topic, reflecting updated word counts 
(e.g., using the console producer and the
 console consumer, as described above).
 </p>
 

http://git-wip-us.apache.org/repos/asf/kafka/blob/bf98c473/docs/uses.html
----------------------------------------------------------------------
diff --git a/docs/uses.html b/docs/uses.html
index 6214ee6..b86d917 100644
--- a/docs/uses.html
+++ b/docs/uses.html
@@ -15,7 +15,7 @@
  limitations under the License.
 -->
 
-Here is a description of a few of the popular use cases for Apache Kafka. For an overview of a number of these areas in action, see <a href="http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying">this blog post</a>.
+<p> Here is a description of a few of the popular use cases for Apache Kafka. For an overview of a number of these areas in action, see <a href="http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying">this blog post</a>. </p>
 
 <h4><a id="uses_messaging" href="#uses_messaging">Messaging</a></h4>
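
Reviewer aside, not part of the commit: the quickstart text in this diff explains the "leader", "replicas" and "isr" fields of the `--describe` output. A minimal Python sketch of how one per-partition line could be checked mechanically is given below. The function name `parse_partition_line` and the `out_of_sync` field are hypothetical, and the line format is assumed to match the sample output quoted in the diff.

```python
import re


def parse_partition_line(line):
    """Parse one per-partition line of `kafka-topics.sh --describe` output.

    Assumes the "Key: value" layout shown in the quickstart sample, e.g.
    "Topic: t  Partition: 0  Leader: 2  Replicas: 1,2,0  Isr: 2,0".
    """
    # Collect every "Key: value" pair on the line into a dict.
    fields = dict(re.findall(r"(\w+):\s*(\S+)", line))

    def node_ids(key):
        # A missing or empty list (for example, no in-sync replicas) yields [].
        return [int(n) for n in fields.get(key, "").split(",") if n]

    replicas = node_ids("Replicas")
    isr = node_ids("Isr")
    return {
        "topic": fields["Topic"],
        "partition": int(fields["Partition"]),
        "leader": int(fields["Leader"]),
        "replicas": replicas,
        "isr": isr,
        # Hypothetical helper field: replicas that are not currently in sync.
        "out_of_sync": [n for n in replicas if n not in isr],
    }
```

Applied to the post-failover line quoted in the diff (Leader: 2, Replicas: 1,2,0, Isr: 2,0), this sketch would report node 1 as the only out-of-sync replica, which matches the text's statement that node 1 is no longer in the in-sync replica set.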
