KAFKA-2809; Improve documentation linking

It is often useful to link to a specific header within the documentation, 
especially when referencing the docs on the mailing lists.

This adds anchors and self-referencing links for all headers in the docs, so 
each section has a stable, linkable URL.
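
For example, the Producer API header in docs/api.html changes from

    <h3><a id="producerapi">2.1 Producer API</a></h3>

to

    <h3><a id="producerapi" href="#producerapi">2.1 Producer API</a></h3>

so the section can be referenced directly with a deep link such as
http://kafka.apache.org/documentation.html#producerapi (URL shown for
illustration; any page that includes api.html exposes the same anchor).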

Author: Grant Henke <[email protected]>

Reviewers: Jun Rao <[email protected]>

Closes #498 from granthenke/doc-links


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/6cbd9759
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/6cbd9759
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/6cbd9759

Branch: refs/heads/trunk
Commit: 6cbd97597ccf456a4f01f19553da5a03e12c9366
Parents: 5fc4546
Author: Grant Henke <[email protected]>
Authored: Mon Nov 16 14:14:17 2015 -0800
Committer: Jun Rao <[email protected]>
Committed: Mon Nov 16 14:14:17 2015 -0800

----------------------------------------------------------------------
 docs/api.html            |  8 ++--
 docs/configuration.html  | 14 +++----
 docs/connect.html        | 34 +++++++--------
 docs/design.html         | 72 ++++++++++++++++----------------
 docs/documentation.html  | 16 ++++----
 docs/ecosystem.html      |  6 +--
 docs/implementation.html | 55 ++++++++++++-------------
 docs/introduction.html   | 20 ++++-----
 docs/migration.html      |  8 ++--
 docs/ops.html            | 96 +++++++++++++++++++++----------------------
 docs/quickstart.html     | 34 +++++++--------
 docs/security.html       | 40 +++++++++---------
 docs/upgrade.html        | 12 +++---
 docs/uses.html           | 20 ++++-----
 14 files changed, 217 insertions(+), 218 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/6cbd9759/docs/api.html
----------------------------------------------------------------------
diff --git a/docs/api.html b/docs/api.html
index 9b739da..8d79b20 100644
--- a/docs/api.html
+++ b/docs/api.html
@@ -17,7 +17,7 @@
 
 Apache Kafka includes new java clients (in the org.apache.kafka.clients 
package). These are meant to supplant the older Scala clients, but for 
compatability they will co-exist for some time. These clients are available in 
a seperate jar with minimal dependencies, while the old Scala clients remain 
packaged with the server.
 
-<h3><a id="producerapi">2.1 Producer API</a></h3>
+<h3><a id="producerapi" href="#producerapi">2.1 Producer API</a></h3>
 
 We encourage all new development to use the new Java producer. This client is 
production tested and generally both faster and more fully featured than the 
previous Scala client. You can use this client by adding a dependency on the 
client jar using the following example maven co-ordinates (you can change the 
version numbers with new releases):
 <pre>
@@ -36,7 +36,7 @@ For those interested in the legacy Scala producer api, 
information can be found
 here</a>.
 </p>
 
-<h3><a id="highlevelconsumerapi">2.2 High Level Consumer API</a></h3>
+<h3><a id="highlevelconsumerapi" href="#highlevelconsumerapi">2.2 High Level 
Consumer API</a></h3>
 <pre>
 class Consumer {
   /**
@@ -108,7 +108,7 @@ public interface kafka.javaapi.consumer.ConsumerConnector {
 </pre>
 You can follow
 <a 
href="https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example"; 
title="Kafka 0.8 consumer example">this example</a> to learn how to use the 
high level consumer api.
-<h3><a id="simpleconsumerapi">2.3 Simple Consumer API</a></h3>
+<h3><a id="simpleconsumerapi" href="#simpleconsumerapi">2.3 Simple Consumer 
API</a></h3>
 <pre>
 class kafka.javaapi.consumer.SimpleConsumer {
   /**
@@ -144,7 +144,7 @@ class kafka.javaapi.consumer.SimpleConsumer {
 For most applications, the high level consumer Api is good enough. Some 
applications want features not exposed to the high level consumer yet (e.g., 
set initial offset when restarting the consumer). They can instead use our low 
level SimpleConsumer Api. The logic will be a bit more complicated and you can 
follow the example in
 <a 
href="https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example";
 title="Kafka 0.8 SimpleConsumer example">here</a>.
 
-<h3><a id="newconsumerapi">2.4 New Consumer API</a></h3>
+<h3><a id="newconsumerapi" href="#newconsumerapi">2.4 New Consumer API</a></h3>
 As of the 0.9.0 release we have added a replacement for our existing simple 
and high-level consumers. This client is considered beta quality. You can use 
this client by adding a dependency on the client jar using the following 
example maven co-ordinates (you can change the version numbers with new 
releases):
 <pre>
        &lt;dependency&gt;

http://git-wip-us.apache.org/repos/asf/kafka/blob/6cbd9759/docs/configuration.html
----------------------------------------------------------------------
diff --git a/docs/configuration.html b/docs/configuration.html
index abaff63..2dfc757 100644
--- a/docs/configuration.html
+++ b/docs/configuration.html
@@ -17,7 +17,7 @@
 
 Kafka uses key-value pairs in the <a 
href="http://en.wikipedia.org/wiki/.properties";>property file format</a> for 
configuration. These values can be supplied either from a file or 
programmatically.
 
-<h3><a id="brokerconfigs">3.1 Broker Configs</a></h3>
+<h3><a id="brokerconfigs" href="#brokerconfigs">3.1 Broker Configs</a></h3>
 
 The essential configurations are the following:
 <ul>
@@ -32,7 +32,7 @@ Topic-level configurations and defaults are discussed in more 
detail <a href="#t
 
 <p>More details about broker configuration can be found in the scala class 
<code>kafka.server.KafkaConfig</code>.</p>
 
-<a id="topic-config">Topic-level configuration</a>
+<a id="topic-config" href="#topic-config">Topic-level configuration</a>
 
 Configurations pertinent to topics have both a global default as well an 
optional per-topic override. If no per-topic configuration is given the global 
default is used. The override can be set at topic creation time by giving one 
or more <code>--config</code> options. This example creates a topic named 
<i>my-topic</i> with a custom max message size and flush rate:
 <pre>
@@ -147,7 +147,7 @@ The following are the topic-level configurations. The 
server's default configura
     </tr>
 </table>
 
-<h3><a id="producerconfigs">3.2 Producer Configs</a></h3>
+<h3><a id="producerconfigs" href="#producerconfigs">3.2 Producer 
Configs</a></h3>
 
 Below is the configuration of the Java producer:
 <!--#include virtual="producer_config.html" -->
@@ -157,7 +157,7 @@ Below is the configuration of the Java producer:
     here</a>.
 </p>
 
-<h3><a id="consumerconfigs">3.3 Consumer Configs</a></h3>
+<h3><a id="consumerconfigs" href="#consumerconfigs">3.3 Consumer 
Configs</a></h3>
 The essential consumer configurations are the following:
 <ul>
         <li><code>group.id</code>
@@ -327,9 +327,9 @@ The essential consumer configurations are the following:
 
 <p>More details about consumer configuration can be found in the scala class 
<code>kafka.consumer.ConsumerConfig</code>.</p>
 
-<h3><a id="newconsumerconfigs">3.4 New Consumer Configs</a></h3>
+<h3><a id="newconsumerconfigs" href="#newconsumerconfigs">3.4 New Consumer 
Configs</a></h3>
 Since 0.9.0.0 we have been working on a replacement for our existing simple 
and high-level consumers. The code can be considered beta quality. Below is the 
configuration for the new consumer:
 <!--#include virtual="consumer_config.html" -->
 
-<h3><a id="connectconfigs">3.5 Kafka Connect Configs</a></h3>
-<!--#include virtual="connect_config.html" -->
\ No newline at end of file
+<h3><a id="connectconfigs" href="#connectconfigs">3.5 Kafka Connect 
Configs</a></h3>
+<!--#include virtual="connect_config.html" -->

http://git-wip-us.apache.org/repos/asf/kafka/blob/6cbd9759/docs/connect.html
----------------------------------------------------------------------
diff --git a/docs/connect.html b/docs/connect.html
index 8791ab0..0a1a867 100644
--- a/docs/connect.html
+++ b/docs/connect.html
@@ -15,7 +15,7 @@
   ~ limitations under the License.
   ~-->
 
-<h3><a id="connect_overview">8.1 Overview</a></h3>
+<h3><a id="connect_overview" href="#connect_overview">8.1 Overview</a></h3>
 
 Kafka Connect is a tool for scalably and reliably streaming data between 
Apache Kafka and other systems. It makes it simple to quickly define 
<i>connectors</i> that move large collections of data into and out of Kafka. 
Kafka Connect can ingest entire databases or collect metrics from all your 
application servers into Kafka topics, making the data available for stream 
processing with low latency. An export job can deliver data from Kafka topics 
into secondary storage and query systems or into batch systems for offline 
analysis.
 
@@ -29,11 +29,11 @@ Kafka Connect features include:
     <li><b>Streaming/batch integration</b> - leveraging Kafka's existing 
capabilities, Kafka Connect is an ideal solution for bridging streaming and 
batch data systems</li>
 </ul>
 
-<h3><a id="connect_user">8.2 User Guide</a></h3>
+<h3><a id="connect_user" href="#connect_user">8.2 User Guide</a></h3>
 
 The quickstart provides a brief example of how to run a standalone version of 
Kafka Connect. This section describes how to configure, run, and manage Kafka 
Connect in more detail.
 
-<h4>Running Kafka Connect</h4>
+<h4><a id="connect_running" href="#connect_running">Running Kafka 
Connect</a></h4>
 
 Kafka Connect currently supports two modes of execution: standalone (single 
process) and distributed.
 
@@ -64,7 +64,7 @@ The difference is in the class which is started and the 
configuration parameters
 Note that in distributed mode the connector configurations are not passed on 
the command line. Instead, use the REST API described below to create, modify, 
and destroy connectors.
 
 
-<h4>Configuring Connectors</h4>
+<h4><a id="connect_configuring" href="#connect_configuring">Configuring 
Connectors</a></h4>
 
 Connector configurations are simple key-value mappings. For standalone mode 
these are defined in a properties file and passed to the Connect process on the 
command line. In distributed mode, they will be included in the JSON payload 
for the request that creates (or modifies) the connector.
 
@@ -84,7 +84,7 @@ Sink connectors also have one additional option to control 
their input:
 For any other options, you should consult the documentation for the connector.
 
 
-<h4>REST API</h4>
+<h4><a id="connect_rest" href="#connect_rest">REST API</a></h4>
 
 Since Kafka Connect is intended to be run as a service, it also supports a 
REST API for managing connectors. By default this service runs on port 8083. 
The following are the currently supported endpoints:
 
@@ -98,13 +98,13 @@ Since Kafka Connect is intended to be run as a service, it 
also supports a REST
     <li><code>DELETE /connectors/{name}</code> - delete a connector, halting 
all tasks and deleting its configuration</li>
 </ul>
 
-<h3><a id="connect_development">8.3 Connector Development Guide</a></h3>
+<h3><a id="connect_development" href="#connect_development">8.3 Connector 
Development Guide</a></h3>
 
 This guide describes how developers can write new connectors for Kafka Connect 
to move data between Kafka and other systems. It briefly reviews a few key 
concepts and then describes how to create a simple connector.
 
-<h4>Core Concepts and APIs</h4>
+<h4><a id="connect_concepts" href="#connect_concepts">Core Concepts and 
APIs</a></h4>
 
-<h5>Connectors and Tasks</h5>
+<h5><a id="connect_connectorsandtasks" 
href="#connect_connectorsandtasks">Connectors and Tasks</a></h5>
 
 To copy data between Kafka and another system, users create a 
<code>Connector</code> for the system they want to pull data from or push data 
to. Connectors come in two flavors: <code>SourceConnectors</code> import data 
from another system (e.g. <code>JDBCSourceConnector</code> would import a 
relational database into Kafka) and <code>SinkConnectors</code> export data 
(e.g. <code>HDFSSinkConnector</code> would export the contents of a Kafka topic 
to an HDFS file).
 
@@ -113,24 +113,24 @@ To copy data between Kafka and another system, users 
create a <code>Connector</c
 With an assignment in hand, each <code>Task</code> must copy its subset of the 
data to or from Kafka. In Kafka Connect, it should always be possible to frame 
these assignments as a set of input and output streams consisting of records 
with consistent schemas. Sometimes this mapping is obvious: each file in a set 
of log files can be considered a stream with each parsed line forming a record 
using the same schema and offsets stored as byte offsets in the file. In other 
cases it may require more effort to map to this model: a JDBC connector can map 
each table to a stream, but the offset is less clear. One possible mapping uses 
a timestamp column to generate queries incrementally returning new data, and 
the last queried timestamp can be used as the offset.
 
 
-<h5>Streams and Records</h5>
+<h5><a id="connect_streamsandrecords" 
href="#connect_streamsandrecords">Streams and Records</a></h5>
 
 Each stream should be a sequence of key-value records. Both the keys and 
values can have complex structure -- many primitive types are provided, but 
arrays, objects, and nested data structures can be represented as well. The 
runtime data format does not assume any particular serialization format; this 
conversion is handled internally by the framework.
 
 In addition to the key and value, records (both those generated by sources and 
those delivered to sinks) have associated stream IDs and offsets. These are 
used by the framework to periodically commit the offsets of data that have been 
processed so that in the event of failures, processing can resume from the last 
committed offsets, avoiding unnecessary reprocessing and duplication of events.
 
-<h5>Dynamic Connectors</h5>
+<h5><a id="connect_dynamicconnectors" 
href="#connect_dynamicconnectors">Dynamic Connectors</a></h5>
 
 Not all jobs are static, so <code>Connector</code> implementations are also 
responsible for monitoring the external system for any changes that might 
require reconfiguration. For example, in the <code>JDBCSourceConnector</code> 
example, the <code>Connector</code> might assign a set of tables to each 
<code>Task</code>. When a new table is created, it must discover this so it can 
assign the new table to one of the <code>Tasks</code> by updating its 
configuration. When it notices a change that requires reconfiguration (or a 
change in the number of <code>Tasks</code>), it notifies the framework and the 
framework updates anycorresponding <code>Tasks</code>.
 
 
-<h4>Developing a Simple Connector</h4>
+<h4><a id="connect_developing" href="#connect_developing">Developing a Simple 
Connector</a></h4>
 
 Developing a connector only requires implementing two interfaces, the 
<code>Connector</code> and <code>Task</code>. A simple example is included with 
the source code for Kafka in the <code>file</code> package. This connector is 
meant for use in standalone mode and has implementations of a 
<code>SourceConnector</code>/<code>SourceTask</code> to read each line of a 
file and emit it as a record and a 
<code>SinkConnector</code>/<code>SinkTask</code> that writes each record to a 
file.
 
 The rest of this section will walk through some code to demonstrate the key 
steps in creating a connector, but developers should also refer to the full 
example source code as many details are omitted for brevity.
 
-<h5>Connector Example</h5>
+<h5><a id="connect_connectorexample" 
href="#connect_connectorexample">Connector Example</a></h5>
 
 We'll cover the <code>SourceConnector</code> as a simple example. 
<code>SinkConnector</code> implementations are very similar. Start by creating 
the class that inherits from <code>SourceConnector</code> and add a couple of 
fields that will store parsed configuration information (the filename to read 
from and the topic to send data to):
 
@@ -187,7 +187,7 @@ Even with multiple tasks, this method implementation is 
usually pretty simple. I
 
 Note that this simple example does not include dynamic input. See the 
discussion in the next section for how to trigger updates to task configs.
 
-<h5>Task Example - Source Task</h5>
+<h5><a id="connect_taskexample" href="#connect_taskexample">Task Example - 
Source Task</a></h5>
 
 Next we'll describe the implementation of the corresponding 
<code>SourceTask</code>. The implementation is short, but too long to cover 
completely in this guide. We'll use pseudo-code to describe most of the 
implementation, but you can refer to the source code for the full example.
 
@@ -244,7 +244,7 @@ Again, we've omitted some details, but we can see the 
important steps: the <code
 
 Note that this implementation uses the normal Java 
<code>InputStream</code>interface and may sleep if data is not avaiable. This 
is acceptable because Kafka Connect provides each task with a dedicated thread. 
While task implementations have to conform to the basic 
<code>poll()</code>interface, they have a lot of flexibility in how they are 
implemented. In this case, an NIO-based implementation would be more efficient, 
but this simple approach works, is quick to implement, and is compatible with 
older versions of Java.
 
-<h5>Sink Tasks</h5>
+<h5><a id="connect_sinktasks" href="#connect_sinktasks">Sink Tasks</a></h5>
 
 The previous section described how to implement a simple 
<code>SourceTask</code>. Unlike <code>SourceConnector</code>and 
<code>SinkConnector</code>, <code>SourceTask</code>and 
<code>SinkTask</code>have very different interfaces because 
<code>SourceTask</code>uses a pull interface and <code>SinkTask</code>uses a 
push interface. Both share the common lifecycle methods, but the 
<code>SinkTask</code>interface is quite different:
 
@@ -263,7 +263,7 @@ The <code>flush()</code>method is used during the offset 
commit process, which a
 delivery. For example, an HDFS connector could do this and use atomic move 
operations to make sure the <code>flush()</code>operation atomically commits 
the data and offsets to a final location in HDFS.
 
 
-<h5>Resuming from Previous Offsets</h5>
+<h5><a id="connect_resuming" href="#connect_resuming">Resuming from Previous 
Offsets</a></h5>
 
 The <code>SourceTask</code>implementation included a stream ID (the input 
filename) and offset (position in the file) with each record. The framework 
uses this to commit offsets periodically so that in the case of a failure, the 
task can recover and minimize the number of events that are reprocessed and 
possibly duplicated (or to resume from the most recent offset if Kafka Connect 
was stopped gracefully, e.g. in standalone mode or due to a job 
reconfiguration). This commit process is completely automated by the framework, 
but only the connector knows how to seek back to the right position in the 
input stream to resume from that location.
 
@@ -281,7 +281,7 @@ To correctly resume upon startup, the task can use the 
<code>SourceContext</code
 
 Of course, you might need to read many keys for each of the input streams. The 
<code>OffsetStorageReader</code> interface also allows you to issue bulk reads 
to efficiently load all offsets, then apply them by seeking each input stream 
to the appropriate position.
 
-<h4>Dynamic Input/Output Streams</h4>
+<h4><a id="connect_dynamicio" href="#connect_dynamicio">Dynamic Input/Output 
Streams</a></h4>
 
 Kafka Connect is intended to define bulk data copying jobs, such as copying an 
entire database rather than creating many jobs to copy each table individually. 
One consequence of this design is that the set of input or output streams for a 
connector can vary over time.
 
@@ -299,7 +299,7 @@ Ideally this code for monitoring changes would be isolated 
to the <code>Connecto
 
 <code>SinkConnectors</code> usually only have to handle the addition of 
streams, which may translate to new entries in their outputs (e.g., a new 
database table). The framework manages any changes to the Kafka input, such as 
when the set of input topics changes because of a regex subscription. 
<code>SinkTasks</code>should expect new input streams, which may require 
creating new resources in the downstream system, such as a new table in a 
database. The trickiest situation to handle in these cases may be conflicts 
between multiple <code>SinkTasks</code>seeing a new input stream for the first 
time and simultaneoulsy trying to create the new resource. 
<code>SinkConnectors</code>, on the other hand, will generally require no 
special code for handling a dynamic set of streams.
 
-<h4>Working with Schemas</h4>
+<h4><a id="connect_schemas" href="#connect_schemas">Working with 
Schemas</a></h4>
 
 The FileStream connectors are good examples because they are simple, but they 
also have trivially structured data -- each line is just a string. Almost all 
practical connectors will need schemas with more complex data formats.
 

http://git-wip-us.apache.org/repos/asf/kafka/blob/6cbd9759/docs/design.html
----------------------------------------------------------------------
diff --git a/docs/design.html b/docs/design.html
index 347f602..5d3090c 100644
--- a/docs/design.html
+++ b/docs/design.html
@@ -5,9 +5,9 @@
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at
- 
+
     http://www.apache.org/licenses/LICENSE-2.0
- 
+
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -15,7 +15,7 @@
  limitations under the License.
 -->
 
-<h3><a id="majordesignelements">4.1 Motivation</a></h3>
+<h3><a id="majordesignelements" href="#majordesignelements">4.1 
Motivation</a></h3>
 <p>
 We designed Kafka to be able to act as a unified platform for handling all the 
real-time data feeds <a href="#introduction">a large company might have</a>. To 
do this we had to think through a fairly broad set of use cases.
 <p>
@@ -31,8 +31,8 @@ Finally in cases where the stream is fed into other data 
systems for serving we
 <p>
 Supporting these uses led use to a design with a number of unique elements, 
more akin to a database log then a traditional messaging system. We will 
outline some elements of the design in the following sections.
 
-<h3><a id="persistence">4.2 Persistence</a></h3>
-<h4>Don't fear the filesystem!</h4>
+<h3><a id="persistence" href="#persistence">4.2 Persistence</a></h3>
+<h4><a id="design_filesystem" href="#design_filesystem">Don't fear the 
filesystem!</a></h4>
 <p>
 Kafka relies heavily on the filesystem for storing and caching messages. There 
is a general perception that "disks are slow" which makes people skeptical that 
a persistent structure can offer competitive performance. In fact disks are 
both much slower and much faster than people expect depending on how they are 
used; and a properly designed disk structure can often be as fast as the 
network.
 <p>
@@ -52,7 +52,7 @@ This suggests a design which is very simple: rather than 
maintain as much as pos
 <p>
 This style of pagecache-centric design is described in an <a 
href="http://varnish.projects.linpro.no/wiki/ArchitectNotes";>article</a> on the 
design of Varnish here (along with a healthy dose of arrogance).
 
-<h4>Constant Time Suffices</h4>
+<h4><a id="design_constanttime" href="#design_constanttime">Constant Time 
Suffices</a></h4>
 <p>
 The persistent data structure used in messaging systems are often a 
per-consumer queue with an associated BTree or other general-purpose random 
access data structures to maintain metadata about messages. BTrees are the most 
versatile data structure available, and make it possible to support a wide 
variety of transactional and non-transactional semantics in the messaging 
system. They do come with a fairly high cost, though: Btree operations are 
O(log N). Normally O(log N) is considered essentially equivalent to constant 
time, but this is not true for disk operations. Disk seeks come at 10 ms a pop, 
and each disk can do only one seek at a time so parallelism is limited. Hence 
even a handful of disk seeks leads to very high overhead. Since storage systems 
mix very fast cached operations with very slow physical disk operations, the 
observed performance of tree structures is often superlinear as data increases 
with fixed cache--i.e. doubling your data makes things much worse then twice a
 s slow.
 <p>
@@ -60,7 +60,7 @@ Intuitively a persistent queue could be built on simple reads 
and appends to fil
 <p>
 Having access to virtually unlimited disk space without any performance 
penalty means that we can provide some features not usually found in a 
messaging system. For example, in Kafka, instead of attempting to deleting 
messages as soon as they are consumed, we can retain messages for a relative 
long period (say a week). This leads to a great deal of flexibility for 
consumers, as we will describe.
 
-<h3><a id="maximizingefficiency">4.3 Efficiency</a></h3>
+<h3><a id="maximizingefficiency" href="#maximizingefficiency">4.3 
Efficiency</a></h3>
 <p>
 We have put significant effort into efficiency. One of our primary use cases 
is handling web activity data, which is very high volume: each page view may 
generate dozens of writes. Furthermore we assume each message published is read 
by at least one consumer (often many), hence we strive to make consumption as 
cheap as possible.
 <p>
@@ -74,7 +74,7 @@ To avoid this, our protocol is built around a "message set" 
abstraction that nat
 <p>
 This simple optimization produces orders of magnitude speed up. Batching leads 
to larger network packets, larger sequential disk operations, contiguous memory 
blocks, and so on, all of which allows Kafka to turn a bursty stream of random 
message writes into linear writes that flow to the consumers.
 <p>
-The other inefficiency is in byte copying. At low message rates this is not an 
issue, but under load the impact is significant. To avoid this we employ a 
standardized binary message format that is shared by the producer, the broker, 
and the consumer (so data chunks can be transferred without modification 
between them). 
+The other inefficiency is in byte copying. At low message rates this is not an 
issue, but under load the impact is significant. To avoid this we employ a 
standardized binary message format that is shared by the producer, the broker, 
and the consumer (so data chunks can be transferred without modification 
between them).
 <p>
 The message log maintained by the broker is itself just a directory of files, 
each populated by a sequence of message sets that have been written to disk in 
the same format used by the producer and consumer. Maintaining this common 
format allows optimization of the most important operation: network transfer of 
persistent log chunks. Modern unix operating systems offer a highly optimized 
code path for transferring data out of pagecache to a socket; in Linux this is 
done with the <a 
href="http://man7.org/linux/man-pages/man2/sendfile.2.html";>sendfile system 
call</a>.
 <p>
@@ -94,7 +94,7 @@ This combination of pagecache and sendfile means that on a 
Kafka cluster where t
 <p>
 For more background on the sendfile and zero-copy support in Java, see this <a 
href="http://www.ibm.com/developerworks/linux/library/j-zerocopy";>article</a>.
 
-<h4>End-to-end Batch Compression</h4>
+<h4><a id="design_compression" href="#design_compression">End-to-end Batch 
Compression</a></h4>
 <p>
 In some cases the bottleneck is actually not CPU or disk but network 
bandwidth. This is particularly true for a data pipeline that needs to send 
messages between data centers over a wide-area network. Of course the user can 
always compress its messages one at a time without any support needed from 
Kafka, but this can lead to very poor compression ratios as much of the 
redundancy is due to repetition between messages of the same type (e.g. field 
names in JSON or user agents in web logs or common string values). Efficient 
compression requires compressing multiple messages together rather than 
compressing each message individually.
 <p>
@@ -102,25 +102,25 @@ Kafka supports this by allowing recursive message sets. A 
batch of messages can
 <p>
 Kafka supports GZIP and Snappy compression protocols. More details on 
compression can be found <a 
href="https://cwiki.apache.org/confluence/display/KAFKA/Compression";>here</a>.
 
-<h3><a id="theproducer">4.4 The Producer</a></h3>
+<h3><a id="theproducer" href="#theproducer">4.4 The Producer</a></h3>
 
-<h4>Load balancing</h4>
+<h4><a id="design_loadbalancing" href="#design_loadbalancing">Load 
balancing</a></h4>
 <p>
 The producer sends data directly to the broker that is the leader for the 
partition without any intervening routing tier. To help the producer do this 
all Kafka nodes can answer a request for metadata about which servers are alive 
and where the leaders for the partitions of a topic are at any given time to 
allow the producer to appropriate direct its requests.
 <p>
 The client controls which partition it publishes messages to. This can be done 
at random, implementing a kind of random load balancing, or it can be done by 
some semantic partitioning function. We expose the interface for semantic 
partitioning by allowing the user to specify a key to partition by and using 
this to hash to a partition (there is also an option to override the partition 
function if need be). For example if the key chosen was a user id then all data 
for a given user would be sent to the same partition. This in turn will allow 
consumers to make locality assumptions about their consumption. This style of 
partitioning is explicitly designed to allow locality-sensitive processing in 
consumers.
 
-<h4>Asynchronous send</h4>
+<h4><a id="design_asyncsend" href="#design_asyncsend">Asynchronous 
send</a></h4>
 <p>
 Batching is one of the big drivers of efficiency, and to enable batching the 
Kafka producer will attempt to accumulate data in memory and to send out larger 
batches in a single request. The batching can be configured to accumulate no 
more than a fixed number of messages and to wait no longer than some fixed 
latency bound (say 64k or 10 ms). This allows the accumulation of more bytes to 
send, and few larger I/O operations on the servers. This buffering is 
configurable and gives a mechanism to trade off a small amount of additional 
latency for better throughput.
 <p>
 Details on <a href="#newproducerconfigs">configuration</a> and <a 
href="http://kafka.apache.org/082/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html";>api</a>
 for the producer can be found elsewhere in the documentation.
 
-<h3><a id="theconsumer">4.5 The Consumer</a></h3>
+<h3><a id="theconsumer" href="#theconsumer">4.5 The Consumer</a></h3>
 
 The Kafka consumer works by issuing "fetch" requests to the brokers leading 
the partitions it wants to consume. The consumer specifies its offset in the 
log with each request and receives back a chunk of log beginning from that 
position. The consumer thus has significant control over this position and can 
rewind it to re-consume data if need be.
 
-<h4>Push vs. pull</h4>
+<h4><a id="design_pull" href="#design_pull">Push vs. pull</a></h4>
 <p>
 An initial question we considered is whether consumers should pull data from 
brokers or brokers should push data to the consumer. In this respect Kafka 
follows a more traditional design, shared by most messaging systems, where data 
is pushed to the broker from the producer and pulled from the broker by the 
consumer. Some logging-centric systems, such as <a 
href="http://github.com/facebook/scribe";>Scribe</a> and <a 
href="http://flume.apache.org/";>Apache Flume</a> follow a very different push 
based path where  data is pushed downstream. There are pros and cons to both 
approaches. However a push-based system has difficulty dealing with diverse 
consumers as the broker controls the rate at which data is transferred. The 
goal is generally for the consumer to be able to consume at the maximum 
possible rate; unfortunately in a push system this means the consumer tends to 
be overwhelmed when its rate of consumption falls below the rate of production 
(a denial of service attack, in essence). 
 A pull-based system has the nicer property that the consumer simply falls 
behind and catches up when it can. This can be mitigated with some kind of 
backoff protocol by which the consumer can indicate it is overwhelmed, but 
getting the rate of transfer to fully utilize (but never over-utilize) the 
consumer is trickier than it seems. Previous attempts at building systems in 
this fashion led us to go with a more traditional pull model.
 <p>
@@ -130,7 +130,7 @@ The deficiency of a naive pull-based system is that if the 
broker has no data th
 <p>
 You could imagine other possible designs which would be only pull, end-to-end. 
The producer would locally write to a local log, and brokers would pull from 
that with consumers pulling from them. A similar type of "store-and-forward" 
producer is often proposed. This is intriguing but we felt not very suitable 
for our target use cases which have thousands of producers. Our experience 
running persistent data systems at scale led us to feel that involving 
thousands of disks in the system across many applications would not actually 
make things more reliable and would be a nightmare to operate. And in practice 
we have found that we can run a pipeline with strong SLAs at large scale 
without a need for producer persistence.
 
-<h4>Consumer Position</h4>
+<h4><a id="design_consumerposition" href="#design_consumerposition">Consumer 
Position</a></h4>
 Keeping track of <i>what</i> has been consumed, is, surprisingly, one of the 
key performance points of a messaging system.
 <p>
 Most messaging systems keep metadata about what messages have been consumed on 
the broker. That is, as a message is handed out to a consumer, the broker 
either records that fact locally immediately or it may wait for acknowledgement 
from the consumer. This is a fairly intuitive choice, and indeed for a single 
machine server it is not clear where else this state could go. Since the data 
structure used for storage in many messaging systems scale poorly, this is also 
a pragmatic choice--since the broker knows what is consumed it can immediately 
delete it, keeping the data size small.
@@ -141,13 +141,13 @@ Kafka handles this differently. Our topic is divided into 
a set of totally order
 <p>
 There is a side benefit of this decision. A consumer can deliberately 
<i>rewind</i> back to an old offset and re-consume data. This violates the 
common contract of a queue, but turns out to be an essential feature for many 
consumers. For example, if the consumer code has a bug and is discovered after 
some messages are consumed, the consumer can re-consume those messages once the 
bug is fixed.
 
-<h4>Offline Data Load</h4>
+<h4><a id="design_offlineload" href="#design_offlineload">Offline Data 
Load</a></h4>
 
 Scalable persistence allows for the possibility of consumers that only 
periodically consume such as batch data loads that periodically bulk-load data 
into an offline system such as Hadoop or a relational data warehouse.
 <p>
 In the case of Hadoop we parallelize the data load by splitting the load over 
individual map tasks, one for each node/topic/partition combination, allowing 
full parallelism in the loading. Hadoop provides the task management, and tasks 
which fail can restart without danger of duplicate data&mdash;they simply 
restart from their original position.
 
-<h3><a id="semantics">4.6 Message Delivery Semantics</a></h3>
+<h3><a id="semantics" href="#semantics">4.6 Message Delivery Semantics</a></h3>
 <p>
 Now that we understand a little about how producers and consumers work, let's 
discuss the semantic guarantees Kafka provides between producer and consumer. 
Clearly there are multiple possible message delivery guarantees that could be 
provided:
 <ul>
@@ -160,7 +160,7 @@ Now that we understand a little about how producers and 
consumers work, let's di
   <li>
     <i>Exactly once</i>&mdash;this is what people actually want, each message 
is delivered once and only once.
   </li>
-</ul>  
+</ul>
 
 It's worth noting that this breaks down into two problems: the durability 
guarantees for publishing a message and the guarantees when consuming a message.
 <p>
@@ -181,7 +181,7 @@ Now let's describe the semantics from the point-of-view of 
the consumer. All rep
 <p>
 So effectively Kafka guarantees at-least-once delivery by default and allows 
the user to implement at most once delivery by disabling retries on the 
producer and committing its offset prior to processing a batch of messages. 
Exactly-once delivery requires co-operation with the destination storage system 
but Kafka provides the offset which makes implementing this straight-forward.
 
-<h3><a id="replication">4.7 Replication</a></h3>
+<h3><a id="replication" href="#replication">4.7 Replication</a></h3>
 <p>
 Kafka replicates the log for each topic's partitions across a configurable 
number of servers (you can set this replication factor on a topic-by-topic 
basis). This allows automatic failover to these replicas when a server in the 
cluster fails so messages remain available in the presence of failures.
 <p>
@@ -206,7 +206,7 @@ The guarantee that Kafka offers is that a committed message 
will not be lost, as
 <p>
 Kafka will remain available in the presence of node failures after a short 
fail-over period, but may not remain available in the presence of network 
partitions.
 
-<h4>Replicated Logs: Quorums, ISRs, and State Machines (Oh my!)</h4>
+<h4><a id="design_replicatedlog" href="#design_replicatedlog">Replicated Logs: 
Quorums, ISRs, and State Machines (Oh my!)</a></h4>
 
 At its heart a Kafka partition is a replicated log. The replicated log is one 
of the most basic primitives in distributed data systems, and there are many 
approaches for implementing one. A replicated log can be used by other systems 
as a primitive for implementing other distributed systems in the <a 
href="http://en.wikipedia.org/wiki/State_machine_replication";>state-machine 
style</a>.
 <p>
@@ -230,7 +230,7 @@ For most use cases we hope to handle, we think this 
tradeoff is a reasonable one
 <p>
 Another important design distinction is that Kafka does not require that 
crashed nodes recover with all their data intact. It is not uncommon for 
replication algorithms in this space to depend on the existence of "stable 
storage" that cannot be lost in any failure-recovery scenario without potential 
consistency violations. There are two primary problems with this assumption. 
First, disk errors are the most common problem we observe in real operation of 
persistent data systems and they often do not leave data intact. Secondly, even 
if this were not a problem, we do not want to require the use of fsync on every 
write for our consistency guarantees as this can reduce performance by two to 
three orders of magnitude. Our protocol for allowing a replica to rejoin the 
ISR ensures that before rejoining, it must fully re-sync again even if it lost 
unflushed data in its crash.
 
-<h4>Unclean leader election: What if they all die?</h4>
+<h4><a id="design_uncleanleader" href="#design_uncleanleader">Unclean leader 
election: What if they all die?</a></h4>
 
 Note that Kafka's guarantee with respect to data loss is predicated on at 
least on replica remaining in sync. If all the nodes replicating a partition 
die, this guarantee no longer holds.
 <p>
@@ -245,10 +245,10 @@ This is a simple tradeoff between availability and 
consistency. If we wait for r
 This dilemma is not specific to Kafka. It exists in any quorum-based scheme. 
For example in a majority voting scheme, if a majority of servers suffer a 
permanent failure, then you must either choose to lose 100% of your data or 
violate consistency by taking what remains on an existing server as your new 
source of truth.
 
 
-<h4>Availability and Durability Guarantees</h4>
+<h4><a id="design_ha" href="#design_ha">Availability and Durability 
Guarantees</a></h4>
 
 When writing to Kafka, producers can choose whether they wait for the message 
to be acknowledged by 0,1 or all (-1) replicas.
-Note that "acknowledgement by all replicas" does not guarantee that the full 
set of assigned replicas have received the message. By default, when 
request.required.acks=-1, acknowledgement happens as soon as all the current 
in-sync replicas have received the message. For example, if a topic is 
configured with only two replicas and one fails (i.e., only one in sync replica 
remains), then writes that specify request.required.acks=-1 will succeed. 
However, these writes could be lost if the remaining replica also fails. 
+Note that "acknowledgement by all replicas" does not guarantee that the full 
set of assigned replicas have received the message. By default, when 
request.required.acks=-1, acknowledgement happens as soon as all the current 
in-sync replicas have received the message. For example, if a topic is 
configured with only two replicas and one fails (i.e., only one in sync replica 
remains), then writes that specify request.required.acks=-1 will succeed. 
However, these writes could be lost if the remaining replica also fails.
 
 Although this ensures maximum availability of the partition, this behavior may 
be undesirable to some users who prefer durability over availability. 
Therefore, we provide two topic-level configurations that can be used to prefer 
message durability over availability:
 <ol>
@@ -258,13 +258,13 @@ This setting offers a trade-off between consistency and 
availability. A higher s
 </ol>
 
 
-<h4>Replica Management</h4>
+<h4><a id="design_replicamanagment" href="#design_replicamanagment">Replica 
Management</a></h4>
 
 The above discussion on replicated logs really covers only a single log, i.e. 
one topic partition. However a Kafka cluster will manage hundreds or thousands 
of these partitions. We attempt to balance partitions within a cluster in a 
round-robin fashion to avoid clustering all partitions for high-volume topics 
on a small number of nodes. Likewise we try to balance leadership so that each 
node is the leader for a proportional share of its partitions.
 <p>
 It is also important to optimize the leadership election process as that is 
the critical window of unavailability. A naive implementation of leader 
election would end up running an election per partition for all partitions a 
node hosted when that node failed. Instead, we elect one of the brokers as the 
"controller". This controller detects failures at the broker level and is 
responsible for changing the leader of all affected partitions in a failed 
broker. The result is that we are able to batch together many of the required 
leadership change notifications which makes the election process far cheaper 
and faster for a large number of partitions. If the controller fails, one of 
the surviving brokers will become the new controller.
 
-<h3><a id="compaction">4.8 Log Compaction</a></h3>
+<h3><a id="compaction" href="#compaction">4.8 Log Compaction</a></h3>
 
 Log compaction ensures that Kafka will always retain at least the last known 
value for each message key within the log of data for a single topic partition. 
 It addresses use cases and scenarios such as restoring state after application 
crashes or system failure, or reloading caches after application restarts 
during operational maintenance. Let's dive into these use cases in more detail 
and then describe how compaction works.
 <p>
@@ -299,10 +299,10 @@ The general idea is quite simple. If we had infinite log 
retention, and we logge
 Log compaction is a mechanism to give finer-grained per-record retention, 
rather than the coarser-grained time-based retention. The idea is to 
selectively remove records where we have a more recent update with the same 
primary key. This way the log is guaranteed to have at least the last state for 
each key.
 <p>
 This retention policy can be set per-topic, so a single cluster can have some 
topics where retention is enforced by size or time and other topics where 
retention is enforced by compaction.
-<p> 
+<p>
 This functionality is inspired by one of LinkedIn's oldest and most successful 
pieces of infrastructure&mdash;a database changelog caching service called <a 
href="https://github.com/linkedin/databus";>Databus</a>. Unlike most 
log-structured storage systems Kafka is built for subscription and organizes 
data for fast linear reads and writes. Unlike Databus, Kafka acts a 
source-of-truth store so it is useful even in situations where the upstream 
data source would not otherwise be replayable.
 
-<h4>Log Compaction Basics</h4>
+<h4><a id="design_compactionbasics" href="#design_compactionbasics">Log 
Compaction Basics</a></h4>
 
 Here is a high-level picture that shows the logical structure of a Kafka log 
with the offset for each message.
 <p>
@@ -316,7 +316,7 @@ The compaction is done in the background by periodically 
recopying log segments.
 <p>
 <img src="images/log_compaction.png">
 <p>
-<h4>What guarantees does log compaction provide?</h4>
+<h4><a id="design_compactionguarantees" 
href="#design_compactionguarantees">What guarantees does log compaction 
provide?</a></h4>
 
 Log compaction guarantees the following:
 <ol>
@@ -327,7 +327,7 @@ Log compaction guarantees the following:
 <li>Any consumer progressing from the start of the log, will see at least the 
<em>final</em> state of all records in the order they were written.  All delete 
markers for deleted records will be seen provided the consumer reaches the head 
of the log in a time period less than the topic's 
<code>delete.retention.ms</code> setting (the default is 24 hours).  This is 
important as delete marker removal happens concurrently with read, and thus it 
is important that we do not remove any delete marker prior to the consumer 
seeing it.
 </ol>
 
-<h4>Log Compaction Details</h4>
+<h4><a id="design_compactiondetails" href="#design_compactiondetails">Log 
Compaction Details</a></h4>
 
 Log compaction is handled by the log cleaner, a pool of background threads 
that recopy log segment files, removing records whose key appears in the head 
of the log. Each compactor thread works as follows:
 <ol>
@@ -337,7 +337,7 @@ Log compaction is handled by the log cleaner, a pool of 
background threads that
 <li>The summary of the log head is essentially just a space-compact hash 
table. It uses exactly 24 bytes per entry. As a result with 8GB of cleaner 
buffer one cleaner iteration can clean around 366GB of log head (assuming 1k 
messages).
 </ol>
 <p>
-<h4>Configuring The Log Cleaner</h4>
+<h4><a id="design_compactionconfig" 
href="#design_compactionconfig">Configuring The Log Cleaner</a></h4>
 
 The log cleaner is disabled by default. To enable it set the server config
   <pre>  log.cleaner.enable=true</pre>
@@ -347,21 +347,21 @@ This can be done either at topic creation time or using 
the alter topic command.
 <p>
 Further cleaner configurations are described <a 
href="/documentation.html#brokerconfigs">here</a>.
 
-<h4>Log Compaction Limitations</h4>
+<h4><a id="design_compactionlimitations" 
href="#design_compactionlimitations">Log Compaction Limitations</a></h4>
 
 <ol>
   <li>You cannot configure yet how much log is retained without compaction 
(the "head" of the log).  Currently all segments are eligible except for the 
last segment, i.e. the one currently being written to.</li>
   <li>Log compaction is not yet compatible with compressed topics.</li>
 </ol>
-<h3><a id="semantics">4.9 Quotas</a></h3>
+<h3><a id="design_quotas" href="#design_quotas">4.9 Quotas</a></h3>
 <p>
     Starting in 0.9, the Kafka cluster has the ability to enforce quotas on 
produce and fetch requests. Quotas are basically byte-rate thresholds defined 
per client-id. A client-id logically identifies an application making a 
request. Hence a single client-id can span multiple producer and consumer 
instances and the quota will apply for all of them as a single entity i.e. if 
client-id="test-client" has a produce quota of 10MB/sec, this is shared across 
all instances with that same id.
 
-<h4>Why are quotas necessary?</h4>
+<h4><a id="design_quotasnecessary" href="#design_quotasnecessary">Why are 
quotas necessary?</a></h4>
 <p>
 It is possible for producers and consumers to produce/consume very high 
volumes of data and thus monopolize broker resources, cause network saturation 
and generally DOS other clients and the brokers themselves. Having quotas 
protects against these issues and is all tbe more important in large 
multi-tenant clusters where a small set of badly behaved clients can degrade 
user experience for the well behaved ones. In fact, when running Kafka as a 
service this even makes it possible to enforce API limits according to an 
agreed upon contract.
 </p>
-<h4>Enforcement</h4>
+<h4><a id="design_quotasenforcement" 
href="#design_quotasenforcement">Enforcement</a></h4>
 <p>
     By default, each unique client-id receives a fixed quota in bytes/sec as 
configured by the cluster (quota.producer.default, quota.consumer.default).
     This quota is defined on a per-broker basis. Each client can publish/fetch 
a maximum of X bytes/sec per broker before it gets throttled. We decided that 
defining these quotas per broker is much better than having a fixed cluster 
wide bandwidth per client because that would require a mechanism to share 
client quota usage among all the brokers. This can be harder to get right than 
the quota implementation itself!
@@ -372,9 +372,9 @@ It is possible for producers and consumers to 
produce/consume very high volumes
 <p>
 Client byte rate is measured over multiple small windows (for e.g. 30 windows 
of 1 second each) in order to detect and correct quota violations quickly. 
Typically, having large measurement windows (for e.g. 10 windows of 30 seconds 
each) leads to large bursts of traffic followed by long delays which is not 
great in terms of user experience.
 </p>
-<h4>Quota overrides</h4>
+<h4><a id="design_quotasoverrides" href="#design_quotasoverrides">Quota 
overrides</a></h4>
 <p>
     It is possible to override the default quota for client-ids that need a 
higher (or even lower) quota. The mechanism is similar to the per-topic log 
config overrides.
     Client-id overrides are written to ZooKeeper under 
<i><b>/config/clients</b></i>. These overrides are read by all brokers and are 
effective immediately. This lets us change quotas without having to do a 
rolling restart of the entire cluster. See <a href="/ops.html#quotas">here</a> 
for details.
 
-</p>
\ No newline at end of file
+</p>

http://git-wip-us.apache.org/repos/asf/kafka/blob/6cbd9759/docs/documentation.html
----------------------------------------------------------------------
diff --git a/docs/documentation.html b/docs/documentation.html
index eddc0c6..53c801e 100644
--- a/docs/documentation.html
+++ b/docs/documentation.html
@@ -131,37 +131,37 @@ Prior releases: <a 
href="/07/documentation.html">0.7.x</a>, <a href="/08/documen
     </li>
 </ul>
 
-<h2><a id="gettingStarted">1. Getting Started</a></h2>
+<h2><a id="gettingStarted" href="#gettingStarted">1. Getting Started</a></h2>
 <!--#include virtual="introduction.html" -->
 <!--#include virtual="uses.html" -->
 <!--#include virtual="quickstart.html" -->
 <!--#include virtual="ecosystem.html" -->
 <!--#include virtual="upgrade.html" -->
 
-<h2><a id="api">2. API</a></h2>
+<h2><a id="api" href="#api">2. API</a></h2>
 
 <!--#include virtual="api.html" -->
 
-<h2><a id="configuration">3. Configuration</a></h2>
+<h2><a id="configuration" href="#configuration">3. Configuration</a></h2>
 
 <!--#include virtual="configuration.html" -->
 
-<h2><a id="design">4. Design</a></h2>
+<h2><a id="design" href="#design">4. Design</a></h2>
 
 <!--#include virtual="design.html" -->
 
-<h2><a id="implementation">5. Implementation</a></h2>
+<h2><a id="implementation" href="#implementation">5. Implementation</a></h2>
 
 <!--#include virtual="implementation.html" -->
 
-<h2><a id="operations">6. Operations</a></h2>
+<h2><a id="operations" href="#operations">6. Operations</a></h2>
 
 <!--#include virtual="ops.html" -->
 
-<h2><a id="security">7. Security</a></h2>
+<h2><a id="security" href="#security">7. Security</a></h2>
 <!--#include virtual="security.html" -->
 
-<h2><a id="connect">8. Kafka Connect</a></h2>
+<h2><a id="connect" href="#connect">8. Kafka Connect</a></h2>
 <!--#include virtual="connect.html" -->
 
 <!--#include virtual="../includes/footer.html" -->

http://git-wip-us.apache.org/repos/asf/kafka/blob/6cbd9759/docs/ecosystem.html
----------------------------------------------------------------------
diff --git a/docs/ecosystem.html b/docs/ecosystem.html
index e99a446..73d5706 100644
--- a/docs/ecosystem.html
+++ b/docs/ecosystem.html
@@ -5,9 +5,9 @@
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at
- 
+
     http://www.apache.org/licenses/LICENSE-2.0
- 
+
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -15,6 +15,6 @@
  limitations under the License.
 -->
 
-<h3><a id="ecosystem">1.4 Ecosystem</a></h3>
+<h3><a id="ecosystem" href="#ecosystem">1.4 Ecosystem</a></h3>
 
 There are a plethora of tools that integrate with Kafka outside the main 
distribution. The <a 
href="https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem";> ecosystem 
page</a> lists many of these, including stream processing systems, Hadoop 
integration, monitoring, and deployment tools.

http://git-wip-us.apache.org/repos/asf/kafka/blob/6cbd9759/docs/implementation.html
----------------------------------------------------------------------
diff --git a/docs/implementation.html b/docs/implementation.html
index b95d36f..0b603d4 100644
--- a/docs/implementation.html
+++ b/docs/implementation.html
@@ -15,9 +15,9 @@
  limitations under the License.
 -->
 
-<h3><a id="apidesign">5.1 API Design</a></h3>
+<h3><a id="apidesign" href="#apidesign">5.1 API Design</a></h3>
 
-<h4>Producer APIs</h4>
+<h4><a id="impl_producer" href="#impl_producer">Producer APIs</a></h4>
 
 <p>
 The Producer API that wraps the 2 low-level producers - 
<code>kafka.producer.SyncProducer</code> and 
<code>kafka.producer.async.AsyncProducer</code>.
@@ -68,7 +68,7 @@ The partition API uses the key and the number of available 
broker partitions to
 </ul>
 </p>
 
-<h4>Consumer APIs</h4>
+<h4><a id="impl_consumer" href="#impl_consumer">Consumer APIs</a></h4>
 <p>
 We have 2 levels of consumer APIs. The low-level "simple" API maintains a 
connection to a single broker and has a close correspondence to the network 
requests sent to the server. This API is completely stateless, with the offset 
being passed in on every request, allowing the user to maintain this metadata 
however they choose.
 </p>
@@ -76,7 +76,7 @@ We have 2 levels of consumer APIs. The low-level "simple" API 
maintains a connec
 The high-level API hides the details of brokers from the consumer and allows 
consuming off the cluster of machines without concern for the underlying 
topology. It also maintains the state of what has been consumed. The high-level 
API also provides the ability to subscribe to topics that match a filter 
expression (i.e., either a whitelist or a blacklist regular expression).
 </p>
 
-<h5>Low-level API</h5>
+<h5><a id="impl_lowlevel" href="#impl_lowlevel">Low-level API</a></h5>
 <pre>
 class SimpleConsumer {
 
@@ -99,7 +99,7 @@ class SimpleConsumer {
 
 The low-level API is used to implement the high-level API as well as being 
used directly for some of our offline consumers which have particular 
requirements around maintaining state.
 
-<h5>High-level API</h5>
+<h5><a id="impl_highlevel" href="#impl_highlevel">High-level API</a></h5>
 <pre>
 
 /* create a connection to the cluster */
@@ -138,15 +138,15 @@ This API is centered around iterators, implemented by the 
KafkaStream class. Eac
 The createMessageStreams call registers the consumer for the topic, which 
results in rebalancing the consumer/broker assignment. The API encourages 
creating many topic streams in a single call in order to minimize this 
rebalancing. The createMessageStreamsByFilter call (additionally) registers 
watchers to discover new topics that match its filter. Note that each stream 
that createMessageStreamsByFilter returns may iterate over messages from 
multiple topics (i.e., if multiple topics are allowed by the filter).
 </p>
 
-<h3><a id="networklayer">5.2 Network Layer</a></h3>
+<h3><a id="networklayer" href="#networklayer">5.2 Network Layer</a></h3>
 <p>
 The network layer is a fairly straight-forward NIO server, and will not be 
described in great detail. The sendfile implementation is done by giving the 
<code>MessageSet</code> interface a <code>writeTo</code> method. This allows 
the file-backed message set to use the more efficient <code>transferTo</code> 
implementation instead of an in-process buffered write. The threading model is 
a single acceptor thread and <i>N</i> processor threads which handle a fixed 
number of connections each. This design has been pretty thoroughly tested <a 
href="http://sna-projects.com/blog/2009/08/introducing-the-nio-socketserver-implementation";>elsewhere</a>
 and found to be simple to implement and fast. The protocol is kept quite 
simple to allow for future implementation of clients in other languages.
 </p>
-<h3><a id="messages">5.3 Messages</a></h3>
+<h3><a id="messages" href="#messages">5.3 Messages</a></h3>
 <p>
 Messages consist of a fixed-size header and variable length opaque byte array 
payload. The header contains a format version and a CRC32 checksum to detect 
corruption or truncation. Leaving the payload opaque is the right decision: 
there is a great deal of progress being made on serialization libraries right 
now, and any particular choice is unlikely to be right for all uses. Needless 
to say a particular application using Kafka would likely mandate a particular 
serialization type as part of its usage. The <code>MessageSet</code> interface 
is simply an iterator over messages with specialized methods for bulk reading 
and writing to an NIO <code>Channel</code>.
 
-<h3><a id="messageformat">5.4 Message Format</a></h3>
+<h3><a id="messageformat" href="#messageformat">5.4 Message Format</a></h3>
 
 <pre>
        /**
@@ -173,7 +173,7 @@ Messages consist of a fixed-size header and variable length 
opaque byte array pa
         */
 </pre>
 </p>
-<h3><a id="log">5.5 Log</a></h3>
+<h3><a id="log" href="#log">5.5 Log</a></h3>
 <p>
 A log for a topic named "my_topic" with two partitions consists of two 
directories (namely <code>my_topic_0</code> and <code>my_topic_1</code>) 
populated with data files containing the messages for that topic. The format of 
the log files is a sequence of "log entries"; each log entry is a 4 byte 
integer <i>N</i> storing the message length which is followed by the <i>N</i> 
message bytes. Each message is uniquely identified by a 64-bit integer 
<i>offset</i> giving the byte position of the start of this message in the 
stream of all messages ever sent to that topic on that partition. The on-disk 
format of each message is given below. Each log file is named with the offset 
of the first message it contains. So the first file created will be 
00000000000.kafka, and each additional file will have an integer name roughly 
<i>S</i> bytes from the previous file where <i>S</i> is the max log file size 
given in the configuration.
 </p>
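<p>
For illustration only, here is a minimal Java sketch of walking the on-disk framing described above (a 4-byte length <i>N</i> followed by <i>N</i> message bytes). The file name, buffer handling and class name are assumptions for the sketch, not Kafka code.
</p>
<pre>
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical scanner for the framing above: read a 4-byte length, then the
// message bytes, and advance the byte position by the size of the whole entry.
public class LogSegmentScan {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("00000000000.kafka"),
                                               StandardOpenOption.READ)) {
            ByteBuffer lenBuf = ByteBuffer.allocate(4);
            long position = 0;                       // byte offset of the current entry
            while (ch.read(lenBuf) == 4) {
                lenBuf.flip();
                int n = lenBuf.getInt();             // message length N
                ByteBuffer message = ByteBuffer.allocate(n);
                if (ch.read(message) < n) break;     // partial entry at the tail of the file
                // ... decode / CRC-check the message here ...
                position += 4 + n;
                lenBuf.clear();
            }
        }
    }
}
</pre>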
@@ -192,11 +192,11 @@ payload        : n bytes
 The use of the message offset as the message id is unusual. Our original idea 
was to use a GUID generated by the producer, and maintain a mapping from GUID 
to offset on each broker. But since a consumer must maintain an ID for each 
server, the global uniqueness of the GUID provides no value. Furthermore the 
complexity of maintaining the mapping from a random id to an offset requires a 
heavy weight index structure which must be synchronized with disk, essentially 
requiring a full persistent random-access data structure. Thus to simplify the 
lookup structure we decided to use a simple per-partition atomic counter which 
could be coupled with the partition id and node id to uniquely identify a 
message; this makes the lookup structure simpler, though multiple seeks per 
consumer request are still likely. However once we settled on a counter, the 
jump to directly using the offset seemed natural&mdash;both after all are 
monotonically increasing integers unique to a partition. Since the offset is 
hidden from the consumer API this decision is ultimately an 
implementation detail and we went with the more efficient approach.
 </p>
 <img src="images/kafka_log.png">
-<h4>Writes</h4>
+<h4><a id="impl_writes" href="#impl_writes">Writes</a></h4>
 <p>
 The log allows serial appends which always go to the last file. This file is 
rolled over to a fresh file when it reaches a configurable size (say 1GB). The 
log takes two configuration parameters: <i>M</i>, which gives the number of 
messages to write before forcing the OS to flush the file to disk, and <i>S</i>, 
which gives a number of seconds after which a flush is forced. This gives a 
durability guarantee of losing at most <i>M</i> messages or <i>S</i> seconds of 
data in the event of a system crash.
 </p>
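<p>
As a toy restatement of the two parameters (not the broker's actual code), the flush decision reduces to the check below; names and units are illustrative.
</p>
<pre>
// Toy version of the flush policy described above.
public class FlushPolicy {
    static boolean shouldFlush(long unflushedMessages, long msSinceLastFlush,
                               long m, long sMillis) {
        // Force an fsync once M messages have accumulated or S seconds have passed.
        return unflushedMessages >= m || msSinceLastFlush >= sMillis;
    }
}
</pre>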
-<h4>Reads</h4>
+<h4><a id="impl_reads" href="#impl_reads">Reads</a></h4>
 <p>
 Reads are done by giving the 64-bit logical offset of a message and an 
<i>S</i>-byte max chunk size. This will return an iterator over the messages 
contained in the <i>S</i>-byte buffer. <i>S</i> is intended to be larger than 
any single message, but in the event of an abnormally large message, the read 
can be retried multiple times, each time doubling the buffer size, until the 
message is read successfully. A maximum message and buffer size can be 
specified to make the server reject messages larger than some size, and to give 
a bound to the client on the maximum it need ever read to get a complete 
message. It is likely that the read buffer ends with a partial message; this is 
easily detected by the size delimiting.
 </p>
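<p>
A rough sketch of that retry loop is given below, against a hypothetical <code>fetch(offset, maxBytes)</code> that returns whatever complete messages fit in a chunk of the given size; none of these names are real Kafka classes, and the size cap is an assumption.
</p>
<pre>
import java.util.Collections;
import java.util.List;

// Illustration of the doubling-buffer read described above (not Kafka code).
public class ChunkedRead {
    static final int MAX_BUFFER = 10 * 1024 * 1024;     // assumed upper bound on message size

    // Stub standing in for a fetch of up to maxBytes starting at the given offset.
    static List<byte[]> fetch(long offset, int maxBytes) {
        return Collections.emptyList();
    }

    static List<byte[]> readRetrying(long offset, int initialChunkBytes) {
        int chunk = initialChunkBytes;                   // S: normally larger than any message
        List<byte[]> messages = fetch(offset, chunk);
        while (messages.isEmpty() && chunk < MAX_BUFFER) {
            chunk *= 2;                                  // abnormally large message: retry with a bigger buffer
            messages = fetch(offset, chunk);
        }
        return messages;
    }
}
</pre>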
@@ -228,12 +228,11 @@ messageSetSend 1
 ...
 messageSetSend n
 </pre>
-
-<h4>Deletes</h4>
+<h4><a id="impl_deletes" href="#impl_deletes">Deletes</a></h4>
 <p>
 Data is deleted one log segment at a time. The log manager allows pluggable 
delete policies to choose which files are eligible for deletion. The current 
policy deletes any log with a modification time of more than <i>N</i> days ago, 
though a policy which retained the last <i>N</i> GB could also be useful. To 
avoid locking reads while still allowing deletes that modify the segment list 
we use a copy-on-write style segment list implementation that provides 
consistent views to allow a binary search to proceed on an immutable static 
snapshot view of the log segments while deletes are progressing.
 </p>
-<h4>Guarantees</h4>
+<h4><a id="impl_guarantees" href="#impl_guarantees">Guarantees</a></h4>
 <p>
 The log provides a configuration parameter <i>M</i> which controls the maximum 
number of messages that are written before forcing a flush to disk. On startup 
a log recovery process is run that iterates over all messages in the newest log 
segment and verifies that each message entry is valid. A message entry is valid 
if the sum of its size and offset are less than the length of the file AND the 
CRC32 of the message payload matches the CRC stored with the message. In the 
event corruption is detected the log is truncated to the last valid offset.
 </p>
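<p>
The validity rule can be paraphrased in a few lines of Java; the field layout below is illustrative and only restates the two conditions above, it is not the broker's recovery code.
</p>
<pre>
import java.util.zip.CRC32;

// Sketch of the recovery check above: the entry must end inside the file and
// the stored CRC32 must match the payload.
public class EntryCheck {
    static boolean isValid(long entryOffset, int entrySize, long fileLength,
                           long storedCrc, byte[] payload) {
        if (entryOffset + entrySize > fileLength) return false;  // truncated entry
        CRC32 crc = new CRC32();
        crc.update(payload);
        return crc.getValue() == storedCrc;                      // corrupted vs. intact payload
    }
}
</pre>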
@@ -241,8 +240,8 @@ The log provides a configuration parameter <i>M</i> which 
controls the maximum n
 Note that two kinds of corruption must be handled: truncation in which an 
unwritten block is lost due to a crash, and corruption in which a nonsense 
block is ADDED to the file. The reason for this is that in general the OS makes 
no guarantee of the write order between the file inode and the actual block 
data so in addition to losing written data the file can gain nonsense data if 
the inode is updated with a new size but a crash occurs before the block 
containing that data is written. The CRC detects this corner case, and 
prevents it from corrupting the log (though the unwritten messages are, of 
course, lost).
 </p>
 
-<h3><a id="distributionimpl">5.6 Distribution</a></h3>
-<h4>Consumer Offset Tracking</h4>
+<h3><a id="distributionimpl" href="#distributionimpl">5.6 Distribution</a></h3>
+<h4><a id="impl_offsettracking" href="#impl_offsettracking">Consumer Offset 
Tracking</a></h4>
 <p>
 The high-level consumer tracks the maximum offset it has consumed in each 
partition and periodically commits its offset vector so that it can resume from 
those offsets in the event of a restart. Kafka provides the option to store all 
the offsets for a given consumer group in a designated broker (for that group) 
called the <i>offset manager</i>. i.e., any consumer instance in that consumer 
group should send its offset commits and fetches to that offset manager 
(broker). The high-level consumer handles this automatically. If you use the 
simple consumer you will need to manage offsets manually. This is currently 
unsupported in the Java simple consumer which can only commit or fetch offsets 
in ZooKeeper. If you use the Scala simple consumer you can discover the offset 
manager and explicitly commit or fetch offsets to the offset manager. A 
consumer can look up its offset manager by issuing a ConsumerMetadataRequest to 
any Kafka broker and reading the ConsumerMetadataResponse which will contain 
the offset manager. The consumer can then proceed to commit or fetch offsets 
from the offset manager broker. In case the offset manager moves, the 
consumer will need to rediscover the offset manager. If you wish to manage your 
offsets manually, you can take a look at these <a 
href="https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka";>code
 samples that explain how to issue OffsetCommitRequest and 
OffsetFetchRequest</a>.
 </p>
@@ -255,7 +254,7 @@ When the offset manager receives an OffsetCommitRequest, it 
appends the request
 When the offset manager receives an offset fetch request, it simply returns 
the last committed offset vector from the offsets cache. In case the offset 
manager was just started or if it just became the offset manager for a new set 
of consumer groups (by becoming a leader for a partition of the offsets topic), 
it may need to load the offsets topic partition into the cache. In this case, 
the offset fetch will fail with an OffsetsLoadInProgress exception and the 
consumer may retry the OffsetFetchRequest after backing off. (This is done 
automatically by the high-level consumer.)
 </p>
 
-<h5><a id="offsetmigration">Migrating offsets from ZooKeeper to Kafka</a></h5>
+<h5><a id="offsetmigration" href="#offsetmigration">Migrating offsets from 
ZooKeeper to Kafka</a></h5>
 <p>
 Kafka consumers in earlier releases store their offsets by default in 
ZooKeeper. It is possible to migrate these consumers to commit offsets into 
Kafka by following these steps:
 <ol>
@@ -271,17 +270,17 @@ Kafka consumers in earlier releases store their offsets 
by default in ZooKeeper.
 A roll-back (i.e., migrating from Kafka back to ZooKeeper) can also be 
performed using the above steps if you set 
<code>offsets.storage=zookeeper</code>.
 </p>
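<p>
As a sketch of the configuration involved, the old high-level consumer would be built from properties along these lines. The connection string and group name are placeholders, and whether <code>dual.commit.enabled</code> is set depends on which migration step you are at; this is an illustration, not a prescription.
</p>
<pre>
import java.util.Properties;

// Illustrative settings for the migration above.
public class MigrationProps {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "my-group");
        props.put("offsets.storage", "kafka");      // commit offsets to Kafka instead of ZooKeeper
        props.put("dual.commit.enabled", "true");   // also commit to ZooKeeper during the transition
        // ... pass props to the high-level consumer's ConsumerConfig ...
    }
}
</pre>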
 
-<h4>ZooKeeper Directories</h4>
+<h4><a id="impl_zookeeper" href="#impl_zookeeper">ZooKeeper 
Directories</a></h4>
 <p>
 The following gives the ZooKeeper structures and algorithms used for 
co-ordination between consumers and brokers.
 </p>
 
-<h4>Notation</h4>
+<h4><a id="impl_zknotation" href="#impl_zknotation">Notation</a></h4>
 <p>
 When an element in a path is denoted [xyz], that means that the value of xyz 
is not fixed and there is in fact a ZooKeeper znode for each possible value of 
xyz. For example /topics/[topic] would be a directory named /topics containing 
a sub-directory for each topic name. Numerical ranges are also given such as 
[0...5] to indicate the subdirectories 0, 1, 2, 3, 4, 5. An arrow -> is used to 
indicate the contents of a znode. For example /hello -> world would indicate a 
znode /hello containing the value "world".
 </p>
 
-<h4>Broker Node Registry</h4>
+<h4><a id="impl_zkbroker" href="#impl_zkbroker">Broker Node Registry</a></h4>
 <pre>
 /brokers/ids/[0...N] --> host:port (ephemeral node)
 </pre>
@@ -291,7 +290,7 @@ This is a list of all present broker nodes, each of which 
provides a unique logi
 <p>
 Since the broker registers itself in ZooKeeper using ephemeral znodes, this 
registration is dynamic and will disappear if the broker is shutdown or dies 
(thus notifying consumers it is no longer available).
 </p>
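<p>
For example, the registry can be inspected with the plain ZooKeeper client; the connect string below is a placeholder and this read-only sketch is not part of Kafka.
</p>
<pre>
import org.apache.zookeeper.ZooKeeper;

// Dump the broker node registry described above.
public class BrokerRegistryDump {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
        for (String id : zk.getChildren("/brokers/ids", false)) {
            byte[] data = zk.getData("/brokers/ids/" + id, false, null);
            System.out.println(id + " -> " + new String(data, "UTF-8"));  // ephemeral registration
        }
        zk.close();
    }
}
</pre>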
-<h4>Broker Topic Registry</h4>
+<h4><a id="impl_zktopic" href="#impl_zktopic">Broker Topic Registry</a></h4>
 <pre>
 /brokers/topics/[topic]/[0...N] --> nPartions (ephemeral node)
 </pre>
@@ -300,7 +299,7 @@ Since the broker registers itself in ZooKeeper using 
ephemeral znodes, this regi
 Each broker registers itself under the topics it maintains and stores the 
number of partitions for that topic.
 </p>
 
-<h4>Consumers and Consumer Groups</h4>
+<h4><a id="impl_zkconsumers" href="#impl_zkconsumers">Consumers and Consumer 
Groups</a></h4>
 <p>
 Consumers of topics also register themselves in ZooKeeper, in order to 
coordinate with each other and balance the consumption of data. Consumers can 
also store their offsets in ZooKeeper by setting 
<code>offsets.storage=zookeeper</code>. However, this offset storage mechanism 
will be deprecated in a future release. Therefore, it is recommended to <a 
href="#offsetmigration">migrate offsets storage to Kafka</a>.
 </p>
@@ -314,7 +313,7 @@ For example if one consumer is your foobar process, which 
is run across three ma
The consumers in a group divide up the partitions as fairly as possible; each 
partition is consumed by exactly one consumer in a consumer group.
 </p>
 
-<h4>Consumer Id Registry</h4>
+<h4><a id="impl_zkconsumerid" href="#impl_zkconsumerid">Consumer Id 
Registry</a></h4>
 <p>
 In addition to the group_id which is shared by all consumers in a group, each 
consumer is given a transient, unique consumer_id (of the form hostname:uuid) 
for identification purposes. Consumer ids are registered in the following 
directory.
 <pre>
@@ -323,7 +322,7 @@ In addition to the group_id which is shared by all 
consumers in a group, each co
 Each of the consumers in the group registers under its group and creates a 
znode with its consumer_id. The value of the znode contains a map of &lt;topic, 
#streams&gt;. This id is simply used to identify each of the consumers which is 
currently active within a group. This is an ephemeral node so it will disappear 
if the consumer process dies.
 </p>
 
-<h4>Consumer Offsets</h4>
+<h4><a id="impl_zkconsumeroffsets" href="#impl_zkconsumeroffsets">Consumer 
Offsets</a></h4>
 <p>
 Consumers track the maximum offset they have consumed in each partition. This 
value is stored in a ZooKeeper directory if 
<code>offsets.storage=zookeeper</code>.
 </p>
@@ -331,7 +330,7 @@ Consumers track the maximum offset they have consumed in 
each partition. This va
 /consumers/[group_id]/offsets/[topic]/[broker_id-partition_id] --> 
offset_counter_value (persistent node)
 </pre>
 
-<h4>Partition Owner registry</h4>
+<h4><a id="impl_zkowner" href="#impl_zkowner">Partition Owner registry</a></h4>
 
 <p>
 Each broker partition is consumed by a single consumer within a given consumer 
group. The consumer must establish its ownership of a given partition before 
any consumption can begin. To establish its ownership, a consumer writes its 
own id in an ephemeral node under the particular broker partition it is 
claiming.
@@ -341,13 +340,13 @@ Each broker partition is consumed by a single consumer 
within a given consumer g
 /consumers/[group_id]/owners/[topic]/[broker_id-partition_id] --> 
consumer_node_id (ephemeral node)
 </pre>
 
-<h4>Broker node registration</h4>
+<h4><a id="impl_brokerregistration" href="#impl_brokerregistration">Broker 
node registration</a></h4>
 
 <p>
 The broker nodes are basically independent, so they only publish information 
about what they have. When a broker joins, it registers itself under the broker 
node registry directory and writes information about its host name and port. 
The broker also registers the list of existing topics and their logical 
partitions in the broker topic registry. New topics are registered dynamically 
when they are created on the broker.
 </p>
 
-<h4>Consumer registration algorithm</h4>
+<h4><a id="impl_consumerregistration" 
href="#impl_consumerregistration">Consumer registration algorithm</a></h4>
 
 <p>
 When a consumer starts, it does the following:
@@ -363,7 +362,7 @@ When a consumer starts, it does the following:
 </ol>
 </p>
 
-<h4>Consumer rebalancing algorithm</h4>
+<h4><a id="impl_consumerrebalance" href="#impl_consumerrebalance">Consumer 
rebalancing algorithm</a></h4>
 <p>
The consumer rebalancing algorithm allows all the consumers in a group to come 
to a consensus on which consumer is consuming which partitions. Consumer 
rebalancing is triggered on each addition or removal of both broker nodes and 
other consumers within the same group. For a given topic and a given consumer 
group, broker partitions are divided evenly among consumers within the group. A 
partition is always consumed by a single consumer. This design simplifies the 
implementation. Had we allowed a partition to be concurrently consumed by 
multiple consumers, there would be contention on the partition and some kind of 
locking would be required. If there are more consumers than partitions, some 
consumers won't get any data at all. During rebalancing, we try to assign 
partitions to consumers in such a way that reduces the number of broker nodes 
each consumer has to connect to.
 </p>
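<p>
To make the "divided evenly" idea concrete, here is a small sketch (not the exact rebalancing algorithm) that hands each consumer a contiguous range of partition ids, spreading any remainder across the first consumers in sorted order.
</p>
<pre>
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration of dividing partitions evenly among the consumers in a group.
public class EvenAssignment {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted);
        Map<String, List<Integer>> result = new HashMap<>();
        int perConsumer = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size();
        int next = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int p = 0; p < count; p++) owned.add(next++);
            result.put(sorted.get(i), owned);            // each partition gets exactly one owner
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(Arrays.asList("c1", "c2", "c3"), 8));
    }
}
</pre>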

http://git-wip-us.apache.org/repos/asf/kafka/blob/6cbd9759/docs/introduction.html
----------------------------------------------------------------------
diff --git a/docs/introduction.html b/docs/introduction.html
index 7e0b150..e5b2e78 100644
--- a/docs/introduction.html
+++ b/docs/introduction.html
@@ -5,9 +5,9 @@
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at
- 
+
     http://www.apache.org/licenses/LICENSE-2.0
- 
+
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -15,7 +15,7 @@
  limitations under the License.
 -->
 
-<h3><a id="introduction">1.1 Introduction</a></h3>
+<h3><a id="introduction" href="#introduction">1.1 Introduction</a></h3>
 Kafka is a distributed, partitioned, replicated commit log service. It 
provides the functionality of a messaging system, but with a unique design.
 <p>
 What does all that mean?
@@ -35,7 +35,7 @@ So, at a high level, producers send messages over the network 
to the Kafka clust
 
 Communication between the clients and the servers is done with a simple, 
high-performance, language agnostic <a 
href="https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol";>TCP
 protocol</a>. We provide a Java client for Kafka, but clients are available in 
<a href="https://cwiki.apache.org/confluence/display/KAFKA/Clients";>many 
languages</a>.
 
-<h4>Topics and Logs</h4>
+<h4><a id="intro_topics" href="#intro_topics">Topics and Logs</a></h4>
 Let's first dive into the high-level abstraction Kafka provides&mdash;the 
topic.
 <p>
 A topic is a category or feed name to which messages are published. For each 
topic, the Kafka cluster maintains a partitioned log that looks like this:
@@ -50,19 +50,19 @@ In fact the only metadata retained on a per-consumer basis 
is the position of th
 <p>
 This combination of features means that Kafka consumers are very 
cheap&mdash;they can come and go without much impact on the cluster or on other 
consumers. For example, you can use our command line tools to "tail" the 
contents of any topic without changing what is consumed by any existing 
consumers.
 <p>
-The partitions in the log serve several purposes. First, they allow the log to 
scale beyond a size that will fit on a single server. Each individual partition 
must fit on the servers that host it, but a topic may have many partitions so 
it can handle an arbitrary amount of data. Second they act as the unit of 
parallelism&mdash;more on that in a bit. 
+The partitions in the log serve several purposes. First, they allow the log to 
scale beyond a size that will fit on a single server. Each individual partition 
must fit on the servers that host it, but a topic may have many partitions so 
it can handle an arbitrary amount of data. Second they act as the unit of 
parallelism&mdash;more on that in a bit.
 
-<h4>Distribution</h4>
+<h4><a id="intro_distribution" href="#intro_distribution">Distribution</a></h4>
 
 The partitions of the log are distributed over the servers in the Kafka 
cluster with each server handling data and requests for a share of the 
partitions. Each partition is replicated across a configurable number of 
servers for fault tolerance.
 <p>
 Each partition has one server which acts as the "leader" and zero or more 
servers which act as "followers". The leader handles all read and write 
requests for the partition while the followers passively replicate the leader. 
If the leader fails, one of the followers will automatically become the new 
leader. Each server acts as a leader for some of its partitions and a follower 
for others so load is well balanced within the cluster.
 
-<h4>Producers</h4>
+<h4><a id="intro_producers" href="#intro_producers">Producers</a></h4>
 
 Producers publish data to the topics of their choice. The producer is 
responsible for choosing which message to assign to which partition within the 
topic. This can be done in a round-robin fashion simply to balance load or it 
can be done according to some semantic partition function (say based on some 
key in the message). More on the use of partitioning in a second.
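<p>
As an example of semantic partitioning with the new Java producer, records sharing a key hash to the same partition, which preserves per-key ordering; the broker address, topic and key below are placeholders.
</p>
<pre>
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Keyed send: the key determines the partition.
public class KeyedSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my_topic", "user-42", "page_view"));
        }
    }
}
</pre>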
 
-<h4><a id="intro_consumers">Consumers</a></h4>
+<h4><a id="intro_consumers" href="#intro_consumers">Consumers</a></h4>
 
 Messaging traditionally has two models: <a 
href="http://en.wikipedia.org/wiki/Message_queue";>queuing</a> and <a 
href="http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern";>publish-subscribe</a>.
 In a queue, a pool of consumers may read from a server and each message goes 
to one of them; in publish-subscribe the message is broadcast to all consumers. 
Kafka offers a single consumer abstraction that generalizes both of 
these&mdash;the <i>consumer group</i>.
 <p>
@@ -70,7 +70,7 @@ Consumers label themselves with a consumer group name, and 
each message publishe
 <p>
 If all the consumer instances have the same consumer group, then this works 
just like a traditional queue balancing load over the consumers.
 <p>
-If all the consumer instances have different consumer groups, then this works 
like publish-subscribe and all messages are broadcast to all consumers. 
+If all the consumer instances have different consumer groups, then this works 
like publish-subscribe and all messages are broadcast to all consumers.
 <p>
 More commonly, however, we have found that topics have a small number of 
consumer groups, one for each "logical subscriber". Each group is composed of 
many consumer instances for scalability and fault tolerance. This is nothing 
more than publish-subscribe semantics where the subscriber is a cluster of 
consumers instead of a single process.
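<p>
With the new Java consumer, a "logical subscriber" is simply every instance started with the same <code>group.id</code>; the addresses, group and topic names below are placeholders in this sketch.
</p>
<pre>
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// All instances sharing "billing" split the topic's partitions between them.
public class GroupSubscriber {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "billing");   // one group per logical subscriber
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("my_topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("%d: %s%n", record.offset(), record.value());
            }
        }
    }
}
</pre>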
 <p>
@@ -88,7 +88,7 @@ Kafka does it better. By having a notion of 
parallelism&mdash;the partition&mdas
 <p>
 Kafka only provides a total order over messages <i>within</i> a partition, not 
between different partitions in a topic. Per-partition ordering combined with 
the ability to partition data by key is sufficient for most applications. 
However, if you require a total order over messages this can be achieved with a 
topic that has only one partition, though this will mean only one consumer 
process per consumer group.
 
-<h4>Guarantees</h4>
+<h4><a id="intro_guarantees" href="#intro_guarantees">Guarantees</a></h4>
 
 At a high-level Kafka gives the following guarantees:
 <ul>

http://git-wip-us.apache.org/repos/asf/kafka/blob/6cbd9759/docs/migration.html
----------------------------------------------------------------------
diff --git a/docs/migration.html b/docs/migration.html
index 18ab6d4..2da6a7e 100644
--- a/docs/migration.html
+++ b/docs/migration.html
@@ -5,9 +5,9 @@
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at
- 
+
     http://www.apache.org/licenses/LICENSE-2.0
- 
+
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -16,11 +16,11 @@
 -->
 
 <!--#include virtual="../includes/header.html" -->
-<h2>Migrating from 0.7.x to 0.8</h2>
+<h2><a id="migration" href="#migration">Migrating from 0.7.x to 0.8</a></h2>
 
 0.8 is our first (and hopefully last) release with a non-backwards-compatible 
wire protocol, ZooKeeper layout, and on-disk data format. This was a chance 
for us to clean up a lot of cruft and start fresh. This means performing a 
no-downtime upgrade is more painful than normal&mdash;you cannot just swap in 
the new code in-place.
 
-<h3>Migration Steps</h3>
+<h3><a id="migration_steps" href="#migration_steps">Migration Steps</a></h3>
 
 <ol>
    <li>Set up a new cluster running 0.8.
