Repository: kafka

Updated Branches:
  refs/heads/trunk 93b940016 -> ae9532c6b
MINOR: Fixed broken links in the documentation

Author: Vahid Hashemian <vahidhashem...@us.ibm.com>
Reviewers: Jason Gustafson <ja...@confluent.io>

Closes #2010 from vahidhashemian/doc/fix_hyperlinks

Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/ae9532c6
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/ae9532c6
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/ae9532c6

Branch: refs/heads/trunk
Commit: ae9532c6b3befe2b18d336004bb0976e9d27d08e
Parents: 93b9400
Author: Vahid Hashemian <vahidhashem...@us.ibm.com>
Authored: Tue Oct 11 20:25:35 2016 -0700
Committer: Jason Gustafson <ja...@confluent.io>
Committed: Tue Oct 11 20:25:35 2016 -0700

----------------------------------------------------------------------
 docs/api.html    | 8 ++++----
 docs/design.html | 8 ++++++--
 docs/ops.html    | 4 +++-
 3 files changed, 13 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/ae9532c6/docs/api.html
----------------------------------------------------------------------
diff --git a/docs/api.html b/docs/api.html
index 686b265..366814a 100644
--- a/docs/api.html
+++ b/docs/api.html
@@ -20,7 +20,7 @@ Kafka includes four core apis:
 	<li>The <a href="#producerapi">Producer</a> API allows applications to send streams of data to topics in the Kafka cluster.
 	<li>The <a href="#consumerapi">Consumer</a> API allows applications to read streams of data from topics in the Kafka cluster.
 	<li>The <a href="#streamsapi">Streams</a> API allows transforming streams of data from input topics to output topics.
-	<li>The <a href="#producerapi">Connect</a> API allows implementing connectors that continually pull from some source system or application into Kafka or push from Kafka into some sink system or application.
+	<li>The <a href="#connectapi">Connect</a> API allows implementing connectors that continually pull from some source system or application into Kafka or push from Kafka into some sink system or application.
 </ol>
 
 Kafka exposes all its functionality over a language independent protocol which has clients available in many programming languages. However only the Java clients are maintained as part of the main Kafka project, the others are available as independent open source projects. A list of non-Java clients is available <a href="https://cwiki.apache.org/confluence/display/KAFKA/Clients">here</a>.

@@ -58,7 +58,7 @@ To use the consumer, you can use the following maven dependency:
 	</dependency>
 </pre>
 
-<h3><a id="streamsapi" href="#streamsapi">Streams API</a></h3>
+<h3><a id="streamsapi" href="#streamsapi">2.3 Streams API</a></h3>
 
 The <a href="#streamsapi">Streams</a> API allows transforming streams of data from input topics to output topics.
 <p>

@@ -77,7 +77,7 @@ To use Kafka Streams you can use the following maven dependency:
 	</dependency>
 </pre>
 
-<h3><a id="connectapi" href="#connectapi">Connect API</a></h3>
+<h3><a id="connectapi" href="#connectapi">2.4 Connect API</a></h3>
 
 The Connect API allows implementing connectors that continually pull from some source data system into Kafka or push from Kafka into some sink data system.
 <p>

@@ -86,7 +86,7 @@ Many users of Connect won't need to use this API directly, though, they can use
 Those who want to implement custom connectors can see the <a href="/0100/javadoc/index.html?org/apache/kafka/connect" title="Kafka 0.10.0 Javadoc">javadoc</a>.
 <p>
 
-<h3><a id="legacyapis" href="#streamsapi">Legacy APIs</a></h3>
+<h3><a id="legacyapis" href="#streamsapi">2.5 Legacy APIs</a></h3>
 
 <p>
 A more limited legacy producer and consumer api is also included in Kafka. These old Scala APIs are deprecated and only still available for compatibility purposes.
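The api.html excerpt above notes that Kafka exposes its functionality over a language-independent protocol, which is why non-Java clients can exist as independent projects. As a rough illustrative sketch of that framing (the layout here follows the 0.10-era common request header of int16 api_key, int16 api_version, int32 correlation_id, and a length-prefixed client_id, with api_key 18 being ApiVersions; treat the details as an assumption, not a spec):

```python
import struct

def encode_request(api_key, api_version, correlation_id, client_id):
    # Common request header: int16 api_key, int16 api_version,
    # int32 correlation_id, then client_id as an int16-length-prefixed string.
    cid = client_id.encode("utf-8")
    header = struct.pack(">hhih", api_key, api_version, correlation_id, len(cid)) + cid
    # Every request is framed by a 4-byte big-endian size prefix.
    return struct.pack(">i", len(header)) + header

# ApiVersionsRequest (api_key 18) has an empty body in version 0,
# so the frame is just the size prefix plus the header.
frame = encode_request(18, 0, 1, "doc-example")
```

Any language that can emit these bytes over a socket can talk to a broker, which is the point the quoted paragraph is making.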
 Information on them can be found here <a href="/081/documentation.html#producerapi" title="Kafka 0.8.1 Docs">

http://git-wip-us.apache.org/repos/asf/kafka/blob/ae9532c6/docs/design.html
----------------------------------------------------------------------
diff --git a/docs/design.html b/docs/design.html
index c8c10b4..9e53faf 100644
--- a/docs/design.html
+++ b/docs/design.html
@@ -327,8 +327,12 @@ makes a log more complete, ensuring log consistency during leader failure or cha
 <p>
 This majority vote approach has a very nice property: the latency is dependent on only the fastest servers. That is, if the replication factor is three, the latency is determined by the faster slave not the slower one.
 <p>
-There are a rich variety of algorithms in this family including ZooKeeper's <a href="http://www.stanford.edu/class/cs347/reading/zab.pdf">Zab</a>, <a href="https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf">Raft</a>,
-and <a href="http://pmg.csail.mit.edu/papers/vr-revisited.pdf">Viewstamped Replication</a>. The most similar academic publication we are aware of to Kafka's actual implementation is <a href="http://research.microsoft.com/apps/pubs/default.aspx?id=66814">PacificA</a> from Microsoft.
+There are a rich variety of algorithms in this family including ZooKeeper's
+<a href="http://web.archive.org/web/20140602093727/http://www.stanford.edu/class/cs347/reading/zab.pdf">Zab</a>,
+<a href="https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf">Raft</a>,
+and <a href="http://pmg.csail.mit.edu/papers/vr-revisited.pdf">Viewstamped Replication</a>.
+The most similar academic publication we are aware of to Kafka's actual implementation is
+<a href="http://research.microsoft.com/apps/pubs/default.aspx?id=66814">PacificA</a> from Microsoft.
 <p>
 The downside of majority vote is that it doesn't take many failures to leave you with no electable leaders.
 To tolerate one failure requires three copies of the data, and to tolerate two failures requires five copies of the data. In our experience having only enough redundancy to tolerate a single failure is not enough for a practical system, but doing every write five times, with 5x the disk space requirements and 1/5th the

http://git-wip-us.apache.org/repos/asf/kafka/blob/ae9532c6/docs/ops.html
----------------------------------------------------------------------
diff --git a/docs/ops.html b/docs/ops.html
index 236fef1..ed0c153 100644
--- a/docs/ops.html
+++ b/docs/ops.html
@@ -566,7 +566,9 @@ In general you don't need to do any low-level tuning of the filesystem, but in t
 In Linux, data written to the filesystem is maintained in <a href="http://en.wikipedia.org/wiki/Page_cache">pagecache</a> until it must be written out to disk (due to an application-level fsync or the OS's own flush policy). The flushing of data is done by a set of background threads called pdflush (or in post 2.6.32 kernels "flusher threads").
 <p>
-Pdflush has a configurable policy that controls how much dirty data can be maintained in cache and for how long before it must be written back to disk. This policy is described <a href="http://www.westnet.com/~gsmith/content/linux-pdflush.htm">here</a>. When Pdflush cannot keep up with the rate of data being written it will eventually cause the writing process to block incurring latency in the writes to slow down the accumulation of data.
+Pdflush has a configurable policy that controls how much dirty data can be maintained in cache and for how long before it must be written back to disk.
+This policy is described <a href="http://web.archive.org/web/20160518040713/http://www.westnet.com/~gsmith/content/linux-pdflush.htm">here</a>.
+When Pdflush cannot keep up with the rate of data being written it will eventually cause the writing process to block incurring latency in the writes to slow down the accumulation of data.
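The replica counts quoted in the design.html hunk above follow directly from quorum overlap: any two majorities of a 2f + 1 group intersect, so a majority-vote scheme tolerating f failures needs 2f + 1 copies. A small sketch of that arithmetic (the ISR comparison reflects the f + 1 figure design.html gives elsewhere for Kafka's own approach):

```python
def majority_vote_replicas(failures_tolerated):
    # Two majorities of a 2f + 1 group always overlap in at least one
    # member, so any f crashes still leave a quorum holding the full log.
    return 2 * failures_tolerated + 1

def isr_replicas(failures_tolerated):
    # Kafka's in-sync-replica approach commits a write only once every ISR
    # member has it, so f + 1 replicas suffice to survive f failures.
    return failures_tolerated + 1

assert majority_vote_replicas(1) == 3  # one failure -> three copies
assert majority_vote_replicas(2) == 5  # two failures -> five copies
```

This is why, as the excerpt says, tolerating two failures with majority vote already means writing everything five times.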
 <p>
 You can see the current state of OS memory usage by doing
 <pre>
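To make the pdflush policy in the ops.html hunk above concrete: the kernel's vm.dirty_background_ratio and vm.dirty_ratio settings express dirty-pagecache limits as a percentage of (roughly) reclaimable memory; past the first the flusher threads start background writeback, and past the second the writing process itself blocks. A sketch of the arithmetic, with an assumed 32 GiB host and example ratio values:

```python
def dirty_threshold_bytes(mem_bytes, ratio_percent):
    # vm.dirty_background_ratio / vm.dirty_ratio are percentages of memory;
    # this converts one of them into the corresponding byte budget of
    # dirty data the kernel will allow to accumulate.
    return mem_bytes * ratio_percent // 100

mem = 32 * 1024**3                               # assumed 32 GiB host
background = dirty_threshold_bytes(mem, 10)      # example vm.dirty_background_ratio=10
blocking = dirty_threshold_bytes(mem, 20)        # example vm.dirty_ratio=20
```

On a Linux host the live settings can be read from /proc/sys/vm/dirty_ratio and its siblings; the ratios here are illustrative, not recommendations.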