Author: jkreps
Date: Thu Aug 29 05:07:04 2013
New Revision: 1518473

URL: http://svn.apache.org/r1518473
Log:
Improve the configuration documentation.


Modified:
    kafka/site/08/configuration.html

Modified: kafka/site/08/configuration.html
URL: 
http://svn.apache.org/viewvc/kafka/site/08/configuration.html?rev=1518473&r1=1518472&r2=1518473&view=diff
==============================================================================
--- kafka/site/08/configuration.html (original)
+++ kafka/site/08/configuration.html Thu Aug 29 05:07:04 2013
@@ -1,3 +1,7 @@
+Kafka uses the <a href="http://en.wikipedia.org/wiki/.properties">property file format</a> for configuration. These can be supplied either from a file or programmatically.
+<p>
+Some configurations have both a default global setting as well as a topic-level override. The topic-level properties have a csv format (e.g., "xyz.per.topic=topic1:value1,topic2:value2") and they override the default value for the specified topics.
+
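For example, such a file might look like the following minimal sketch; the property names are taken from the broker table below, and the values are illustrative only:
<pre>
# default retention window applied to all topics (24 * 7 hours)
log.retention.hours=168
# hypothetical per-topic overrides using the csv form described above
log.retention.hours.per.topic=topic1:24,topic2:72
</pre>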
 <h3><a id="brokerconfigs">3.1 Broker Configs</a></h3>
 The essential configurations are the following:
 <ul>
@@ -6,8 +10,6 @@ The essential configurations are the fol
        <li><code>zookeeper.connect</code>
 </ul>
 
-Note that some configurations have both a default global setting as well as a 
topic level setting. The topic level properties have the format of csv (e.g., 
"topic1:value1,topic2:value2") and they override the values in the global 
setting for those specified topics.
-
 <table class="data-table">
 <tbody><tr>
       <th>Property</th>
@@ -17,12 +19,18 @@ Note that some configurations have both 
     <tr>
       <td>broker.id</td>
       <td></td>
-      <td>Each broker is uniquely identified by a non-negative integer id. 
This id serves as the brokers "name", and allows the broker to be moved to a 
different host/port without confusing consumers.</td>
+      <td>Each broker is uniquely identified by a non-negative integer id. This id serves as the broker's "name" and allows the broker to be moved to a different host/port without confusing consumers. You can choose any number you like so long as it is unique.
+       </td>
     </tr>
     <tr>
       <td>log.dirs</td>
       <td nowrap>/tmp/kafka-logs</td>
-      <td>The directories in which the log data is kept</td>
+      <td>A comma-separated list of one or more directories in which Kafka 
data is stored. Each new partition that is created will be placed in the 
directory which currently has the fewest partitions.</td>
+    </tr>
+    <tr>
+      <td>port</td>
+      <td>6667</td>
+      <td>The port on which the server accepts client connections.</td>
     </tr>
     <tr>
       <td>zookeeper.connect</td>
@@ -34,140 +42,135 @@ Zookeeper also allows you to add a "chro
     <tr>
       <td>message.max.bytes</td>
       <td>1000000</td>
-      <td>The maximum size of a message that the server can receive</td>
+      <td>The maximum size of a message that the server can receive. It is important that this property be in sync with the maximum fetch size your consumers use or else an unruly producer will be able to publish messages too large for consumers to consume.</td>
     </tr>
     <tr>
       <td>num.network.threads</td>
       <td>3</td>
-      <td>The number of network threads that the server uses for handling 
network requests</td>
+      <td>The number of network threads that the server uses for handling 
network requests. You probably don't need to change this.</td>
     </tr>
     <tr>
       <td>num.io.threads</td>
       <td>8</td>
-      <td>The number of io threads that the server uses for carrying out 
network requests</td>
+      <td>The number of I/O threads that the server uses for executing 
requests. You should have at least as many threads as you have disks.</td>
     </tr>
     <tr>
       <td>queued.max.requests</td>
       <td>500</td>
-      <td>The number of queued requests allowed before blocking the network 
threads</td>
-    </tr>
-    <tr>
-      <td>port</td>
-      <td>6667</td>
-      <td>The port to listen and accept connections on</td>
+      <td>The number of requests that can be queued up for processing by the 
I/O threads before the network threads stop reading in new requests.</td>
     </tr>
     <tr>
       <td>host.name</td>
       <td>null</td>
       <td>
-        <p>Hostname of broker. If this is set, it will only bind to this 
address. If this is not set, it will bind to all interfaces, and publish one to 
ZK</p>
+        <p>Hostname of broker. If this is set, it will only bind to this 
address. If this is not set, it will bind to all interfaces, and publish one to 
ZK.</p>
      </td>
     </tr>
     <tr>
       <td>socket.send.buffer.bytes</td>
       <td>100 * 1024</td>
-      <td>The SO_SNDBUFF buffer of the socket sever sockets</td>
+      <td>The SO_SNDBUF buffer the server prefers for socket connections.</td>
     </tr>
     <tr>
       <td>socket.receive.buffer.bytes</td>
       <td>100 * 1024</td>
-      <td>The SO_RCVBUFF buffer of the socket sever sockets</td>
+      <td>The SO_RCVBUF buffer the server prefers for socket connections.</td>
     </tr>
     <tr>
       <td>socket.request.max.bytes</td>
       <td>100 * 1024 * 1024</td>
-      <td>The maximum number of bytes in a socket request</td>
+      <td>The maximum request size the server will allow. This prevents the 
server from running out of memory and should be smaller than the Java heap 
size.</td>
     </tr>
     <tr>
       <td>num.partitions</td>
       <td>1</td>
-      <td>The default number of log partitions per topic</td>
+      <td>The default number of partitions per topic.</td>
     </tr>
     <tr>
       <td>log.segment.bytes</td>
       <td nowrap>1024 * 1024 * 1024</td>
-      <td>The maximum size of a single log file</td>
+      <td>The log for a topic partition is stored as a directory of segment 
files. This setting controls the size to which a segment file will grow before 
a new segment is rolled over in the log.</td>
     </tr>
     <tr>
       <td>log.segment.bytes.per.topic</td>
       <td>""</td>
-      <td>The maximum size of a single log file for some specific topic</td>
+      <td>This setting allows overriding log.segment.bytes on a per-topic basis.</td>
     </tr>
     <tr>
       <td>log.roll.hours</td>
       <td>24 * 7</td>
-      <td>The maximum time before a new log segment is rolled out</td>
+      <td>This setting forces Kafka to roll a new log segment after this many hours even if the log.segment.bytes size has not been reached.</td>
     </tr>
     <tr>
       <td>log.roll.hours.per.topic</td>
       <td>""</td>
-      <td>The number of hours before rolling out a new log segment for some 
specific topic</td>
+      <td>This setting allows overriding log.roll.hours on a per-topic 
basis.</td>
     </tr>
     <tr>
       <td>log.retention.hours</td>
       <td>24 * 7</td>
-      <td>The number of hours to keep a log file before deleting it</td>
+      <td>The number of hours to keep a log segment before it is deleted, i.e. the default data retention window for all topics. Note that if both log.retention.hours and log.retention.bytes are set, we delete a segment when either limit is exceeded.</td>
     </tr>
     <tr>
       <td>log.retention.hours.per.topic</td>
       <td>""</td>
-      <td>The number of hours to keep a log file before deleting it for some 
specific topic</td>
+      <td>A per-topic override for log.retention.hours.</td>
     </tr>
     <tr>
       <td>log.retention.bytes</td>
       <td>-1</td>
-      <td>The maximum size of the log per partition</td>
+      <td>The amount of data to retain in the log for each topic-partition. Note that this is the limit per partition, so multiply by the number of partitions to get the total data retained for the topic. Also note that if both log.retention.hours and log.retention.bytes are set, we delete a segment when either limit is exceeded.</td>
     </tr>
     <tr>
       <td>log.retention.bytes.per.topic</td>
       <td>""</td>
-      <td>The maximum size of the log for each partition in some specific 
topics</td>
+      <td>A per-topic override for log.retention.bytes.</td>
     </tr>
     <tr>
       <td>log.cleanup.interval.mins</td>
       <td>10</td>
-      <td>The frequency in minutes that the log cleaner checks whether any log 
is eligible for deletion</td>
+      <td>The frequency in minutes that the log cleaner checks whether any log 
segment is eligible for deletion to meet the retention policies.</td>
     </tr>
     <tr>
       <td>log.index.size.max.bytes</td>
       <td>10 * 1024 * 1024</td>
-      <td>The maximum size in bytes of the offset index</td>
+      <td>The maximum size in bytes we allow for the offset index for each log 
segment. Note that we will always pre-allocate a sparse file with this much 
space and shrink it down when the log rolls. If the index fills up we will roll 
a new log segment even if we haven't reached the log.segment.bytes limit.</td>
     </tr>
     <tr>
       <td>log.index.interval.bytes</td>
       <td>4096</td>
-      <td>The interval with which we add an entry to the offset index</td>
+      <td>The byte interval at which we add an entry to the offset index. When executing a fetch request the server must do a linear scan for up to this many bytes to find the correct position in the log to begin and end the fetch. So setting this value to be larger will mean smaller index files (and a bit less memory usage) but more scanning per fetch. However the server will never add more than one index entry per log append (even if more than log.index.interval.bytes worth of messages are appended). In general you probably don't need to mess with this value.</td>
     </tr>
     <tr>
       <td>log.flush.interval.messages</td>
       <td>10000</td>
-      <td>The number of messages accumulated on a log partition before 
messages are flushed to disk</td>
+      <td>The number of messages written to a log partition before we force an fsync on the log. Setting this higher will improve performance a lot but will increase the window of data at risk in the event of a crash (though that is usually best addressed through replication). If both this setting and log.flush.interval.ms are used, the log will be flushed when either criterion is met.</td>
     </tr>
     <tr>
       <td>log.flush.interval.ms.per.topic</td>
       <td>""</td>
-      <td>The maximum time in ms that a message in selected topics is kept in 
memory before flushed to disk, e.g., topic1:3000,topic2:6000</td>
+      <td>The per-topic override for log.flush.interval.ms, e.g., topic1:3000,topic2:6000.</td>
     </tr>
     <tr>
       <td>log.flush.scheduler.interval.ms</td>
       <td>3000</td>
-      <td>The frequency in ms that the log flusher checks whether any log 
needs to be flushed to disk</td>
+      <td>The frequency in ms that the log flusher checks whether any log is 
eligible to be flushed to disk.</td>
     </tr>
     <tr>
       <td>log.flush.interval.ms</td>
       <td>3000
      </td>
-      <td>The maximum time in ms that a message in any topic is kept in memory 
before flushed to disk</td>
+      <td>The maximum time between fsync calls on the log. If used in conjunction with log.flush.interval.messages, the log will be flushed when either criterion is met.</td>
     </tr>
     <tr>
       <td>auto.create.topics.enable</td>
       <td>true</td>
-      <td>Enable auto creation of topic on the server</td>
+      <td>Enable auto creation of topics on the server. If this is set to true, then attempts to produce, consume, or fetch metadata for a non-existent topic will automatically create it with the default replication factor and number of partitions.</td>
     </tr>
     <tr>
       <td>controller.socket.timeout.ms</td>
       <td>30000</td>
-      <td>The socket timeout for controller-to-broker channels</td>
+      <td>The socket timeout for commands from the partition management 
controller to the replicas.</td>
     </tr>
     <tr>
       <td>controller.message.queue.size</td>
@@ -177,74 +180,74 @@ Zookeeper also allows you to add a "chro
     <tr>
       <td>default.replication.factor</td>
       <td>1</td>
-      <td>Default replication factors for automatically created topics</td>
+      <td>The default replication factor for automatically created topics.</td>
     </tr>
     <tr>
       <td>replica.lag.time.max.ms</td>
       <td>10000</td>
-      <td>If a follower hasn't sent any fetch requests during this time, the 
leader will remove the follower from isr</td>
+      <td>If a follower hasn't sent any fetch requests for this window of 
time, the leader will remove the follower from ISR and treat it as dead.</td>
     </tr>
     <tr>
       <td>replica.lag.max.messages</td>
       <td>4000</td>
-      <td>If the lag in messages between a leader and a follower exceeds this 
number, the leader will remove the follower from isr</td>
+      <td>If a replica falls more than this many messages behind the leader, 
the leader will remove the follower from ISR and treat it as dead.</td>
     </tr>
     <tr>
       <td>replica.socket.timeout.ms</td>
       <td>30 * 1000</td>
-      <td>The socket timeout for network requests</td>
+      <td>The socket timeout for network requests to the leader for 
replicating data.</td>
     </tr>
     <tr>
       <td>replica.socket.receive.buffer.bytes</td>
       <td>64 * 1024</td>
-      <td>The socket receive buffer for network requests</td>
+      <td>The socket receive buffer for network requests to the leader for 
replicating data.</td>
     </tr>
     <tr>
       <td>replica.fetch.max.bytes</td>
       <td nowrap>1024 * 1024</td>
-      <td>The number of byes of messages to attempt to fetch</td>
+      <td>The number of bytes of messages to attempt to fetch for each partition in the fetch requests the replicas send to the leader.</td>
     </tr>
     <tr>
       <td>replica.fetch.wait.max.ms</td>
       <td>500</td>
-      <td>Max wait time for each fetcher request issued by follower 
replicas</td>
+      <td>The maximum amount of time to wait for data to arrive on the leader in the fetch requests sent by the replicas to the leader.</td>
     </tr>
     <tr>
       <td>replica.fetch.min.bytes</td>
       <td>1</td>
-      <td>Minimum bytes expected for each fetch response. If not enough bytes, 
wait up to replicaMaxWaitTimeMs</td>
+      <td>Minimum bytes expected for each fetch response for the fetch 
requests from the replica to the leader. If not enough bytes, wait up to 
replica.fetch.wait.max.ms for this many bytes to arrive.</td>
     </tr>
     <tr>
       <td>num.replica.fetchers</td>
       <td>1</td>
       <td>
-        <p>Number of fetcher threads used to replicate messages from a source 
broker. Increasing this value can increase the degree of I/O parallelism in the 
follower broker.</p>
+        <p>Number of threads used to replicate messages from leaders. 
Increasing this value can increase the degree of I/O parallelism in the 
follower broker.</p>
      </td>
     </tr>
     <tr>
       <td>replica.high.watermark.checkpoint.interval.ms</td>
       <td>5000</td>
-      <td>The frequency with which the high watermark is saved out to disk</td>
+      <td>The frequency with which each replica saves its high watermark to 
disk to handle recovery.</td>
     </tr>
     <tr>
       <td>fetch.purgatory.purge.interval.requests</td>
       <td>10000</td>
-      <td>The purge interval (in number of requests) of the fetch request 
purgatory</td>
+      <td>The purge interval (in number of requests) of the fetch request 
purgatory.</td>
     </tr>
     <tr>
       <td>producer.purgatory.purge.interval.requests</td>
       <td>10000</td>
-      <td>The purge interval (in number of requests) of the producer request 
purgatory</td>
+      <td>The purge interval (in number of requests) of the producer request 
purgatory.</td>
     </tr>
     <tr>
       <td>zookeeper.session.timeout.ms</td>
       <td>6000</td>
-      <td>Zookeeper session timeout</td>
+      <td>Zookeeper session timeout. If the server fails to heartbeat to 
zookeeper within this period of time it is considered dead. If you set this too 
low the server may be falsely considered dead; if you set it too high it may 
take too long to recognize a truly dead server.</td>
     </tr>
     <tr>
       <td>zookeeper.connection.timeout.ms</td>
       <td>6000</td>
-      <td>The max time that the client waits to establish a connection to 
zookeeper</td>
+      <td>The max time that the client waits to establish a connection to 
zookeeper.</td>
     </tr>
     <tr>
       <td>zookeeper.sync.time.ms</td>
@@ -259,16 +262,17 @@ Zookeeper also allows you to add a "chro
     <tr>
       <td>controlled.shutdown.max.retries</td>
       <td>3</td>
-      <td>Number of retries to complete the controlled shutdown 
successfully</td>
+      <td>Number of retries to complete the controlled shutdown successfully 
before executing an unclean shutdown.</td>
     </tr>
     <tr>
       <td>controlled.shutdown.retry.backoff.ms</td>
       <td>5000</td>
-      <td>Backoff time between two retries</td>
+      <td>Backoff time between shutdown retries.</td>
     </tr>
 </tbody></table>
 
 <p>More details about broker configuration can be found in the scala class 
<code>kafka.server.KafkaConfig</code>.</p>
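As a rough illustration, a minimal broker properties file covering only the essential properties from the table above might look like the sketch below; the values are examples (port and log.dirs match the documented defaults, the zookeeper address is assumed):
<pre>
# any unused non-negative integer uniquely identifying this broker
broker.id=0
# port on which the server accepts client connections (documented default)
port=6667
# comma-separated list of directories for log data
log.dirs=/tmp/kafka-logs
# zookeeper connection string (host:port assumed for illustration)
zookeeper.connect=localhost:2181
</pre>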
+
 <h3><a id="consumerconfigs">3.2 Consumer Configs</a></h3>
 The essential consumer configurations are the following:
 <ul>
@@ -285,7 +289,7 @@ The essential consumer configurations ar
     <tr>
       <td>group.id</td>
       <td colspan="1"></td>
-      <td>A string that uniquely identifies a set of consumers within the same 
consumer group</td>
+      <td>A string that uniquely identifies the group of consumer processes to which this consumer belongs. By setting the same group id, multiple processes indicate that they are all part of the same consumer group.</td>
     </tr>
     <tr>
       <td>zookeeper.connect</td>
@@ -314,17 +318,17 @@ The essential consumer configurations ar
     <tr>
       <td>fetch.message.max.bytes</td>
       <td nowrap>1024 * 1024</td>
-      <td>The number of byes of messages to attempt to fetch</td>
+      <td>The number of bytes of messages to attempt to fetch for each topic-partition in each fetch request. These bytes will be read into memory for each partition, so this helps control the memory used by the consumer. The fetch request size must be at least as large as the maximum message size the server allows or else it is possible for the producer to send messages larger than the consumer can fetch.</td>
     </tr>
     <tr>
       <td>auto.commit.enable</td>
       <td colspan="1">true</td>
-      <td>If true, periodically commit to zookeeper the offset of messages 
already fetched by the consumer</td>
+      <td>If true, periodically commit to zookeeper the offset of messages 
already fetched by the consumer. This committed offset will be used when the 
process fails as the position from which the new consumer will begin.</td>
     </tr>
     <tr>
       <td>auto.commit.interval.ms</td>
       <td colspan="1">60 * 1000</td>
-      <td>The frequency in ms that the consumer offsets are committed to 
zookeeper</td>
+      <td>The frequency in ms that the consumer offsets are committed to 
zookeeper.</td>
     </tr>
     <tr>
       <td>queued.max.message.chunks</td>
@@ -334,12 +338,12 @@ The essential consumer configurations ar
     <tr>
       <td>rebalance.max.retries</td>
       <td colspan="1">4</td>
-      <td>Max number of retries during rebalance</td>
+      <td>When a new consumer joins a consumer group the set of consumers 
attempt to "rebalance" the load to assign partitions to each consumer. If the 
set of consumers changes while this assignment is taking place the rebalance 
will fail and retry. This setting controls the maximum number of attempts 
before giving up.</td>
     </tr>
     <tr>
       <td>fetch.min.bytes</td>
       <td colspan="1">1</td>
-      <td>The minimum amount of data the server should return for a fetch 
request. If insufficient data is available the request will block</td>
+      <td>The minimum amount of data the server should return for a fetch 
request. If insufficient data is available the request will wait for that much 
data to accumulate before answering the request.</td>
     </tr>
     <tr>
       <td>fetch.wait.max.ms</td>
@@ -349,12 +353,12 @@ The essential consumer configurations ar
     <tr>
       <td>rebalance.backoff.ms</td>
       <td>2000</td>
-      <td>Backoff time between retries during rebalance</td>
+      <td>Backoff time between retries during rebalance.</td>
     </tr>
     <tr>
       <td>refresh.leader.backoff.ms</td>
       <td colspan="1">200</td>
-      <td>Backoff time to refresh the leader of a partition after it loses the 
current leader</td>
+      <td>Backoff time to wait before trying to determine the leader of a 
partition that has just lost its leader.</td>
     </tr>
     <tr>
       <td>auto.offset.reset</td>
@@ -370,18 +374,18 @@ The essential consumer configurations ar
     </tr>
     <tr>
       <td>client.id</td>
-      <td colspan="1">${group.id}</td>
-      <td>Client id is specified by the kafka consumer client, used to 
distinguish different clients</td>
+      <td colspan="1">group id value</td>
+      <td>The client id is a user-specified string sent in each request to 
help trace calls. It should logically identify the application making the 
request.</td>
     </tr>
     <tr>
       <td>zookeeper.session.timeout.ms </td>
       <td colspan="1">6000</td>
-      <td>Zookeeper session timeout</td>
+      <td>Zookeeper session timeout. If the consumer fails to heartbeat to 
zookeeper for this period of time it is considered dead and a rebalance will 
occur.</td>
     </tr>
     <tr>
       <td>zookeeper.connection.timeout.ms</td>
       <td colspan="1">6000</td>
-      <td>The max time that the client waits to establish a connection to 
zookeeper</td>
+      <td>The max time that the client waits while establishing a connection 
to zookeeper.</td>
     </tr>
     <tr>
       <td>zookeeper.sync.time.ms </td>
@@ -419,35 +423,46 @@ Essential configuration properties for t
       <td>request.required.acks</td>
       <td colspan="1">0</td>
       <td>
-        <p>This value controls when the producer receives an acknowledgement 
from the broker. Typical values are (1) 0, which means that the producer never 
waits for an acknowledgement from the broker (the same behavior as 0.7); (2) 1, 
which means that the producer gets an acknowledgement after the leader replica 
has received the data; (3) -1, which means that the producer gets an 
acknowledgement after all in-sync replicas have received the data. The first 
option provides the lowest latency (no network delay), but the worst durability 
(some data loss when the leader replica fails). The second option provides 
lower latency (one network round trip) and better durability (few data loss 
when the leader replica fails). The last option provides low latency (two 
network round trips) and the best durability (no data loss as long as the 
number of failed brokers is less the replication factor of the topic).</p>
+        <p>This value controls when a produce request is considered completed. 
Specifically, how many other brokers must have committed the data to their log 
and acknowledged this to the leader? Typical values are 
+              <ul>
+                    <li>0, which means that the producer never waits for an 
acknowledgement from the broker (the same behavior as 0.7). This option 
provides the lowest latency but the weakest durability guarantees (some data 
will be lost when a server fails).
+                        <li> 1, which means that the producer gets an 
acknowledgement after the leader replica has received the data. This option 
provides better durability as the client waits until the server acknowledges 
the request as successful (only messages that were written to the now-dead 
leader but not yet replicated will be lost).
+                        <li> -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the best durability; we guarantee that no messages will be lost as long as at least one in-sync replica remains.
+                       </ul>
+               </p>
      </td>
     </tr>
     <tr>
+      <td>request.timeout.ms</td>
+      <td colspan="1">1500</td>
+      <td>The amount of time the broker will wait trying to meet the 
request.required.acks requirement before sending back an error to the 
client.</td>
+    </tr>
+    <tr>
       <td>producer.type</td>
       <td colspan="1">sync</td>
       <td>
-        <p>This parameter specifies whether the messages are sent 
asynchronously or not. Valid values are (1) async for asynchronous send and (2) 
sync for synchronous send.</p>
+        <p>This parameter specifies whether the messages are sent 
asynchronously in a background thread. Valid values are (1) async for 
asynchronous send and (2) sync for synchronous send. By setting the producer to 
async we allow batching together of requests (which is great for throughput) 
but open the possibility of a failure of the client machine dropping unsent 
data.</p>
      </td>
     <tr>
       <td>serializer.class</td>
-      <td colspan="1">DefaultEncoder</td>
+      <td colspan="1">kafka.serializer.DefaultEncoder</td>
       <td>The serializer class for messages. The default encoder takes a 
byte[] and returns the same byte[].</td>
     </tr>
     <tr>
       <td>key.serializer.class</td>
-      <td colspan="1">${serializer.class}</td>
-      <td>The serializer class for keys (defaults to the same as for 
messages)</td>
+      <td colspan="1"></td>
+      <td>The serializer class for keys (defaults to the same as for messages 
if nothing is given).</td>
     </tr>
     <tr>
       <td>partitioner.class</td>
-      <td colspan="1">DefaultPartitioner</td>
+      <td colspan="1">kafka.producer.DefaultPartitioner</td>
       <td>The partitioner class for partitioning messages amongst sub-topics. 
The default partitioner is based on the hash of the key.</td>
     </tr>
     <tr>
       <td>compression.codec</td>
       <td colspan="1">none</td>
       <td>
-        <p>This parameter allows you to specify the compression codec for all 
data generated by this producer. Valid values are none, gzip and snappy.</p>
+        <p>This parameter allows you to specify the compression codec for all 
data generated by this producer. Valid values are "none", "gzip" and 
"snappy".</p>
      </td>
     </tr>
     <tr>
@@ -461,14 +476,14 @@ Essential configuration properties for t
       <td>message.send.max.retries</td>
       <td colspan="1">3</td>
       <td>
-        <p>The leader may be unavailable transiently, which can fail the 
sending of a message. This property specifies the number of retries when such 
failures occur.</p>
+        <p>This property will cause the producer to automatically retry a failed send request up to this many times. Note that setting a non-zero value here can lead to duplicates in the case of network errors that cause a message to be sent but the acknowledgement to be lost.</p>
      </td>
     </tr>
     <tr>
       <td>retry.backoff.ms</td>
       <td colspan="1">100</td>
       <td>
-        <p>Before each retry, the producer refreshes the metadata of relevant 
topics. Since leader election takes a bit of time, this property specifies the 
amount of time that the producer waits before refreshing the metadata.</p>
+        <p>Before each retry, the producer refreshes the metadata of relevant 
topics to see if a new leader has been elected. Since leader election takes a 
bit of time, this property specifies the amount of time that the producer waits 
before refreshing the metadata.</p>
      </td>
     </tr>
     <tr>
@@ -481,24 +496,24 @@ Essential configuration properties for t
     <tr>
       <td>queue.buffering.max.ms</td>
       <td colspan="1">5000</td>
-      <td>Maximum time, in milliseconds, for buffering data on the producer 
queue</td>
+      <td>Maximum time to buffer data when using async mode. For example a 
setting of 100 will try to batch together 100ms of messages to send at once. 
This will improve throughput but adds message delivery latency due to the 
buffering.</td>
     </tr>
     <tr>
       <td>queue.buffering.max.messages</td>
       <td colspan="1">10000</td>
-      <td>The maximum size of the blocking queue for buffering on the 
producer</td>
+      <td>The maximum number of unsent messages that can be queued up by the producer when using async mode before either the producer must be blocked or data must be dropped.</td>
     </tr>
     <tr>
       <td>queue.enqueue.timeout.ms</td>
       <td colspan="1">-1</td>
       <td>
-        <p>Timeout for event enqueue:<br/> * 0: events will be enqueued 
immediately or dropped if the queue is full<br/> * -ve: enqueue will block 
indefinitely if the queue is full<br/> * +ve: enqueue will block up to this 
many milliseconds if the queue is full</p>
+        <p>The amount of time to block before dropping messages when running 
in async mode and the buffer has reached queue.buffering.max.messages. If set 
to 0 events will be enqueued immediately or dropped if the queue is full (the 
producer send call will never block). If set to -1 the producer will block 
indefinitely and never willingly drop a send.</p>
      </td>
     </tr>
     <tr>
       <td>batch.num.messages</td>
       <td colspan="1">200</td>
-      <td>The number of messages batched at the producer</td>
+      <td>The number of messages to send in one batch when using async mode. The producer will wait until either this number of messages is ready to send or queue.buffering.max.ms is reached.</td>
     </tr>
     <tr>
       <td>send.buffer.bytes</td>
@@ -508,12 +523,7 @@ Essential configuration properties for t
     <tr>
       <td>client.id</td>
       <td colspan="1">""</td>
-      <td>The client application sending the producer requests</td>
-    </tr>
-    <tr>
-      <td>request.timeout.ms</td>
-      <td colspan="1">1500</td>
-      <td>The ack timeout of the producer requests. Value must be non-negative 
and non-zero</td>
+      <td>The client id is a user-specified string sent in each request to 
help trace calls. It should logically identify the application making the 
request.</td>
     </tr>
 </tbody></table>
 <p>More details about producer configuration can be found in the scala class 
<code>kafka.producer.ProducerConfig</code>.</p>
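As a rough illustration, a producer properties sketch using only properties from the table above might look like the following; the values are examples rather than recommendations, and the broker connection settings documented elsewhere on the page are omitted:
<pre>
# wait for the leader replica to acknowledge each request
request.required.acks=1
# send asynchronously in a background thread, batching messages
producer.type=async
batch.num.messages=200
queue.buffering.max.ms=5000
# compress all data generated by this producer
compression.codec=snappy
# user-specified string sent in each request to help trace calls
client.id=my-application
</pre>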

