Author: junrao
Date: Sun Mar 24 22:54:39 2013
New Revision: 1460481
URL: http://svn.apache.org/r1460481
Log:
add more FAQ
Modified:
kafka/site/faq.html
Modified: kafka/site/faq.html
URL:
http://svn.apache.org/viewvc/kafka/site/faq.html?rev=1460481&r1=1460480&r2=1460481&view=diff
==============================================================================
--- kafka/site/faq.html (original)
+++ kafka/site/faq.html Sun Mar 24 22:54:39 2013
@@ -2,12 +2,24 @@
<h2>Frequently asked questions</h2>
<ol>
+<li> <h3> Why do I get QueueFullException in my producer when running in async
mode? </h3>
+This typically happens when the producer is sending messages faster
than the brokers can handle. If the producer can't block, one will have to add
enough brokers so that they can jointly handle the load. If the producer can
block, one can set queue.enqueueTimeout.ms in the producer config to -1. This way,
if the queue is full, the producer will block instead of dropping messages.
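+As a minimal sketch, the blocking setup described above might look like this
+in the producer properties. Only queue.enqueueTimeout.ms comes from the answer
+above; producer.type is assumed here to be the property that selects async mode.
+<pre>
+# Assumption: async mode is selected with producer.type
+producer.type=async
+# -1: block when the queue is full instead of dropping messages
+queue.enqueueTimeout.ms=-1
+</pre>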
+
<li> <h3> Why does my consumer get InvalidMessageSizeException? </h3>
This typically means that the "fetch size" of the consumer is too small. Each
time the consumer pulls data from the broker, it reads bytes up to a configured
limit. If that limit is smaller than the largest single message stored in
Kafka, the consumer can't decode the message properly and will throw an
InvalidMessageSizeException. To fix this, increase the limit by setting the
property "fetch.size" properly in config/consumer.properties. The default
fetch.size is 300,000 bytes.
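As an illustration, a config/consumer.properties entry raising the limit could
look like the following. The 1 MB value is only an example; pick a value at
least as large as the largest message you expect to store.
<pre>
# Illustrative value only: must be >= the largest single message in Kafka
fetch.size=1048576
</pre>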
<li> <h3> On EC2, why can't my high-level consumers connect to the brokers?
</h3>
When a broker starts up, it registers its host ip in ZK. The high-level
consumer later uses the registered host ip to establish the socket connection
to the broker. By default, the registered ip is given by
InetAddress.getLocalHost.getHostAddress. Typically, this should return the real
ip of the host. However, in EC2, the returned ip is an internal one and can't
be connected to from outside. The solution is to explicitly set the host ip to
be registered in ZK by setting the "hostname" property in server.properties.
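For example, the server.properties entry might look like this. The host name
shown is hypothetical; use an address that your consumers can actually reach.
<pre>
# Hypothetical address: replace with a host name or ip reachable by consumers
hostname=broker1.example.com
</pre>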
+<li> <h3> Why do some of the consumers in a consumer group never receive any
messages? </h3>
+Currently, a topic partition is the smallest unit by which we distribute messages
among consumers in the same consumer group. So, if the number of consumers is
larger than the total number of partitions of the topic in a Kafka cluster (across
all brokers), some consumers will never get any data. The solution is to increase
the number of partitions on the broker.
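+A sketch of what this could look like in server.properties, assuming the
+broker-level num.partitions property controls the per-topic partition count in
+your version (the value 8 is illustrative; check the configuration reference
+for your release):
+<pre>
+# Assumption: num.partitions sets the number of partitions per topic per broker
+num.partitions=8
+</pre>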
+
+<li> <h3> How do I choose the number of partitions for a topic? </h3>
+Having more partitions increases I/O parallelism for writes and thus leads to
higher producer throughput. It also increases the degree of parallelism for
consumers (see the previous question). On the other hand, more partitions add
some overhead: (a) there will be more segment files and thus more open file
handles in the broker; (b) there are more offsets to be checkpointed by
consumers, which can increase the load on ZooKeeper. So, one needs to balance
these tradeoffs.
+
+<li> <h3> Why are there many rebalances in my consumer log? </h3>
+A typical reason for many rebalances is long GC pauses on the consumer side,
which cause the consumer's ZooKeeper session to expire. If so, you will
see ZooKeeper session expirations in the consumer log (grep for Expired).
Occasional rebalances are fine. Too many rebalances can slow down
consumption, and one will need to tune the Java GC settings.
+
<li> <h3> My consumer seems to have stopped, why? </h3>
First, try to figure out if the consumer has really stopped or is just slow,
using our tool <code>ConsumerOffsetChecker</code>.
<pre>