faq.html

junrao Sun, 24 Mar 2013 15:55:03 -0700

Author: junrao
Date: Sun Mar 24 22:54:39 2013
New Revision: 1460481

URL: http://svn.apache.org/r1460481
Log:
add more FAQ


Modified:
    kafka/site/faq.html

Modified: kafka/site/faq.html
URL: http://svn.apache.org/viewvc/kafka/site/faq.html?rev=1460481&r1=1460480&r2=1460481&view=diff
==============================================================================
--- kafka/site/faq.html (original)
+++ kafka/site/faq.html Sun Mar 24 22:54:39 2013
@@ -2,12 +2,24 @@
 
 <h2>Frequently asked questions</h2>
 <ol>   
+<li> <h3> Why do I get QueueFullException in my producer when running in async 
mode? </h3>
+This typically happens when the producer is sending messages faster than the 
brokers can handle them. If the producer can't block, you will have to add 
enough brokers so that together they can handle the load. If the producer can 
block, set queue.enqueueTimeout.ms to -1 in the producer config. That way, if 
the queue is full, the producer will block instead of dropping messages.
+
 <li> <h3> Why does my consumer get InvalidMessageSizeException? </h3>
 This typically means that the "fetch size" of the consumer is too small. Each 
time the consumer pulls data from the broker, it reads bytes up to a configured 
limit. If that limit is smaller than the largest single message stored in 
Kafka, the consumer can't decode the message properly and will throw an 
InvalidMessageSizeException. To fix this, increase the limit by setting the 
property "fetch.size" properly in config/consumer.properties. The default 
fetch.size is 300,000 bytes.
 
 <li> <h3> On EC2, why can't my high-level consumers connect to the brokers? 
</h3>
 When a broker starts up, it registers its host ip in ZK. The high-level 
consumer later uses the registered host ip to establish the socket connection 
to the broker. By default, the registered ip is given by 
InetAddress.getLocalHost.getHostAddress. Typically, this should return the real 
ip of the host. However, in EC2, the returned ip is an internal one and can't 
be connected to from outside. The solution is to explicitly set the host ip to 
be registered in ZK by setting the "hostname" property in server.properties.
 
+<li> <h3> Why do some of the consumers in a consumer group never receive any 
messages? </h3>
+Currently, a topic partition is the smallest unit by which messages are 
distributed among consumers in the same consumer group. So, if the number of 
consumers is larger than the total number of partitions in a Kafka cluster 
(across all brokers), some consumers will never get any data. The solution is 
to increase the number of partitions on the broker.
+
+<li> <h3> How do I choose the number of partitions for a topic? </h3>
+Having more partitions increases I/O parallelism for writes and thus leads to 
higher producer throughput. It also increases the degree of parallelism for 
consumers (see the previous question). On the other hand, more partitions add 
some overhead: (a) there will be more segment files and thus more open file 
handles in the broker; (b) there are more offsets to be checkpointed by 
consumers, which can increase the load on ZooKeeper. So, one needs to balance 
these tradeoffs.
+
+<li> <h3> Why are there many rebalances in my consumer log? </h3>
+A typical cause of frequent rebalances is long GC pauses on the consumer side. 
If so, you will see ZooKeeper session expirations in the consumer log (grep for 
Expired). Occasional rebalances are fine. Too many rebalances can slow down 
consumption, in which case you will need to tune the Java GC settings.
+
 <li> <h3> My consumer seems to have stopped, why? </h3>
 First, try to figure out if the consumer has really stopped or is just slow, 
using our tool <code>ConsumerOffsetChecker</code>.
 <pre>

svn commit: r1460481 - /kafka/site/faq.html
