David Hay created KAFKA-2135:
--------------------------------
Summary: New Kafka Producer Client: Send requests wait
indefinitely if no broker is available.
Key: KAFKA-2135
URL: https://issues.apache.org/jira/browse/KAFKA-2135
Project: Kafka
Issue Type: Bug
Components: producer
Affects Versions: 0.8.2.0
Reporter: David Hay
Assignee: Jun Rao
Priority: Critical
I'm seeing issues when sending a message with the new producer client API. The
future returned from Producer.send() will block indefinitely if the cluster is
unreachable for some reason. Here are the steps:
# Start up a single node kafka cluster locally.
# Start up application and create a KafkaProducer with the following config:
{noformat}
KafkaProducerWrapper values:
compression.type = snappy
metric.reporters = []
metadata.max.age.ms = 300000
metadata.fetch.timeout.ms = 60000
acks = all
batch.size = 16384
reconnect.backoff.ms = 10
bootstrap.servers = [localhost:9092]
receive.buffer.bytes = 32768
retry.backoff.ms = 100
buffer.memory = 33554432
timeout.ms = 30000
key.serializer = class com.mycompany.kafka.serializer.ToStringEncoder
retries = 3
max.request.size = 1048576
block.on.buffer.full = true
value.serializer = class com.mycompany.kafka.serializer.JsonEncoder
metrics.sample.window.ms = 30000
send.buffer.bytes = 131072
max.in.flight.requests.per.connection = 5
metrics.num.samples = 2
linger.ms = 0
client.id = site-json
{noformat}
# Send some messages...they are successfully sent
# Shut down the kafka broker
# Send another message.
At this point, calling {{get()}} on the returned Future will block indefinitely
until the broker is restarted.
It appears that there is some logic in
{{org.apache.kafka.clients.producer.internal.Sender}} that is supposed to mark
the Future as "done" in response to a disconnect event (towards the end of the
run(long) method). However, the while loop earlier in this method seems to
remove the broker from consideration entirely, so the final loop over
ClientResponse objects is never executed.
It seems like "timeout.ms" configuration should be honored in this case, or
perhaps introduce another timeout, indicating that we should give up waiting
for the cluster to return.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)