Mike Pedersen created KAFKA-16651:
-------------------------------------
Summary: KafkaProducer.send does not throw TimeoutException as
documented
Key: KAFKA-16651
URL: https://issues.apache.org/jira/browse/KAFKA-16651
Project: Kafka
Issue Type: Bug
Components: producer
Affects Versions: 3.6.2
Reporter: Mike Pedersen
The JavaDoc for {{KafkaProducer#send(ProducerRecord, Callback)}} claims that the
method throws a {{TimeoutException}} if fetching metadata or allocating memory
blocks for longer than {{max.block.ms}}:
bq. Throws:
bq. {{TimeoutException}} - If the time taken for fetching metadata or
allocating memory for the record has surpassed max.block.ms.
([link|https://kafka.apache.org/36/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html#send(org.apache.kafka.clients.producer.ProducerRecord,org.apache.kafka.clients.producer.Callback)])
But this is not the case. Because {{TimeoutException}} is an {{ApiException}},
it hits [this
catch|https://github.com/a0x8o/kafka/blob/54eff6af115ee647f60129f2ce6a044cb17215d0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1073-L1084],
which results in a failed future being returned instead of the exception
being thrown.
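To make the control flow concrete, here is a simplified, self-contained model of the pattern described above. It is only a sketch: {{ModelApiException}}, {{ModelTimeoutException}}, {{waitOnMetadata}} and this {{send}} are hypothetical stand-ins, not the real Kafka classes or the actual {{KafkaProducer}} implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Simplified stand-in model; none of these types are the real Kafka classes.
class SendTimeoutModel {
    // Hypothetical stand-in for Kafka's ApiException.
    static class ModelApiException extends RuntimeException {
        ModelApiException(String message) { super(message); }
    }

    // Hypothetical stand-in for Kafka's TimeoutException, which is an ApiException.
    static class ModelTimeoutException extends ModelApiException {
        ModelTimeoutException(String message) { super(message); }
    }

    // Hypothetical blocking step (metadata fetch or buffer allocation).
    static void waitOnMetadata(boolean timesOut) {
        if (timesOut) {
            throw new ModelTimeoutException("blocked longer than max.block.ms");
        }
    }

    static Future<String> send(String record, boolean timesOut) {
        try {
            waitOnMetadata(timesOut);
            return CompletableFuture.completedFuture("ack:" + record);
        } catch (ModelApiException e) {
            // The timeout is caught here and turned into a failed future,
            // so the caller never sees it thrown from send().
            return CompletableFuture.failedFuture(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Future<String> future = send("r1", true); // returns normally, no exception
        try {
            future.get();
        } catch (ExecutionException e) {
            System.out.println("delivered via future: "
                    + e.getCause().getClass().getSimpleName());
        }
    }
}
```

In this model, a caller that wraps {{send}} in a {{try}}/{{catch}} for the timeout, as the JavaDoc suggests, would never enter the {{catch}} block; the failure only becomes visible when the returned future is inspected.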
The "allocating memory" part likely changed as part of
[KAFKA-3720|https://github.com/apache/kafka/pull/8399/files#diff-43491ffa1e0f8d28db071d8c23f1a76b54f1f20ea98cf6921bfd1c77a90446abR29]
which changed the base exception for buffer exhaustion exceptions to
{{TimeoutException}}. Timing out while waiting on metadata suffers from the
same issue, but it is not clear whether this has always been the case.
This is basically a discrepancy between documentation and behavior, so it's a
question of which one should be adjusted.
On that question, being able to differentiate between synchronous timeouts
(caused by waiting on metadata or allocating memory) and asynchronous timeouts
(e.g. timing out waiting for acks) is useful. In the former case we _know_ that
the broker has not received the event, but in the latter it _may_ be that the
broker has received it but the ack could not be delivered, and our actions
might vary because of this. The current behavior makes the two cases hard to
differentiate, since both result in a {{TimeoutException}} being delivered via
the callback. Currently, we rely on the exception message, but that is an
implementation detail that may change at any time.
Therefore, I would suggest either:
* Reverting to the documented behavior of throwing in case of synchronous timeouts
* Correcting the JavaDoc and introducing an exception base class/interface for
synchronous timeouts
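As a rough illustration of the second option, the sketch below shows how a marker interface could let callback code tell the two kinds of timeout apart without inspecting messages. Everything here is hypothetical: {{SynchronousTimeout}}, {{MetadataTimeoutException}}, {{DeliveryTimeoutException}} and {{classify}} are made-up names for this example and do not exist in Kafka.

```java
// Sketch of the proposal only; all types here are hypothetical, not Kafka APIs.
class SyncTimeoutSketch {
    // Hypothetical marker for timeouts raised before the record leaves the client.
    interface SynchronousTimeout {}

    // Hypothetical stand-in for Kafka's TimeoutException.
    static class ModelTimeoutException extends RuntimeException {
        ModelTimeoutException(String message) { super(message); }
    }

    // Hypothetical: timed out waiting on metadata (synchronous).
    static class MetadataTimeoutException extends ModelTimeoutException
            implements SynchronousTimeout {
        MetadataTimeoutException(String message) { super(message); }
    }

    // Hypothetical: timed out waiting for acks (asynchronous).
    static class DeliveryTimeoutException extends ModelTimeoutException {
        DeliveryTimeoutException(String message) { super(message); }
    }

    // Decide how to react based on the exception type, not its message.
    static String classify(Exception e) {
        if (e instanceof SynchronousTimeout) {
            return "not-sent";       // broker has definitely not received the record
        }
        if (e instanceof ModelTimeoutException) {
            return "possibly-sent";  // broker may have received it; ack was lost or late
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify(new MetadataTimeoutException("metadata wait")));
        System.out.println(classify(new DeliveryTimeoutException("ack wait")));
    }
}
```

A base class would work equally well for this purpose; an interface is used here only because it can be attached to existing exception types without changing their hierarchy.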