[ 
https://issues.apache.org/jira/browse/KAFKA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150855#comment-17150855
 ] 

Christian Becker commented on KAFKA-10228:
------------------------------------------

If changing the exception type is not an option for compatibility reasons, it 
might make sense to raise the following log line from debug to info, or 
something more severe, to give a clue about the cause:
{code:java}
log.debug("Disconnecting from node {} due to request timeout.", nodeId); {code}

> producer: NETWORK_EXCEPTION is thrown instead of a request timeout
> ------------------------------------------------------------------
>
>                 Key: KAFKA-10228
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10228
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 2.3.1
>            Reporter: Christian Becker
>            Priority: Major
>
> We're currently seeing an issue with the Java client (producer) when 
> producing a message runs into a timeout: a NETWORK_EXCEPTION is thrown 
> instead of a timeout exception.
> *Situation and relevant code:*
> Config
> {code:java}
> request.timeout.ms: 200
> retries: 3
> acks: all{code}
> {code:java}
> for (UnpublishedEvent event : unpublishedEvents) {
>     ListenableFuture<SendResult<String, String>> future;
>     future = kafkaTemplate.send(
>         new ProducerRecord<>(event.getTopic(), event.getKafkaKey(), event.getPayload()));
>     futures.add(future.completable());
> }
> CompletableFuture.allOf(futures.stream().toArray(CompletableFuture[]::new)).join();{code}
> We're using the KafkaTemplate from Spring Boot here, but that shouldn't 
> matter, as it's merely a wrapper. We hand it batches of messages to be sent.
> 200 ms later, we see the following in the logs (not sure about the order: 
> both lines arrived in the same millisecond, so our logging system might not 
> display them in the right order):
> {code:java}
> [Producer clientId=producer-1] Received invalid metadata error in produce 
> request on partition events-6 due to 
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.. Going to request metadata update now
> [Producer clientId=producer-1] Got error produce response with correlation id 
> 3094 on topic-partition events-6, retrying (2 attempts left). Error: 
> NETWORK_EXCEPTION {code}
> There is also a corresponding error on the broker (within a few ms):
> {code:java}
> Attempting to send response via channel for which there is no open 
> connection, connection id XXX (kafka.network.Processor) {code}
> This was unexpected and sent us on a hunt across the infrastructure for 
> possible connection issues, but we found none.
> Side note: In some cases the retries worked and the messages were 
> successfully produced.
> Only after many hours of heavy debugging did we notice that the error might 
> be related to the low timeout setting. We've removed that setting now, as it 
> was a remnant from the past and no longer valid for our use case. However, to 
> spare other people the same issue and to simplify future debugging, some 
> form of timeout exception should be thrown.
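
For anyone reproducing this, here is a rough sketch of how the failure reaches 
the calling code in the snippet above. It uses only the JDK: 
StandInNetworkException is a hypothetical placeholder for 
org.apache.kafka.common.errors.NetworkException, and rootCauseOfJoin is an 
illustrative helper, not part of Kafka or Spring. The only behaviour it relies 
on is that CompletableFuture.allOf(...).join() throws a CompletionException 
whose cause chain leads to the original error. With a real KafkaTemplate there 
may be additional wrapper exceptions in that chain, which is why the helper 
walks down to the deepest cause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class JoinFailureSketch {

    // Hypothetical placeholder for Kafka's NetworkException, so the sketch
    // compiles without the Kafka client on the classpath.
    static class StandInNetworkException extends RuntimeException {
        StandInNetworkException(String message) {
            super(message);
        }
    }

    // Joins all futures and returns the deepest cause of a failure,
    // or null if every future completed normally.
    static Throwable rootCauseOfJoin(List<CompletableFuture<?>> futures) {
        try {
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return null;
        } catch (CompletionException e) {
            Throwable cause = e;
            // Skip the CompletionException and any further wrapper layers.
            while (cause.getCause() != null) {
                cause = cause.getCause();
            }
            return cause;
        }
    }

    public static void main(String[] args) {
        List<CompletableFuture<?>> futures = new ArrayList<>();
        futures.add(CompletableFuture.completedFuture("sent"));

        // Simulate one send that fails the way the producer log above reports it.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new StandInNetworkException(
                "The server disconnected before a response was received."));
        futures.add(failed);

        Throwable cause = rootCauseOfJoin(futures);
        if (cause != null) {
            System.out.println(cause.getClass().getSimpleName() + ": " + cause.getMessage());
        }
    }
}
```

The point of the sketch is that the caller only ever sees the exception type 
the producer chose, so nothing at this level hints at request.timeout.ms 
unless the exception or the log line says so.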



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
