[ https://issues.apache.org/jira/browse/KAFKA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150855#comment-17150855 ]
Christian Becker commented on KAFKA-10228: ------------------------------------------ If you don't want to change the exception type to ensure compatibility, it might make sense to change the log line from debug to info or something more severe to give a clue about the cause: {code:java} log.debug("Disconnecting from node {} due to request timeout.", nodeId); {code} > producer: NETWORK_EXCEPTION is thrown instead of a request timeout > ------------------------------------------------------------------ > > Key: KAFKA-10228 > URL: https://issues.apache.org/jira/browse/KAFKA-10228 > Project: Kafka > Issue Type: Improvement > Components: clients > Affects Versions: 2.3.1 > Reporter: Christian Becker > Priority: Major > > We're currently seeing an issue with the java client (producer), when message > producing runs into a timeout. Namely a NETWORK_EXCEPTION is thrown instead > of a timeout exception. > *Situation and relevant code:* > Config > {code:java} > request.timeout.ms: 200 > retries: 3 > acks: all{code} > {code:java} > for (UnpublishedEvent event : unpublishedEvents) { > ListenableFuture<SendResult<String, String>> future; > future = kafkaTemplate.send(new ProducerRecord<>(event.getTopic(), > event.getKafkaKey(), event.getPayload())); > futures.add(future.completable()); > } > CompletableFuture.allOf(futures.stream().toArray(CompletableFuture[]::new)).join();{code} > We're using the KafkaTemplate from SpringBoot here, but it shouldn't matter, > as it's merely a wrapper. There we put in batches of messages to be sent. > 200ms later, we can see the following in the logs: (not sure about the order, > they've arrived in the same ms, so our logging system might not display them > in the right order) > {code:java} > [Producer clientId=producer-1] Received invalid metadata error in produce > request on partition events-6 due to > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received.. Going to request metadata update now > [Producer clientId=producer-1] Got error produce response with correlation id > 3094 on topic-partition events-6, retrying (2 attempts left). Error: > NETWORK_EXCEPTION {code} > There is also a corresponding error on the broker (within a few ms): > {code:java} > Attempting to send response via channel for which there is no open > connection, connection id XXX (kafka.network.Processor) {code} > This was somewhat unexpected and sent us for a hunt across the infrastructure > for possible connection issues, but we've found none. > Side note: In some cases the retries worked and the messages were > successfully produced. > Only after many hours of heavy debugging, we've noticed, that the error might > be related to the low timeout setting. We've removed that setting now, as it > was a remnant from the past and no longer valid for our use-case. However in > order to avoid other people having that issue again and to simplify future > debugging, some form of timeout exception should be thrown. -- This message was sent by Atlassian Jira (v8.3.4#803005)