[ https://issues.apache.org/jira/browse/KAFKA-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
radha updated KAFKA-4939:
-------------------------
Description:
If bootstrap.servers lists many brokers and a particular Kafka client cannot
reach some of them, the client does not log the connection failure at ERROR
level. Publishing instead fails with other errors that cannot be resolved by
adjusting timeouts, metadata settings, or retries.
{noformat}
ERROR pool-3-thread-3 [ProducerDroppedMessageExceptionLogger ] - Exception occured while producing message: Failed to update metadata after 1000 ms.
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 1000 ms.
ERROR kafka-producer-network-thread | producer-1 [ProducerDroppedMessageExceptionLogger ] - Exception occured while producing message: Expiring 1 record(s) for Q.REST.TOPIC-18 due to 5048 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for Q.REST.TOPIC-18 due to 5048 ms has passed since batch creation plus linger time
{noformat}
netstat shows connections established to the other Kafka brokers, even though
these messages fail to be published.
We lost several hours before raising the log level to TRACE, finding the
entries below, and confirming that we could not even ping that specific Kafka
broker.
Log entries that should be at ERROR level, and the connection should also be
retried:
{noformat}
[org.apache.kafka.common.network.Selector] - Connection with some-prd-kafk02/*.*.*.* disconnected
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.kafka.common.network.PlaintextTransportLayer.finishConnect(PlaintextTransportLayer.java:51)
    at org.apache.kafka.common.network.KafkaChannel.finishConnect(KafkaChannel.java:73)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:323)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:291)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
    at java.lang.Thread.run(Thread.java:745)
[org.apache.kafka.clients.NetworkClient ] - Node 206 disconnected.
{noformat}
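Until the client reports this at a visible log level, a pre-flight check in the application can surface unreachable hosts early. The following is a minimal sketch only, using plain java.net sockets rather than any Kafka API; the class name BootstrapReachability, its methods, and the 2000 ms timeout are hypothetical. It checks only the addresses in the bootstrap list, so a broker that the client learns about later through metadata (such as Node 206 above) could still be unreachable under a different advertised address.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Kafka client.
public class BootstrapReachability {

    /** Splits a bootstrap.servers-style "host:port,host:port" string. */
    static List<InetSocketAddress> parse(String bootstrapServers) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : bootstrapServers.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate empty input and trailing commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected host:port, got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // Keep the address unresolved here; DNS lookup happens at connect time.
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    /** Attempts a TCP connect to each entry and returns a description of every failure. */
    static List<String> unreachable(String bootstrapServers, int timeoutMs) {
        List<String> failures = new ArrayList<>();
        for (InetSocketAddress addr : parse(bootstrapServers)) {
            try (Socket socket = new Socket()) {
                // NoRouteToHostException, ConnectException, UnknownHostException and
                // SocketTimeoutException are all subclasses of IOException.
                socket.connect(new InetSocketAddress(addr.getHostString(), addr.getPort()), timeoutMs);
            } catch (IOException e) {
                failures.add(addr.getHostString() + ":" + addr.getPort() + " -> " + e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Assumed default; pass the real bootstrap.servers value as the first argument.
        String servers = args.length > 0 ? args[0] : "localhost:9092";
        for (String failure : unreachable(servers, 2000)) {
            System.err.println("ERROR unreachable bootstrap server: " + failure);
        }
    }
}
```

Running this at startup with the same string passed to bootstrap.servers prints one ERROR line per host that cannot be connected to, which would have pointed at the broken route without raising the client log level to TRACE.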
was:
If you have many brokers in your bootstrap.servers list and some cannot be
reached by a specific Kafka client for whatever reason, (cannot ping), it does
not log this as ERROR and fails publishing with other errors that can never be
resolved by increasing timeouts or metadata or retries.
{noformat}
ERROR pool-3-thread-3 [ProducerDroppedMessageExceptionLogger ] - Exception occured while producing message: Failed to update metadata after 1000 ms.
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 1000 ms.
ERROR kafka-producer-network-thread | producer-1 [ProducerDroppedMessageExceptionLogger ] - Exception occured while producing message: Expiring 1 record(s) for Q.REST.TOPIC-18 due to 5048 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for Q.REST.TOPIC-18 due to 5048 ms has passed since batch creation plus linger time
{noformat}
You will see connections established to other Kafka brokers when doing netstat,
even though these messages fail to be published.
We have wasted several hours before increasing log levels to TRACE and seeing
these and confirming that we cannot even ping that specific Kafka Broker.
Logs that should be in ERROR and also retried:
{noformat}
[org.apache.kafka.common.network.Selector] - Connection with some-prd-kafk02/*.*.*.* disconnected
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.kafka.common.network.PlaintextTransportLayer.finishConnect(PlaintextTransportLayer.java:51)
    at org.apache.kafka.common.network.KafkaChannel.finishConnect(KafkaChannel.java:73)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:323)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:291)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
    at java.lang.Thread.run(Thread.java:745)
[org.apache.kafka.clients.NetworkClient ] - Node 206 disconnected.
{noformat}
> Kafka does not log NoRouteToHostException in ERROR log level
> -------------------------------------------------------------
>
> Key: KAFKA-4939
> URL: https://issues.apache.org/jira/browse/KAFKA-4939
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Affects Versions: 0.10.1.1
> Reporter: radha
> Priority: Minor