[
https://issues.apache.org/jira/browse/KAFKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991677#comment-14991677
]
Rajini Sivaram commented on KAFKA-2749:
---------------------------------------
[~geoffra] I had a look at the failure but couldn't determine its cause. I have
been running just that test in a loop locally and haven't been able to
reproduce the failure.
Failing test scenario:
- EndToEndLatency test is run with a topic with 6 partitions and replication
factor 3 on a cluster of 3 Kafka brokers (broker-1=worker7, broker-2=worker8,
broker-3=worker9). So every broker has to be available and responsive for the test
to complete successfully. The failing test was using SSL for clients as well as
interbroker communication. No other security protocol was enabled on the
brokers.
- Test logs indicate a problem with worker9, and the problem persists for 10
minutes, after which the test times out. The replication threads on both worker7
and worker8 got socket timeouts (30 second timeout) in their fetch requests to
worker9.
{quote}
WARN [ReplicaFetcherThread-0-3], Error in fetch
kafka.server.ReplicaFetcherThread$FetchRequest@3a58942f. Possible cause:
java.io.IOException: Connection to 3 was disconnected before the response was
read (kafka.server.ReplicaFetcherThread)
{quote}
- The replication threads on both worker7 and worker8 are retrying the connection to
worker9 every 30 seconds for the 10 minute interval and are managing to create
successful SSL connections throughout the 10 minutes (there are no handshake
exceptions in the logs), so it doesn't look like a network glitch. The timeout
is always on the fetch request in the replication thread, recurring as new
connections are created every 30 seconds.
- I didn't find anything untoward in the controller logs (worker7).
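For anyone checking the broker logs for the 30-second recurrence described
above, here is a minimal Python sketch. It assumes each log line starts with a
log4j-style timestamp such as "[2015-11-04 10:15:30,123]", which the excerpt
quoted above does not show, so the pattern may need adjusting. The helper name
fetch_error_gaps is hypothetical and not part of any Kafka tooling.

```python
import re
from datetime import datetime

# Assumption: lines look like
#   [YYYY-MM-DD HH:MM:SS,mmm] WARN [ReplicaFetcherThread-0-3], Error in fetch ...
# The timestamp prefix is assumed; adjust the pattern to the real broker logs.
FETCH_ERROR = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\] WARN "
    r"\[(?P<thread>ReplicaFetcherThread-\d+-\d+)\], Error in fetch"
)


def fetch_error_gaps(lines):
    """Return {thread name: [seconds between consecutive fetch errors]}.

    Milliseconds are ignored, so gaps are whole seconds.
    """
    last_seen = {}
    gaps = {}
    for line in lines:
        match = FETCH_ERROR.match(line)
        if match is None:
            continue  # not a replica fetcher error line
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        thread = match.group("thread")
        gaps.setdefault(thread, [])
        if thread in last_seen:
            gaps[thread].append((ts - last_seen[thread]).total_seconds())
        last_seen[thread] = ts
    return gaps
```

Calling fetch_error_gaps(open(path)) on the server logs from worker7 and
worker8 (path being wherever those logs were collected) should give gaps
clustered near 30 seconds if every retry is hitting the socket timeout. Widely
varying gaps would point to something else.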
Could someone with better knowledge of replication take a look and see what can
cause socket timeouts in fetch requests when there is connectivity?
> Failure of end to end latency test during nightly run
> -----------------------------------------------------
>
> Key: KAFKA-2749
> URL: https://issues.apache.org/jira/browse/KAFKA-2749
> Project: Kafka
> Issue Type: Bug
> Reporter: Geoff Anderson
>
> With SSL enabled, end to end latency timed out during the following nightly
> run:
> http://testing.confluent.io/kafka/2015-11-04--001/Benchmark/test_end_to_end_latency/security_protocol=SSL/