[ 
https://issues.apache.org/jira/browse/KAFKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991677#comment-14991677
 ] 

Rajini Sivaram commented on KAFKA-2749:
---------------------------------------

[~geoffra] I had a look at the failure but couldn't work out the cause. I have 
been running just that test in a loop locally and haven't been able to recreate 
the failure.

Failing test scenario:
- The EndToEndLatency test is run with a topic with 6 partitions and replication 
factor 3 on a cluster of 3 Kafka brokers (broker-1=worker7, broker-2=worker8, 
broker-3=worker9). So every broker has to be available and responsive for the 
test to complete successfully. The failing test was using SSL for clients as 
well as interbroker communication. No other security protocol was enabled on 
the brokers. 
- Test logs indicate a problem with worker9, and the problem persists for 10 
minutes, after which the test times out. The replication threads on both 
worker7 and worker8 got socket timeouts (30 second timeout) on their fetch 
requests to worker9.
{quote}
WARN [ReplicaFetcherThread-0-3], Error in fetch 
kafka.server.ReplicaFetcherThread$FetchRequest@3a58942f. Possible cause: 
java.io.IOException: Connection to 3 was disconnected before the response was 
read (kafka.server.ReplicaFetcherThread)
{quote}
- The replication threads on both worker7 and worker8 retry the connection to 
worker9 every 30 seconds for the whole 10 minute interval and manage to create 
successful SSL connections each time (there are no handshake exceptions in the 
logs), so it doesn't look like a network glitch. The timeout is always on the 
fetch request in the replication thread, recurring as new connections are 
created every 30 seconds.
- I didn't find anything untoward in the controller logs (worker7).
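
The 30 second retry cadence described above can be checked mechanically against 
the broker logs on worker7 and worker8. Below is a minimal Python sketch, 
assuming the default log4j layout in which each line starts with a bracketed 
timestamp such as [2015-11-04 10:00:30,123]. The function name 
fetch_error_gaps and the SAMPLE lines are made up for illustration; the sample 
timestamps are synthetic and are not taken from the test logs.

```python
import re
from datetime import datetime

# Assumed line layout (default Kafka log4j pattern, not confirmed for this run):
#   [2015-11-04 10:00:30,123] WARN [ReplicaFetcherThread-0-3], Error in fetch ...
# The trailing number in the thread name is the id of the broker being fetched from.
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\] WARN "
    r"\[ReplicaFetcherThread-\d+-(?P<broker>\d+)\], Error in fetch"
)


def fetch_error_gaps(lines):
    """Return {source_broker_id: [seconds between consecutive fetch errors]}."""
    last_seen = {}
    gaps = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a replica fetcher warning
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        broker = int(match.group("broker"))
        if broker in last_seen:
            delta = (ts - last_seen[broker]).total_seconds()
            gaps.setdefault(broker, []).append(delta)
        last_seen[broker] = ts
    return gaps


# Synthetic lines for illustration only.
SAMPLE = [
    "[2015-11-04 10:00:30,123] WARN [ReplicaFetcherThread-0-3], Error in fetch x",
    "[2015-11-04 10:00:45,000] INFO some unrelated line",
    "[2015-11-04 10:01:00,456] WARN [ReplicaFetcherThread-0-3], Error in fetch x",
    "[2015-11-04 10:01:30,789] WARN [ReplicaFetcherThread-0-3], Error in fetch x",
]

if __name__ == "__main__":
    print(fetch_error_gaps(SAMPLE))
```

Gaps that sit steadily at about 30 seconds for broker 3 would match the socket 
timeout followed by an immediate reconnect, rather than an intermittent network 
problem.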

Could someone with better knowledge of replication take a look and see what can 
cause socket timeouts in fetch requests when there is connectivity?



> Failure of end to end latency test during nightly run
> -----------------------------------------------------
>
>                 Key: KAFKA-2749
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2749
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Geoff Anderson
>
> With SSL enabled, end to end latency timed out during the following nightly 
> run:
> http://testing.confluent.io/kafka/2015-11-04--001/Benchmark/test_end_to_end_latency/security_protocol=SSL/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
