[ 
https://issues.apache.org/jira/browse/KAFKA-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690140#comment-16690140
 ] 

Dong Lin commented on KAFKA-7648:
---------------------------------

Currently TestUtils.createTopic(...) re-sends the znode creation request to 
the ZooKeeper service if the previous response shows Code.CONNECTIONLOSS. See 
KafkaZkClient.retryRequestsUntilConnected() for the related logic.

This means that the test fails in the following sequence: ZooKeeper creates 
the znode on the first request, the response to the first request is lost or 
times out, the second request is sent, and the response to the second request 
shows Code.NODEEXISTS.

To fix this flaky test, we should probably implement logic similar to 
KafkaZkClient.CheckedEphemeral() that checks, after receiving Code.NODEEXISTS, 
whether the znode was created with the same session id.
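
To make the sequence above concrete, here is a minimal self-contained sketch 
in Java. It does not use the real ZooKeeper or Kafka APIs: FakeZk, 
naiveCreate and checkedCreate are hypothetical names invented for 
illustration, and the fake server records the creating session id for every 
znode. That last point is an assumption. Real ZooKeeper only exposes the owner 
session for ephemeral znodes (Stat.getEphemeralOwner()), so for persistent 
topic znodes an actual fix would likely need a different ownership check, 
such as comparing the stored data.

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotentCreateSketch {
    // Mirrors only the three ZooKeeper result codes discussed above.
    enum Code { OK, CONNECTIONLOSS, NODEEXISTS }

    /** Hypothetical in-memory stand-in for a ZooKeeper server (not a real API). */
    static final class FakeZk {
        private final Map<String, Long> ownerByPath = new HashMap<>();
        private int responsesToDrop;

        FakeZk(int responsesToDrop) {
            this.responsesToDrop = responsesToDrop;
        }

        // Applies the create, then may "lose" the response to simulate CONNECTIONLOSS.
        Code create(String path, long sessionId) {
            if (ownerByPath.containsKey(path)) {
                return Code.NODEEXISTS;
            }
            ownerByPath.put(path, sessionId); // the znode IS created server-side
            if (responsesToDrop > 0) {
                responsesToDrop--;
                return Code.CONNECTIONLOSS;   // but the client never sees OK
            }
            return Code.OK;
        }

        // Assumption: the creator session is recorded for every znode.
        Long ownerOf(String path) {
            return ownerByPath.get(path);
        }
    }

    // Blind retry on CONNECTIONLOSS, as described for the current behavior.
    static Code naiveCreate(FakeZk zk, String path, long sessionId) {
        Code code;
        do {
            code = zk.create(path, sessionId);
        } while (code == Code.CONNECTIONLOSS);
        return code;
    }

    // Same retry, but NODEEXISTS counts as success if our own session created the znode.
    static Code checkedCreate(FakeZk zk, String path, long sessionId) {
        Code code = naiveCreate(zk, path, sessionId);
        if (code == Code.NODEEXISTS) {
            Long owner = zk.ownerOf(path);
            if (owner != null && owner == sessionId) {
                return Code.OK;
            }
        }
        return code;
    }

    public static void main(String[] args) {
        String path = "/brokers/topics/topic-4"; // illustrative path
        long session = 1L;                       // illustrative session id
        System.out.println("naive:   " + naiveCreate(new FakeZk(1), path, session));
        System.out.println("checked: " + checkedCreate(new FakeZk(1), path, session));
    }
}
```

With one dropped response, the naive retry ends in NODEEXISTS while the 
checked variant reports OK. A znode created by a different session still 
yields NODEEXISTS, so a genuine conflict is not masked.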



> Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests
> ---------------------------------------------------------------
>
>                 Key: KAFKA-7648
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7648
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Dong Lin
>            Priority: Major
>
> Observed in 
> [https://builds.apache.org/job/kafka-2.1-jdk8/52/testReport/junit/kafka.server/DeleteTopicsRequestTest/testValidDeleteTopicRequests/]
>  
> {code}
> Error Message
> org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already 
> exists.
> h3. Stacktrace
> org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already 
> exists.
> h3. Standard Output
> [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition topic-3-3 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition topic-3-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2018-11-07 17:53:14,805] WARN Client session timed out, have not heard from server in 4000ms for sessionid 0x10051eebf480003 (org.apache.zookeeper.ClientCnxn:1112)
> [2018-11-07 17:53:14,806] WARN Unable to read additional data from client sessionid 0x10051eebf480003, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2018-11-07 17:53:14,807] WARN Client session timed out, have not heard from server in 4002ms for sessionid 0x10051eebf480002 (org.apache.zookeeper.ClientCnxn:1112)
> [2018-11-07 17:53:14,807] WARN Unable to read additional data from client sessionid 0x10051eebf480002, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2018-11-07 17:53:14,823] WARN Client session timed out, have not heard from server in 4002ms for sessionid 0x10051eebf480001 (org.apache.zookeeper.ClientCnxn:1112)
> [2018-11-07 17:53:14,824] WARN Unable to read additional data from client sessionid 0x10051eebf480001, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2018-11-07 17:53:15,423] WARN Client session timed out, have not heard from server in 4002ms for sessionid 0x10051eebf480000 (org.apache.zookeeper.ClientCnxn:1112)
> [2018-11-07 17:53:15,423] WARN Unable to read additional data from client sessionid 0x10051eebf480000, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2018-11-07 17:53:15,879] WARN fsync-ing the write ahead log in SyncThread:0 took 4456ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide (org.apache.zookeeper.server.persistence.FileTxnLog:338)
> [2018-11-07 17:53:16,831] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition topic-4-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2018-11-07 17:53:23,087] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition invalid-timeout-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2018-11-07 17:53:23,088] ERROR [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition invalid-timeout-3 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2018-11-07 17:53:23,137] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition invalid-timeout-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
>   
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
