[ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16770240#comment-16770240 ]
Peter M Elias edited comment on KAFKA-6098 at 2/16/19 10:18 PM:
----------------------------------------------------------------

I've reproduced this on 1.0.0 and also want to note that a state can be reached where the configuration of a "recreated" topic is present in Zookeeper but IS NOT returned by querying the brokers' metadata, because the broker has not successfully initialized the log directories.

I reproduced this by creating 3 topics using a function that loops until it receives TopicExistsException for each topic, followed by a function that deletes the 3 topics until it receives UnknownTopicOrPartitionException for each of them, followed by calling the first function again. The typical result is that 2 out of the 3 topics are recreated, but the 3rd topic is reported as existing via a TopicExistsException response AND has the expected entries in Zookeeper; however, its log directories are never initialized. This can be confirmed both by inspecting the file system and by the ABSENCE of the log output here: [https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/log/Log.scala#L208-L220]

Presumably, something is blocking one of those calls from ever completing, so we are left with a topic that "exists" but has no initialized segments and is therefore not advertised by the brokers in response to a Metadata request, yet cannot be created because it is present in Zookeeper.

It should be noted that I ran this test against a SINGLE Kafka broker with a SINGLE Zookeeper process and a SINGLE client.

> Delete and Re-create topic operation could result in race condition
> -------------------------------------------------------------------
>
>                 Key: KAFKA-6098
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6098
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Priority: Major
>              Labels: reliability
>
> Here are the steps to reproduce this issue:
> 1. Delete a topic using the delete topic request.
> 2. Confirm the topic is deleted using the list topics request.
> 3. Create the topic using the create topic request.
> In step 3) a race condition can happen in which the response returns a
> {{TOPIC_ALREADY_EXISTS}} error code, indicating the topic already exists.
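The loop structure described above can be sketched as follows. This is a minimal sketch against a stubbed admin interface; the `Admin`, `create`, and `delete` names here are illustrative, not Kafka's API. A real reproduction would issue `AdminClient#createTopics` / `AdminClient#deleteTopics` and catch `TopicExistsException` / `UnknownTopicOrPartitionException` from the returned futures.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RecreateRepro {
    /** Stub admin: each call reports whether the server acknowledged
        the topic as existing (create) or as gone (delete). */
    interface Admin {
        /** @return true once the server reports the topic exists
            (TopicExistsException in real Kafka). */
        boolean create(String topic);
        /** @return true once the server reports the topic is gone
            (UnknownTopicOrPartitionException in real Kafka). */
        boolean delete(String topic);
    }

    static void createUntilExists(Admin admin, List<String> topics) {
        for (String t : topics) {
            while (!admin.create(t)) { /* retry until "topic exists" */ }
        }
    }

    static void deleteUntilGone(Admin admin, List<String> topics) {
        for (String t : topics) {
            while (!admin.delete(t)) { /* retry until "unknown topic" */ }
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the cluster's topic set.
        Set<String> cluster = new HashSet<>();
        Admin admin = new Admin() {
            // First create succeeds (returns false -> retry), the retry
            // then sees the topic and reports "exists".
            public boolean create(String t) { return !cluster.add(t); }
            // First delete removes it (returns false -> retry), the retry
            // then reports "unknown topic".
            public boolean delete(String t) { return !cluster.remove(t); }
        };
        List<String> topics = List.of("a", "b", "c");
        createUntilExists(admin, topics); // create, confirm via "exists"
        deleteUntilGone(admin, topics);   // delete, confirm via "unknown topic"
        createUntilExists(admin, topics); // re-create: the step where, against
                                          // a real broker, one topic gets stuck
        System.out.println(cluster);      // all three topics present in the stub
    }
}
```

Against the stub this always converges; against a real 1.0 broker the third `createUntilExists` is where one topic ends up "existing" in Zookeeper without initialized log segments.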
> The root cause of the above issue is in the {{TopicDeletionManager}} class:
> {code}
> controller.partitionStateMachine.handleStateChanges(partitionsForDeletedTopic.toSeq, OfflinePartition)
> controller.partitionStateMachine.handleStateChanges(partitionsForDeletedTopic.toSeq, NonExistentPartition)
> topicsToBeDeleted -= topic
> partitionsToBeDeleted.retain(_.topic != topic)
> kafkaControllerZkUtils.deleteTopicZNode(topic)
> kafkaControllerZkUtils.deleteTopicConfigs(Seq(topic))
> kafkaControllerZkUtils.deleteTopicDeletions(Seq(topic))
> controllerContext.removeTopic(topic)
> {code}
> That is, it first updates the brokers' metadata caches through the ISR and metadata
> update requests, then deletes the topic zk path, and then deletes the
> topic-deletion zk path. However, when handling a create topic request, the
> broker simply tries to write to the topic zk path directly. Hence there is
> a race window between the brokers updating their metadata caches (so the list
> topics request no longer returns this topic) and the zk path for the topic being
> deleted (so the create topic request would succeed).
> The reason this problem is exposed is the current handling logic
> of the create topic response, most of which treats {{TOPIC_ALREADY_EXISTS}} as
> "OK" and moves on, while the zk path gets deleted later, leaving the
> topic not created at all:
> https://github.com/apache/kafka/blob/249e398bf84cdd475af6529e163e78486b43c570/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamsKafkaClient.java#L221
> https://github.com/apache/kafka/blob/1a653c813c842c0b67f26fb119d7727e272cf834/connect/runtime/src/main/java/org/apache/kafka/connect/util/TopicAdmin.java#L232
> Looking at the code history, it seems this race condition has always existed, but
> when testing on trunk / 1.0 with the above steps it is more likely to happen than
> before. I wonder if the ZK async calls have an effect here.
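The ordering problem in the deletion sequence above can be modelled with a toy simulation. All names here are hypothetical stand-ins (real Kafka involves the controller, ISR/UpdateMetadata requests, and Zookeeper clients): "Zookeeper" is a set of topic znodes, the "metadata cache" is what brokers answer list-topics from, and deletion updates the cache before removing the znode. In that window, list-topics says the topic is gone while create still fails with "already exists".

```java
import java.util.HashSet;
import java.util.Set;

public class TopicRecreateRace {
    // Toy model: topic znodes in "Zookeeper" vs. the brokers' metadata cache.
    final Set<String> zkTopicZNodes = new HashSet<>();
    final Set<String> brokerMetadataCache = new HashSet<>();

    /** Create goes straight to the topic zk path, as described above. */
    void createTopic(String topic) {
        if (!zkTopicZNodes.add(topic)) {
            throw new IllegalStateException("TOPIC_ALREADY_EXISTS: " + topic);
        }
        brokerMetadataCache.add(topic); // metadata propagates after creation
    }

    /** List-topics is answered from the brokers' metadata cache. */
    boolean listTopicsContains(String topic) {
        return brokerMetadataCache.contains(topic);
    }

    /** Deletion step 1: brokers drop the topic from their metadata cache. */
    void deleteStepUpdateMetadata(String topic) {
        brokerMetadataCache.remove(topic);
    }

    /** Deletion step 2 (later): the controller deletes the topic znode. */
    void deleteStepRemoveZNode(String topic) {
        zkTopicZNodes.remove(topic);
    }

    public static void main(String[] args) {
        TopicRecreateRace cluster = new TopicRecreateRace();
        cluster.createTopic("t1");

        // Deletion begins: the metadata cache is updated first...
        cluster.deleteStepUpdateMetadata("t1");

        // ...so list-topics already reports the topic as gone,
        System.out.println("listed: " + cluster.listTopicsContains("t1")); // false

        // ...but the znode is still present, so re-create fails.
        try {
            cluster.createTopic("t1");
            System.out.println("created: true");
        } catch (IllegalStateException e) {
            System.out.println("created: false (" + e.getMessage() + ")");
        }

        // Only afterwards does step 2 remove the znode, making the
        // earlier TOPIC_ALREADY_EXISTS answer effectively a lie.
        cluster.deleteStepRemoveZNode("t1");
    }
}
```

This is exactly the window a client hits when it deletes, confirms deletion via list-topics, and immediately re-creates.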
> cc [~junrao]
> [~onurkaraman]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)