[jira] [Commented] (KAFKA-6098) Delete and Re-create topic operation could result in race condition
[ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948689#comment-16948689 ]

ASF GitHub Bot commented on KAFKA-6098:
---

bdbyrne commented on pull request #7343: KAFKA-6098: Fix race between topic deletion and re-creation.
URL: https://github.com/apache/kafka/pull/7343

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Delete and Re-create topic operation could result in race condition
> ---
>
>                 Key: KAFKA-6098
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6098
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Priority: Major
>              Labels: reliability
>
> The following steps reproduce this issue:
> 1. Delete a topic using the delete topic request.
> 2. Confirm the topic is deleted using the list topics request.
> 3. Create the topic using the create topic request.
> In step 3, a race condition can cause the response to return a
> {{TOPIC_ALREADY_EXISTS}} error code, indicating that the topic already exists.
> The root cause of the above issue is in the {{TopicDeletionManager}} class:
> {code}
> controller.partitionStateMachine.handleStateChanges(partitionsForDeletedTopic.toSeq, OfflinePartition)
> controller.partitionStateMachine.handleStateChanges(partitionsForDeletedTopic.toSeq, NonExistentPartition)
> topicsToBeDeleted -= topic
> partitionsToBeDeleted.retain(_.topic != topic)
> kafkaControllerZkUtils.deleteTopicZNode(topic)
> kafkaControllerZkUtils.deleteTopicConfigs(Seq(topic))
> kafkaControllerZkUtils.deleteTopicDeletions(Seq(topic))
> controllerContext.removeTopic(topic)
> {code}
> I.e. it first updates the brokers' metadata caches through the ISR and metadata
> update requests, then deletes the topic zk path, and then deletes the
> topic-deletion zk path. However, upon handling the create topic request, the
> broker will simply try to write to the topic zk path directly. Hence there is
> a race window between the brokers updating their metadata caches (after which
> the list topics request no longer returns this topic) and the zk path for the
> topic being deleted (after which the create topic request can succeed).
> This problem is exposed by the current handling logic for the create topic
> response, most of which treats {{TOPIC_ALREADY_EXISTS}} as "OK" and moves on;
> the zk path is then deleted later, leaving the topic not created at all:
> https://github.com/apache/kafka/blob/249e398bf84cdd475af6529e163e78486b43c570/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamsKafkaClient.java#L221
> https://github.com/apache/kafka/blob/1a653c813c842c0b67f26fb119d7727e272cf834/connect/runtime/src/main/java/org/apache/kafka/connect/util/TopicAdmin.java#L232
> Looking at the code history, it seems this race condition has always existed,
> but when testing on trunk / 1.0 with the above steps it is more likely to
> happen than before. I wonder if the ZK async calls have an effect here.
> cc [~junrao] [~onurkaraman]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
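The window described in the report can be reproduced with a toy in-memory model. This is only a sketch under stated assumptions: the class, its three sets and its method names are hypothetical stand-ins for the topic zk path, the topic-deletion zk path and a broker's metadata cache. None of it is Kafka code.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the state touched by topic deletion. Not Kafka code. */
class DeletionRaceModel {
    // Hypothetical stand-ins for the two ZooKeeper paths and a broker's cache.
    final Set<String> topicZnodes = new HashSet<>();    // topic zk path
    final Set<String> deletionZnodes = new HashSet<>(); // topic-deletion zk path
    final Set<String> metadataCache = new HashSet<>();  // what "list topics" reads

    /** Create path as the report describes it: it only consults the topic zk path. */
    String createTopic(String topic) {
        if (!topicZnodes.add(topic)) {
            return "TOPIC_ALREADY_EXISTS";
        }
        metadataCache.add(topic);
        return "NONE";
    }

    void requestDeletion(String topic) { deletionZnodes.add(topic); }

    // The three deletion steps, in the order given in the report.
    void updateBrokerMetadata(String topic) { metadataCache.remove(topic); }
    void deleteTopicZnode(String topic) { topicZnodes.remove(topic); }
    void deleteDeletionZnode(String topic) { deletionZnodes.remove(topic); }

    boolean listTopicsContains(String topic) { return metadataCache.contains(topic); }

    public static void main(String[] args) {
        DeletionRaceModel m = new DeletionRaceModel();
        m.createTopic("t");
        m.requestDeletion("t");
        m.updateBrokerMetadata("t");                    // step 1 has happened
        System.out.println(m.listTopicsContains("t")); // the topic is no longer listed
        System.out.println(m.createTopic("t"));        // create attempted inside the window
        m.deleteTopicZnode("t");                        // step 2
        m.deleteDeletionZnode("t");                     // step 3
        System.out.println(m.listTopicsContains("t")); // the topic was never re-created
    }
}
```

A caller that treats the error from the create attempt as "OK" ends up with no topic once the remaining two steps have run, which is the outcome the report describes for the Streams and Connect clients.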
[jira] [Commented] (KAFKA-6098) Delete and Re-create topic operation could result in race condition
[ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930837#comment-16930837 ]

ASF GitHub Bot commented on KAFKA-6098:
---

bdbyrne commented on pull request #7343: KAFKA-6098: Fix race between topic deletion and re-creation.
URL: https://github.com/apache/kafka/pull/7343

During topic deletion, there is a window between a broker updating its
metadata cache to remove a deleted topic's partitions and the controller
removing the topic's znode. Consequently, it was possible for a broker to
believe a topic no longer existed even though the topic could not be
re-created due to outstanding ZK metadata.

The fix is to return a transient error when this condition is encountered.
Given that the window is anticipated to be small, a retry will eventually
resolve the issue.

Additionally, in rare cases, there was a window in which it was possible to
re-create a topic's znode before its deletion process was fully complete.
This is fixed by making topic creation account for any outstanding topic
deletion znode.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
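One way to read the fix described in the pull request is as an extra check on the create path. The sketch below is a minimal in-memory illustration, not the code from the pull request: the class, the three sets and the `RETRY_LATER` result name are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy create path that accounts for an unfinished deletion. Not Kafka code. */
class CreatePathCheck {
    /** RETRY_LATER is a made-up name for the transient error. */
    enum Result { CREATED, TOPIC_ALREADY_EXISTS, RETRY_LATER }

    // Hypothetical stand-ins for the ZK paths and the broker's metadata cache.
    final Set<String> topicZnodes = new HashSet<>();
    final Set<String> deletionZnodes = new HashSet<>();
    final Set<String> metadataCache = new HashSet<>();

    Result createTopic(String topic) {
        if (metadataCache.contains(topic)) {
            // The broker still knows the topic, so report it as existing.
            return Result.TOPIC_ALREADY_EXISTS;
        }
        if (topicZnodes.contains(topic) || deletionZnodes.contains(topic)) {
            // The broker has forgotten the topic but ZK state is outstanding:
            // a deletion is still in flight, so ask the caller to retry.
            return Result.RETRY_LATER;
        }
        topicZnodes.add(topic);
        metadataCache.add(topic);
        return Result.CREATED;
    }
}
```

With this ordering of checks a caller can distinguish "the topic really exists" from "the old topic is still being torn down", which the original single error code did not allow.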
[jira] [Commented] (KAFKA-6098) Delete and Re-create topic operation could result in race condition
[ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928844#comment-16928844 ]

Brian Byrne commented on KAFKA-6098:
---

I've been investigating this issue and have collected some thoughts on it.
Since I'm relatively new to Kafka, I'll be verbose in my explanation so that
my understanding may be validated/corrected.

From the client's perspective, deleting a topic succeeds once the server has
persisted the intent to delete the topic by creating the ZK node
/admin/delete_topics/. At this time, in-memory data structures will be
modified to reflect the ongoing destruction of the topic, after which the
topic's ZK node will be removed, followed by the deletion intent node*. The
operation is performed asynchronously because a topic may be ineligible for
deletion for an indefinite amount of time during partition reassignment or
broker instability.

Topic listing and topic creation appear to be at odds with each other, further
complicated by the race-prone ZK update sequence: the deletion intent node must
be removed after the topic's node for obvious recovery consistency reasons,
but this also means there's a window where the deletion intent exists and the
topic's node doesn't. In this case, a racing topic re-creation is prone to
unexpected and undesirable behavior, as the old topic may still be undergoing
deletion (note that topic creation doesn't check for a deletion intent).

The 'list topics' request uses a different source of truth than the creation
path: the topics are gathered by looking at the states of each topic's
outstanding partitions. The partitions may be removed while the deletion is
still outstanding, which is why the ZK node may still exist on creation, as
[~guozhang] noted.

A possible fix would be to have 'list topics' return a more conservative set
of topics that includes those still undergoing deletion. This might require
some changes to how metadata snapshots are handled, which seems a bit
excessive for resolving this issue, although I'm not familiar with this
component.

The "easy-fix" solution would have the create topic path check the metadata
cache for the topic's existence: if the topic doesn't exist there but its
deletion intent does, a transient error is returned that asks the client to
back off and retry. This ensures that all possible state for the previous
topic has been eliminated before the new one is created. The only downside is
that there's a window where no partitions for the topic exist (i.e. it doesn't
appear in list topics) but the topic deletion cannot be completed. This window
should be relatively small and is likely due to ZK inaccessibility, which
would prevent the creation from completing anyway.

Does this sound reasonable?

[*] There's actually a deletion of the topic's configuration in between, which
may be missed in this case, and which may be Peter's issue:
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L1621-L1627]
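The client side of the "easy-fix" proposal, backing off and retrying on a transient error, might look like the following sketch. Everything here is an assumption made for illustration: the `Outcome` enum, the `createWithBackoff` helper and the doubling backoff are hypothetical and are not part of the Kafka client API.

```java
import java.util.function.Supplier;

/** Hypothetical client-side retry loop for a transient create failure. */
class CreateRetrySketch {
    enum Outcome { CREATED, TOPIC_ALREADY_EXISTS, RETRY_LATER }

    /**
     * Calls attempt up to maxAttempts times, sleeping between tries and
     * doubling the sleep each time. Returns the first outcome that is not
     * RETRY_LATER, or RETRY_LATER if every attempt was transient.
     */
    static Outcome createWithBackoff(Supplier<Outcome> attempt,
                                     int maxAttempts,
                                     long initialBackoffMs) {
        long backoffMs = initialBackoffMs;
        Outcome last = Outcome.RETRY_LATER;
        for (int i = 0; i < maxAttempts; i++) {
            last = attempt.get();
            if (last != Outcome.RETRY_LATER) {
                return last;
            }
            if (i < maxAttempts - 1) { // no sleep after the final attempt
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt
                    return last;
                }
                backoffMs *= 2;
            }
        }
        return last;
    }
}
```

Bounding the number of attempts matters because, as the comment notes, the deletion may be stuck for as long as ZK is inaccessible.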
[jira] [Commented] (KAFKA-6098) Delete and Re-create topic operation could result in race condition
[ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16770240#comment-16770240 ]

Peter M Elias commented on KAFKA-6098:
---

I've reproduced this for 1.0.0 and also want to note, for detail, that a state
can be reached in which the configuration of a "recreated" topic is present in
Zookeeper but IS NOT returned by querying the brokers' metadata, because the
broker has not successfully initialized the log directories.

I reproduced this by creating 3 topics using a function that loops until it
receives TopicExistsException for each topic, followed by a function that
deletes the 3 topics until it receives UnknownTopicOrPartitionException for
each of them, followed by calling the first function again.

The typical result is that 2 out of 3 topics are recreated, but the 3rd topic
is reported as existing via a TopicExistsException response AND has the
expected entries in Zookeeper... however, its log directories are never
initialized. This can be confirmed both by inspecting the file system and by
the LACK of the log output here:
[https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/log/Log.scala#L208-L220]

Presumably, something is blocking one of those calls from ever completing, and
hence we are left with a topic that "exists" but has no initialized segments
and therefore is not advertised by the brokers in response to a Metadata
request... yet it cannot be created because it is present in Zookeeper.
[jira] [Commented] (KAFKA-6098) Delete and Re-create topic operation could result in race condition
[ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522943#comment-16522943 ]

Dhruvil Shah commented on KAFKA-6098:
---

I don't think we can provide any formal guarantees for these APIs if topics
are being created and deleted concurrently. From the discussion so far, it
looks like we want to define the semantics for a single-threaded application
that deletes, lists, and creates topics. Is this correct?

One way to fix this problem could be to have deleteTopics return success only
after the topic has been completely deleted (i.e. the topic znode has been
deleted). listTopics could continue returning the topic information for this
duration. Would this address the issue?

> Delete and Re-create topic operation could result in race condition
> ---
>
>                 Key: KAFKA-6098
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6098
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Dhruvil Shah
>            Priority: Major
>              Labels: reliability
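The semantics proposed in this comment can be sketched with a future that completes only when the topic znode is gone. This is a toy model under stated assumptions: the class and method names are hypothetical, and the real deleteTopics API is not implemented this way here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Toy model: a delete reports success only once the topic znode is removed. */
class DeleteCompletionModel {
    private final Set<String> topicZnodes = new HashSet<>();
    private final Map<String, CompletableFuture<Void>> pendingDeletes = new HashMap<>();

    void createTopic(String topic) { topicZnodes.add(topic); }

    /** Returns a future that stays incomplete while the deletion is in progress. */
    CompletableFuture<Void> deleteTopic(String topic) {
        if (!topicZnodes.contains(topic)) {
            return CompletableFuture.completedFuture(null);
        }
        return pendingDeletes.computeIfAbsent(topic, t -> new CompletableFuture<>());
    }

    /** The topic keeps being listed for as long as its znode exists. */
    boolean listTopicsContains(String topic) { return topicZnodes.contains(topic); }

    /** Hypothetical callback for "the controller has removed the topic znode". */
    void onTopicZnodeDeleted(String topic) {
        topicZnodes.remove(topic);
        CompletableFuture<Void> pending = pendingDeletes.remove(topic);
        if (pending != null) {
            pending.complete(null);
        }
    }
}
```

Under these semantics a single-threaded caller that waits on the returned future can create the topic again right away, because no state from the old topic remains once the future completes.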
[jira] [Commented] (KAFKA-6098) Delete and Re-create topic operation could result in race condition
[ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213044#comment-16213044 ]

Ismael Juma commented on KAFKA-6098:
---

[~guozhang], that's a good point. Although deletion happens asynchronously at
the broker level, it happens reasonably fast, so we could wait until the path
is removed before returning a success. Having said that, if the call times
out, a user may still rely on `listTopics`, so it would be good to handle the
case where the delete is in progress during the create call.
[jira] [Commented] (KAFKA-6098) Delete and Re-create topic operation could result in race condition
[ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212990#comment-16212990 ]

Guozhang Wang commented on KAFKA-6098:
---

I think the reason this race condition gap gets "bigger" is that, when we park
the delayed delete topic request, we set the completion criterion as "local
metadata gets updated to remove this topic". I.e. once the update metadata
request is received, the delayed delete topic request returns immediately.
The admin client Javadoc actually documents this behavior:

{code}
 * It may take several seconds after AdminClient#deleteTopics returns
 * success for all the brokers to become aware that the topics are gone.
 * During this time, AdminClient#listTopics and AdminClient#describeTopics
 * may continue to return information about the deleted topics.
{code}

But the problem is that, even when AdminClient#listTopics no longer returns
the topic, the deletion process may still not be 100% complete. So if a create
topic request issued after the listTopics check gets "TOPIC EXISTED", we
cannot tell whether this is transient (i.e. the topic will be gone later) or
permanent, and neither finite/infinite retrying nor moving on is a perfect
response.
[jira] [Commented] (KAFKA-6098) Delete and Re-create topic operation could result in race condition
[ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212894#comment-16212894 ]

Ismael Juma commented on KAFKA-6098:
---

[~junrao] I think the delete path is fine. It's the creation path that needs
to handle this scenario, I think.

> Delete and Re-create topic operation could result in race condition
> ---
>
>                 Key: KAFKA-6098
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6098
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>             Fix For: 1.0.0
[jira] [Commented] (KAFKA-6098) Delete and Re-create topic operation could result in race condition
[ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212883#comment-16212883 ]

Jun Rao commented on KAFKA-6098:

The reason for deleting the topic path first and then the topic deletion path in ZK is probably that if the controller fails in the middle, the new controller can complete the deletion of the topic.

> Delete and Re-create topic operation could result in race condition
>
> Key: KAFKA-6098
> URL: https://issues.apache.org/jira/browse/KAFKA-6098
> Project: Kafka
> Issue Type: Bug
> Reporter: Guozhang Wang
> Fix For: 1.0.0
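Jun Rao's point about the ordering of the two ZK deletes can be illustrated with another toy model. Again this is a sketch under assumptions, not Kafka code: the class and method names are hypothetical, and the two sets merely stand in for the topic zk paths and the topic-deletion zk paths. It compares what a newly elected controller can recover after a crash between the two deletes, for each ordering.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the deletion ordering. Hypothetical names; not Kafka code. */
final class DeletionOrderSketch {
    // Assumption: stand-ins for the topic znodes and the pending-deletion marker znodes.
    final Set<String> topicPaths = new HashSet<String>();
    final Set<String> deletionMarkers = new HashSet<String>();

    void requestDeletion(String topic) {
        deletionMarkers.add(topic);
    }

    /**
     * Runs the two ZK deletes in the chosen order. When crashInMiddle is true,
     * the controller "dies" after the first delete and the second never runs.
     */
    void runDeletion(String topic, boolean topicPathFirst, boolean crashInMiddle) {
        if (topicPathFirst) {
            topicPaths.remove(topic);
            if (crashInMiddle) return;
            deletionMarkers.remove(topic);
        } else {
            deletionMarkers.remove(topic);
            if (crashInMiddle) return;
            topicPaths.remove(topic);
        }
    }

    /** A newly elected controller resumes every deletion that still has a marker. */
    void newControllerStartup() {
        for (String topic : new HashSet<String>(deletionMarkers)) {
            runDeletion(topic, true, false);
        }
    }

    /** Returns true if the topic is fully gone after a mid-deletion crash and failover. */
    static boolean deletionCompletesAfterCrash(boolean topicPathFirst) {
        DeletionOrderSketch zk = new DeletionOrderSketch();
        zk.topicPaths.add("orders");
        zk.requestDeletion("orders");
        zk.runDeletion("orders", topicPathFirst, true); // old controller dies half-way
        zk.newControllerStartup();                        // failover
        return !zk.topicPaths.contains("orders") && !zk.deletionMarkers.contains("orders");
    }

    public static void main(String[] args) {
        System.out.println("topic path first: " + deletionCompletesAfterCrash(true));
        System.out.println("marker first:     " + deletionCompletesAfterCrash(false));
    }
}
```

In this model, deleting the topic path first leaves the marker behind after a crash, so the new controller finishes the job. Deleting the marker first leaves an orphaned topic path that nothing cleans up.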
[jira] [Commented] (KAFKA-6098) Delete and Re-create topic operation could result in race condition
[ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212817#comment-16212817 ]

Ismael Juma commented on KAFKA-6098:

Nice catch.
[jira] [Commented] (KAFKA-6098) Delete and Re-create topic operation could result in race condition
[ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212800#comment-16212800 ]

Guozhang Wang commented on KAFKA-6098:

One example of such a race condition is exposed here: https://issues.apache.org/jira/browse/KAFKA-5140