[ 
https://issues.apache.org/jira/browse/KAFKA-13538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459974#comment-17459974
 ] 

joecqupt edited comment on KAFKA-13538 at 12/16/21, 3:43 AM:
-------------------------------------------------------------

This seems to be caused by the KafkaAdminClient retry mechanism. You can set the 
"retries" parameter to 0 to avoid this problem; you will get a TimeoutException instead.
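
A minimal sketch of the setting suggested above, assuming the admin client is configured through a `java.util.Properties` object. The helper class name and the broker address are hypothetical; `bootstrap.servers` and `retries` are the actual admin client config keys. The block uses only the JDK, so it runs without the Kafka client library.

```java
import java.util.Properties;

// Hypothetical helper: only the two config keys are real Kafka admin client settings.
final class NoRetryAdminConfig {
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // "retries" = 0: a request whose response is lost is not resent automatically.
        props.setProperty("retries", "0");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder address, not taken from the report.
        Properties props = build("localhost:9092");
        System.out.println("retries = " + props.getProperty("retries"));
    }
}
```

With the Kafka clients library on the classpath, these properties could be passed to `Admin.create(props)`. According to the comment above, a lost response would then surface as a TimeoutException rather than being followed by a second create request.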


was (Author: joecqupt):
This seems to be caused by the KafkaAdminClient retry mechanism. You can unset the 
"retries" parameter to avoid this problem; you will get a TimeoutException instead.

> Unexpected TopicExistsException related to Admin#createTopics after broker 
> crash
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-13538
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13538
>             Project: Kafka
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 2.8.0
>            Reporter: Haoze Wu
>            Priority: Major
>         Attachments: Screenshot from 2021-12-12 21-16-56.png, Screenshot from 
> 2021-12-12 21-17-04.png
>
>
> We were using the official Kafka Java API to create a topic in a Kafka broker 
> cluster (3 brokers):
> {code:java}
> CreateTopicsResult result = admin.createTopics(...);
> ... = result.all().get(); {code}
> The topic we create always has replication factor = 2 and 2 partitions. If 
> one of the brokers crashes and the client tries to create a topic through 
> exactly that broker, we usually observe a delay of a few seconds caused by 
> the disconnection, after which the client automatically connects to another 
> broker and creates the topic through it. All of this happens inside the 
> client, behind `admin.createTopics(...)` and `result.all().get()`.
> However, we sometimes got `TopicExistsException` from 
> `result.all().get()`, even though we had never created this topic beforehand.
> After investigating the client source code, we found that the issue 
> happens as follows:
>  # The client connects to a broker (say, broker X) and then sends the topic 
> creation request.
>  # This topic has replication factor = 2 and 2 partitions, so broker X may 
> pass this information on to another broker.
>  # Broker X suddenly crashes for some reason, and the response for the topic 
> creation request has not been sent back to the client.
>  # The client eventually learns that broker X has crashed, but never gets the 
> response for the topic creation request. The client therefore assumes the 
> request failed, connects to another broker (say, broker Y), 
> and sends the topic creation request again.
>  # The first topic creation request (with replication factor = 2 and 2 
> partitions) had been partially executed before broker X crashed, so broker Y may 
> already have done some of the work requested by broker X. For example, broker Y 
> may hold some metadata about this topic. Therefore, when broker Y runs its sanity 
> checks against the metadata, it finds that this topic exists and directly returns 
> `TopicExistsException` as the response.
>  # The client receives `TopicExistsException` and takes it at face value, 
> so the exception is thrown back to the user through the API 
> `result.all().get()`.
> There are 2 diagrams illustrating these 6 steps:
> !Screenshot from 2021-12-12 21-16-56.png!
> !Screenshot from 2021-12-12 21-17-04.png!
> Now the core question is whether this workflow violates the semantics and design 
> of the Kafka Client API. We read the “Create Topics Response” section in 
> KIP-4 
> ([https://cwiki.apache.org/confluence/display/kafka/kip-4+-+command+line+and+centralized+administrative+operations]).
>  We found that the description in KIP-4 focuses on the batch request of topic 
> creations and how they work independently. It does not talk about how the 
> client should deal with the aforementioned buggy scenario.
> According to “common sense”, we think the client should be able to tell that 
> the metadata existing in broker Y was actually created by the client itself via the 
> crashed broker X, and should therefore not throw `TopicExistsException` to 
> the user.
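
The six steps in the quoted report, together with the "retries" suggestion in the comment, can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the Kafka client's actual retry logic: every type and name below is hypothetical, the cluster metadata is reduced to a shared set of topic names, and the first request is assumed to be recorded in full even though its response is lost.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: none of these types belong to the Kafka client library.
final class LostResponseRetryDemo {
    enum Outcome { CREATED, TOPIC_EXISTS, TIMEOUT }

    /** Cluster metadata shared by all brokers, reduced to a set of topic names. */
    static final class Cluster {
        private final Set<String> topics = new HashSet<>();

        Outcome createTopic(String name) {
            // Set.add returns false when the name is already present.
            return topics.add(name) ? Outcome.CREATED : Outcome.TOPIC_EXISTS;
        }
    }

    /** Sends one create request and resends it once if the response is lost and retries > 0. */
    static Outcome createWithRetries(Cluster cluster, String topic,
                                     int retries, boolean firstResponseLost) {
        // Steps 1-2: broker X receives the request and the metadata is recorded.
        Outcome first = cluster.createTopic(topic);
        if (!firstResponseLost) {
            return first;
        }
        // Step 3: broker X crashes before the response reaches the client.
        if (retries > 0) {
            // Steps 4-6: the client resends via broker Y, which sees the existing metadata.
            return cluster.createTopic(topic);
        }
        // With no retries left, the client can only report that it got no response.
        return Outcome.TIMEOUT;
    }

    public static void main(String[] args) {
        System.out.println(createWithRetries(new Cluster(), "demo", 1, true));  // TOPIC_EXISTS
        System.out.println(createWithRetries(new Cluster(), "demo", 0, true));  // TIMEOUT
        System.out.println(createWithRetries(new Cluster(), "demo", 1, false)); // CREATED
    }
}
```

In this model the retried request is the only path that produces `TOPIC_EXISTS` for a topic the caller never created before, which matches the behaviour described in the report. Setting retries to 0 replaces it with a timeout, as the comment suggests.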



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
