junrao commented on code in PR #19964:
URL: https://github.com/apache/kafka/pull/19964#discussion_r2162542690
##########
clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java:
##########

@@ -609,7 +607,18 @@ private void handleProduceResponse(ClientResponse response, Map<TopicPartition,
                         .collect(Collectors.toList()),
                     p.errorMessage(),
                     p.currentLeader());
-                ProducerBatch batch = batches.get(tp);
+                // Version 13 drop topic name and add support to topic id.
+                // We need to find batch based on topic id and partition index only as
+                // topic name in the response might be empty.

Review Comment:
   How about the following?

   // Version 13 drops the topic name and adds support for topic ids.
   // We need to find the batch based on topic id and partition index only, as
   // the topic name in the response will be empty.
   // For older versions, the topic id is zero and we will find the batch based on the topic name.

##########
clients/src/main/java/org/apache/kafka/common/TopicIdPartition.java:
##########

@@ -78,6 +80,21 @@ public TopicPartition topicPartition() {
         return topicPartition;
     }

+    /**
+     * Checking if TopicIdPartition meant to be the same reference to same this object but doesn't have all the data.
+     * If topic name is empty and topic id is persisted then the method will rely on topic id only
+     * otherwise the method will rely on topic name.
+     * @return true if topic has same topicId and partition index as topic names some time might be empty.
+     */
+    public boolean same(TopicIdPartition tpId) {
+        if (Utils.isBlank(tpId.topic()) && !tpId.topicId.equals(Uuid.ZERO_UUID)) {
+            return topicId.equals(tpId.topicId) &&
+                topicPartition.partition() == tpId.partition();
+        } else {
+            return topicPartition.equals(tpId.topicPartition());

Review Comment:
   Actually, what I said about topic deletion wasn't correct. `sendProduceRequest()` is called in the Sender thread, and at that point the metadata cached in the client won't change. We drain `ProducerBatch` based on the topic/partition in the metadata. The `topicIds` map is created from that same metadata. So, if a partition is included in a produce request, `topicIds` should always include that topic (assuming topic ids are supported on the server side), even though the server may have deleted that topic. So, this code is fine.

   Regarding whether to use `TopicIdPartition` in `ProducerBatch`: in rare cases, the topicId could change over time for the same topic. So we probably can't store `TopicIdPartition` as a final field when the `ProducerBatch` is created. The easiest way to do this is probably to bind the topicId when the `ProducerBatch` is drained, based on the metadata at that time. This is more or less what this PR does.
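
   A minimal sketch of the drain-time binding idea (class and method names here are illustrative, not the PR's actual code):

   ```java
   import java.util.List;
   import java.util.Map;

   import org.apache.kafka.common.TopicPartition;
   import org.apache.kafka.common.Uuid;

   class DrainTimeBindingSketch {
       // Hypothetical stand-in for ProducerBatch: the topic id is not a final
       // field set at creation, but is bound when the batch is drained, since
       // the id for a topic name can change if the topic is deleted and recreated.
       static final class Batch {
           final TopicPartition topicPartition;
           Uuid topicId = Uuid.ZERO_UUID; // bound later, at drain time

           Batch(TopicPartition topicPartition) {
               this.topicPartition = topicPartition;
           }
       }

       // topicIds is built from the same metadata snapshot used to drain the
       // batches, so every drained partition should have an entry whenever the
       // broker supports topic ids.
       static void bindTopicIds(List<Batch> drained, Map<String, Uuid> topicIds) {
           for (Batch batch : drained)
               batch.topicId = topicIds.getOrDefault(batch.topicPartition.topic(), Uuid.ZERO_UUID);
       }
   }
   ```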

##########
clients/clients-integration-tests/src/test/java/org/apache/kafka/clients/producer/ProducerSendWhileDeletionTest.java:
##########

@@ -157,6 +164,42 @@ public void testSendWithRecreatedTopic() throws Exception {
         }
     }

+    @Timeout(90)
+    @ClusterTest
+    public void testSendWhileTopicGetRecreated() {
+        int maxNumRecreatTopicAttempts = 10;
+        List<Uuid> topicIds = new CopyOnWriteArrayList<>();
+        var recreateTopicFuture = CompletableFuture.runAsync(() -> {
+            for (int i = 1; i <= maxNumRecreatTopicAttempts; i++) {
+                Uuid topicId = recreateTopic();
+                if (topicId != Uuid.ZERO_UUID) {
+                    topicIds.add(topicId);
+                }
+            }
+        });
+
+        AtomicInteger numSuccess = new AtomicInteger(0);
+        var producerFutures = IntStream.range(0, 2).mapToObj(producerIndex -> CompletableFuture.runAsync(() -> {
+            try (var producer = cluster.producer()) {
+                for (int i = 1; i <= numRecords; i++) {
+                    var resp = producer.send(new ProducerRecord<>(topic, null, ("value" + i).getBytes()),
+                        (metadata, exception) -> {
+                            if (metadata != null) {
+                                numSuccess.incrementAndGet();
+                            }
+                        }).get();
+                    assertEquals(resp.topic(), topic);
+                }
+            } catch (Exception e) {
+                // ignore
+            }
+        })).toList();
+        recreateTopicFuture.join();
+        producerFutures.forEach(CompletableFuture::join);
+        assertTrue(Math.abs(maxNumRecreatTopicAttempts - topicIds.size()) <= 5);
+        assertEquals(20, numSuccess.intValue());

Review Comment:
   20 => 2 * numRecords ?

##########
clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java:
##########

@@ -609,7 +607,18 @@ private void handleProduceResponse(ClientResponse response, Map<TopicPartition,
                         .collect(Collectors.toList()),
                     p.errorMessage(),
                     p.currentLeader());
-                ProducerBatch batch = batches.get(tp);
+                // Version 13 drop topic name and add support to topic id.
+                // We need to find batch based on topic id and partition index only as
+                // topic name in the response might be empty.
+                TopicIdPartition tpId = new TopicIdPartition(r.topicId(), p.index(), r.name());
+                ProducerBatch batch = batches.entrySet().stream()

Review Comment:
   An alternative is for `handleProduceResponse()` to take a `Map<TopicPartition, ProducerBatch>` and a `Map<Uuid, String>`. If the response has a non-zero topicId, we look up the second map to find the topic name and then use the first map to find the batch. Otherwise, we look up the first map using the topic name.
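
   A rough sketch of that two-map lookup (using `Object` in place of the package-private `ProducerBatch`; not the PR's actual code):

   ```java
   import java.util.Map;

   import org.apache.kafka.common.TopicPartition;
   import org.apache.kafka.common.Uuid;

   class BatchLookupSketch {
       // topicName may be empty for v13+ responses; topicNames is derived from
       // the same metadata that built the request, keyed by topic id.
       static Object findBatch(Uuid topicId,
                               String topicName,
                               int partitionIndex,
                               Map<TopicPartition, Object> batches,
                               Map<Uuid, String> topicNames) {
           // For v13+ responses the topic name is empty but the topic id is set,
           // so resolve the name through the id-to-name map first.
           String name = !topicId.equals(Uuid.ZERO_UUID) ? topicNames.get(topicId) : topicName;
           return batches.get(new TopicPartition(name, partitionIndex));
       }
   }
   ```

   This keeps the lookup O(1) per partition instead of scanning every batch entry with a stream filter.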

##########
clients/src/main/java/org/apache/kafka/common/TopicIdPartition.java:
##########

@@ -78,6 +80,21 @@ public TopicPartition topicPartition() {
         return topicPartition;
     }

+    /**
+     * Checking if TopicIdPartition meant to be the same reference to same this object but doesn't have all the data.

Review Comment:
   Hmm, it seems this isn't fixed?

##########
clients/clients-integration-tests/src/test/java/org/apache/kafka/clients/producer/ProducerSendWhileDeletionTest.java:
##########

@@ -157,6 +164,42 @@ public void testSendWithRecreatedTopic() throws Exception {
         }
     }

+    @Timeout(90)
+    @ClusterTest
+    public void testSendWhileTopicGetRecreated() {
+        int maxNumRecreatTopicAttempts = 10;
+        List<Uuid> topicIds = new CopyOnWriteArrayList<>();
+        var recreateTopicFuture = CompletableFuture.runAsync(() -> {
+            for (int i = 1; i <= maxNumRecreatTopicAttempts; i++) {
+                Uuid topicId = recreateTopic();
+                if (topicId != Uuid.ZERO_UUID) {
+                    topicIds.add(topicId);
+                }
+            }
+        });
+
+        AtomicInteger numSuccess = new AtomicInteger(0);
+        var producerFutures = IntStream.range(0, 2).mapToObj(producerIndex -> CompletableFuture.runAsync(() -> {
+            try (var producer = cluster.producer()) {
+                for (int i = 1; i <= numRecords; i++) {
+                    var resp = producer.send(new ProducerRecord<>(topic, null, ("value" + i).getBytes()),
+                        (metadata, exception) -> {
+                            if (metadata != null) {
+                                numSuccess.incrementAndGet();
+                            }
+                        }).get();
+                    assertEquals(resp.topic(), topic);
+                }
+            } catch (Exception e) {
+                // ignore
+            }
+        })).toList();
+        recreateTopicFuture.join();
+        producerFutures.forEach(CompletableFuture::join);
+        assertTrue(Math.abs(maxNumRecreatTopicAttempts - topicIds.size()) <= 5);

Review Comment:
   Hmm, why doesn't each topic creation create a new topic Id?

##########
clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java:
##########

@@ -609,7 +607,18 @@ private void handleProduceResponse(ClientResponse response, Map<TopicPartition,
                         .collect(Collectors.toList()),
                     p.errorMessage(),
                     p.currentLeader());
-                ProducerBatch batch = batches.get(tp);
+                // Version 13 drop topic name and add support to topic id.
+                // We need to find batch based on topic id and partition index only as
+                // topic name in the response might be empty.
+                TopicIdPartition tpId = new TopicIdPartition(r.topicId(), p.index(), r.name());
+                ProducerBatch batch = batches.entrySet().stream()
+                    .filter(entry -> entry.getKey().same(tpId))
+                    .map(Map.Entry::getValue).findFirst().orElse(null);
+
+                if (batch == null) {
+                    throw new IllegalStateException("batch created for " + tpId + " can't be found, " +
+                        "topic might be recreated after the batch creation.");

Review Comment:
   It's fine to throw IllegalStateException here. We just need to adjust the error message, since recreating a topic shouldn't cause this.
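
   One way the adjusted message could read (wording is illustrative only), given that the batches map and the response are derived from the same request:

   ```java
   import org.apache.kafka.common.TopicIdPartition;

   class BatchNotFoundSketch {
       // A miss here means the broker returned a partition that was not in the
       // request, not that the topic was recreated.
       static IllegalStateException missingBatch(TopicIdPartition tpId) {
           return new IllegalStateException("Produce response includes partition " + tpId +
               " which was not part of the request; no in-flight batch found.");
       }
   }
   ```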