darinspivey opened a new issue, #24879: URL: https://github.com/apache/pulsar/issues/24879
### Search before reporting - [x] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar. ### Read release policy - [x] I understand that [unsupported versions](https://pulsar.apache.org/contribute/release-policy/#supported-versions) don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker. ### User environment We are using partitioned topics of 6 partitions, and we have auto-topic-deletion turned on. When the GC fires to delete the partitions, as expected, they mostly fire at the same time. This appears to sometimes cause a race condition when they all attempt to update the topic's metadata. This has a cascading effect of: * The metadata update never completes, even with jitter and retries * The topic deletion eventually fails as a whole because the race conditions causes 1 or more partitions to remain * The partition(s) becomes orphaned, and the parent topic never can be deleted because GC does not appear to try again after a period of time. ## System info ``` /pulsar $ pulsar version Current version of pulsar is: 4.0.6 Git Revision 4538ef7645c45a3c8686092128fde6c5d61c762b Git Branch branch-4.0 Built by Lari Hotari <[email protected]> on Laris-MBP.lan at 2025-07-30T13:37:25+0300 /pulsar $ uname -a Linux pulsar-broker-0 6.12.46-66.121.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Sep 22 16:35:59 UTC 2025 x86_64 GNU/Linux ``` ## Relevant helm values (Chart `pulsar-4.2.0`) ``` brokerDeleteInactiveTopicsEnabled: 'true' brokerDeleteInactiveTopicsFrequencySeconds: '60' brokerDeleteInactiveTopicsMaxInactiveDurationSeconds: '60' brokerDeleteInactiveTopicsMode: delete_when_no_subscriptions brokerDeleteInactivePartitionedTopicMetadataEnabled: 'true' allowAutoTopicCreation: 'true' defaultNumPartitions: '6' allowAutoTopicCreationType: 'partitioned' ``` * Note: other settings for auto-deleting subscriptions works fine. The problem is happening when the topic attempts auto-deletion (TLDR `delete_when_no_subscriptions` is working fine). ## Relevant Logs ``` Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2 Moving to FencedForDeletion state Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2] Remove ManagedLedger Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2 (v. Optional.empty) Oct 18 06:31:02 pulsar-broker-1 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2 (v. Optional.empty) Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2] Successfully deleted managed ledger Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2] Topic deleted Oct 18 06:31:03 pulsar-broker-1 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75] Delete topic metadata failed because another partition exist. Oct 18 06:31:03 pulsar-broker-1 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2] Skip to delete partitioned topic: Another partition exists for [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75]. Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Global topic inactive for 60 seconds, closed repl producers Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Global topic inactive for 60 seconds, closed repl producers Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Global topic inactive for 60 seconds, closed repl producers Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0 Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0 Moving to FencedForDeletion state Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3 Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3 Moving to FencedForDeletion state Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4 Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4 Moving to FencedForDeletion state Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Remove ManagedLedger Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0 (v. Optional.empty) Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Remove ManagedLedger Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3 (v. Optional.empty) Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Remove ManagedLedger Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4 (v. Optional.empty) Oct 18 06:31:05 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0 (v. Optional.empty) Oct 18 06:31:05 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3 (v. Optional.empty) Oct 18 06:31:05 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4 (v. Optional.empty) Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Successfully deleted managed ledger Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Topic deleted Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Successfully deleted managed ledger Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Topic deleted Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Successfully deleted managed ledger Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Topic deleted Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783466079 ms Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783466180 ms Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75] Delete topic metadata failed because another partition exist. Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 10 ms. Mandatory stop: false. Elapsed time: 703 ms Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75] Delete topic metadata failed because another partition exist. Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783466983 ms Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 20 ms. Mandatory stop: false. Elapsed time: 902 ms Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Skip to delete partitioned topic: Another partition exists for [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75]. Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 10 ms. Mandatory stop: false. Elapsed time: 184 ms Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 19 ms. Mandatory stop: false. Elapsed time: 213 ms Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75] Delete topic metadata failed because another partition exist. Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Skip to delete partitioned topic: Another partition exists for [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75]. Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783467281 ms Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Skip to delete partitioned topic: Another partition exists for [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75]. Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Global topic inactive for 60 seconds, closed repl producers Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Global topic inactive for 60 seconds, closed repl producers Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1 Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1 Moving to FencedForDeletion state Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5 Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5 Moving to FencedForDeletion state Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Remove ManagedLedger Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1 (v. Optional.empty) Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Remove ManagedLedger Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5 (v. Optional.empty) Oct 18 06:31:14 pulsar-broker-0 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1 (v. Optional.empty) Oct 18 06:31:14 pulsar-broker-0 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5 (v. Optional.empty) Oct 18 06:31:14 pulsar-broker-0 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Successfully deleted managed ledger Oct 18 06:31:14 pulsar-broker-0 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Topic deleted Oct 18 06:31:14 pulsar-broker-0 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Successfully deleted managed ledger Oct 18 06:31:14 pulsar-broker-0 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Topic deleted Oct 18 06:31:14 pulsar-broker-0 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783474475 ms Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [delayer-47-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [http://pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local:8080/admin/v2/persistent/ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75/partitions?force=false&deleteSchema=true] Failed to perform http delete request: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [metadata-store-10-1] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Inactive topic deletion failed java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.PulsarAdminException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout at java.base/java.util.concurrent.CompletableFuture.encodeRelay(Unknown Source) ~[?:?] at java.base/java.util.concurrent.CompletableFuture.completeRelay(Unknown Source) ~[?:?] at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(Unknown Source) ~[?:?] at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?] at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?] at org.apache.pulsar.client.admin.internal.BaseResource$4.failed(BaseResource.java:237) ~[org.apache.pulsar-pulsar-client-admin-original-4.0.6.jar:4.0.6] at org.glassfish.jersey.client.JerseyInvocation$1.failed(JerseyInvocation.java:898) ~[org.glassfish.jersey.core-jersey-client-2.42.jar:?] Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [delayer-47-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [http://pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local:8080/admin/v2/persistent/ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75/partitions?force=false&deleteSchema=true] Failed to perform http delete request: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [metadata-store-10-1] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Inactive topic deletion failed java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.PulsarAdminException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout at java.base/java.util.concurrent.CompletableFuture.encodeRelay(Unknown Source) ~[?:?] at java.base/java.util.concurrent.CompletableFuture.completeRelay(Unknown Source) ~[?:?] at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(Unknown Source) ~[?:?] at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?] at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?] at org.apache.pulsar.client.admin.internal.BaseResource$4.failed(BaseResource.java:237) ~[org.apache.pulsar-pulsar-client-admin-original-4.0.6.jar:4.0.6] ``` The logs clearly show that partitions enter the deletion GC around the same time, and most partitions clean up fine (some logs of other partition's successes may have been omitted here). The errors to delete the parent metadata fail as normal until the last partition is gone. However, the metadata update failure appears to never allow that to happen. ### Issue Description ## What happened Using auto topic deletion with partitioned topics fails sporadically due to a race condition. The failures cause orphaned topic partitions and the parent topic is never deleted. ## What did you expect to happen All partitions can successfully be deleted, along with the parent metadata, even if they all are GC'd at the same time. ## Why is this a bug The logs clearly show a race condition that orphans 1 or more topic partitions. This causes the topic to never be deleted, and it will never attempt to be GC'd again. This appears to be a distributed bug, as partitions can be held across many brokers all with their own GC loops. Therefore, setting execution threads to `1` isn't a solution either. The value used for jitter is also not exposed, but even if it was, I'm not sure that would be helpful. It seems reasonable that the system's retries that already exist should be able to complete the metadata update at some point, given that it tries for many minutes after the initial failure. ## Reproducing the Error Because this is a race condition, it's not possible to reproduce on demand. However, I see this problem in our production servers (which have hundreds of topics) on a daily basis with up to dozens of orphaned partitions. ### Error messages ```text Oct 18 06:31:14 pulsar-broker-0 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783474475 ms Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [delayer-47-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [http://pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local:8080/admin/v2/persistent/ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75/partitions?force=false&deleteSchema=true] Failed to perform http delete request: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [metadata-store-10-1] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Inactive topic deletion failed java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.PulsarAdminException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout ``` ### Reproducing the issue Topic auto deletion is a great feature, but without the ability to rely 100% on all partitions of a topic being deleted, we will not be able to use it because it makes the system unclean. ### Additional information _No response_ ### Are you willing to submit a PR? - [ ] I'm willing to submit a PR! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
