poorbarcode opened a new pull request, #24945: URL: https://github.com/apache/pulsar/pull/24945
### Motivation

**Issue 1: concurrent initialisation of the transaction buffer snapshot.**

Before https://github.com/apache/pulsar/pull/21406, the snapshot was taken while the persistent topic was initialising, so there was no concurrency. After #21406, the transaction buffer snapshot is triggered by publishing messages, so concurrent attempts can occur. #21406 did not handle this case, which caused the following errors:

```
2025-11-04T22:44:14,413 - WARN - [pulsar-io-28-3:PersistentTopic] - [persistent://public/txn/tp-8064cb9f-1f8f-44f1-8bf2-872cc3870973] Failed to persist msg in store: org.apache.pulsar.broker.service.BrokerServiceException$ServiceUnitNotReadyException: Transaction Buffer take first snapshot failed, the current state is: Ready
2025-11-04T22:44:14,413 - INFO - [pulsar-io-28-3:PersistentTopic] - [persistent://public/txn/tp-8064cb9f-1f8f-44f1-8bf2-872cc3870973] Un-fencing topic...
2025-11-04T22:44:14,414 - INFO - [pulsar-client-io-96-3:ClientCnx] - [localhost/127.0.0.1:57291] Broker notification of closed producer: 0, assignedBrokerUrl: null, assignedBrokerUrlTls: null
2025-11-04T22:44:14,412 - WARN - [pulsar-client-io-262-3:ClientCnx] - [id: 0xe9ef6b71, L:/127.0.0.1:57301 - R:localhost/127.0.0.1:57291] Received send error from server: PersistenceError : org.apache.bookkeeper.mledger.ManagedLedgerException: org.apache.pulsar.broker.service.BrokerServiceException$ServiceUnitNotReadyException: Transaction Buffer take first snapshot failed, the current state is: Ready
2025-11-04T22:44:14,412 - WARN - [pulsar-client-io-262-3:ClientCnx] - [id: 0xe9ef6b71, L:/127.0.0.1:57301 - R:localhost/127.0.0.1:57291] Producer with id 0 not found while handling send error
2025-11-04T22:44:14,413 - INFO - [pulsar-client-io-96-3:ProducerImpl] - [persistent://public/txn/tp-8064cb9f-1f8f-44f1-8bf2-872cc3870973] [test-0-1] Created producer on cnx [id: 0x2f0343b6, L:/127.0.0.1:57296 - R:localhost/127.0.0.1:57291]
2025-11-04T22:44:14,413 - INFO - [pulsar-client-io-96-3:ProducerImpl] - [persistent://public/txn/tp-8064cb9f-1f8f-44f1-8bf2-872cc3870973] [test-0-1] Re-Sending 1 messages to server
2025-11-04T22:44:14,413 - INFO - [broker-topic-workers-OrderedExecutor-8-0:ServerCnx] - [/127.0.0.1:57298] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/txn/tp-8064cb9f-1f8f-44f1-8bf2-872cc3870973}, client=[id: 0x960ce44f, L:/127.0.0.1:57291 - R:/127.0.0.1:57298] [SR:127.0.0.1, state:Connected], producerName=test-0-3, producerId=0}, role: null
2025-11-04T22:44:14,413 - INFO - [pulsar-io-28-3:Producer] - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://public/txn/tp-8064cb9f-1f8f-44f1-8bf2-872cc3870973}, client=[id: 0x960ce44f, L:/127.0.0.1:57291 - R:/127.0.0.1:57298] [SR:127.0.0.1, state:Connected], producerName=test-0-3, producerId=0}, assignedBrokerLookupData: Optional.empty
2025-11-04T22:44:14,413 - INFO - [pulsar-io-28-3:Producer] - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://public/txn/tp-8064cb9f-1f8f-44f1-8bf2-872cc3870973}, client=[id: 0x039d7a90, L:/127.0.0.1:57291 - R:/127.0.0.1:57296] [SR:127.0.0.1, state:Connected], producerName=test-0-1, producerId=0}, assignedBrokerLookupData: Optional.empty
2025-11-04T22:44:14,413 - WARN - [pulsar-io-28-3:PersistentTopic] - [persistent://public/txn/tp-8064cb9f-1f8f-44f1-8bf2-872cc3870973] Failed to persist msg in store: org.apache.pulsar.broker.service.BrokerServiceException$ServiceUnitNotReadyException: Transaction Buffer take first snapshot failed, the current state is: Ready
2025-11-04T22:44:14,413 - INFO - [pulsar-io-28-3:PersistentTopic] - [persistent://public/txn/tp-8064cb9f-1f8f-44f1-8bf2-872cc3870973] Un-fencing topic...
2025-11-04T22:44:14,414 - INFO - [pulsar-client-io-96-3:ClientCnx] - [localhost/127.0.0.1:57291] Broker notification of closed producer: 0, assignedBrokerUrl: null, assignedBrokerUrlTls: null
2025-11-04T22:44:14,414 - INFO - [pulsar-client-io-163-3:ProducerImpl] - [persistent://public/txn/tp-8064cb9f-1f8f-44f1-8bf2-872cc3870973] [test-0-3] Created producer on cnx [id: 0xfbd3c65b, L:/127.0.0.1:57298 - R:localhost/127.0.0.1:57291]
2025-11-04T22:44:14,414 - INFO - [pulsar-client-io-96-3:ConnectionHandler] - [persistent://public/txn/tp-8064cb9f-1f8f-44f1-8bf2-872cc3870973] [test-0-1] Closed connection [id: 0x2f0343b6, L:/127.0.0.1:57296 - R:localhost/127.0.0.1:57291] -- Will try again in 0.1 s, hostUrl: null
2025-11-04T22:44:14,414 - WARN - [pulsar-client-io-96-3:ClientCnx] - [id: 0x2f0343b6, L:/127.0.0.1:57296 - R:localhost/127.0.0.1:57291] Received send error from server: PersistenceError : org.apache.bookkeeper.mledger.ManagedLedgerException: org.apache.pulsar.broker.service.BrokerServiceException$ServiceUnitNotReadyException: Transaction Buffer take first snapshot failed, the current state is: Ready
2025-11-04T22:44:14,414 - WARN - [pulsar-client-io-96-3:ClientCnx] - [id: 0x2f0343b6, L:/127.0.0.1:57296 - R:localhost/127.0.0.1:57291] Producer with id 0 not found while handling send error
2025-11-04T22:44:14,414 - INFO - [pulsar-client-io-163-3:ClientCnx] - [localhost/127.0.0.1:57291] Broker notification of closed producer: 0, assignedBrokerUrl: null, assignedBrokerUrlTls: null
```

**Issue 2: publishing messages before the transaction buffer is recovered.**

Before https://github.com/apache/pulsar/pull/21406, a wrong variable was used when reconstructing the class: the correct variable is `snapshotAbortedTxnProcessor`, but `publishFuture` was used.
See the following:
- https://github.com/apache/pulsar/pull/21406/files#diff-ecd728301a585f256e8a649b5e65b28c166194477355b3a1eefc198d014c25d3L221
- https://github.com/apache/pulsar/pull/21406/files#diff-ecd728301a585f256e8a649b5e65b28c166194477355b3a1eefc198d014c25d3R255

This issue allows transaction buffer recovery and the taking of a transaction snapshot to execute concurrently.

### Modifications

Fix the two issues.

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: x

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
