BewareMyPower commented on PR #24947: URL: https://github.com/apache/pulsar/pull/24947#issuecomment-3490967410
IMO, we should not apply a timeout on topic loading. When the timeout happens, there must be a bug, or CPU resources are exhausted. Note that `topicLoadTimeoutSeconds` defaults to 60, which is already very long: just imagine creating a producer whose future does not complete within 60 seconds, or a reconnection after a topic unload that doesn't succeed within 60 seconds.

In a production environment, the timeout is usually caused by topic replay, because metadata store operations should not be that slow:
1. replay of the `__change_events` topic, or
2. message deduplication replay on the topic.

The 1st case is usually caused by a bug in the system-topic-based topic policies service; it is hard to say whether retrying helps. The 2nd case is usually caused by the deduplication snapshot not being taken in time, and retrying the topic replay never helps there. That's also the motivation of https://github.com/apache/pulsar/pull/23004, which introduces `testFinishTakeSnapshotWhenTopicLoading`.

I've been working on this part before but haven't had time recently. Here are some local notes I wrote earlier.

----

Specifically, for a partition of a partitioned persistent topic, the topic load process is as follows:

1. Check if the topic exists (Metadata store: `/managed-ledgers/<persistence-naming-encoding>`)
2. Get topic policies (via `TopicPoliciesService#getTopicPoliciesAsync`)
3. Check topic ownership
4. Check the pending topic loading count against the `maxConcurrentTopicLoadRequest` config (default: 5000); when the limit is reached, the following steps are executed only after the previous topic loading operations have completed
5. Check topic ownership again
6. Fetch topic properties (Metadata store: `/admin/partitioned-topics/<tenant>/<namespace>/persistent/<encoded-topic>`)
7. Check if the topic count in the namespace would exceed the `maxTopicsPerNamespace` namespace policy (default: 1000) (two metadata store operations: one is the same as in step 1, the other is `/admin/policies/<tenant>/<namespace>`)
8. Check if cluster migration is enabled (Metadata store: `/admin/clusters/<cluster>/policies` and `/admin/local-policies/<tenant>/<namespace>`)
9. Validate topic partition metadata consistency (see PIP-414), which is similar to step 6
10. Get the managed ledger config, which requires the topic policies (same as step 2) and the namespace policies (Metadata store: `/admin/policies/<tenant>/<namespace>`)
11. Create the managed ledger via `ManagedLedgerFactory#asyncOpen`
12. Create the `PersistentTopic` instance via `TopicFactory#create` and perform some initializations

Normally, the metadata store operations above are fast because many of them were already performed during the preceding topic lookup RPC and the results are cached. The main concern is topic-level policies, whose default implementation reads from the `__change_events` topic in the same namespace as the user topic.

Apart from the topic policies, each topic also has its own properties stored in the metadata store:
- Partitioned topic metadata: `/admin/partitioned-topics/<tenant>/<namespace>/<topic>`
- Managed ledger properties: `/managed-ledgers/<persistence-naming-encoding>` (this is actually at the partition level)

Topic policies are a good way to manage massive amounts of topic-level metadata without touching the metadata store too much. Each topic registers itself as a listener with `TopicPoliciesService` so it can be aware of topic policy changes, along the lines of the sketch below.
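To illustrate that listener-based propagation, here is a minimal, self-contained sketch of the pattern. The class and method names are hypothetical, not Pulsar's actual `TopicPoliciesService` API; it only shows how a single `__change_events` reader can push updates to registered topics instead of each topic hitting the metadata store itself.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch of the listener pattern described above; not Pulsar's real API.
// A single __change_events reader feeds a per-topic cache and notifies registered topics.
final class TopicPoliciesCacheSketch {
    // Latest known (serialized) policies per topic, populated by the __change_events reader.
    private final Map<String, String> policiesByTopic = new ConcurrentHashMap<>();
    // Callbacks registered by each loaded topic.
    private final Map<String, List<Consumer<String>>> listeners = new ConcurrentHashMap<>();

    // Called by a topic after it is loaded, so future policy updates are pushed to it.
    void registerListener(String topic, Consumer<String> onUpdate) {
        listeners.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(onUpdate);
    }

    // Called by the (single) __change_events reader when a policy event arrives.
    void onPolicyEvent(String topic, String serializedPolicies) {
        policiesByTopic.put(topic, serializedPolicies);
        listeners.getOrDefault(topic, List.of()).forEach(l -> l.accept(serializedPolicies));
    }
}
```

The design point is that policy changes are pushed to already-loaded topics, so a topic never needs to re-read the metadata store or replay the system topic on its own.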
However, blocking the topic loading process to wait for the topic policies is not a good idea, especially when the namespace was not previously owned by the new owner broker. When there are many namespaces, each of them will create a reader to replay the whole `<tenant>/<ns>/__change_events` topic, which can put a lot of pressure on BookKeeper, and then "too many xxx requests" errors can happen. The topic policies are used to apply the `PersistencePolicies`, `RetentionPolicies` and `OffloadPolicies` to the managed ledger config. However, these policies also exist at the namespace level.
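To make that last point concrete, here is a minimal sketch of resolving a policy value from the topic level when it is already cached, falling back to the namespace level and then to a broker default, so that opening the managed ledger would not have to wait for the `__change_events` replay. The types and names are hypothetical, not the broker's actual `ManagedLedgerConfig` code.

```java
import java.util.Optional;

// Hypothetical sketch: resolve retention for the managed ledger config from the
// topic-level policy when available, otherwise from the namespace-level policy,
// otherwise from the broker default. Types and names are illustrative only.
record RetentionSketch(int timeMinutes, long sizeMB) {}

final class PolicyFallbackSketch {
    static RetentionSketch resolveRetention(Optional<RetentionSketch> topicLevel,
                                            Optional<RetentionSketch> namespaceLevel,
                                            RetentionSketch brokerDefault) {
        // Topic-level wins when it is already cached; the namespace-level policies
        // (already fetched from /admin/policies/<tenant>/<namespace> during topic load)
        // are the fallback, then the broker default.
        return topicLevel.or(() -> namespaceLevel).orElse(brokerDefault);
    }
}
```

A fallback like this could keep topic load non-blocking, with the topic-level value applied later through the policies listener once the replay finishes.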
