GitHub user ahabel-wob edited a discussion: Namespace bundle ownership during kubernetes node upgrades
We run Apache Pulsar in Kubernetes, and in some cases it does not survive the node upgrade process. We are running Pulsar on 4 nodes, with the following pods: 4 brokers, 4 bookkeepers, 3 ZooKeepers, 3 proxies, 1 bastion, 1 autorecovery.

In the logs we see this error:

```
Failed to create consumer: consumerId=23954, Namespace bundle for topic (persistent://app/platform-prod/__transaction_buffer_snapshot-partition-4) not served by this instance:app-pulsar-prd-broker-1.app-pulsar-prd-broker.apache-pulsar.svc.cluster.local:8080. Please redo the lookup. Request is denied: namespace=app/platform-prod
```

The error occurs between the proxy and the brokers; the new lookup fails. The broker can stay stuck like this for hours (for example, 6 hours). Only a manual broker restart resolved the issue.

What causes this, and how can we configure our Pulsar cluster to be more resilient?

The AI response from the Pulsar Slack is the following:

```
The issue you're experiencing seems related to the broker's inability to correctly handle namespace bundle ownership during node upgrades or restarts. This can lead to lookup failures and prolonged unavailability of certain topics. According to the Pulsar documentation on load balancing, topics are dynamically assigned to brokers based on load conditions, and issues can arise if the ownership of namespace bundles is not correctly managed during broker restarts.

To improve resilience, consider the following:

1. Graceful Cluster Rollout: Implement a Graceful Cluster Rollout strategy. This involves using StatefulSets with revision hashes to manage broker upgrades, ensuring that traffic is gradually migrated from old to new broker pods without disrupting service.
2. Load Balancing Configuration: Ensure that your load balancing settings are optimized. You can adjust the number of namespace bundles to better distribute the load across brokers, as described in the Pulsar load balancing guide.
```

GitHub link: https://github.com/apache/pulsar/discussions/23873
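For anyone trying to reproduce this, one way to see which broker the cluster currently reports as the owner of the affected topic is to call the broker's HTTP lookup endpoint directly. The snippet below is only a diagnostic sketch, not a fix: the base URL is a placeholder for whatever broker or proxy HTTP address is reachable in your setup, the helper names `lookup_url` and `lookup_owner` are made up for illustration, and it assumes the endpoint is reachable without authentication.

```python
import json
import urllib.request
from urllib.parse import quote


def lookup_url(admin_base: str, topic: str) -> str:
    """Build the lookup REST URL for a fully qualified topic name.

    `admin_base` is an assumed, reachable HTTP address of a broker or proxy.
    """
    domain, sep, rest = topic.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"not a fully qualified topic name: {topic!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in: {topic!r}")
    # Percent-encode each path segment separately.
    path = "/".join(quote(p, safe="") for p in (domain, *parts))
    return f"{admin_base.rstrip('/')}/lookup/v2/topic/{path}"


def lookup_owner(admin_base: str, topic: str, timeout: float = 5.0) -> dict:
    """Fetch the lookup result (parsed JSON) for the given topic."""
    url = lookup_url(admin_base, topic)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `lookup_owner("http://<broker-http-address>:8080", "persistent://app/platform-prod/__transaction_buffer_snapshot-partition-4")` against each broker in turn should show whether they agree on the owner; the response is expected to contain fields such as `brokerUrl` and `httpUrl`, though the exact shape may differ between Pulsar versions.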
