rdhabalia opened a new pull request #550: ZookeeperCache children-cache invalidation on watch-event and LoadManager handling if availableBrokerCache is not updated
URL: https://github.com/apache/incubator-pulsar/pull/550

### Motivation

When a broker shuts down, it deletes its own znode from `/loadbalance/brokers`. The leader of [ModularLoadManager](https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/impl/ModularLoadManagerImpl.java) should then receive a watch event that updates the [available-broker-list-cache](https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/impl/ModularLoadManagerImpl.java#L183), so that the load manager has an up-to-date list of available brokers.

The load manager also holds a ZK data watch (`ZooKeeperDataCache`) on each broker's node. We have sometimes seen ZK trigger only one watch event per zkSession: it notifies only `ZooKeeperDataCache` and not `ZooKeeperChildrenCache`. As a result, the available-broker list is not updated and the load manager fails to update bundle-ownership data, which causes bundle downtime.
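The failure mode described above, and the parent-cache invalidation this pull request proposes, can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ParentInvalidationSketch`, `store`, `childrenCache`, `getChildren`, and `onWatchEvent` are hypothetical names, and no real ZooKeeper or Pulsar API is used.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory model; not the actual Pulsar ZooKeeperCache code. */
class ParentInvalidationSketch {
    enum EventType { NODE_CREATED, NODE_DELETED, NODE_DATA_CHANGED }

    // Stand-in for ZooKeeper itself: parent path -> names of live children.
    final Map<String, Set<String>> store = new ConcurrentHashMap<>();
    // Stand-in for the children cache: parent path -> cached child names.
    final Map<String, Set<String>> childrenCache = new ConcurrentHashMap<>();

    /** Returns the cached children, loading a copy from the store on a miss. */
    Set<String> getChildren(String parent) {
        return childrenCache.computeIfAbsent(parent,
                p -> new TreeSet<>(store.getOrDefault(p, Collections.emptySet())));
    }

    /** Returns the parent of a znode path, e.g. /a/b/c -> /a/b. */
    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    /**
     * Handles a watch event delivered for a single node. When the node was
     * created or deleted, the parent's cached child list is dropped as well,
     * so the next getChildren() call reloads it even if no separate
     * children-watch event ever arrives.
     */
    void onWatchEvent(EventType type, String path) {
        if (type == EventType.NODE_CREATED || type == EventType.NODE_DELETED) {
            childrenCache.remove(parentOf(path));
        }
    }

    public static void main(String[] args) {
        ParentInvalidationSketch zk = new ParentInvalidationSketch();
        String brokers = "/loadbalance/brokers";
        zk.store.put(brokers,
                new TreeSet<>(Arrays.asList("broker2:4080", "broker15:4080")));

        System.out.println("cached:  " + zk.getChildren(brokers));

        // broker2 shuts down and its znode disappears from the store.
        zk.store.get(brokers).remove("broker2:4080");
        System.out.println("stale:   " + zk.getChildren(brokers));

        // Only the node-level event arrives; the parent is invalidated anyway.
        zk.onWatchEvent(EventType.NODE_DELETED, brokers + "/broker2:4080");
        System.out.println("refresh: " + zk.getChildren(brokers));
    }
}
```

In this model, `getChildren` keeps returning the stale list that still contains `broker2:4080` until the `NODE_DELETED` event is handled; a `NODE_DATA_CHANGED` event leaves the cached child list untouched.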
```
### Only received ZooKeeperDataCache event which doesn't update available Broker list
22:14:03.646 [main-EventThread] INFO c.y.p.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x459d943ea7cef26 local:/ remoteserver:zk4/ lastZxid:391013064804 xid:600512 sent:600512 recv:751510 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDeleted path:/loadbalance/brokers/broker2:4080
:
22:14:08.535 [main-EventThread] INFO c.y.p.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x459d943ea7cef26 local:/ remoteserver:zk4/ lastZxid:391013066537 xid:600708 sent:600708 recv:751737 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/broker15:4080
22:14:08.538 [pool-21-thread-1] WARN c.y.p.b.l.i.ModularLoadManagerImpl - Error reading broker data from cache for broker - [broker2:4080], [KeeperErrorCode = NoNode]
:
#### Because of stale availableBrokerList : Load-manager failed to update bundle ownership here
22:14:08.538 [pool-21-thread-1] WARN c.y.p.b.l.i.ModularLoadManagerImpl - Error reading broker data from cache for broker - [broker2:4080], [KeeperErrorCode = NoNode]
22:14:21.006 [pool-21-thread-1] WARN c.y.p.b.l.i.ModularLoadManagerImpl - Error reading broker data from cache for broker - [broker2:4080], [KeeperErrorCode = NoNode]
22:14:30.097 [pool-21-thread-1] WARN c.y.p.b.l.i.ModularLoadManagerImpl - Error reading broker data from cache for broker - [broker2:4080], [KeeperErrorCode = NoNode]
:
##### All lookup fails until broker comes back again
22:14:31.127 [zk-cache-callback-2-2] WARN c.y.p.b.lookup.DestinationLookup - Failed to lookup broker for topic persistent://sla-monitor/myCluster/broker2:4080/persistent-c2023ca5-e8f4-46fe-bb9f-3bf28b050faa: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
java.util.concurrent.CompletionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
22:14:31.367 [zk-cache-callback-2-4] WARN c.y.p.b.lookup.DestinationLookup - Failed to lookup broker for topic persistent://sla-monitor/myCluster/broker2:4080/persistent-c2023ca5-e8f4-46fe-bb9f-3bf28b050faa: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
java.util.concurrent.CompletionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
```

### Modifications

- ZKCache: invalidate the parent zkCache when a node is deleted or created
- LoadManager: handle the case where availableBrokersCache has not been updated while updating the load report

### Result

This helps the LoadManager leader keep the latest bundle-ownership data, so that a broker restart does not cause downtime for bundle assignment.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
