rdhabalia opened a new pull request #550: ZookeeperCache children-cache invalidation on watch-event and LoadManager handling if availableBrokerCache is not updated
URL: https://github.com/apache/incubator-pulsar/pull/550

   
   ### Motivation
   
   When a broker shuts down, it deletes its own znode under `/loadbalance/brokers`. The leader
[ModularLoadManager](https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/impl/ModularLoadManagerImpl.java)
should then receive a watch event that updates the
[available-broker-list cache](https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/impl/ModularLoadManagerImpl.java#L183),
so that the load manager always has an up-to-date list of available brokers.
   
   The LoadManager also has a ZK data watch (`ZooKeeperDataCache`) on each broker's node.
We have sometimes observed that ZooKeeper triggers only one watch event per ZK session:
it notifies `ZooKeeperDataCache` but not `ZooKeeperChildrenCache`. As a result, the
available-broker list is not updated, the load manager fails to update bundle-ownership
data, and bundles experience downtime.
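   To make this failure mode concrete, here is a minimal, self-contained Java model of it. It is a hypothetical sketch, not Pulsar code: the class `StaleChildrenDemo`, the in-memory `znodes` map standing in for ZooKeeper, and the two maps standing in for `ZooKeeperDataCache` and `ZooKeeperChildrenCache` are all illustrative assumptions. It assumes that only the data-cache watch on the deleted broker's own path is handled, so the cached child list of `/loadbalance/brokers` is never invalidated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the stale children cache; not Pulsar's classes.
public class StaleChildrenDemo {
    static final Map<String, String> znodes = new TreeMap<>();              // stand-in for ZooKeeper
    static final Map<String, String> dataCache = new HashMap<>();           // like ZooKeeperDataCache
    static final Map<String, List<String>> childrenCache = new HashMap<>(); // like ZooKeeperChildrenCache

    // Loads and caches the direct children of a path.
    static List<String> getChildren(String parent) {
        return childrenCache.computeIfAbsent(parent, p -> {
            List<String> children = new ArrayList<>();
            for (String path : znodes.keySet()) {
                if (path.startsWith(p + "/")) {
                    String name = path.substring(p.length() + 1);
                    if (!name.contains("/")) {
                        children.add(name);
                    }
                }
            }
            return children;
        });
    }

    // Returns null when the znode does not exist (stand-in for NoNode).
    static String getData(String path) {
        return dataCache.computeIfAbsent(path, znodes::get);
    }

    // Only the data cache reacts to the event; the parent's child list is left alone.
    static void onDataWatchEvent(String path) {
        dataCache.remove(path);
    }

    public static void main(String[] args) {
        String parent = "/loadbalance/brokers";
        znodes.put(parent + "/broker2:4080", "load-report-2");
        znodes.put(parent + "/broker15:4080", "load-report-15");
        System.out.println("available brokers: " + getChildren(parent));
        getData(parent + "/broker2:4080"); // warm the data cache

        // broker2 shuts down and deletes its znode, but only the data watch fires.
        znodes.remove(parent + "/broker2:4080");
        onDataWatchEvent(parent + "/broker2:4080");

        for (String broker : getChildren(parent)) {
            String data = getData(parent + "/" + broker);
            System.out.println(broker + " -> " + (data == null ? "NoNode" : data));
        }
    }
}
```

   Running `main` leaves `broker2:4080` in the cached broker list while reading its data yields the `NoNode` stand-in, which mirrors the repeated `Error reading broker data from cache` warnings in the log below.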
   
   ```
   ### Only the ZooKeeperDataCache event was received, which doesn't update the available-broker list
   22:14:03.646 [main-EventThread] INFO  c.y.p.zookeeper.ZooKeeperDataCache   - [State:CONNECTED Timeout:30000 sessionid:0x459d943ea7cef26 local:/ remoteserver:zk4/ lastZxid:391013064804 xid:600512 sent:600512 recv:751510 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDeleted path:/loadbalance/brokers/broker2:4080
   :
   22:14:08.535 [main-EventThread] INFO  c.y.p.zookeeper.ZooKeeperDataCache   - [State:CONNECTED Timeout:30000 sessionid:0x459d943ea7cef26 local:/ remoteserver:zk4/ lastZxid:391013066537 xid:600708 sent:600708 recv:751737 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/broker15:4080
   22:14:08.538 [pool-21-thread-1] WARN  c.y.p.b.l.i.ModularLoadManagerImpl   - Error reading broker data from cache for broker - [broker2:4080], [KeeperErrorCode = NoNode]
   :
   #### Because of the stale availableBrokerList, the load manager failed to update bundle ownership here
   22:14:08.538 [pool-21-thread-1] WARN  c.y.p.b.l.i.ModularLoadManagerImpl   - Error reading broker data from cache for broker - [broker2:4080], [KeeperErrorCode = NoNode]
   22:14:21.006 [pool-21-thread-1] WARN  c.y.p.b.l.i.ModularLoadManagerImpl   - Error reading broker data from cache for broker - [broker2:4080], [KeeperErrorCode = NoNode]
   22:14:30.097 [pool-21-thread-1] WARN  c.y.p.b.l.i.ModularLoadManagerImpl   - Error reading broker data from cache for broker - [broker2:4080], [KeeperErrorCode = NoNode]
   :
   ##### All lookups fail until the broker comes back
   22:14:31.127 [zk-cache-callback-2-2] WARN  c.y.p.b.lookup.DestinationLookup  - Failed to lookup broker for topic persistent://sla-monitor/myCluster/broker2:4080/persistent-c2023ca5-e8f4-46fe-bb9f-3bf28b050faa: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
   java.util.concurrent.CompletionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
   Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
   22:14:31.367 [zk-cache-callback-2-4] WARN  c.y.p.b.lookup.DestinationLookup  - Failed to lookup broker for topic persistent://sla-monitor/myCluster/broker2:4080/persistent-c2023ca5-e8f4-46fe-bb9f-3bf28b050faa: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
   java.util.concurrent.CompletionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
   Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
   ```
   
   ### Modifications
   
   - ZKCache: invalidate the parent's children cache when a node is created or deleted
   - LoadManager: handle the case where availableBrokersCache has not been updated while updating the load report
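   A rough sketch of the two modifications, using a minimal in-memory model. Everything here is a hypothetical illustration, not the code in this PR: the class `ParentInvalidationSketch`, the `process` and `readBrokerLoadData` methods, and the `EventType` enum (which only borrows the names of ZooKeeper's `Watcher.Event.EventType` values) are assumptions. The watch handler also drops the parent's cached child list on `NodeCreated`/`NodeDeleted`, and the load-report path skips a listed broker whose data is gone and invalidates the stale list instead of failing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model; not Pulsar's ZooKeeperCache or ModularLoadManagerImpl.
public class ParentInvalidationSketch {
    // Borrows the names of ZooKeeper's Watcher.Event.EventType values.
    enum EventType { NodeCreated, NodeDeleted, NodeDataChanged, NodeChildrenChanged }

    static final Map<String, String> znodes = new TreeMap<>();              // stand-in for ZooKeeper
    static final Map<String, String> dataCache = new HashMap<>();           // like ZooKeeperDataCache
    static final Map<String, List<String>> childrenCache = new HashMap<>(); // like ZooKeeperChildrenCache

    // Loads and caches the direct children of a path.
    static List<String> getChildren(String parent) {
        return childrenCache.computeIfAbsent(parent, p -> {
            List<String> children = new ArrayList<>();
            for (String path : znodes.keySet()) {
                if (path.startsWith(p + "/") && !path.substring(p.length() + 1).contains("/")) {
                    children.add(path.substring(p.length() + 1));
                }
            }
            return children;
        });
    }

    // Returns null when the znode is missing (stand-in for NoNode).
    static String getData(String path) {
        return dataCache.computeIfAbsent(path, znodes::get);
    }

    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx > 0 ? path.substring(0, idx) : "/";
    }

    // Modification 1: on create/delete, also invalidate the parent's cached child list,
    // so a single event on the child path is enough to refresh both caches.
    static void process(EventType type, String path) {
        if (type == EventType.NodeChildrenChanged) {
            childrenCache.remove(path);
        } else {
            dataCache.remove(path);
            if (type == EventType.NodeCreated || type == EventType.NodeDeleted) {
                childrenCache.remove(parentOf(path));
            }
        }
    }

    // Modification 2: if the broker list is stale, skip brokers whose data is gone and
    // invalidate the list so the next read reloads it, instead of failing the update.
    static Map<String, String> readBrokerLoadData(String parent) {
        Map<String, String> loadData = new TreeMap<>();
        for (String broker : getChildren(parent)) {
            String data = getData(parent + "/" + broker);
            if (data == null) {
                childrenCache.remove(parent);
                continue;
            }
            loadData.put(broker, data);
        }
        return loadData;
    }

    public static void main(String[] args) {
        String parent = "/loadbalance/brokers";
        znodes.put(parent + "/broker2:4080", "load-report-2");
        znodes.put(parent + "/broker15:4080", "load-report-15");
        getChildren(parent); // warm the children cache

        // Only the NodeDeleted event on the child path is delivered.
        znodes.remove(parent + "/broker2:4080");
        process(EventType.NodeDeleted, parent + "/broker2:4080");
        System.out.println("after delete: " + getChildren(parent));

        // Simulate a list that went stale anyway, then read the brokers' load data.
        childrenCache.put(parent, new ArrayList<>(List.of("broker15:4080", "broker2:4080")));
        System.out.println("load data: " + readBrokerLoadData(parent));
        System.out.println("refreshed list: " + getChildren(parent));
    }
}
```

   Invalidating rather than eagerly reloading keeps the watch callback cheap: the next `getChildren` call reads the fresh list from the store.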
   
   ### Result
   
   This helps the LoadManager leader keep its bundle-ownership data up to date, so a
broker restart no longer causes downtime for bundle assignment.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.