ethqunzhong opened a new pull request, #4504:
URL: https://github.com/apache/bookkeeper/pull/4504

   ### Motivation
   We use `RegionAwareEnsemblePlacementPolicy` in our Pulsar cluster and 
encountered an unexpected issue in some situations, e.g. when a broker and a 
bookie restart concurrently:
   1. Bookie X joins the cluster for the first time and encounters a region 
exception, so `address2Region` records X's region as default-region.
   2. Bookie X leaves the cluster and is removed from `knownBookies`, but 
`address2Region` retains the entry for bookie X.
   3. Bookie X's rack info is updated. `onBookieRackChange` only updates 
`address2Region` for addresses present in `knownBookies`, so bookie X's region 
info is not updated.
   4. Bookie X joins the cluster again. Because `address2Region` still contains 
the previous default-region entry, `getRegion` uses the cached data directly, 
resulting in an incorrect region.
   
   This can cause traffic skew in ensemble selection, causing the bookie's disk 
to fill up quickly.
   <img width="1760" alt="image" 
src="https://github.com/user-attachments/assets/7d332bd0-83eb-48fc-bd26-5de0eccfb466">
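
   The four steps above can be replayed against a toy model of the cache. This 
is only a sketch under simplifying assumptions, not the BookKeeper code: 
bookie addresses are plain strings, the `resolver` map is a hypothetical 
stand-in for the real rack/region lookup (a missing entry models the failed 
lookup in step 1), and the class name `StaleRegionCacheDemo` is invented for 
illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the caching behaviour described above; not the real policy class. */
class StaleRegionCacheDemo {
    static final String DEFAULT_REGION = "default-region";

    final Set<String> knownBookies = new HashSet<>();
    final Map<String, String> address2Region = new HashMap<>();
    // Hypothetical stand-in for the region resolver; no entry means the lookup failed.
    final Map<String, String> resolver = new HashMap<>();

    String getRegion(String bookie) {
        // A cached entry always wins; the resolver is consulted only on a cache miss.
        return address2Region.computeIfAbsent(
                bookie, b -> resolver.getOrDefault(b, DEFAULT_REGION));
    }

    void handleBookiesThatJoined(Set<String> joined) {
        for (String bookie : joined) {
            knownBookies.add(bookie);
            getRegion(bookie);
        }
    }

    void handleBookiesThatLeft(Set<String> left) {
        // Behaviour before the fix: the address2Region entry is left in place.
        knownBookies.removeAll(left);
    }

    void onBookieRackChange(Set<String> changed) {
        for (String bookie : changed) {
            // Only bookies that are currently known get their region refreshed.
            if (knownBookies.contains(bookie)) {
                address2Region.put(bookie, resolver.getOrDefault(bookie, DEFAULT_REGION));
            }
        }
    }

    /** Replays steps 1-4 and returns the region finally reported for bookie X. */
    static String replay() {
        StaleRegionCacheDemo policy = new StaleRegionCacheDemo();
        Set<String> x = Set.of("bookie-x");
        policy.handleBookiesThatJoined(x);            // step 1: lookup fails, default cached
        policy.handleBookiesThatLeft(x);              // step 2: cache entry survives
        policy.resolver.put("bookie-x", "region-a");  // rack info is now correct
        policy.onBookieRackChange(x);                 // step 3: skipped, X is not known
        policy.handleBookiesThatJoined(x);            // step 4: stale entry is reused
        return policy.getRegion("bookie-x");
    }
}
```

   In this model `replay()` returns `default-region` even though the resolver 
already maps the bookie to `region-a`.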
   
   ### Changes
   When a bookie leaves the cluster, we should also remove that bookie's region 
information from `address2Region`, so that the correct region can be set for the 
bookie during `onBookieRackChange` and `handleBookiesThatJoined`.
   To do this, call `leftBookies.forEach(address2Region::remove)` in 
`handleBookiesThatLeft`.
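
   A minimal sketch of the eviction, assuming string addresses and a 
hypothetical class `RegionCacheEvictionSketch` rather than the real policy 
class. Only the `leftBookies.forEach(address2Region::remove)` call is taken 
from this PR; the surrounding fields and the helper method are illustrative.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: shows the region cache being evicted when bookies leave. */
class RegionCacheEvictionSketch {
    final Set<String> knownBookies = ConcurrentHashMap.newKeySet();
    final Map<String, String> address2Region = new ConcurrentHashMap<>();

    void handleBookiesThatLeft(Set<String> leftBookies) {
        knownBookies.removeAll(leftBookies);
        // Drop cached regions so a later join or rack change resolves them afresh.
        leftBookies.forEach(address2Region::remove);
    }

    /** Returns true if a departed bookie no longer has a cached region. */
    static boolean cacheClearedAfterLeave() {
        RegionCacheEvictionSketch policy = new RegionCacheEvictionSketch();
        policy.knownBookies.add("bookie-x");
        policy.address2Region.put("bookie-x", "default-region");
        policy.handleBookiesThatLeft(Set.of("bookie-x"));
        return !policy.address2Region.containsKey("bookie-x");
    }
}
```

   With the entry removed, the next `getRegion` call for a rejoining bookie 
has no cached value to fall back on and must resolve the region again.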
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
