sodonnel commented on PR #8934:
URL: https://github.com/apache/ozone/pull/8934#issuecomment-3189172581

   You have probably already thought about this, but whether or not we remove 
the node from the topology only matters in the edge case where removing the 
node reduces the number of racks in the cluster, i.e. the node that goes dead 
is the last node on its rack. A reduction in the racks available on the 
cluster can influence mis-replication, especially for EC, but even for RATIS 
if the cluster only has 2 racks.
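   
   To make the edge case concrete, here is a rough sketch. It is not Ozone's 
actual placement policy code: the class and method names are hypothetical, and 
the `min(replicas, racks)` rule is an assumption about how a rack-aware check 
might decide how many racks a container should span.

```java
public class RackCountSketch {
    // Assumed rule: a container should span as many racks as it has replicas,
    // capped by the number of racks the cluster currently reports.
    static int expectedRacks(int replicaCount, int racksInCluster) {
        return Math.min(replicaCount, racksInCluster);
    }

    // A container counts as mis-replicated when it spans fewer racks than expected.
    static boolean isMisReplicated(int racksUsed, int replicaCount, int racksInCluster) {
        return racksUsed < expectedRacks(replicaCount, racksInCluster);
    }

    public static void main(String[] args) {
        // An EC container with 5 replicas spread over 4 racks.
        // While the cluster reports only 4 racks, the placement looks fine.
        System.out.println(isMisReplicated(4, 5, 4)); // false
        // Once the fifth rack is visible again, the same placement is flagged.
        System.out.println(isMisReplicated(4, 5, 5)); // true
    }
}
```

   Under this assumed rule, the verdict for the same container flips purely 
because the reported rack count changed.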
   
   If a dead maintenance node reduces the number of racks, it affects two 
areas:
   
   1. New writes, e.g. EC writes going to 4 racks rather than 5. These writes 
would become mis-replicated once the node / rack comes back. I think we have 
to reduce the rack count in this case, or writes will fail to find enough 
racks, so the reduction in the topology is not an issue here.
   2. RM checking for mis-replication etc. Here we need RM to think the rack 
still exists so it doesn't make the container mis-replicated. I am not sure, 
but does that mean we should ask the topology for the rack list and then merge 
in any extra racks from dead maintenance nodes to give the total rack list / 
count, perhaps only if there are maintenance nodes in the system?
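   
   The merge suggested in point 2 could look roughly like the sketch below. 
It is only an illustration under stated assumptions: `EffectiveRackList` and 
`effectiveRacks` are hypothetical names, racks are represented as plain path 
strings, and how the two input collections would be obtained from the 
topology and the node manager is left out entirely.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class EffectiveRackList {
    // Union of the racks the topology currently reports and the racks of
    // dead nodes that are in maintenance (hypothetical inputs).
    static Set<String> effectiveRacks(Collection<String> topologyRacks,
                                      Collection<String> deadMaintenanceNodeRacks) {
        Set<String> racks = new TreeSet<>(topologyRacks);
        // A set ignores duplicates, so a rack that still has live nodes is
        // counted once even if it also hosts a dead maintenance node.
        racks.addAll(deadMaintenanceNodeRacks);
        return racks;
    }

    public static void main(String[] args) {
        // The topology has dropped /rack5 because its last node went dead
        // while in maintenance.
        List<String> topology = List.of("/rack1", "/rack2", "/rack3", "/rack4");
        List<String> deadMaintenance = List.of("/rack5", "/rack2");
        Set<String> racks = effectiveRacks(topology, deadMaintenance);
        System.out.println(racks.size()); // 5
    }
}
```

   When there are no dead maintenance nodes the second collection is empty 
and the result is just the topology's rack list, which matches the "only if 
there are maintenance nodes in the system" idea without a separate branch.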
   
   



