sodonnel commented on PR #8934: URL: https://github.com/apache/ozone/pull/8934#issuecomment-3189172581
You have probably already thought about this, but whether or not the node is removed from the topology only matters in the edge case where removing the node causes the number of racks in the cluster to decrease, i.e. the node that goes dead is the last node on its rack. The reduction in available racks can influence mis-replication, especially for EC, but even for RATIS if the cluster only had 2 racks.

If the racks are reduced by a dead maintenance node, then it affects two areas:

1. New writes, e.g. for EC going to 4 racks rather than 5. In this case, these writes would become mis-replicated after the node / rack comes back. I think we have to reduce the racks in this case, or writes will fail to find enough racks. The reduction in the topology is not an issue here.
2. RM (Replication Manager) checking for mis-replication etc. Here we need to get it to think the rack still exists so it doesn't treat the container as mis-replicated.

I am not sure, but does that mean we should ask the topology for the rack list and then merge in any extra racks from dead maintenance nodes to give the total rack list / count, perhaps only if there are maintenance nodes in the system?
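
To make the merge idea concrete, here is a minimal sketch in Java. Everything in it is hypothetical: `RackCountSketch`, `Node`, `racksForPlacement` and `racksForReplicationCheck` are illustrative names, not existing Ozone classes or methods, and racks are modelled as plain strings. It only shows the two views of the rack list described above: placement of new writes sees the reduced topology, while the mis-replication check also counts racks that exist only through dead maintenance nodes.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names are real Ozone APIs. */
final class RackCountSketch {

  /** Hypothetical stand-in for a datanode: its rack plus two state flags. */
  static final class Node {
    final String rack;
    final boolean dead;
    final boolean inMaintenance;

    Node(String rack, boolean dead, boolean inMaintenance) {
      this.rack = rack;
      this.dead = dead;
      this.inMaintenance = inMaintenance;
    }
  }

  /** New writes: use only the racks the topology currently reports. */
  static Set<String> racksForPlacement(Set<String> topologyRacks) {
    return new LinkedHashSet<>(topologyRacks);
  }

  /**
   * Mis-replication check: start from the topology racks and merge in
   * the rack of every node that is both dead and in maintenance.
   * With no such nodes the result equals the topology rack set.
   */
  static Set<String> racksForReplicationCheck(
      Set<String> topologyRacks, List<Node> nodes) {
    Set<String> racks = new LinkedHashSet<>(topologyRacks);
    for (Node node : nodes) {
      if (node.dead && node.inMaintenance) {
        racks.add(node.rack);
      }
    }
    return racks;
  }
}
```

With four racks left in the topology and one dead maintenance node on a fifth rack, `racksForPlacement` would return 4 racks and `racksForReplicationCheck` would return 5. A dead node that is not in maintenance is deliberately ignored, since its rack really is gone from the point of view of replication. Because the loop does nothing when there are no dead maintenance nodes, the "only if there are maintenance nodes in the system" condition falls out naturally, although a real implementation might still want to skip the node scan in that case.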
