Wei-Chiu Chuang created HDDS-15594:
--------------------------------------

             Summary: Rack Aware and Rack Scatter Policies to consider rack density/capacity
                 Key: HDDS-15594
                 URL: https://issues.apache.org/jira/browse/HDDS-15594
             Project: Apache Ozone
          Issue Type: Task
            Reporter: Wei-Chiu Chuang


h1. Problem Statement
The current Apache Ozone container placement policies 
(`SCMContainerPlacementRackAware` and `SCMContainerPlacementRackScatter`) 
select racks, and nodes within those racks, without considering the aggregate 
storage capacity or density of each rack.

*(Note: Node-level capacity awareness is addressed separately; this proposal 
focuses strictly on rack-level capacity imbalances).*

In a heterogeneous deployment (e.g., where Rack A has an aggregate capacity of 
1PB across its datanodes, and Rack B has an aggregate capacity of 10PB):
1. **Uniform Rack Selection:** The placement policies select racks uniformly at 
random or in a round-robin/scatter manner to satisfy rack-level fault tolerance 
(e.g., 2 racks for 3-replica Ratis pipelines, or maximizing unique racks for 
Erasure Coding).
2. **Aggressive Depletion of Low-Capacity Racks:** Because the placement policy 
treats all racks as having equal capacity, the 1PB rack receives roughly the 
same number of container allocations as the 10PB rack. Consequently, the 1PB 
rack will reach capacity roughly **10 times faster**.
3. **Loss of Rack-Level Fault Tolerance:** Once the 1PB rack is full, SCM can 
no longer allocate new containers that span that rack. This forces SCM to 
either fail new container allocations or fall back to placing multiple replicas 
on the same rack, violating the rack-safety policy.
4. **Sub-optimal Rebalancing:** Even if individual datanode capacity policies 
(or the `ContainerBalancer`) optimize node-level usage, the placement path 
lacks the global rack-level awareness needed to prevent entire low-capacity 
racks from filling up prematurely.
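The 10x figure in item 2 follows from simple arithmetic. The sketch below uses hypothetical names and figures (`RackDepletionEstimate`, capacities in TB, a constant cluster-wide ingest rate) and computes days-to-full per rack for a given split of write traffic. It assumes a two-rack cluster with 3-replica containers, where each container places one replica on one rack and two on the other; if the rack receiving two is chosen uniformly, each rack receives half of the replica bytes on average.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Back-of-the-envelope depletion model. All inputs are hypothetical:
 * capacities in TB, cluster-wide replica ingest in TB/day, and a fixed
 * fraction ("share") of that ingest landing on each rack.
 */
final class RackDepletionEstimate {

  private RackDepletionEstimate() { }

  /** Days until each rack fills, assuming constant ingest split by {@code share}. */
  static Map<String, Double> daysToFull(Map<String, Double> capacityTb,
                                        Map<String, Double> share,
                                        double ingestTbPerDay) {
    Map<String, Double> days = new LinkedHashMap<>();  // keeps input rack order
    for (Map.Entry<String, Double> e : capacityTb.entrySet()) {
      // Write rate this rack sees under the given traffic split.
      double rackTbPerDay = ingestTbPerDay * share.get(e.getKey());
      days.put(e.getKey(), e.getValue() / rackTbPerDay);
    }
    return days;
  }
}
```

With `rackA` = 1000 TB, `rackB` = 10000 TB, a 0.5/0.5 split and 100 TB/day of replica data, `daysToFull` returns 20 days for `rackA` and 200 days for `rackB`: the smaller rack fills ten times sooner, at which point item 3 applies.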

h1. Proposed Improvement
We should introduce rack capacity and density awareness into SCM placement 
policies to weight the selection of racks themselves:

h2. Option 1: Rack-Weighted Selection
* Modify the rack selection step in `SCMContainerPlacementRackAware` and 
`SCMContainerPlacementRackScatter`.
* The probability of selecting a particular rack should be weighted by the 
aggregate capacity (or aggregate remaining capacity) of all healthy, active 
datanodes in that rack.
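A minimal sketch of Option 1, assuming Java 17 and using hypothetical names (`RackWeightedSelector`, `Node`, `rackWeights`, `chooseRack`) rather than Ozone's actual topology or datanode classes. It sums remaining capacity per rack over healthy nodes only, then picks a rack by roulette-wheel (cumulative-weight) sampling, so a rack with 10x the free space is chosen about 10x as often.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;

/**
 * Sketch of capacity-weighted rack selection. Hypothetical types, not Ozone's;
 * requires Java 17 for records and Random.nextLong(bound).
 */
final class RackWeightedSelector {

  private RackWeightedSelector() { }

  /** Minimal stand-in for a datanode (hypothetical, not an Ozone class). */
  record Node(String rack, long remainingBytes, boolean healthy) { }

  /** Aggregate remaining bytes per rack over healthy nodes, skipping excluded racks. */
  static Map<String, Long> rackWeights(List<Node> nodes, Set<String> excludedRacks) {
    Map<String, Long> weights = new TreeMap<>();  // sorted keys give a deterministic order
    for (Node n : nodes) {
      if (n.healthy() && !excludedRacks.contains(n.rack())) {
        weights.merge(n.rack(), n.remainingBytes(), Long::sum);
      }
    }
    weights.values().removeIf(w -> w <= 0);  // full racks are never candidates
    return weights;
  }

  /** Picks one rack with probability weight / totalWeight (roulette-wheel sampling). */
  static String chooseRack(Map<String, Long> weights, Random rng) {
    long total = 0;
    for (long w : weights.values()) {
      total += w;
    }
    if (total <= 0) {
      throw new IllegalStateException("no eligible rack has free capacity");
    }
    long point = rng.nextLong(total);  // uniform in [0, total)
    for (Map.Entry<String, Long> e : weights.entrySet()) {
      point -= e.getValue();
      if (point < 0) {
        return e.getKey();  // point fell inside this rack's slice of the wheel
      }
    }
    throw new AssertionError("unreachable: point < total");
  }
}
```

In the real policies, the excluded-rack set would presumably come from racks already holding replicas of the container being placed, and node selection within the chosen rack would proceed as it does now; that wiring is not shown. Weighting by remaining rather than raw capacity also steers writes toward emptier racks, so existing fill imbalances shrink over time instead of merely not growing.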

h2. Option 2: Rack-Capacity-Aware Placement Constraints
* When selecting target racks for container replicas (especially under Erasure 
Coding where spanning multiple racks is critical), the algorithm should 
optimize the distribution such that higher-density racks host a proportionally 
larger share of the replica load without violating fault tolerance bounds.
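One way to realize Option 2 for a single container group is a capped highest-averages (D'Hondt) apportionment: each replica goes to the rack with the largest `weight / (replicas already assigned + 1)`, skipping racks that have reached a per-rack limit. This is an illustration, not Ozone's algorithm; `ProportionalReplicaSpread`, `spread` and `maxPerRack` are hypothetical, with `maxPerRack` standing in for the fault-tolerance bound (for example, at most 2 replicas per rack for RS-3-2, so losing one rack loses no more chunks than the parity count).

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: capped D'Hondt split of one container's replicas across racks. */
final class ProportionalReplicaSpread {

  private ProportionalReplicaSpread() { }

  /**
   * @param rackWeights aggregate (remaining) capacity per rack, e.g. in bytes
   * @param replicas    total replicas to place, e.g. 5 for RS-3-2
   * @param maxPerRack  hypothetical fault-tolerance bound on replicas per rack
   * @return replicas assigned to each rack with positive weight
   */
  static Map<String, Integer> spread(Map<String, Long> rackWeights,
                                     int replicas, int maxPerRack) {
    Map<String, Integer> assigned = new TreeMap<>();  // sorted keys break ties deterministically
    for (Map.Entry<String, Long> e : rackWeights.entrySet()) {
      if (e.getValue() > 0) {
        assigned.put(e.getKey(), 0);
      }
    }
    if ((long) assigned.size() * maxPerRack < replicas) {
      throw new IllegalArgumentException("cannot place " + replicas
          + " replicas with at most " + maxPerRack + " per rack");
    }
    for (int i = 0; i < replicas; i++) {
      String best = null;
      double bestQuotient = -1;
      for (Map.Entry<String, Integer> e : assigned.entrySet()) {
        int count = e.getValue();
        if (count >= maxPerRack) {
          continue;  // rack already at its fault-tolerance bound
        }
        // D'Hondt quotient: weight / (replicas already given to this rack + 1).
        double q = (double) rackWeights.get(e.getKey()) / (count + 1);
        if (q > bestQuotient) {
          best = e.getKey();
          bestQuotient = q;
        }
      }
      assigned.merge(best, 1, Integer::sum);  // feasibility check guarantees best != null
    }
    return assigned;
  }
}
```

For racks weighted A=1, B=10, C=10 with 5 replicas and `maxPerRack` = 2, `spread` returns A=1, B=2, C=2. Without the cap, rack A would receive none and rack B three, exceeding what RS-3-2 can tolerate on a single rack failure. With fixed weights the split is identical for every container, so in practice the weights would need refreshing (e.g., from remaining capacity) or the tie-break randomized.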

h1. Benefits
* Prevents low-capacity racks from filling up prematurely in heterogeneous rack 
environments.
* Proactively preserves rack-level fault tolerance for the entire cluster by 
spreading the write load proportionally to each rack's storage footprint.
* Reduces the I/O and network overhead of reactive, post-write rebalancing by 
the `ContainerBalancer`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
