siddhantsangwan commented on PR #4415:
URL: https://github.com/apache/ozone/pull/4415#issuecomment-1475724957

   DU is run in a DN every 60 minutes by default 
(`hdds.datanode.du.refresh.period`). That means SCM gets updated information on 
a DN's free space every 60 minutes. This information is used by Container 
Balancer to compare DNs based on their space utilisation (free space divided by 
total space). Container Balancer is stateless between iterations - all the 
space utilisation information is recalculated every iteration. So, it's good to 
have a default Container Balancer iteration interval that's greater than the DU 
interval. This prevents the balancer from making moves based on the same stale 
information that was used in the previous iteration.
   
   If we increase `moveTimeout` to 90 minutes, then at the latest we expect 
moves to complete close to the 90th minute. In the worst case, DU will run 
before moves have completed. This means if our Container Balancer iteration 
interval is close to (and greater than) 90 minutes, it'll start a new iteration 
that does not account for moves made by the previous iteration because DU 
hasn't calculated the latest space yet. That's why I think the iteration 
interval should be greater than 90 + 60 = 150 minutes. 160 minutes seems like a 
good default to me. If anyone wants to make it more aggressive, they can enable 
`trigger.du.before.move.enable` and reduce the iteration interval.
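
   To make the arithmetic concrete, here is a minimal Java sketch of the 
worst-case reasoning. It is only an illustration, not code from Ozone: the 
class and method names are hypothetical, and it assumes the worst case is a 
move finishing right at `moveTimeout` followed by one full DU refresh period 
before the DN reports the new space.

```java
import java.time.Duration;

// Hypothetical helper, not part of Ozone: illustrates the interval reasoning only.
public class BalancerIntervalSketch {

  // Assumed worst case: a move completes right at the move timeout, and a full
  // DU refresh period then passes before the DN's updated space is reported.
  static Duration worstCaseStaleness(Duration moveTimeout, Duration duRefreshPeriod) {
    return moveTimeout.plus(duRefreshPeriod);
  }

  // An iteration interval is treated as safe only if it is strictly greater
  // than the worst-case staleness window.
  static boolean isIntervalSafe(Duration iterationInterval,
                                Duration moveTimeout,
                                Duration duRefreshPeriod) {
    return iterationInterval.compareTo(
        worstCaseStaleness(moveTimeout, duRefreshPeriod)) > 0;
  }

  public static void main(String[] args) {
    Duration duRefreshPeriod = Duration.ofMinutes(60); // default DU period from the comment
    Duration moveTimeout = Duration.ofMinutes(90);     // proposed moveTimeout

    // Lower bound the iteration interval has to exceed, in minutes.
    System.out.println(worstCaseStaleness(moveTimeout, duRefreshPeriod).toMinutes());

    // The proposed 160 minute default clears the bound; 100 minutes does not.
    System.out.println(isIntervalSafe(Duration.ofMinutes(160), moveTimeout, duRefreshPeriod));
    System.out.println(isIntervalSafe(Duration.ofMinutes(100), moveTimeout, duRefreshPeriod));
  }
}
```

   With the values above, the bound works out to 150 minutes, so 160 passes 
the check while an interval just above 90 minutes does not.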


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

