Hi guys,

I want to implement a fairer heuristic for historical rebalance. Currently, the cluster decides whether to use historical rebalance based only on the partition size. This threshold is better known by its property name, IGNITE_PDS_WAL_REBALANCE_THRESHOLD. It prevents historical rebalance when a partition is too small, but historical rebalance can still be chosen even when the WAL contains more updates than the partition has rows. There is a ticket to implement a fairer heuristic [1].
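
To make the problem concrete, here is a minimal sketch of a size-only rule like the one described above. The class and method names are made up for illustration, the strict comparison is an assumption, and this is not the actual Ignite code.

```java
// Hypothetical illustration of a size-only decision; not taken from Ignite sources.
final class SizeOnlyRule {
    private SizeOnlyRule() {
    }

    /**
     * Historical rebalance is considered only when the partition holds more
     * rows than the threshold (the value of IGNITE_PDS_WAL_REBALANCE_THRESHOLD).
     * Note that the number of WAL updates to replay is not an input at all.
     */
    static boolean historicalAllowed(long partitionRows, long threshold) {
        return partitionRows > threshold;
    }
}
```

With a threshold of 500, a partition of 1000 rows passes this check no matter whether the WAL holds 10 or 10000 updates for it, which is exactly the case the rule cannot distinguish.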
My idea is to estimate the amount of data that will be transferred over the network. In other words, if recovering a partition on another node requires replaying a part of the WAL that contains N updates, and the partition holds M rows in total, historical rebalance should be chosen only when N < M (the WAL history must be available as well).

This approach is easy to implement, because the coordinator node already has the partition sizes and the counter intervals. But even in this case the cluster may have to scan a very long WAL history to find only a few updates. I think this can be worked around if the historical rebalance iterator does not process checkpoints that contain no updates for the particular cache. A checkpoint can be skipped if the counters for the cache (maybe even for specific partitions) did not change between it and the next one. There is a ticket for improving the historical rebalance iterator [2].

I would like to hear the community's view on the thoughts above. Maybe someone has another opinion?

[1]: https://issues.apache.org/jira/browse/IGNITE-13253
[2]: https://issues.apache.org/jira/browse/IGNITE-13254

--
Vladislav Pyatkov
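
P.S. A rough sketch of the two ideas above, only to illustrate the intent. All names are hypothetical and nothing here is existing Ignite API. It assumes N can be estimated as the difference between two update counters, and that one partition counter value is known per checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed heuristic; not existing Ignite code.
final class HistoricalRebalanceEstimate {
    private HistoricalRebalanceEstimate() {
    }

    /** N: updates to replay from the WAL, estimated from the counter interval (assumption). */
    static long updatesToReplay(long fromCounter, long toCounter) {
        return Math.max(0L, toCounter - fromCounter);
    }

    /** Choose historical rebalance only if WAL history exists and N < M. */
    static boolean preferHistorical(long fromCounter, long toCounter,
        long partitionRows, boolean walHistoryAvailable) {
        return walHistoryAvailable
            && updatesToReplay(fromCounter, toCounter) < partitionRows;
    }

    /**
     * countersAtCheckpoints[i] is the partition counter recorded at checkpoint i.
     * A checkpoint is skipped when the counter did not change between it and
     * the next one. The last checkpoint has no successor, so it is always kept.
     */
    static List<Integer> checkpointsToScan(long[] countersAtCheckpoints) {
        List<Integer> res = new ArrayList<>();

        for (int i = 0; i < countersAtCheckpoints.length; i++) {
            boolean last = i == countersAtCheckpoints.length - 1;

            if (last || countersAtCheckpoints[i] != countersAtCheckpoints[i + 1])
                res.add(i);
        }

        return res;
    }
}
```

For example, with counters 100 and 150 and a partition of 1000 rows, N = 50 < M = 1000, so historical rebalance is preferred. With counters 0 and 5000 for the same partition, a full rebalance is cheaper.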