Hi guys,

I want to implement a more honest heuristic for historical rebalance.
Currently, a cluster decides whether to use historical rebalance based only
on the partition size. This threshold is better known by its property name,
IGNITE_PDS_WAL_REBALANCE_THRESHOLD.
It prevents historical rebalance when a partition is too small, but
historical rebalance can still be chosen when the WAL contains more updates
than the partition has rows.
There is a ticket to implement a fairer heuristic [1].

My idea is to estimate the amount of data that will be transferred over the
network. In other words, if recovering a partition on another node requires
rebalancing a part of the WAL that contains N updates, and the partition
should contain M rows in total, then historical rebalance should be chosen
only when N < M (the WAL history must be available as well).
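
To make the N < M rule concrete, here is a minimal sketch. It is only an
illustration, not existing Ignite code: the class, method and parameter
names are hypothetical, and it assumes that N can be approximated by the
difference between the partition update counters of the supplier and the
demander.

```java
/** Hypothetical helper, not part of the Ignite API. */
final class HistoricalRebalanceHeuristic {
    /**
     * @param demanderCntr Update counter of the partition on the lagging node (assumed input).
     * @param supplierCntr Update counter of the partition on the up-to-date node (assumed input).
     * @param partRows M: number of rows the partition should contain in total.
     * @param walHistoryAvailable Whether the WAL still covers the whole counter interval.
     * @return {@code true} if historical rebalance is expected to transfer less data than full rebalance.
     */
    static boolean preferHistorical(long demanderCntr, long supplierCntr, long partRows,
        boolean walHistoryAvailable) {
        // Without the required WAL history only full rebalance is possible.
        if (!walHistoryAvailable)
            return false;

        // N: estimated number of updates that would be replayed from the WAL.
        long updates = supplierCntr - demanderCntr;

        // A negative interval means the inputs are inconsistent, so fall back to full rebalance.
        if (updates < 0)
            return false;

        // Historical rebalance wins only when it moves fewer entries than the partition holds.
        return updates < partRows;
    }
}
```

For example, with 50 missed updates and a partition of 1000 rows the sketch
prefers historical rebalance, while with 5000 missed updates it prefers
full rebalance.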

This approach is easy to implement, because the coordinator node has the
partition sizes and the counter intervals. But in this case the cluster can
still end up scanning a very long WAL history that contains only a few
relevant updates. I think this can be worked around if the historical
rebalance iterator does not process checkpoints that contain no updates for
the particular cache. A checkpoint can be skipped if the counters for the
cache (maybe even for specific partitions) did not change between it and
the next one.
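
The checkpoint skipping idea could look roughly like the sketch below. It
is a simplified model and not the real iterator: the Checkpoint class, its
fields and the method name are hypothetical, and it assumes each checkpoint
records the update counter of every partition. The last checkpoint is
always kept, because there is no next one to compare it with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the checkpoint history, not part of the Ignite API. */
final class CheckpointFilter {
    /** Checkpoint marker with the partition update counters recorded at that checkpoint. */
    static final class Checkpoint {
        final long id;
        final Map<Integer, Long> partCntrs;

        Checkpoint(long id, Map<Integer, Long> partCntrs) {
            this.id = id;
            this.partCntrs = partCntrs;
        }
    }

    /** Returns ids of the checkpoints whose WAL segment may contain updates for the partition. */
    static List<Long> checkpointsToScan(List<Checkpoint> history, int partId) {
        List<Long> res = new ArrayList<>();

        for (int i = 0; i < history.size(); i++) {
            Checkpoint cur = history.get(i);

            // Nothing to compare the last checkpoint with, so it is always scanned.
            if (i == history.size() - 1) {
                res.add(cur.id);

                break;
            }

            long curCntr = cur.partCntrs.getOrDefault(partId, 0L);
            long nextCntr = history.get(i + 1).partCntrs.getOrDefault(partId, 0L);

            // An unchanged counter means no updates for this partition between the two checkpoints.
            if (curCntr != nextCntr)
                res.add(cur.id);
        }

        return res;
    }
}
```

The same comparison could be done per cache instead of per partition if
only cache-level counters are available.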

Ticket for improving the historical rebalance iterator: [2]

I would like to hear the community's view on the ideas above.
Maybe someone has a different opinion?

[1]: https://issues.apache.org/jira/browse/IGNITE-13253
[2]: https://issues.apache.org/jira/browse/IGNITE-13254

-- 
Vladislav Pyatkov
