> > I think we can modify the heuristic so that:
> > 1) Partitions are excluded by threshold (IGNITE_PDS_WAL_REBALANCE_THRESHOLD, reduced to 500).
> > 2) Only those partitions are selected for historical rebalance where the
> > difference between counters is less than the partition size.
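A minimal sketch of how these two rules could be combined, assuming a hypothetical helper class (the names below are illustrative only and do not reflect the actual Ignite internals):

    // Illustrative sketch of the two-rule heuristic quoted above. All names
    // are hypothetical and do not reflect the actual Ignite internals.
    final class HistoricalRebalanceHeuristic {
        /** Plays the role of IGNITE_PDS_WAL_REBALANCE_THRESHOLD (proposed low default: 500). */
        private final long walRebalanceThreshold;

        HistoricalRebalanceHeuristic(long walRebalanceThreshold) {
            this.walRebalanceThreshold = walRebalanceThreshold;
        }

        /**
         * @param partSize Number of rows the partition has to contain (M).
         * @param demanderCntr Update counter already applied on the demander node.
         * @param supplierCntr Update counter on the supplier (owner) node.
         * @return {@code true} if historical (WAL delta) rebalance should be
         *         preferred over full partition preloading.
         */
        boolean useHistoricalRebalance(long partSize, long demanderCntr, long supplierCntr) {
            // Rule 1: small partitions are excluded by threshold and preloaded fully.
            if (partSize <= walRebalanceThreshold)
                return false;

            // Rule 2: the counter difference (N, the number of WAL updates to replay)
            // must be less than the partition size (M), i.e. N < M.
            long counterDiff = supplierCntr - demanderCntr;

            return counterDiff >= 0 && counterDiff < partSize;
        }
    }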
Agreed, let's go this way.

On Thu, Jul 16, 2020 at 11:03 AM Vladislav Pyatkov <vldpyat...@gmail.com> wrote:

> I completely forgot about another argument in favor of using historical
> rebalance where it is possible. When the cluster decides to use a full
> rebalance, demander nodes have to clear non-empty partitions.
> This can take a long time; in some cases it may be comparable to the time
> of the rebalance itself.
> This also speaks in favor of the heuristics above.
>
> On Thu, Jul 16, 2020 at 12:09 AM Vladislav Pyatkov <vldpyat...@gmail.com>
> wrote:
>
> > Ivan,
> >
> > I agree with a combined approach: a threshold for small partitions and an
> > update count for partitions that outgrew it.
> > This helps to avoid historical rebalance for partitions that are updated
> > infrequently.
> >
> > Reading a big piece of WAL (more than 100 GB) can only happen when a client
> > configured it that way intentionally.
> > There is no doubt we can read it; otherwise the WAL space would not have
> > been configured that large.
> >
> > I don't see a connection between the iterator optimization and the issue in
> > the atomic protocol.
> > Reordering in the WAL that happens within a checkpoint where the counter did
> > not change is an extremely rare case, and the issue will not be solved for
> > the generic case here; it should be fixed within the bounds of the protocol.
> >
> > I think we can modify the heuristic so that:
> > 1) Partitions are excluded by threshold (IGNITE_PDS_WAL_REBALANCE_THRESHOLD,
> > reduced to 500).
> > 2) Only those partitions are selected for historical rebalance where the
> > difference between counters is less than the partition size.
> >
> > We can also implement the mentioned optimization for the historical iterator,
> > which may reduce the time spent reading a large WAL interval.
> >
> > On Wed, Jul 15, 2020 at 3:15 PM Ivan Rakov <ivan.glu...@gmail.com> wrote:
> >
> >> Hi Vladislav,
> >>
> >> Thanks for raising this topic.
> >> The currently present IGNITE_PDS_WAL_REBALANCE_THRESHOLD (default is 500_000)
> >> is controversial. Assuming that the default number of partitions is 1024, a
> >> cache should contain a really huge amount of data in order to make WAL delta
> >> rebalancing possible. In fact, it's currently disabled for most production
> >> cases, which makes rebalancing of persistent caches unreasonably long.
> >>
> >> I think your approach [1] makes much more sense than the current heuristic,
> >> let's move forward with the proposed solution.
> >>
> >> Though, there are some other corner cases, e.g. this one:
> >> - Configured size of the WAL archive is big (>100 GB)
> >> - The cache has small partitions (e.g. 1000 entries)
> >> - Updates are infrequent (e.g. ~100 in the whole WAL history of any node)
> >> - There is another cache with very frequent updates which allocates >99% of
> >> the WAL
> >> In such a scenario we may need to iterate over >100 GB of WAL in order to
> >> fetch <1% of the needed updates. Even though the amount of network traffic
> >> is still optimized, it would be more effective to transfer partitions with
> >> ~1000 entries fully instead of reading >100 GB of WAL.
> >>
> >> I want to highlight that your heuristic definitely makes the situation
> >> better, but due to possible corner cases we should keep the fallback lever
> >> to restrict or limit historical rebalance as before. Probably, it would be
> >> handy to keep the IGNITE_PDS_WAL_REBALANCE_THRESHOLD property with a low
> >> default value (1000, 500 or even 0) and apply your heuristic only to
> >> partitions of a bigger size.
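For illustration, Ivan's corner case can be plugged into the hypothetical sketch earlier in the thread; the concrete counter values below are assumed for the example, not taken from the discussion:

    // Corner case: partition of 1000 entries, ~100 updates in the WAL history.
    // Reuses the hypothetical HistoricalRebalanceHeuristic sketch from above.
    final class CornerCaseExample {
        public static void main(String[] args) {
            long partSize = 1_000;     // rows in the partition (M)
            long demanderCntr = 900;   // demander is ~100 updates behind
            long supplierCntr = 1_000; // counter difference N = 100

            // With a low threshold the counter rule alone (N < M) picks historical
            // rebalance, even though that means scanning a very large WAL archive.
            HistoricalRebalanceHeuristic low = new HistoricalRebalanceHeuristic(500);
            System.out.println(low.useHistoricalRebalance(partSize, demanderCntr, supplierCntr));    // true

            // Fallback lever: raising the threshold to the partition size (or above)
            // keeps such small, rarely updated partitions on full preloading.
            HistoricalRebalanceHeuristic raised = new HistoricalRebalanceHeuristic(1_000);
            System.out.println(raised.useHistoricalRebalance(partSize, demanderCntr, supplierCntr)); // false
        }
    }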
> >>
> >> Regarding case [2]: it looks like an improvement that can mitigate some
> >> corner cases (including the one that I have described). I'm ok with it as
> >> long as it takes reordering of data updates on backup nodes into account.
> >> We don't track skipped updates for atomic caches. As a result, detection of
> >> the absence of updates between two checkpoint markers with the same
> >> partition counter can be a false positive.
> >>
> >> --
> >> Best Regards,
> >> Ivan Rakov
> >>
> >> On Tue, Jul 14, 2020 at 3:03 PM Vladislav Pyatkov <vldpyat...@gmail.com>
> >> wrote:
> >>
> >> > Hi guys,
> >> >
> >> > I want to implement a more honest heuristic for historical rebalance.
> >> > Currently, a cluster makes the choice between historical rebalance or not
> >> > based only on the partition size. This threshold is better known by the
> >> > name of the property IGNITE_PDS_WAL_REBALANCE_THRESHOLD.
> >> > It may prevent a historical rebalance when a partition is too small, but
> >> > not when the WAL contains more updates than the size of the partition; in
> >> > that case historical rebalance can still be chosen.
> >> > There is a ticket for implementing a fairer heuristic [1].
> >> >
> >> > My idea for the implementation is to estimate the size of the data that
> >> > will be transferred over the network. In other words, if we need to
> >> > rebalance a part of the WAL that contains N updates in order to recover a
> >> > partition on another node that has to contain M rows in total, historical
> >> > rebalance should be chosen in the case where N < M (WAL history should be
> >> > present as well).
> >> >
> >> > This approach is easy to implement, because the coordinator node has the
> >> > sizes of the partitions and the counter intervals. But in this case the
> >> > cluster can still end up finding few updates in a very long WAL history. I
> >> > assume it is possible to work around this if the historical rebalance
> >> > iterator does not handle checkpoints that contain no updates for the
> >> > particular cache. A checkpoint can be skipped if the counters for the cache
> >> > (maybe even for specific partitions) did not change between it and the next
> >> > one.
> >> >
> >> > The ticket for improving the historical rebalance iterator is [2].
> >> >
> >> > I want to hear the community's view on the thoughts above.
> >> > Maybe someone has another opinion?
> >> >
> >> > [1]: https://issues.apache.org/jira/browse/IGNITE-13253
> >> > [2]: https://issues.apache.org/jira/browse/IGNITE-13254
> >> >
> >> > --
> >> > Vladislav Pyatkov
> >> >
> >
> > --
> > Vladislav Pyatkov
> >
>
> --
> Vladislav Pyatkov
>
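The iterator optimization tracked in [2] can be sketched roughly as follows. This is an assumption-laden illustration, not the actual Ignite WAL iterator API: it only shows the idea of skipping checkpoint intervals in which the tracked partition's counter did not change, and it carries Ivan's caveat about atomic caches as a comment.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: select only the checkpoint intervals that may contain
    // updates for a given partition, so the rest of the WAL is never scanned.
    final class CheckpointSkippingSketch {
        /** Minimal stand-in for a checkpoint marker with the partition counter recorded at checkpoint time. */
        static final class CheckpointMarker {
            final long walOffset;
            final long partCounter;

            CheckpointMarker(long walOffset, long partCounter) {
                this.walOffset = walOffset;
                this.partCounter = partCounter;
            }
        }

        /** Returns WAL offsets of the checkpoint intervals where the partition counter changed. */
        static List<Long> intervalsToScan(List<CheckpointMarker> markers) {
            List<Long> toScan = new ArrayList<>();

            for (int i = 0; i + 1 < markers.size(); i++) {
                CheckpointMarker from = markers.get(i);
                CheckpointMarker to = markers.get(i + 1);

                // Counter unchanged between two consecutive checkpoints: assume no
                // updates for this partition in that interval and skip it.
                // Caveat from the thread: for atomic caches skipped updates are not
                // tracked, so an unchanged counter can be a false positive.
                if (to.partCounter == from.partCounter)
                    continue;

                toScan.add(from.walOffset);
            }

            return toScan;
        }
    }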