>
>  I think we can modify the heuristic as follows:
> 1) Exclude partitions by a threshold (IGNITE_PDS_WAL_REBALANCE_THRESHOLD -
> reduce its default to 500).
> 2) Select a partition for historical rebalance only if the difference
> between its counters is less than the partition size.

Agreed, let's go this way.
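
For clarity, here is a rough sketch of the combined per-partition check (a
minimal illustration with made-up names, not the actual Ignite internals;
the real decision is made on the coordinator, which already has the
partition sizes and counters):

    /** Sketch of the discussed heuristic; names are illustrative only. */
    final class HistoricalRebalanceHeuristic {
        /** Assumed lowered default for IGNITE_PDS_WAL_REBALANCE_THRESHOLD. */
        private final long walRebalanceThreshold; // e.g. 500 instead of 500_000

        HistoricalRebalanceHeuristic(long walRebalanceThreshold) {
            this.walRebalanceThreshold = walRebalanceThreshold;
        }

        /**
         * @param partSize Number of rows in the partition on the supplier (M).
         * @param demanderCntr Partition update counter on the demander node.
         * @param supplierCntr Partition update counter on the supplier node.
         * @param histAvailable Whether WAL history covering the counter gap exists.
         * @return {@code true} if historical (WAL delta) rebalance should be used.
         */
        boolean useHistoricalRebalance(long partSize, long demanderCntr,
            long supplierCntr, boolean histAvailable) {
            if (!histAvailable)
                return false;

            // 1) Exclude small partitions by the threshold: sending them fully is cheap.
            if (partSize < walRebalanceThreshold)
                return false;

            // 2) Use history only when the number of updates to replay (N) is
            //    smaller than the partition size (M).
            long updatesToReplay = supplierCntr - demanderCntr; // N

            return updatesToReplay < partSize;
        }
    }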

On Thu, Jul 16, 2020 at 11:03 AM Vladislav Pyatkov <vldpyat...@gmail.com>
wrote:

> I completely forgot about another argument in favor of using historical
> rebalance where it is possible. When the cluster decides to use full
> rebalance, demander nodes have to clear their non-empty partitions first.
> This can take a long time; in some cases it is comparable to the time of
> the rebalance itself.
> This also speaks in favor of the heuristic above.
>
> On Thu, Jul 16, 2020 at 12:09 AM Vladislav Pyatkov <vldpyat...@gmail.com>
> wrote:
>
> > Ivan,
> >
> > I agree with a combined approach: a threshold for small partitions and an
> > update count for partitions that outgrow it.
> > This helps to avoid partitions that are updated infrequently.
> >
> > Reading a big piece of WAL (more than 100 GB) can only happen when a
> > client has configured it that way intentionally.
> > There is no doubt we can read it; otherwise the WAL space would not have
> > been configured to be that large.
> >
> > I don't see a connection between the iterator optimization and the issue
> > in the atomic protocol.
> > Reordering in WAL that happens within a checkpoint where the counter did
> > not change is an extremely rare case, and this issue will not be solved
> > for the generic case here; it should be fixed within the bounds of the
> > protocol.
> >
> > I think we can modify the heuristic as follows:
> > 1) Exclude partitions by a threshold (IGNITE_PDS_WAL_REBALANCE_THRESHOLD -
> > reduce its default to 500).
> > 2) Select a partition for historical rebalance only if the difference
> > between its counters is less than the partition size.
> >
> > Also, we can implement the mentioned optimization for the historical
> > iterator, which may reduce the time spent reading a large WAL interval.
> >
> > On Wed, Jul 15, 2020 at 3:15 PM Ivan Rakov <ivan.glu...@gmail.com>
> > wrote:
> >
> >> Hi Vladislav,
> >>
> >> Thanks for raising this topic.
> >> The currently present IGNITE_PDS_WAL_REBALANCE_THRESHOLD (default is
> >> 500_000) is controversial. Assuming that the default number of partitions
> >> is 1024, a cache should contain a really huge amount of data (on the
> >> order of hundreds of millions of entries) in order to make WAL delta
> >> rebalancing possible. In fact, it's currently disabled for most
> >> production cases, which makes rebalancing of persistent caches
> >> unreasonably long.
> >>
> >> I think your approach [1] makes much more sense than the current
> >> heuristic; let's move forward with the proposed solution.
> >>
> >> Though, there are some other corner cases, e.g. this one:
> >> - The configured size of the WAL archive is big (>100 GB)
> >> - The cache has small partitions (e.g. 1000 entries)
> >> - Updates are infrequent (e.g. ~100 in the whole WAL history of any node)
> >> - There is another cache with very frequent updates which allocates >99%
> >> of the WAL
> >> In such a scenario we may need to iterate over >100 GB of WAL in order to
> >> fetch the needed updates, which make up less than 1% of it. Even though
> >> the amount of network traffic is still optimized, it would be more
> >> effective to transfer partitions with ~1000 entries fully instead of
> >> reading >100 GB of WAL.
> >>
> >> I want to highlight that your heuristic definitely makes the situation
> >> better, but due to possible corner cases we should keep the fallback
> >> lever to restrict or limit historical rebalance as before. Probably, it
> >> would be handy to keep the IGNITE_PDS_WAL_REBALANCE_THRESHOLD property
> >> with a low default value (1000, 500 or even 0) and apply your heuristic
> >> only to partitions of a bigger size.
> >>
> >> Regarding case [2]: it looks like an improvement that can mitigate some
> >> corner cases (including the one that I have described). I'm ok with it as
> >> long as it takes reordering of data updates on backup nodes into account.
> >> We don't track skipped updates for atomic caches. As a result, detecting
> >> the absence of updates between two checkpoint markers with the same
> >> partition counter can yield a false positive.
> >>
> >> --
> >> Best Regards,
> >> Ivan Rakov
> >>
> >> On Tue, Jul 14, 2020 at 3:03 PM Vladislav Pyatkov <vldpyat...@gmail.com>
> >> wrote:
> >>
> >> > Hi guys,
> >> >
> >> > I want to implement a more honest heuristic for historical rebalance.
> >> > Currently, the cluster makes the choice between historical rebalance
> >> > and full rebalance based only on the partition size. This threshold is
> >> > better known by the name of the property
> >> > IGNITE_PDS_WAL_REBALANCE_THRESHOLD.
> >> > It may prevent historical rebalance when a partition is too small, but
> >> > if the WAL contains more updates than the size of the partition,
> >> > historical rebalance can still be chosen.
> >> > There is a ticket for implementing a fairer heuristic [1].
> >> >
> >> > My idea for the implementation is to estimate the size of the data
> >> > which will be transferred over the network. In other words, if we need
> >> > to rebalance a part of WAL that contains N updates in order to recover
> >> > a partition on another node which contains M rows in total, historical
> >> > rebalance should be chosen only in the case where N < M (the WAL
> >> > history should be present as well).
> >> >
> >> > This approach is easy to implement, because the coordinator node has
> >> > the partition sizes and the counter intervals. But in this case the
> >> > cluster can still end up finding only a few updates in a very long WAL
> >> > history. I assume it is possible to work around this if the historical
> >> > rebalance iterator does not handle checkpoints that contain no updates
> >> > of the particular cache. A checkpoint can be skipped if the counters
> >> > for the cache (maybe even for specific partitions) did not change
> >> > between it and the next one, as in the sketch below.
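> >> >
> >> > A minimal sketch of what I mean (the types and method names here are
> >> > made up for illustration, not the real Ignite WAL iterator API; it also
> >> > assumes java.util.List/Map imports and the helper types mentioned in
> >> > the comments):
> >> >
> >> >     // Walk checkpoint markers in order and skip an interval entirely
> >> >     // when the partition counters of the rebalanced cache did not
> >> >     // change inside it (assumed helper types: CheckpointMarker,
> >> >     // WalReplayer).
> >> >     static void replayHistory(List<CheckpointMarker> cps, int cacheId,
> >> >         WalReplayer replayer) {
> >> >         for (int i = 0; i < cps.size() - 1; i++) {
> >> >             Map<Integer, Long> cur = cps.get(i).partitionCounters(cacheId);
> >> >             Map<Integer, Long> next = cps.get(i + 1).partitionCounters(cacheId);
> >> >
> >> >             // Counters unchanged => no updates of this cache in the
> >> >             // interval, so the iterator does not need to read it.
> >> >             if (cur.equals(next))
> >> >                 continue;
> >> >
> >> >             replayer.replayBetween(cps.get(i), cps.get(i + 1), cacheId);
> >> >         }
> >> >     }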
> >> >
> >> > The ticket for improving the historical rebalance iterator is [2].
> >> >
> >> > I want to hear the community's view on the thoughts above.
> >> > Maybe someone has another opinion?
> >> >
> >> > [1]: https://issues.apache.org/jira/browse/IGNITE-13253
> >> > [2]: https://issues.apache.org/jira/browse/IGNITE-13254
> >> >
> >> > --
> >> > Vladislav Pyatkov
> >> >
> >>
> >
> >
> > --
> > Vladislav Pyatkov
> >
>
>
> --
> Vladislav Pyatkov
>
