Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

Denis Magda Tue, 17 Apr 2018 10:48:46 -0700

Thanks, Pavel!

Alexey, Ivan, could you check that there are no pitfalls in the example
and that it can be used as a template for our users?
https://issues.apache.org/jira/secure/attachment/12919452/BaselineWatcher.java


--
Denis

On Tue, Apr 17, 2018 at 10:40 AM, Pavel Kovalenko <jokse...@gmail.com> wrote:

> Denis,
>
> I've attached an example of how to manage the baseline automatically (it's
> named BaselineWatcher). It's just a concept and doesn't cover all possible
> cases, but it might be good for a start.
>
> 2018-04-13 2:14 GMT+03:00 Denis Magda <dma...@apache.org>:
>
> > Pavel, thanks for the suggestions. They would definitely work out. I
> > would document the one with the event subscription:
> > https://issues.apache.org/jira/browse/IGNITE-8241
> >
> > Could you help prepare a sample code snippet with such a listener that
> > will be added to the doc? I know that there are some caveats related to
> > the way such an event has to be processed.
> >
> > Ivan, I truly like your idea. Alex G., what are your thoughts on this?
> >
> > --
> > Denis
> >
> > On Thu, Apr 12, 2018 at 2:22 PM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> >
> > > Guys,
> > >
> > > I have also heard complaints about the absence of an option to
> > > automatically change the baseline topology. They absolutely make sense.
> > > What Pavel suggested will work as a workaround. I think that in future
> > > releases we should give users an option to enable similar behavior via
> > > the Ignite configuration.
> > > It may be called "Baseline Topology change policy". I see it as a
> > > rule-based language that allows specifying the conditions of a BLT
> > > change using several parameters: a timeout and the minimum allowed
> > > number of partition copies left (maybe this option should also be
> > > provided at the per-cache-group level). The policy can also specify
> > > conditions for including new nodes in the BLT if they are present,
> > > including node attribute filters and so on.
> > >
> > > What do you think?
> > >
> > > Best Regards,
> > > Ivan Rakov
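To make the proposed policy more concrete, here is a minimal sketch of how such a rule might be evaluated for a single offline baseline node. Everything in it is hypothetical: Ignite has no `BaselineChangePolicy` class or equivalent configuration option, and treating the two parameters as alternative triggers (the timeout has elapsed, or the number of surviving partition copies has already dropped below the allowed minimum) is only one possible reading of the idea.

```java
/**
 * Hypothetical "Baseline Topology change policy" from the proposal in this thread.
 * Nothing like this exists in Ignite; the class and its parameters are illustrative.
 */
final class BaselineChangePolicy {
    /** How long a baseline node may stay offline before it is removed. */
    private final long offlineTimeoutMillis;

    /** Minimum allowed number of live copies (primary + backups) of any partition. */
    private final int minPartitionCopies;

    BaselineChangePolicy(long offlineTimeoutMillis, int minPartitionCopies) {
        if (offlineTimeoutMillis < 0 || minPartitionCopies < 1)
            throw new IllegalArgumentException("timeout must be >= 0 and min copies >= 1");

        this.offlineTimeoutMillis = offlineTimeoutMillis;
        this.minPartitionCopies = minPartitionCopies;
    }

    /**
     * @param offlineMillis how long the baseline node has been offline
     * @param minCopiesLeft fewest live copies of any partition in the affected cache group(s)
     * @return true if the node should be dropped from the baseline now
     */
    boolean shouldRemoveFromBaseline(long offlineMillis, int minCopiesLeft) {
        // Either the node has been gone long enough, or redundancy has already
        // fallen below the allowed minimum and waiting longer only adds risk.
        return offlineMillis >= offlineTimeoutMillis || minCopiesLeft < minPartitionCopies;
    }
}
```

A per-cache-group variant, as suggested above, could simply keep one such policy instance per cache group and remove a node when any group's policy says so.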
> > >
> > >
> > > On 12.04.2018 19:41, Pavel Kovalenko wrote:
> > >
> > >> Denis,
> > >>
> > >> It's just one of the ways to implement it. We can also subscribe to
> > >> node join/fail events to properly track the downtime of a node.
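A minimal sketch of the downtime bookkeeping such an event subscription needs, assuming nodes are identified by their consistent IDs. `NodeDowntimeTracker` is a hypothetical helper, not part of Ignite, and the Ignite wiring is left out so the block compiles with the JDK alone. In a real node it would be fed from a local listener registered with `ignite.events().localListen(lsnr, EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED, EventType.EVT_NODE_JOINED)`, reading `((DiscoveryEvent)evt).eventNode().consistentId()` and `evt.timestamp()`; listing these event types in `IgniteConfiguration.setIncludeEventTypes(...)` ensures they are recorded, and the listener predicate should return `true` to stay subscribed.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Ignite): remembers when each server node
 * left or failed, keyed by consistent ID, so that its downtime can be measured.
 */
class NodeDowntimeTracker {
    /** Consistent ID -> time (ms) at which the node was reported as left/failed. */
    private final Map<Object, Long> offlineSince = new ConcurrentHashMap<>();

    /** Feed from EVT_NODE_LEFT / EVT_NODE_FAILED; keeps the earliest timestamp. */
    void onNodeLeft(Object consistentId, long timestampMillis) {
        offlineSince.putIfAbsent(consistentId, timestampMillis);
    }

    /** Feed from EVT_NODE_JOINED: the node is back, so its downtime is cleared. */
    void onNodeJoined(Object consistentId) {
        offlineSince.remove(consistentId);
    }

    /** IDs of nodes that have been offline for at least timeoutMillis at nowMillis. */
    Set<Object> expired(long nowMillis, long timeoutMillis) {
        Set<Object> res = new HashSet<>();

        for (Map.Entry<Object, Long> e : offlineSince.entrySet()) {
            if (nowMillis - e.getValue() >= timeoutMillis)
                res.add(e.getKey());
        }

        return res;
    }
}
```

Keeping the listener limited to cheap bookkeeping is a deliberate choice: event listeners are generally expected to return quickly, so the actual baseline change is better done from a separate periodic task that calls `expired(...)`.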
> > >>
> > >> 2018-04-12 19:38 GMT+03:00 Pavel Kovalenko <jokse...@gmail.com>:
> > >>
> > >>> Denis,
> > >>>
> > >>> Using our API we can implement this task as follows.
> > >>> Every minute:
> > >>> 1) Get the consistent IDs of all alive server nodes =>
> > >>>    ignite().context().discovery().aliveServerNodes() =>
> > >>>    mapToConsistentIds().
> > >>> 2) Get the current baseline topology =>
> > >>>    ignite().cluster().currentBaselineTopology()
> > >>> 3) For each node that is in the baseline but not among the alive
> > >>>    server nodes, check the timeout for this node.
> > >>> 4) If the timeout is reached, remove the node from the baseline.
> > >>> 5) If the baseline has changed, set the new baseline =>
> > >>>    ignite().cluster().setNewBaseline()
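One pass of the loop above, sketched in plain Java under the assumption that nodes are identified by their consistent IDs. `BaselineTimeoutCheck` is a hypothetical name and the class deliberately has no Ignite dependency. With the public API instead of the internal `context().discovery()` call, the inputs could come from `ignite.cluster().forServers().nodes()` mapped through `ClusterNode.consistentId()` (step 1) and from `ignite.cluster().currentBaselineTopology()`, a `Collection<BaselineNode>` (step 2). The public `IgniteCluster` method for step 5 is `setBaselineTopology(...)`, to which the caller would pass the `BaselineNode`s whose IDs are kept. Running it every minute, for example from a `ScheduledExecutorService`, is left to the caller.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical, Ignite-free version of one pass of the periodic baseline check.
 * It works on consistent IDs only; the caller gathers them from the cluster API
 * and applies the result.
 */
class BaselineTimeoutCheck {
    private final long timeoutMillis;

    /** Consistent ID -> first time (ms) the baseline node was observed offline. */
    private final Map<Object, Long> firstSeenOffline = new HashMap<>();

    BaselineTimeoutCheck(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * @param baselineIds    consistent IDs in the current baseline (step 2)
     * @param aliveServerIds consistent IDs of alive server nodes (step 1)
     * @param nowMillis      current time
     * @return the new baseline if it changed (step 5), otherwise empty
     */
    Optional<Set<Object>> check(Set<Object> baselineIds, Set<Object> aliveServerIds, long nowMillis) {
        // Forget nodes that came back or have already left the baseline.
        firstSeenOffline.keySet().removeIf(
            id -> aliveServerIds.contains(id) || !baselineIds.contains(id));

        Set<Object> newBaseline = new HashSet<>(baselineIds);

        for (Object id : baselineIds) {
            if (aliveServerIds.contains(id))
                continue;

            // Step 3: start (or keep) the downtime clock for an offline baseline node.
            long since = firstSeenOffline.computeIfAbsent(id, k -> nowMillis);

            // Step 4: drop the node once it has been offline for the whole timeout.
            if (nowMillis - since >= timeoutMillis)
                newBaseline.remove(id);
        }

        return newBaseline.size() == baselineIds.size()
            ? Optional.empty()
            : Optional.of(newBaseline);
    }
}
```

Because a node is first noticed at poll time, the measured downtime can lag the real failure by up to one polling interval; the node join/fail event subscription Pavel mentions avoids that lag. Like the steps above, the sketch only removes nodes and never adds returning ones back to the baseline.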
> > >>>
> > >>>
> > >>> 2018-04-12 2:18 GMT+03:00 Denis Magda <dma...@apache.org>:
> > >>>
> > >>>> Pavel, Val,
> > >>>>
> > >>>> So, it means that the rebalancing will be initiated only after an
> > >>>> administrator removes the failed node from the baseline topology,
> > >>>> right?
> > >>>>
> > >>>> Next, imagine that you are the IT administrator who has to automate
> > >>>> the rebalancing activation if a node has failed and has not recovered
> > >>>> within 1 minute. What would you do, and what does Ignite provide to
> > >>>> fulfill the task?
> > >>>>
> > >>>> --
> > >>>> Denis
> > >>>>
> > >>>> On Wed, Apr 11, 2018 at 1:01 PM, Pavel Kovalenko <jokse...@gmail.com> wrote:
> > >>>>
> > >>>>> Denis,
> > >>>>>
> > >>>>> In the case of an incomplete baseline topology, IgniteCache.rebalance()
> > >>>>> will do nothing, because this call doesn't trigger a partition
> > >>>>> exchange or an affinity change, so the states of the existing
> > >>>>> partitions are kept.
> > >>>>>
> > >>>>> 2018-04-11 22:27 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:
> > >>>>>
> > >>>>>> Denis,
> > >>>>>>
> > >>>>>> In my understanding, in this case you should remove the node from
> > >>>>>> the BLT, and that will trigger the rebalancing, no?
> > >>>>>>
> > >>>>>> -Val
> > >>>>>>
> > >>>>>> On Wed, Apr 11, 2018 at 12:23 PM, Denis Magda <dma...@gridgain.com> wrote:
> > >>>>>
> > >>>>>>> Igniters,
> > >>>>>>>
> > >>>>>>> As we know, the rebalancing doesn't happen if one of the nodes goes
> > >>>>>>> down, leaving the baseline topology incomplete. This complies with
> > >>>>>>> our assumption that the node should be recovered soon and there is
> > >>>>>>> no need to waste the CPU/memory/networking resources of the cluster
> > >>>>>>> shifting the data around.
> > >>>>>>>
> > >>>>>>> However, there are always edge cases. I was reasonably asked how to
> > >>>>>>> trigger the rebalancing within the baseline topology manually or on
> > >>>>>>> a timeout if:
> > >>>>>>>
> > >>>>>>>     - It's not expected that the failed node will be resurrected in
> > >>>>>>>       the near future, and
> > >>>>>>>     - It's not likely that the node will be replaced by another one.
> > >>>>>>>
> > >>>>>>> The question: if I call IgniteCache.rebalance() or configure
> > >>>>>>> CacheConfiguration.rebalanceTimeout, will the rebalancing be fired
> > >>>>>>> within the baseline topology?
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Denis
> > >>>>>>>
> > >>>>>>>
> > >>>
> > >
> >
>
