Denis,

It's just one of the ways to implement it. We can also subscribe to node join / fail events to properly track the downtime of a node.
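For illustration, here is a rough sketch of such a listener (the DowntimeTracker class and its method names are just placeholders, not an existing API; also note that discovery events typically have to be enabled via IgniteConfiguration#setIncludeEventTypes for the listener to fire):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.ignite.Ignite;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class DowntimeTracker {
    /** Consistent id -> timestamp of when the node was last seen leaving or failing. */
    private final Map<Object, Long> downSince = new ConcurrentHashMap<>();

    public void start(Ignite ignite) {
        IgnitePredicate<Event> lsnr = evt -> {
            Object consistentId = ((DiscoveryEvent)evt).eventNode().consistentId();

            if (evt.type() == EventType.EVT_NODE_JOINED)
                downSince.remove(consistentId); // The node is back, reset its downtime.
            else
                downSince.put(consistentId, System.currentTimeMillis());

            return true; // Keep listening.
        };

        ignite.events().localListen(lsnr,
            EventType.EVT_NODE_JOINED, EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED);
    }

    /** @return Milliseconds the node has been down, or 0 if it is currently alive. */
    public long downtime(Object consistentId) {
        Long since = downSince.get(consistentId);

        return since == null ? 0 : System.currentTimeMillis() - since;
    }
}

A periodic task could then consult downtime() for every baseline node and adjust the baseline once the configured threshold is exceeded.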
2018-04-12 19:38 GMT+03:00 Pavel Kovalenko <jokse...@gmail.com>:

> Denis,
>
> Using our API, we can implement this task as follows.
> Every minute, do the following:
> 1) Get the consistent ids of all alive server nodes =>
> ignite().context().discovery().aliveServerNodes() => mapToConsistentIds().
> 2) Get the current baseline topology => ignite().cluster().currentBaselineTopology()
> 3) For each node that is in the baseline but not among the alive server
> nodes, check the timeout for this node.
> 4) If the timeout is reached, remove the node from the baseline.
> 5) If the baseline has changed, set the new baseline => ignite().cluster().setNewBaseline()
>
>
> 2018-04-12 2:18 GMT+03:00 Denis Magda <dma...@apache.org>:
>
>> Pavel, Val,
>>
>> So, it means that the rebalancing will be initiated only after an
>> administrator removes the failed node from the topology, right?
>>
>> Next, imagine that you are that IT administrator who has to automate
>> triggering the rebalancing if a node fails and is not recovered within 1
>> minute. What would you do, and what does Ignite provide to fulfill the task?
>>
>> --
>> Denis
>>
>> On Wed, Apr 11, 2018 at 1:01 PM, Pavel Kovalenko <jokse...@gmail.com>
>> wrote:
>>
>> > Denis,
>> >
>> > In case of an incomplete baseline topology, IgniteCache.rebalance() will
>> > do nothing, because this event doesn't trigger a partition exchange or an
>> > affinity change, so the states of the existing partitions are kept.
>> >
>> > 2018-04-11 22:27 GMT+03:00 Valentin Kulichenko <
>> > valentin.kuliche...@gmail.com>:
>> >
>> > > Denis,
>> > >
>> > > In my understanding, in this case you should remove the node from the
>> > > BLT and that will trigger the rebalancing, no?
>> > >
>> > > -Val
>> > >
>> > > On Wed, Apr 11, 2018 at 12:23 PM, Denis Magda <dma...@gridgain.com>
>> > > wrote:
>> > >
>> > > > Igniters,
>> > > >
>> > > > As we know, rebalancing doesn't happen if one of the nodes goes down,
>> > > > thus shrinking the baseline topology. This complies with our assumption
>> > > > that the node should be recovered soon and there is no need to waste the
>> > > > cluster's CPU/memory/network resources shifting data around.
>> > > >
>> > > > However, there are always edge cases. I was reasonably asked how to
>> > > > trigger the rebalancing within the baseline topology manually or on a
>> > > > timeout if:
>> > > >
>> > > >    - It's not expected that the failed node will be resurrected in the
>> > > >    near future, and
>> > > >    - It's not likely that that node will be replaced by another one.
>> > > >
>> > > > The question: if I call IgniteCache.rebalance() or configure
>> > > > CacheConfiguration.rebalanceTimeout, will the rebalancing be fired within
>> > > > the baseline topology?
>> > > >
>> > > > --
>> > > > Denis
>> > > >
>> > >
>> >
>>
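For completeness, below is a rough, self-contained sketch of the periodic check from steps 1-5 in Pavel's message above, written against the public IgniteCluster API (IgniteCluster#setBaselineTopology stands in for the setNewBaseline() shorthand, and ClusterGroup#forServers() replaces the internal discovery manager call). The BaselineWatcher class name, the ScheduledExecutorService, and the one-minute timeout are illustrative assumptions, not part of Ignite itself:

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.apache.ignite.Ignite;
import org.apache.ignite.cluster.BaselineNode;
import org.apache.ignite.cluster.ClusterNode;

public class BaselineWatcher {
    /** Illustrative threshold: drop a node after one minute of downtime. */
    private static final long TIMEOUT_MS = 60_000;

    /** Consistent id -> first time the node was observed missing. */
    private final Map<Object, Long> missingSince = new HashMap<>();

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    public void start(Ignite ignite) {
        scheduler.scheduleAtFixedRate(() -> check(ignite), 1, 1, TimeUnit.MINUTES);
    }

    private void check(Ignite ignite) {
        // 1) Consistent ids of all alive server nodes.
        Set<Object> alive = ignite.cluster().forServers().nodes().stream()
            .map(ClusterNode::consistentId)
            .collect(Collectors.toSet());

        // 2) Current baseline topology (may be null if the baseline was never set).
        Collection<BaselineNode> baseline = ignite.cluster().currentBaselineTopology();

        if (baseline == null)
            return;

        long now = System.currentTimeMillis();

        Collection<BaselineNode> newBaseline = new ArrayList<>();

        for (BaselineNode n : baseline) {
            if (alive.contains(n.consistentId())) {
                // The node is alive: keep it and forget any recorded downtime.
                missingSince.remove(n.consistentId());
                newBaseline.add(n);
            }
            // 3-4) Keep an offline node only while its downtime is below the timeout.
            else if (now - missingSince.computeIfAbsent(n.consistentId(), id -> now) < TIMEOUT_MS)
                newBaseline.add(n);
        }

        // 5) If any node timed out, set the shrunk baseline; this triggers rebalancing.
        if (newBaseline.size() < baseline.size())
            ignite.cluster().setBaselineTopology(newBaseline);
    }
}

The HashMap is safe here only because check() always runs on the single scheduler thread; the event-based tracking mentioned at the top of this thread could replace the missingSince bookkeeping.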