Denis,

It's just one of the ways to implement it. We can also subscribe to node join / fail events to properly track the downtime of a node.
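For illustration, here is a rough sketch of such a listener (the DowntimeTracker class and its method names are just placeholders, not an existing API; also note that discovery events typically have to be enabled via IgniteConfiguration#setIncludeEventTypes for the listener to fire):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.ignite.Ignite;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class DowntimeTracker {
    /** Consistent id -> timestamp of when the node was last seen leaving or failing. */
    private final Map<Object, Long> downSince = new ConcurrentHashMap<>();

    public void start(Ignite ignite) {
        IgnitePredicate<Event> lsnr = evt -> {
            Object consistentId = ((DiscoveryEvent)evt).eventNode().consistentId();

            if (evt.type() == EventType.EVT_NODE_JOINED)
                downSince.remove(consistentId); // The node is back, reset its downtime.
            else
                downSince.put(consistentId, System.currentTimeMillis());

            return true; // Keep listening.
        };

        ignite.events().localListen(lsnr,
            EventType.EVT_NODE_JOINED, EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED);
    }

    /** @return Milliseconds the node has been down, or 0 if it is currently alive. */
    public long downtime(Object consistentId) {
        Long since = downSince.get(consistentId);

        return since == null ? 0 : System.currentTimeMillis() - since;
    }
}

A periodic task could then consult downtime() for every baseline node and adjust the baseline once the configured threshold is exceeded.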
2018-04-12 19:38 GMT+03:00 Pavel Kovalenko <jokse...@gmail.com>:

> Denis,
>
> Using our API, we can implement this task as follows.
> Every minute, do the following:
> 1) Get the consistent ids of all alive server nodes =>
> ignite().context().discovery().aliveServerNodes() => mapToConsistentIds().
> 2) Get the current baseline topology => ignite().cluster().currentBaselineTopology()
> 3) For each node that is in the baseline but not among the alive server
> nodes, check the timeout for this node.
> 4) If the timeout is reached, remove the node from the baseline.
> 5) If the baseline has changed, set the new baseline => ignite().cluster().setNewBaseline()
>
>
> 2018-04-12 2:18 GMT+03:00 Denis Magda <dma...@apache.org>:
>
>> Pavel, Val,
>>
>> So, it means that the rebalancing will be initiated only after an
>> administrator removes the failed node from the topology, right?
>>
>> Next, imagine that you are that IT administrator who has to automate
>> triggering the rebalancing if a node fails and is not recovered within 1
>> minute. What would you do, and what does Ignite provide to fulfill the task?
>>
>> --
>> Denis
>>
>> On Wed, Apr 11, 2018 at 1:01 PM, Pavel Kovalenko <jokse...@gmail.com>
>> wrote:
>>
>> > Denis,
>> >
>> > In case of an incomplete baseline topology, IgniteCache.rebalance() will
>> > do nothing, because this event doesn't trigger a partition exchange or an
>> > affinity change, so the states of the existing partitions are kept.
>> >
>> > 2018-04-11 22:27 GMT+03:00 Valentin Kulichenko <
>> > valentin.kuliche...@gmail.com>:
>> >
>> > > Denis,
>> > >
>> > > In my understanding, in this case you should remove the node from the
>> > > BLT and that will trigger the rebalancing, no?
>> > >
>> > > -Val
>> > >
>> > > On Wed, Apr 11, 2018 at 12:23 PM, Denis Magda <dma...@gridgain.com>
>> > > wrote:
>> > >
>> > > > Igniters,
>> > > >
>> > > > As we know, rebalancing doesn't happen if one of the nodes goes down,
>> > > > thus shrinking the baseline topology. This complies with our assumption
>> > > > that the node should be recovered soon and there is no need to waste the
>> > > > cluster's CPU/memory/network resources shifting data around.
>> > > >
>> > > > However, there are always edge cases. I was reasonably asked how to
>> > > > trigger the rebalancing within the baseline topology manually or on a
>> > > > timeout if:
>> > > >
>> > > >    - It's not expected that the failed node will be resurrected in the
>> > > >    near future, and
>> > > >    - It's not likely that that node will be replaced by another one.
>> > > >
>> > > > The question: if I call IgniteCache.rebalance() or configure
>> > > > CacheConfiguration.rebalanceTimeout, will the rebalancing be fired within
>> > > > the baseline topology?
>> > > >
>> > > > --
>> > > > Denis
>> > > >
>> > >
>> >
>>
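For completeness, below is a rough, self-contained sketch of the periodic check from steps 1-5 in Pavel's message above, written against the public IgniteCluster API (IgniteCluster#setBaselineTopology stands in for the setNewBaseline() shorthand, and ClusterGroup#forServers() replaces the internal discovery manager call). The BaselineWatcher class name, the ScheduledExecutorService, and the one-minute timeout are illustrative assumptions, not part of Ignite itself:

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.apache.ignite.Ignite;
import org.apache.ignite.cluster.BaselineNode;
import org.apache.ignite.cluster.ClusterNode;

public class BaselineWatcher {
    /** Illustrative threshold: drop a node after one minute of downtime. */
    private static final long TIMEOUT_MS = 60_000;

    /** Consistent id -> first time the node was observed missing. */
    private final Map<Object, Long> missingSince = new HashMap<>();

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    public void start(Ignite ignite) {
        scheduler.scheduleAtFixedRate(() -> check(ignite), 1, 1, TimeUnit.MINUTES);
    }

    private void check(Ignite ignite) {
        // 1) Consistent ids of all alive server nodes.
        Set<Object> alive = ignite.cluster().forServers().nodes().stream()
            .map(ClusterNode::consistentId)
            .collect(Collectors.toSet());

        // 2) Current baseline topology (may be null if the baseline was never set).
        Collection<BaselineNode> baseline = ignite.cluster().currentBaselineTopology();

        if (baseline == null)
            return;

        long now = System.currentTimeMillis();

        Collection<BaselineNode> newBaseline = new ArrayList<>();

        for (BaselineNode n : baseline) {
            if (alive.contains(n.consistentId())) {
                // The node is alive: keep it and forget any recorded downtime.
                missingSince.remove(n.consistentId());
                newBaseline.add(n);
            }
            // 3-4) Keep an offline node only while its downtime is below the timeout.
            else if (now - missingSince.computeIfAbsent(n.consistentId(), id -> now) < TIMEOUT_MS)
                newBaseline.add(n);
        }

        // 5) If any node timed out, set the shrunk baseline; this triggers rebalancing.
        if (newBaseline.size() < baseline.size())
            ignite.cluster().setBaselineTopology(newBaseline);
    }
}

The HashMap is safe here only because check() always runs on the single scheduler thread; the event-based tracking mentioned at the top of this thread could replace the missingSince bookkeeping.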