Thanks, Pavel! Alexey, Ivan, could you check that the example has no pitfalls and can be used as a template for our users? https://issues.apache.org/jira/secure/attachment/12919452/BaselineWatcher.java
--
Denis

On Tue, Apr 17, 2018 at 10:40 AM, Pavel Kovalenko <jokse...@gmail.com> wrote:

> Denis,
>
> I've attached an example of how to manage the baseline automatically
> (it's named BaselineWatcher). It's just a concept and doesn't cover all
> possible cases, but might be good for a start.
>
> 2018-04-13 2:14 GMT+03:00 Denis Magda <dma...@apache.org>:
>
>> Pavel, thanks for the suggestions. They would definitely work out. I
>> would document the one with the event subscription:
>> https://issues.apache.org/jira/browse/IGNITE-8241
>>
>> Could you help prepare a sample code snippet with such a listener that
>> will be added to the doc? I know that there are some caveats related to
>> the way such an event has to be processed.
>>
>> Ivan, I truly like your idea. Alex G., what are your thoughts on this?
>>
>> --
>> Denis
>>
>> On Thu, Apr 12, 2018 at 2:22 PM, Ivan Rakov <ivan.glu...@gmail.com>
>> wrote:
>>
>>> Guys,
>>>
>>> I also heard complaints about the absence of an option to
>>> automatically change the baseline topology. They absolutely make sense.
>>> What Pavel suggested will work as a workaround. I think in future
>>> releases we should give users an option to enable similar behavior via
>>> Ignite Configuration.
>>> It may be called "Baseline Topology change policy". I see it as a
>>> rule-based language that allows specifying the conditions of a BLT
>>> change using several parameters: a timeout and the minimum allowed
>>> number of partition copies left (maybe this option should also be
>>> provided at the per-cache-group level).
>>> The policy can also specify conditions for including new nodes in the
>>> BLT if they are present, including node attribute filters and so on.
>>>
>>> What do you think?
>>>
>>> Best Regards,
>>> Ivan Rakov
>>>
>>> On 12.04.2018 19:41, Pavel Kovalenko wrote:
>>>
>>>> Denis,
>>>>
>>>> It's just one of the ways to implement it. We can also subscribe to
>>>> node join / fail events to properly track the downtime of a node.
>>>>
>>>> 2018-04-12 19:38 GMT+03:00 Pavel Kovalenko <jokse...@gmail.com>:
>>>>
>>>>> Denis,
>>>>>
>>>>> Using our API we can implement this task as follows:
>>>>> Do each minute:
>>>>> 1) Get the consistent ids of all alive server nodes =>
>>>>> ignite().context().discovery().aliveServerNodes() =>
>>>>> mapToConsistentIds().
>>>>> 2) Get the current baseline topology =>
>>>>> ignite().cluster().currentBaselineTopology()
>>>>> 3) For each node that is in the baseline but not among the alive
>>>>> server nodes, check the timeout for this node.
>>>>> 4) If the timeout is reached, remove the node from the baseline.
>>>>> 5) If the baseline has changed, set the new baseline =>
>>>>> ignite().cluster().setNewBaseline()
>>>>>
>>>>> 2018-04-12 2:18 GMT+03:00 Denis Magda <dma...@apache.org>:
>>>>>
>>>>>> Pavel, Val,
>>>>>>
>>>>>> So, it means that the rebalancing will be initiated only after an
>>>>>> administrator removes the failed node from the topology, right?
>>>>>>
>>>>>> Next, imagine that you are the IT administrator who has to automate
>>>>>> the rebalancing activation if the node failed and has not recovered
>>>>>> within 1 minute. What would you do and what does Ignite provide to
>>>>>> fulfill the task?
>>>>>>
>>>>>> --
>>>>>> Denis
>>>>>>
>>>>>> On Wed, Apr 11, 2018 at 1:01 PM, Pavel Kovalenko
>>>>>> <jokse...@gmail.com> wrote:
>>>>>>
>>>>>>> Denis,
>>>>>>>
>>>>>>> In case of an incomplete baseline topology, IgniteCache.rebalance()
>>>>>>> will do nothing, because this event doesn't trigger a partition
>>>>>>> exchange or affinity change, so the states of existing partitions
>>>>>>> are held.
>>>>>>>
>>>>>>> 2018-04-11 22:27 GMT+03:00 Valentin Kulichenko
>>>>>>> <valentin.kuliche...@gmail.com>:
>>>>>>>
>>>>>>>> Denis,
>>>>>>>>
>>>>>>>> In my understanding, in this case you should remove the node from
>>>>>>>> the BLT and that will trigger the rebalancing, no?
>>>>>>>>
>>>>>>>> -Val
>>>>>>>>
>>>>>>>> On Wed, Apr 11, 2018 at 12:23 PM, Denis Magda
>>>>>>>> <dma...@gridgain.com> wrote:
>>>>>>>>
>>>>>>>>> Igniters,
>>>>>>>>>
>>>>>>>>> As we know, the rebalancing doesn't happen if one of the nodes
>>>>>>>>> goes down, thus shrinking the baseline topology. It complies with
>>>>>>>>> our assumption that the node should be recovered soon and there
>>>>>>>>> is no need to waste CPU/memory/networking resources of the
>>>>>>>>> cluster shifting the data around.
>>>>>>>>>
>>>>>>>>> However, there are always edge cases. I was reasonably asked how
>>>>>>>>> to trigger the rebalancing within the baseline topology manually
>>>>>>>>> or on timeout if:
>>>>>>>>>
>>>>>>>>> - It's not expected that the failed node will be resurrected in
>>>>>>>>>   the near future, and
>>>>>>>>> - It's not likely that the node will be replaced by another one.
>>>>>>>>>
>>>>>>>>> The question: if I call IgniteCache.rebalance() or configure
>>>>>>>>> CacheConfiguration.rebalanceTimeout, will the rebalancing be
>>>>>>>>> fired within the baseline topology?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Denis
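
For reference, the timeout check from Pavel's "do each minute" outline (steps 3 and 4) can be sketched roughly as below. This is not the attached BaselineWatcher and it makes no Ignite calls: the class name BaselineShrinkPolicy and its check() method are hypothetical, and the baseline and the alive server nodes are modelled as plain sets of consistent ids that the caller would have to obtain from the cluster (steps 1 and 2). It only illustrates the bookkeeping of "how long has this baseline node been missing".

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not part of Ignite): decides which baseline nodes
 * have been offline for longer than a timeout.
 */
class BaselineShrinkPolicy {
    private final long timeoutMillis;

    /** Consistent id -> time at which the node was first seen missing. */
    private final Map<Object, Long> missingSince = new HashMap<>();

    BaselineShrinkPolicy(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * One periodic check. Returns the set of consistent ids that should
     * remain in the baseline; the caller applies it only if it differs
     * from the current baseline (step 5 of the outline).
     */
    Set<Object> check(Set<Object> baseline, Set<Object> alive, long nowMillis) {
        // Forget nodes that came back or are no longer in the baseline.
        missingSince.keySet().removeIf(
            id -> alive.contains(id) || !baseline.contains(id));

        Set<Object> result = new LinkedHashSet<>();

        for (Object id : baseline) {
            if (alive.contains(id)) {
                result.add(id);
                continue;
            }

            // Remember when this node was first seen missing.
            long since = missingSince.computeIfAbsent(id, k -> nowMillis);

            if (nowMillis - since < timeoutMillis)
                result.add(id);          // still within the grace period
            else
                missingSince.remove(id); // timed out: drop from the baseline
        }

        return result;
    }
}
```

A caller would invoke check() once per period and change the cluster baseline only when the returned set differs from the current one. A node that rejoins before the timeout has its timer reset, so a later failure starts a new grace period.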