Anton Kalashnikov created IGNITE-12653:
------------------------------------------

             Summary: Add example of baseline auto-adjust feature
                 Key: IGNITE-12653
                 URL: https://issues.apache.org/jira/browse/IGNITE-12653
             Project: Ignite
          Issue Type: Task
          Components: examples
            Reporter: Anton Kalashnikov


Work on Phase II of IEP-4 (Baseline topology) [1] has finished. It makes 
sense to implement some examples of "Baseline auto-adjust" [2]. 

The "Baseline auto-adjust" feature automatically adjusts the baseline to 
match the current topology after a node join/left event occurs. It is 
required because when a node leaves the grid and nobody changes the baseline 
manually, data can be lost (if more nodes leave the grid, depending on the 
backup factor), but permanent tracking of the grid is not always 
possible/desirable. In many cases, auto-adjusting the baseline after some 
timeout is very helpful. 

Distributed metastore [3] (already done): 

First of all, we need the ability to store configuration data consistently 
and cluster-wide. Ignite doesn't have any specific API for such 
configurations, and we don't want to have many similar implementations of the 
same feature in our code. After some thought, it was proposed to implement it 
as a kind of distributed metastorage that can store any data. 
The first implementation is based on the existing local metastorage API for 
persistent clusters (in-memory clusters will store data in memory). 
Write/remove operations use Discovery SPI to send updates to the cluster, 
which guarantees the order of updates and that all existing (alive) nodes 
have handled the update message. To find out which node has the latest data, 
the distributed metastorage has a "version" value, which is basically 
<number of all updates, hash of updates>. The update history back to some 
point in the past is stored along with the data, so when an outdated node 
connects to the cluster it receives all the missing data and applies it 
locally. If not enough history is stored, or the joining node is clean, it 
receives a snapshot of the distributed metastorage, so there won't be 
inconsistencies. 
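
To illustrate the version and catch-up idea, here is a minimal, self-contained 
sketch. It is a toy model and not the Ignite implementation: the class 
MetastoreSketch, its fields, the hash formula and the "history"/"snapshot" 
return values are all assumptions made up for this example. It only shows how 
a <number of updates, hash of updates> pair can decide whether a joining node 
is brought up to date by replaying history or by receiving a full snapshot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Toy model of a versioned store with bounded update history. Not Ignite code. */
class MetastoreSketch {
    /** A single update; a null value means "remove the key". */
    static final class Update {
        final String key;
        final String val;

        Update(String key, String val) {
            this.key = key;
            this.val = val;
        }
    }

    /** Version = (number of all updates, running hash of updates). Hash formula is an assumption. */
    static final class Version {
        final long count;
        final long hash;

        Version(long count, long hash) {
            this.count = count;
            this.hash = hash;
        }

        Version next(Update u) {
            return new Version(count + 1, 31 * hash + Objects.hash(u.key, u.val));
        }

        boolean same(Version o) {
            return count == o.count && hash == o.hash;
        }
    }

    final Map<String, String> data = new HashMap<>();
    final List<Update> history = new ArrayList<>(); // Most recent updates only.
    final int maxHistory;
    Version histStart = new Version(0, 0);          // Version just before history.get(0).
    Version ver = new Version(0, 0);

    MetastoreSketch(int maxHistory) {
        this.maxHistory = maxHistory;
    }

    /** Applies one update locally, advancing the version and trimming old history. */
    void apply(Update u) {
        if (u.val == null)
            data.remove(u.key);
        else
            data.put(u.key, u.val);

        ver = ver.next(u);
        history.add(u);

        if (history.size() > maxHistory)
            histStart = histStart.next(history.remove(0));
    }

    /** Brings a joining node up to date; returns which path was taken. */
    String catchUp(MetastoreSketch joining) {
        Version v = histStart;

        for (int i = 0; i <= history.size(); i++) {
            if (v.same(joining.ver)) {
                // The joining node's state is a known point in our history: replay the rest.
                for (Update u : history.subList(i, history.size()))
                    joining.apply(u);

                return "history";
            }

            if (i < history.size())
                v = v.next(history.get(i));
        }

        // Not enough history (or unknown state): replace everything with a snapshot.
        joining.data.clear();
        joining.data.putAll(data);
        joining.history.clear();
        joining.histStart = ver;
        joining.ver = ver;

        return "snapshot";
    }
}
```

In this sketch a node that is only slightly behind gets the missing updates 
replayed, while a clean node joining after the history has been trimmed falls 
through to the snapshot branch. 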

Baseline auto-adjust: 

Main scenario: 
        - There is a grid whose baseline is equal to the current topology 
        - A new node joins the grid, or some node leaves (fails) 
        - The new mechanism detects this event and adds a task for changing 
the baseline to a queue with the configured timeout 
        - If a new event happens before the baseline is changed, the task is 
removed from the queue and a new task is added 
        - When the timeout expires, the task tries to set a new baseline 
corresponding to the current topology 
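
The scenario above is essentially a debounce: each topology event replaces the 
pending task, and only the last one fires. Below is a minimal sketch of that 
behaviour using plain java.util.concurrent. It is a toy model, not Ignite code; 
the class BaselineAutoAdjustSketch and all of its methods are hypothetical 
names invented for this example, and nodes are represented by plain strings.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Toy model of the debounced baseline change. Not Ignite code. */
class BaselineAutoAdjustSketch {
    /** Single daemon thread that plays the role of the task queue. */
    private final ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "baseline-auto-adjust-sketch");
        t.setDaemon(true);
        return t;
    });

    private final long timeoutMs;
    private final Set<String> topology;
    private Set<String> baseline;
    private ScheduledFuture<?> pending;
    private long generation;   // Incremented on every topology event.
    private int adjustCount;   // How many times the baseline was actually changed.

    BaselineAutoAdjustSketch(long timeoutMs, Set<String> initialNodes) {
        this.timeoutMs = timeoutMs;
        this.topology = new TreeSet<>(initialNodes);
        this.baseline = new TreeSet<>(initialNodes); // Baseline equals topology at start.
    }

    synchronized void onNodeJoined(String nodeId) {
        topology.add(nodeId);
        reschedule();
    }

    synchronized void onNodeLeft(String nodeId) {
        topology.remove(nodeId);
        reschedule();
    }

    /** Drops the pending task (if any) and queues a new one with the full timeout. */
    private void reschedule() {
        if (pending != null)
            pending.cancel(false);

        final long gen = ++generation;
        pending = exec.schedule(() -> adjust(gen), timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Sets the baseline to the current topology unless a newer event superseded this task. */
    private synchronized void adjust(long gen) {
        if (gen != generation)
            return;

        baseline = new TreeSet<>(topology);
        adjustCount++;
        notifyAll();
    }

    synchronized Set<String> baseline() {
        return new TreeSet<>(baseline);
    }

    synchronized int adjustCount() {
        return adjustCount;
    }

    /** Waits until at least n adjustments happened; returns false on timeout or interrupt. */
    synchronized boolean awaitAdjustments(int n, long maxWaitMs) {
        long deadline = System.currentTimeMillis() + maxWaitMs;

        try {
            while (adjustCount < n) {
                long left = deadline - System.currentTimeMillis();

                if (left <= 0)
                    return false;

                wait(left);
            }

            return true;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void shutdown() {
        exec.shutdownNow();
    }
}
```

With two events arriving in quick succession, the first task is cancelled and 
the baseline is changed once, after the timeout counted from the second event. 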

First of all, we need to add two parameters [4]: 
        - baselineAutoAdjustEnabled - enables/disables the "Baseline 
auto-adjust" feature. 
        - baselineAutoAdjustTimeout - timeout after which the baseline should 
be changed. 

These parameters are cluster-wide and can be changed in real time because they 
are based on the "Distributed metastore". 

Restrictions: 
        - This mechanism handles events only on an active grid 
        - For in-memory nodes it is enabled by default. For persistent nodes 
it is disabled. 
        - If lost partitions are detected, this feature is disabled 
        - If the baseline is adjusted manually so that baselineNodes != 
gridNodes, an exception is thrown 
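
The restrictions above can be written down as a few small checks. This is only 
a sketch under stated assumptions, not Ignite code: the class 
AutoAdjustRulesSketch and its method names are hypothetical, and it assumes 
that the manual-change restriction applies only while auto-adjust is enabled, 
which the list above does not state explicitly.

```java
import java.util.Set;

/** Toy model of the restrictions listed above. Not Ignite code. */
class AutoAdjustRulesSketch {
    /** Default of baselineAutoAdjustEnabled: on for in-memory nodes, off for persistent ones. */
    static boolean enabledByDefault(boolean persistenceEnabled) {
        return !persistenceEnabled;
    }

    /** Whether a join/left event should schedule a baseline change. */
    static boolean shouldSchedule(boolean gridActive, boolean autoAdjustEnabled, boolean lostPartitions) {
        // Events are handled only on an active grid, and lost partitions switch the feature off.
        return gridActive && autoAdjustEnabled && !lostPartitions;
    }

    /**
     * Rejects a manual baseline that differs from the current grid nodes.
     * Assumption: the check is relevant only while auto-adjust is enabled.
     */
    static void checkManualChange(boolean autoAdjustEnabled, Set<String> baselineNodes, Set<String> gridNodes) {
        if (autoAdjustEnabled && !baselineNodes.equals(gridNodes))
            throw new IllegalStateException("baselineNodes != gridNodes while auto-adjust is enabled");
    }
}
```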

[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-4+Baseline+topology+for+caches
[2] https://issues.apache.org/jira/browse/IGNITE-8571
[3] https://issues.apache.org/jira/browse/IGNITE-10640
[4] https://issues.apache.org/jira/browse/IGNITE-8573



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
