Roman Puchkovskiy created IGNITE-18077:
------------------------------------------

             Summary: Handle 'RAFT log abruptly deleted' scenario
                 Key: IGNITE-18077
                 URL: https://issues.apache.org/jira/browse/IGNITE-18077
             Project: Ignite
          Issue Type: Improvement
            Reporter: Roman Puchkovskiy
             Fix For: 3.0.0-beta2


The following case is possible: some writes are applied to storage (via the
RAFT infrastructure), then Ignite is stopped, the RAFT log is deleted (or its
FS partition is simply unmounted), and the node is started again. On startup it
sees no RAFT log, so, according to RAFT semantics, it may conclude that it is
a fresh node that has just joined the RAFT group, reset its index to 0, and
request the log/snapshot from the leader. In reality it already has data in its
state machine storage, and the correct action would be to remount the FS
partition holding the RAFT log.

So we need special handling for the situation where there is no RAFT log (at
all), but the main storage reports that its persistedIndex is non-zero. One
option would be to put the whole Ignite node into Maintenance mode (or its
equivalent).

Please note that the described situation is different from the 'fresh node'
scenario, where the RAFT log is absent but the main storage's persistedIndex is
0. That is a normal situation when joining a RAFT group for the first time; no
special handling is needed (it's handled by JRaft).
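
To make the distinction between the scenarios concrete, here is a minimal
sketch of the startup decision, not a proposed design. The class, enum, and
method names are hypothetical and do not exist in Ignite or JRaft; how
"RAFT log present" is detected and how Maintenance mode is entered are left
open, as in the description above.

```java
/**
 * Hypothetical illustration only: none of these names are Ignite or JRaft APIs.
 */
final class RaftLogStartupCheck {
    /** Possible outcomes of the startup check (names are assumptions). */
    enum StartupAction {
        /** No RAFT log and persistedIndex == 0: normal first join, handled by JRaft. */
        JOIN_AS_FRESH_NODE,
        /** RAFT log is present: the usual recovery path, no special handling. */
        NORMAL_RECOVERY,
        /** No RAFT log but persistedIndex != 0: the log was lost or is not mounted. */
        ENTER_MAINTENANCE_MODE
    }

    private RaftLogStartupCheck() {
        // Static helper, not meant to be instantiated.
    }

    /**
     * Decides what to do on node start.
     *
     * @param raftLogPresent whether any RAFT log was found (detection is assumed to happen elsewhere).
     * @param persistedIndex index reported by the main (state machine) storage.
     */
    static StartupAction decide(boolean raftLogPresent, long persistedIndex) {
        if (persistedIndex < 0) {
            throw new IllegalArgumentException("persistedIndex must be non-negative: " + persistedIndex);
        }

        if (raftLogPresent) {
            return StartupAction.NORMAL_RECOVERY;
        }

        // The log is absent: only the storage index tells a fresh node
        // apart from a node whose log was abruptly deleted or unmounted.
        return persistedIndex == 0
                ? StartupAction.JOIN_AS_FRESH_NODE
                : StartupAction.ENTER_MAINTENANCE_MODE;
    }
}
```

For example, decide(false, 0) yields JOIN_AS_FRESH_NODE, while decide(false, 42)
yields ENTER_MAINTENANCE_MODE instead of silently resetting the index to 0.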

A design is needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
