Kirill Tkalenko created IGNITE-25501:
----------------------------------------

             Summary: Incorrect partition state when entering node with index 
greater than current majority after snapshot
                 Key: IGNITE-25501
                 URL: https://issues.apache.org/jira/browse/IGNITE-25501
             Project: Ignite
          Issue Type: Bug
            Reporter: Kirill Tkalenko


When analyzing IGNITE-24802, it was discovered that if a snapshot is taken 
before stopping the partition leader, thereby disabling raft log suffix 
truncation, then when the node returns, the logs show the message "FATAL 
ERROR: Can't truncate logs before appliedId=LogId [index=26, term=2], 
lastIndexKept=0". Despite this, the partition remains in a healthy state and 
it is possible to read records from it that should not be there. This needs 
to be fixed.

One option is to simply put the partition in an error state so that the 
user can then resolve the situation using the disaster recovery mechanism.
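
The proposed behavior can be illustrated with a minimal, self-contained 
sketch. Everything in it is hypothetical: the class, the enum and the method 
names are invented for illustration and are not Ignite or JRaft APIs. It 
assumes that the suffix truncation is refused whenever lastIndexKept is 
smaller than the already applied index, which is what the log message above 
suggests, and that such a refusal should move the partition out of the 
healthy state.

```java
/** Hypothetical sketch; none of these names exist in Apache Ignite. */
public class PartitionStateSketch {
    /** Hypothetical local partition states used only by this sketch. */
    enum State { HEALTHY, BROKEN }

    /**
     * Assumption: a log suffix can be dropped only if no entry after
     * lastIndexKept has already been applied to the state machine.
     */
    static boolean canTruncateSuffix(long lastIndexKept, long appliedIndex) {
        return lastIndexKept >= appliedIndex;
    }

    /** Reports BROKEN instead of HEALTHY when the truncation is refused. */
    static State resolveState(long lastIndexKept, long appliedIndex) {
        return canTruncateSuffix(lastIndexKept, appliedIndex)
            ? State.HEALTHY
            : State.BROKEN;
    }

    public static void main(String[] args) {
        // Values from the log message in this report:
        // appliedId index=26, lastIndexKept=0.
        System.out.println(resolveState(0, 26));

        // Nothing beyond the kept index was applied, so truncation is safe.
        System.out.println(resolveState(26, 26));
    }
}
```

With the values from the log message (lastIndexKept=0, applied index 26) the 
sketch reports the partition as BROKEN, which is the state from which the 
user could start disaster recovery.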

To reproduce this, take a raft snapshot in 
*org.apache.ignite.internal.ItTruncateRaftLogAndRestartNodesTest#enterNodeWithIndexGreaterThanCurrentMajority*
before stopping the node with index "2", for example by using 
*org.apache.ignite.internal.replicator.Replica#createSnapshotOn*.

It is suggested to add a new test for this behavior, since the current test 
has another problem. The partition state can be obtained through 
*org.apache.ignite.internal.table.distributed.disaster.DisasterRecoveryManager#localTablePartitionStates*.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
