[ https://issues.apache.org/jira/browse/IGNITE-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Pavlov updated IGNITE-8529:
-----------------------------------
    Fix Version/s:     (was: 2.6)
                   2.7

> Implement testing framework for checking WAL delta records consistency
> ----------------------------------------------------------------------
>
>                 Key: IGNITE-8529
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8529
>             Project: Ignite
>          Issue Type: New Feature
>          Components: persistence
>            Reporter: Ivan Rakov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>             Fix For: 2.7
>
>
> We use sharp checkpointing of page memory in persistent mode. This implies
> that we write two types of records to the write-ahead log: logical (e.g. data
> records) and physical (page snapshots + binary delta records). Physical
> records are applied only when a node crashes or stops during an ongoing
> checkpoint. We have the following invariant: checkpoint #(n-1) + all
> physical records written between the two checkpoints = checkpoint #n.
> If the correctness of physical records is broken, an Ignite node may recover
> with an incorrect page memory state, which in turn can lead to unexpected,
> delayed errors. However, the consistency of physical records is poorly
> tested: only a small fraction of our autotests perform node restarts, and
> even fewer of them stop a node while a checkpoint is in progress.
> We should implement an abstract test that:
> 1. Forces a checkpoint and freezes the page memory state at the moment of
> the checkpoint.
> 2. Performs the necessary test load.
> 3. Forces a checkpoint again, replays the WAL, and checks that the page store
> state at the moment of the previous checkpoint, with all physical records
> applied, exactly equals the current checkpoint state.
> In addition to checking correctness, the test framework should:
> 1. Gather statistics (e.g. a histogram) of the types of written physical
> records. This will show which types of physical records are covered by
> tests.
> 2. Visualize the expected and actual page state (with all physical records
> applied) when an incorrect page state is detected.
> Regarding implementation, I suppose we can use the checkpoint listener
> mechanism to freeze the page memory state at the moment of the checkpoint.
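
A minimal sketch of the replay check from step 3, assuming a toy model in which page memory is a map from page id to a byte array. `PhysicalRecord`, `PageSnapshotRec`, `BytesDeltaRec` and `WalReplayCheck` are hypothetical names for illustration, not Ignite's WAL record classes; the sketch only shows how the invariant checkpoint #(n-1) + physical records = checkpoint #n can be checked page by page without touching the frozen image.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical physical WAL record in a toy model (NOT Ignite's WAL record API). */
interface PhysicalRecord {
    /** Redoes this record on top of the given page images. */
    void applyTo(Map<Long, byte[]> pages);
}

/** Full page image: replaces the whole page content. */
final class PageSnapshotRec implements PhysicalRecord {
    private final long pageId;
    private final byte[] data;

    PageSnapshotRec(long pageId, byte[] data) {
        this.pageId = pageId;
        this.data = data.clone();
    }

    @Override public void applyTo(Map<Long, byte[]> pages) {
        pages.put(pageId, data.clone());
    }
}

/** Binary delta: overwrites a byte range inside an already existing page. */
final class BytesDeltaRec implements PhysicalRecord {
    private final long pageId;
    private final int offset;
    private final byte[] bytes;

    BytesDeltaRec(long pageId, int offset, byte[] bytes) {
        this.pageId = pageId;
        this.offset = offset;
        this.bytes = bytes.clone();
    }

    @Override public void applyTo(Map<Long, byte[]> pages) {
        byte[] page = pages.get(pageId);
        if (page == null)
            throw new IllegalStateException("Delta record for unknown page " + pageId);
        System.arraycopy(bytes, 0, page, offset, bytes.length);
    }
}

final class WalReplayCheck {
    /** Deep copy, so replaying never mutates the frozen checkpoint image. */
    static Map<Long, byte[]> deepCopy(Map<Long, byte[]> pages) {
        Map<Long, byte[]> res = new HashMap<>();
        pages.forEach((id, data) -> res.put(id, data.clone()));
        return res;
    }

    /**
     * Applies the physical records to a copy of checkpoint #(n-1) and returns the sorted ids
     * of pages that differ from checkpoint #n. An empty list means the invariant holds.
     */
    static List<Long> mismatchedPages(Map<Long, byte[]> prevCp, List<PhysicalRecord> records,
        Map<Long, byte[]> curCp) {
        Map<Long, byte[]> replayed = deepCopy(prevCp);
        for (PhysicalRecord rec : records)
            rec.applyTo(replayed);

        TreeSet<Long> ids = new TreeSet<>(replayed.keySet());
        ids.addAll(curCp.keySet());

        List<Long> mismatched = new ArrayList<>();
        for (Long id : ids)
            if (!Arrays.equals(replayed.get(id), curCp.get(id)))
                mismatched.add(id);
        return mismatched;
    }
}
```

The returned page ids are exactly the pages a visualization step would need to dump.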

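For point 1 of the second list, a minimal sketch of a record-type histogram. It assumes the test already iterates over the physical records read from the WAL and can obtain a type name for each one (in Ignite presumably the record's type enum, via `name()`); `RecordTypeHistogram` is a hypothetical helper. Listing the types that never occurred is one way to see what the test load does not cover.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts physical WAL records per type (sorted by type name). */
final class RecordTypeHistogram {
    private final Map<String, Integer> counts = new TreeMap<>();

    /** Call once per physical record read from the WAL. */
    void record(String recordType) {
        counts.merge(recordType, 1, Integer::sum);
    }

    int count(String recordType) {
        return counts.getOrDefault(recordType, 0);
    }

    /** Record types from the given universe that the test load never produced. */
    List<String> uncovered(Collection<String> allTypes) {
        List<String> res = new ArrayList<>();
        for (String type : allTypes)
            if (!counts.containsKey(type))
                res.add(type);
        return res;
    }

    /** Text histogram: one line per type; the most frequent type gets maxBarWidth '#' chars. */
    String render(int maxBarWidth) {
        int max = counts.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int bar = max == 0 ? 0 : (int) Math.round((double) e.getValue() * maxBarWidth / max);
            sb.append(String.format("%-28s %8d ", e.getKey(), e.getValue()));
            for (int i = 0; i < bar; i++)
                sb.append('#');
            sb.append('\n');
        }
        return sb.toString();
    }
}
```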

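For point 2, a sketch of a textual visualization: a hex dump of the expected page (the replayed one) and the actual page, 16 bytes per row, printing only the rows that differ and marking differing bytes with `^^`. `PageDiff` is a hypothetical name; a real framework would likely also decode page headers (page type, page id), which this sketch does not attempt.

```java
/** Hypothetical helper that renders a byte-level diff of two page images. */
final class PageDiff {
    private static final int ROW = 16;

    /** Returns an empty string when the pages are identical. */
    static String diff(byte[] expected, byte[] actual) {
        int len = Math.max(expected.length, actual.length);
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < len; row += ROW) {
            StringBuilder exp = new StringBuilder();
            StringBuilder act = new StringBuilder();
            StringBuilder mark = new StringBuilder();
            boolean rowDiffers = false;
            for (int i = row; i < Math.min(row + ROW, len); i++) {
                // "--" marks a byte beyond the end of a shorter page.
                String e = i < expected.length ? String.format("%02x", expected[i] & 0xff) : "--";
                String a = i < actual.length ? String.format("%02x", actual[i] & 0xff) : "--";
                boolean differs = !e.equals(a);
                rowDiffers |= differs;
                exp.append(e).append(' ');
                act.append(a).append(' ');
                mark.append(differs ? "^^ " : "   ");
            }
            if (rowDiffers) {
                // Offset column, then both rows aligned so the markers sit under the bytes.
                sb.append(String.format("%08x expected: %s%n", row, exp));
                sb.append(String.format("%08x actual:   %s%n", row, act));
                sb.append(String.format("%8s           %s%n", "", mark));
            }
        }
        return sb.toString();
    }
}
```

Printing only differing rows keeps the output readable for large pages, at the cost of hiding surrounding context.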

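Regarding the last paragraph of the description, a toy model of freezing page memory from a checkpoint listener. `CheckpointHook`, `ToyPageMemory` and `FreezingHook` are hypothetical and do not reproduce Ignite's internal checkpoint listener interface; the point is only that the hook takes a deep copy of the pages a checkpoint is about to persist, so later test load cannot modify the frozen image. In a real node the copy would also have to be consistent with concurrent page writes, which this single-threaded sketch ignores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical listener hook; NOT Ignite's actual checkpoint listener API. */
interface CheckpointHook {
    /** Called with the page state that this checkpoint is about to persist. */
    void onCheckpointBegin(int cpIdx, Map<Long, byte[]> pagesToPersist);
}

/** Toy page memory (8-byte pages) that notifies hooks at checkpoint time. */
final class ToyPageMemory {
    final Map<Long, byte[]> pages = new HashMap<>();
    private final List<CheckpointHook> hooks = new ArrayList<>();
    private int cpIdx;

    void addHook(CheckpointHook hook) {
        hooks.add(hook);
    }

    void write(long pageId, int off, byte val) {
        pages.computeIfAbsent(pageId, id -> new byte[8])[off] = val;
    }

    void checkpoint() {
        cpIdx++;
        for (CheckpointHook hook : hooks)
            hook.onCheckpointBegin(cpIdx, pages);
        // A real checkpoint would flush the pages to the page store here.
    }
}

/** Freezes a deep copy of page memory at every checkpoint, keyed by checkpoint index. */
final class FreezingHook implements CheckpointHook {
    final Map<Integer, Map<Long, byte[]>> frozen = new HashMap<>();

    @Override public void onCheckpointBegin(int cpIdx, Map<Long, byte[]> pagesToPersist) {
        Map<Long, byte[]> copy = new HashMap<>();
        pagesToPersist.forEach((id, data) -> copy.put(id, data.clone()));
        frozen.put(cpIdx, copy);
    }
}
```

Two consecutive frozen images are exactly the inputs the replay check needs: checkpoint #(n-1) and checkpoint #n.
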
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
