[ https://issues.apache.org/jira/browse/IGNITE-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512774#comment-16512774 ]
Ivan Rakov commented on IGNITE-8529:
------------------------------------

[~alex_pl], thanks for your contribution! Merged to master.

> Implement testing framework for checking WAL delta records consistency
> ----------------------------------------------------------------------
>
>                 Key: IGNITE-8529
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8529
>             Project: Ignite
>          Issue Type: New Feature
>          Components: persistence
>            Reporter: Ivan Rakov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>             Fix For: 2.6
>
>
> We use sharp checkpointing of page memory in persistent mode. That implies
> that we write two types of records to the write-ahead log: logical (e.g. data
> records) and physical (page snapshots + binary delta records). Physical
> records are applied only when a node crashes or stops during an ongoing
> checkpoint.
> We have the following invariant: checkpoint #(n-1) + all physical records =
> checkpoint #n.
> If the correctness of physical records is broken, an Ignite node may recover
> with an incorrect page memory state, which in turn can cause unexpected
> delayed errors. However, consistency of physical records is poorly tested:
> only a small fraction of our autotests perform node restarts, and even fewer
> of them stop a node while a checkpoint is in progress.
> We should implement an abstract test that:
> 1. Enforces a checkpoint and freezes the memory state at the moment of the
> checkpoint.
> 2. Performs the necessary test load.
> 3. Enforces a checkpoint again, replays the WAL and checks that the page
> store at the moment of the previous checkpoint, with all physical records
> applied, exactly equals the current checkpoint state.
> Besides checking correctness, the test framework should do the following:
> 1. Gather statistics (like a histogram) for the types of written physical
> records. That will help us to know which types of physical records are
> covered by the test.
> 2. Visualize the expected and actual page state (with all physical records
> applied) if an incorrect page state is detected.
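
The invariant and the three test steps quoted above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not the framework that was merged: it does not use any Ignite API, and every name in it (WalDeltaConsistencySketch, PhysicalRecord, freeze, findMismatches, histogram, the 8-byte PAGE_SIZE) is hypothetical. Pages are plain byte arrays, a "page snapshot" record replaces a whole page, and a "delta" record overwrites a byte range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model only: none of these types exist in Ignite; all names are hypothetical.
class WalDeltaConsistencySketch {
    static final int PAGE_SIZE = 8; // assumed tiny page size, for readability

    enum RecordType { PAGE_SNAPSHOT, DELTA }

    /** A simplified physical WAL record: a full page image or a byte-range delta. */
    static final class PhysicalRecord {
        final RecordType type;
        final int pageId;
        final int offset; // meaningful for DELTA only
        final byte[] data;

        PhysicalRecord(RecordType type, int pageId, int offset, byte[] data) {
            this.type = type;
            this.pageId = pageId;
            this.offset = offset;
            this.data = data;
        }

        static PhysicalRecord snapshot(int pageId, byte[] page) {
            return new PhysicalRecord(RecordType.PAGE_SNAPSHOT, pageId, 0, page.clone());
        }

        static PhysicalRecord delta(int pageId, int offset, byte[] bytes) {
            return new PhysicalRecord(RecordType.DELTA, pageId, offset, bytes.clone());
        }
    }

    /** Step 1: deep-copy ("freeze") the page state at the moment of a checkpoint. */
    static Map<Integer, byte[]> freeze(Map<Integer, byte[]> pages) {
        Map<Integer, byte[]> copy = new HashMap<>();
        for (Map.Entry<Integer, byte[]> e : pages.entrySet())
            copy.put(e.getKey(), e.getValue().clone());
        return copy;
    }

    /** Applies one physical record to the given page state. */
    static void apply(Map<Integer, byte[]> pages, PhysicalRecord rec) {
        if (rec.type == RecordType.PAGE_SNAPSHOT)
            pages.put(rec.pageId, rec.data.clone());
        else {
            byte[] page = pages.computeIfAbsent(rec.pageId, id -> new byte[PAGE_SIZE]);
            System.arraycopy(rec.data, 0, page, rec.offset, rec.data.length);
        }
    }

    /** Step 3: replay records over checkpoint #(n-1); return ids of pages that differ from checkpoint #n. */
    static List<Integer> findMismatches(Map<Integer, byte[]> prevCp,
                                        List<PhysicalRecord> wal,
                                        Map<Integer, byte[]> curCp) {
        Map<Integer, byte[]> replayed = freeze(prevCp);
        for (PhysicalRecord rec : wal)
            apply(replayed, rec);

        TreeSet<Integer> ids = new TreeSet<>(replayed.keySet());
        ids.addAll(curCp.keySet());

        List<Integer> bad = new ArrayList<>();
        for (Integer id : ids)
            if (!Arrays.equals(replayed.get(id), curCp.get(id)))
                bad.add(id);
        return bad;
    }

    /** Counts records per type, so a test can report which record types it covered. */
    static Map<RecordType, Integer> histogram(List<PhysicalRecord> wal) {
        Map<RecordType, Integer> hist = new EnumMap<>(RecordType.class);
        for (PhysicalRecord rec : wal)
            hist.merge(rec.type, 1, Integer::sum);
        return hist;
    }

    public static void main(String[] args) {
        // Checkpoint #(n-1): a single zero-filled page.
        Map<Integer, byte[]> prevCp = new HashMap<>();
        prevCp.put(1, new byte[PAGE_SIZE]);

        // Step 2: the "test load" produces a page snapshot followed by a delta.
        List<PhysicalRecord> wal = new ArrayList<>();
        wal.add(PhysicalRecord.snapshot(1, prevCp.get(1)));
        wal.add(PhysicalRecord.delta(1, 2, new byte[] {7, 7}));

        // Checkpoint #n: the state the load actually left in memory.
        Map<Integer, byte[]> curCp = new HashMap<>();
        curCp.put(1, new byte[] {0, 0, 7, 7, 0, 0, 0, 0});

        System.out.println("Mismatched pages: " + findMismatches(prevCp, wal, curCp));
        System.out.println("Record histogram: " + histogram(wal));

        // Simulate a lost delta record: the replayed page no longer matches checkpoint #n.
        System.out.println("Mismatched pages without delta: "
            + findMismatches(prevCp, wal.subList(0, 1), curCp));
    }
}
```

In this toy run the complete record list reproduces the current checkpoint, while dropping the delta record makes page 1 show up as mismatched. A real framework would additionally need to dump the expected and actual page contents for each mismatched page, as the description asks.
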
> Regarding implementation, I suppose we can use the checkpoint listener
> mechanism to freeze the page memory state at the moment of the checkpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)