[ https://issues.apache.org/jira/browse/IGNITE-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506087#comment-16506087 ]
ASF GitHub Bot commented on IGNITE-8529:
----------------------------------------

GitHub user alex-plekhanov opened a pull request:

    https://github.com/apache/ignite/pull/4159

    IGNITE-8529 Implement testing framework for checking WAL delta records consistency

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/alex-plekhanov/ignite ignite-8529

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/4159.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4159

----
commit a5c142daf7c46a354d5417dac7cf7c3c79a9488b
Author: Aleksey Plekhanov <plehanov.alex@...>
Date:   2018-06-07T10:39:52Z

    IGNITE-8529 Draft 3 WIP

commit 0ddd4d82c3625e45f21650267685bd2020997cb1
Author: Aleksey Plekhanov <plehanov.alex@...>
Date:   2018-06-07T12:25:42Z

    IGNITE-8529 Draft 3 WIP

commit ada909a74d5b000ac741c07421da7f5bcc955023
Author: Aleksey Plekhanov <plehanov.alex@...>
Date:   2018-06-07T16:46:33Z

    IGNITE-8529 Draft 3 WIP

commit 3f570c578b4946c6d599e9efbabf6260a45bce50
Author: Aleksey Plekhanov <plehanov.alex@...>
Date:   2018-06-07T16:51:08Z

    IGNITE-8529 Draft 3 WIP

commit 883acf9447c2619799f6078523504082ada4dc21
Author: Aleksey Plekhanov <plehanov.alex@...>
Date:   2018-06-07T21:36:02Z

    IGNITE-8529 Draft 2 WIP

commit 7cb3d90ff758e42ef7d876d17cb4d597fb0ee240
Author: Aleksey Plekhanov <plehanov.alex@...>
Date:   2018-06-08T07:46:42Z

    IGNITE-8529 Draft 3 WIP

commit 41d2dc6a44c3a3775254f9d68595e04ba4198e98
Author: Aleksey Plekhanov <plehanov.alex@...>
Date:   2018-06-08T10:43:18Z

    IGNITE-8529 Implement testing framework for checking WAL delta records consistency

commit 4678f6a6b4c7a5922063f2118bb4810f5e2b6d12
Author: Aleksey Plekhanov <plehanov.alex@...>
Date:   2018-06-08T12:52:01Z

    IGNITE-8529 Made page memory reusable after cache destroy.

commit c64719bf6be1562b0ad8f660eecf780cafca4334
Author: Aleksey Plekhanov <plehanov.alex@...>
Date:   2018-06-08T14:23:14Z

    IGNITE-8529 Made page memory reusable after cache destroy (fix).

commit 755cae5c68ef472a56871a891095721aebe60ff0
Author: Aleksey Plekhanov <plehanov.alex@...>
Date:   2018-06-08T14:32:47Z

    IGNITE-8529 Cleanup

----

> Implement testing framework for checking WAL delta records consistency
> ----------------------------------------------------------------------
>
>                 Key: IGNITE-8529
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8529
>             Project: Ignite
>          Issue Type: New Feature
>          Components: persistence
>            Reporter: Ivan Rakov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>             Fix For: 2.6
>
>
> We use sharp checkpointing of page memory in persistent mode. That implies
> that we write two types of records to the write-ahead log: logical (e.g. data
> records) and physical (page snapshots + binary delta records). Physical
> records are applied only when a node crashes/stops during an ongoing
> checkpoint.
> We have the following invariant: checkpoint #(n-1) + all physical records =
> checkpoint #n.
> If correctness of physical records is broken, an Ignite node may recover with
> an incorrect page memory state, which in turn can cause unexpected delayed
> errors. However, consistency of physical records is poorly tested: only a
> small part of our autotests perform node restarts, and even fewer of them
> stop a node while a checkpoint is in progress.
> We should implement an abstract test that:
> 1. Enforces a checkpoint and freezes memory state at the moment of the
> checkpoint.
> 2. Performs the necessary test load.
> 3. Enforces a checkpoint again, replays the WAL and checks that the page
> store at the moment of the previous checkpoint, with all physical records
> applied, exactly equals the current checkpoint state.
> Apart from checking correctness, the test framework should do the following:
> 1. Gather statistics (like a histogram) for types of written physical records.
> That will help us to know what types of physical records are covered by the
> test.
> 2. Visualize expected and actual page state (with all applied physical
> records) if an incorrect page state is detected.
> Regarding implementation, I suppose we can use the checkpoint listener
> mechanism to freeze page memory state at the moment of checkpoint.
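
For illustration only, the check described in the issue (checkpoint #(n-1) + all physical records = checkpoint #n, plus a histogram of record types) might be sketched on a toy in-memory model as below. This is not the code from the pull request and uses no Ignite API: the class `WalDeltaConsistencySketch`, the `PhysicalRecord` type, the record type names `PAGE_SNAPSHOT` and `DELTA`, and the byte-array page model are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of the check; none of these types are Ignite classes. */
class WalDeltaConsistencySketch {
    /** Hypothetical physical WAL record: a full page snapshot or a binary delta. */
    static final class PhysicalRecord {
        final String type;  // "PAGE_SNAPSHOT" or "DELTA" (illustrative names)
        final long pageId;
        final int offset;   // position of the changed bytes; 0 for snapshots
        final byte[] bytes;

        PhysicalRecord(String type, long pageId, int offset, byte[] bytes) {
            this.type = type;
            this.pageId = pageId;
            this.offset = offset;
            this.bytes = bytes.clone();
        }
    }

    /** Step 1: deep-copies page memory at the moment of a checkpoint. */
    static Map<Long, byte[]> freeze(Map<Long, byte[]> pages) {
        Map<Long, byte[]> copy = new HashMap<>();
        for (Map.Entry<Long, byte[]> e : pages.entrySet())
            copy.put(e.getKey(), e.getValue().clone());
        return copy;
    }

    /** Step 3a: applies records to the frozen state and counts them per type. */
    static Map<String, Integer> replay(Map<Long, byte[]> frozen, List<PhysicalRecord> wal) {
        Map<String, Integer> histogram = new TreeMap<>();
        for (PhysicalRecord rec : wal) {
            if (rec.type.equals("PAGE_SNAPSHOT"))
                frozen.put(rec.pageId, rec.bytes.clone());
            else {
                byte[] page = frozen.get(rec.pageId);
                if (page == null)
                    throw new IllegalStateException("Delta for unknown page " + rec.pageId);
                System.arraycopy(rec.bytes, 0, page, rec.offset, rec.bytes.length);
            }
            histogram.merge(rec.type, 1, Integer::sum);
        }
        return histogram;
    }

    /** Step 3b: returns ids of pages that differ between replayed and current state. */
    static List<Long> mismatches(Map<Long, byte[]> replayed, Map<Long, byte[]> current) {
        TreeSet<Long> ids = new TreeSet<>(replayed.keySet());
        ids.addAll(current.keySet());
        List<Long> bad = new ArrayList<>();
        for (Long id : ids)
            if (!Arrays.equals(replayed.get(id), current.get(id)))
                bad.add(id);
        return bad;
    }

    /** Runs the three steps on two tiny pages; optionally "loses" the delta record. */
    static List<Long> demo(boolean loseDelta) {
        Map<Long, byte[]> pages = new HashMap<>();
        pages.put(1L, new byte[] {0, 0, 0, 0});
        Map<Long, byte[]> frozen = freeze(pages);          // checkpoint #(n-1)

        List<PhysicalRecord> wal = new ArrayList<>();
        // Step 2, test load: modify page 1 in place and allocate page 2.
        pages.get(1L)[1] = 7;
        pages.get(1L)[2] = 8;
        if (!loseDelta)
            wal.add(new PhysicalRecord("DELTA", 1L, 1, new byte[] {7, 8}));
        pages.put(2L, new byte[] {1, 1, 1, 1});
        wal.add(new PhysicalRecord("PAGE_SNAPSHOT", 2L, 0, pages.get(2L)));

        System.out.println("Record histogram: " + replay(frozen, wal));
        return mismatches(frozen, pages);                  // compare with checkpoint #n
    }

    public static void main(String[] args) {
        System.out.println("Consistent WAL, mismatched pages: " + demo(false));
        System.out.println("Lost delta, mismatched pages: " + demo(true));
    }
}
```

In this sketch a complete WAL replays to the current state with no mismatched pages, while dropping the delta record causes page 1 to be reported. A real framework would presumably obtain the frozen state through the checkpoint listener mechanism mentioned in the issue rather than by copying a map.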