keith-turner opened a new issue #542: Could add stateful checks to WAL recovery code
URL: https://github.com/apache/accumulo/issues/542

Data is written to WALs in temporal order. Mutations are written to a WAL with per-tablet sequence numbers. The sequence numbers do not change until a minor compaction occurs. Each minor compaction is recorded in the WAL. Below is an example of a WAL in the order it was written, preceded by an explanation of its contents.

 * Defines tablet `2<<` with id `5`. Everything else in the log will use the id `5`.
 * A mutation to set row `r1` column `f1:q1` to `v1`. This mutation has a seq of `1` in the WAL.
 * A mutation setting `r1 f1:q2=v2` with a seq of `1`
 * Compaction start event with seq `2`
 * Compaction finish event with seq `3`
 * A mutation setting `r1 f1:q1=v3` with a seq of `3`
 * Etc

```
DEFINE_TABLET 5 1 2<<
MANY_MUTATIONS 5 1
1 mutations:
  r1
      f1:q1 [system]:1529685833137 [] v1
MANY_MUTATIONS 5 1
1 mutations:
  r1
      f1:q2 [system]:1529685833149 [] v2
COMPACTION_START 5 2 hdfs://localhost:8020/accumulo/tables/2/default_tablet/F0000005.rf
COMPACTION_FINISH 5 3
MANY_MUTATIONS 5 3
1 mutations:
  r1
      f1:q1 [system]:1529685849576 [] v3
COMPACTION_START 5 4 hdfs://localhost:8020/accumulo/tables/2/default_tablet/F0000006.rf
COMPACTION_FINISH 5 5
MANY_MUTATIONS 5 5
1 mutations:
  r1
      f1:q1 [system]:1529685856321 [] v4
MANY_MUTATIONS 5 5
1 mutations:
  r1
      f1:q2 [system]:1529685867727 [] v5
```

Given the example above, it would be odd to see mutations in a WAL with sequence numbers `X`, `X+2`, and `X+4` without seeing corresponding compaction events between the mutations. So we could add two types of sanity checks to the recovery code:

 * Check that compaction start and finish events increment in an orderly way. Each should increment by one. Seeing them jump by more than one may indicate data is missing.
 * Check that mutation seq numbers in the WAL are in an expected range. Not completely sure, but that range may be: `[min_compaction_finish-2, max_compaction_finish]`.

The retry behavior when writing to WALs and its effect on seq numbers, if any, needs to be looked into.
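
The two checks proposed above could be sketched roughly as follows. This is only an illustration in Python, not Accumulo's recovery code: the `Event` tuple and `check_wal` function are hypothetical names, the events are assumed to be already parsed from a single WAL, and the mutation range is the tentative `[min_compaction_finish-2, max_compaction_finish]` guess from above, which may turn out to be wrong once retry behavior is understood.

```python
from collections import namedtuple

# Hypothetical, simplified view of a WAL entry: event kind, tablet id, seq.
Event = namedtuple("Event", ["kind", "tablet_id", "seq"])


def check_wal(events):
    """Return a list of problem descriptions for one WAL (empty if none)."""
    problems = []
    last_compaction_seq = {}  # tablet id -> seq of the last compaction event
    finish_seqs = {}          # tablet id -> seqs of COMPACTION_FINISH events
    mutation_seqs = {}        # tablet id -> seqs of mutation events

    for ev in events:
        if ev.kind in ("COMPACTION_START", "COMPACTION_FINISH"):
            # Check 1: compaction events should increment by exactly one.
            prev = last_compaction_seq.get(ev.tablet_id)
            if prev is not None and ev.seq != prev + 1:
                problems.append(
                    "tablet %d: compaction seq jumped from %d to %d"
                    % (ev.tablet_id, prev, ev.seq))
            last_compaction_seq[ev.tablet_id] = ev.seq
            if ev.kind == "COMPACTION_FINISH":
                finish_seqs.setdefault(ev.tablet_id, []).append(ev.seq)
        elif ev.kind == "MANY_MUTATIONS":
            mutation_seqs.setdefault(ev.tablet_id, []).append(ev.seq)

    # Check 2: mutation seqs should fall in the (tentative) expected range.
    for tablet_id, seqs in mutation_seqs.items():
        finishes = finish_seqs.get(tablet_id)
        if not finishes:
            continue  # no finish event seen, so the range is undefined here
        lo, hi = min(finishes) - 2, max(finishes)
        for seq in seqs:
            if not lo <= seq <= hi:
                problems.append(
                    "tablet %d: mutation seq %d outside [%d, %d]"
                    % (tablet_id, seq, lo, hi))
    return problems


# The example WAL from this issue, reduced to (kind, tablet id, seq).
EXAMPLE = [
    Event("DEFINE_TABLET", 5, 1),
    Event("MANY_MUTATIONS", 5, 1),
    Event("MANY_MUTATIONS", 5, 1),
    Event("COMPACTION_START", 5, 2),
    Event("COMPACTION_FINISH", 5, 3),
    Event("MANY_MUTATIONS", 5, 3),
    Event("COMPACTION_START", 5, 4),
    Event("COMPACTION_FINISH", 5, 5),
    Event("MANY_MUTATIONS", 5, 5),
    Event("MANY_MUTATIONS", 5, 5),
]

print(check_wal(EXAMPLE))
```

The sketch collects problems instead of failing on the first one, since during recovery it may be more useful to log everything suspicious about a WAL at once. Whether a violation should abort recovery or only warn is left open here.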
