Todd Lipcon has submitted this change and it was merged.
Change subject: KUDU-969. Fix handling of crashes just before tablet metadata
flush
......................................................................
KUDU-969. Fix handling of crashes just before tablet metadata flush
This commit fixes a long-standing bug with the following race:
- a compaction is in "duplicating" phase (i.e. updates are written to both
the input and output rowsets)
- a update arrives which is duplicated to both input and output rowsets
- we crash before writing the new metadata
In bootstrap, we see that the update was written to one rowset (i.e. the output
rowset) whose ID is not present in our metadata. We assumed incorrectly that
this meant the rowset had been written, and then compacted away, and therefore
considered it a "flushed" store. However, in fact, it really should be
considered an "unflushed" store.
Really, we cannot distinguish whether this update was flushed and later
compacted
away, or if it was never flushed at all. So, this patch removes usage of the
term
'unflushed' from the bootstrap code (see below for more details).
Despite having somewhat flawed reasoning, it turns out that most of the
bootstrap logic ended up being correct, anyway. In the case above, despite
incorrectly considering the edit "flushed", the other duplicated target (the
input rowset) was considered "unflushed", so we correctly replayed the edit.
However, we did not handle this case correctly in our sanity checks regarding
pending commits at the end of bootstrap. In particular, we checked that a
pending commit did not refer to any store which was considered "flushed". In
the above scenario, the duplicated "output" was designated as "flushed" and
caused the bootstrap process to flag a corruption.
The bulk of this patch is made up of a new fault point and integration test
which exercises crashes just prior to flushing tablet metadata. This ensures
that these scenarios in bootstrap are well covered, and that the resulting
tablet does not diverge from another replica which did not have a fault
injected.
The changes to the bootstrap code itself are actually not so large:
- First, we stop using the term 'unflushed' to refer to stores. Instead, we
call a store 'active' if it was a current in-memory store at the time of
crash according to the metadata. A store is considered inactive if either
(a) we know it was flushed to disk (eg a DMS ID lower than the durable DMS
ID for a given DRS), or:
(b) it is a DRS ID which is not currently represented in the tablet
metadata.
This could either because the DRS was flushed and compacted away, or
because it was a DRS that was in the process of being written when the
server crashed.
- Because of the above terminology improvement, the patch renames
'WasStoreAlreadyFlushed' to 'IsMemStoreActive' (and correspondingly inverts
its
return value). The logic itself is unchanged, though the comments are
expanded.
- FilterMutate() and FilterInsert() are changed correspondingly to use the new
method (WasStoreAlreadyFlushed() changed to !IsMemStoreActive())
- Now that we better understand these code paths, I replaced a section of TODO
in FilterMutate() which previously had a DFATAL with a better comment and a
Corruption status return.
- The checks for pending commits at the end of bootstrap were changed per the
description above: we only need to ensure that _one_ target of each pending
commit is active, not that _all_ targets are active.
Without the changes to tablet_bootstrap.cc, the test fails frequently with
errors like:
FAILED: Corruption: Failed log replay. Reason: CommitMsg was pending but it
referred to flushed stores. ...
With the change, it passed 500/500 iterations:
http://dist-test.cloudera.org/job?job_id=todd.1456816652.10895
Change-Id: I1d996a470b9a7957ad3eb7fe02f22f85c32b5f9d
Reviewed-on: http://gerrit.cloudera.org:8080/2333
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <[email protected]>
---
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_bootstrap.cc
6 files changed, 298 insertions(+), 73 deletions(-)
Approvals:
David Ribeiro Alves: Looks good to me, approved
Kudu Jenkins: Verified
--
To view, visit http://gerrit.cloudera.org:8080/2333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I1d996a470b9a7957ad3eb7fe02f22f85c32b5f9d
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>