Durability of appendable metadata for Log Block Manager

Mike Percy Tue, 05 Apr 2016 00:16:22 -0700

I'm working on solving an issue in the Log Block Manager (LBM) filed as
KUDU-1377. In this case, writing a partial record to the end of a container
metadata file can cause a failure to restart Kudu. One possible way for
this to happen is to run out of disk space when appending a block id to
this metadata file. I wanted to discuss a potential fix for this issue.


The LBM uses the protobuf container (PBC) file format documented in
pb_util.h for the container metadata file. The current version of this
format (V1) looks like this:


<magic number>
<container version>
repeated <record>


with each <record> looking like the following:

<uint32 data length>
<data bytes>
<data checksum>
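
To make the layout concrete, here is a minimal Python sketch of encoding and
decoding a single V1-style record. The field widths and byte order are
assumptions made for illustration (little-endian uint32 length, 4-byte
checksum), and zlib.crc32 is only a stand-in for whatever checksum pb_util.h
actually uses. The names encode_record and decode_record are hypothetical,
and the magic number and container version header are left out.

```python
import struct
import zlib

U32 = "<I"      # assumed: little-endian uint32
LEN_SIZE = 4    # size of the <uint32 data length> field
CRC_SIZE = 4    # assumed: 4-byte <data checksum>


def encode_record(data: bytes) -> bytes:
    """Serialize one record as <length><data bytes><data checksum>."""
    return (struct.pack(U32, len(data))
            + data
            + struct.pack(U32, zlib.crc32(data)))


def decode_record(buf: bytes, offset: int = 0):
    """Return (data, next_offset); raise ValueError if the record is invalid."""
    if len(buf) - offset < LEN_SIZE:
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from(U32, buf, offset)
    end = offset + LEN_SIZE + length + CRC_SIZE
    if end > len(buf):
        # Either a partial write or a corrupt length; V1 cannot tell which.
        raise ValueError("record extends past end of buffer")
    data = buf[offset + LEN_SIZE:end - CRC_SIZE]
    (stored_crc,) = struct.unpack_from(U32, buf, end - CRC_SIZE)
    if stored_crc != zlib.crc32(data):
        raise ValueError("data checksum mismatch")
    return data, end
```

Note that when the declared length runs past the end of the buffer, this
format alone cannot distinguish a partial write from a corrupted length
field.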


In the LBM, each time we create a new block we append the block id to the
metadata file. On startup, we verify that all records in the file are
valid. If not, we print to the log and exit.

In the case of a full disk, we will have written a partial record to the
metadata file, and at startup we will fail validation. However, we should be
able to detect this case and ignore the partial record on startup. Because
we still need to support deleting blocks, we need to be able to continue
appending to this metadata file after startup, so we also need to truncate
the file to the last good record when this occurs.

Here is what I am thinking about to fix this issue:

1. When we are reading a container metadata file at startup, if we detect
that the trailing record is incomplete (the bytes remaining in the file are
too few to hold the record) then we truncate that partial record from the
file and continue as normal.

2. To avoid truncating "good" records in the case that there is data
corruption in one of the length fields, we also need to extend the PBC
format to add a checksum for the record length. So a record would now look
like the following:

<uint32 data length>
<length checksum>
<data bytes>
<data checksum>
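
The two steps above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the actual Kudu implementation:
fields are assumed to be little-endian uint32, zlib.crc32 stands in for the
real checksum, the magic number and version header are omitted, and the
names encode_record, scan and recover are hypothetical.

```python
import os
import struct
import zlib

U32 = "<I"        # assumed: little-endian uint32 fields
HEADER_SIZE = 8   # <uint32 data length> + <length checksum>
CRC_SIZE = 4      # assumed: 4-byte <data checksum>


def encode_record(data: bytes) -> bytes:
    """Serialize as <length><length checksum><data bytes><data checksum>."""
    length = struct.pack(U32, len(data))
    return (length + struct.pack(U32, zlib.crc32(length))
            + data + struct.pack(U32, zlib.crc32(data)))


def scan(buf: bytes):
    """Return (records, good_len), where good_len is the offset just past
    the last complete, valid record. Raise ValueError on real corruption."""
    records, offset = [], 0
    while offset < len(buf):
        if len(buf) - offset < HEADER_SIZE:
            break  # partial header at the tail of the file
        length_bytes = buf[offset:offset + 4]
        (length_crc,) = struct.unpack_from(U32, buf, offset + 4)
        if zlib.crc32(length_bytes) != length_crc:
            raise ValueError("corrupt length field at offset %d" % offset)
        (length,) = struct.unpack(U32, length_bytes)
        end = offset + HEADER_SIZE + length + CRC_SIZE
        if end > len(buf):
            break  # length is verified, so this is a partial trailing record
        data = buf[offset + HEADER_SIZE:end - CRC_SIZE]
        (data_crc,) = struct.unpack_from(U32, buf, end - CRC_SIZE)
        if zlib.crc32(data) != data_crc:
            raise ValueError("corrupt data at offset %d" % offset)
        records.append(data)
        offset = end
    return records, offset


def recover(path: str):
    """Load all valid records, truncating a trailing partial record if any."""
    with open(path, "rb") as f:
        buf = f.read()
    records, good_len = scan(buf)
    if good_len < len(buf):
        os.truncate(path, good_len)  # drop the partial record
    return records
```

In this sketch the length checksum is what makes truncation safe: a record
is only treated as partial when its length field has been verified, so a
corrupted length in an otherwise good record raises an error rather than
causing good data to be cut off.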


Does anyone see any drawbacks to this approach?

If you made it this far, thanks for reading.

Mike
