I'm working on an issue in the Log Block Manager (LBM) filed as KUDU-1377: writing a partial record to the end of a container metadata file can cause Kudu to fail to restart. One way this can happen is running out of disk space while appending a block id to the metadata file. I wanted to discuss a potential fix for this issue.
The LBM uses the protobuf container (PBC) file format documented in pb_util.h for the container metadata file. The current version of this format (V1) looks like this:

  <magic number> <container version> repeated <record>

with each <record> looking like the following:

  <uint32 data length> <data bytes> <data checksum>

In the LBM, each time we create a new block we append its block id to the metadata file. On startup, we verify that all records in the file are valid; if any record is invalid, we log an error and exit. If the disk fills up, we will have written a partial record to the metadata file and will fail validation at startup, even though we should be able to detect this case and ignore the partial record. Because we still need to support deleting blocks (deletions are also recorded by appending to the metadata file), we must be able to keep appending to this file after startup, so we also need to truncate the file back to the end of the last good record when this occurs.

Here is what I am thinking about to fix this issue:

1. When reading a container metadata file at startup, if we detect a trailing record that is too short to be a complete, valid record (given the remaining length of the file), we truncate that partial record from the file and continue as normal.

2. To avoid truncating "good" records when one of the length fields is corrupted, we also need to extend the PBC format to add a checksum for the record length. A record would then look like the following:

  <uint32 data length> <length checksum> <data bytes> <data checksum>

Does anyone see any drawbacks to this approach?

If you made it this far, thanks for reading.

Mike
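To make the two proposed steps concrete, here is a minimal C++ sketch of the proposed record layout and the startup scan that decides between "partial trailing record, truncate" and "real corruption, fail". It assumes little-endian fixed32 fields, CRC-32C for both checksums, and that the <magic number> <container version> header has already been validated, so the scan starts at the first record. The names AppendRecord, ScanRecords, ScanStatus and ScanResult are hypothetical and are not Kudu's pb_util API; the scan only reports the offset to truncate to rather than touching a real file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). Assumed as
// the checksum here; a slow version keeps the sketch self-contained.
uint32_t Crc32c(const std::string& bytes) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char b : bytes) {
    crc ^= b;
    for (int k = 0; k < 8; ++k) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Appends v as 4 little-endian bytes.
void AppendFixed32(std::string* out, uint32_t v) {
  for (int i = 0; i < 4; ++i) {
    out->push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
  }
}

// Reads 4 little-endian bytes starting at buf[off].
uint32_t DecodeFixed32(const std::string& buf, size_t off) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) {
    v |= static_cast<uint32_t>(static_cast<unsigned char>(buf[off + i])) << (8 * i);
  }
  return v;
}

// Hypothetical writer for the proposed record:
// <uint32 data length> <length checksum> <data bytes> <data checksum>
void AppendRecord(std::string* file, const std::string& data) {
  std::string len;
  AppendFixed32(&len, static_cast<uint32_t>(data.size()));
  file->append(len);
  AppendFixed32(file, Crc32c(len));
  file->append(data);
  AppendFixed32(file, Crc32c(data));
}

enum class ScanStatus { kOk, kPartialTrailingRecord, kCorruption };

struct ScanResult {
  ScanStatus status;
  size_t valid_length;  // offset just past the last good record
  size_t num_records;   // number of good records seen
};

// Walks the records starting at `offset`. A record that runs past the end of
// the file is treated as a torn append (truncate to valid_length); a checksum
// mismatch on a length or on complete data is reported as corruption.
ScanResult ScanRecords(const std::string& file, size_t offset) {
  constexpr size_t kHeaderSize = 8;   // length + length checksum
  constexpr size_t kTrailerSize = 4;  // data checksum
  size_t n = 0;
  while (offset < file.size()) {
    const size_t remaining = file.size() - offset;
    if (remaining < kHeaderSize) {
      // Not even a full length + length checksum left: partial write.
      return {ScanStatus::kPartialTrailingRecord, offset, n};
    }
    const uint32_t len = DecodeFixed32(file, offset);
    if (Crc32c(file.substr(offset, 4)) != DecodeFixed32(file, offset + 4)) {
      // The length itself is damaged, so we must not trust it to truncate.
      return {ScanStatus::kCorruption, offset, n};
    }
    const size_t record_size = kHeaderSize + len + kTrailerSize;
    if (remaining < record_size) {
      // Verified length says the record extends past EOF: partial write.
      return {ScanStatus::kPartialTrailingRecord, offset, n};
    }
    const std::string data = file.substr(offset + kHeaderSize, len);
    if (Crc32c(data) != DecodeFixed32(file, offset + kHeaderSize + len)) {
      return {ScanStatus::kCorruption, offset, n};
    }
    offset += record_size;
    ++n;
  }
  return {ScanStatus::kOk, offset, n};
}
```

For example, appending two records and then a third record cut short by a few bytes makes ScanRecords report kPartialTrailingRecord with valid_length equal to the size of the first two records, which is the point the file would be truncated to. Flipping a bit in the first record's length field instead yields kCorruption: without the length checksum, that inflated length would look like a record running past EOF and the scan would truncate away both good records, which is exactly the case item 2 guards against.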
