Quick note: the existing checksum is actually of the length AND the bytes, not just the bytes. But that doesn't help solve your problem, because the checksum follows the data, which means if the length is corrupted, you don't really know where the checksum lies.
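To make the problem concrete, here is a minimal Python sketch of the V1 record layout described below. This is purely illustrative, not the actual pb_util implementation: zlib.crc32 stands in for whatever checksum function the real format uses, and the field encoding is assumed little-endian. It shows why a checksum over (length + data) placed *after* the data can't save you: if the length field is corrupted, the reader no longer knows where the checksum even starts.

```python
# Illustrative sketch of a V1-style record (NOT the real pb_util code).
# zlib.crc32 is a stand-in for the format's actual checksum function.
import struct
import zlib

def write_record_v1(data: bytes) -> bytes:
    length = struct.pack("<I", len(data))
    # The checksum covers the length field AND the data bytes...
    checksum = struct.pack("<I", zlib.crc32(length + data))
    # ...but it is written after the data, so a corrupted length field
    # leaves the reader unable to even locate the checksum.
    return length + data + checksum

def read_record_v1(buf: bytes, offset: int):
    (length,) = struct.unpack_from("<I", buf, offset)
    data = buf[offset + 4 : offset + 4 + length]
    (stored,) = struct.unpack_from("<I", buf, offset + 4 + length)
    computed = zlib.crc32(buf[offset : offset + 4] + data)
    if stored != computed:
        raise ValueError("checksum mismatch at offset %d" % offset)
    return data, offset + 8 + length
```

If the 4-byte length is flipped to some large value, `read_record_v1` reads the "checksum" from the wrong offset (or past EOF), so corruption of the length and corruption of the data are indistinguishable.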
Isn't the solution you're describing effectively the same as what's described in KUDU-668? That closely mirrors what the WAL does, which I'd argue is a good thing (since it's tried and true).

On Tue, Apr 5, 2016 at 12:14 AM, Mike Percy <[email protected]> wrote:
> I'm working on solving an issue in the Log Block Manager (LBM) filed as
> KUDU-1377. In this case, writing a partial record to the end of a container
> metadata file can cause a failure to restart Kudu. One possible way for
> this to happen is to run out of disk space when appending a block id to
> this metadata file. I wanted to discuss a potential fix for this issue.
>
> The LBM uses the protobuf container (PBC) file format documented in
> pb_util.h for the container metadata file. The current version of this
> format (V1) looks like this:
>
> <magic number>
> <container version>
> repeated <record>
>
> with each <record> looking like the following:
>
> <uint32 data length>
> <data bytes>
> <data checksum>
>
> In the LBM, each time we create a new block we append the block id to the
> metadata file. On startup, we verify that all records in the file are
> valid. If not, we print to the log and exit.
>
> In the case of a full disk, we will have written a partial record to the
> metadata file, and at startup we will fail validation. However, we should
> be able to detect this case and ignore the partial record on startup.
> Because we still need to support deleting blocks, we need to be able to
> continue appending to this metadata file after startup, so we also need
> to truncate the file to the last good record when this occurs.
>
> Here is what I am thinking about to fix this issue:
>
> 1. When we are reading a container metadata file at startup, if we detect
> that there is a trailing record that is too short to fit a valid record
> (relative to the length of the file), then we truncate the last partial
> record from the file and continue as normal.
>
> 2. To avoid truncating "good" records in the case that there is data
> corruption in one of the length fields, we also need to extend the PBC
> format to add a checksum for the record length. So a record would now
> look like the following:
>
> <uint32 data length>
> <length checksum>
> <data bytes>
> <data checksum>
>
> Does anyone see any drawbacks to this approach?
>
> If you made it this far, thanks for reading.
>
> Mike
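The proposal above (a separate length checksum, plus truncating a trailing partial record) can be sketched as follows. Again this is a hedged illustration, not Kudu's implementation: zlib.crc32 stands in for the real checksum, and the `recover` helper is a hypothetical name for the startup scan. The key property is that a valid length checksum followed by too few bytes is unambiguously a partial append (safe to truncate), while a bad length checksum signals genuine corruption.

```python
# Illustrative sketch of the proposed V2-style record layout:
#   <uint32 data length> <length checksum> <data bytes> <data checksum>
# zlib.crc32 is a stand-in for the format's actual checksum function.
import struct
import zlib

def write_record_v2(data: bytes) -> bytes:
    length = struct.pack("<I", len(data))
    length_crc = struct.pack("<I", zlib.crc32(length))
    data_crc = struct.pack("<I", zlib.crc32(data))
    return length + length_crc + data + data_crc

def recover(buf: bytes):
    """Startup scan: return (records, truncate_to).

    A trailing record whose length checksum is valid but whose bytes run
    past EOF is treated as a partial append and dropped; a bad length
    checksum means real corruption, so we fail instead of truncating
    potentially good records."""
    records, off = [], 0
    while off < len(buf):
        if off + 8 > len(buf):
            return records, off            # partial header: truncate here
        (length,) = struct.unpack_from("<I", buf, off)
        (len_crc,) = struct.unpack_from("<I", buf, off + 4)
        if len_crc != zlib.crc32(buf[off : off + 4]):
            raise ValueError("corrupt length field at offset %d" % off)
        end = off + 8 + length + 4
        if end > len(buf):
            return records, off            # partial record: truncate here
        data = buf[off + 8 : off + 8 + length]
        (data_crc,) = struct.unpack_from("<I", buf, off + 8 + length)
        if data_crc != zlib.crc32(data):
            raise ValueError("corrupt data at offset %d" % off)
        records.append(data)
        off = end
    return records, off
```

A caller would truncate the file to `truncate_to` and resume appending, which matches the goal of continuing to append block-deletion records after startup.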
