Oh, one thing I forgot to mention with this approach. Extending the PBC format to a V2 would require some level of backwards compatibility. The way I was thinking about dealing w/ this is that new files would be created in V2 format, including the ability to truncate log block manager container metadata files and therefore tolerate a trailing partial record in metadata files. The current PBC V1 format would continue to be supported, just not created by default anymore, and V1 files would continue to be affected by KUDU-1377.
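Roughly, the reader could choose both the record layout and the partial-tail policy from the version field in the header. Here is a minimal C++ sketch that assumes an 8-byte magic followed by a little-endian uint32 version. The magic value is a placeholder, not Kudu's real one, and `ParseHeader`, `RecordLayout`, and `TailPolicy` are hypothetical names, not existing pb_util.h APIs:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Placeholder magic; NOT the real value used by Kudu's pb_util.
constexpr char kMagic[] = "PBCMAGIC";
constexpr size_t kMagicLen = 8;

// What to do when the file ends in the middle of a record.
enum class TailPolicy { kFailOnPartialRecord, kTruncatePartialRecord };

struct RecordLayout {
  uint32_t version;
  size_t record_header_len;  // bytes that precede <data bytes> in a record
  TailPolicy tail_policy;
};

// Assumes fixed32 fields are stored little-endian.
uint32_t DecodeFixed32LE(const unsigned char* p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Reads <magic number><container version> and maps the version to a layout.
std::optional<RecordLayout> ParseHeader(const std::string& file) {
  if (file.size() < kMagicLen + 4) return std::nullopt;
  if (std::memcmp(file.data(), kMagic, kMagicLen) != 0) return std::nullopt;
  const uint32_t version = DecodeFixed32LE(
      reinterpret_cast<const unsigned char*>(file.data()) + kMagicLen);
  switch (version) {
    case 1:
      // V1: <length><data><data checksum>. A partial tail stays fatal,
      // so V1 files are still affected by KUDU-1377.
      return RecordLayout{1, 4, TailPolicy::kFailOnPartialRecord};
    case 2:
      // V2: <length><length checksum><data><data checksum>.
      return RecordLayout{2, 8, TailPolicy::kTruncatePartialRecord};
    default:
      return std::nullopt;  // unknown (e.g. newer) version
  }
}
```

Under this scheme, new files would be written with version 2, and old V1 files would still parse, just without the truncation behavior.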
For people who want to migrate their existing metadata to the V2 format, we could provide a migration tool that converts V1 metadata files to V2, or possibly even upgrade to the new format automatically on startup. However, if someone ever wanted to roll back to an old version of Kudu for some reason, they would not be able to do so after migrating to the new PBC version.

Mike

On Tue, Apr 5, 2016 at 12:14 AM, Mike Percy wrote:
> I'm working on solving an issue in the Log Block Manager (LBM) filed as
> KUDU-1377. In this case, writing a partial record to the end of a container
> metadata file can cause a failure to restart Kudu. One possible way for
> this to happen is to run out of disk space when appending a block id to
> this metadata file. I wanted to discuss a potential fix for this issue.
>
> The LBM uses the protobuf container (PBC) file format documented in
> pb_util.h for the container metadata file. The current version of this
> format (V1) looks like this:
>
> <magic number>
> <container version>
> repeated <record>
>
> with each <record> looking like the following:
>
> <uint32 data length>
> <data bytes>
> <data checksum>
>
> In the LBM, each time we create a new block we append the block id to the
> metadata file. On startup, we verify that all records in the file are
> valid. If not, we print to the log and exit.
>
> In the case of a full disk, we will have written a partial record to the
> metadata file and at startup we will fail validation; however, we should be
> able to detect this case and ignore the partial record on startup. Because
> we still need to support deleting blocks, we need to be able to continue
> appending to this metadata file after startup, so we also need to truncate
> the file to the last good record when this occurs.
>
> Here is what I am thinking about to fix this issue:
>
> 1. When we are reading a container metadata file at startup, if we detect
> that there is a trailing record that is too short to fit a valid record
> (relative to the length of the file), then we truncate the last partial
> record from the file and continue as normal.
>
> 2. To avoid truncating "good" records in the case that there is data
> corruption in one of the length fields, we also need to extend the PBC
> format to add a checksum for the record length. So a record would now look
> like the following:
>
> <uint32 data length>
> <length checksum>
> <data bytes>
> <data checksum>
>
> Does anyone see any drawbacks to this approach?
>
> If you made it this far, thanks for reading.
>
> Mike
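Here is a rough C++ sketch of the startup check from the quoted proposal. It makes several assumptions: fixed32 fields are little-endian, both checksums are CRC-32C (computed with a slow bitwise stand-in rather than Kudu's crc32c library), and the scan starts just past the file header. `AppendRecordV2`, `ScanRecordsV2`, and the other names are hypothetical, not existing pb_util.h functions. The sketch shows the distinction that the length checksum buys. A trustworthy length that runs past EOF means a partial trailing record, which is safe to truncate at `valid_len`. A length checksum mismatch means corruption, so the scan stops instead of truncating.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bitwise CRC-32C (Castagnoli); a slow stand-in for a real crc32c library.
uint32_t Crc32c(const unsigned char* data, size_t n) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < n; ++i) {
    crc ^= data[i];
    for (int k = 0; k < 8; ++k) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

const unsigned char* Bytes(const std::string& s) {
  return reinterpret_cast<const unsigned char*>(s.data());
}

void PutFixed32LE(std::string* out, uint32_t v) {
  for (int i = 0; i < 4; ++i) {
    out->push_back(static_cast<char>((v >> (8 * i)) & 0xFFu));
  }
}

uint32_t GetFixed32LE(const std::string& s, size_t off) {
  const unsigned char* p = Bytes(s) + off;
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Appends <uint32 data length><length checksum><data bytes><data checksum>.
void AppendRecordV2(std::string* file, const std::string& data) {
  std::string len;
  PutFixed32LE(&len, static_cast<uint32_t>(data.size()));
  file->append(len);
  PutFixed32LE(file, Crc32c(Bytes(len), len.size()));
  file->append(data);
  PutFixed32LE(file, Crc32c(Bytes(data), data.size()));
}

enum class ScanResult { kOk, kTruncateTail, kCorruption };

struct ScanOutcome {
  ScanResult result;
  size_t valid_len;                  // offset just past the last good record
  std::vector<std::string> records;  // data of every good record
};

// Scans V2 records starting at `start` (just past the file header).
ScanOutcome ScanRecordsV2(const std::string& file, size_t start) {
  constexpr size_t kHeaderLen = 8;   // <length> + <length checksum>
  constexpr size_t kTrailerLen = 4;  // <data checksum>
  ScanOutcome out{ScanResult::kOk, start, {}};
  size_t off = start;
  while (off < file.size()) {
    const size_t remaining = file.size() - off;
    if (remaining < kHeaderLen) {
      // Not even a full length + length checksum: partial trailing record.
      out.result = ScanResult::kTruncateTail;
      return out;
    }
    const uint32_t len = GetFixed32LE(file, off);
    if (Crc32c(Bytes(file) + off, 4) != GetFixed32LE(file, off + 4)) {
      // The length itself is damaged and can't be trusted: don't truncate.
      out.result = ScanResult::kCorruption;
      return out;
    }
    if (remaining - kHeaderLen < static_cast<size_t>(len) + kTrailerLen) {
      // A trusted length that runs past EOF: partial trailing record.
      out.result = ScanResult::kTruncateTail;
      return out;
    }
    const size_t data_off = off + kHeaderLen;
    if (Crc32c(Bytes(file) + data_off, len) !=
        GetFixed32LE(file, data_off + len)) {
      out.result = ScanResult::kCorruption;
      return out;
    }
    out.records.push_back(file.substr(data_off, len));
    off = data_off + len + kTrailerLen;
    out.valid_len = off;
  }
  return out;
}
```

On `kTruncateTail`, the caller would truncate the file to `valid_len` (for example with POSIX `ftruncate`) before reopening it for appends. On `kCorruption`, it would keep today's behavior of logging and exiting.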
