Perhaps not related to the issue exactly, but there is a pretty good slide
deck on CRCs here:
https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf.
Starting on slide 38 it has some do's and don'ts, including:
1) Use a CRC over the length field and then the data field; otherwise a
1-bit error in the length field may go undetected. Adar also brought this
up.
2) Use a seed other than 0. This prevents zeroed runs in the file from
passing the CRC check.
3) "Be careful with bit ordering" - I'm not sure exactly how this applies
to our use case.
4) The slide deck doesn't say this, but perhaps we should be chaining the
CRCs by using the previous CRC as the seed for the next. I'm still trying
to find some evidence that this is a beneficial thing to do.

It's probably worth thinking about these things if we are going to revise
the format.

Back to the issue at hand, I think it may help to enumerate the possible
failure scenarios. Here are the failures I can think of:

1) Header magic number does not match.
2) Container version is not recognized.
3) Record length field is 0 (perhaps this isn't an error).
4) Record length field is greater than the remaining bytes in the file.
5) Record length CRC does not match (if we go ahead and add a CRC for the
length).
6) Record data CRC does not match.

From what I understand, it's #4 that is the motivator behind KUDU-1377,
right?

- Dan

On Tue, Apr 5, 2016 at 1:07 AM, Mike Percy <[email protected]> wrote:

> It looks to me like the WAL suffers from an issue very similar to
> KUDU-1377, where we could be more forgiving when a partial record is
> written. The WAL will rebuild a segment footer for you, but it won't
> ignore a partial record. However it's less likely because the WAL uses
> preallocation.
>
> Mike
>
> On Tue, Apr 5, 2016 at 12:38 AM, Adar Dembo <[email protected]> wrote:
>
> > Quick note: the existing checksum is actually of the length AND the
> > bytes, not just the bytes. But that doesn't help solve your problem,
> > because the checksum follows the data, which means if the length is
> > corrupted, you don't really know where the checksum lies.
> >
> > Isn't the solution you're describing effectively the same as what's
> > described in KUDU-668? That closely mirrors what the WAL does, which
> > I'd argue is a good thing (since it's tried and true).
> >
> > On Tue, Apr 5, 2016 at 12:14 AM, Mike Percy <[email protected]> wrote:
> >
> > > I'm working on solving an issue in the Log Block Manager (LBM) filed
> > > as KUDU-1377. In this case, writing a partial record to the end of a
> > > container metadata file can cause a failure to restart Kudu. One
> > > possible way for this to happen is to run out of disk space when
> > > appending a block id to this metadata file. I wanted to discuss a
> > > potential fix for this issue.
> > >
> > > The LBM uses the protobuf container (PBC) file format documented in
> > > pb_util.h for the container metadata file. The current version of
> > > this format (V1) looks like this:
> > >
> > > <magic number>
> > > <container version>
> > > repeated <record>
> > >
> > > with each <record> looking like the following:
> > >
> > > <uint32 data length>
> > > <data bytes>
> > > <data checksum>
> > >
> > > In the LBM, each time we create a new block we append the block id
> > > to the metadata file. On startup, we verify that all records in the
> > > file are valid. If not, we print to the log and exit.
> > >
> > > In the case of a full disk, we will have written a partial record to
> > > the metadata file and at startup we will fail validation; however,
> > > we should be able to detect this case and ignore the partial record
> > > on startup. Because we still need to support deleting blocks, we
> > > need to be able to continue appending to this metadata file after
> > > startup, so we also need to truncate the file to the last good
> > > record when this occurs.
> > >
> > > Here is what I am thinking about to fix this issue:
> > >
> > > 1.
> > > When we are reading a container metadata file at startup, if we
> > > detect that there is a trailing record that is too short to fit a
> > > valid record (relative to the length of the file), then we truncate
> > > the last partial record from the file and continue as normal.
> > >
> > > 2. To avoid truncating "good" records in the case that there is
> > > data corruption in one of the length fields, we also need to extend
> > > the PBC format to add a checksum for the record length. So a record
> > > would now look like the following:
> > >
> > > <uint32 data length>
> > > <length checksum>
> > > <data bytes>
> > > <data checksum>
> > >
> > > Does anyone see any drawbacks to this approach?
> > >
> > > If you made it this far, thanks for reading.
> > >
> > > Mike
