7) After the header or final record, there are more than 0 bytes remaining in the file, but not enough for the length and CRC(length) fields.
On Tue, Apr 5, 2016 at 10:34 AM, Dan Burkert <[email protected]> wrote: > Perhaps not related to the issue exactly, but there is a pretty good slide > deck on CRCs here: > https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf. > Starting on slide 38 it has some do's and dont's, including: > > 1) Use a CRC over the length field, and then the data field, otherwise a 1 > bit error in the length field may go undetected. Adar also brought this up. > 2) Use a seed other than 0. This prevents zeroed runs in the file from > passing the CRC check. > 3) "Be careful with bit ordering" - I'm not sure exactly how this applies > to our use case. > 4) The slide deck doesn't say this - but perhaps we should be chaining the > CRCs by using the previous CRC as the seed for the next. I'm still trying > to find some evidence that this is a beneficial thing to do. > > It's probably worth thinking about these things if we are going to revise > the format. > > Back to the issue at hand, I think it may help to enumerate the possible > failure scenarios. Here are the failures I can think of: > > 1) Header magic number does not match > 2) Container version is not recognized > 3) Record length field is 0 (perhaps this isn't an error). > 4) Record length field is greater than the remaining bytes in the file. > 5) Record length CRC does not match (if we go ahead and add a CRC for the > length). > 6) Record data CRC does not match. > > From what I understand, it's #4 that is the motivator behind KUDU-1377, > right? > > - Dan > > On Tue, Apr 5, 2016 at 1:07 AM, Mike Percy <[email protected]> wrote: > >> It looks to me like the WAL suffers from an issue very similar to >> KUDU-1377, where we could be more forgiving when a partial record is >> written. The WAL will rebuild a segment footer for you, but it won't >> ignore >> a partial record. However it's less likely because the WAL uses >> preallocation. >> >> Mike >> >> On Tue, Apr 5, 2016 at 12:38 AM, Adar Dembo <[email protected]> wrote: >> >> > Quick note: the existing checksum is actually of the length AND the >> bytes, >> > not just the bytes. But that doesn't help solve your problem, because >> the >> > checksum follows the data, which means if the length is corrupted, you >> > don't really know where the checksum lies. >> > >> > Isn't the solution you're describing effectively the same as what's >> > described in KUDU-668? That closely mirrors what the WAL does, which I'd >> > argue is a good thing (since it's tried and true). >> > >> > On Tue, Apr 5, 2016 at 12:14 AM, Mike Percy <[email protected]> wrote: >> > >> > > I'm working on solving an issue in the Log Block Manager (LBM) filed >> as >> > > KUDU-1377. In this case, writing a partial record to the end of a >> > container >> > > metadata file can cause a failure to restart Kudu. One possible >> wversay >> > for >> > > this to happen is to run out of disk space when appending a block id >> to >> > > this metadata file. I wanted to discuss a potential fix for this >> issue. >> > > >> > > The LBM uses the protobuf container (PBC) file format documented in >> > > pb_util.h for the container metadata file. The current version of this >> > > format (V1) looks like this: >> > > >> > > >> > > <magic number> >> > > <container version> >> > > repeated <record> >> > > >> > > >> > > with each <record> looking like the following: >> > > >> > > <uint32 data length> >> > > <data bytes> >> > > <data checksum> >> > > >> > > >> > > In the LBM, each time we create a new block we append the block id to >> the >> > > metadata file. On startup, we verify that all records in the file are >> > > valid. If not, we print to the log and exit. >> > > >> > > In the case of a full disk, we will have written a partial record to >> the >> > > metadata file and at startup we will fail validation, however we >> should >> > be >> > > able to detect this case and ignore the partial record on startup. >> > Because >> > > we still need to support deleting blocks, we need to be able to >> continue >> > > appending to this metadata file after startup, so we also need to >> > truncate >> > > the file to the last good record when this occurs. >> > > >> > > Here is what I am thinking about to fix this issue: >> > > >> > > 1. When we are reading a container metadata file at startup, if we >> detect >> > > that there is a trailing record that is too short to fit a valid >> record >> > > (relative to the length of the file) then we truncate the last partial >> > > record from the file and continue as normal. >> > > >> > > 2. To avoid truncating "good" records in the case that there is data >> > > corruption in one of the length fields, we also need to extend the PBC >> > > format to add a checksum for the record length. So a record would now >> > look >> > > like the following: >> > > >> > > <uint32 data length> >> > > >> > > <length checksum> >> > > >> > > <data bytes> >> > > <data checksum> >> > > >> > > >> > > Does anyone see any drawbacks to this approach? >> > > >> > > If you made it this far, thanks for reading. >> > > >> > > Mike >> > > >> > >> > >
