Re: Durability of appendable metadata for Log Block Manager

Dan Burkert Tue, 05 Apr 2016 10:44:36 -0700

7) After the header or final record, there are more than 0 bytes remaining
in the file, but not enough for the length and CRC(length) fields.


On Tue, Apr 5, 2016 at 10:34 AM, Dan Burkert <[email protected]> wrote:

> Perhaps not related to the issue exactly, but there is a pretty good slide
> deck on CRCs here:
> https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf.
> Starting on slide 38 it has some do's and dont's, including:
>
> 1) Use a CRC over the length field, and then the data field, otherwise a 1
> bit error in the length field may go undetected.  Adar also brought this up.
> 2) Use a seed other than 0. This prevents zeroed runs in the file from
> passing the CRC check.
> 3) "Be careful with bit ordering" - I'm not sure exactly how this applies
> to our use case.
> 4) The slide deck doesn't say this - but perhaps we should be chaining the
> CRCs by using the previous CRC as the seed for the next. I'm still trying
> to find some evidence that this is a beneficial thing to do.
>
> It's probably worth thinking about these things if we are going to revise
> the format.
>
> Back to the issue at hand, I think it may help to enumerate the possible
> failure scenarios.  Here are the failures I can think of:
>
> 1) Header magic number does not match
> 2) Container version is not recognized
> 3) Record length field is 0 (perhaps this isn't an error).
> 4) Record length field is greater than the remaining bytes in the file.
> 5) Record length CRC does not match (if we go ahead and add a CRC for the
> length).
> 6) Record data CRC does not match.
>
> From what I understand, it's #4 that is the motivator behind KUDU-1377,
> right?
>
> - Dan
>
> On Tue, Apr 5, 2016 at 1:07 AM, Mike Percy <[email protected]> wrote:
>
>> It looks to me like the WAL suffers from an issue very similar to
>> KUDU-1377, where we could be more forgiving when a partial record is
>> written. The WAL will rebuild a segment footer for you, but it won't
>> ignore
>> a partial record. However it's less likely because the WAL uses
>> preallocation.
>>
>> Mike
>>
>> On Tue, Apr 5, 2016 at 12:38 AM, Adar Dembo <[email protected]> wrote:
>>
>> > Quick note: the existing checksum is actually of the length AND the
>> bytes,
>> > not just the bytes. But that doesn't help solve your problem, because
>> the
>> > checksum follows the data, which means if the length is corrupted, you
>> > don't really know where the checksum lies.
>> >
>> > Isn't the solution you're describing effectively the same as what's
>> > described in KUDU-668? That closely mirrors what the WAL does, which I'd
>> > argue is a good thing (since it's tried and true).
>> >
>> > On Tue, Apr 5, 2016 at 12:14 AM, Mike Percy <[email protected]> wrote:
>> >
>> > > I'm working on solving an issue in the Log Block Manager (LBM) filed
>> as
>> > > KUDU-1377. In this case, writing a partial record to the end of a
>> > container
>> > > metadata file can cause a failure to restart Kudu. One possible
>> wversay
>> > for
>> > > this to happen is to run out of disk space when appending a block id
>> to
>> > > this metadata file. I wanted to discuss a potential fix for this
>> issue.
>> > >
>> > > The LBM uses the protobuf container (PBC) file format documented in
>> > > pb_util.h for the container metadata file. The current version of this
>> > > format (V1) looks like this:
>> > >
>> > >
>> > > <magic number>
>> > > <container version>
>> > > repeated <record>
>> > >
>> > >
>> > > with each <record> looking like the following:
>> > >
>> > > <uint32 data length>
>> > > <data bytes>
>> > > <data checksum>
>> > >
>> > >
>> > > In the LBM, each time we create a new block we append the block id to
>> the
>> > > metadata file. On startup, we verify that all records in the file are
>> > > valid. If not, we print to the log and exit.
>> > >
>> > > In the case of a full disk, we will have written a partial record to
>> the
>> > > metadata file and at startup we will fail validation, however we
>> should
>> > be
>> > > able to detect this case and ignore the partial record on startup.
>> > Because
>> > > we still need to support deleting blocks, we need to be able to
>> continue
>> > > appending to this metadata file after startup, so we also need to
>> > truncate
>> > > the file to the last good record when this occurs.
>> > >
>> > > Here is what I am thinking about to fix this issue:
>> > >
>> > > 1. When we are reading a container metadata file at startup, if we
>> detect
>> > > that there is a trailing record that is too short to fit a valid
>> record
>> > > (relative to the length of the file) then we truncate the last partial
>> > > record from the file and continue as normal.
>> > >
>> > > 2. To avoid truncating "good" records in the case that there is data
>> > > corruption in one of the length fields, we also need to extend the PBC
>> > > format to add a checksum for the record length. So a record would now
>> > look
>> > > like the following:
>> > >
>> > > <uint32 data length>
>> > >
>> > > <length checksum>
>> > >
>> > > <data bytes>
>> > > <data checksum>
>> > >
>> > >
>> > > Does anyone see any drawbacks to this approach?
>> > >
>> > > If you made it this far, thanks for reading.
>> > >
>> > > Mike
>> > >
>> >
>>
>
>

Re: Durability of appendable metadata for Log Block Manager

Reply via email to