VirtNorFlashDxe: fix corruption + misc small improvements

Ard Biesheuvel Mon, 15 Jan 2024 09:57:03 -0800

On Mon, 15 Jan 2024 at 11:21, Laszlo Ersek <[email protected]> wrote:
>
> On 1/12/24 12:37, Gerd Hoffmann wrote:
> > This is a little series containing the flash corruption fix sent
> > yesterday with a slightly improved commit message and some small
> > improvements on top of this.
> >
> > Gerd Hoffmann (4):
> >   OvmfPkg/VirtNorFlashDxe: fix shadowbuffer reads
> >   OvmfPkg/VirtNorFlashDxe: clarify block write logic
> >   OvmfPkg/VirtNorFlashDxe: allow larger writes without block erase
> >   OvmfPkg/VirtNorFlashDxe: ValidateFvHeader: unwritten state is EOL too
> >
> >  OvmfPkg/VirtNorFlashDxe/VirtNorFlash.c    | 33 +++++++++++------------
> >  OvmfPkg/VirtNorFlashDxe/VirtNorFlashFvb.c |  5 ++++
> >  2 files changed, 21 insertions(+), 17 deletions(-)
> >
>
> Looking at the original code makes me throw a fit (no offense -- I don't
> know who wrote it, and I don't want to check).
>


Hi Laszlo,

I am not the author of the original code, but I suppose I should take
at least some of the blame here, having added some of the logic to
reduce the number of MMIO accesses (which are disproportionately
expensive under virtualization), and this is where the bug got
introduced afaict.

> There is not a single diagram in the code, when that would be central to
> the whole thing.
>
>
>     0               128              256
>     [----------------|----------------]
>     ^         ^             ^
>     |         |             |
>     |         |     (Offset & 0x7F) + NumBytes; i.e., the Offset inside
>     |         |     (or just past) the *double-word* such that Offset is
>     |         |     the *exclusive* end of the (logical) update
>     |         |
>     |         Offset & 0x7F; i.e., Offset within the "word";
>     |         this is where the (logical) update is supposed to start
>     |
>     Offset & ~(UINTN)0x7F; i.e., Offset truncated to "word" boundary
>
> In this diagram, NumBytes is already limited to 256; that's because of
> the existing condition
>
>    if ((*NumBytes + (Offset & BOUNDARY_OF_32_WORDS)) <= (2 * P30_MAX_BUFFER_SIZE_IN_BYTES)) {
>
> So, independently of the bug in the code that this series is supposed to
> fix, some problems with the original code:
>
> - no diagram (see above)
>
> - rampant duplication of hard to understand expressions, such as:
>
>   - Offset & ~BOUNDARY_OF_32_WORDS
>
>     (side comment: applying the bit-neg on a *signed integer* deserves
>     its own brown paper bag)
>
>   - *NumBytes + (Offset & BOUNDARY_OF_32_WORDS)
>
>   - Offset & ~BOUNDARY_OF_32_WORDS
>
> - more bit-neg applied to a *signed integer*:
>
>   ~OrigData[CurOffset]
>
>     because OrigData[CurOffset] is a UINT8, which gets promoted to
>     INT32, and that's when the bit-neg is applied
>
> - when the second word write is deemed necessary, then the
>   *BlockAddress* variable is bumped by 128 bytes out of laziness for
>   said second write -- and that is a *semantic wreck*. The BlockAddress
>   does not change *at all*; it's the start offset within the block that
>   increases by 128 bytes for the second word write.
>
> - The weird Exit and DoErase labels are fugly. The function should
>   either be split into two functions, or at least reorganized with "ifs"
>   such that this jumping is not necessary. Gotos are fine, but only for
>   error paths / cleanup on exit, not for business logic selection. IOW,
>   the main offender is DoErase.
>

Agree with all of these points.
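
To make the diagram and the bit-neg remarks concrete, here is a minimal
sketch in plain C, not the driver code. It assumes a 128-byte "word"
as in the diagram; write_span, describe_buffered_write, negate_promoted
and negate_u8 are hypothetical names that do not exist in
VirtNorFlashDxe. The three quantities from the diagram are computed
once, up front, with an unsigned mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one write-buffer "word" is 128 bytes, as in the diagram. */
#define WORD_SIZE  ((size_t)128)
#define WORD_MASK  (WORD_SIZE - 1)  /* unsigned, so ~WORD_MASK has no sign bit */

/* Hypothetical helper type holding the three points of the diagram. */
typedef struct {
  size_t word_start;     /* Offset truncated to a word boundary            */
  size_t start_in_word;  /* where the logical update starts in that word   */
  size_t end_in_words;   /* exclusive end, relative to word_start          */
} write_span;

/* Only meaningful on the buffered path, i.e. at most two words touched. */
static write_span
describe_buffered_write (size_t offset, size_t num_bytes)
{
  write_span span;

  span.word_start    = offset & ~WORD_MASK;
  span.start_in_word = offset & WORD_MASK;
  span.end_in_words  = span.start_in_word + num_bytes;

  /* Mirrors the existing "<= 2 * P30_MAX_BUFFER_SIZE_IN_BYTES" condition. */
  assert (span.end_in_words <= 2 * WORD_SIZE);
  return span;
}

/* ~ applied to a uint8_t acts on the promoted int: the result is negative. */
static int
negate_promoted (uint8_t value)
{
  return ~value;
}

/* Casting back to uint8_t keeps only the low eight bits. */
static uint8_t
negate_u8 (uint8_t value)
{
  return (uint8_t)~value;
}
```

With Offset 130 and NumBytes 100 this gives a word start of 128, a
start-in-word of 2 and an exclusive end of 102. negate_promoted (0x0F)
is -16 on a two's complement machine, while negate_u8 (0x0F) is 0xF0.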

>
> Then comments on the patch set:
>
> - In my opinion, the series should progress in opposite order. First
>   introduce a diagram (!), then refactor with the helper variables, and
>   then fix the bug. With the refactoring in place *first*, the bugfix
>   should be easier to understand. Then, potentially, generalize the code
>   to larger-than-two multiples of a word, for writes.
>
> - The first patch in the series is wrong.
>
>   In case we need not erase the whole flash block, we will want to write
>   one or two (consecutive) 128-byte "words". That is, 128 bytes, or 256
>   bytes. That means we need to read the exact same byte counts as well.
>
>   The *second* patch in the series actually seems to do this, with
>
>     End   = (Offset + *NumBytes + BOUNDARY_OF_32_WORDS) & ~BOUNDARY_OF_32_WORDS;
>
>   (This *in itself* would *much better* be written as follows:
>
>     End = ALIGN_VALUE (Offset + *NumBytes, P30_MAX_BUFFER_SIZE_IN_BYTES);
>
>   but I digress.)
>
>   However, the first patch still introduces:
>
>     (((Offset & BOUNDARY_OF_32_WORDS) + *NumBytes) | BOUNDARY_OF_32_WORDS) + 1
>
>   as the byte count for the read.
>
>   Unfortunately, the "saturation logic" (i.e., OR-ing 0x7F to the
>   exclusive end offset, for "seeking" to the end of the word), and then
>   adding 1, does not implement a correct "align-up" operation.
>
>   Consider
>
>     Offset == 0 && *NumBytes == 256
>
>   This circumstance is *valid* for the optimization path (and it is
>   correctly permitted by the top-most check).
>
>   But the expression introduced by patch#1 produces *384* for it, which
>   is wrong.
>
>   Similarly, given (for example)
>
>     Offset == 1 && *NumBytes == 127
>
>   the formula from patch#1 evaluates to 256.
>
>   The expression does not consider the case when the exclusive end
>   offset of the requested (logical) write is exactly at a word
>   boundary (i.e., a multiple of 128). In that case, saturating
>   with the bit-or and then adding 1 is wrong, because no
>   additional word should be read at all.
>
>   So the first patch in the series replaces the *pre-series* bug with a
>   different (less harmful) bug, and then the second patch silently
>   *fixes* the replacement bug.
>
> - This is in fact the fundamental bug: the incorrect implementation of
>   the "align-up" operation with "saturate, then add 1". Both the
>   pre-series code, and the code in patch#1, contain this mistake.
>
>   The only thing that patch#1 changes is the *input*, to which the
>   (incorrect) operation is applied -- namely in patch#1, the *input*
>   changes from "NumBytes" to "exclusive end offset of the logical write,
>   relative to the start of the (double-)word".
>
>   That input change is in fact good (and *necessary*), but it's not
>   *sufficient*. The operation itself needs to be fixed.
>
> Summary:
>
> - please rewrite the series in the following order: refactoring, then
>   bugfix, then further armoring (additional sanity checks).
>
> - please only ever apply the bit-neg operator on values that are UINT32,
>   UINTN, or UINT64. Otherwise we get sign bit flipping, and that's
>   terrible. (Most people are not even aware of it happening.)
>
> - bit-fiddling should be kept to the absolute minimum. This means both a
>   need for helper variables (calculated as early as possible), and usage
>   of macros such as ALIGN_VALUE rather than open-coded logic.
>
> It's possible that the refactoring in patch#2 is effectively impossible
> to do without fixing the *pre-series* bug at once. That's fine, as long
> as we point out the bug in the commit message.
>
> Importantly, the commit message should provide an actual (Offset,
> *NumBytes) tuple (an example) where the pre-series expression
>
>   (*NumBytes | BOUNDARY_OF_32_WORDS) + 1
>
> calculates a bogus byte count for the read.
>
> IOW, there are two things to highlight in the commit message:
>
> - round-up operation incorrectly implemented,
>
> - wrong input provided to the (already incorrect) round-up operation.
>

Thanks for taking the time to review this series as well as the existing code.

I agree with all of this, and I feel responsible for the current state
to some extent, so I will make time to get this fixed.
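
For the commit message, the two defects can be checked with a small
standalone sketch like the one below. This is plain C, not the driver
code: saturate_plus_one, align_up, end_in_words and pre_series_count
are hypothetical names, the word size is assumed to be 128 bytes, and
align_up is only meant to behave like ALIGN_VALUE for power-of-two
alignments. It also assumes the read starts at the word boundary at or
below Offset.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: one write-buffer "word" is 128 bytes. */
#define WORD_SIZE  ((size_t)128)
#define WORD_MASK  (WORD_SIZE - 1)

/* "Saturate, then add 1": the operation the review identifies as wrong. */
static size_t
saturate_plus_one (size_t value)
{
  return (value | WORD_MASK) + 1;
}

/* Conventional align-up; alignment must be a non-zero power of two. */
static size_t
align_up (size_t value, size_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

/* Exclusive end of the logical write, relative to its word start. */
static size_t
end_in_words (size_t offset, size_t num_bytes)
{
  return (offset & WORD_MASK) + num_bytes;
}

/* Pre-series byte count: wrong operation applied to the wrong input. */
static size_t
pre_series_count (size_t num_bytes)
{
  return saturate_plus_one (num_bytes);
}
```

For (Offset, NumBytes) == (0, 256) the saturating form yields 384 where
align_up yields 256, and for (1, 127) it yields 256 where align_up
yields 128. As one candidate tuple for the pre-series expression,
(127, 2) gives a count of 128 although the write crosses into the
second word, so 256 bytes would be needed.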

Gerd, if you are up for doing some of the work too and see a
meaningful split that would allow us to spread the load, feel free to
throw some of it my way. Otherwise, I will put it on my TODO list, and
I will get to it before the end of the month.


