On 1/12/24 12:37, Gerd Hoffmann wrote:
> This is a little series containing the flash corruption fix sent
> yesterday with an slightly improved commit message and some small
> improvements on top of this.
>
> Gerd Hoffmann (4):
>   OvmfPkg/VirtNorFlashDxe: fix shadowbuffer reads
>   OvmfPkg/VirtNorFlashDxe: clarify block write logic
>   OvmfPkg/VirtNorFlashDxe: allow larger writes without block erase
>   OvmfPkg/VirtNorFlashDxe: ValidateFvHeader: unwritten state is EOL too
>
>  OvmfPkg/VirtNorFlashDxe/VirtNorFlash.c    | 33 +++++++++++------------
>  OvmfPkg/VirtNorFlashDxe/VirtNorFlashFvb.c |  5 ++++
>  2 files changed, 21 insertions(+), 17 deletions(-)
>

Looking at the original code makes me throw a fit (no offense -- I don't
know who wrote it, and I don't want to check).

There is not a single diagram in the code, when that would be central to
the whole thing.


    0               128              256
    [----------------|----------------]
    ^         ^             ^
    |         |             |
    |         |     (Offset & 0x7F) + NumBytes; i.e., the Offset inside
    |         |     (or just past) the *double-word* such that Offset is
    |         |     the *exclusive* end of the (logical) update
    |         |
    |         Offset & 0x7F; i.e., Offset within the "word";
    |         this is where the (logical) update is supposed to start
    |
    Offset & ~(UINTN)0x7F; i.e., Offset truncated to "word" boundary

In this diagram, NumBytes is already limited to 256; that's because of
the existing condition

   if ((*NumBytes + (Offset & BOUNDARY_OF_32_WORDS)) <= (2 * P30_MAX_BUFFER_SIZE_IN_BYTES)) {
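
For illustration only, the three quantities from the diagram could be
computed once, up front, as helper variables. This is a minimal sketch, not
the driver's code: WRITE_SPAN, ComputeWriteSpan, FitsInTwoWords, WORD_SIZE
and WORD_MASK are hypothetical names, UINTN is a stand-in typedef, and the
word size of 128 bytes (mask 0x7F) is taken from the diagram.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the edk2 definitions; the values follow the diagram. */
typedef uintptr_t UINTN;
#define WORD_SIZE  ((UINTN)128)     /* plays the role of P30_MAX_BUFFER_SIZE_IN_BYTES */
#define WORD_MASK  (WORD_SIZE - 1)  /* 0x7F, but unsigned */

/* Hypothetical helper struct: the three quantities from the diagram. */
typedef struct {
  UINTN WordStart;        /* Offset & ~(UINTN)0x7F */
  UINTN StartInWord;      /* Offset & 0x7F */
  UINTN EndInDoubleWord;  /* (Offset & 0x7F) + NumBytes, exclusive */
} WRITE_SPAN;

static WRITE_SPAN
ComputeWriteSpan (UINTN Offset, UINTN NumBytes)
{
  WRITE_SPAN Span;

  Span.WordStart       = Offset & ~WORD_MASK;
  Span.StartInWord     = Offset & WORD_MASK;
  Span.EndInDoubleWord = Span.StartInWord + NumBytes;

  /* Sanity check: the two parts of Offset add up to Offset again. */
  assert (Span.WordStart + Span.StartInWord == Offset);
  return Span;
}

/* The existing condition, restated on the helper variable. */
static bool
FitsInTwoWords (WRITE_SPAN Span)
{
  return Span.EndInDoubleWord <= 2 * WORD_SIZE;
}
```

With Offset == 0x1234 and NumBytes == 10, this yields WordStart == 0x1200,
StartInWord == 0x34 and EndInDoubleWord == 0x3E, and the update fits in the
double-word.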

So, independently of the bug in the code that this series is supposed to
fix, some problems with the original code:

- no diagram (see above)

- rampant duplication of hard to understand expressions, such as:

  - Offset & ~BOUNDARY_OF_32_WORDS

    (side comment: applying the bit-neg on a *signed integer* deserves
    its own brown paper bag)

  - *NumBytes + (Offset & BOUNDARY_OF_32_WORDS)

  - Offset & ~BOUNDARY_OF_32_WORDS

- more bit-neg applied to a *signed integer*:

  ~OrigData[CurOffset]

    because OrigData[CurOffset] is a UINT8, which gets promoted to
    INT32, and that's when the bit-neg is applied

- when the second word write is deemed necessary, then the
  *BlockAddress* variable is bumped by 128 bytes out of laziness for
  said second write -- and that is a *semantic wreck*. The BlockAddress
  does not change *at all*; it's the start offset within the block that
  increases by 128 bytes for the second word write.

- The weird Exit and DoErase labels are fugly. The function should
  either be split into two functions, or at least reorganized with "ifs"
  such that this jumping is not necessary. Gotos are fine, but only for
  error paths / cleanup on exit, not for business logic selection. IOW,
  the main offender is DoErase.
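
To make the bit-neg complaint concrete, here is a small sketch of what the
integer promotion does to a UINT8 operand. The function names are
hypothetical and the typedefs are stand-ins for the edk2 types; a two's
complement "int" is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the edk2 types. */
typedef uint8_t   UINT8;
typedef uint32_t  UINT32;

/* Value is promoted to (signed) int first; only then is ~ applied. */
static int
NegatePromoted (UINT8 Value)
{
  return ~Value;
}

/* Casting first keeps the whole operation in unsigned arithmetic. */
static UINT32
NegateUnsigned (UINT8 Value)
{
  return ~(UINT32)Value;
}

static void
CheckBitNeg (void)
{
  assert (NegatePromoted (0x0F) == -16);          /* sign bit got flipped */
  assert (NegateUnsigned (0x0F) == 0xFFFFFFF0u);  /* same bits, unsigned  */
  assert ((UINT8)NegateUnsigned (0x0F) == 0xF0);  /* explicit truncation  */
}
```

The negative intermediate value often goes unnoticed because a later
conversion or mask hides it, which is exactly why casting to an unsigned
type before the bit-neg is the safer habit.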


Then comments on the patch set:

- In my opinion, the series should progress in opposite order. First
  introduce a diagram (!), then refactor with the helper variables, and
  then fix the bug. With the refactoring in place *first*, the bugfix
  should be easier to understand. Then, potentially, generalize the code
  to larger-than-two multiples of a word, for writes.

- The first patch in the series is wrong.

  In case we need not erase the whole flash block, we will want to write
  one or two (consecutive) 128-byte "words". That is, 128 bytes, or 256
  bytes. That means we need to read the exact same byte counts as well.

  The *second* patch in the series actually seems to do this, with

    End   = (Offset + *NumBytes + BOUNDARY_OF_32_WORDS) & ~BOUNDARY_OF_32_WORDS;

  (This *in itself* would *much better* be written as follows:

    End = ALIGN_VALUE (Offset + *NumBytes, P30_MAX_BUFFER_SIZE_IN_BYTES);

  but I digress.)

  However, the first patch still introduces:

    (((Offset & BOUNDARY_OF_32_WORDS) + *NumBytes) | BOUNDARY_OF_32_WORDS) + 1

  as the byte count for the read.

  Unfortunately, the "saturation logic" (i.e., OR-ing 0x7F to the
  exclusive end offset, for "seeking" to the end of the word), and then
  adding 1, does not implement a correct "align-up" operation.

  Consider

    Offset == 0 && *NumBytes == 256

  This circumstance is *valid* for the optimization path (and it is
  correctly permitted by the top-most check).

  But the expression introduced by patch#1 produces *384* for it, which
  is wrong.

  Similarly, given (for example)

    Offset == 1 && *NumBytes == 127

  the formula from patch#1 evaluates to 256.

  The expression does not consider the case when the exclusive end
  offset of the requested (logical) write falls exactly on a word
  boundary (i.e., is a multiple of 128). In that case, saturating
  with the bit-or and then adding 1 is wrong, because no additional
  word should be read at all.

  So the first patch in the series replaces the *pre-series* bug with a
  different (less harmful) bug, and then the second patch silently
  *fixes* the replacement bug.

- This is in fact the fundamental bug: the incorrect implementation of
  the "align-up" operation with "saturate, then add 1". Both the
  pre-series code, and the code in patch#1, contain this mistake.

  The only thing that patch#1 changes is the *input*, to which the
  (incorrect) operation is applied -- namely in patch#1, the *input*
  changes from "NumBytes" to "exclusive end offset of the logical write,
  relative to the start of the (double-)word".

  That input change is in fact good (and *necessary*), but it's not
  *sufficient*. The operation itself needs to be fixed.
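
The two counter-examples above can be checked mechanically. The following
is a sketch under assumptions: SaturateThenAddOne and AlignUp are
hypothetical names, WORD_MASK is an unsigned stand-in for
BOUNDARY_OF_32_WORDS, and ALIGN_VALUE is reproduced on the assumption that
it matches the definition in MdePkg's Base.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t UINTN;            /* stand-in for the edk2 type */

#define WORD_SIZE  ((UINTN)128)
#define WORD_MASK  (WORD_SIZE - 1)

/* Assumed to match ALIGN_VALUE from MdePkg's Base.h. */
#define ALIGN_VALUE(Value, Alignment) \
  ((Value) + (((Alignment) - (Value)) & ((Alignment) - 1)))

/* Byte count as in patch#1: saturate the exclusive end offset, add 1. */
static UINTN
SaturateThenAddOne (UINTN Offset, UINTN NumBytes)
{
  return (((Offset & WORD_MASK) + NumBytes) | WORD_MASK) + 1;
}

/* A proper align-up applied to the same input. */
static UINTN
AlignUp (UINTN Offset, UINTN NumBytes)
{
  return ALIGN_VALUE ((Offset & WORD_MASK) + NumBytes, WORD_SIZE);
}

static void
CheckReadSizes (void)
{
  /* End offset exactly on a word boundary: one word too many. */
  assert (SaturateThenAddOne (0, 256) == 384);
  assert (AlignUp (0, 256) == 256);
  assert (SaturateThenAddOne (1, 127) == 256);
  assert (AlignUp (1, 127) == 128);

  /* Away from a word boundary, the two operations agree. */
  assert (SaturateThenAddOne (1, 100) == 128);
  assert (AlignUp (1, 100) == 128);
}
```

The two operations differ only when the input is already a multiple of
128, which is why the mistake is easy to miss in casual testing.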

Summary:

- please rewrite the series in the following order: refactoring, then
  bugfix, then further armoring (additional sanity checks).

- please only ever apply the bit-neg operator on values that are UINT32,
  UINTN, or UINT64. Otherwise we get sign bit flipping, and that's
  terrible. (Most people are not even aware of it happening.)

- bit-fiddling should be kept to the absolute minimum. This means both a
  need for helper variables (calculated as early as possible), and usage
  of macros such as ALIGN_VALUE rather than open-coded logic.

It's possible that the refactoring in patch#2 is effectively impossible
to do without fixing the *pre-series* bug in the same step. That's fine,
as long as we point out the bug in the commit message.

Importantly, the commit message should provide an actual (Offset,
*NumBytes) tuple (an example) where the pre-series expression

  (*NumBytes | BOUNDARY_OF_32_WORDS) + 1

calculates a bogus byte count for the read.

IOW, there are two things to highlight in the commit message:

- round-up operation incorrectly implemented,

- wrong input provided to the (already incorrect) round-up operation.
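
As a sketch of what such example tuples might look like (these are
illustrative picks, not taken from a bug report), one tuple per point is
checked below. It assumes the read starts at the word-aligned offset;
PreSeriesReadSize and RequiredReadSize are hypothetical helper names, and
WORD_MASK is an unsigned stand-in for BOUNDARY_OF_32_WORDS.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t UINTN;            /* stand-in for the edk2 type */

#define WORD_SIZE  ((UINTN)128)
#define WORD_MASK  (WORD_SIZE - 1)

/* Pre-series byte count for the read: the input is *NumBytes alone. */
static UINTN
PreSeriesReadSize (UINTN NumBytes)
{
  return (NumBytes | WORD_MASK) + 1;
}

/* Bytes that a read starting at the word-aligned offset must cover. */
static UINTN
RequiredReadSize (UINTN Offset, UINTN NumBytes)
{
  UINTN End = (Offset & WORD_MASK) + NumBytes;  /* exclusive end */

  return (End + WORD_MASK) & ~WORD_MASK;        /* align up to a word */
}

static void
CheckPreSeries (void)
{
  /* Wrong input: (127, 2) touches two words, but only one is read. */
  assert (PreSeriesReadSize (2) == 128);
  assert (RequiredReadSize (127, 2) == 256);

  /* Wrong round-up: (0, 128) needs one word, but two are read. */
  assert (PreSeriesReadSize (128) == 256);
  assert (RequiredReadSize (0, 128) == 128);
}
```

The first tuple is the harmful one, because the byte count comes out too
short; the second merely reads more than necessary.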

Thanks
Laszlo


