(snipping liberally)

On 09/25/19 10:13, Marvin Häuser wrote:
> On 24.09.2019 at 22:26, Laszlo Ersek wrote:

>> I'm opposed to enforcing the strict aliasing rules, even though in
>> all code that I write, I either try to conform to them, or at least I
>> seek to be fully conscious of breaking them.
>
> I agree with that, but I often see completely needless violations of
> them. In my opinion, all intentional violations should be documented
> to signal the conscious breakage of the standard, as should be any
> "abuse" of known implementation-defined behaviour.

Agreed!

> I suppose for the amount of in-place parsing done, the Strict Aliasing
> rule should be an exception for these cases, as per the assumptions
> below.

Probably so; from personal experience, I'm basically only drawn to type
punning when there's some binary data to parse into (or via)
structure(s); and in firmware, there are *surprisingly* many sequences /
tables to parse.
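
For illustration, here is a minimal sketch of the copy-out alternative to
punning a byte buffer into a structure. It is plain standard C, not edk2
code: EXAMPLE_HEADER and ParseExampleHeader() are hypothetical names, and
the layout is assumed to match the host's byte order and padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table header; not a real edk2 or UEFI structure. */
typedef struct {
  uint32_t Signature;
  uint16_t Length;
  uint16_t Flags;
} EXAMPLE_HEADER;

/*
 * Copy the header out of the raw bytes instead of dereferencing
 * (EXAMPLE_HEADER *)Buffer. This avoids both the aliasing question and
 * any alignment requirement on Buffer.
 * Returns 0 on success, -1 if the buffer is too short.
 */
int
ParseExampleHeader (
  const uint8_t   *Buffer,
  size_t          BufferSize,
  EXAMPLE_HEADER  *Header
  )
{
  assert (Buffer != NULL && Header != NULL);

  if (BufferSize < sizeof *Header) {
    return -1;
  }
  memcpy (Header, Buffer, sizeof *Header);
  return 0;
}
```

The cost is one extra copy per structure, which is usually negligible
next to the DMA or MMIO transfer that filled the buffer.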

>> Consider the following example. You have a dynamically allocated
>> buffer. You read some data into it, from the network or the disk,
>> using PCI DMA. Let's assume that, among the library function(s) or
>> protocol member(s) that you call, directly or indirectly, for the
>> block read via PCI DMA, there is at least one that:
>> - copies data from a source buffer to a target buffer, using UINT32
>> or UINT64 assignments (for speed),
>
> Honestly, I did not consider this and only had memcpy/memmove in mind.

It may not be an intuitive example.

There are other (similarly practical) instances of this pattern. See
commit 6e2543b01d0c ("ArmVirtualizationPkg: introduce QemuFwCfgLib
instance for DXE drivers", 2015-01-02), for example:

>     [...]
>
>     Because MMIO accesses are costly on KVM/ARM, InternalQemuFwCfgReadBytes()
>     accesses the fw_cfg data register in full words. This speeds up transfers
>     almost linearly.
>
>     [...]
>
> +/**
> +  Reads firmware configuration bytes into a buffer
> +
> +  @param[in] Size    Size in bytes to read
> +  @param[in] Buffer  Buffer to store data into  (OPTIONAL if Size is 0)
> +
> +**/
> +STATIC
> +VOID
> +EFIAPI
> +InternalQemuFwCfgReadBytes (
> +  IN UINTN Size,
> +  IN VOID  *Buffer OPTIONAL
> +  )
> +{
> +  UINTN Left;
> +  UINT8 *Ptr;
> +  UINT8 *End;
> +
> +#ifdef MDE_CPU_AARCH64
> +  Left = Size & 7;
> +#else
> +  Left = Size & 3;
> +#endif
> +
> +  Size -= Left;
> +  Ptr = Buffer;
> +  End = Ptr + Size;
> +
> +#ifdef MDE_CPU_AARCH64
> +  while (Ptr < End) {
> +    *(UINT64 *)Ptr = MmioRead64 (mFwCfgDataAddress);
> +    Ptr += 8;
> +  }
> +  if (Left & 4) {
> +    *(UINT32 *)Ptr = MmioRead32 (mFwCfgDataAddress);
> +    Ptr += 4;
> +  }
> +#else
> +  while (Ptr < End) {
> +    *(UINT32 *)Ptr = MmioRead32 (mFwCfgDataAddress);
> +    Ptr += 4;
> +  }
> +#endif
> +
> +  if (Left & 2) {
> +    *(UINT16 *)Ptr = MmioRead16 (mFwCfgDataAddress);
> +    Ptr += 2;
> +  }
> +  if (Left & 1) {
> +    *Ptr = MmioRead8 (mFwCfgDataAddress);
> +  }
> +}

And then the data read like this from the hypervisor may contain
arbitrary structures for the firmware to parse.

Back to the discussion:

On 09/25/19 10:13, Marvin Häuser wrote:

> However, if we "virtually" treat CopyMem as memmove, the compiler
> compatibility verification would be reduced from all callers to just
> it, i.e. CopyMem must be implemented in a way that, for all supported
> compilers, we can assume the original effective type is "carried
> over", for example with, at worst (which should not be required with
> any sane compiler), char-copying.

As far as I understand, you're saying that, if we ensure that compilers
recognize CopyMem() as similar to memmove(), then we can apply the C
standard (~ the effective type rules) to edk2 too, only replacing
memcpy() / memmove() references in the standard's wording with CopyMem()
references.

Then we could call edk2 conformant once all such data manipulation
boiled down to correct use of CopyMem().

If that's your point, then I agree with it.
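
To make the "at worst char-copying" case concrete, here is a sketch of
what such a fallback could look like. ExampleCopyMem() is a hypothetical
stand-in written in plain standard C; it is not the actual CopyMem()
implementation from any edk2 library instance. It copies through
unsigned char and tolerates overlapping buffers, like memmove().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CopyMem(); overlap-safe like memmove(). */
void *
ExampleCopyMem (
  void        *Destination,
  const void  *Source,
  size_t      Length
  )
{
  unsigned char        *Dst = Destination;
  const unsigned char  *Src = Source;
  size_t               Index;

  assert (Length == 0 || (Dst != NULL && Src != NULL));

  /*
   * Compare the addresses as integers: relational comparison of
   * pointers into different objects is undefined, while the result of
   * the integer conversion is merely implementation-defined.
   */
  if ((uintptr_t)Dst < (uintptr_t)Src) {
    /* Destination starts below Source: copy forward. */
    for (Index = 0; Index < Length; Index++) {
      Dst[Index] = Src[Index];
    }
  } else if ((uintptr_t)Dst > (uintptr_t)Src) {
    /* Destination starts above Source: copy backward. */
    for (Index = Length; Index > 0; Index--) {
      Dst[Index - 1] = Src[Index - 1];
    }
  }
  return Destination;
}
```

Only this one function would then need to be checked against each
supported compiler, rather than every caller.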

> I'm not looking to have absolute C compliance enforced, but to reduce
> pointless violations and possibly "concentrate" violations for easier
> compatibility verification.

These are very good goals.

(As a digression, consider the following -- very frequent -- pattern:

  EFI_PCI_IO_PROTOCOL *PciIo;

  Status = gBS->OpenProtocol (
                  DeviceHandle,
                  &gEfiPciIoProtocolGuid,
                  (VOID **)&PciIo,
                  This->DriverBindingHandle,
                  DeviceHandle,
                  EFI_OPEN_PROTOCOL_BY_DRIVER
                  );

Technically, the third argument

  (VOID **)&PciIo

is wrong, as pointer-to-structure types (which have identical
representation to each other) need not have the same representation as
pointer-to-void. See C99 "6.2.5 Types", p27:

> A pointer to void shall have the same representation and alignment
> requirements as a pointer to a character type. [...] Similarly,
> pointers to qualified or unqualified versions of compatible types
> shall have the same representation and alignment requirements. All
> pointers to structure types shall have the same representation and
> alignment requirements as each other. All pointers to union types
> shall have the same representation and alignment requirements as each
> other. Pointers to other types need not have the same representation
> or alignment requirements.

The proper way to do it would be:

  VOID                *Interface;
  EFI_PCI_IO_PROTOCOL *PciIo;

  Status = gBS->OpenProtocol (
                  DeviceHandle,
                  &gEfiPciIoProtocolGuid,
                  &Interface,
                  This->DriverBindingHandle,
                  DeviceHandle,
                  EFI_OPEN_PROTOCOL_BY_DRIVER
                  );
  if (!EFI_ERROR (Status)) {
    PciIo = Interface;
  }

Because, in the assignment, the pointer-to-void is *converted* to
pointer-to-structure (not just re-interpreted as), with any necessary
updates to the internal representation.

IIRC, POSIX ultimately added a requirement (beyond the C standard) for
implementations, for said pointer representations to be identical.

So, this is a very frequent violation of the standard; I think it's even
visible in the UEFI spec, in various example code.

Is this violation pointless? It wouldn't be difficult to write all new
code following the second (proper) pattern. Updating all existing sites
would be a nightmare though.

)

Continuing:

On 09/25/19 10:13, Marvin Häuser wrote:

> If nothing else, casting away CONST drastically increases the
> likelihood of misuse happening, as the only indicator for const-ness
> has been dropped.

I agree; before casting away an existing const-qualification, we should
think thrice.

(OTOH, trying to design all function prototypes as tightly as possible,
const-qualifying as many as possible pointed-to objects under the
pointer-typed parameters, tends to become unwieldy really quickly. In my
experience anyway.)

>> It would be nice to remove all toolchains that don't support the
>> flexible array member, and then to replace all struct hacks with the
>> flexible array member. I agree.
>>
>> Unfortunately, there's one extra difficulty (beyond the "expected"
>> regressions in adjusting code for the fixed element at offset 0): the
>> struct hack is used in several places in the UEFI 2.8 spec. So that
>> would have to be updated too.
>
> Agreed. However, I see value in updating the UEFI specification, as it
> should mandate the abstract-ish concept (trailing array of a length not
> known at compile time), not the implementation (struct hack), which in
> this case even is a language standard violation.

It's not that I don't see value in updating the spec -- I generally
don't have time for working on the spec (or for reviewing ECRs)! :)
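
For reference, the difference between the two declarations discussed
above can be sketched as follows. HACK_BLOB, FLEX_BLOB and
AllocateFlexBlob() are hypothetical names in plain standard C; they are
not taken from the UEFI spec or from edk2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Struct hack: one-element trailing array, indexed past its bound. */
typedef struct {
  size_t         Count;
  unsigned char  Data[1];
} HACK_BLOB;

/* C99 flexible array member: the trailing array has no declared bound. */
typedef struct {
  size_t         Count;
  unsigned char  Data[];
} FLEX_BLOB;

/* Allocate a FLEX_BLOB holding a copy of Count bytes; NULL on failure. */
FLEX_BLOB *
AllocateFlexBlob (
  const unsigned char  *Bytes,
  size_t               Count
  )
{
  FLEX_BLOB  *Blob;

  assert (Count == 0 || Bytes != NULL);

  if (Count > SIZE_MAX - sizeof *Blob) {
    return NULL;                      /* size computation would overflow */
  }
  Blob = malloc (sizeof *Blob + Count);
  if (Blob == NULL) {
    return NULL;
  }
  Blob->Count = Count;
  if (Count > 0) {
    memcpy (Blob->Data, Bytes, Count);
  }
  return Blob;
}
```

With the flexible array member, sizeof no longer includes the fixed
element, which is the source of the "expected" regressions mentioned
above: every size computation written for the struct hack has to be
revisited.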

Thanks!
Laszlo
