On Sun, Nov 24, 2024 at 9:23 PM Pierrick Bouvier
<[email protected]> wrote:
> > This one unfortunately shows why the global change is wrong.  The size
> > of _GIOChannel must match between glib and QEMU, otherwise you have an
> > ABI mismatch.
>
> In the codebase, we always use this type as an opaque type (through
> pointer, using library functions to interact with it and never use it's
> size). So the fact we see a different layout is *not* an issue for QEMU.
> I don't see it as a counter example that this does not work.

_GIOChannel is just an example, and in principle macros could be
relying on the layout of GIOChannel. My point is that compiling a
program with a different ABI than the rest of the system is a ticking
time bomb, and therefore compiling QEMU with -mno-ms-bitfields is not
a solution.

This is also explained at
https://github.com/GNOME/glib/blob/main/docs/win32-build.md: "You
should link to GLib using the -mms-bitfields GCC flag. This flag means
that the struct layout rules are identical to those used by MSVC. This
is essential if the same DLLs are to be usable both from gcc- and
MSVC-compiled code".

> -mms-bitfields is already the (silent) gcc default on windows, to mimic
> MSVC behaviour. Yes, it would be preferable to use this default and
> gcc_struct attribute, but right now, it does not work with clang. So the
> whole point is to accept a compromise for this.
> I don't understand the strong pushback against clang support on windows.
> Because of a "theoretical" problem, that was proved here we don't have
> currently, we are stuck with gcc_struct attribute.

I think you are unnecessarily pessimistic:

* there is no need for a compromise, eliminating gcc_struct is the
best solution but it needs proof that it introduces no guest-visible
changes

* there is no pushback against clang support, there is pushback
against asking for a change without understanding the problem

On the other hand, the problem is very real even though we might not
have it *now*; it's not hard to find out when and why the option was
introduced. If you do "git log -Sgcc_struct", and then Google for the
commit subject "Add new macro QEMU_PACKED for packed C structures",
you'll quickly find one of these two links:

https://patchwork.ozlabs.org/project/qemu-devel/patch/[email protected]/
https://lists.gnu.org/archive/html/qemu-devel/2011-08/msg01877.html

As an aside, at https://github.com/msys2/MINGW-packages/pull/21540 you
said "I think too it's more a FUD argument than a real problem", which
is a bit too dismissive. If anything it's a case of "once bitten,
twice shy".

Anyway, the problem is *not* that QEMU uses gcc_struct. Rather, the
rare but real problem is that there is, in some cases involving
bitfields, incompatible struct layout between Linux and Windows
compilers.

gcc_struct is *one* solution to that problem, and the one that QEMU is
currently using. It has the advantage that it cannot go wrong, and the
disadvantage that clang doesn't support it. Note that it's perfectly
possible that there are no such cases in QEMU, i.e. that the attribute
has no effect now. But it's been there for 13 years and it was
introduced because of a bug whose cause was MSVC bitfield layout.
People *should* be wary of removing it.

clang's lack of support for gcc_struct is stupid, but we have to work
around it and we can, without introducing potential ABI breaks. Just
verify that gcc_struct still has any effect. Initially I mentioned
checking sizeof() but actually it's possible to do the same using
debug info, similar to your scripts.

For example, I tried doing another kind of analysis based on
llvm-dwarfdump. This will give the list of source files with bitfields
in them:

find . -name "*.c.o" | xargs llvm-dwarfdump|\
  grep -we DW_AT_data_bit_offset -e DW_AT_bit_size -e DW_AT_decl_file|\
  grep -B1 _bit_ | grep decl_file | sort -u

Restricting the list of files further to those that have QEMU_PACKED
in them, the list becomes just:

hw/nvme/ctrl.c
hw/pci/pcie_doe.c
net/util.h
include/hw/cxl/cxl_device.h
include/hw/hyperv/dynmem-proto.h
include/hw/i386/intel_iommu.h
include/hw/i386/x86-iommu.h
include/net/eth.h

Looking at these eight files, in some of them the bitfields are
defined via registers.h. In others the bitfields:
- have the same type within a word
- don't span multiple words
- use the smallest integer type that fits them (e.g. uint8_t in net/util.h)
- always have a nonzero width

So it *should* be fine to remove gcc_struct from QEMU even without
-mms-bitfields. But we need to be *sure* that it is, hence the next
part of the email...

> > However, your script lets you do the opposite experiment: remove
> > gcc_struct QEMU_PACKED and check if anything changes, i.e. whether there
> > are any QEMU_PACKED structs that do rely on the gcc_struct attribute.
> > If there are any, then it should be possible to change the definition
> > and fix them.
>
> It does not only remove gcc_struct attribute, it replaces it with an
> option that does the same thing globally for all packed structs.

I understand that, and I'm asking you to do another experiment. Do not
change the compile-time options. Instead, change QEMU_PACKED to just

#define QEMU_PACKED __attribute__((packed))

and see if any struct definitions (which will all follow the ms_struct
rules) change. If there are changes, let's examine what they are and
why my analysis above was incorrect. Fix those cases, add
QEMU_BUILD_BUG_ON checks only to the affected structs, and once you've
addressed any differences (if they exist), you can proceed with
dropping gcc_struct since there will be concrete evidence proving it's
safe.

Paolo


Reply via email to