On Sun, Nov 24, 2024 at 9:23 PM Pierrick Bouvier <[email protected]> wrote: > > This one unfortunately shows why the global change is wrong. The size > > of _GIOChannel must match between glib and QEMU, otherwise you have an > > ABI mismatch. > > In the codebase, we always use this type as an opaque type (through > pointer, using library functions to interact with it and never use it's > size). So the fact we see a different layout is *not* an issue for QEMU. > I don't see it as a counter example that this does not work.
_GIOChannel is just an example, and in principle macros could be relying on the layout of GIOChannel. My point is that compiling a program with a different ABI than the rest of the system is a ticking time bomb, and therefore compiling QEMU with -mno-ms-bitfields is not a solution. This is also explained at https://github.com/GNOME/glib/blob/main/docs/win32-build.md: "You should link to GLib using the -mms-bitfields GCC flag. This flag means that the struct layout rules are identical to those used by MSVC. This is essential if the same DLLs are to be usable both from gcc- and MSVC-compiled code". > -mms-bitfields is already the (silent) gcc default on windows, to mimic > MSVC behaviour. Yes, it would be preferable to use this default and > gcc_struct attribute, but right now, it does not work with clang. So the > whole point is to accept a compromise for this. > I don't understand the strong pushback against clang support on windows. > Because of a "theoretical" problem, that was proved here we don't have > currently, we are stuck with gcc_struct attribute. I think you are unnecessarily pessimistic: * there is no need for a compromise, eliminating gcc_struct is the best solution but it needs proof that it introduces no guest-visible changes * there is no pushback against clang support, there is pushback against asking for a change without understanding the problem On the other hand, the problem is very real even though we might not have it *now*; it's not hard to find out when and why the option was introduced. If you do "git log -Sgcc_struct", and then Google for the commit subject "Add new macro QEMU_PACKED for packed C structures", you'll quickly find one of these two links: https://patchwork.ozlabs.org/project/qemu-devel/patch/[email protected]/ https://lists.gnu.org/archive/html/qemu-devel/2011-08/msg01877.html As an aside, at https://github.com/msys2/MINGW-packages/pull/21540 you said "I think too it's more a FUD argument than a real problem", which is a bit too dismissive. If anything it's a case of "once bitten, twice shy". Anyway, the problem is *not* that QEMU uses gcc_struct. Rather, the rare but real problem is that there is, in some cases involving bitfields, incompatible struct layout between Linux and Windows compilers. gcc_struct is *one* solution to that problem, and the one that QEMU is currently using. It has the advantage that it cannot go wrong, and the disadvantage that clang doesn't support it. Note that it's perfectly possible that there are no such cases in QEMU, i.e. that the attribute has no effect now. But it's been there for 13 years and it was introduced because of a bug whose cause was MSVC bitfield layout. People *should* be wary of removing it. clang's lack of support for gcc_struct is stupid, but we have to work around it and we can, without introducing potential ABI breaks. Just verify that gcc_struct still has any effect. Initially I mentioned checking sizeof() but actually it's possible to do the same using debug info, similar to your scripts. For example, I tried doing another kind of analysis based on llvm-dwarfdump. This will give the list of source files with bitfields in them: find . -name "*.c.o" | xargs llvm-dwarfdump|\ grep -we DW_AT_data_bit_offset -e DW_AT_bit_size -e DW_AT_decl_file|\ grep -B1 _bit_ | grep decl_file | sort -u Restricting the list of files further to those that have QEMU_PACKED in them, the list becomes just: hw/nvme/ctrl.c hw/pci/pcie_doe.c net/util.h include/hw/cxl/cxl_device.h include/hw/hyperv/dynmem-proto.h include/hw/i386/intel_iommu.h include/hw/i386/x86-iommu.h include/net/eth.h Looking at these eight files, in some of them the bitfields are defined via registers.h. In others the bitfields: - have the same type within a word - don't span multiple words - use the smallest integer type that fits them (e.g. uint8_t in net/util.h) - always have a nonzero width So it *should* be fine to remove gcc_struct from QEMU even without -mms-bitfields. But we need to be *sure* that it is, hence the next part of the email... > > However, your script lets you do the opposite experiment: remove > > gcc_struct QEMU_PACKED and check if anything changes, i.e. whether there > > are any QEMU_PACKED structs that do rely on the gcc_struct attribute. > > If there are any, then it should be possible to change the definition > > and fix them. > > It does not only remove gcc_struct attribute, it replaces it with an > option that does the same thing globally for all packed structs. I understand that, and I'm asking you to do another experiment. Do not change the compile-time options. Instead, change QEMU_PACKED to just #define QEMU_PACKED __attribute__((packed)) and see if any struct definitions (which will all follow the ms_struct rules) change. If there are changes, let's examine what they are and why my analysis above was incorrect. Fix those cases, add QEMU_BUILD_BUG_ON checks only to the affected structs, and once you've addressed any differences (if they exist), you can proceed with dropping gcc_struct since there will be concrete evidence proving it's safe. Paolo
