On 24/11/2024 21.23, Pierrick Bouvier wrote:
On 11/24/24 04:10, Paolo Bonzini wrote:
On 11/24/24 01:21, Pierrick Bouvier wrote:
After thinking about it, a simple, exhaustive and reliable way to find
this type information is to use the debug (DWARF) info.
By compiling the QEMU binaries with --enable-debug and extracting the
information using llvm-dwarfdump plus a custom filter [4], we can obtain
a text representation of all the structures QEMU uses.

Yes, this is a good idea.

As there is a lot of repetition across all the QEMU binaries, the
reduced list of the structs concerned is [6]:
+name:ArduinoMachineClass size:0x0198
+name:ARMCacheAttrs size:0x04
+name:ARMVAParameters size:0x04
+name:AspeedMachineClass size:0x01d0
+name:_GIOChannel size:0x70

This one unfortunately shows why the global change is wrong.  The size
of _GIOChannel must match between glib and QEMU, otherwise you have an
ABI mismatch.


In the codebase, we always use this type as an opaque type (through a pointer, using library functions to interact with it, and never relying on its size). So the fact that we see a different layout is *not* an issue for QEMU. I don't see it as a counter-example proving that this approach does not work.

If you have a look at the differences between the binaries with "pahole", you can also see differences in the bitfields of glib structs (GdkEventKey), even if the overall size of the struct does not change.

So even if it happens to work by chance with the current libraries, this is a ticking time bomb: with every future library update this could break at any point in time, so IMHO we really must not enable -mno-ms-bitfields globally on Windows.
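
For reference, a minimal (untested) sketch of where the two bit-field layout rules diverge; the struct below is made up purely for illustration and is not taken from QEMU or glib:

 #include <stdint.h>
 #include <stdio.h>

 /*
  * Mixing bit-field base types is exactly where the rules differ:
  * with -mms-bitfields a bit-field whose type has a different size
  * starts a new storage unit, so this struct should grow to 12 bytes,
  * while with -mno-ms-bitfields all three fields should pack into a
  * single 32-bit word, giving 4 bytes.
  */
 struct demo {
     uint32_t a : 4;
     uint16_t b : 4;   /* different base type: the MS rules open a new unit */
     uint32_t c : 4;
 };

 int main(void)
 {
     printf("sizeof(struct demo) = %zu\n", sizeof(struct demo));
     return 0;
 }

Compiling this twice on a mingw toolchain, once with each flag, should print two different sizes; that is exactly the kind of ABI mismatch that matters as soon as such a struct crosses a library boundary.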

In other words, the global default _must_ be -mms-bitfields, because all
other libraries (and also Windows itself, though you didn't find any
occurrences) are built with MSVC ABI compatibility.  Bitfields are
relatively rare, and therefore you only found one occurrence; however,
this is a constraint that we cannot get rid of.


-mms-bitfields is already the (silent) gcc default on Windows, to mimic the MSVC behaviour. Yes, it would be preferable to keep this default and use the gcc_struct attribute, but right now that does not work with clang. So the whole point is to accept a compromise here. We could even apply it selectively when clang is detected, in case the QEMU community is worried that changing this default would break existing code.
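
To make the per-struct route concrete, here is a rough, untested sketch (the GCC_STRUCT macro name is made up for this example and is not what QEMU actually uses):

 #include <stdint.h>

 /*
  * Sketch only, not QEMU code: keep -mms-bitfields as the global
  * default and opt individual structs into the GCC layout.  The catch
  * is the one discussed in this thread: clang targeting Windows does
  * not accept __attribute__((gcc_struct)), so the fallback below is
  * empty and such structs keep the MS layout there.
  */
 #if defined(_WIN32) && defined(__GNUC__) && !defined(__clang__)
 # define GCC_STRUCT __attribute__((gcc_struct))
 #else
 # define GCC_STRUCT
 #endif

 struct GCC_STRUCT example {
     uint32_t flags : 4;
     uint64_t addr  : 44;   /* mixed widths: layout depends on the bit-field ABI */
 };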

I don't understand the strong pushback against clang support on Windows.

It's certainly not a dislike of Clang. It's simply that we've seen problems with bitfields in structures in the past, and so far nobody has gone ahead and checked each and every target to see whether we are clean nowadays. And as long as nobody does this tedious job, we cannot say that Clang is really a supported compiler on Windows.

Because of a "theoretical" problem, which was shown here not to exist currently, we are stuck with the gcc_struct attribute. This is currently blocking clang, and I don't see the point of that.

It's not a "theoretical" problem, it's a real one. Have a look at commit 642ba89672279fbdd14016a90da239c85e845d18 for example. If you revert the change to VTD_IR_TableEntry in include/hw/i386/intel_iommu.h and look at the difference in the "pahole" output of the two binaries, one compiled with -mms-bitfields and one with -mno-ms-bitfields, you can see:

 union VTD_IR_TableEntry {
        struct {
                uint32_t           present:1;          /*     0: 0  4 */
                uint32_t           fault_disable:1;    /*     0: 1  4 */
                uint32_t           dest_mode:1;        /*     0: 2  4 */
                uint32_t           redir_hint:1;       /*     0: 3  4 */
                uint32_t           trigger_mode:1;     /*     0: 4  4 */
                uint32_t           delivery_mode:3;    /*     0: 5  4 */
                uint32_t           __avail:4;          /*     0: 8  4 */
                uint32_t           __reserved_0:3;     /*     0:12  4 */
                uint32_t           irte_mode:1;        /*     0:15  4 */
                uint32_t           vector:8;           /*     0:16  4 */
                uint32_t           __reserved_1:8;     /*     0:24  4 */
                uint32_t           dest_id;            /*     4     4 */
                uint16_t           source_id;          /*     8     2 */

-               /* XXX 6 bytes hole, try to pack */
+               /* Bitfield combined with previous fields */

-               uint64_t           sid_q:2;            /*    16: 0  8 */
-               uint64_t           sid_vtype:2;        /*    16: 2  8 */
-               uint64_t           __reserved_2:44;    /*    16: 4  8 */
-       } irte;                                        /*     0    24 */
+               uint64_t           sid_q:2;            /*     8:16  8 */
+               uint64_t           sid_vtype:2;        /*     8:18  8 */
+               uint64_t           __reserved_2:44;    /*     8:20  8 */
+       } irte;                                        /*     0    16 */
        uint64_t                   data[2];            /*     0    16 */
 };

This particular problem has been fixed in the meantime, since it also caused issues on big-endian hosts (BTW, using bitfields in structs that are used for data interchange is evil; we should maybe forbid this in the coding style) ... but do we know whether there are other problems left? So far, nobody has really looked into it yet, I think.
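
For such interchange structures, the usual alternative is to keep the data as plain words and extract the fields explicitly. A self-contained sketch (extract64() here is a local stand-in for the helper of the same name in include/qemu/bitops.h, and the values are made up):

 #include <stdint.h>
 #include <stdio.h>

 /* Local stand-in for QEMU's extract64() from include/qemu/bitops.h */
 static inline uint64_t extract64(uint64_t value, int start, int length)
 {
     return (value >> start) & (~0ULL >> (64 - length));
 }

 int main(void)
 {
     /* One entry kept as two plain 64-bit words, like the data[2] member above */
     uint64_t irte[2] = { 0x00000000001a0001ULL, 0 };

     /* present:1 at bit 0 and vector:8 at bits 16..23, per the layout shown above */
     printf("present = %u\n",   (unsigned)extract64(irte[0], 0, 1));
     printf("vector  = 0x%x\n", (unsigned)extract64(irte[0], 16, 8));
     return 0;
 }

This way the in-memory layout is exactly what the spec says, independent of -mms-bitfields vs -mno-ms-bitfields, and endianness handling stays in one place.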

I checked the x86_64, aarch64 and ppc64 targets with pahole, and we seem to be clean there nowadays. But before we can drop the attribute for Clang on Windows, I think we need someone who *carefully* looks at *all* targets and confirms that we don't have any issues left.

 Thomas

