On 24/11/2024 21.23, Pierrick Bouvier wrote:
On 11/24/24 04:10, Paolo Bonzini wrote:
On 11/24/24 01:21, Pierrick Bouvier wrote:
After thinking about it, a simple, exhaustive and reliable way to find
this type information is the debug (dwarf) info.
By compiling qemu binaries with --enable-debug, and extracting info
using llvm-dwarfdump plus a custom filter [4], we can obtain a text
representation of all structures QEMU uses.
Yes, this is a good idea.
As there is a lot of repetition between all qemu binaries, the reduced
list of structs concerned is [6]:
+name:ArduinoMachineClass size:0x0198
+name:ARMCacheAttrs size:0x04
+name:ARMVAParameters size:0x04
+name:AspeedMachineClass size:0x01d0
+name:_GIOChannel size:0x70
This one unfortunately shows why the global change is wrong. The size
of _GIOChannel must match between glib and QEMU, otherwise you have an
ABI mismatch.
In the codebase, we always use this type as an opaque type (through pointer,
using library functions to interact with it and never use it's size). So the
fact we see a different layout is *not* an issue for QEMU. I don't see it as
a counter example that this does not work.
If you have a look at the difference between the binaries with "pahole", you
can see also differences in bitfields in glib structs (GdkEventKey), even if
the overall size of the struct does not change.
So even if it would be working by chance with the current libraries, this is
a ticking time bomb - for each future library, this could break at any point
in time, so IMHO we really must not enable -mno-ms-bitfields globally on
Windows.
In other words, the global default _must_ be -mms-bitfields, because all
other libraries (and also Windows itself, though you didn't find any
occurrences) are built with MSVC ABI compatibility. Bitfields are
relatively rare, and therefore you only found one occurrence; however,
this is a constraint that we cannot get rid of.
-mms-bitfields is already the (silent) gcc default on windows, to mimic MSVC
behaviour. Yes, it would be preferable to use this default and gcc_struct
attribute, but right now, it does not work with clang. So the whole point is
to accept a compromise for this.
We can even apply this selectively when clang is detected in case QEMU
community is scared changing this default would break existing code.
I don't understand the strong pushback against clang support on windows.
It's certainly not a dislike of Clang. It's simply that we've seen problems
in the past with bitfields in structures, and so far nobody went ahead and
checked each and every target whether we are clean nowadays. And as long as
nobody does this tedious job, we cannot say that Clang is really a supported
compiler on Windows.
Because of a "theoretical" problem, that was proved here we don't have
currently, we are stuck with gcc_struct attribute. This is currently
blocking clang, and I don't see the point about that.
It's not a "theoretical" problem, it's a real one. Have a look at commit
642ba89672279fbdd14016a90da239c85e845d18 for example. If you revert the
change to VTD_IR_TableEntry in include/hw/i386/intel_iommu.h and have a look
a the difference of the output of "pahole" of the binaries, one compiled
with -mms-bitfields and one with -mnoms-bitfields, you can see:
union VTD_IR_TableEntry {
struct {
uint32_t present:1; /* 0: 0 4 */
uint32_t fault_disable:1; /* 0: 1 4 */
uint32_t dest_mode:1; /* 0: 2 4 */
uint32_t redir_hint:1; /* 0: 3 4 */
uint32_t trigger_mode:1; /* 0: 4 4 */
uint32_t delivery_mode:3; /* 0: 5 4 */
uint32_t __avail:4; /* 0: 8 4 */
uint32_t __reserved_0:3; /* 0:12 4 */
uint32_t irte_mode:1; /* 0:15 4 */
uint32_t vector:8; /* 0:16 4 */
uint32_t __reserved_1:8; /* 0:24 4 */
uint32_t dest_id; /* 4 4 */
uint16_t source_id; /* 8 2 */
- /* XXX 6 bytes hole, try to pack */
+ /* Bitfield combined with previous fields */
- uint64_t sid_q:2; /* 16: 0 8 */
- uint64_t sid_vtype:2; /* 16: 2 8 */
- uint64_t __reserved_2:44; /* 16: 4 8 */
- } irte; /* 0 24 */
+ uint64_t sid_q:2; /* 8:16 8 */
+ uint64_t sid_vtype:2; /* 8:18 8 */
+ uint64_t __reserved_2:44; /* 8:20 8 */
+ } irte; /* 0 16 */
uint64_t data[2]; /* 0 16 */
};
This problem is fixed nowadays since it also caused issues on big endian
hosts (btw, using bitfields in structs that are used for data interchange is
evil, we should maybe forbid this in the coding styles) ... but do we know
whether there are other problems left? So far, nobody really looked at it
yet, I think.
I checked the x86_64, aarch64 and ppc64 targets with pahole, and we seem to
be clear there nowadays. But before we could drop the attribute on Clang on
Windows, I think we need someone who *carefully* looks at *all* targets and
confirms that we don't have any issues left.
Thomas