On Wed, Jul 31, 2024 at 11:33 AM Richard Biener <rguent...@suse.de> wrote:
> > > > > > OK. Richard, can you please mention the above in the comment why > > > > > > XFmode is rejected in the hook? > > > > > > > > > > > > Later, we can perhaps benchmark XFmode move vs. generic memory copy > > > > > > to > > > > > > get some hard data. > > > > > > > > > > My (limited) understanding was that the hook would be used only for > > > > > cases > > > > > where we'd like to e.g. value number some SF/DF/XF etc. mode loads > > > > > and some > > > > > subsequent loads from the same address with different mode but same > > > > > size > > > > > the same and replace say int or long long later load with > > > > > VIEW_CONVERT_EXPR > > > > > of the result of the SF/SF mode load. That is what was incorrect, > > > > > because > > > > > the load didn't preserve all the bits. The patch would still keep > > > > > doing > > > > > normal SF/DF/XF etc. mode copies if that is all that happens in the > > > > > program, > > > > > load some floating point value and store it elsewhere or as part of > > > > > larger > > > > > aggregate copy. > > > > > > > > So, the hook should allow everything besides SF/DFmode, simply: > > > > > > > > > > > > switch (GET_MODE_INNER (mode)) > > > > { > > > > case SFmode: > > > > case DFmode: > > > > /* These suffer from normalization upon load when not using > > > > SSE. */ > > > > return !(ix86_fpmath & FPMATH_387); > > > > default: > > > > return true; > > > > } > > > > > > OK, I think I'll go with this then. I'm now unsure whether the > > > wrapper around the hook should reject modes with padding or if > > > the supposed users (value-numbering and SRA) should deal with that > > > issue separately. I do wonder whether > > > > > > ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE > > > ? &ieee_extended_intel_128_format > > > : TARGET_96_ROUND_53_LONG_DOUBLE > > > ? &ieee_extended_intel_96_round_53_format > > > : &ieee_extended_intel_96_format)); > > > ADJUST_BYTESIZE (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12); > > > ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4); > > > > > > unambiguously specifies where the padding is - m68k has > > > > > > FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_motorola_format); > > > > > > It's also not clear we can model a x87 10 byte memory copy in RTL since > > > a mem:XF still touches 12 or 16 bytes - IIRC a store leaves > > > possible padding as unspecified and not "masked out" even if > > > the actual fstp will only store 10 bytes. > > > > The hardware will never touch bytes outside 10 bytes range, the > > padding is some artificial compiler thingy, so IMO it should be > > handled before the hook is called. Please find attached the source I > > have used to confirm that a) the copied bits will never be mangled and > > b) there is no access outside the 10 bytes range. (BTW: these > > particular values are to test the effect of leading bit 63, the > > non-hidden normalized bit). > > Thanks - I do wonder why GET_MODE_SIZE (XFmode) is not 10 then, > mode_base_align[XFmode] seems to be correctly set to ensure > 12 bytes / 16 bytes "effective" size. FTR, "long double" AKA __float80 is defined as fundamental type in psABI as: sizeof 12, alignment 4 for i386 [1] and sizeof 16, alignment 16 for x86_64 [2]. These values are thus set by ABI despite the fact that hardware handles only 10 bytes. [1] Table 2.1, page 8 of https://www.uclibc.org/docs/psABI-i386.pdf [2] Figure 3.1, page 12 of https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf Uros.