Re: [middle-end PATCH take #2] Only call targetm.truly_noop_truncation for truncations.
On 12/31/23 09:23, Roger Sayle wrote: Very many thanks (and a Happy New Year) to the pre-commit patch testing folks at linaro.org. Their testing has revealed that although my patch is clean on x86_64, it triggers some problems on aarch64 and arm. The issue (with the previous version of my patch) is that these platforms require a paradoxical subreg to be generated by the middle-end, where we were previously checking for truly_noop_truncation. This has been fixed (in revision 2) below. Where previously I had: @@ -66,7 +66,9 @@ gen_lowpart_general (machine_mode mode, rtx x) scalar_int_mode xmode; if (is_a (GET_MODE (x), &xmode) && GET_MODE_SIZE (xmode) <= UNITS_PER_WORD - && TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode) + && (known_lt (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode)) + ? TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode) + : known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode))) && !reload_completed) return gen_lowpart_general (mode, force_reg (xmode, x)); the correct change is: scalar_int_mode xmode; if (is_a (GET_MODE (x), &xmode) && GET_MODE_SIZE (xmode) <= UNITS_PER_WORD - && TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode) + && (known_ge (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode)) + || TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode)) && !reload_completed) return gen_lowpart_general (mode, force_reg (xmode, x)); i.e. we only call TRULY_NOOP_TRUNCATION_MODES_P when we know we have a truncation, but the behaviour of non-truncations is preserved (no longer depends upon unspecified behaviour) and gen_lowpart_general is called to create the paradoxical SUBREG. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? Hopefully this revision tests cleanly on the linaro.org CI pipeline. 2023-12-31 Roger Sayle gcc/ChangeLog * combine.cc (make_extraction): Confirm that OUTPREC is less than INPREC before calling TRULY_NOOP_TRUNCATION_MODES_P. * expmed.cc (store_bit_field_using_insv): Likewise. (extract_bit_field_using_extv): Likewise. (extract_bit_field_as_subreg): Likewise. * optabs-query.cc (get_best_extraction_insn): Likewise. * optabs.cc (expand_parity): Likewise. * rtlhooks.cc (gen_lowpart_general): Likewise. * simplify-rtx.cc (simplify_truncation): Disallow truncations to the same precision. (simplify_unary_operation_1) : Move optimization of truncations to the same mode earlier. So just an FYI. This should probably wait for gcc-15. But it is worth noting that this causes tbz_2.c to fail on aarch64. Basically with this patch we use the "x" registers whereas before we used a "w" register. Meaning we're doing a 64bit comparison now rather than a 32bit comparison. THe test expects a 32bit comparison. jeff
Re: [middle-end PATCH take #2] Only call targetm.truly_noop_truncation for truncations.
"Roger Sayle" writes: > Very many thanks (and a Happy New Year) to the pre-commit > patch testing folks at linaro.org. Their testing has revealed that > although my patch is clean on x86_64, it triggers some problems > on aarch64 and arm. The issue (with the previous version of my > patch) is that these platforms require a paradoxical subreg to be > generated by the middle-end, where we were previously checking > for truly_noop_truncation. > > This has been fixed (in revision 2) below. Where previously I had: > > @@ -66,7 +66,9 @@ gen_lowpart_general (machine_mode mode, rtx x) >scalar_int_mode xmode; >if (is_a (GET_MODE (x), &xmode) > && GET_MODE_SIZE (xmode) <= UNITS_PER_WORD > - && TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode) > + && (known_lt (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode)) > + ? TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode) > + : known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode))) > && !reload_completed) > return gen_lowpart_general (mode, force_reg (xmode, x)); > > the correct change is: > >scalar_int_mode xmode; >if (is_a (GET_MODE (x), &xmode) > && GET_MODE_SIZE (xmode) <= UNITS_PER_WORD > - && TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode) > + && (known_ge (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode)) > + || TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode)) > && !reload_completed) > return gen_lowpart_general (mode, force_reg (xmode, x)); > > i.e. we only call TRULY_NOOP_TRUNCATION_MODES_P when we > know we have a truncation, but the behaviour of non-truncations > is preserved (no longer depends upon unspecified behaviour) and > gen_lowpart_general is called to create the paradoxical SUBREG. > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? > > Hopefully this revision tests cleanly on the linaro.org CI pipeline. > > 2023-12-31 Roger Sayle > > gcc/ChangeLog > * combine.cc (make_extraction): Confirm that OUTPREC is less than > INPREC before calling TRULY_NOOP_TRUNCATION_MODES_P. > * expmed.cc (store_bit_field_using_insv): Likewise. > (extract_bit_field_using_extv): Likewise. > (extract_bit_field_as_subreg): Likewise. > * optabs-query.cc (get_best_extraction_insn): Likewise. > * optabs.cc (expand_parity): Likewise. > * rtlhooks.cc (gen_lowpart_general): Likewise. > * simplify-rtx.cc (simplify_truncation): Disallow truncations > to the same precision. > (simplify_unary_operation_1) : Move optimization > of truncations to the same mode earlier. > > >> -Original Message- >> From: Roger Sayle >> Sent: 28 December 2023 15:35 >> To: 'gcc-patches@gcc.gnu.org' >> Cc: 'Jeff Law' >> Subject: [middle-end PATCH] Only call targetm.truly_noop_truncation for >> truncations. >> >> >> The truly_noop_truncation target hook is documented, in target.def, as > "true if it >> is safe to convert a value of inprec bits to one of outprec bits (where > outprec is >> smaller than inprec) by merely operating on it as if it had only outprec > bits", i.e. >> the middle-end can use a SUBREG instead of a TRUNCATE. >> >> What's perhaps potentially a little ambiguous in the above description is > whether >> it is the caller or the callee that's responsible for ensuring or checking > whether >> "outprec < inprec". The name TRULY_NOOP_TRUNCATION_P, like >> SUBREG_PROMOTED_P, may be prone to being understood as a predicate that >> confirms that something is a no-op truncation or a promoted subreg, when > in fact >> the caller must first confirm this is a truncation/subreg and only then > call the >> "classification" macro. >> >> Alas making the following minor tweak (for testing) to the i386 backend: >> >> static bool >> ix86_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec) { >> gcc_assert (outprec < inprec); >> return true; >> } >> >> #undef TARGET_TRULY_NOOP_TRUNCATION >> #define TARGET_TRULY_NOOP_TRUNCATION ix86_truly_noop_truncation >> >> reveals that there are numerous callers in middle-end that rely on the > default >> behaviour of silently returning true for any (invalid) input. >> These are fixed below. >> >> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and >> make -k check, both with and without --target_board=unix{-m32} with no new >> failures. Ok for mainline? >> >> >> 2023-12-28 Roger Sayle >> >> gcc/ChangeLog >> * combine.cc (make_extraction): Confirm that OUTPREC is less than >> INPREC before calling TRULY_NOOP_TRUNCATION_MODES_P. >> * expmed.cc (store_bit_field_using_insv): Likewise. >> (extract_bit_field_using_extv): Likewise. >> (extract_bit_field_as_subreg): Likewise. >> * optabs-query.cc (get_best_extract
[middle-end PATCH take #2] Only call targetm.truly_noop_truncation for truncations.
Very many thanks (and a Happy New Year) to the pre-commit patch testing folks at linaro.org. Their testing has revealed that although my patch is clean on x86_64, it triggers some problems on aarch64 and arm. The issue (with the previous version of my patch) is that these platforms require a paradoxical subreg to be generated by the middle-end, where we were previously checking for truly_noop_truncation. This has been fixed (in revision 2) below. Where previously I had: @@ -66,7 +66,9 @@ gen_lowpart_general (machine_mode mode, rtx x) scalar_int_mode xmode; if (is_a (GET_MODE (x), &xmode) && GET_MODE_SIZE (xmode) <= UNITS_PER_WORD - && TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode) + && (known_lt (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode)) + ? TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode) + : known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode))) && !reload_completed) return gen_lowpart_general (mode, force_reg (xmode, x)); the correct change is: scalar_int_mode xmode; if (is_a (GET_MODE (x), &xmode) && GET_MODE_SIZE (xmode) <= UNITS_PER_WORD - && TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode) + && (known_ge (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode)) + || TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode)) && !reload_completed) return gen_lowpart_general (mode, force_reg (xmode, x)); i.e. we only call TRULY_NOOP_TRUNCATION_MODES_P when we know we have a truncation, but the behaviour of non-truncations is preserved (no longer depends upon unspecified behaviour) and gen_lowpart_general is called to create the paradoxical SUBREG. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? Hopefully this revision tests cleanly on the linaro.org CI pipeline. 2023-12-31 Roger Sayle gcc/ChangeLog * combine.cc (make_extraction): Confirm that OUTPREC is less than INPREC before calling TRULY_NOOP_TRUNCATION_MODES_P. * expmed.cc (store_bit_field_using_insv): Likewise. (extract_bit_field_using_extv): Likewise. (extract_bit_field_as_subreg): Likewise. * optabs-query.cc (get_best_extraction_insn): Likewise. * optabs.cc (expand_parity): Likewise. * rtlhooks.cc (gen_lowpart_general): Likewise. * simplify-rtx.cc (simplify_truncation): Disallow truncations to the same precision. (simplify_unary_operation_1) : Move optimization of truncations to the same mode earlier. > -Original Message- > From: Roger Sayle > Sent: 28 December 2023 15:35 > To: 'gcc-patches@gcc.gnu.org' > Cc: 'Jeff Law' > Subject: [middle-end PATCH] Only call targetm.truly_noop_truncation for > truncations. > > > The truly_noop_truncation target hook is documented, in target.def, as "true if it > is safe to convert a value of inprec bits to one of outprec bits (where outprec is > smaller than inprec) by merely operating on it as if it had only outprec bits", i.e. > the middle-end can use a SUBREG instead of a TRUNCATE. > > What's perhaps potentially a little ambiguous in the above description is whether > it is the caller or the callee that's responsible for ensuring or checking whether > "outprec < inprec". The name TRULY_NOOP_TRUNCATION_P, like > SUBREG_PROMOTED_P, may be prone to being understood as a predicate that > confirms that something is a no-op truncation or a promoted subreg, when in fact > the caller must first confirm this is a truncation/subreg and only then call the > "classification" macro. > > Alas making the following minor tweak (for testing) to the i386 backend: > > static bool > ix86_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec) { > gcc_assert (outprec < inprec); > return true; > } > > #undef TARGET_TRULY_NOOP_TRUNCATION > #define TARGET_TRULY_NOOP_TRUNCATION ix86_truly_noop_truncation > > reveals that there are numerous callers in middle-end that rely on the default > behaviour of silently returning true for any (invalid) input. > These are fixed below. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and > make -k check, both with and without --target_board=unix{-m32} with no new > failures. Ok for mainline? > > > 2023-12-28 Roger Sayle > > gcc/ChangeLog > * combine.cc (make_extraction): Confirm that OUTPREC is less than > INPREC before calling TRULY_NOOP_TRUNCATION_MODES_P. > * expmed.cc (store_bit_field_using_insv): Likewise. > (extract_bit_field_using_extv): Likewise. > (extract_bit_field_as_subreg): Likewise. > * optabs-query.cc (get_best_extraction_insn): Likewise. > * optabs.cc (expand_parity): Likewise. > * rtlhooks.cc (gen_lowpart_general): Likewise. > * simplify-rtx.cc (simplify_truncation): Disallow truncations