Re: [middle-end PATCH take #2] Only call targetm.truly_noop_truncation for truncations.

2024-01-24 Thread Jeff Law




On 12/31/23 09:23, Roger Sayle wrote:


Very many thanks (and a Happy New Year) to the pre-commit
patch testing folks at linaro.org.   Their testing has revealed that
although my patch is clean on x86_64, it triggers some problems
on aarch64 and arm.  The issue (with the previous version of my
patch) is that these platforms require a paradoxical subreg to be
generated by the middle-end, where we were previously checking
for truly_noop_truncation.

This has been fixed (in revision 2) below.  Where previously I had:

@@ -66,7 +66,9 @@ gen_lowpart_general (machine_mode mode, rtx x)
scalar_int_mode xmode;
if (is_a  (GET_MODE (x), )
   && GET_MODE_SIZE (xmode) <= UNITS_PER_WORD
- && TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode)
+ && (known_lt (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode))
+ ? TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode)
+ : known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode)))
   && !reload_completed)
 return gen_lowpart_general (mode, force_reg (xmode, x));

the correct change is:

scalar_int_mode xmode;
if (is_a  (GET_MODE (x), )
   && GET_MODE_SIZE (xmode) <= UNITS_PER_WORD
- && TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode)
+ && (known_ge (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode))
+ || TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode))
   && !reload_completed)
 return gen_lowpart_general (mode, force_reg (xmode, x));

i.e. we only call TRULY_NOOP_TRUNCATION_MODES_P when we
know we have a truncation, but the behaviour of non-truncations
is preserved (no longer depends upon unspecified behaviour) and
gen_lowpart_general is called to create the paradoxical SUBREG.


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

Hopefully this revision tests cleanly on the linaro.org CI pipeline.

2023-12-31  Roger Sayle  

gcc/ChangeLog
 * combine.cc (make_extraction): Confirm that OUTPREC is less than
 INPREC before calling TRULY_NOOP_TRUNCATION_MODES_P.
 * expmed.cc (store_bit_field_using_insv): Likewise.
 (extract_bit_field_using_extv): Likewise.
 (extract_bit_field_as_subreg): Likewise.
 * optabs-query.cc (get_best_extraction_insn): Likewise.
 * optabs.cc (expand_parity): Likewise.
 * rtlhooks.cc (gen_lowpart_general): Likewise.
 * simplify-rtx.cc (simplify_truncation): Disallow truncations
 to the same precision.
 (simplify_unary_operation_1) : Move optimization
 of truncations to the same mode earlier.
So just an FYI.  This should probably wait for gcc-15.  But it is worth 
noting that this causes tbz_2.c to fail on aarch64.  Basically with this 
patch we use the "x" registers whereas before we used a "w" register. 
Meaning we're doing a 64bit comparison now rather than a 32bit 
comparison.  THe test expects a 32bit comparison.


jeff


Re: [middle-end PATCH take #2] Only call targetm.truly_noop_truncation for truncations.

2024-01-05 Thread Richard Sandiford
"Roger Sayle"  writes:
> Very many thanks (and a Happy New Year) to the pre-commit
> patch testing folks at linaro.org.   Their testing has revealed that
> although my patch is clean on x86_64, it triggers some problems
> on aarch64 and arm.  The issue (with the previous version of my
> patch) is that these platforms require a paradoxical subreg to be
> generated by the middle-end, where we were previously checking
> for truly_noop_truncation.
>
> This has been fixed (in revision 2) below.  Where previously I had:
>
> @@ -66,7 +66,9 @@ gen_lowpart_general (machine_mode mode, rtx x)
>scalar_int_mode xmode;
>if (is_a  (GET_MODE (x), )
>   && GET_MODE_SIZE (xmode) <= UNITS_PER_WORD
> - && TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode)
> + && (known_lt (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode))
> + ? TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode)
> + : known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode)))
>   && !reload_completed)
> return gen_lowpart_general (mode, force_reg (xmode, x));
>
> the correct change is:
>
>scalar_int_mode xmode;
>if (is_a  (GET_MODE (x), )
>   && GET_MODE_SIZE (xmode) <= UNITS_PER_WORD
> - && TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode)
> + && (known_ge (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode))
> + || TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode))
>   && !reload_completed)
> return gen_lowpart_general (mode, force_reg (xmode, x));
>
> i.e. we only call TRULY_NOOP_TRUNCATION_MODES_P when we
> know we have a truncation, but the behaviour of non-truncations
> is preserved (no longer depends upon unspecified behaviour) and
> gen_lowpart_general is called to create the paradoxical SUBREG.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
> Hopefully this revision tests cleanly on the linaro.org CI pipeline.
>
> 2023-12-31  Roger Sayle  
>
> gcc/ChangeLog
> * combine.cc (make_extraction): Confirm that OUTPREC is less than
> INPREC before calling TRULY_NOOP_TRUNCATION_MODES_P.
> * expmed.cc (store_bit_field_using_insv): Likewise.
> (extract_bit_field_using_extv): Likewise.
> (extract_bit_field_as_subreg): Likewise.
> * optabs-query.cc (get_best_extraction_insn): Likewise.
> * optabs.cc (expand_parity): Likewise.
> * rtlhooks.cc (gen_lowpart_general): Likewise.
> * simplify-rtx.cc (simplify_truncation): Disallow truncations
> to the same precision.
> (simplify_unary_operation_1) : Move optimization
> of truncations to the same mode earlier.
>
>
>> -Original Message-
>> From: Roger Sayle 
>> Sent: 28 December 2023 15:35
>> To: 'gcc-patches@gcc.gnu.org' 
>> Cc: 'Jeff Law' 
>> Subject: [middle-end PATCH] Only call targetm.truly_noop_truncation for
>> truncations.
>> 
>> 
>> The truly_noop_truncation target hook is documented, in target.def, as
> "true if it
>> is safe to convert a value of inprec bits to one of outprec bits (where
> outprec is
>> smaller than inprec) by merely operating on it as if it had only outprec
> bits", i.e.
>> the middle-end can use a SUBREG instead of a TRUNCATE.
>> 
>> What's perhaps potentially a little ambiguous in the above description is
> whether
>> it is the caller or the callee that's responsible for ensuring or checking
> whether
>> "outprec < inprec".  The name TRULY_NOOP_TRUNCATION_P, like
>> SUBREG_PROMOTED_P, may be prone to being understood as a predicate that
>> confirms that something is a no-op truncation or a promoted subreg, when
> in fact
>> the caller must first confirm this is a truncation/subreg and only then
> call the
>> "classification" macro.
>> 
>> Alas making the following minor tweak (for testing) to the i386 backend:
>> 
>> static bool
>> ix86_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec) {
>>   gcc_assert (outprec < inprec);
>>   return true;
>> }
>> 
>> #undef TARGET_TRULY_NOOP_TRUNCATION
>> #define TARGET_TRULY_NOOP_TRUNCATION ix86_truly_noop_truncation
>> 
>> reveals that there are numerous callers in middle-end that rely on the
> default
>> behaviour of silently returning true for any (invalid) input.
>> These are fixed below.
>> 
>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
>> make -k check, both with and without --target_board=unix{-m32} with no new
>> failures.  Ok for mainline?
>> 
>> 
>> 2023-12-28  Roger Sayle  
>> 
>> gcc/ChangeLog
>> * combine.cc (make_extraction): Confirm that OUTPREC is less than
>> INPREC before calling TRULY_NOOP_TRUNCATION_MODES_P.
>> * expmed.cc (store_bit_field_using_insv): Likewise.
>> (extract_bit_field_using_extv): Likewise.
>> (extract_bit_field_as_subreg): Likewise.
>> * optabs-query.cc (get_best_extraction_insn):