Re: [PATCH 1/2] Change the name of array_at_struct_end_p to array_ref_flexible_size_p

2022-11-08 Thread Richard Biener via Gcc-patches
On Tue, 8 Nov 2022, Qing Zhao wrote:

> The name of the utility routine "array_at_struct_end_p" is misleading
> and should be changed to a new name that more accurately reflects its
> real meaning.
> 
> The routine "array_at_struct_end_p" is used to check whether an array
> reference is to an array whose actual size might be larger than its
> upper bound implies, which includes 3 different cases:
> 
>A. a ref to a flexible array member at the end of a structure;
>B. a ref to an array whose type differs from that of the original decl;
>C. a ref to an array that was passed as a parameter;
> 
> The old name only reflects case A above, and is therefore confusing when
> reading the corresponding GCC source code.
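> 
> For illustration (not part of the patch), case A is the classic C99
> flexible array member pattern:
> 
>   struct S { int n; int data[]; };   /* data is a flexible array member */
> 
>   int get (struct S *s, int i)
>   {
>     return s->data[i];   /* the object's real size may exceed what the
>                             declared bound of data[] implies */
>   }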
> 
> In this patch, a new name "array_ref_flexible_size_p" is used to replace
> the old name.
> 
> All references to the routine "array_at_struct_end_p" were replaced with
> this new name, and the corresponding comments were updated to make them
> clean and consistent.

Since you seem to feel strongly about this - OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
>   * gimple-array-bounds.cc (trailing_array): Replace
>   array_at_struct_end_p with new name and update comments.
>   * gimple-fold.cc (get_range_strlen_tree): Likewise.
>   * gimple-ssa-warn-restrict.cc (builtin_memref::builtin_memref):
>   Likewise.
>   * graphite-sese-to-poly.cc (bounds_are_valid): Likewise.
>   * tree-if-conv.cc (idx_within_array_bound): Likewise.
>   * tree-object-size.cc (addr_object_size): Likewise.
>   * tree-ssa-alias.cc (component_ref_to_zero_sized_trailing_array_p):
>   Likewise.
>   (stmt_kills_ref_p): Likewise.
>   * tree-ssa-loop-niter.cc (idx_infer_loop_bounds): Likewise.
>   * tree-ssa-strlen.cc (maybe_set_strlen_range): Likewise.
>   * tree.cc (array_at_struct_end_p): Rename to ...
>   (array_ref_flexible_size_p): ... this.
>   (component_ref_size): Replace array_at_struct_end_p with new name.
>   * tree.h (array_at_struct_end_p): Rename to ...
>   (array_ref_flexible_size_p): ... this.
> ---
>  gcc/gimple-array-bounds.cc  |  4 ++--
>  gcc/gimple-fold.cc  |  6 ++
>  gcc/gimple-ssa-warn-restrict.cc |  5 +++--
>  gcc/graphite-sese-to-poly.cc|  4 ++--
>  gcc/tree-if-conv.cc |  7 +++
>  gcc/tree-object-size.cc |  2 +-
>  gcc/tree-ssa-alias.cc   |  8 
>  gcc/tree-ssa-loop-niter.cc  | 15 +++
>  gcc/tree-ssa-strlen.cc  |  2 +-
>  gcc/tree.cc | 11 ++-
>  gcc/tree.h  |  8 
>  11 files changed, 35 insertions(+), 37 deletions(-)
> 
> diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
> index e190b93aa85..fbf448e045d 100644
> --- a/gcc/gimple-array-bounds.cc
> +++ b/gcc/gimple-array-bounds.cc
> @@ -129,7 +129,7 @@ get_ref_size (tree arg, tree *pref)
>  }
>  
>  /* Return true if REF is (likely) an ARRAY_REF to a trailing array member
> -   of a struct.  It refines array_at_struct_end_p by detecting a pointer
> +   of a struct.  It refines array_ref_flexible_size_p by detecting a pointer
> to an array and an array parameter declared using the [N] syntax (as
> opposed to a pointer) and returning false.  Set *PREF to the decl or
> expression REF refers to.  */
> @@ -167,7 +167,7 @@ trailing_array (tree arg, tree *pref)
>   return false;
>  }
>  
> -  return array_at_struct_end_p (arg);
> +  return array_ref_flexible_size_p (arg);
>  }
>  
>  /* Checks one ARRAY_REF in REF, located at LOCUS. Ignores flexible
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 9055cd8982d..cafd331ca98 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -1690,13 +1690,11 @@ get_range_strlen_tree (tree arg, bitmap visited, 
> strlen_range_kind rkind,
> /* Handle a MEM_REF into a DECL accessing an array of integers,
>being conservative about references to extern structures with
>flexible array members that can be initialized to arbitrary
> -  numbers of elements as an extension (static structs are okay).
> -  FIXME: Make this less conservative -- see
> -  component_ref_size in tree.cc.  */
> +  numbers of elements as an extension (static structs are okay).  */
> tree ref = TREE_OPERAND (TREE_OPERAND (arg, 0), 0);
> if ((TREE_CODE (ref) == PARM_DECL || VAR_P (ref))
> && (decl_binds_to_current_def_p (ref)
> -   || !array_at_struct_end_p (arg)))
> +   || !array_ref_flexible_size_p (arg)))
>   {
> /* Fail if the offset is out of bounds.  Such accesses
>should be diagnosed at some point.  */
> diff --git a/gcc/gimple-ssa-warn-restrict.cc b/gcc/gimple-ssa-warn-restrict.cc
> index b7ed15c8902..832456ba6fc 100644
> --- a/gcc/gimple-ssa-warn-restrict.cc
> +++ b/gcc/gimple-ssa-warn-restrict.cc
> @@ -296,8 +296,9 @@ 

Re: [PATCH] rtl: Try to remove EH edges after {pro, epi}logue generation [PR90259]

2022-11-08 Thread Eric Botcazou via Gcc-patches
> The previous testing on powerpc64{,le}-linux-gnu covered the Go language, but
> not Ada.  I re-tested it with languages c,c++,fortran,objc,obj-c++,go,ada
> on powerpc64le-linux-gnu, the result looked good.  Both x86 and aarch64
> cfarm machines which I used for testing don't have gnat installed, do you
> think testing Ada on ppc64le is enough?

Sure, thanks for having done it!

-- 
Eric Botcazou




Re: [RFC] propgation leap over memory copy for struct

2022-11-08 Thread Jiufu Guo via Gcc-patches
Jiufu Guo via Gcc-patches  writes:

> Richard Biener  writes:
>
>> On Tue, 1 Nov 2022, Jiufu Guo wrote:
>>
>>> Segher Boessenkool  writes:
>>> 
>>> > On Mon, Oct 31, 2022 at 04:13:38PM -0600, Jeff Law wrote:
>>> >> On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote:
>>> >> >We know that for struct variable assignment, memory copy may be used.
>>> >> >And for memcpy, we may load and store as many bytes as possible at one
>>> >> >time.
>>> >> >While it may not be best here:
>>> >
>>> >> So the first question in my mind is can we do better at the gimple 
>>> >> phase?  For the second case in particular can't we just "return a" 
>>> >> rather than copying a into <retval> then returning <retval>?  This feels 
>>> >> a lot like the return value optimization from C++.  I'm not sure if it 
>>> >> applies to the first case or not, it's been a long time since I looked 
>>> >> at NRV optimizations, but it might be worth poking around in there a bit 
>>> >> (tree-nrv.cc).
>>> >
>>> > If it is a bigger struct you end up with quite a lot of stuff in
>>> > registers.  GCC will eventually put that all in memory so it will work
>>> > out fine in the end, but you are likely to get inefficient code.
>>> Yes.  We may need to use memory to save registers for a big struct.
>>> For a small struct it may be practical to use registers.  We may leverage
>>> the idea that some types of small structs are passed to functions through
>>> registers.
>>> 
>>> >
>>> > OTOH, 8 bytes isn't as big as we would want these days, is it?  So it
>>> > would be useful to put smaller temporaries, say 32 bytes and smaller,
>>> > in registers instead of in memory.
>>> I think you mean: we should try to use registers to avoid memory accesses,
>>> and that using registers would be OK for larger memcpys (e.g. 32 bytes).
>>> Great suggestion, thanks a lot!
>>> 
>>> Like below idea:
>>> [r100:TI, r101:TI] = src;  //Or r100:OI/OO = src;
>>> dest = [r100:TI, r101:TI];
>>> 
>>> Currently, for an 8-byte structure, we are using TImode for it.
>>> And subreg/fwprop/cse passes are able to optimize it as expected.
>>> Two concerns here: larger integer modes (OI/OO/...) may not be introduced
>>> yet, and I'm not sure whether the current infrastructure supports using
>>> two or more registers for one structure.
>>> 
>>> >
>>> >> But even so, these kinds of things are still bound to happen, so it's 
>>> >> probably worth thinking about if we can do better in RTL as well.
>>> >
>>> > Always.  It is a mistake to think that having better high-level
>>> > optimisations means that you don't need good low-level optimisations
>>> > anymore: in fact deficiencies there become more glaringly apparent if
>>> > the early pipeline opts become better :-)
>>> Understand, thanks :)
>>> 
>>> >
>>> >> The first thing that comes to my mind is to annotate memcpy calls that 
>>> >> are structure assignments.  The idea here is that we may want to expand 
>>> >> a memcpy differently in those cases.   Changing how we expand an opaque 
>>> >> memcpy call is unlikely to be beneficial in most cases.  But changing 
>>> >> how we expand a structure copy may be beneficial by exposing the 
>>> >> underlying field values.   This would roughly correspond to your method 
>>> >> #1.
>>> >> 
>>> >> Or instead of changing how we expand, teach the optimizers about these 
>>> >> annotated memcpy calls -- they're just a a copy of each field.   That's 
>>> >> how CSE and the propagators could treat them. After some point we'd 
>>> >> lower them in the usual ways, but at least early in the RTL pipeline we 
>>> >> could keep them as annotated memcpy calls.  This roughly corresponds to 
>>> >> your second suggestion.
>>> >
>>> > Ideally this won't ever make it as far as RTL, if the structures do not
>>> > need to go via memory.  All high-level optimisations should have been
>>> > done earlier, and hopefully it was not expand itself that forced stuff
>>> > into memory!  :-/
>>> Currently, after early gimple optimization, the struct member access may
>>> still need to go through memory (if the mode of the struct is BLK).
>>> For example:
>>> 
>>> _Bool foo (const A a) { return a.a[0] > 1.0; }
>>> 
>>> The optimized gimple would be:
>>>   _1 = a.a[0];
>>>   _3 = _1 > 1.0e+0;
>>>   return _3;
>>> 
>>> During expand to RTL, parm 'a' is first stored to memory from the argument
>>> registers, and "a.a[0]" is then read back from memory.  It may be better
>>> to use "f1" for "a.a[0]" here.
>>> 
>>> Maybe method 3 is similar to your idea: using "parallel:BLK {DF;DF;DF;DF}"
>>> for the struct (BLK may be changed), and using 4 DF registers to access
>>> the structure in the expand pass.
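>>>
>>> (For reference, the struct type used in the example above is not shown; a
>>> definition consistent with the four DF registers would be something like
>>> "typedef struct { double a[4]; } A;".)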
>>
>> I think for cases like this it might be a good idea to perform
>> SRA-like analysis at RTL expansion time when we know how parameters
>> arrive (in pieces) and take that knowledge into account when
>> assigning the RTL to a decl.  The same applies for the return ABI.
>> Since we rely on RTL to elide copies to/from return/argument
>> registers/slots we have to assign "layout compatible" registers
>> to the 

Re: [PATCH] Fix doc typo

2022-11-08 Thread Richard Biener via Gcc-patches
On Wed, Nov 9, 2022 at 4:29 AM Sinan via Gcc-patches
 wrote:
>
> add a missing variable name.

OK.


Re: Announcement: Porting the Docs to Sphinx - tomorrow

2022-11-08 Thread Richard Biener via Gcc-patches
On Wed, Nov 9, 2022 at 1:09 AM Sam James via Gcc-patches
 wrote:
>
>
>
> > On 9 Nov 2022, at 00:00, Joseph Myers  wrote:
> >
> > On Tue, 8 Nov 2022, Sam James via Gcc wrote:
> >
> >> Yes, please (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106899)
> >> even for snapshots? Pretty please? :)
> >
> > I think we want snapshots to come out weekly even if the compiler or
> > documentation build fails, which makes anything involving a build as part
> > of the snapshot process problematic.
>
> If that's your expectation, that's fine, but I'd regard it as pretty
> serious if one didn't build, and I don't remember a time when it didn't.
>
> It's not like it's that much use if it fails to build on a bog-standard
> amd64 platform anyway, as if nothing else, you'd get a deluge
> of duplicate bug reports.

I'd say that doing a trunk snapshot build every day as CI would be nice, we
can then publish one once a week, skipping days where the build failed.

For release branches having generated files in the snapshots would be
even more useful and I very much hope the build there always succeeds.

Richard.


Re: [PATCH 2/4] LoongArch: Add ftint{, rm, rp}.{w, l}.{s, d} instructions

2022-11-08 Thread Lulu Cheng
There is a paragraph in the documentation of the option
'-fno-fp-int-builtin-inexact' in the gcc.pdf document:

    "Do not allow the built-in functions ceil, floor, round and trunc,
    and their float and long double variants, to generate code that
    raises the “inexact” floating-point exception for noninteger
    arguments.  ISO C99 and C11 allow these functions to raise the
    “inexact” exception, but ISO/IEC TS 18661-1:2014, the C bindings
    to IEEE 754-2008, as integrated into ISO C2X, does not allow these
    functions to do so."

So I think the implementation of these functions needs to be confirmed
again.

Or am I misinterpreting this description? :-[
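
As a minimal sketch of the property being discussed (illustrative only, not
from the patch; exact behavior depends on flags and target):

  #include <fenv.h>
  #include <math.h>
  #include <stdio.h>

  int
  main (void)
  {
    volatile double x = 0.5;        /* noninteger argument */
    feclearexcept (FE_INEXACT);
    volatile double r = ceil (x);   /* result is 1.0 */
    (void) r;
    /* Under -fno-fp-int-builtin-inexact, the inline expansion of ceil
       must not have raised the "inexact" exception here.  */
    printf ("inexact raised: %d\n", fetestexcept (FE_INEXACT) != 0);
    return 0;
  }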


On 2022/11/9 at 3:21 PM, Xi Ruoyao wrote:

This allows optimizing the following builtins with -fno-math-errno:

- __builtin_lrint{,f}
- __builtin_lfloor{,f}
- __builtin_lceil{,f}

Inspired by
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605287.html.

ANYFI is added so the compiler won't try ftint.l.s if -mfpu=32.  If we
simply used GPR here an ICE would be triggered with __builtin_lrintf
and -mfpu=32.

Note that the .w.{s,d} variants are not tested because we don't support
ILP32 for now.

gcc/ChangeLog:

* config/loongarch/loongarch.md (UNSPEC_FTINT): New unspec.
(UNSPEC_FTINTRM): Likewise.
(UNSPEC_FTINTRP): Likewise.
(LRINT): New define_int_iterator.
(lrint_pattern): New define_int_attr.
(lrint_submenmonic): Likewise.
(ANYFI): New define_mode_iterator.
(lrint): New instruction template.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/ftint.c: New test.
---
  gcc/config/loongarch/loongarch.md  | 28 ++
  gcc/testsuite/gcc.target/loongarch/ftint.c | 44 ++
  2 files changed, 72 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/ftint.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index a14ab14ac24..35cef272060 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -38,6 +38,9 @@ (define_c_enum "unspec" [
UNSPEC_FMAX
UNSPEC_FMIN
UNSPEC_FCOPYSIGN
+  UNSPEC_FTINT
+  UNSPEC_FTINTRM
+  UNSPEC_FTINTRP
  
;; Override return address for exception handling.

UNSPEC_EH_RETURN
@@ -374,6 +377,11 @@ (define_mode_iterator QHWD [QI HI SI (DI "TARGET_64BIT")])
  (define_mode_iterator ANYF [(SF "TARGET_HARD_FLOAT")
(DF "TARGET_DOUBLE_FLOAT")])
  
+;; Iterator for fixed-point modes which can be hold by a hardware

+;; floating-point register.
+(define_mode_iterator ANYFI [(SI "TARGET_HARD_FLOAT")
+(DI "TARGET_DOUBLE_FLOAT")])
+
  ;; A mode for which moves involving FPRs may need to be split.
  (define_mode_iterator SPLITF
[(DF "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
@@ -515,6 +523,16 @@ (define_code_attr fcond [(unordered "cun")
  (define_code_attr sel [(eq "masknez") (ne "maskeqz")])
  (define_code_attr selinv [(eq "maskeqz") (ne "masknez")])
  
+;; Iterator and attributes for floating-point to fixed-point conversion

+;; instructions.
+(define_int_iterator LRINT [UNSPEC_FTINT UNSPEC_FTINTRM UNSPEC_FTINTRP])
+(define_int_attr lrint_pattern [(UNSPEC_FTINT "lrint")
+   (UNSPEC_FTINTRM "lfloor")
+   (UNSPEC_FTINTRP "lceil")])
+(define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
+   (UNSPEC_FTINTRM "rm")
+   (UNSPEC_FTINTRP "rp")])
+
  ;;
  ;;  
  ;;
@@ -2022,6 +2040,16 @@ (define_insn "rint2"
[(set_attr "type" "fcvt")
 (set_attr "mode" "")])
  
+;; Convert floating-point numbers to integers

+(define_insn "2"
+  [(set (match_operand:ANYFI 0 "register_operand" "=f")
+   (unspec:ANYFI [(match_operand:ANYF 1 "register_operand" "f")]
+ LRINT))]
+  "TARGET_HARD_FLOAT"
+  "ftint.. %0,%1"
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "")])
+
  ;; Load the low word of operand 0 with operand 1.
  (define_insn "load_low"
[(set (match_operand:SPLITF 0 "register_operand" "=f,f")
diff --git a/gcc/testsuite/gcc.target/loongarch/ftint.c 
b/gcc/testsuite/gcc.target/loongarch/ftint.c
new file mode 100644
index 000..9c3c3a8a756
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/ftint.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64d -mdouble-float -fno-math-errno" } */
+/* { dg-final { scan-assembler "ftint\\.l\\.s" } } */
+/* { dg-final { scan-assembler "ftint\\.l\\.d" } } */
+/* { dg-final { scan-assembler "ftintrm\\.l\\.s" } } */
+/* { dg-final { scan-assembler "ftintrm\\.l\\.d" } } */
+/* { dg-final { scan-assembler "ftintrp\\.l\\.s" } } */
+/* { dg-final { scan-assembler "ftintrp\\.l\\.d" } } */
+
+long
+my_lrint (double a)
+{
+  return __builtin_lrint (a);
+}
+
+long
+my_lrintf (float a)
+{
+  return __builtin_lrintf (a);
+}
+
+long
+my_lfloor (double a)

Re: [PATCH] match.pd: rewrite select to branchless expression

2022-11-08 Thread Richard Biener via Gcc-patches
On Tue, Nov 8, 2022 at 9:02 PM Michael Collison  wrote:
>
> This patch transforms (cond (and (x , 0x1) == 0), y, (z op y)) into
> (-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also
> transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x ,
> 0x1)) & z ) op y.
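>
> As a quick check (not part of the patch), the equivalence follows by cases
> on the low bit, for op in {^, |}:
>
>   (x & 1) == 1:  -(x & 1) is all ones, so (-(x & 1) & z) op y == z op y
>   (x & 1) == 0:  -(x & 1) is zero,     so (-(x & 1) & z) op y == y
>
> which matches the two arms of the conditional.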
>
> Matching these patterns allows GCC to generate branchless code for one of
> the functions in coremark.
>
> Bootstrapped and tested on x86 and RISC-V. Okay?
>
> Michael.
>
> 2022-11-08  Michael Collison  
>
>  * match.pd ((cond (and (x , 0x1) == 0), y, (z op y) )
>  -> (-(and (x , 0x1)) & z ) op y)
>
> 2022-11-08  Michael Collison  
>
>  * gcc.dg/tree-ssa/branchless-cond.c: New test.
>
> ---
>   gcc/match.pd  | 22 
>   .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
>   2 files changed, 48 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 194ba8f5188..722f517ac6d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3486,6 +3486,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
> (max @2 @1))
>
> +/* (cond (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z )
> ^ y */

Please write the match as a C expression in the comment, at present
it's a weird mix.  So x & 0x1 == 0 ? y : z op y -> (-(typeof(y))(x &
0x1) & z) op y

> +(for op (bit_xor bit_ior)
> + (simplify
> +  (cond (eq (bit_and @0 integer_onep@1)
> +integer_zerop)
> +@2
> +(op:c @3 @2))
> +  (if (INTEGRAL_TYPE_P (type)
> +   && (INTEGRAL_TYPE_P (TREE_TYPE (@0
> +   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2

Since you are literally keeping (bit_and @0 @1) and not matching @0 with
anything I suspect you could instead use

 (simplify (cond (eq zero_one_valued_p@0 integer_zerop) ...

eventually extending that to cover bit_and with one.  Do you need to guard
this against 'type' being a signed/unsigned 1-bit precision integer?

> +
> +/* (cond (and (x , 0x1) != 0), (z ^ y), y ) -> (-(and (x , 0x1)) & z )
> ^ y */
> +(for op (bit_xor bit_ior)
> + (simplify
> +  (cond (ne (bit_and @0 integer_onep@1)
> +integer_zerop)
> +(op:c @3 @2)
> +@2)
> +  (if (INTEGRAL_TYPE_P (type)
> +   && (INTEGRAL_TYPE_P (TREE_TYPE (@0
> +   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2
> +
>   /* Simplifications of shift and rotates.  */
>
>   (for rotate (lrotate rrotate)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> new file mode 100644
> index 000..68087ae6568
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int f1(unsigned int x, unsigned int y, unsigned int z)
> +{
> +  return ((x & 1) == 0) ? y : z ^ y;
> +}
> +
> +int f2(unsigned int x, unsigned int y, unsigned int z)
> +{
> +  return ((x & 1) != 0) ? z ^ y : y;
> +}
> +
> +int f3(unsigned int x, unsigned int y, unsigned int z)
> +{
> +  return ((x & 1) == 0) ? y : z | y;
> +}
> +
> +int f4(unsigned int x, unsigned int y, unsigned int z)
> +{
> +  return ((x & 1) != 0) ? z | y : y;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
> --
> 2.34.1
>
>
>
>


Re: [PATCH v2] LoongArch: fix signed overflow in loongarch_emit_int_compare

2022-11-08 Thread Lulu Cheng

LGTM.

Thanks!

On 2022/11/9 at 3:26 PM, Xi Ruoyao wrote:

Signed overflow is undefined behavior, so we need to prevent it from
happening, instead of "checking" the result.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_emit_int_compare):
Avoid signed overflow.
---

v1 -> v2: break instead of continue if the inc/dec will overflow. So the
logic wouldn't be changed if signed overflow was defined to wrap.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.cc | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index f54c233f90c..8d5d8d965dd 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4178,10 +4178,13 @@ loongarch_emit_int_compare (enum rtx_code *code, rtx 
*op0, rtx *op1)
  if (!increment && !decrement)
continue;
  
+	  if ((increment && rhs == HOST_WIDE_INT_MAX)

+ || (decrement && rhs == HOST_WIDE_INT_MIN))
+   break;
+
  new_rhs = rhs + (increment ? 1 : -1);
  if (loongarch_integer_cost (new_rhs)
-   < loongarch_integer_cost (rhs)
- && (rhs < 0) == (new_rhs < 0))
+   < loongarch_integer_cost (rhs))
{
  *op1 = GEN_INT (new_rhs);
  *code = mag_comparisons[i][increment];




[PATCH v2] LoongArch: fix signed overflow in loongarch_emit_int_compare

2022-11-08 Thread Xi Ruoyao via Gcc-patches
Signed overflow is undefined behavior, so we need to prevent it from
happening, instead of "checking" the result.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_emit_int_compare):
Avoid signed overflow.
---

v1 -> v2: break instead of continue if the inc/dec will overflow. So the
logic wouldn't be changed if signed overflow was defined to wrap.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index f54c233f90c..8d5d8d965dd 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4178,10 +4178,13 @@ loongarch_emit_int_compare (enum rtx_code *code, rtx 
*op0, rtx *op1)
  if (!increment && !decrement)
continue;
 
+ if ((increment && rhs == HOST_WIDE_INT_MAX)
+ || (decrement && rhs == HOST_WIDE_INT_MIN))
+   break;
+
  new_rhs = rhs + (increment ? 1 : -1);
  if (loongarch_integer_cost (new_rhs)
-   < loongarch_integer_cost (rhs)
- && (rhs < 0) == (new_rhs < 0))
+   < loongarch_integer_cost (rhs))
{
  *op1 = GEN_INT (new_rhs);
  *code = mag_comparisons[i][increment];
-- 
2.38.1



[PATCH 1/4] LoongArch: Rename frint_ to rint2

2022-11-08 Thread Xi Ruoyao via Gcc-patches
Use standard name so __builtin_rint{,f} can be expanded to one
instruction.

gcc/ChangeLog:

* config/loongarch/loongarch.md (frint_): Rename to ..
(rint2): .. this.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/frint.c: New test.
---
 gcc/config/loongarch/loongarch.md  |  4 ++--
 gcc/testsuite/gcc.target/loongarch/frint.c | 16 
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/frint.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index bda34d0f3db..a14ab14ac24 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2012,8 +2012,8 @@ (define_insn "lui_h_hi12"
   [(set_attr "type" "move")]
 )
 
-;; Convert floating-point numbers to integers
-(define_insn "frint_"
+;; Round floating-point numbers to integers
+(define_insn "rint2"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
  UNSPEC_FRINT))]
diff --git a/gcc/testsuite/gcc.target/loongarch/frint.c 
b/gcc/testsuite/gcc.target/loongarch/frint.c
new file mode 100644
index 000..3ee6a8f973a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/frint.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mdouble-float" } */
+/* { dg-final { scan-assembler "frint\\.s" } } */
+/* { dg-final { scan-assembler "frint\\.d" } } */
+
+double
+my_rint (double a)
+{
+  return __builtin_rint (a);
+}
+
+float
+my_rintf (float a)
+{
+  return __builtin_rintf (a);
+}
-- 
2.38.1



[PATCH 4/4] LoongArch: Add flogb.{s, d} instructions and expand logb{sf, df}2

2022-11-08 Thread Xi Ruoyao via Gcc-patches
On LoongArch, the flogb instructions extract the exponent of a non-negative
floating-point value, but produce NaN for negative values.  So we need
to add a fabs instruction when we expand logb.
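
A quick illustration of why the fabs is needed (my example, not from the
patch):

  logb (-8.0)       ==  3.0    (ISO C: the exponent of the magnitude)
  flogb.d on -8.0   ->  NaN    (hardware behavior described above)

so the expansion computes flogb (fabs (x)).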

gcc/ChangeLog:

* config/loongarch/loongarch.md (UNSPEC_FLOGB): New unspec.
(type): Add flogb.
(logb_non_negative2): New instruction template.
(logb2): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/flogb.c: New test.
---
 gcc/config/loongarch/loongarch.md  | 35 --
 gcc/testsuite/gcc.target/loongarch/flogb.c | 18 +++
 2 files changed, 51 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flogb.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 9070ac4e2f8..072c3163b75 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -42,6 +42,7 @@ (define_c_enum "unspec" [
   UNSPEC_FTINTRM
   UNSPEC_FTINTRP
   UNSPEC_FSCALEB
+  UNSPEC_FLOGB
 
   ;; Override return address for exception handling.
   UNSPEC_EH_RETURN
@@ -217,6 +218,7 @@ (define_attr "qword_mode" "no,yes"
 ;; fdivfloating point divide
 ;; frdiv   floating point reciprocal divide
 ;; fabsfloating point absolute value
+;; flogb   floating point exponent extract
 ;; fnegfloating point negation
 ;; fcmpfloating point compare
 ;; fcopysign   floating point copysign
@@ -233,8 +235,8 @@ (define_attr "type"
   "unknown,branch,jump,call,load,fpload,fpidxload,store,fpstore,fpidxstore,
prefetch,prefetchx,condmove,mgtf,mftg,const,arith,logical,
shift,slt,signext,clz,trap,imul,idiv,move,
-   fmove,fadd,fmul,fmadd,fdiv,frdiv,fabs,fneg,fcmp,fcopysign,fcvt,fscaleb,
-   fsqrt,frsqrt,accext,accmod,multi,atomic,syncloop,nop,ghost"
+   fmove,fadd,fmul,fmadd,fdiv,frdiv,fabs,flogb,fneg,fcmp,fcopysign,fcvt,
+   fscaleb,fsqrt,frsqrt,accext,accmod,multi,atomic,syncloop,nop,ghost"
   (cond [(eq_attr "jirl" "!unset") (const_string "call")
 (eq_attr "got" "load") (const_string "load")
 
@@ -1036,6 +1038,35 @@ (define_insn "ldexp3"
(set_attr "mode" "")])
 
 ;;
+;;  
+;;
+;; FLOATING POINT EXPONENT EXTRACT
+;;
+;;  
+
+(define_insn "logb_non_negative2"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+   (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
+UNSPEC_FLOGB))]
+  "TARGET_HARD_FLOAT"
+  "flogb.\t%0,%1"
+  [(set_attr "type" "flogb")
+   (set_attr "mode" "")])
+
+(define_expand "logb2"
+  [(set (match_operand:ANYF 0 "register_operand")
+   (unspec:ANYF [(abs:ANYF (match_operand:ANYF 1 "register_operand"))]
+UNSPEC_FLOGB))]
+  "TARGET_HARD_FLOAT"
+{
+  rtx tmp = gen_reg_rtx (mode);
+
+  emit_insn (gen_abs2 (tmp, operands[1]));
+  emit_insn (gen_logb_non_negative2 (operands[0], tmp));
+  DONE;
+})
+
+;;
 ;;  ...
 ;;
 ;;  Count leading zeroes.
diff --git a/gcc/testsuite/gcc.target/loongarch/flogb.c 
b/gcc/testsuite/gcc.target/loongarch/flogb.c
new file mode 100644
index 000..1daefe54e13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flogb.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mdouble-float -fno-math-errno" } */
+/* { dg-final { scan-assembler "fabs\\.s" } } */
+/* { dg-final { scan-assembler "fabs\\.d" } } */
+/* { dg-final { scan-assembler "flogb\\.s" } } */
+/* { dg-final { scan-assembler "flogb\\.d" } } */
+
+double
+my_logb (double a)
+{
+  return __builtin_logb (a);
+}
+
+float
+my_logbf (float a)
+{
+  return __builtin_logbf (a);
+}
-- 
2.38.1



[PATCH 3/4] LoongArch: Add fscaleb.{s, d} instructions as ldexp{sf, df}3

2022-11-08 Thread Xi Ruoyao via Gcc-patches
This allows optimizing __builtin_ldexp{,f} and __builtin_scalbn{,f} with
-fno-math-errno.

IMODE is added because we can't hard-code SI for operand 2: the fscaleb.d
instruction always takes the high half of both source registers into
account.  See my_ldexp_long in the test case.
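
For reference, the operation being implemented is

  ldexp (a, b) = a * 2^b        e.g. ldexp (1.5, 3) == 12.0

and scalbn/scalbln are the same operation here since FLT_RADIX is 2; only
the type of the exponent operand differs, which is why my_ldexp_long in the
test needs the sign extension.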

gcc/ChangeLog:

* config/loongarch/loongarch.md (UNSPEC_FSCALEB): New unspec.
(type): Add fscaleb.
(IMODE): New mode attr.
(ldexp3): New instruction template.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/fscaleb.c: New test.
---
 gcc/config/loongarch/loongarch.md| 26 ++-
 gcc/testsuite/gcc.target/loongarch/fscaleb.c | 48 
 2 files changed, 72 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/fscaleb.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 35cef272060..9070ac4e2f8 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -41,6 +41,7 @@ (define_c_enum "unspec" [
   UNSPEC_FTINT
   UNSPEC_FTINTRM
   UNSPEC_FTINTRP
+  UNSPEC_FSCALEB
 
   ;; Override return address for exception handling.
   UNSPEC_EH_RETURN
@@ -220,6 +221,7 @@ (define_attr "qword_mode" "no,yes"
 ;; fcmpfloating point compare
 ;; fcopysign   floating point copysign
 ;; fcvtfloating point convert
+;; fscaleb floating point scale
 ;; fsqrt   floating point square root
 ;; frsqrt   floating point reciprocal square root
 ;; multi   multiword sequence (or user asm statements)
@@ -231,8 +233,8 @@ (define_attr "type"
   "unknown,branch,jump,call,load,fpload,fpidxload,store,fpstore,fpidxstore,
prefetch,prefetchx,condmove,mgtf,mftg,const,arith,logical,
shift,slt,signext,clz,trap,imul,idiv,move,
-   fmove,fadd,fmul,fmadd,fdiv,frdiv,fabs,fneg,fcmp,fcopysign,fcvt,fsqrt,
-   frsqrt,accext,accmod,multi,atomic,syncloop,nop,ghost"
+   fmove,fadd,fmul,fmadd,fdiv,frdiv,fabs,fneg,fcmp,fcopysign,fcvt,fscaleb,
+   fsqrt,frsqrt,accext,accmod,multi,atomic,syncloop,nop,ghost"
   (cond [(eq_attr "jirl" "!unset") (const_string "call")
 (eq_attr "got" "load") (const_string "load")
 
@@ -418,6 +420,10 @@ (define_mode_attr UNITMODE [(SF "SF") (DF "DF")])
 ;; the controlling mode.
 (define_mode_attr HALFMODE [(DF "SI") (DI "SI") (TF "DI")])
 
+;; This attribute gives the integer mode that has the same size of a
+;; floating-point mode.
+(define_mode_attr IMODE [(SF "SI") (DF "DI")])
+
 ;; This code iterator allows signed and unsigned widening multiplications
 ;; to use the same template.
 (define_code_iterator any_extend [sign_extend zero_extend])
@@ -1011,7 +1017,23 @@ (define_insn "copysign3"
   "fcopysign.\t%0,%1,%2"
   [(set_attr "type" "fcopysign")
(set_attr "mode" "")])
+
+;;
+;;  
+;;
+;; FLOATING POINT SCALE
+;;
+;;  
 
+(define_insn "ldexp3"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+   (unspec:ANYF [(match_operand:ANYF1 "register_operand" "f")
+ (match_operand: 2 "register_operand" "f")]
+UNSPEC_FSCALEB))]
+  "TARGET_HARD_FLOAT"
+  "fscaleb.\t%0,%1,%2"
+  [(set_attr "type" "fscaleb")
+   (set_attr "mode" "")])
 
 ;;
 ;;  ...
diff --git a/gcc/testsuite/gcc.target/loongarch/fscaleb.c 
b/gcc/testsuite/gcc.target/loongarch/fscaleb.c
new file mode 100644
index 000..f18470fbb8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/fscaleb.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=lp64d -mdouble-float -fno-math-errno" } */
+/* { dg-final { scan-assembler-times "fscaleb\\.s" 3 } } */
+/* { dg-final { scan-assembler-times "fscaleb\\.d" 4 } } */
+/* { dg-final { scan-assembler-times "slli\\.w" 1 } } */
+
+double
+my_scalbln (double a, long b)
+{
+  return __builtin_scalbln (a, b);
+}
+
+double
+my_scalbn (double a, int b)
+{
+  return __builtin_scalbn (a, b);
+}
+
+double
+my_ldexp (double a, int b)
+{
+  return __builtin_ldexp (a, b);
+}
+
+float
+my_scalblnf (float a, long b)
+{
+  return __builtin_scalblnf (a, b);
+}
+
+float
+my_scalbnf (float a, int b)
+{
+  return __builtin_scalbnf (a, b);
+}
+
+float
+my_ldexpf (float a, int b)
+{
+  return __builtin_ldexpf (a, b);
+}
+
+/* b must be sign-extended */
+double
+my_ldexp_long (double a, long b)
+{
+  return __builtin_ldexp (a, b);
+}
-- 
2.38.1



[PATCH 2/4] LoongArch: Add ftint{,rm,rp}.{w,l}.{s,d} instructions

2022-11-08 Thread Xi Ruoyao via Gcc-patches
This allows optimizing the following builtins with -fno-math-errno:

- __builtin_lrint{,f}
- __builtin_lfloor{,f}
- __builtin_lceil{,f}

Inspired by
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605287.html.

ANYFI is added so the compiler won't try ftint.l.s if -mfpu=32.  If we
simply used GPR here an ICE would be triggered with __builtin_lrintf
and -mfpu=32.

Note that the .w.{s,d} variants are not tested because we don't support
ILP32 for now.

gcc/ChangeLog:

* config/loongarch/loongarch.md (UNSPEC_FTINT): New unspec.
(UNSPEC_FTINTRM): Likewise.
(UNSPEC_FTINTRP): Likewise.
(LRINT): New define_int_iterator.
(lrint_pattern): New define_int_attr.
(lrint_submenmonic): Likewise.
(ANYFI): New define_mode_iterator.
(lrint): New instruction template.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/ftint.c: New test.
---
 gcc/config/loongarch/loongarch.md  | 28 ++
 gcc/testsuite/gcc.target/loongarch/ftint.c | 44 ++
 2 files changed, 72 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/ftint.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index a14ab14ac24..35cef272060 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -38,6 +38,9 @@ (define_c_enum "unspec" [
   UNSPEC_FMAX
   UNSPEC_FMIN
   UNSPEC_FCOPYSIGN
+  UNSPEC_FTINT
+  UNSPEC_FTINTRM
+  UNSPEC_FTINTRP
 
   ;; Override return address for exception handling.
   UNSPEC_EH_RETURN
@@ -374,6 +377,11 @@ (define_mode_iterator QHWD [QI HI SI (DI "TARGET_64BIT")])
 (define_mode_iterator ANYF [(SF "TARGET_HARD_FLOAT")
(DF "TARGET_DOUBLE_FLOAT")])
 
+;; Iterator for fixed-point modes which can be hold by a hardware
+;; floating-point register.
+(define_mode_iterator ANYFI [(SI "TARGET_HARD_FLOAT")
+(DI "TARGET_DOUBLE_FLOAT")])
+
 ;; A mode for which moves involving FPRs may need to be split.
 (define_mode_iterator SPLITF
   [(DF "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
@@ -515,6 +523,16 @@ (define_code_attr fcond [(unordered "cun")
 (define_code_attr sel [(eq "masknez") (ne "maskeqz")])
 (define_code_attr selinv [(eq "maskeqz") (ne "masknez")])
 
+;; Iterator and attributes for floating-point to fixed-point conversion
+;; instructions.
+(define_int_iterator LRINT [UNSPEC_FTINT UNSPEC_FTINTRM UNSPEC_FTINTRP])
+(define_int_attr lrint_pattern [(UNSPEC_FTINT "lrint")
+   (UNSPEC_FTINTRM "lfloor")
+   (UNSPEC_FTINTRP "lceil")])
+(define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
+   (UNSPEC_FTINTRM "rm")
+   (UNSPEC_FTINTRP "rp")])
+
 ;;
 ;;  
 ;;
@@ -2022,6 +2040,16 @@ (define_insn "rint2"
   [(set_attr "type" "fcvt")
(set_attr "mode" "")])
 
+;; Convert floating-point numbers to integers
+(define_insn "2"
+  [(set (match_operand:ANYFI 0 "register_operand" "=f")
+   (unspec:ANYFI [(match_operand:ANYF 1 "register_operand" "f")]
+ LRINT))]
+  "TARGET_HARD_FLOAT"
+  "ftint.. %0,%1"
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "")])
+
 ;; Load the low word of operand 0 with operand 1.
 (define_insn "load_low"
   [(set (match_operand:SPLITF 0 "register_operand" "=f,f")
diff --git a/gcc/testsuite/gcc.target/loongarch/ftint.c 
b/gcc/testsuite/gcc.target/loongarch/ftint.c
new file mode 100644
index 000..9c3c3a8a756
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/ftint.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64d -mdouble-float -fno-math-errno" } */
+/* { dg-final { scan-assembler "ftint\\.l\\.s" } } */
+/* { dg-final { scan-assembler "ftint\\.l\\.d" } } */
+/* { dg-final { scan-assembler "ftintrm\\.l\\.s" } } */
+/* { dg-final { scan-assembler "ftintrm\\.l\\.d" } } */
+/* { dg-final { scan-assembler "ftintrp\\.l\\.s" } } */
+/* { dg-final { scan-assembler "ftintrp\\.l\\.d" } } */
+
+long
+my_lrint (double a)
+{
+  return __builtin_lrint (a);
+}
+
+long
+my_lrintf (float a)
+{
+  return __builtin_lrintf (a);
+}
+
+long
+my_lfloor (double a)
+{
+  return __builtin_lfloor (a);
+}
+
+long
+my_lfloorf (float a)
+{
+  return __builtin_lfloorf (a);
+}
+
+long
+my_lceil (double a)
+{
+  return __builtin_lceil (a);
+}
+
+long
+my_lceilf (float a)
+{
+  return __builtin_lceilf (a);
+}
-- 
2.38.1



[PATCH 0/4] LoongArch: Add some floating-point operations

2022-11-08 Thread Xi Ruoyao via Gcc-patches
These patches allow expanding the following builtins to floating-point
instructions for LoongArch:

- __builtin_rint{,f}
- __builtin_{l,ll}rint{,f}
- __builtin_{l,ll}floor{,f}
- __builtin_{l,ll}ceil{,f}
- __builtin_scalb{n,ln}{,f}
- __builtin_logb{,f}

Bootstrapped and regtested on loongarch64-linux-gnu.  And a modified
Glibc using the builtins for rint{,f}, {l,ll}rint{,f}, and logb{,f}
also survived Glibc test suite.

Please review ASAP because GCC 13 stage 1 will end on Nov. 13th.

Xi Ruoyao (4):
  LoongArch: Rename frint_ to rint2
  LoongArch: Add ftint{,rm,rp}.{w,l}.{s,d} instructions
  LoongArch: Add fscaleb.{s,d} instructions as ldexp{sf,df}3
  LoongArch: Add flogb.{s,d} instructions and expand logb{sf,df}2

 gcc/config/loongarch/loongarch.md| 89 +++-
 gcc/testsuite/gcc.target/loongarch/flogb.c   | 18 
 gcc/testsuite/gcc.target/loongarch/frint.c   | 16 
 gcc/testsuite/gcc.target/loongarch/fscaleb.c | 48 +++
 gcc/testsuite/gcc.target/loongarch/ftint.c   | 44 ++
 5 files changed, 211 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flogb.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/frint.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/fscaleb.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/ftint.c

-- 
2.38.1



[PATCH] i386: Add ISA check for newly introduced prefetch builtins.

2022-11-08 Thread Haochen Jiang via Gcc-patches
Hi all,

As Hongtao said, the failure on pentiumpro is caused by a missing ISA check:
we are using emit_insn () through the new builtins and that won't check
whether the TARGET matches.  Previously, the middle-end builtin would check
that.

On pentiumpro, nothing supports any kind of prefetch, so we dropped into
the pattern and then failed.

I have added the restrictions just like the middle-end builtin_prefetch
does.  I also added the missing checks for PREFETCHI.  Ok for trunk?

BRs,
Haochen

gcc/ChangeLog:

* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_PREFETCHI for prefetchi builtin.
* config/i386/i386-expand.cc (ix86_expand_builtin):
Add ISA check before emit_insn.
* config/i386/prfchiintrin.h: Add target for intrin.

gcc/testsuite/ChangeLog:

* gcc.target/i386/prefetchi-5.c: New test.
---
 gcc/config/i386/i386-builtin.def|  2 +-
 gcc/config/i386/i386-expand.cc  | 11 +--
 gcc/config/i386/prfchiintrin.h  | 14 +-
 gcc/testsuite/gcc.target/i386/prefetchi-5.c |  4 
 4 files changed, 27 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/prefetchi-5.c

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index ea3aff7f125..5e0461acc00 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -498,7 +498,7 @@ BDESC (0, OPTION_MASK_ISA2_WIDEKL, CODE_FOR_nothing, 
"__builtin_ia32_aesencwide1
 BDESC (0, OPTION_MASK_ISA2_WIDEKL, CODE_FOR_nothing, 
"__builtin_ia32_aesencwide256kl_u8", IX86_BUILTIN_AESENCWIDE256KLU8, UNKNOWN, 
(int) UINT8_FTYPE_PV2DI_PCV2DI_PCVOID)
 
 /* PREFETCHI */
-BDESC (0, 0, CODE_FOR_prefetchi, "__builtin_ia32_prefetchi", 
IX86_BUILTIN_PREFETCHI, UNKNOWN, (int) VOID_FTYPE_PCVOID_INT)
+BDESC (0, OPTION_MASK_ISA2_PREFETCHI, CODE_FOR_prefetchi, 
"__builtin_ia32_prefetchi", IX86_BUILTIN_PREFETCHI, UNKNOWN, (int) 
VOID_FTYPE_PCVOID_INT)
 BDESC (0, 0, CODE_FOR_nothing, "__builtin_ia32_prefetch", 
IX86_BUILTIN_PREFETCH, UNKNOWN, (int) VOID_FTYPE_PCVOID_INT_INT_INT)
 
 BDESC_END (SPECIAL_ARGS, PURE_ARGS)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 9c92b07d5cd..0e45c195390 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -13131,7 +13131,7 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
subtarget,
 
if (INTVAL (op3) == 1)
  {
-   if (TARGET_64BIT
+   if (TARGET_64BIT && TARGET_PREFETCHI
&& local_func_symbolic_operand (op0, GET_MODE (op0)))
  emit_insn (gen_prefetchi (op0, op2));
else
@@ -13150,7 +13150,14 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
subtarget,
op0 = convert_memory_address (Pmode, op0);
op0 = copy_addr_to_reg (op0);
  }
-   emit_insn (gen_prefetch (op0, op1, op2));
+
+   if (TARGET_3DNOW || TARGET_PREFETCH_SSE
+   || TARGET_PRFCHW || TARGET_PREFETCHWT1)
+ emit_insn (gen_prefetch (op0, op1, op2));
+   else if (!MEM_P (op0) && side_effects_p (op0))
+ /* Don't do anything with direct references to volatile memory,
+but generate code to handle other side effects.  */
+ emit_insn (op0);
  }
 
return 0;
diff --git a/gcc/config/i386/prfchiintrin.h b/gcc/config/i386/prfchiintrin.h
index 06deef488ba..996a4be1aba 100644
--- a/gcc/config/i386/prfchiintrin.h
+++ b/gcc/config/i386/prfchiintrin.h
@@ -30,6 +30,13 @@
 
 #ifdef __x86_64__
 
+
+#ifndef __PREFETCHI__
+#pragma GCC push_options
+#pragma GCC target("prefetchi")
+#define __DISABLE_PREFETCHI__
+#endif /* __PREFETCHI__ */
+
 extern __inline void
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _m_prefetchit0 (void* __P)
@@ -44,6 +51,11 @@ _m_prefetchit1 (void* __P)
   __builtin_ia32_prefetchi (__P, 2);
 }
 
-#endif
+#ifdef __DISABLE_PREFETCHI__
+#undef __DISABLE_PREFETCHI__
+#pragma GCC pop_options
+#endif /* __DISABLE_PREFETCHI__ */
+
+#endif /* __x86_64__ */
 
 #endif /* _PRFCHIINTRIN_H_INCLUDED */
diff --git a/gcc/testsuite/gcc.target/i386/prefetchi-5.c 
b/gcc/testsuite/gcc.target/i386/prefetchi-5.c
new file mode 100644
index 000..8c26540f96a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/prefetchi-5.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target { ia32 } } } */
+/* { dg-options "-O0 -march=pentiumpro" } */
+
+#include "prefetchi-4.c"
-- 
2.18.1



[COMMITTED] [range-op-float] Implement MINUS_EXPR.

2022-11-08 Thread Aldy Hernandez via Gcc-patches
Now that the generic parts of the binary operators have been
abstracted, implementing MINUS_EXPR is a cinch.

The op[12]_range entries will be submitted as a follow-up.
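
For reference, the new rv_fold entry follows the usual interval rule for
subtraction:

  [lh_lb, lh_ub] - [rh_lb, rh_ub]  =  [lh_lb - rh_ub, lh_ub - rh_lb]
  e.g.  [1, 5] - [2, 3]  =  [-2, 3]

with +INF - +INF and -INF - -INF flagged as possible NANs, as the code
below does.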

gcc/ChangeLog:

* range-op-float.cc (class foperator_minus): New.
(floating_op_table::floating_op_table): Add MINUS_EXPR entry.
---
 gcc/range-op-float.cc | 24 
 1 file changed, 24 insertions(+)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 7075c25442a..d52e971f84e 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -1884,6 +1884,29 @@ class foperator_plus : public range_operator_float
 } fop_plus;
 
 
+class foperator_minus : public range_operator_float
+{
+  void rv_fold (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, bool &maybe_nan,
+   tree type,
+   const REAL_VALUE_TYPE &lh_lb,
+   const REAL_VALUE_TYPE &lh_ub,
+   const REAL_VALUE_TYPE &rh_lb,
+   const REAL_VALUE_TYPE &rh_ub) const final override
+  {
+frange_arithmetic (MINUS_EXPR, type, lb, lh_lb, rh_ub, dconstninf);
+frange_arithmetic (MINUS_EXPR, type, ub, lh_ub, rh_lb, dconstinf);
+
+// [+INF] - [+INF] = NAN
+if (real_isinf (&lh_ub, false) && real_isinf (&rh_ub, false))
+  maybe_nan = true;
+// [-INF] - [-INF] = NAN
+else if (real_isinf (&lh_lb, true) && real_isinf (&rh_lb, true))
+  maybe_nan = true;
+else
+  maybe_nan = false;
+  }
+} fop_minus;
+
 // Instantiate a range_op_table for floating point operations.
 static floating_op_table global_floating_table;
 
@@ -1917,6 +1940,7 @@ floating_op_table::floating_op_table ()
   set (ABS_EXPR, fop_abs);
   set (NEGATE_EXPR, fop_negate);
   set (PLUS_EXPR, fop_plus);
+  set (MINUS_EXPR, fop_minus);
 }
 
 // Return a pointer to the range_operator_float instance, if there is
-- 
2.38.1



[COMMITTED] [range-op-float] Abstract out binary operator code out of PLUS_EXPR entry.

2022-11-08 Thread Aldy Hernandez via Gcc-patches
The PLUS_EXPR was always meant to be a template for further
development, since most of the binary operators will share a similar
structure.  This patch abstracts out the common bits into the default
definition for range_operator_float::fold_range() and provides an
rv_fold() to be implemented by the individual entries wishing to use
the generic folder.  This is akin to what we do with fold_range() and
wi_fold() in the integer version of range-ops.

gcc/ChangeLog:

* range-op-float.cc (range_operator_float::fold_range): Abstract
out from foperator_plus.
(range_operator_float::rv_fold): New.
(foperator_plus::fold_range): Remove.
(foperator_plus::rv_fold): New.
(propagate_nans): Remove.
* range-op.h (class range_operator_float): Add rv_fold.
---
 gcc/range-op-float.cc | 156 +-
 gcc/range-op.h|   7 ++
 2 files changed, 84 insertions(+), 79 deletions(-)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 8282c912fc4..7075c25442a 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -49,13 +49,66 @@ along with GCC; see the file COPYING3.  If not see
 // Default definitions for floating point operators.
 
 bool
-range_operator_float::fold_range (frange &r ATTRIBUTE_UNUSED,
- tree type ATTRIBUTE_UNUSED,
- const frange &op1 ATTRIBUTE_UNUSED,
- const frange &op2 ATTRIBUTE_UNUSED,
+range_operator_float::fold_range (frange &r, tree type,
+ const frange &op1, const frange &op2,
  relation_trio) const
 {
-  return false;
+  if (empty_range_varying (r, type, op1, op2))
+return true;
+  if (op1.known_isnan () || op2.known_isnan ())
+{
+  r.set_nan (op1.type ());
+  return true;
+}
+
+  REAL_VALUE_TYPE lb, ub;
+  bool maybe_nan;
+  rv_fold (lb, ub, maybe_nan, type,
+  op1.lower_bound (), op1.upper_bound (),
+  op2.lower_bound (), op2.upper_bound ());
+
+  // Handle possible NANs by saturating to the appropriate INF if only
+  // one end is a NAN.  If both ends are a NAN, just return a NAN.
+  bool lb_nan = real_isnan (&lb);
+  bool ub_nan = real_isnan (&ub);
+  if (lb_nan && ub_nan)
+{
+  r.set_nan (type);
+  return true;
+}
+  if (lb_nan)
+lb = dconstninf;
+  else if (ub_nan)
+ub = dconstinf;
+
+  r.set (type, lb, ub);
+
+  if (lb_nan || ub_nan || maybe_nan)
+// Keep the default NAN (with a varying sign) set by the setter.
+;
+  else if (!op1.maybe_isnan () && !op2.maybe_isnan ())
+r.clear_nan ();
+
+  return true;
+}
+
+// For a given operation, fold two sets of ranges into [lb, ub].
+// MAYBE_NAN is set to TRUE if, in addition to any result in LB or
+// UB, the final range has the possiblity of a NAN.
+void
+range_operator_float::rv_fold (REAL_VALUE_TYPE &lb,
+  REAL_VALUE_TYPE &ub,
+  bool &maybe_nan,
+  tree type ATTRIBUTE_UNUSED,
+  const REAL_VALUE_TYPE &lh_lb ATTRIBUTE_UNUSED,
+  const REAL_VALUE_TYPE &lh_ub ATTRIBUTE_UNUSED,
+  const REAL_VALUE_TYPE &rh_lb ATTRIBUTE_UNUSED,
+  const REAL_VALUE_TYPE &rh_ub ATTRIBUTE_UNUSED)
+  const
+{
+  lb = dconstninf;
+  ub = dconstinf;
+  maybe_nan = true;
 }
 
 bool
@@ -192,19 +245,6 @@ frelop_early_resolve (irange , tree type,
  && relop_early_resolve (r, type, op1, op2, rel, my_rel));
 }
 
-// If either operand is a NAN, set R to NAN and return TRUE.
-
-inline bool
-propagate_nans (frange &r, const frange &op1, const frange &op2)
-{
-  if (op1.known_isnan () || op2.known_isnan ())
-{
-  r.set_nan (op1.type ());
-  return true;
-}
-  return false;
-}
-
 // Set VALUE to its next real value, or INF if the operation overflows.
 
 inline void
@@ -1822,69 +1862,27 @@ foperator_unordered_equal::op1_range (frange , tree 
type,
 
 class foperator_plus : public range_operator_float
 {
-  using range_operator_float::fold_range;
-
-public:
-  bool fold_range (frange &r, tree type,
-  const frange &op1,
-  const frange &op2,
-  relation_trio = TRIO_VARYING) const final override;
+  void rv_fold (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, bool &maybe_nan,
+   tree type,
+   const REAL_VALUE_TYPE &lh_lb,
+   const REAL_VALUE_TYPE &lh_ub,
+   const REAL_VALUE_TYPE &rh_lb,
+   const REAL_VALUE_TYPE &rh_ub) const final override
+  {
+frange_arithmetic (PLUS_EXPR, type, lb, lh_lb, rh_lb, dconstninf);
+frange_arithmetic (PLUS_EXPR, type, ub, lh_ub, rh_ub, dconstinf);
+
+// [-INF] + [+INF] = NAN
+if (real_isinf (&lh_lb, true) && real_isinf (&rh_ub, false))
+  maybe_nan = true;
+// [+INF] + [-INF] = NAN
+else if (real_isinf (&lh_ub, false) && real_isinf (&rh_lb, true))
+  maybe_nan = true;
+  

Re: [PATCH] rtl: Try to remove EH edges after {pro,epi}logue generation [PR90259]

2022-11-08 Thread Kewen.Lin via Gcc-patches
Hi Richi and Eric,

Thanks for your reviews and comments!

on 2022/11/8 18:37, Eric Botcazou wrote:
>> It looks reasonable - OK if the others CCed have no comments.
> 
> My only comment is that it needs to be tested with languages enabling -fnon-
> call-exceptions by default (Ada & Go), if not already done.
> 

The previous testing on powerpc64{,le}-linux-gnu covered the Go language, but
not Ada.  I re-tested it with languages c,c++,fortran,objc,obj-c++,go,ada
on powerpc64le-linux-gnu, the result looked good.  Both x86 and aarch64
cfarm machines which I used for testing don't have gnat installed, do you
think testing Ada on ppc64le is enough?

BR,
Kewen


Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-11-08 Thread Aldy Hernandez via Gcc-patches
This patch fixes the oversight.

Tested on x86-64 Linux.

Pushed.

On Wed, Nov 9, 2022 at 12:05 AM Aldy Hernandez  wrote:
>
> Sigh, one more thing.
>
> There are further possibilities for a NAN result, even if the operands
> are !NAN and the result from frange_arithmetic is free of NANs.
> Adding different signed infinities.
>
> For example, [-INF,+INF] + [-INF,+INF] has the possibility of adding
> -INF and +INF, which is a NAN.  Since we end up calling frange
> arithmetic on the lower bounds and then on the upper bounds, we miss
> this, and mistakenly think we're free of NANs.
>
> I have a patch in testing, but FYI, in case anyone notices this before
> I get around to it tomorrow.
>
> Aldy
>
> On Tue, Nov 8, 2022 at 3:11 PM Jakub Jelinek  wrote:
> >
> > On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote:
> > > +// If either operand is a NAN, set R to the combination of both NANs
> > > +// signwise and return TRUE.
> >
> > This comment doesn't describe what it does now.
> > If either operand is a NAN, set R to NAN with unspecified sign bit and 
> > return
> > TRUE.
> > ?
> >
> > Other than this LGTM.
> >
> > Jakub
> >
From 68b0615be2aaff3a8ce91ba7cd0f69ebbd93702c Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Tue, 8 Nov 2022 23:42:04 +0100
Subject: [PATCH] [range-op-float] Set NAN possibility for INF + (-INF) and
 vice versa.

Some combinations of operations can yield a NAN even if no operands
have the possibility of a NAN.  For example, [-INF] + [+INF] = NAN and
vice versa.

For [-INF,+INF] + [-INF,+INF], frange_arithmetic will not return a
NAN, and since the operands have no possibility of a NAN, we will
mistakenly assume the result cannot have a NAN.  This fixes the
oversight.

gcc/ChangeLog:

	* range-op-float.cc (foperator_plus::fold_range): Set NAN for
	addition of different signed infinities.
	(range_op_float_tests): New test.
---
 gcc/range-op-float.cc | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 3bc6cc8849d..8282c912fc4 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -1863,7 +1863,21 @@ foperator_plus::fold_range (frange , tree type,
 
   r.set (type, lb, ub);
 
-  if (lb_nan || ub_nan)
+  // Some combinations can yield a NAN even if no operands have the
+  // possibility of a NAN.
+  bool maybe_nan;
+  // [-INF] + [+INF] = NAN
+  if (real_isinf (&op1.lower_bound (), true)
+  && real_isinf (&op2.upper_bound (), false))
+maybe_nan = true;
+  // [+INF] + [-INF] = NAN
+  else if (real_isinf (&op1.upper_bound (), false)
+	   && real_isinf (&op2.lower_bound (), true))
+maybe_nan = true;
+  else
+maybe_nan = false;
+
+  if (lb_nan || ub_nan || maybe_nan)
 // Keep the default NAN (with a varying sign) set by the setter.
 ;
   else if (!op1.maybe_isnan () && !op2.maybe_isnan ())
@@ -1960,6 +1974,16 @@ range_op_float_tests ()
   r1 = frange_float ("-1", "-0");
   r1.update_nan (false);
   ASSERT_EQ (r, r1);
+
+  // [-INF,+INF] + [-INF,+INF] could be a NAN.
+  range_op_handler plus (PLUS_EXPR, float_type_node);
+  r0.set_varying (float_type_node);
+  r1.set_varying (float_type_node);
+  r0.clear_nan ();
+  r1.clear_nan ();
+  plus.fold_range (r, float_type_node, r0, r1);
+  if (HONOR_NANS (float_type_node))
+ASSERT_TRUE (r.maybe_isnan ());
 }
 
 } // namespace selftest
-- 
2.38.1



Re: [PATCH] RISC-V: costs: handle BSWAP

2022-11-08 Thread Palmer Dabbelt

On Tue, 08 Nov 2022 20:43:20 PST (-0800), pins...@gmail.com wrote:

On Tue, Nov 8, 2022 at 7:16 PM Palmer Dabbelt  wrote:


On Tue, 08 Nov 2022 18:57:26 PST (-0800), jeffreya...@gmail.com wrote:
>
> On 11/8/22 12:54, Philipp Tomsich wrote:
>> The BSWAP operation is not handled in rtx_costs. Add it.
>>
>> With Zbb, BSWAP for XLEN is a single instruction; for smaller modes,
>> it will expand into two.
>>
>> gcc/ChangeLog:
>>
>>  * config/riscv/riscv.c (rtx_costs): Add BSWAP.
>
> OK.

It's riscv_rtx_costs.

(I don't usually read ChangeLog entries that closely, just happened to
stumble on it when poking around.)


Using contrib/git-commit-mklog.py can help here to make sure you
always get the correct format for the changelog and it does a decent
job of figuring out function names too.
You can also use contrib/gcc-git-customization.sh to install it such
that you can use it when doing git commits.
After invoking that inside the GCC git, you can just do "git
gcc-commit-mklog <args>", where <args> would be what you normally put for
"git commit" (but as if in the toplevel directory).


Thanks, that's awesome.


Re: [PATCH] RISC-V: costs: handle BSWAP

2022-11-08 Thread Andrew Pinski via Gcc-patches
On Tue, Nov 8, 2022 at 7:16 PM Palmer Dabbelt  wrote:
>
> On Tue, 08 Nov 2022 18:57:26 PST (-0800), jeffreya...@gmail.com wrote:
> >
> > On 11/8/22 12:54, Philipp Tomsich wrote:
> >> The BSWAP operation is not handled in rtx_costs. Add it.
> >>
> >> With Zbb, BSWAP for XLEN is a single instruction; for smaller modes,
> >> it will expand into two.
> >>
> >> gcc/ChangeLog:
> >>
> >>  * config/riscv/riscv.c (rtx_costs): Add BSWAP.
> >
> > OK.
>
> It's riscv_rtx_costs.
>
> (I don't usually read ChangeLog entries that closely, just happened to
> stumble on it when poking around.)

Using contrib/git-commit-mklog.py can help here to make sure you
always get the correct format for the changelog and it does a decent
job of figuring out function names too.
You can also use contrib/gcc-git-customization.sh to install it such
that you can use it when doing git commits.
After invoking that inside the GCC git, you can just do "git
gcc-commit-mklog <args>", where <args> would be what you normally put for
"git commit" (but as if in the toplevel directory).

Thanks,
Andrew Pinski

>
>
> >
> > Jeff


Re: [PATCH] RISC-V: cost model for loading 64bit constant in rv32

2022-11-08 Thread Palmer Dabbelt

On Tue, 08 Nov 2022 19:26:01 PST (-0800), gcc-patches@gcc.gnu.org wrote:

loading constant 0x739290001LL in rv32 can be done with three instructions
output:
li a1, 7
lui a1, 234128
addi a1, a1, 1


Probably more useful to load the constant into two different registers?  
The point is the same, though, so just a commit message issue.
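
For instance (register numbers purely illustrative), with the 64-bit value
split across a register pair the sequence would look like:

	li	a3,7		# high 32 bits of 0x739290001
	lui	a2,234128	# 234128 << 12 == 0x39290000
	addi	a2,a2,1		# low 32 bits == 0x39290001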



Similarly, loading 0x839290001LL in rv32 can be done within three instructions
expected output:
li a1, 8
lui a1, 234128
addi a1, a1, 1
However, riscv_build_integer does not handle this case well and makes a wrong 
prediction about the number of instructions needed, and then the constant is
forced into memory via riscv_const_insns and emit_move_insn.
real output:
lui a4,%hi(.LC0)
lw a2,%lo(.LC0)(a4)
lw a3,%lo(.LC0+4)(a4)
.LC0:
 .word958988289
 .word8


That's still 3 instructions, but having loads and a constant is worse so 
that's just another commit message issue.



comparison with clang:
https://godbolt.org/z/v5nxTbKe9 


IIUC the rules are generally no compiler explorer links (so we can
preserve history) and no attached patches (i.e., inline them like
git-send-email does).  There's some documentation on sending patches at
.


Given those usually show up for first-time contributors there's also 
some copyright/DCO procedures to be followed.  I see some other 
linux.alibaba.com emails called out explicitly, but then also a generic 
"GCC Alibaba Group Holding Limited".  I think that means we're OK for 
copyright assignment here?  There's still the "you wrote the code" bits 
that are worth reading, though.


Looking at the attached patch:


+  if ((value > INT32_MAX || value < INT32_MIN) && !TARGET_64BIT)
+{
+  unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
+  struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS],
+  hicode[RISCV_MAX_INTEGER_OPS];
+  int hi_cost, lo_cost;
+
+  hi_cost = riscv_build_integer_1 (hicode, hival, mode);
+  if (hi_cost < cost)
+   {
+lo_cost = riscv_build_integer_1 (alt_codes, loval, mode);
+if (lo_cost + hi_cost < cost)
+  {
+memcpy (codes, alt_codes,
+lo_cost * sizeof (struct riscv_integer_op));
+memcpy (codes + lo_cost, hicode,
+hi_cost * sizeof (struct riscv_integer_op));
+cost = lo_cost + hi_cost;
+  }
+   }
+}


This kind of stuff always seems like it should be possible to handle 
with generic middle-end optimizations: it's essentially just splitting 
the 64-bit constant into two 32-bit constants, which is free because 
it's going into two registers anyway.  That's not a RISC-V specific 
problem, it's just the case any time a constant is going to be split 
between two registers -- it'd even apply for 128-bit constants on rv64, 
though those are probably rare enough they don't matter all that much.
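
To make that concrete, here is a minimal sketch (not from the patch) of
the case being discussed, reusing the constant from the commit message;
the comments show the two 32-bit halves a generic split would produce:

// Sketch only: on rv32 a 64-bit constant already occupies a register
// pair, so the move could in principle be split into two 32-bit
// constant loads instead of a constant-pool load.
long long
f (void)
{
  // low half  = 0x39290001  (lui 234128; addi 1)
  // high half = 0x00000007  (li 7)
  return 0x739290001LL;
}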


I'm not opposed to doing this in the backend, but maybe it's a sign 
we've just screwed something else up and can avoid adding the code?



+++ b/gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc -mabi=ilp32 -Os" } */


This has the same library/abi problems we've had before, but in this 
case I think it's fine to just leave the -march/-mabi out: the test 
cases won't be that exciting on rv64 (unless we add the 128-bit splits 
too?), but they should still stay away from the constant pool.


IIUC this should work on more than Os, at least O2/above?  Not that 
exciting for the test case, though.



+/* { dg-final { scan-assembler-not "\.LC\[0-9\]" } } */


That's a bit fragile, maybe just match on load-word?


Ping: [PATCH V2] rs6000: Support to build constants by li/lis+oris/xoris

2022-11-08 Thread Jiufu Guo via Gcc-patches
Hi,

Gentle ping:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604415.html

BR,
Jeff(Jiufu)


Jiufu Guo  writes:

> Hi,
>
> PR106708 concerns some constants which can be supported by li/lis + oris/xoris.
>
> For constant C:
> if '(c & 0xFFFFFFFF80008000ULL) == 0x80000000ULL' or say:
> 32(0) || 1(1) || 15(x) || 1(0) || 15(x), we could use li+oris to
> build constant 'C'.
> Here N(M) means N continuous bit M, x for M means it is ok for either
> 1 or 0; '||' means concatenation.
>
> if '(c & 0xFFFFFFFF00008000ULL) == 0xFFFFFFFF00008000ULL' or say:
> 32(1) || 16(x) || 1(1) || 15(x), using li+xoris would be ok.
>
> if '(c & 0xULL) == 0x' or say:
> 32(1) || 1(0) || 15(x) || 16(0), using lis+xoris would be ok.
>
> This patch updates rs6000_emit_set_long_const to support these forms.
> Bootstrap and regtest pass on ppc64 and ppc64le.
>
> Is this ok for trunk?
>
> BR,
> Jeff(Jiufu)
>
>
>   PR target/106708
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Support
>   constants which can be built with li + oris or li/lis + xoris.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/pr106708-run.c: New test.
>   * gcc.target/powerpc/pr106708.c: New test.
>   * gcc.target/powerpc/pr106708.h: New file.
>
> ---
>  gcc/config/rs6000/rs6000.cc   | 41 ++-
>  .../gcc.target/powerpc/pr106708-run.c | 17 
>  gcc/testsuite/gcc.target/powerpc/pr106708.c   | 12 ++
>  gcc/testsuite/gcc.target/powerpc/pr106708.h   |  9 
>  4 files changed, 69 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106708-run.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106708.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106708.h
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index d2743f7bce6..9b7a51f052d 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10228,6 +10228,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>  {
>rtx temp;
>HOST_WIDE_INT ud1, ud2, ud3, ud4;
> +  HOST_WIDE_INT orig_c = c;
>  
>ud1 = c & 0xffff;
>c = c >> 16;
> @@ -10253,21 +10254,41 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
> c)
>   gen_rtx_IOR (DImode, copy_rtx (temp),
>GEN_INT (ud1)));
>  }
> +  else if ((ud4 == 0xffff && ud3 == 0xffff)
> +	   && ((ud1 & 0x8000) || (ud1 == 0 && !(ud2 & 0x8000))))
> +{
> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> +
> +  HOST_WIDE_INT imm = (ud1 & 0x8000) ? ((ud1 ^ 0x8000) - 0x8000)
> +  : ((ud2 << 16) - 0x8000);
> +  /* li/lis + xoris */
> +  emit_move_insn (temp, GEN_INT (imm));
> +  emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
> +  GEN_INT (orig_c ^ imm)));
> +}
>else if (ud3 == 0 && ud4 == 0)
>  {
>temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>  
>gcc_assert (ud2 & 0x8000);
> -  emit_move_insn (copy_rtx (temp),
> -   GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000));
> -  if (ud1 != 0)
> - emit_move_insn (copy_rtx (temp),
> - gen_rtx_IOR (DImode, copy_rtx (temp),
> -  GEN_INT (ud1)));
> -  emit_move_insn (dest,
> -   gen_rtx_ZERO_EXTEND (DImode,
> -gen_lowpart (SImode,
> - copy_rtx (temp;
> +
> +  if (!(ud1 & 0x8000))
> + {
> +   /* li+oris */
> +   emit_move_insn (temp, GEN_INT (ud1));
> +   emit_move_insn (dest,
> +   gen_rtx_IOR (DImode, temp, GEN_INT (ud2 << 16)));
> + }
> +  else
> + {
> +   emit_move_insn (temp,
> +   GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000));
> +   if (ud1 != 0)
> + emit_move_insn (temp, gen_rtx_IOR (DImode, temp, GEN_INT (ud1)));
> +   emit_move_insn (dest,
> +   gen_rtx_ZERO_EXTEND (DImode,
> +gen_lowpart (SImode, temp)));
> + }
>  }
>else if (ud1 == ud3 && ud2 == ud4)
>  {
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106708-run.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106708-run.c
> new file mode 100644
> index 000..df65c321f6b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106708-run.c
> @@ -0,0 +1,17 @@
> +/* PR target/106708 */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +#include "pr106708.h"
> +
> +long long arr[] = {0x98765432ULL, 0x7cdeab55ULL, 
> 0x6543ULL};
> +int
> +main ()
> +{
> +  long long a[3];
> +
> +  foo (a);
> +  if (__builtin_memcmp (a, arr, sizeof (arr)) != 0)
> +__builtin_abort ();
> +  return 0;
> +}
> 

[PATCH] Fix doc typo

2022-11-08 Thread Sinan via Gcc-patches
add a missing variable name.


0001-doc-FixDocTypo.patch
Description: Binary data


[PATCH] RISC-V: cost model for loading 64bit constant in rv32

2022-11-08 Thread Sinan via Gcc-patches
loading constant 0x739290001LL in rv32 can be done with three instructions
output:
li a1, 7
lui a1, 234128
addi a1, a1, 1
Similarly, loading 0x839290001LL in rv32 can be done with three instructions
expected output:
li a1, 8
lui a1, 234128
addi a1, a1, 1
However, riscv_build_integer does not handle this case well: it makes a wrong
prediction about the number of instructions needed, and the constant is then
forced to be put in memory via riscv_const_insns and emit_move_insn.
real output:
lui a4,%hi(.LC0)
lw a2,%lo(.LC0)(a4)
lw a3,%lo(.LC0+4)(a4)
.LC0:
 .word 958988289
 .word 8
comparison with clang:
https://godbolt.org/z/v5nxTbKe9 


0001-riscv-improve-cost-model-rv32-load-64bit-constant.patch
Description: Binary data


Ping: [PATCH] rs6000: Enable const_anchor for 'addi'

2022-11-08 Thread Jiufu Guo via Gcc-patches
Hi,

I would like to have a ping for this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html

BR,
Jeff(Jiufu)

Jiufu Guo  writes:

> Hi,
>
> There is a functionality as const_anchor in cse.cc.  This const_anchor
> supports to generate new constants through adding small gap/offsets to
> existing constant.  For example:
>
> void __attribute__ ((noinline)) foo (long long *a)
> {
>   *a++ = 0x2351847027482577LL;
>   *a++ = 0x2351847027482578LL;
> }
> The second constant (0x2351847027482578LL) can be computed by adding '1'
> to the first constant (0x2351847027482577LL).
> This is profitable if more than one instruction is needed to build the
> second constant.
>
> * For rs6000, we can enable this functionality, as the instruction
> 'addi' does exactly this when the gap is smaller than 0x8000.
>
> * Besides enabling TARGET_CONST_ANCHOR on rs6000, this patch also fixed
> one issue. The issue is:
> "gcc_assert (SCALAR_INT_MODE_P (mode))" is an requirement for function
> "try_const_anchors". e.g. it may not need to check const_anchor for
> {[%1:DI]=0;} which is in BLK mode. And "SCALAR_INT_MODE_P (mode)" is
> checked when invoking insert_const_anchors.
> So, this patch also adds this checking before calling try_const_anchors.
>
> * One potential side effect of this patch:
> Comparing with
> "r101=0x2351847027482577LL
> ...
> r201=0x2351847027482578LL"
> The new r201 will be "r201=r101+1", and then r101 will live longer,
> and would increase pressure when allocating registers.
After r201 is changed to "r201=r101+1", r101 will live longer, and
r201 depends on r101.  This would be the major concern for this patch,
I guess.
> But I feel, this would be acceptable for this const_anchor feature.
>
> * With this patch, I checked the performance change on SPEC2017;
> the performance impact is not significant, since this functionality is not
> hit on any hot path. There is runtime variation/noise (e.g. on
> povray_r/xalancbmk_r/xz_r) that is not caused by the patch.
>
> With this patch, I also checked the changes in object files (from
> GCC bootstrap and SPEC), the significant changes are the improvement
> that: "addi" vs. "2 or more insns: lis+or.."; it also exposes some
> other optimization opportunities, like combine/jump2. The code to
> store/load one more register also occurs in a few cases,
> but it does not impact overall performance.
>
> * To refine this patch, some history discussions are referenced:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33699
> https://gcc.gnu.org/pipermail/gcc-patches/2009-April/260421.html
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html
>
>
> Bootstrap and regtest pass on ppc64 and ppc64le for this patch.
> Is this ok for trunk?
>
>
> BR,
> Jeff (Jiufu)
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.
>   * cse.cc (cse_insn): Add guard condition.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/const_anchors.c: New test.
>   * gcc.target/powerpc/try_const_anchors_ice.c: New test.
>
> ---
>  gcc/config/rs6000/rs6000.cc   |  4 
>  gcc/cse.cc|  3 ++-
>  .../gcc.target/powerpc/const_anchors.c| 20 +++
>  .../powerpc/try_const_anchors_ice.c   | 16 +++
>  4 files changed, 42 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/const_anchors.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/try_const_anchors_ice.c
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index d2743f7bce6..80cded6dec1 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1760,6 +1760,10 @@ static const struct attribute_spec 
> rs6000_attribute_table[] =
>  
>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
> +
> +#undef TARGET_CONST_ANCHOR
> +#define TARGET_CONST_ANCHOR 0x8000
> +
>  
>  
>  /* Processor table.  */
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index b13afd4ba72..56542b91c1e 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -5005,7 +5005,8 @@ cse_insn (rtx_insn *insn)
>if (targetm.const_anchor
> && !src_related
> && src_const
> -   && GET_CODE (src_const) == CONST_INT)
> +   && GET_CODE (src_const) == CONST_INT
> +   && SCALAR_INT_MODE_P (mode))
>   {
> src_related = try_const_anchors (src_const, mode);
> src_related_is_const_anchor = src_related != NULL_RTX;
> diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c 
> b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> new file mode 100644
> index 000..39958ff9765
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target has_arch_ppc64 } } */
> +/* { dg-options "-O2" } */
> +
> +#define C1 0x2351847027482577ULL
> +#define C2 0x2351847027482578ULL
> +
> 

Re: [PATCH] RISC-V: costs: handle BSWAP

2022-11-08 Thread Palmer Dabbelt

On Tue, 08 Nov 2022 18:57:26 PST (-0800), jeffreya...@gmail.com wrote:


On 11/8/22 12:54, Philipp Tomsich wrote:

The BSWAP operation is not handled in rtx_costs. Add it.

With Zbb, BSWAP for XLEN is a single instruction; for smaller modes,
it will expand into two.

gcc/ChangeLog:

 * config/riscv/riscv.c (rtx_costs): Add BSWAP.


OK.


It's riscv_rtx_costs.

(I don't usually read ChangeLog entries that closely, just happened to 
stumble on it when poking around.)





Jeff


Ping^3: [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant

2022-11-08 Thread Jiufu Guo via Gcc-patches


Hi,

Gentle ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html

BR,
Jeff (Jiufu)


Jiufu Guo via Gcc-patches  writes:

> Gentle ping:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html
>
> BR,
> Jeff (Jiufu)
>
> Jiufu Guo via Gcc-patches  writes:
>
>> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html
>>
>> BR,
>> Jeff(Jiufu)
>>
>>
>> Jiufu Guo  writes:
>>
>>> Hi,
>>>
>>> When checking eq/ne with a constant which has only 16bits, it can be
>>> optimized to check the rotated data.  By this, the constant building
>>> is optimized.
>>>
>>> As the example in PR103743:
>>> For "in == 0x8000LL", this patch generates:
>>> rotldi %r3,%r3,16
>>> cmpldi %cr0,%r3,32768
>>> instead:
>>> li %r9,-1
>>> rldicr %r9,%r9,0,0
>>> cmpd %cr0,%r3,%r9
>>>
>>> Compare with previous patchs:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600385.html
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600198.html
>>>
>>> This patch releases the condition on can_create_pseudo_p and adds
>>> clobbers to allow the splitter can be run both before and after RA.
>>>
>>> This is updated patch based on previous patch and comments:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600315.html
>>>
>>> This patch pass bootstrap and regtest on ppc64 and ppc64le.
>>> Is it ok for trunk?  Thanks for comments!
>>>
>>> BR,
>>> Jeff(Jiufu)
>>>
>>>
>>> PR target/103743
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000-protos.h (rotate_from_leading_zeros_const): New.
>>> (compare_rotate_immediate_p): New.
>>> * config/rs6000/rs6000.cc (rotate_from_leading_zeros_const): New
>>> definition.
>>> (compare_rotate_immediate_p): New definition.
>>> * config/rs6000/rs6000.md (EQNE): New code_attr.
>>> (*rotate_on_cmpdi): New define_insn_and_split.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/powerpc/pr103743.c: New test.
>>> * gcc.target/powerpc/pr103743_1.c: New test.
>>>
>>> ---
>>>  gcc/config/rs6000/rs6000-protos.h |  2 +
>>>  gcc/config/rs6000/rs6000.cc   | 41 
>>>  gcc/config/rs6000/rs6000.md   | 62 +++-
>>>  gcc/testsuite/gcc.target/powerpc/pr103743.c   | 52 ++
>>>  gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 95 +++
>>>  5 files changed, 251 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-protos.h 
>>> b/gcc/config/rs6000/rs6000-protos.h
>>> index b3c16e7448d..78847e6b3db 100644
>>> --- a/gcc/config/rs6000/rs6000-protos.h
>>> +++ b/gcc/config/rs6000/rs6000-protos.h
>>> @@ -35,6 +35,8 @@ extern bool xxspltib_constant_p (rtx, machine_mode, int 
>>> *, int *);
>>>  extern int vspltis_shifted (rtx);
>>>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>>>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
>>> +extern int rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT, int);
>>> +extern bool compare_rotate_immediate_p (unsigned HOST_WIDE_INT);
>>>  extern int num_insns_constant (rtx, machine_mode);
>>>  extern int small_data_operand (rtx, machine_mode);
>>>  extern bool mem_operand_gpr (rtx, machine_mode);
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index df491bee2ea..a548db42660 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -14797,6 +14797,47 @@ rs6000_reverse_condition (machine_mode mode, enum 
>>> rtx_code code)
>>>  return reverse_condition (code);
>>>  }
>>>  
>>> +/* Check if C can be rotated from an immediate which starts (as 64bit 
>>> integer)
>>> +   with at least CLZ bits zero.
>>> +
>>> +   Return the number by which C can be rotated from the immediate.
>>> +   Return -1 if C can not be rotated as from.  */
>>> +
>>> +int
>>> +rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT c, int clz)
>>> +{
>>> +  /* case a. 0..0xxx: already at least clz zeros.  */
>>> +  int lz = clz_hwi (c);
>>> +  if (lz >= clz)
>>> +return 0;
>>> +
>>> +  /* case b. 0..0xxx0..0: at least clz zeros.  */
>>> +  int tz = ctz_hwi (c);
>>> +  if (lz + tz >= clz)
>>> +return tz;
>>> +
>>> +  /* case c. xx10.0xx: rotate 'clz + 1' bits firstly, then check case 
>>> b.
>>> +  ^bit -> Vbit
>>> +00...00xxx100, 'clz + 1' >= bits of .  */
>>> +  const int rot_bits = HOST_BITS_PER_WIDE_INT - clz + 1;
>>> +  unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
>>> +  tz = ctz_hwi (rc);
>>> +  if (clz_hwi (rc) + tz >= clz)
>>> +return tz + rot_bits;
>>> +
>>> +  return -1;
>>> +}
>>> +
>>> +/* Check if C can be rotated from an immediate operand of cmpdi or cmpldi. 
>>>  */
>>> +
>>> +bool
>>> +compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)
>>> +{
>>> +  /* leading 48 

[PATCH] RISC-V: Add the Zihpm and Zicntr extensions

2022-11-08 Thread Palmer Dabbelt
These extensions were recently frozen [1].  As per Andrew's post [2]
we're meant to ignore these in software; this just adds them to the list
of allowed extensions and otherwise ignores them.  I added these under
SPEC_CLASS_NONE even though the PDF lists them as 20190614 because it
seems pointless to add another spec class just to accept two extensions
we then ignore.

1: 
https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/HZGoqP1eyps/m/GTNKRLJoAQAJ
2: 
https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/QKjQhChrq9Q/m/7gqdkctgAgAJ

gcc/ChangeLog

* common/config/riscv/riscv-common.cc: Add Zihpm and Zicntr
extensions.

---

These deserve documentation, a test case, and a NEWS entry.  I didn't
write those yet because it's not super clear this is the way we wanted
to go, though: just flat out ignoring the ISA feels like the wrong thing
to do, but the guidance here is pretty clear.  Still feels odd, though.

We've also still got an open discussion on how we want to handle -march
going forwards that's pretty relevant here, so I figured it'd be best to
send this out sooner rather than later as it's sort of related.
---
 gcc/common/config/riscv/riscv-common.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 4b7f777c103..72981f05ac7 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -190,6 +190,9 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
   {"zicbop",ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"zicntr", ISA_SPEC_CLASS_NONE, 2, 0},
+  {"zihpm",  ISA_SPEC_CLASS_NONE, 2, 0},
+
   {"zk",ISA_SPEC_CLASS_NONE, 1, 0},
   {"zkn",   ISA_SPEC_CLASS_NONE, 1, 0},
   {"zks",   ISA_SPEC_CLASS_NONE, 1, 0},
-- 
2.38.1



Re: [PATCH] invoke: RISC-V's -march doesn't take ISA strings

2022-11-08 Thread Palmer Dabbelt

On Tue, 08 Nov 2022 05:40:10 PST (-0800), christoph.muell...@vrull.eu wrote:

On Mon, Nov 7, 2022 at 8:01 PM Palmer Dabbelt  wrote:


The docs say we take ISA strings, but that's never really been the case:
at a bare minimum we've required lower case strings, but there's
generally been some subtle differences as well in things like version
handling and such.  We talked about removing the lower case requirement
in the last GNU toolchain meeting and we've always called other
differences just bugs.  We don't have profile support yet, but based on
the discussions on the RISC-V lists it looks like we're going to have
some differences there as well.




So let's just stop pretending these are ISA strings.  That's been a
headache for years now, if we're meant to just be ISA-string-like here
then we don't have to worry about all these long-tail ISA string parsing
issues.



You are right, we should first properly specify the -march string,
before we talk about the implementation details of the parser.

I tried to collect all the recent change requests and undocumented
properties of the -march string and worked on a first draft specification.
As the -march flag should share a common behavior across different
compilers and tools, I've made a PR to the RISC-V toolchain-conventions
repo:
  https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/26

Do you mind if we continue the discussion there?


IMO trying to handle this with another RISC-V spec is a waste of time: 
we've spent many years trying to follow the specs here, it's pretty 
clear they're just not meant to be read in that level of detail.  This 
sort of problem is all over the place in RISC-V land, moving to a 
different spec doesn't fix the problem.



Link: https://lists.riscv.org/g/sig-toolchains/message/486

gcc/ChangeLog

doc/invoke.texi (RISC-V): -march doesn't take ISA strings.

---

This is now woefully under-documented, as we can't even fall back on the
"it's just an ISA string" excuse any more.  I'm happy to go document
that, but figured I'd just send this along now so we can have the
discussion.
---
 gcc/doc/invoke.texi | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 94a2e20cfc1..780b0364c52 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -28617,11 +28617,11 @@ Produce code conforming to version 20191213.
 The default is @option{-misa-spec=20191213} unless GCC has been configured
 with @option{--with-isa-spec=} specifying a different default version.

-@item -march=@var{ISA-string}
+@item -march=@var{target-string}
 @opindex march
-Generate code for given RISC-V ISA (e.g.@: @samp{rv64im}).  ISA strings
must be
-lower-case.  Examples include @samp{rv64i}, @samp{rv32g}, @samp{rv32e},
and
-@samp{rv32imaf}.
+Generate code for given target (e.g.@: @samp{rv64im}).  Target strings
are
+similar to ISA strings, but must be lower-case.  Examples include
@samp{rv64i},
+@samp{rv32g}, @samp{rv32e}, and @samp{rv32imaf}.

 When @option{-march=} is not specified, use the setting from
@option{-mcpu}.

--
2.38.1




Re: [PATCH] RISC-V: costs: handle BSWAP

2022-11-08 Thread Jeff Law via Gcc-patches



On 11/8/22 12:54, Philipp Tomsich wrote:

The BSWAP operation is not handled in rtx_costs. Add it.

With Zbb, BSWAP for XLEN is a single instruction; for smaller modes,
it will expand into two.

gcc/ChangeLog:

 * config/riscv/riscv.c (rtx_costs): Add BSWAP.


OK.

Jeff




Re: [PATCH] Support Intel prefetchit0/t1

2022-11-08 Thread Hongtao Liu via Gcc-patches
On Tue, Nov 8, 2022 at 6:07 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Fri, Nov 04, 2022 at 03:46:32PM +0800, Haochen Jiang via Gcc-patches wrote:
> > We will take back the patches which add a new parameter on original
> > builtin_prefetch and implement instruction prefetch on that.
> >
> > Also we consider that since we will only do that on specific backend,
> > no need to add a new rtl for that.
> >
> > This patch will only support instruction prefetch for the x86 backend.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
>
> The gcc.target/i386/prefetchi-4.c testcase ICEs for me on i686-linux.
> Can be reproduced even on x86_64, with:
> ./cc1 -quiet -m32 -march=pentiumpro prefetchi-4.c -isystem include/
> during RTL pass: expand
> prefetchi-4.c: In function ‘prefetch_test’:
> prefetchi-4.c:11:3: internal compiler error: in gen_prefetch, at 
> config/i386/i386.md:23913
>11 |   __builtin_ia32_prefetch (p, 0, 3, 0);
>   |   ^~~~
> 0x1b92416 gen_prefetch(rtx_def*, rtx_def*, rtx_def*)
> ../../gcc/config/i386/i386.md:23913
> 0x141dcf3 ix86_expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, 
> int)
> ../../gcc/config/i386/i386-expand.cc:13077
> 0x60deb4 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
> ../../gcc/builtins.cc:7321
> 0x80803d expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**, bool)
> ../../gcc/expr.cc:11865
> 0x7fa4d5 expand_expr_real(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**, bool)
> ../../gcc/expr.cc:9000
> 0x648c12 expand_expr
> ../../gcc/expr.h:310
> 0x651c17 expand_call_stmt
> ../../gcc/cfgexpand.cc:2831
> 0x655709 expand_gimple_stmt_1
> ../../gcc/cfgexpand.cc:3880
> 0x655d93 expand_gimple_stmt
> ../../gcc/cfgexpand.cc:4044
> 0x65e061 expand_gimple_basic_block
> ../../gcc/cfgexpand.cc:6096
> 0x660575 execute
> ../../gcc/cfgexpand.cc:6822
> Please submit a full bug report, with preprocessed source (by using 
> -freport-bug).
> Please include the complete backtrace with any bug report.
> See  for instructions.
>
> The ICE is on
>   gcc_assert (TARGET_3DNOW);
>   operands[2] = GEN_INT (3);
> The expander has
>   "TARGET_3DNOW || TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_PREFETCHWT1"
> condition and for write handles all those different ISAs, so gcc_assert 
> (TARGET_3DNOW);
> at the end only asserts the obvious that the expander condition had to be
> satisfied.  But for !write, it only has:
>   if (TARGET_PREFETCH_SSE)
> ;
>   else
> {
>   gcc_assert (TARGET_3DNOW);
>   operands[2] = GEN_INT (3);
> }
> and here I don't understand how it can work, because if
> !TARGET_3DNOW && !TARGET_PREFETCH_SSE, but
> TARGET_PRFCHW || TARGET_PREFETCHWT1
> then it clearly ICEs.  Both of the latter ISAs can be enabled/disabled
> individually without dependencies.
>
> It is unclear what exactly changed though, because the prefetch pattern
> has not changed, but it didn't ICE before that commit.
We need to check the expander condition before gen_prefetch.
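
Roughly like the following sketch (illustrative only, not the actual
fix; the guard duplicates the expander condition quoted above, and the
operand names are assumptions):

  /* Only emit the prefetch when the expander's condition holds,
     instead of relying on the gcc_assert inside it.  */
  if (TARGET_3DNOW || TARGET_PREFETCH_SSE || TARGET_PRFCHW
      || TARGET_PREFETCHWT1)
    emit_insn (gen_prefetch (op0, op1, op2));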
>
> Jakub
>


-- 
BR,
Hongtao


[PATCH v3 1/3] libcpp: reject codepoints above 0x10FFFF

2022-11-08 Thread Ben Boeckel via Gcc-patches
Unicode does not support such values because they are unrepresentable in
UTF-16.
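
(For reference, a sketch of where the ceiling comes from: a UTF-16
surrogate pair contributes 20 bits on top of the 0x10000 offset, so the
largest encodable codepoint is exactly 0x10FFFF.)

// Illustration only, not part of the patch:
static_assert (0x10000 + ((0x3FFu << 10) | 0x3FFu) == 0x10FFFF,
               "largest codepoint representable in UTF-16");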

libcpp/

* charset.cc: Reject encodings of codepoints above 0x10FFFF.
UTF-16 does not support such codepoints and therefore all
Unicode rejects such values.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 12a398e7527..324b5b19136 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -158,6 +158,10 @@ struct _cpp_strbuf
encoded as any of DF 80, E0 9F 80, F0 80 9F 80, F8 80 80 9F 80, or
FC 80 80 80 9F 80.  Only the first is valid.
 
+   Additionally, Unicode declares that all codepoints above 0010FFFF are
+   invalid because they cannot be represented in UTF-16. As such, all 5- and
+   6-byte encodings are invalid.
+
An implementation note: the transformation from UTF-16 to UTF-8, or
vice versa, is easiest done by using UTF-32 as an intermediary.  */
 
@@ -216,7 +220,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t 
*inbytesleftp,
   if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
 
   /* Make sure the character is valid.  */
-  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
+  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
 
   *cp = c;
   *inbufp = inbuf;
@@ -320,7 +324,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, 
size_t *inbytesleftp,
   s += inbuf[bigend ? 2 : 1] << 8;
   s += inbuf[bigend ? 3 : 0];
 
-  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
+  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
 return EILSEQ;
 
   rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
-- 
2.38.1



[PATCH v3 3/3] p1689r5: initial support

2022-11-08 Thread Ben Boeckel via Gcc-patches
This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously represetable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

libcpp/

* include/cpplib.h: Add cpp_deps_format enum.
(cpp_options): Add format field
(cpp_finish): Add dependency stream parameter.
* include/mkdeps.h (deps_add_module_target): Add new preprocessor
parameter used for C++ module tracking.
* init.cc (cpp_finish): Add new preprocessor parameter used for C++
module tracking.
* mkdeps.cc (mkdeps): Implement P1689R5 output.

gcc/

* doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
-fdep-output= flags.

gcc/c-family/

* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
-fdeps-format=, -fdep-file=, and -fdep-output= parsing.
* c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.

gcc/cp/

* module.cc (preprocessed_module): Pass whether the module is
exported to dependency tracking.

gcc/testsuite/

* g++.dg/modules/depflags-f-MD.C: New test.
* g++.dg/modules/depflags-f.C: New test.
* g++.dg/modules/depflags-fi.C: New test.
* g++.dg/modules/depflags-fj-MD.C: New test.
* g++.dg/modules/depflags-fj.C: New test.
* g++.dg/modules/depflags-fjo-MD.C: New test.
* g++.dg/modules/depflags-fjo.C: New test.
* g++.dg/modules/depflags-fo-MD.C: New test.
* g++.dg/modules/depflags-fo.C: New test.
* g++.dg/modules/depflags-j-MD.C: New test.
* g++.dg/modules/depflags-j.C: New test.
* g++.dg/modules/depflags-jo-MD.C: New test.
* g++.dg/modules/depflags-jo.C: New test.
* g++.dg/modules/depflags-o-MD.C: New test.
* g++.dg/modules/depflags-o.C: New test.
* g++.dg/modules/p1689-1.C: New test.
* g++.dg/modules/p1689-1.exp.json: New test expectation.
* g++.dg/modules/p1689-2.C: New test.
* g++.dg/modules/p1689-2.exp.json: New test expectation.
* g++.dg/modules/p1689-3.C: New test.
* g++.dg/modules/p1689-3.exp.json: New test expectation.
* g++.dg/modules/p1689-4.C: New test.
* g++.dg/modules/p1689-4.exp.json: New test expectation.
* g++.dg/modules/p1689-5.C: New test.
* g++.dg/modules/p1689-5.exp.json: New test expectation.
* g++.dg/modules/modules.exp: Load new P1689 library routines.
* g++.dg/modules/test-p1689.py: New tool for validating P1689 output.
* lib/modules.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 

[PATCH v3 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-11-08 Thread Ben Boeckel via Gcc-patches
This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

libcpp/

* charset.cc: Add `_cpp_valid_utf8_str` which determines whether
a C string is valid UTF-8 or not.
* internal.h: Add prototype for `_cpp_valid_utf8_str`.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 20 
 libcpp/internal.h |  2 ++
 2 files changed, 22 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 324b5b19136..e130bc01f48 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1868,6 +1868,26 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+/*  Detect whether a C-string is a valid UTF-8-encoded set of bytes. Returns
+`false` if any contained byte sequence encodes an invalid Unicode codepoint
+or is not a valid UTF-8 sequence. Returns `true` otherwise. */
+
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen(name);
+  cppchar_t cp;
+
+  while (*in)
+{
+      if (one_utf8_to_cppchar(&in, &len, &cp))
+   return false;
+}
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
in the execution character set of a numeric escape; write it into the
string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index badfd1b40da..4f2dd4a2f5c 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 struct normalize_state *nst,
 cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
  unsigned char *, size_t, size_t,
-- 
2.38.1



[PATCH v3 0/3] RFC: P1689R5 support

2022-11-08 Thread Ben Boeckel via Gcc-patches
Hi,

This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel and need to be ordered to ensure that `import
some_module;` can be satisfied in time by making sure that the TU with
`export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
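
As a minimal illustration of the ordering constraint (file names and
module name are made up):

// a.cpp -- provides the module; must be compiled first so its BMI exists
export module some_module;

// b.cpp -- consumes the module; its compilation depends on a.cpp's BMI
import some_module;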

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

https://reviews.llvm.org/D134269

with the same flags.

Thanks,

--Ben

---
v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  needed
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (3):
  libcpp: reject codepoints above 0x10
  libcpp: add a function to determine UTF-8 validity of a C string
  p1689r5: initial support

 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/charset.cc |  28 ++-
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 +-
 libcpp/internal.h |   2 +
 libcpp/mkdeps.cc  | 149 +++-
 38 files changed, 773 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.exp.json
 create mode 100644 

Re: [PATCH] maintainer-scripts/gcc_release: compress xz in parallel

2022-11-08 Thread Xi Ruoyao via Gcc-patches
On Wed, 2022-11-09 at 01:52 +, Joseph Myers wrote:
> On Tue, 8 Nov 2022, Xi Ruoyao via Gcc-patches wrote:
> 
> > I'm wondering if running xz -T0 on different machines (with different
> > core numbers) may produce different compressed data.  The difference can
> > cause trouble distributing checksums.
> 
> gcc_release definitely doesn't use any options to make the tar file 
> reproducible (the timestamps, user and group names and ordering of the
> files in the tarball, and quite likely permissions other than whether a 
> file has execute permission, may depend on when the script was run and on 
> what system as what user - not just on the commit from which the tar file 
> was built).  So I don't think possible variation of xz output matters here 
> at present.

OK then.  I'm already using commands like

git archive --format=tar --prefix=gcc-$(git gcc-descr HEAD)/ HEAD | xz -T0 > 
../gcc-$(git gcc-descr HEAD).tar.xz

when I generate a GCC snapshot tarball for my own use.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] maintainer-scripts/gcc_release: compress xz in parallel

2022-11-08 Thread Joseph Myers
On Tue, 8 Nov 2022, Xi Ruoyao via Gcc-patches wrote:

> I'm wondering if running xz -T0 on different machines (with different
> core numbers) may produce different compressed data.  The difference can
> cause trouble distributing checksums.

gcc_release definitely doesn't use any options to make the tar file 
reproducible (the timestamps, user and group names and ordering of the 
files in the tarball, and quite likely permissions other than whether a 
file has execute permission, may depend on when the script was run and on 
what system as what user - not just on the commit from which the tar file 
was built).  So I don't think possible variation of xz output matters here 
at present.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH V2] Enable small loop unrolling for O2

2022-11-08 Thread Hongyu Wang via Gcc-patches
> Although ix86_small_unroll_insns is coming from issue_rate, it's tuned
> for codesize.
> Making it exactly issue_rate and using factor * issue_width /
> loop->ninsns may increase code size too much.
> So I prefer to add those 2 parameters to the cost table for core
> tunings instead of 1.

Yes, here is the updated patch that changes the cost table.

Bootstrapped & regrtested on x86_64-pc-linux-gnu.

Ok for trunk?

Hongtao Liu via Gcc-patches  wrote on Tue, Nov 8, 2022 at 11:05:
>
> On Mon, Nov 7, 2022 at 10:25 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Wed, Nov 2, 2022 at 4:37 AM Hongyu Wang  wrote:
> > >
> > > Hi, this is the updated patch of
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604345.html,
> > > which uses targetm.loop_unroll_adjust as gate to enable small loop unroll.
> > >
> > > This patch does not change rs6000/s390 since I don't have machine to
> > > test them, but I suppose the default behavior is the same since they
> > > enable flag_unroll_loops at O2.
> > >
> > > Bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > >
> > > Ok for trunk?
> > >
> > > -- Patch content 
> > >
> > > Modern processors have multi-way instruction decoders.
> > > For x86, icelake/zen3 has 5 uops, so for a small loop with <= 4
> > > instructions (usually 3 uops with a cmp/jmp pair that can be
> > > macro-fused), the decoder would have a 2-uop bubble each iteration
> > > and the pipeline could not be fully utilized.
> > >
> > > Therefore, this patch enables loop unrolling for small size loop at O2
> > > to fill the decoder as much as possible. It turns on rtl loop
> > > unrolling when targetm.loop_unroll_adjust exists and O2 plus speed only.
> > > In x86 backend the default behavior is to unroll small loops with less
> > > than 4 insns by 1 time.
> > >
> > > This improves 548.exchange2 by 9% on icelake and 7.4% on zen3 with
> > > 0.9% codesize increase. For other benchmarks the variations are minor
> > > and overall codesize increased by 0.2%.
> > >
> > > The kernel image size increased by 0.06%, and no impact on eembc.
> > >
> > > gcc/ChangeLog:
> > >
> > > * common/config/i386/i386-common.cc (ix86_optimization_table):
> > > Enable small loop unroll at O2 by default.
> > > * config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
> > > factor if -munroll-only-small-loops enabled and -funroll-loops/
> > > -funroll-all-loops are disabled.
> > > * config/i386/i386.opt: Add -munroll-only-small-loops,
> > > -param=x86-small-unroll-ninsns= for loop insn limit,
> > > -param=x86-small-unroll-factor= for unroll factor.
> > > * doc/invoke.texi: Document -munroll-only-small-loops,
> > > x86-small-unroll-ninsns and x86-small-unroll-factor.
> > > * loop-init.cc (pass_rtl_unroll_loops::gate): Enable rtl
> > > loop unrolling for -O2-speed and above if target hook
> > > loop_unroll_adjust exists.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/guality/loop-1.c: Add additional option
> > >   -mno-unroll-only-small-loops.
> > > * gcc.target/i386/pr86270.c: Add -mno-unroll-only-small-loops.
> > > * gcc.target/i386/pr93002.c: Likewise.
> > > ---
> > >  gcc/common/config/i386/i386-common.cc   |  1 +
> > >  gcc/config/i386/i386.cc | 18 ++
> > >  gcc/config/i386/i386.opt| 13 +
> > >  gcc/doc/invoke.texi | 16 
> > >  gcc/loop-init.cc| 10 +++---
> > >  gcc/testsuite/gcc.dg/guality/loop-1.c   |  2 ++
> > >  gcc/testsuite/gcc.target/i386/pr86270.c |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr93002.c |  2 +-
> > >  8 files changed, 59 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/gcc/common/config/i386/i386-common.cc 
> > > b/gcc/common/config/i386/i386-common.cc
> > > index f66bdd5a2af..c6891486078 100644
> > > --- a/gcc/common/config/i386/i386-common.cc
> > > +++ b/gcc/common/config/i386/i386-common.cc
> > > @@ -1724,6 +1724,7 @@ static const struct default_options 
> > > ix86_option_optimization_table[] =
> > >  /* The STC algorithm produces the smallest code at -Os, for x86.  */
> > >  { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_algorithm_, NULL,
> > >REORDER_BLOCKS_ALGORITHM_STC },
> > > +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 
> > > 1 },
> > >  /* Turn off -fschedule-insns by default.  It tends to make the
> > > problem with not enough registers even worse.  */
> > >  { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index c0f37149ed0..0f94a3b609e 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -23827,6 +23827,24 @@ ix86_loop_unroll_adjust (unsigned nunroll, class 
> > > loop *loop)
> > >unsigned i;
> > >unsigned mem_count = 0;
> > 

Re: [PATCH] Fix incorrect insn type to avoid ICE in memory attr auto-detection.

2022-11-08 Thread Hongtao Liu via Gcc-patches
On Tue, Nov 8, 2022 at 9:17 AM liuhongt  wrote:
>
> Memory attribute auto detection will check operand 2 for type sselog,
> and check operand 1 for type sselog1. For the 2 insns below, there's no
> operand 2. Change type to sselog1.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
Committed as an obvious fix.

>
> gcc/ChangeLog:
>
> PR target/107540
> * config/i386/sse.md (avx512f_movddup512): Change
> type from sselog to sselog1.
> (avx_movddup256): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr107540.c: New test.
> ---
>  gcc/config/i386/sse.md   |  4 ++--
>  gcc/testsuite/gcc.target/i386/pr107540.c | 12 
>  2 files changed, 14 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107540.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index fa93ae7bf21..4e8463addc3 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -12203,7 +12203,7 @@ (define_insn "avx512f_movddup512"
>  (const_int 6) (const_int 14)])))]
>"TARGET_AVX512F"
>"vmovddup\t{%1, %0|%0, %1}"
> -  [(set_attr "type" "sselog")
> +  [(set_attr "type" "sselog1")
> (set_attr "prefix" "evex")
> (set_attr "mode" "V8DF")])
>
> @@ -12234,7 +12234,7 @@ (define_insn "avx_movddup256"
>  (const_int 2) (const_int 6)])))]
>"TARGET_AVX && "
>"vmovddup\t{%1, %0|%0, %1}"
> -  [(set_attr "type" "sselog")
> +  [(set_attr "type" "sselog1")
> (set_attr "prefix" "")
> (set_attr "mode" "V4DF")])
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr107540.c 
> b/gcc/testsuite/gcc.target/i386/pr107540.c
> new file mode 100644
> index 000..a0351ff9cb5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr107540.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-flive-range-shrinkage -mavx" } */
> +
> +typedef double __attribute__((__vector_size__ (32))) V;
> +
> +V v;
> +
> +void
> +foo (void)
> +{
> +  v = __builtin_ia32_movddup256 (v);
> +}
> --
> 2.27.0
>


--
BR,
Hongtao


Re: Announcement: Porting the Docs to Sphinx - tomorrow

2022-11-08 Thread Sam James via Gcc-patches


> On 9 Nov 2022, at 00:00, Joseph Myers  wrote:
> 
> On Tue, 8 Nov 2022, Sam James via Gcc wrote:
> 
>> Yes, please (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106899)
>> even for snapshots? Pretty please? :)
> 
> I think we want snapshots to come out weekly even if the compiler or
> documentation build fails, which makes anything involving a build as part
> of the snapshot process problematic.

If that's your expectation, that's fine, but I'd regard it as pretty
serious if one didn't build, and I don't remember a time when it didn't.

It's not like it's that much use if it fails to build on a bog-standard
amd64 platform anyway, as, if nothing else, you'd get a deluge
of duplicate bug reports.




[PATCH v2] RISC-V: No extensions for SImode min/max against safe constant

2022-11-08 Thread Philipp Tomsich
Optimize the common case of a SImode min/max against a constant
that is safe both for sign- and zero-extension.
E.g., consider the case
  int f(unsigned int* a)
  {
const int C = 1000;
return *a * 3 > C ? C : *a * 3;
  }
where the constant C will yield the same result in DImode whether
sign- or zero-extended.

This should eventually go away once the lowering to RTL smartens up
and considers the precision/signedness and the value-ranges of the
operands to MIN_EXPR and MAX_EXPR.

gcc/ChangeLog:

* config/riscv/bitmanip.md (*minmax): Additional pattern for
  min/max against constants that are extension-invariant.
* config/riscv/iterators.md (minmax_optab): Add an iterator
  that has only min and max rtl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max-02.c: New test.

Signed-off-by: Philipp Tomsich 
---

Changes in v2:

- fixes the use of minmax_optab (was: bitmanip_optab), which is a
  change that dropped off the cherry-pick on the previous submission

 gcc/config/riscv/bitmanip.md   | 18 ++
 gcc/config/riscv/iterators.md  |  4 
 .../gcc.target/riscv/zbb-min-max-02.c  | 14 ++
 3 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 3422c43..bb23ceb86d9 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -360,6 +360,24 @@
   "\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; Optimize the common case of a SImode min/max against a constant
+;; that is safe both for sign- and zero-extension.
+(define_insn_and_split "*minmax"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (subreg:SI
+   (bitmanip_minmax:DI (zero_extend:DI (match_operand:SI 1 
"register_operand" "r"))
+   (match_operand:DI 2 
"immediate_operand" "i"))
+  0)))
+   (clobber (match_scratch:DI 3 "="))
+   (clobber (match_scratch:DI 4 "="))]
+  "TARGET_64BIT && TARGET_ZBB && sext_hwi (INTVAL (operands[2]), 32) >= 0"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 3) (sign_extend:DI (match_dup 1)))
+   (set (match_dup 4) (match_dup 2))
+   (set (match_dup 0) (:DI (match_dup 3) (match_dup 4)))])
+
 ;; ZBS extension.
 
 (define_insn "*bset"
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 50380ecfac9..cbbf61f6514 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -213,6 +213,10 @@
   [(plus "add") (ior "or") (xor "xor") (and "and")])
 
 ; bitmanip code attributes
+(define_code_attr minmax_optab [(smin "smin")
+   (smax "smax")
+   (umin "umin")
+   (umax "umax")])
 (define_code_attr bitmanip_optab [(smin "smin")
  (smax "smax")
  (umin "umin")
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c 
b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
new file mode 100644
index 000..b462859f10f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Os" "-Oz" "-Og" } } */
+
+int f(unsigned int* a)
+{
+  const int C = 1000;
+  return *a * 3 > C ? C : *a * 3;
+}
+
+/* { dg-final { scan-assembler-times "minu" 1 } } */
+/* { dg-final { scan-assembler-times "sext.w" 1 } } */
+/* { dg-final { scan-assembler-not "zext.w" } } */
+
-- 
2.34.1



Re: Announcement: Porting the Docs to Sphinx - tomorrow

2022-11-08 Thread Joseph Myers
On Tue, 8 Nov 2022, Sam James via Gcc wrote:

> Yes, please (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106899)
> even for snapshots? Pretty please? :)

I think we want snapshots to come out weekly even if the compiler or 
documentation build fails, which makes anything involving a build as part 
of the snapshot process problematic.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] riscv: implement TARGET_MODE_REP_EXTENDED

2022-11-08 Thread Philipp Tomsich
On Mon, 7 Nov 2022 at 14:55, Alexander Monakov  wrote:
>
>
>
> On Sat, 5 Nov 2022, Philipp Tomsich wrote:
>
> > Alexander,
> >
> > I had missed your comment until now.
>
> Please make sure to read replies from Jeff and Palmer as well (their responses
> went to gcc-patches with empty Cc list):
> https://inbox.sourceware.org/gcc-patches/ba895f78-7f47-0f4-5bfe-e21893c4...@ispras.ru/T/#m7b7e5708b82de3b05ba8007ae6544891a03bdc42
>
> For now let me respond to some of the more prominent points:
>
> > > I think this leads to a counter-intuitive requirement that a hand-written
> > > inline asm must sign-extend its output operands that are bound to either
> > > signed or unsigned 32-bit lvalues. Will compiler users be aware of that?
> >
> > I am not sure if I fully understand your concern, as the mode of the
> > asm-output will be derived from the variable type.
> > So "asm (... : "=r" (a))" will take DI/SI/HI/QImode depending on the type
> > of a.
>
> Yes. The problem arises when 'a' later undergoes conversion to a wider type.
>
> > The concern, as far as I understand would be the case where the
> > assembly-sequence leaves an incompatible extension in the register.
>
> Existing documentation like the psABI does not constrain compiler users in any
> way, so their inline asm snippets are free to leave garbage in upper bits. 
> Thus
> there's no "incompatibility" to speak of.
>
> To give a specific example that will be problematic if you go far enough down
> the road of matching MIPS64 behavior:
>
> long f(void)
> {
> int x;
> asm("" : "=r"(x));
> return x;
> }
>
> here GCC (unlike LLVM) omits sign extension of 'x', assuming that asm output
> must have been sign-extended to 64 bits by the programmer.

In fact, with the proposed patch (but also without it), GCC will sign-extend:
f:
  sext.w a0,a0
  ret
  .size f, .-f

To make sure that this is not just an extension to promote the int to
long for the function return, I next added another empty asm to
consume 'x'.
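The modified test presumably looked roughly like this (a reconstruction
from the line references in the dump below, not the exact file):

long f(void)
{
  int x;
  asm("" : "=r"(x));   /* asm2.c:4 */
  asm("" : : "r"(x));  /* asm2.c:5 */
  return x;
}
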
This clearly shows that the extension is performed to postprocess the
output of the asm-statement:

f:
  # ./asm2.c:4: asm("" : "=r"(x));
  sext.w a0,a0 # x, x
  # ./asm2.c:5: asm("" : : "r"(x));
  # ./asm2.c:7: }
  ret

Thanks,
Philipp.


Re: [RFC PATCH] c++: Minimal handling of carries_dependency attribute

2022-11-08 Thread Jason Merrill via Gcc-patches

On 11/8/22 04:42, Jakub Jelinek wrote:

Hi!

A comment in D2552R1:
"The only questionable (but still conforming) case we found was
[[carries_dependency(some_argument)]] on GCC, where the emitted diagnostic said 
that the
carries_dependency attribute is not supported, but did not specifically call 
out the syntax error
in the argument clause."
made me try the following patch, where we'll error at least
for arguments to the attribute, and for some uses of the attribute
appertaining to something not mentioned in the standard we warn
with different diagnostics (or should that be an error? clang++
does that, but I think we never do for any attribute, standard or not).
The diagnostics on toplevel attribute declaration is still an
attribute ignored warning and on empty statement different wording.

The paper additionally mentions
struct X { [[nodiscard]]; }; // no diagnostic on GCC
and 2 cases of missing diagnostics on [[fallthrough]] (guess I should
file a PR about those; one problem is that do { ... } while (0); there
is replaced during genericization just by ..., and another is that
[[fallthrough]] there is followed by a label, but not a user/case/default
label, rather an artificial one created from while loop genericization).

Thoughts on this?


LGTM.


2022-11-08  Jakub Jelinek  

* tree.cc (handle_carries_dependency_attribute): New function.
(std_attribute_table): Add carries_dependency attribute.
* parser.cc (cp_parser_check_std_attribute): Add carries_dependency
attribute.

* g++.dg/cpp0x/attr-carries_dependency1.C: New test.

--- gcc/cp/tree.cc.jj   2022-11-07 10:30:42.758629740 +0100
+++ gcc/cp/tree.cc  2022-11-08 14:45:08.853864684 +0100
@@ -4923,6 +4923,32 @@ structural_type_p (tree t, bool explain)
return true;
  }
  
+/* Partially handle the C++11 [[carries_dependency]] attribute.

+   Just emit a different diagnostics when it is used on something the
+   spec doesn't allow vs. where it allows and we just choose to ignore
+   it.  */
+
+static tree
+handle_carries_dependency_attribute (tree *node, tree name,
+tree ARG_UNUSED (args),
+int ARG_UNUSED (flags),
+bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL
+  && TREE_CODE (*node) != PARM_DECL)
+{
+  warning (OPT_Wattributes, "%qE attribute can only be applied to "
+  "functions or parameters", name);
+  *no_add_attrs = true;
+}
+  else
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+  return NULL_TREE;
+}
+
  /* Handle the C++17 [[nodiscard]] attribute, which is similar to the GNU
 warn_unused_result attribute.  */
  
@@ -5036,6 +5062,8 @@ const struct attribute_spec std_attribut

  handle_likeliness_attribute, attr_cold_hot_exclusions },
{ "noreturn", 0, 0, true, false, false, false,
  handle_noreturn_attribute, attr_noreturn_exclusions },
+  { "carries_dependency", 0, 0, true, false, false, false,
+handle_carries_dependency_attribute, NULL },
{ NULL, 0, 0, false, false, false, false, NULL, NULL }
  };
  
--- gcc/cp/parser.cc.jj	2022-11-04 18:11:41.523945997 +0100
+++ gcc/cp/parser.cc	2022-11-08 13:41:35.075135139 +0100
@@ -29239,8 +29239,7 @@ cp_parser_std_attribute (cp_parser *pars
  
  /* Warn if the attribute ATTRIBUTE appears more than once in the

 attribute-list ATTRIBUTES.  This used to be enforced for certain
-   attributes, but the restriction was removed in P2156.  Note that
-   carries_dependency ([dcl.attr.depend]) isn't implemented yet in GCC.
+   attributes, but the restriction was removed in P2156.
 LOC is the location of ATTRIBUTE.  Returns true if ATTRIBUTE was not
 found in ATTRIBUTES.  */
  
@@ -29249,7 +29248,7 @@ cp_parser_check_std_attribute (location_

  {
static auto alist = { "noreturn", "deprecated", "nodiscard", "maybe_unused",
"likely", "unlikely", "fallthrough",
-   "no_unique_address" };
+   "no_unique_address", "carries_dependency" };
if (attributes)
  for (const auto  : alist)
if (is_attribute_p (a, get_attribute_name (attribute))
--- gcc/testsuite/g++.dg/cpp0x/attr-carries_dependency1.C.jj	2022-11-08 15:17:43.168238390 +0100
+++ gcc/testsuite/g++.dg/cpp0x/attr-carries_dependency1.C	2022-11-08 15:16:39.695104787 +0100
@@ -0,0 +1,17 @@
+// { dg-do compile { target c++11 } }
+
+[[carries_dependency]] int *f1 (); // { dg-warning "attribute ignored" }
+int f2 (int *x [[carries_dependency]]);// { dg-warning "attribute ignored" }
+[[carries_dependency]] int f3 ();  // { dg-warning "attribute ignored" }
+int f4 (int x [[carries_dependency]]); // { dg-warning "attribute ignored" }
+[[carries_dependency(1)]] int f5 ();   // { dg-error "'carries_dependency' attribute does not take 

Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-11-08 Thread Aldy Hernandez via Gcc-patches
Sigh, one more thing.

There are further possibilities for a NAN result, even if the operands
are !NAN and the result from frange_arithmetic is free of NANs:
adding infinities of different signs.

For example, [-INF,+INF] + [-INF,+INF] has the possibility of adding
-INF and +INF, which is a NAN.  Since we end up calling frange
arithmetic on the lower bounds and then on the upper bounds, we miss
this, and mistakenly think we're free of NANs.
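(For reference, the underlying floating-point fact, in plain C and assuming
IEEE-754 semantics:)

#include <math.h>
#include <stdio.h>

int main (void)
{
  double r = INFINITY + (-INFINITY);   /* adding opposite infinities yields a NAN */
  printf ("%d\n", isnan (r));          /* prints 1 */
  return 0;
}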

I have a patch in testing, but FYI, in case anyone notices this before
I get around to it tomorrow.

Aldy

On Tue, Nov 8, 2022 at 3:11 PM Jakub Jelinek  wrote:
>
> On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote:
> > +// If either operand is a NAN, set R to the combination of both NANs
> > +// signwise and return TRUE.
>
> This comment doesn't describe what it does now.
> If either operand is a NAN, set R to NAN with unspecified sign bit and return
> TRUE.
> ?
>
> Other than this LGTM.
>
> Jakub
>



[committed] analyzer: eliminate region_model::eval_condition_without_cm [PR101962]

2022-11-08 Thread David Malcolm via Gcc-patches
In r12-3094-ge82e0f149b0aba I added the assumption that
POINTER_PLUS_EXPR of non-NULL is non-NULL (for PR analyzer/101962).

Whilst working on another bug, I noticed that this only works
when the LHS is known to be non-NULL via
region_model::eval_condition_without_cm, but not when it's known through
a constraint.

This distinction predates the original commit of the analyzer in GCC 10,
but I believe it became irrelevant in the GCC 11 rewrite of the region
model code (r11-2694-g808f4dfeb3a95f).

Hence this patch eliminates region_model::eval_condition_without_cm in
favor of all users simply calling region_model::eval_condition.  Doing
so enables the "POINTER_PLUS_EXPR of non-NULL is non-NULL" assumption to
also be made when the LHS is known through a constraint (e.g. a
conditional).
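
As a rough sketch of the situation now handled (illustrative only; the
actual test_3 added to data-model-23.c may differ):

/* Sketch: 'p' is known to be non-NULL only through the conditional,
   i.e. via a constraint, so the "POINTER_PLUS_EXPR of non-NULL is
   non-NULL" assumption should now apply to 'q' as well.  */
extern void __analyzer_eval (int);

void test_sketch (char *p)
{
  if (p == 0)
    return;
  char *q = p + 4;
  __analyzer_eval (q != 0);  /* expected: TRUE */
}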

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-3819-g9bbcee450deb0f.

gcc/analyzer/ChangeLog:
PR analyzer/101962
* region-model-impl-calls.cc: Update comment.
* region-model.cc (region_model::check_symbolic_bounds): Fix
layout of "void" return.  Replace usage of
eval_condition_without_cm with eval_condition.
(region_model::eval_condition): Take over body of...
(region_model::eval_condition_without_cm): ...this subroutine,
dropping the latter.  Eliminating this distinction avoids issues
where constraints were not considered when recursing.
(region_model::compare_initial_and_pointer): Update comment.
(region_model::symbolic_greater_than): Replace usage of
eval_condition_without_cm with eval_condition.
* region-model.h
(region_model::eval_condition_without_cm): Delete decl.

gcc/testsuite/ChangeLog:
PR analyzer/101962
* gcc.dg/analyzer/data-model-23.c (test_3): New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model-impl-calls.cc   |  2 +-
 gcc/analyzer/region-model.cc  | 75 +++
 gcc/analyzer/region-model.h   |  3 -
 gcc/testsuite/gcc.dg/analyzer/data-model-23.c | 11 +++
 4 files changed, 38 insertions(+), 53 deletions(-)

diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index bc644f8f3ad..9ef31f6ab05 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -498,7 +498,7 @@ region_model::impl_call_fread (const call_details )
 
This has to be done here so that the sm-handling can use the fact
that they point to the same region to establish that they are equal
-   (in region_model::eval_condition_without_cm), and thus transition
+   (in region_model::eval_condition), and thus transition
all pointers to the region to the "freed" state together, regardless
of casts.  */
 
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 0ca454a0f9c..5ffad64a9c5 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1764,12 +1764,13 @@ public:
 
 /* Check whether an access is past the end of the BASE_REG.  */
 
-void region_model::check_symbolic_bounds (const region *base_reg,
- const svalue *sym_byte_offset,
- const svalue *num_bytes_sval,
- const svalue *capacity,
- enum access_direction dir,
- region_model_context *ctxt) const
+void
+region_model::check_symbolic_bounds (const region *base_reg,
+const svalue *sym_byte_offset,
+const svalue *num_bytes_sval,
+const svalue *capacity,
+enum access_direction dir,
+region_model_context *ctxt) const
 {
   gcc_assert (ctxt);
 
@@ -1777,7 +1778,7 @@ void region_model::check_symbolic_bounds (const region 
*base_reg,
 = m_mgr->get_or_create_binop (num_bytes_sval->get_type (), PLUS_EXPR,
  sym_byte_offset, num_bytes_sval);
 
-  if (eval_condition_without_cm (next_byte, GT_EXPR, capacity).is_true ())
+  if (eval_condition (next_byte, GT_EXPR, capacity).is_true ())
 {
   tree diag_arg = get_representative_tree (base_reg);
   tree offset_tree = get_representative_tree (sym_byte_offset);
@@ -4161,44 +4162,18 @@ tristate
 region_model::eval_condition (const svalue *lhs,
   enum tree_code op,
   const svalue *rhs) const
-{
-  /* For now, make no attempt to capture constraints on floating-point
- values.  */
-  if ((lhs->get_type () && FLOAT_TYPE_P (lhs->get_type ()))
-  || (rhs->get_type () && FLOAT_TYPE_P (rhs->get_type (
-return tristate::unknown ();
-
-  tristate ts = eval_condition_without_cm (lhs, op, rhs);
-  if (ts.is_known ())

Re: [PATCH, v3] Fortran: ordering of hidden procedure arguments [PR107441]

2022-11-08 Thread Mikael Morin

On 08/11/2022 at 21:31, Harald Anlauf wrote:

Hi Mikael,

On 08.11.22 at 11:32, Mikael Morin wrote:

this is mostly good.
There is one last corner case that is not properly handled:


diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 63515b9072a..94988b8690e 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc

(...)

@@ -2619,6 +2620,15 @@ create_function_arglist (gfc_symbol * sym)
 if (f->sym != NULL)    /* Ignore alternate returns.  */
   hidden_typelist = TREE_CHAIN (hidden_typelist);

+  /* Advance hidden_typelist over optional+value argument presence
flags.  */
+  optval_typelist = hidden_typelist;
+  for (f = gfc_sym_get_dummy_args (sym); f; f = f->next)
+    if (f->sym != NULL
+    && f->sym->attr.optional && f->sym->attr.value
+    && !f->sym->attr.dimension && f->sym->ts.type != BT_CLASS
+    && !gfc_bt_struct (f->sym->ts.type))
+  hidden_typelist = TREE_CHAIN (hidden_typelist);
+


This new loop copies the condition guarding the handling of optional
value presence arguments, except that the condition is in an "else if",
and the complement of the condition in the corresponding "if" is
missing, to have strictly the same conditions.


I know, and I left that intentionally, as it is related to
PR107444, assuming that it doesn't lead to a new ICE.  Bad idea.


Admittedly, it only makes a difference for character optional value
arguments, which are hardly working.  At least they work as long as one
doesn't try to query their presence.  Below is a case regressing with
your patch.



With that fixed, I think it's good for mainline.
Thanks for your patience.


! { dg-do compile }
!
! PR fortran/107441
! Check that procedure types and procedure decls match when the procedure
! has both character-typed and character-typed optional value args.
!
! Contributed by M.Morin

program p
   interface
 subroutine i(c, o)
   character(*) :: c
   character(3), optional, value :: o
 end subroutine i
   end interface
   procedure(i), pointer :: pp
   pp => s
   call pp("abcd", "xyz")
contains
   subroutine s(c, o)
 character(*) :: c
 character(3), optional, value :: o
 if (o /= "xyz") stop 1
 if (c /= "abcd") stop 2
   end subroutine s
end program p


Well, that testcase may compile with 12-branch, but it gives
wrong code.  Furthermore, it is arguably invalid, as you are
currently unable to check the presence of the optional argument
due to PR107444.  I am therefore reluctant to have that testcase
now.

To fix that, we may have to bite the bullet and break the
documented ABI, or rather update it, as character,value,optional
is broken in all current gfortran versions, and the documentation
is not completely consistent.  I had planned to do this with the
fix for PR107444, which I want to keep separate from the current
patch for good reasons.

I have modified my patch so that your testcase above compiles
and runs.  But as explained, I don't want to add it now.

Regtested again.  What do you think?

Let's proceed with the v3 then.  Character optional value arguments are 
corner cases anyway.


[PING][PATCH] i386: Allow setting target attribute from conditional expression

2022-11-08 Thread J.W. Jagersma via Gcc-patches
I realize this is a feature with a somewhat niche use-case, but I'd really
like to have it in gcc 13, if possible.  Any feedback is appreciated.

On 2022-10-17 16:44, J.W. Jagersma wrote:
> Recently I tried to set a function's target attribute conditionally
> based on template parameters, eg.:
> 
> template
> [[gnu::target (enable_sse ? "sse" : "")]]
> void func () { /* ... */ }
> 
> I then discovered that this is currently not possible.  This small patch
> resolves that.
> 
> A possible alternative solution is to do this globally, eg. in
> decl_attributes.  But doing so would trigger empty-string warnings from
> handle_target_attribute, and I don't know how safe it is to remove that.
> There likely isn't much use for this with other attributes, anyway.
> 
> 2022-10-17  Jan W. Jagersma  
> 
> gcc/ChangeLog:
>   * config/i386/i386-options.cc
>   (ix86_valid_target_attribute_inner_p):  Dereference args string
>   from ADDR_EXPR.
> 
> gcc/testsuite/ChangeLog:
>   * g++.target/i386/target-attr-conditional.C: New test.
> ---
>  gcc/config/i386/i386-options.cc   |  9 
>  .../g++.target/i386/target-attr-conditional.C | 53 +++
>  2 files changed, 62 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/i386/target-attr-conditional.C
> 
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index acb2291e70f..915f3b0c1f0 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1123,6 +1123,15 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree 
> args, char *p_strings[],
>  = fndecl == NULL ? UNKNOWN_LOCATION : DECL_SOURCE_LOCATION (fndecl);
>const char *attr_name = target_clone_attr ? "target_clone" : "target";
>  
> +  args = tree_strip_nop_conversions (args);
> +
> +  if (TREE_CODE (args) == ADDR_EXPR)
> +{
> +  /* Attribute string is given by a constexpr function or conditional
> +  expression.  Dereference ADDR_EXPR, operand should be a STRING_CST.  */
> +  args = TREE_OPERAND (args, 0);
> +}
> +
>/* If this is a list, recurse to get the options.  */
>if (TREE_CODE (args) == TREE_LIST)
>  {
> diff --git a/gcc/testsuite/g++.target/i386/target-attr-conditional.C 
> b/gcc/testsuite/g++.target/i386/target-attr-conditional.C
> new file mode 100644
> index 000..2d418ed90bf
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/target-attr-conditional.C
> @@ -0,0 +1,53 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wno-psabi -m32 -march=i386 -std=c++20" } */
> +
> +#pragma GCC push_options
> +#pragma GCC target("sse")
> +
> +typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
> +typedef short __v4hi __attribute__ ((__vector_size__ (8)));
> +
> +extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_extract_pi16 (__m64 const __A, int const __N)
> +{
> +  return (unsigned short) __builtin_ia32_vec_ext_v4hi ((__v4hi)__A, __N);
> +}
> +
> +#pragma GCC pop_options
> +
> +consteval const char*
> +target_string (bool enable_sse)
> +{
> +  return enable_sse ? "sse" : "";
> +}
> +
> +// Via consteval function
> +template
> +[[gnu::target (target_string (enable_sse))]]
> +int
> +extract1 (__m64 const src)
> +{
> +  if constexpr (enable_sse)
> +return _mm_extract_pi16 (src, 0);
> +  else
> +return reinterpret_cast<__v4hi>(src)[1];
> +}
> +
> +// Via ternary operator
> +template
> +[[gnu::target (enable_sse ? "sse" : "")]]
> +int
> +extract2 (__m64 const src)
> +{
> +  if constexpr (enable_sse)
> +return _mm_extract_pi16 (src, 2);
> +  else
> +return reinterpret_cast<__v4hi>(src)[3];
> +}
> +
> +int
> +test (__m64 const src)
> +{
> +  return extract1(src) + extract1(src)
> +   + extract2(src) + extract2(src);
> +}



[PATCH] RISC-V: No extensions for SImode min/max against safe constant

2022-11-08 Thread Philipp Tomsich
Optimize the common case of a SImode min/max against a constant
that is safe both for sign- and zero-extension.
E.g., consider the case
  int f(unsigned int* a)
  {
const int C = 1000;
return *a * 3 > C ? C : *a * 3;
  }
where the constant C will yield the same result in DImode whether
sign- or zero-extended.
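(That is, a non-negative constant that fits in 31 bits reads the same in
DImode under either extension; a stand-alone illustration, not part of the
patch:)

#include <assert.h>
#include <stdint.h>

int main (void)
{
  /* For 0 <= C < 2^31, the sign- and zero-extended 64-bit values of a
     32-bit constant are identical.  */
  int32_t c = 1000;
  assert ((int64_t) c == (int64_t) (uint64_t) (uint32_t) c);
  return 0;
}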

This should eventually go away once the lowering to RTL smartens up
and considers the precision/signedness and the value-ranges of the
operands to MIN_EXPR and MAX_EXPR.

gcc/ChangeLog:

* config/riscv/bitmanip.md (*minmax): Additional pattern for
  min/max against constants that are extension-invariant.
* config/riscv/iterators.md (minmax_optab): Add an iterator
  that has only min and max rtl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max-02.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/bitmanip.md   | 18 ++
 gcc/config/riscv/iterators.md  |  4 
 .../gcc.target/riscv/zbb-min-max-02.c  | 14 ++
 3 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 3422c43..7e2ff4f79f9 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -360,6 +360,24 @@
   "\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; Optimize the common case of a SImode min/max against a constant
+;; that is safe both for sign- and zero-extension.
+(define_insn_and_split "*minmax"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (subreg:SI
+   (bitmanip_minmax:DI (zero_extend:DI (match_operand:SI 1 
"register_operand" "r"))
+   (match_operand:DI 2 
"immediate_operand" "i"))
+  0)))
+   (clobber (match_scratch:DI 3 "="))
+   (clobber (match_scratch:DI 4 "="))]
+  "TARGET_64BIT && TARGET_ZBB && sext_hwi (INTVAL (operands[2]), 32) >= 0"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 3) (sign_extend:DI (match_dup 1)))
+   (set (match_dup 4) (match_dup 2))
+   (set (match_dup 0) (:DI (match_dup 3) (match_dup 4)))])
+
 ;; ZBS extension.
 
 (define_insn "*bset"
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 50380ecfac9..cbbf61f6514 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -213,6 +213,10 @@
   [(plus "add") (ior "or") (xor "xor") (and "and")])
 
 ; bitmanip code attributes
+(define_code_attr minmax_optab [(smin "smin")
+   (smax "smax")
+   (umin "umin")
+   (umax "umax")])
 (define_code_attr bitmanip_optab [(smin "smin")
  (smax "smax")
  (umin "umin")
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c 
b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
new file mode 100644
index 000..b462859f10f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Os" "-Oz" "-Og" } } */
+
+int f(unsigned int* a)
+{
+  const int C = 1000;
+  return *a * 3 > C ? C : *a * 3;
+}
+
+/* { dg-final { scan-assembler-times "minu" 1 } } */
+/* { dg-final { scan-assembler-times "sext.w" 1 } } */
+/* { dg-final { scan-assembler-not "zext.w" } } */
+
-- 
2.34.1



[PATCH] RISC-V: Optimize branches testing a bit-range or a shifted immediate

2022-11-08 Thread Philipp Tomsich
gcc/ChangeLog:

* config/riscv/predicates.md (shifted_const_arith_operand): New predicate.
(uimm_extra_bit_operand): New predicate.
* config/riscv/riscv.md (*branch_shiftedarith_equals_zero): New pattern.
(*branch_shiftedmask_equals_zero): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/branch-1.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/predicates.md| 23 ++
 gcc/config/riscv/riscv.md | 51 +++
 gcc/testsuite/gcc.target/riscv/branch-1.c | 37 
 3 files changed, 111 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/branch-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index c2ff41bb0fd..6772228e5b6 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -285,3 +285,26 @@
(ior (match_operand 0 "register_operand")
(match_test "GET_CODE (op) == UNSPEC
 && (XINT (op, 1) == UNSPEC_VUNDEF)"
+
+;; A CONST_INT operand that consists of a single run of 32 consecutive
+;; set bits.
+(define_predicate "consecutive_bits32_operand"
+  (and (match_operand 0 "consecutive_bits_operand")
+   (match_test "popcount_hwi (UINTVAL (op)) == 32")))
+
+;; A CONST_INT operand that, if shifted down to start with its least
+;; significant non-zero bit, is a SMALL_OPERAND (suitable as an
+;; immediate to logical and arithmetic instructions).
+(define_predicate "shifted_const_arith_operand"
+  (and (match_code "const_int")
+   (match_test "ctz_hwi (INTVAL (op)) > 0")
+   (match_test "SMALL_OPERAND (INTVAL (op) >> ctz_hwi (INTVAL (op)))")))
+
+;; A CONST_INT operand that fits into the unsigned half of a
+;; signed-immediate after the top bit has been cleared.
+(define_predicate "uimm_extra_bit_operand"
+  (and (match_code "const_int")
+   (not (and (match_test "SMALL_OPERAND (INTVAL (op))")
+(match_test "INTVAL (op) > 0")))
+   (ior (match_test "SMALL_OPERAND (UINTVAL (op) & ~(HOST_WIDE_INT_1U << 
floor_log2 (UINTVAL (op")
+   (match_test "popcount_hwi (UINTVAL (op)) == 2"
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 97f6ca37891..171a0cdced6 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2205,6 +2205,57 @@
 
 ;; Conditional branches
 
+(define_insn_and_split "*branch_shiftedarith_equals_zero"
+  [(set (pc)
+   (if_then_else (match_operator 1 "equality_operator"
+  [(and:ANYI (match_operand:ANYI 2 "register_operand" "r")
+ (match_operand 3 
"shifted_const_arith_operand" "i"))
+   (const_int 0)])
+(label_ref (match_operand 0 "" ""))
+(pc)))
+   (clobber (match_scratch:ANYI 4 "="))]
+  "INTVAL (operands[3]) >= 0 || !partial_subreg_p (operands[2])"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 4) (lshiftrt:ANYI (match_dup 2) (match_dup 6)))
+   (set (match_dup 4) (and:ANYI (match_dup 4) (match_dup 7)))
+   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 4) (const_int 0)])
+  (label_ref (match_dup 0)) (pc)))]
+{
+   HOST_WIDE_INT mask = INTVAL (operands[3]);
+   int trailing = ctz_hwi (mask);
+
+   operands[6] = GEN_INT (trailing);
+   operands[7] = GEN_INT (mask >> trailing);
+})
+
+(define_insn_and_split "*branch_shiftedmask_equals_zero"
+  [(set (pc)
+   (if_then_else (match_operator 1 "equality_operator"
+  [(and:ANYI (match_operand:ANYI 2 "register_operand" "r")
+ (match_operand 3 "consecutive_bits_operand" 
"i"))
+   (const_int 0)])
+(label_ref (match_operand 0 "" ""))
+(pc)))
+   (clobber (match_scratch:X 4 "="))]
+  "(INTVAL (operands[3]) >= 0 || !partial_subreg_p (operands[2]))
+&& popcount_hwi (INTVAL (operands[3])) > 1
+&& !SMALL_OPERAND (INTVAL (operands[3]))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 4) (ashift:X (subreg:X (match_dup 2) 0) (match_dup 6)))
+   (set (match_dup 4) (lshiftrt:X (match_dup 4) (match_dup 7)))
+   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 4) (const_int 0)])
+  (label_ref (match_dup 0)) (pc)))]
+{
+   unsigned HOST_WIDE_INT mask = INTVAL (operands[3]);
+   int leading  = clz_hwi (mask);
+   int trailing = ctz_hwi (mask);
+
+   operands[6] = GEN_INT (leading);
+   operands[7] = GEN_INT (leading + trailing);
+})
+
 (define_insn "*branch_equals_zero"
   [(set (pc)
(if_then_else
diff --git a/gcc/testsuite/gcc.target/riscv/branch-1.c 
b/gcc/testsuite/gcc.target/riscv/branch-1.c
new file mode 100644
index 000..b4a3a946379
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/branch-1.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+
+void g();
+
+void f(long long a) 
+{
+  if (a 

Re: [PATCH, v3] Fortran: ordering of hidden procedure arguments [PR107441]

2022-11-08 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

On 08.11.22 at 11:32, Mikael Morin wrote:

this is mostly good.
There is one last corner case that is not properly handled:


diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 63515b9072a..94988b8690e 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc

(...)

@@ -2619,6 +2620,15 @@ create_function_arglist (gfc_symbol * sym)
 if (f->sym != NULL)    /* Ignore alternate returns.  */
   hidden_typelist = TREE_CHAIN (hidden_typelist);

+  /* Advance hidden_typelist over optional+value argument presence
flags.  */
+  optval_typelist = hidden_typelist;
+  for (f = gfc_sym_get_dummy_args (sym); f; f = f->next)
+    if (f->sym != NULL
+    && f->sym->attr.optional && f->sym->attr.value
+    && !f->sym->attr.dimension && f->sym->ts.type != BT_CLASS
+    && !gfc_bt_struct (f->sym->ts.type))
+  hidden_typelist = TREE_CHAIN (hidden_typelist);
+


This new loop copies the condition guarding the handling of optional
value presence arguments, except that the condition is in an "else if",
and the complement of the condition in the corresponding "if" is
missing, to have strictly the same conditions.


I know, and I left that intentionally, as it is related to
PR107444, assuming that it doesn't lead to a new ICE.  Bad idea.


Admittedly, it only makes a difference for character optional value
arguments, which are hardly working.  At least they work as long as one
doesn't try to query their presence.  Below is a case regressing with
your patch.



With that fixed, I think it's good for mainline.
Thanks for your patience.


! { dg-do compile }
!
! PR fortran/107441
! Check that procedure types and procedure decls match when the procedure
! has both character-typed and character-typed optional value args.
!
! Contributed by M.Morin

program p
   interface
     subroutine i(c, o)
   character(*) :: c
   character(3), optional, value :: o
     end subroutine i
   end interface
   procedure(i), pointer :: pp
   pp => s
   call pp("abcd", "xyz")
contains
   subroutine s(c, o)
     character(*) :: c
     character(3), optional, value :: o
     if (o /= "xyz") stop 1
     if (c /= "abcd") stop 2
   end subroutine s
end program p


Well, that testcase may compile with 12-branch, but it gives
wrong code.  Furthermore, it is arguably invalid, as you are
currently unable to check the presence of the optional argument
due to PR107444.  I am therefore reluctant to have that testcase
now.

To fix that, we may have to bite the bullet and break the
documented ABI, or rather update it, as character,value,optional
is broken in all current gfortran versions, and the documentation
is not completely consistent.  I had planned to do this with the
fix for PR107444, which I want to keep separate from the current
patch for good reasons.

I have modified my patch so that your testcase above compiles
and runs.  But as explained, I don't want to add it now.

Regtested again.  What do you think?

Thanks,
Harald

From 8694d1d2cbd19b5148b5d1d891b182cc3e718f40 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 28 Oct 2022 21:58:08 +0200
Subject: [PATCH] Fortran: ordering of hidden procedure arguments [PR107441]

The gfortran argument passing conventions specify a certain order for
procedure arguments that should be followed consistently: the hidden
presence status flags of optional+value scalar arguments of intrinsic type
shall come before the hidden character length, coarray token and offset.
Clarify that in the documentation.

gcc/fortran/ChangeLog:

	PR fortran/107441
	* gfortran.texi (Argument passing conventions): Clarify the gfortran
	argument passing conventions with regard to OPTIONAL dummy arguments
	of intrinsic type.
	* trans-decl.cc (create_function_arglist): Adjust the ordering of
	automatically generated hidden procedure arguments to match the
	documented ABI for gfortran.
	* trans-types.cc (gfc_get_function_type): Separate hidden parameters
	so that the presence flag for optional+value arguments come before
	string length, coarray token and offset, as required.

gcc/testsuite/ChangeLog:

	PR fortran/107441
	* gfortran.dg/coarray/pr107441-caf.f90: New test.
	* gfortran.dg/optional_absent_6.f90: New test.
	* gfortran.dg/optional_absent_7.f90: New test.
---
 gcc/fortran/gfortran.texi |  3 +-
 gcc/fortran/trans-decl.cc | 31 +++---
 gcc/fortran/trans-types.cc| 25 
 .../gfortran.dg/coarray/pr107441-caf.f90  | 27 +
 .../gfortran.dg/optional_absent_6.f90 | 60 +++
 .../gfortran.dg/optional_absent_7.f90 | 31 ++
 6 files changed, 157 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/pr107441-caf.f90
 create mode 100644 gcc/testsuite/gfortran.dg/optional_absent_6.f90
 create mode 100644 gcc/testsuite/gfortran.dg/optional_absent_7.f90

diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 

nvptx: stack size limits are relevant for execution only (was: [PATCH, testsuite] Add effective target stack_size)

2022-11-08 Thread Thomas Schwinge
Hi!

On 2017-06-09T16:24:30+0200, Tom de Vries  wrote:
> The patch defines an effective target stack_size, which is used in
> individual test-cases to add -DSTACK_SIZE= [...]

> gccint.info (edited for long lines):
> ...
> 7.2.3.12 Other attributes
> .
>
> 'stack_size'
>   Target has limited stack size.  [...]

On top of that, OK to push the attached
"nvptx: stack size limits are relevant for execution only"?


Regards
 Thomas


From 158a077129cb1579b93ddf440a5bb60b457e4b7c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 8 Nov 2022 12:10:03 +0100
Subject: [PATCH] nvptx: stack size limits are relevant for execution only

For non-'dg-do run' test cases, that means: tests with a big
'dg-require-stack-size' need not be UNSUPPORTED (and indeed now all PASS),
and 'dg-add-options stack_size' need not define (and thus limit) 'STACK_SIZE'
(those tests still all PASS).

Re "Find 'dg-do-what' in an outer frame", currently (sources not completely
clean, though), we've got:

$ git grep -F 'check_effective_target_stack_size: found dg-do-what at level ' -- build-gcc/\*.log | sort | uniq -c
  6 build-gcc/gcc/testsuite/gcc/gcc.log:check_effective_target_stack_size: found dg-do-what at level 2
267 build-gcc/gcc/testsuite/gcc/gcc.log:check_effective_target_stack_size: found dg-do-what at level 3
239 build-gcc/gcc/testsuite/gcc/gcc.log:check_effective_target_stack_size: found dg-do-what at level 4

	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_stack_size): For
	nvptx target, stack size limits are relevant for execution only.
	gcc/
	* doc/sourcebuild.texi (stack_size): Update.
---
 gcc/doc/sourcebuild.texi  |  4 
 gcc/testsuite/lib/target-supports.exp | 16 
 2 files changed, 20 insertions(+)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 137f00aadc1f..5bbf6fc55909 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2848,6 +2848,10 @@ Target has limited stack size.  The stack size limit can be obtained using the
 STACK_SIZE macro defined by @ref{stack_size_ao,,@code{dg-add-options} feature
 @code{stack_size}}.
 
+Note that for certain targets, stack size limits are relevant for
+execution only, and therefore considered only if @code{dg-do run} is
+in effect, otherwise unlimited.
+
 @item static
 Target supports @option{-static}.
 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 750897d08548..39ed1723b03a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -625,6 +625,22 @@ proc check_effective_target_trampolines { } {
 # Return 1 if target has limited stack size.
 
 proc check_effective_target_stack_size { } {
+# For nvptx target, stack size limits are relevant for execution only.
+if { [istarget nvptx-*-*] } {
+	# Find 'dg-do-what' in an outer frame.
+	set level 1
+	while true {
+	upvar $level dg-do-what dg-do-what
+	if [info exists dg-do-what] then break
+	incr level
+	}
+	verbose "check_effective_target_stack_size: found dg-do-what at level $level" 2
+
+	if { ![string equal [lindex ${dg-do-what} 0] run] } {
+	return 0
+	}
+}
+
 if [target_info exists gcc,stack_size] {
 	return 1
 }
-- 
2.35.1



Re: [newlib] Generally make all 'long double complex' methods available in

2022-11-08 Thread Jeff Johnston via Gcc-patches
LGTM.

-- Jeff J.

On Tue, Nov 8, 2022 at 1:31 PM Thomas Schwinge 
wrote:

> ..., not just '#if defined(__CYGWIN__)'.  (Exception: 'clog10l' which
> currently
> indeed is for Cygwin only.)
>
> This completes 2017-07-05 commit be3ca3947402827aa52709e677369bc7ad30aa1d
> "Fixed warnings for some long double complex methods" after Aditya
> Upadhyay's
> work on importing "Long double complex methods" from NetBSD.
>
> For example, this changes GCC/nvptx libgfortran 'configure' output as
> follows:
>
> [...]
> checking for ccosf... yes
> checking for ccos... yes
> checking for ccosl... [-no-]{+yes+}
> [...]
>
> ..., and correspondingly GCC/nvptx 'nvptx-none/libgfortran/config.h' as
> follows:
>
> [...]
>  /* Define to 1 if you have the `ccosl' function. */
> -/* #undef HAVE_CCOSL */
> +#define HAVE_CCOSL 1
> [...]
>
> Similarly for 'ccoshl', 'cexpl', 'cpowl', 'csinl', 'csinhl', 'ctanl',
> 'ctanhl',
> 'cacoshl', 'cacosl', 'casinhl', 'catanhl'.  ('conjl', 'cprojl' are not
> currently being used in libgfortran.)
>
> This in turn simplifies GCC/nvptx 'libgfortran/intrinsics/c99_functions.c'
> compilation such that this file doesn't have to provide its own
> "Implementation of various C99 functions" for those, when in fact they're
> available in newlib libm.
> ---
>
> A few more words on why this is relevant for GCC.
>
> For example, 'cexpl' usually is provided by libm, but if it isn't, the
> open-coded replacement function in
> 'libgfortran/intrinsics/c99_functions.c' is effective if it holds that
> 'defined(HAVE_COSL) && defined(HAVE_SINL) && defined(HAVE_EXPL)':
>
> long double complex
> cexpl (long double complex z)
> {
>   long double a, b;
>   long double complex v;
>
>   a = REALPART (z);
>   b = IMAGPART (z);
>   COMPLEX_ASSIGN (v, cosl (b), sinl (b));
>   return expl (a) * v;
> }
>
> This replacement code is active for current GCC/nvptx (... if no longer
> compiling GCC/nvptx libgfortran in "minimal" mode, 'LIBGFOR_MINIMAL',
> which I'm currently working on).
>
> Comparing the preceding to the 'c99_functions.c.188t.sincos' dump, we see
> for
> that function:
>
>  __attribute__((nothrow, leaf, const))
>  complex long double cexpl (complex long double z)
>  {
>long double b;
>long double a;
>long double _1;
>long double _2;
>long double _4;
>long double _5;
>long double _11;
> +  complex long double sincostmp_13;
>
> [local count: 1073741824]:
>a_7 = REALPART_EXPR ;
>b_8 = IMAGPART_EXPR ;
> -  _1 = cosl (b_8);
> -  _2 = sinl (b_8);
> +  sincostmp_13 = __builtin_cexpil (b_8);
> +  _1 = REALPART_EXPR ;
> +  _2 = IMAGPART_EXPR ;
>_11 = expl (a_7);
>_4 = _1 * _11;
>_5 = _2 * _11;
>REALPART_EXPR <> = _4;
>IMAGPART_EXPR <> = _5;
>return ;
>
>  }
>
> That is, the 'cosl (b)', 'sinl (b)' sequence is replaced by
> '__builtin_cexpil'.  That '__builtin_cexpil' is then later mapped back
> into: 'cexpl'.  We've now got an infinitely-recursive 'cexpl' replacement
> function, "implemented via itself"; GCC/nvptx libgfortran assumes there
> is no 'cexpl' in libm, whereas this 'sincos' transformation does assume
> that there is.  (..., which looks like an additional bug on its own.)
>
> At the PTX-level, this leads to the following:
>
> [...]
> // BEGIN GLOBAL FUNCTION DECL: cexpl
> .visible .func cexpl (.param.u64 %in_ar0, .param.f64 %in_ar1,
> .param.f64 %in_ar2);
>
> // BEGIN GLOBAL FUNCTION DEF: cexpl
> .visible .func cexpl (.param.u64 %in_ar0, .param.f64 %in_ar1,
> .param.f64 %in_ar2)
> {
> [...]
> call cexpl, (%out_arg1, %out_arg2, %out_arg3);
> [...]
> ret;
> }
>
> [...]
> // BEGIN GLOBAL FUNCTION DECL: cexpl
> .extern .func cexpl (.param.u64 %in_ar0, .param.f64 %in_ar1,
> .param.f64 %in_ar2);
> [...]
>
> We see the '.visible .func cexpl' declaration and definition for the
> libgfortran replacement function and in the same compilation unit also
> the '.extern .func cexpl' declaration that implicitly gets introduced via
> the 'sincos' transformation (via the GCC/nvptx back end emitting an
> explicit declaration of any function referenced), and 'ptxas' then
> (rightfully so) complains about that mismatch:
>
> ptxas c99_functions.o, line 35; error   : Inconsistent redefinition of
> variable 'cexpl'
> ptxas fatal   : Ptx assembly aborted due to errors
> nvptx-as: ptxas returned 255 exit status
> make[2]: *** [c99_functions.lo] Error 1
>
> ---
>  newlib/libc/include/complex.h | 35 ---
>  1 file changed, 16 insertions(+), 19 deletions(-)
>
> diff --git a/newlib/libc/include/complex.h b/newlib/libc/include/complex.h
> index 0a3ea97ed..ad3028e4c 100644
> --- a/newlib/libc/include/complex.h
> +++ b/newlib/libc/include/complex.h
> @@ -20,6 

Re: [PATCH] match.pd: rewrite select to branchless expression

2022-11-08 Thread Andrew Pinski via Gcc-patches
On Tue, Nov 8, 2022 at 12:02 PM Michael Collison  wrote:
>
> This patches transforms (cond (and (x , 0x1) == 0), y, (z op y)) into
> (-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also
> transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x ,
> 0x1)) & z ) op y.
>
> Matching this patterns allows GCC to generate branchless code for one of
> the functions in coremark.
>
> Bootstrapped and tested on x86 and RISC-V. Okay?

This seems like a (much) reduced (simplified?) version of
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html .
I have not had time for the last year to go through the comments on
that patch and resubmit it though.
It seems like you are aiming for one specific case in coremark rather
than a more generic fix too.

Thanks,
Andrew Pinski

>
> Michael.
>
> 2022-11-08  Michael Collison  
>
>  * match.pd ((cond (and (x , 0x1) == 0), y, (z op y) )
>  -> (-(and (x , 0x1)) & z ) op y)
>
> 2022-11-08  Michael Collison  
>
>  * gcc.dg/tree-ssa/branchless-cond.c: New test.
>
> ---
>   gcc/match.pd  | 22 
>   .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
>   2 files changed, 48 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 194ba8f5188..722f517ac6d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3486,6 +3486,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
> (max @2 @1))
>
> +/* (cond (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z )
> ^ y */
> +(for op (bit_xor bit_ior)
> + (simplify
> +  (cond (eq (bit_and @0 integer_onep@1)
> +integer_zerop)
> +@2
> +(op:c @3 @2))
> +  (if (INTEGRAL_TYPE_P (type)
> +   && (INTEGRAL_TYPE_P (TREE_TYPE (@0
> +   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2
> +
> +/* (cond (and (x , 0x1) != 0), (z ^ y), y ) -> (-(and (x , 0x1)) & z )
> ^ y */
> +(for op (bit_xor bit_ior)
> + (simplify
> +  (cond (ne (bit_and @0 integer_onep@1)
> +integer_zerop)
> +(op:c @3 @2)
> +@2)
> +  (if (INTEGRAL_TYPE_P (type)
> +   && (INTEGRAL_TYPE_P (TREE_TYPE (@0
> +   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2
> +
>   /* Simplifications of shift and rotates.  */
>
>   (for rotate (lrotate rrotate)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> new file mode 100644
> index 000..68087ae6568
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int f1(unsigned int x, unsigned int y, unsigned int z)
> +{
> +  return ((x & 1) == 0) ? y : z ^ y;
> +}
> +
> +int f2(unsigned int x, unsigned int y, unsigned int z)
> +{
> +  return ((x & 1) != 0) ? z ^ y : y;
> +}
> +
> +int f3(unsigned int x, unsigned int y, unsigned int z)
> +{
> +  return ((x & 1) == 0) ? y : z | y;
> +}
> +
> +int f4(unsigned int x, unsigned int y, unsigned int z)
> +{
> +  return ((x & 1) != 0) ? z | y : y;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
> --
> 2.34.1
>
>
>
>


[PATCH] RISC-V: allow bseti on SImode without sign-extension

2022-11-08 Thread Philipp Tomsich
As long as the SImode operand is not a partial subreg, we can use a
bseti without postprocessing to OR in a bit, as the middle end is
smart enough to stay away from the sign bit.
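(For illustration, mirroring the new test below; a sketch rather than
additional coverage:)

/* With Zbs on RV64, ORing a single-bit constant into the sign-extended
   32-bit sum can be emitted as a single bseti, with no extra extension.  */
int g (int a, int b)
{
  return (a + b) | (1 << 30);
}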

gcc/ChangeLog:

* config/riscv/bitmanip.md (*bsetidisi): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bexti-02.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/bitmanip.md  | 12 +
 gcc/testsuite/gcc.target/riscv/zbs-bseti-02.c | 25 +++
 2 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bseti-02.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index cbc00455b67..3422c43 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -408,6 +408,18 @@
   "bseti\t%0,%1,%S2"
   [(set_attr "type" "bitmanip")])
 
+;; As long as the SImode operand is not a partial subreg, we can use a
+;; bseti without postprocessing, as the middle end is smart enough to
+;; stay away from the signbit.
+(define_insn "*bsetidisi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (ior:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
+   (match_operand 2 "single_bit_mask_operand" "i")))]
+  "TARGET_ZBS && TARGET_64BIT
+   && !partial_subreg_p (operands[2])"
+  "bseti\t%0,%1,%S2"
+  [(set_attr "type" "bitmanip")])
+
 (define_insn "*bclr"
   [(set (match_operand:X 0 "register_operand" "=r")
(and:X (rotate:X (const_int -2)
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bseti-02.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bseti-02.c
new file mode 100644
index 000..d3629946375
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bseti-02.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+/* bexti */
+int f(int* a, int b)
+{
+  return ((*a << b) | (1 << 14));
+}
+
+int g(int a, int b)
+{
+  return ((a + b)| (1 << 30));
+}
+
+int h(int a, int b)
+{
+  return ((a + b)| (1ULL << 33));
+}
+
+/* { dg-final { scan-assembler-times "addw\t" 2 } } */
+/* { dg-final { scan-assembler-times "sllw\t" 1 } } */
+/* { dg-final { scan-assembler-times "bseti\t" 2 } } */
+/* { dg-final { scan-assembler-not "sext.w\t" } } */
+
-- 
2.34.1



[PATCH] match.pd: rewrite select to branchless expression

2022-11-08 Thread Michael Collison
This patch transforms (cond (and (x , 0x1) == 0), y, (z op y)) into 
(-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also 
transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x , 
0x1)) & z ) op y.
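
(A quick C-level illustration of the equivalence; a sketch, not part of the
patch.  When the tested bit is clear, the mask -(x & 1) is all zeros and the
op contributes nothing; when it is set, the mask is all ones and z flows
through.)

/* Branchy form, as written in the source.  */
unsigned int f_cond (unsigned int x, unsigned int y, unsigned int z)
{
  return ((x & 1) == 0) ? y : z ^ y;
}

/* Branchless form produced by the new simplification.  */
unsigned int f_flat (unsigned int x, unsigned int y, unsigned int z)
{
  return ((-(x & 1)) & z) ^ y;
}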


Matching these patterns allows GCC to generate branchless code for one of 
the functions in coremark.


Bootstrapped and tested on x86 and RISC-V. Okay?

Michael.

2022-11-08  Michael Collison  

    * match.pd ((cond (and (x , 0x1) == 0), y, (z op y) )
    -> (-(and (x , 0x1)) & z ) op y)

2022-11-08  Michael Collison  

    * gcc.dg/tree-ssa/branchless-cond.c: New test.

---
 gcc/match.pd  | 22 
 .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
 2 files changed, 48 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..722f517ac6d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3486,6 +3486,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
   (max @2 @1))

+/* (cond (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z ) 
^ y */

+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (eq (bit_and @0 integer_onep@1)
+    integer_zerop)
+    @2
+    (op:c @3 @2))
+  (if (INTEGRAL_TYPE_P (type)
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0
+   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2
+
+/* (cond (and (x , 0x1) != 0), (z ^ y), y ) -> (-(and (x , 0x1)) & z ) 
^ y */

+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (ne (bit_and @0 integer_onep@1)
+    integer_zerop)
+    (op:c @3 @2)
+    @2)
+  (if (INTEGRAL_TYPE_P (type)
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0
+   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2
+
 /* Simplifications of shift and rotates.  */

 (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

new file mode 100644
index 000..68087ae6568
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}
+
+/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
--
2.34.1






[PATCH] RISC-V: Optimize slli(.uw)? + addw + zext.w into sh[123]add + zext.w

2022-11-08 Thread Philipp Tomsich
gcc/ChangeLog:

* config/riscv/bitmanip.md: Handle corner-cases for combine
when chaining slli(.uw)? + addw

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zba-shNadd-04.c: New test.
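
(For context, a sketch of the kind of expression targeted; an assumption on
my part, since the actual zba-shNadd-04.c test is not shown in the truncated
diff below:)

/* A 32-bit shift-plus-add whose result is only consumed zero-extended;
   ideally this becomes sh1add + zext.w rather than slli + addw + zext.w.  */
unsigned long f (unsigned int a, unsigned int b)
{
  return (a << 1) + b;   /* computed in 32 bits, then zero-extended */
}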

---

 gcc/config/riscv/bitmanip.md  | 49 +++
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv.cc |  7 +++
 .../gcc.target/riscv/zba-shNadd-04.c  | 23 +
 4 files changed, 80 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-04.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 726a07b0d90..cbc00455b67 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -56,6 +56,55 @@
[(set (match_dup 5) (plus:DI (ashift:DI (match_dup 1) (match_dup 2)) 
(match_dup 3)))
 (set (match_dup 0) (sign_extend:DI (div:SI (subreg:SI (match_dup 5) 0) 
(subreg:SI (match_dup 4) 0])
 
+; Zba does not provide W-forms of sh[123]add(.uw)?, which leads to an
+; interesting irregularity: we can generate a signed 32-bit result
+; using slli(.uw)?+ addw, but a unsigned 32-bit result can be more
+; efficiently be generated as sh[123]add+zext.w (the .uw can be
+; dropped, if we zero-extend the output anyway).
+;
+; To enable this optimization, we split [ slli(.uw)?, addw, zext.w ]
+; into [ sh[123]add, zext.w ] for use during combine.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+   (zero_extend:DI (plus:SI (ashift:SI (subreg:SI (match_operand:DI 1 
"register_operand") 0)
+  (match_operand:QI 2 
"imm123_operand"))
+(subreg:SI (match_operand:DI 3 
"register_operand") 0]
+  "TARGET_64BIT && TARGET_ZBA"
+  [(set (match_dup 0) (plus:DI (ashift:DI (match_dup 1) (match_dup 2)) 
(match_dup 3)))
+   (set (match_dup 0) (zero_extend:DI (subreg:SI (match_dup 0) 0)))])
+
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+   (zero_extend:DI (plus:SI (subreg:SI (and:DI (ashift:DI 
(match_operand:DI 1 "register_operand")
+  
(match_operand:QI 2 "imm123_operand"))
+   (match_operand:DI 3 
"consecutive_bits_operand")) 0)
+(subreg:SI (match_operand:DI 4 
"register_operand") 0]
+  "TARGET_64BIT && TARGET_ZBA
+   && riscv_shamt_matches_mask_p (INTVAL (operands[2]), INTVAL (operands[3]))"
+  [(set (match_dup 0) (plus:DI (ashift:DI (match_dup 1) (match_dup 2)) 
(match_dup 4)))
+   (set (match_dup 0) (zero_extend:DI (subreg:SI (match_dup 0) 0)))])
+
+; Make sure that an andi followed by a sh[123]add remains a two instruction
+; sequence--and is not torn apart into slli, slri, add.
+(define_insn_and_split "*andi_add.uw"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (plus:DI (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
+   (match_operand:QI 2 "imm123_operand" "Ds3"))
+(match_operand:DI 3 "consecutive_bits_operand" ""))
+(match_operand:DI 4 "register_operand" "r")))
+   (clobber (match_scratch:DI 5 "="))]
+  "TARGET_64BIT && TARGET_ZBA
+   && riscv_shamt_matches_mask_p (INTVAL (operands[2]), INTVAL (operands[3]))
+   && SMALL_OPERAND (INTVAL (operands[3]) >> INTVAL (operands[2]))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5) (and:DI (match_dup 1) (match_dup 3)))
+   (set (match_dup 0) (plus:DI (ashift:DI (match_dup 5) (match_dup 2))
+  (match_dup 4)))]
+{
+   operands[3] = GEN_INT (INTVAL (operands[3]) >> INTVAL (operands[2]));
+})
+
 (define_insn "*shNadduw"
   [(set (match_operand:DI 0 "register_operand" "=r")
(plus:DI
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5a718bb62b4..2ec3af05aa4 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -77,6 +77,7 @@ extern bool riscv_gpr_save_operation_p (rtx);
 extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
+extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0b2c4b3599d..5a632058003 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6497,6 +6497,13 @@ riscv_regmode_natural_size (machine_mode mode)
   return UNITS_PER_WORD;
 }
 
+/* Return true if a shift-amount matches the trailing cleared bits on a bitmask */
+bool
+riscv_shamt_matches_mask_p (int shamt, HOST_WIDE_INT mask)
+{
+  return shamt == 

[PATCH] RISC-V: split to allow formation of sh[123]add before divw

2022-11-08 Thread Philipp Tomsich
When using strength-reduction, we will reduce a multiplication to a
sequence of shifts and adds.  If this is performed with 32-bit types
and followed by a division, the lack of w-form sh[123]add will make
combination impossible and lead to a slli + addw being generated.

Split the sequence with the knowledge that a w-form div will perform
implicit sign-extensions.
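
(The shape in question, mirroring the new test below:)

/* Sketch, mirroring zba-shNadd-05.c: the strength-reduced 32-bit multiply
   feeding a 32-bit division should combine into sh1add + divw.  */
long long f (int a, int b)
{
  return (a * 3) / b;
}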

gcc/ChangeLog:

* config/riscv/bitmanip.md: Add a define_split to optimize
  slliw + addiw + divw into sh[123]add + divw.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zba-shNadd-05.c: New test.

Signed-off-by: Philipp Tomsich 

---

 gcc/config/riscv/bitmanip.md   | 17 +
 gcc/testsuite/gcc.target/riscv/zba-shNadd-05.c | 11 +++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-05.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 30dabdf8ddc..726a07b0d90 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -39,6 +39,23 @@
   [(set_attr "type" "bitmanip")
(set_attr "mode" "")])
 
+; When using strength-reduction, we will reduce a multiplication to a
+; sequence of shifts and adds.  If this is performed with 32-bit types
+; and followed by a division, the lack of w-form sh[123]add will make
+; combination impossible and lead to a slli + addw being generated.
+; Split the sequence with the knowledge that a w-form div will perform
+; implicit sign-extensions.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+   (sign_extend:DI (div:SI (plus:SI (subreg:SI (ashift:DI 
(match_operand:DI 1 "register_operand")
+  
(match_operand:QI 2 "imm123_operand")) 0)
+   (subreg:SI 
(match_operand:DI 3 "register_operand") 0))
+   (subreg:SI (match_operand:DI 4 "register_operand") 0
+   (clobber (match_operand:DI 5 "register_operand"))]
+  "TARGET_64BIT && TARGET_ZBA"
+   [(set (match_dup 5) (plus:DI (ashift:DI (match_dup 1) (match_dup 2)) 
(match_dup 3)))
+(set (match_dup 0) (sign_extend:DI (div:SI (subreg:SI (match_dup 5) 0) 
(subreg:SI (match_dup 4) 0])
+
 (define_insn "*shNadduw"
   [(set (match_operand:DI 0 "register_operand" "=r")
(plus:DI
diff --git a/gcc/testsuite/gcc.target/riscv/zba-shNadd-05.c 
b/gcc/testsuite/gcc.target/riscv/zba-shNadd-05.c
new file mode 100644
index 000..271c3a8c0ac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zba-shNadd-05.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Os" "-Oz" "-Og" } } */
+
+long long f(int a, int b)
+{
+  return (a * 3) / b;
+}
+
+/* { dg-final { scan-assembler-times "sh1add\t" 1 } } */
+/* { dg-final { scan-assembler-times "divw\t" 1 } } */
-- 
2.34.1



[PATCH] RISC-V: bitmanip: use bexti for "(a & (1 << BIT_NO)) ? 0 : -1"

2022-11-08 Thread Philipp Tomsich
Consider creating a polarity-reversed mask from a set-bit (i.e., if
the bit is set, produce all-ones; otherwise: all-zeros).  Using Zbs,
this can be expressed as bexti, followed by an addi of minus-one.  To
enable the combiner to discover this opportunity, we need to split the
canonical expression for "(a & (1 << BIT_NO)) ? 0 : -1" into a form
combinable into bexti.

Consider the function:
long f(long a)
{
  return (a & (1 << BIT_NO)) ? 0 : -1;
}
This produces the following sequence prior to this change:
andi a0,a0,16
seqz a0,a0
neg a0,a0
ret
Following this change, it results in:
bexti   a0,a0,4
addi    a0,a0,-1
ret

gcc/ChangeLog:

* config/riscv/bitmanip.md: Add a splitter to generate
  polarity-reversed masks from a set bit using bexti + addi.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bexti.c: New test.

---

 gcc/config/riscv/bitmanip.md   | 13 +
 gcc/testsuite/gcc.target/riscv/zbs-bexti.c | 14 ++
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bexti.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d26f3567182..30dabdf8ddc 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -394,3 +394,16 @@
   "TARGET_ZBS && UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
   "bexti\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
+
+;; We can create a polarity-reversed mask (i.e. bit N -> { set = 0, clear = -1 
})
+;; using a bext(i) followed by an addi instruction.
+;; This splits the canonical representation of "(a & (1 << BIT_NO)) ? 0 : -1".
+(define_split
+  [(set (match_operand:GPR 0 "register_operand")
+   (neg:GPR (eq:GPR (zero_extract:GPR (match_operand:GPR 1 
"register_operand")
+  (const_int 1)
+  (match_operand 2))
+(const_int 0]
+  "TARGET_ZBS"
+  [(set (match_dup 0) (zero_extract:GPR (match_dup 1) (const_int 1) (match_dup 
2)))
+   (set (match_dup 0) (plus:GPR (match_dup 0) (const_int -1)))])
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
new file mode 100644
index 000..99e3b58309c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64 -O2" } */
+
+/* bexti */
+#define BIT_NO  4
+
+long
+foo0 (long a)
+{
+  return (a & (1 << BIT_NO)) ? 0 : -1;
+}
+
+/* { dg-final { scan-assembler "bexti" } } */
+/* { dg-final { scan-assembler "addi" } } */
-- 
2.34.1



[PATCH] RISC-V: branch-(not)equals-zero compares against $zero

2022-11-08 Thread Philipp Tomsich
If we are testing a register or a paradoxical subreg (i.e. anything that is not
a partial subreg) for equality/non-equality with zero, we can generate a branch
that compares against $zero.  This will work for QI, HI, SI and DImode, so we
enable this for ANYI.

2020-08-30  gcc/ChangeLog:

* config/riscv/riscv.md (*branch_equals_zero): Added pattern.

---

 gcc/config/riscv/riscv.md | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 798f7370a08..97f6ca37891 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2205,6 +2205,19 @@
 
 ;; Conditional branches
 
+(define_insn "*branch_equals_zero"
+  [(set (pc)
+   (if_then_else
+(match_operator 1 "equality_operator"
+[(match_operand:ANYI 2 "register_operand" "r")
+ (const_int 0)])
+(label_ref (match_operand 0 "" ""))
+(pc)))]
+  "!partial_subreg_p (operands[2])"
+  "b%C1\t%2,zero,%0"
+  [(set_attr "type" "branch")
+   (set_attr "mode" "none")])
+
 (define_insn "*branch"
   [(set (pc)
(if_then_else
-- 
2.34.1



[PATCH] RISC-V: optimize '(a >= 0) ? b : 0' to srai + andn, if compiling for Zbb

2022-11-08 Thread Philipp Tomsich
If-conversion is turning '(a >= 0) ? b : 0' into a branchless sequence
not a5,a0
srai a5,a5,63
and a0,a1,a5
missing the opportunity to combine the NOT and AND into an ANDN.

This adds a define_split to help the combiner reassociate the NOT with
the AND.


gcc/ChangeLog:

* config/riscv/bitmanip.md: New define_split.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-srai-andn.c: New test.

---

 gcc/config/riscv/bitmanip.md   | 13 +
 gcc/testsuite/gcc.target/riscv/zbb-srai-andn.c | 15 +++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-srai-andn.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index b44fb9517e7..d26f3567182 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -128,6 +128,19 @@
   [(set_attr "type" "bitmanip")
(set_attr "mode" "")])
 
+;; '(a >= 0) ? b : 0' is emitted branchless (from if-conversion).  Without a
+;; bit of extra help for combine (i.e., the below split), we end up emitting
+;; not/srai/and instead of combining the not into an andn.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+   (and:DI (neg:DI (ge:DI (match_operand:DI 1 "register_operand")
+  (const_int 0)))
+   (match_operand:DI 2 "register_operand")))
+   (clobber (match_operand:DI 3 "register_operand"))]
+  "TARGET_ZBB"
+  [(set (match_dup 3) (ashiftrt:DI (match_dup 1) (const_int 63)))
+   (set (match_dup 0) (and:DI (not:DI (match_dup 3)) (match_dup 2)))])
+
 (define_insn "*xor_not"
   [(set (match_operand:X 0 "register_operand" "=r")
 (not:X (xor:X (match_operand:X 1 "register_operand" "r")
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-srai-andn.c 
b/gcc/testsuite/gcc.target/riscv/zbb-srai-andn.c
new file mode 100644
index 000..afe9fba5f05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-srai-andn.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+
+long long foo0(long long a, long long b)
+{
+  if (a >= 0)
+return b;
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "srai\t" 1 } } */
+/* { dg-final { scan-assembler-times "andn\t" 1 } } */
+
-- 
2.34.1



[PATCH] RISC-V: costs: support shift-and-add in strength-reduction

2022-11-08 Thread Philipp Tomsich
The strength-reduction implementation in expmed.c will assess the
profitability of using shift-and-add using a RTL expression that wraps
a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
function recognizes this as expressing a sh[123]add instruction, we
will return an inflated cost---thus defeating the optimization.

This change adds the necessary idiom recognition to provide an
accurate cost for this form of expressing sh[123]add.

Instead of expanding to
    li   a5,200
    mulw a0,a5,a0
with this change, the expression 'a * 200' is synthesized as:
    sh2add  a0,a0,a0   // *5 = a + 4 * a
    sh2add  a0,a0,a0   // *5 = a + 4 * a
    slli    a0,a0,3    // *8
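
For illustration only (the snippet below is not part of the patch), a
multiplication of this shape is what receives the cheaper synthesis once the
cost of the shNadd idiom is reported correctly:

```c
/* Hypothetical example, not from the patch: with the shNadd idiom
   costed as a single instruction, strength reduction in expmed picks
   the sh2add/sh2add/slli sequence shown above for this function.  */
long
mul200 (long a)
{
  return a * 200;
}
```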

gcc/ChangeLog:

* config/riscv/riscv.c (riscv_rtx_costs): Recognize shNadd,
if expressed as a plus and multiplication with a power-of-2.

---

 gcc/config/riscv/riscv.cc | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ab6c745c722..0b2c4b3599d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2451,6 +2451,19 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  *total = COSTS_N_INSNS (1);
  return true;
}
+  /* Before strength-reduction, the shNadd can be expressed as the addition
+of a multiplication with a power-of-two.  If this case is not handled,
+the strength-reduction in expmed.c will calculate an inflated cost. */
+  if (TARGET_ZBA
+ && mode == word_mode
+ && GET_CODE (XEXP (x, 0)) == MULT
+ && REG_P (XEXP (XEXP (x, 0), 0))
+ && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+ && IN_RANGE (pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
   /* shNadd.uw pattern for zba.
 [(set (match_operand:DI 0 "register_operand" "=r")
   (plus:DI
-- 
2.34.1



[PATCH] RISC-V: costs: handle BSWAP

2022-11-08 Thread Philipp Tomsich
The BSWAP operation is not handled in rtx_costs. Add it.

With Zbb, BSWAP for XLEN is a single instruction; for smaller modes,
it will expand into two.
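
As a hypothetical illustration (not part of the patch), these are the two cases
the new cost entry distinguishes:

```c
/* Hypothetical example, not from the patch: under Zbb the full-word
   swap is a single rev8, while the 32-bit swap on rv64 needs an
   additional shift, hence the 1 vs. 2 instruction cost.  */
unsigned long
swap64 (unsigned long x)
{
  return __builtin_bswap64 (x);
}

unsigned int
swap32 (unsigned int x)
{
  return __builtin_bswap32 (x);
}
```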

gcc/ChangeLog:

* config/riscv/riscv.c (rtx_costs): Add BSWAP.

---

 gcc/config/riscv/riscv.cc | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 32f9ef9ade9..ab6c745c722 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2562,6 +2562,16 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   *total = riscv_extend_cost (XEXP (x, 0), GET_CODE (x) == ZERO_EXTEND);
   return false;
 
+case BSWAP:
+  if (TARGET_ZBB)
+   {
+ /* RISC-V only defines rev8 for XLEN, so we will need an extra
+shift-right instruction for smaller modes. */
+ *total = COSTS_N_INSNS (mode == word_mode ? 1 : 2);
+ return true;
+   }
+  return false;
+
 case FLOAT:
 case UNSIGNED_FLOAT:
 case FIX:
-- 
2.34.1



Re: Announcement: Porting the Docs to Sphinx - tomorrow

2022-11-08 Thread Sam James via Gcc-patches


> On 8 Nov 2022, at 13:55, Martin Liška  wrote:
> 
> Hi.
> 
> Tomorrow in the morning (UTC time), I'm going to migrate the documentation
> to Sphinx. The final version of the branch can be seen here:
> 
> $ git fetch origin refs/users/marxin/heads/sphinx-final
> $ git co FETCH_HEAD
> 
> URL: https://splichal.eu/gccsphinx-final/
> 
> TL;DR;
> 
> After the migration, people should be able to build (and install) GCC even
> if they miss Sphinx (similar happens now if you miss makeinfo). However, 
> please
> install Sphinx >= 5.3.0 (for manual and info pages - only *core* package is 
> necessary) [1]
> 
> Steps following the migration:
> 
> 1) update of web HTML (and PDF documentation) pages:
>   I prepared a script and tested our server has all what we need.
> 2) gcc_release --enable-generated-files-in-srcdir: here I would like
>   to ask Joseph for cooperation

Yes, please (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106899)
even for snapshots? Pretty please? :)

> 3) URL for diagnostics (used for warning) - will utilize [3]
> 4) package source tarballs - https://gcc.gnu.org/onlinedocs/ (listed here)
> 5) updating links from gcc.gnu.org that point to documentation
> 6) removal of the further Texinfo leftovers
> ...
> 

Thanks for working on this. Very excited.

> Cheers,
> Martin

Best,
sam




[newlib] Generally make all 'long double complex' methods available in

2022-11-08 Thread Thomas Schwinge
..., not just '#if defined(__CYGWIN__)'.  (Exception: 'clog10l' which currently
indeed is for Cygwin only.)

This completes 2017-07-05 commit be3ca3947402827aa52709e677369bc7ad30aa1d
"Fixed warnings for some long double complex methods" after Aditya Upadhyay's
work on importing "Long double complex methods" from NetBSD.

For example, this changes GCC/nvptx libgfortran 'configure' output as follows:

[...]
checking for ccosf... yes
checking for ccos... yes
checking for ccosl... [-no-]{+yes+}
[...]

..., and correspondingly GCC/nvptx 'nvptx-none/libgfortran/config.h' as
follows:

[...]
 /* Define to 1 if you have the `ccosl' function. */
-/* #undef HAVE_CCOSL */
+#define HAVE_CCOSL 1
[...]

Similarly for 'ccoshl', 'cexpl', 'cpowl', 'csinl', 'csinhl', 'ctanl', 'ctanhl',
'cacoshl', 'cacosl', 'casinhl', 'catanhl'.  ('conjl', 'cprojl' are not
currently being used in libgfortran.)

This in turn simplifies GCC/nvptx 'libgfortran/intrinsics/c99_functions.c'
compilation such that this file doesn't have to provide its own
"Implementation of various C99 functions" for those, when in fact they're
available in newlib libm.
---

A few more words on why this is relevant for GCC.

For example, 'cexpl' usually is provided by libm, but if it isn't, the
open-coded replacement function in
'libgfortran/intrinsics/c99_functions.c' is effective if it holds that
'defined(HAVE_COSL) && defined(HAVE_SINL) && defined(HAVE_EXPL)':

long double complex
cexpl (long double complex z)
{
  long double a, b;
  long double complex v;

  a = REALPART (z);
  b = IMAGPART (z);
  COMPLEX_ASSIGN (v, cosl (b), sinl (b));
  return expl (a) * v;
}

This replacement code is active for current GCC/nvptx (... if no longer
compiling GCC/nvptx libgfortran in "minimal" mode, 'LIBGFOR_MINIMAL',
which I'm currently working on).

Comparing the preceding to the 'c99_functions.c.188t.sincos' dump, we see for
that function:

 __attribute__((nothrow, leaf, const))
 complex long double cexpl (complex long double z)
 {
   long double b;
   long double a;
   long double _1;
   long double _2;
   long double _4;
   long double _5;
   long double _11;
+  complex long double sincostmp_13;

[local count: 1073741824]:
   a_7 = REALPART_EXPR ;
   b_8 = IMAGPART_EXPR ;
-  _1 = cosl (b_8);
-  _2 = sinl (b_8);
+  sincostmp_13 = __builtin_cexpil (b_8);
+  _1 = REALPART_EXPR ;
+  _2 = IMAGPART_EXPR ;
   _11 = expl (a_7);
   _4 = _1 * _11;
   _5 = _2 * _11;
   REALPART_EXPR <> = _4;
   IMAGPART_EXPR <> = _5;
   return ;

 }

That is, the 'cosl (b)', 'sinl (b)' sequence is replaced by
'__builtin_cexpil'.  That '__builtin_cexpil' is then later mapped back
into 'cexpl'.  We've now got an infinitely-recursive 'cexpl' replacement
function, "implemented via itself"; GCC/nvptx libgfortran assumes there
is no 'cexpl' in libm, whereas this 'sincos' transformation does assume
that there is.  (..., which looks like an additional bug on its own.)

At the PTX-level, this leads to the following:

[...]
// BEGIN GLOBAL FUNCTION DECL: cexpl
.visible .func cexpl (.param.u64 %in_ar0, .param.f64 %in_ar1, .param.f64 
%in_ar2);

// BEGIN GLOBAL FUNCTION DEF: cexpl
.visible .func cexpl (.param.u64 %in_ar0, .param.f64 %in_ar1, .param.f64 
%in_ar2)
{
[...]
call cexpl, (%out_arg1, %out_arg2, %out_arg3);
[...]
ret;
}

[...]
// BEGIN GLOBAL FUNCTION DECL: cexpl
.extern .func cexpl (.param.u64 %in_ar0, .param.f64 %in_ar1, .param.f64 
%in_ar2);
[...]

We see the '.visible .func cexpl' declaration and definition for the
libgfortran replacement function and in the same compilation unit also
the '.extern .func cexpl' declaration that implicitly gets introduced via
the 'sincos' transformation (via the GCC/nvptx back end emitting an
explicit declaration of any function referenced), and 'ptxas' then
(rightfully so) complains about that mismatch:

ptxas c99_functions.o, line 35; error   : Inconsistent redefinition of 
variable 'cexpl'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
make[2]: *** [c99_functions.lo] Error 1

---
 newlib/libc/include/complex.h | 35 ---
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/newlib/libc/include/complex.h b/newlib/libc/include/complex.h
index 0a3ea97ed..ad3028e4c 100644
--- a/newlib/libc/include/complex.h
+++ b/newlib/libc/include/complex.h
@@ -20,6 +20,7 @@ __BEGIN_DECLS
 /* 7.3.5.1 The cacos functions */
 double complex cacos(double complex);
 float complex cacosf(float complex);
+long double complex cacosl(long double complex);

 /* 7.3.5.2 The casin functions */
 double complex casin(double complex);
@@ -34,44 +35,54 @@ long double complex catanl(long double complex);
 /* 

[Patch Arm] Fix PR 92999

2022-11-08 Thread Ramana Radhakrishnan via Gcc-patches
PR92999 is a case where the VFP calling convention does not allocate
enough FP registers for a homogeneous aggregate containing FP16 values.
I believe this is the complete fix but would appreciate another set of
eyes on this.

Could I get a hand with a regression test run on an armhf environment
while I fix my environment ?

gcc/ChangeLog:

PR target/92999
*  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
aggregates with elements smaller than SFmode.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr92999.c: New test.


Thanks,
Ramana

Signed-off-by: Ramana Radhakrishnan 
---
 gcc/config/arm/arm.cc  |  6 -
 gcc/testsuite/gcc.target/arm/pr92999.c | 31 ++
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr92999.c

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 2eb4d51e4a3..03f4057f717 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -6740,7 +6740,11 @@ aapcs_vfp_allocate_return_reg (enum arm_pcs pcs_variant 
ATTRIBUTE_UNUSED,
  count *= 2;
}
}
-  shift = GET_MODE_SIZE(ag_mode) / GET_MODE_SIZE(SFmode);
+
+  /* Aggregates can contain FP16 or BF16 values which would need to
+be passed in via FP registers.  */
+  shift = (MAX(GET_MODE_SIZE(ag_mode), GET_MODE_SIZE(SFmode))
+  / GET_MODE_SIZE(SFmode));
   par = gen_rtx_PARALLEL (mode, rtvec_alloc (count));
   for (i = 0; i < count; i++)
{
diff --git a/gcc/testsuite/gcc.target/arm/pr92999.c 
b/gcc/testsuite/gcc.target/arm/pr92999.c
new file mode 100644
index 000..faa21fdb7d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr92999.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-mfp16-format=ieee" } */
+
+//
+// Compile with gcc -mfp16-format=ieee
+// Any optimization level is fine.
+//
+// Correct output should be
+// "y.first = 1, y.second = -99"
+//
+// Buggy output is
+// "y.first = -99, y.second = -99"
+//
+#include <stdlib.h>
+struct phalf {
+__fp16 first;
+__fp16 second;
+};
+
+struct phalf phalf_copy(struct phalf* src) __attribute__((noinline));
+struct phalf phalf_copy(struct phalf* src) {
+return *src;
+}
+
+int main() {
+struct phalf x = { 1.0, -99.0};
+struct phalf y = phalf_copy(&x);
+if (y.first != 1.0 && y.second != -99.0)
+   abort();
+return 0;
+}
-- 
2.34.1


Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-11-08 Thread Andrew Waterman
On Tue, Nov 8, 2022 at 10:11 AM Jakub Jelinek  wrote:
>
> On Tue, Nov 08, 2022 at 09:44:40AM -0800, Andrew Waterman wrote:
> > On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gcc-patches
> >  wrote:
> > >
> > > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote:
> > > > As suggested upthread, I have also adjusted update_nan_sign() to drop
> > > > the NAN sign to VARYING if both operands are NAN.  As an optimization
> > > > I keep the sign if both operands are NAN and have the same sign.
> > >
> > > For NaNs this still relies on something IEEE754 doesn't guarantee,
> > > as I cited, after a binary operation the sign bit of the NaN is
> > > unspecified, whether there is one NaN operand or two.
> > > It might be that all CPUs handle it the way you've implemented
> > > (that for one NaN operand the sign of NaN result will be the same
> > > as that NaN operand and for two it will be the sign of one of the two
> > > NaNs operands, never something else), but I think we'd need to check
> > > more than one implementation for that (I've only tried x86_64 and thus
> > > SSE behavior in it), so one would need to test i387 long double behavior
> > > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ...
> > > The guarantee given by IEEE754 is only for those copy, negate, abs, 
> > > copySign
> > > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*,
> > > __builtin_copysign*.
> >
> > FWIW, RISC-V canonicalizes NaNs by clearing the sign bit; the signs of
> > the input NaNs do not factor in.
>
> Just for binary operations and some unary, or also the ones that
> IEEE754 spells out (moves, negations, absolute value and copysign)?

I should've been more specific in my earlier email: I was referring to
the arithmetic operators.  Copysign and friends do not canonicalize
NaNs.
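
A small, hypothetical probe (not part of the patch) that could be compiled on
each target of interest to see whether the sign of a NaN operand survives an
arithmetic operation:

```c
/* Hypothetical probe, not from the patch: report whether the sign of
   a negative quiet NaN operand is preserved by an addition.  */
#include <math.h>
#include <stdio.h>

int
main (void)
{
  volatile double neg_nan = copysign (nan (""), -1.0);
  volatile double result = neg_nan + 1.0;
  printf ("NaN result sign: %s\n", signbit (result) ? "negative" : "positive");
  return 0;
}
```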

>
> Jakub
>


Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-11-08 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 08, 2022 at 09:44:40AM -0800, Andrew Waterman wrote:
> On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gcc-patches
>  wrote:
> >
> > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote:
> > > As suggested upthread, I have also adjusted update_nan_sign() to drop
> > > the NAN sign to VARYING if both operands are NAN.  As an optimization
> > > I keep the sign if both operands are NAN and have the same sign.
> >
> > For NaNs this still relies on something IEEE754 doesn't guarantee,
> > as I cited, after a binary operation the sign bit of the NaN is
> > unspecified, whether there is one NaN operand or two.
> > It might be that all CPUs handle it the way you've implemented
> > (that for one NaN operand the sign of NaN result will be the same
> > as that NaN operand and for two it will be the sign of one of the two
> > NaNs operands, never something else), but I think we'd need to check
> > more than one implementation for that (I've only tried x86_64 and thus
> > SSE behavior in it), so one would need to test i387 long double behavior
> > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ...
> > The guarantee given by IEEE754 is only for those copy, negate, abs, copySign
> > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*,
> > __builtin_copysign*.
> 
> FWIW, RISC-V canonicalizes NaNs by clearing the sign bit; the signs of
> the input NaNs do not factor in.

Just for binary operations and some unary, or also the ones that
IEEE754 spells out (moves, negations, absolute value and copysign)?

Jakub



[PATCH 2/3] Define __LIBGCC_DWARF_REG_SIZES_CONSTANT__ if DWARF register size is constant

2022-11-08 Thread Florian Weimer via Gcc-patches
And use that to speed up the libgcc unwinder.

* gcc/debug.h (dwarf_reg_sizes_constant): Declare.
* gcc/dwarf2cfi.cc (dwarf_reg_sizes_constant): New function.
* gcc/c-family/c-cppbuiltin.c
(__LIBGCC_DWARF_REG_SIZES_CONSTANT__): Define if constant is
known.

libgcc/

* unwind-dw2.c (dwarf_reg_size): New function.
(_Unwind_GetGR, _Unwind_SetGR, _Unwind_SetGRPtr)
(_Unwind_SetSpColumn, uw_install_context_1): Use it.
(uw_init_context_1): Do not initialize dwarf_reg_size_table
if not in use.
---
 gcc/c-family/c-cppbuiltin.cc |  6 ++
 gcc/debug.h  |  2 ++
 gcc/dwarf2cfi.cc | 23 
 libgcc/unwind-dw2.c  | 41 +---
 4 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index cdb658f6ac9..ab98bf3b059 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1515,6 +1515,12 @@ c_cpp_builtins (cpp_reader *pfile)
 #endif
   builtin_define_with_int_value ("__LIBGCC_DWARF_FRAME_REGISTERS__",
 DWARF_FRAME_REGISTERS);
+  {
+   int value = dwarf_reg_sizes_constant ();
+   if (value > 0)
+ builtin_define_with_int_value ("__LIBGCC_DWARF_REG_SIZES_CONSTANT__",
+value);
+  }
 #ifdef EH_RETURN_STACKADJ_RTX
   cpp_define (pfile, "__LIBGCC_EH_RETURN_STACKADJ_RTX__");
 #endif
diff --git a/gcc/debug.h b/gcc/debug.h
index fe85115d5f3..6bcc8da1f76 100644
--- a/gcc/debug.h
+++ b/gcc/debug.h
@@ -245,6 +245,8 @@ extern const struct gcc_debug_hooks vmsdbg_debug_hooks;
 
 /* Dwarf2 frame information.  */
 
+extern int dwarf_reg_sizes_constant ();
+
 extern void dwarf2out_begin_prologue (unsigned int, unsigned int,
  const char *);
 extern void dwarf2out_vms_end_prologue (unsigned int, const char *);
diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
index b29173b2156..d45d20478b4 100644
--- a/gcc/dwarf2cfi.cc
+++ b/gcc/dwarf2cfi.cc
@@ -334,6 +334,29 @@ generate_dwarf_reg_sizes (poly_uint16 *sizes)
 targetm.init_dwarf_reg_sizes_extra (sizes);
 }
 
+/* Return 0 if the DWARF register sizes are not constant, otherwise
+   return the size constant.  */
+
+int
+dwarf_reg_sizes_constant ()
+{
+  poly_uint16 *sizes = XALLOCAVEC (poly_uint16, DWARF_FRAME_REGISTERS);
+  generate_dwarf_reg_sizes (sizes);
+
+  int result;
+  for (unsigned int i = 0; i < DWARF_FRAME_REGISTERS; i++)
+{
+  unsigned short value;
+  if (!sizes[i].is_constant (&value))
+   return 0;
+  if (i == 0)
+   result = value;
+  else if (result != value)
+   return 0;
+}
+  return result;
+}
+
 /* Generate code to initialize the dwarf register size table located
at the provided ADDRESS.  */
 
diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
index eaceace2029..c370121bb29 100644
--- a/libgcc/unwind-dw2.c
+++ b/libgcc/unwind-dw2.c
@@ -148,9 +148,25 @@ struct _Unwind_Context
   char by_value[__LIBGCC_DWARF_FRAME_REGISTERS__+1];
 };
 
+#ifdef __LIBGCC_DWARF_REG_SIZES_CONSTANT__
+static inline unsigned char
+dwarf_reg_size (int index __attribute__ ((__unused__)))
+{
+  return __LIBGCC_DWARF_REG_SIZES_CONSTANT__;
+}
+#else
 /* Byte size of every register managed by these routines.  */
 static unsigned char dwarf_reg_size_table[__LIBGCC_DWARF_FRAME_REGISTERS__+1];
 
+
+static inline unsigned char
+dwarf_reg_size (unsigned index)
+{
+  gcc_assert (index < sizeof (dwarf_reg_size_table));
+  return dwarf_reg_size_table[index];
+}
+#endif
+
 
 /* Read unaligned data from the instruction buffer.  */
 
@@ -232,8 +248,7 @@ _Unwind_GetGR (struct _Unwind_Context *context, int regno)
 #endif
 
   index = DWARF_REG_TO_UNWIND_COLUMN (regno);
-  gcc_assert (index < (int) sizeof(dwarf_reg_size_table));
-  size = dwarf_reg_size_table[index];
+  size = dwarf_reg_size (index);
   val = context->reg[index];
 
   if (_Unwind_IsExtendedContext (context) && context->by_value[index])
@@ -280,8 +295,7 @@ _Unwind_SetGR (struct _Unwind_Context *context, int index, 
_Unwind_Word val)
   void *ptr;
 
   index = DWARF_REG_TO_UNWIND_COLUMN (index);
-  gcc_assert (index < (int) sizeof(dwarf_reg_size_table));
-  size = dwarf_reg_size_table[index];
+  size = dwarf_reg_size (index);
 
   if (_Unwind_IsExtendedContext (context) && context->by_value[index])
 {
@@ -329,9 +343,8 @@ _Unwind_SetGRValue (struct _Unwind_Context *context, int 
index,
_Unwind_Word val)
 {
   index = DWARF_REG_TO_UNWIND_COLUMN (index);
-  gcc_assert (index < (int) sizeof(dwarf_reg_size_table));
   /* Return column size may be smaller than _Unwind_Context_Reg_Val.  */
-  gcc_assert (dwarf_reg_size_table[index] <= sizeof (_Unwind_Context_Reg_Val));
+  gcc_assert (dwarf_reg_size (index) <= sizeof (_Unwind_Context_Reg_Val));
 
   context->by_value[index] = 1;
   context->reg[index] = 

[PATCH 3/3] libgcc: Specialize execute_cfa_program in DWARF unwinder for alignments

2022-11-08 Thread Florian Weimer via Gcc-patches
The parameters fs->data_align and fs->code_align always have fixed
values for a particular target in GCC-generated code.  Specialize
execute_cfa_program for these values, to avoid multiplications.
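
A minimal, self-contained sketch of the dispatch idea (the helper names follow
the ChangeLog below; the stub type and the guard against the new
__LIBGCC_DWARF_CIE_DATA_ALIGNMENT__ macro are illustrative assumptions, not the
submitted code):

```c
/* Sketch only, not the submitted code: choose the CFA interpreter
   specialized for the target's usual CIE alignments, falling back to
   the generic one otherwise.  The stub type and the fallback
   definition of the alignment macro exist only to keep this example
   compilable outside of libgcc.  */
#ifndef __LIBGCC_DWARF_CIE_DATA_ALIGNMENT__
# define __LIBGCC_DWARF_CIE_DATA_ALIGNMENT__ (-4)   /* placeholder value */
#endif

struct frame_state_stub { long data_align; long code_align; };

static void execute_cfa_program_specialized (struct frame_state_stub *fs) { (void) fs; }
static void execute_cfa_program_generic (struct frame_state_stub *fs) { (void) fs; }

static void
execute_cfa_program (struct frame_state_stub *fs)
{
  if (fs->data_align == __LIBGCC_DWARF_CIE_DATA_ALIGNMENT__
      && fs->code_align == 1)
    execute_cfa_program_specialized (fs);
  else
    execute_cfa_program_generic (fs);
}
```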

gcc/

* c-family/c-cppbuiltin.c (c_cpp_builtins): Define
__LIBGCC_DWARF_CIE_DATA_ALIGNMENT__.

libgcc/

* unwind-dw2-execute_cfa.h: New file.  Extracted from
the execute_cfa_program function in unwind-dw2.c.
* unwind-dw2.c (execute_cfa_program_generic): New function.
(execute_cfa_program_specialized): Likewise.
(execute_cfa_program): Call execute_cfa_program_specialized
or execute_cfa_program_generic, as appropriate.
---
 gcc/c-family/c-cppbuiltin.cc|   2 +
 libgcc/unwind-dw2-execute_cfa.h | 322 
 libgcc/unwind-dw2.c | 319 +++
 3 files changed, 354 insertions(+), 289 deletions(-)
 create mode 100644 libgcc/unwind-dw2-execute_cfa.h

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index ab98bf3b059..c8c327b3b2e 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1521,6 +1521,8 @@ c_cpp_builtins (cpp_reader *pfile)
  builtin_define_with_int_value ("__LIBGCC_DWARF_REG_SIZES_CONSTANT__",
 value);
   }
+  builtin_define_with_int_value ("__LIBGCC_DWARF_CIE_DATA_ALIGNMENT__",
+DWARF_CIE_DATA_ALIGNMENT);
 #ifdef EH_RETURN_STACKADJ_RTX
   cpp_define (pfile, "__LIBGCC_EH_RETURN_STACKADJ_RTX__");
 #endif
diff --git a/libgcc/unwind-dw2-execute_cfa.h b/libgcc/unwind-dw2-execute_cfa.h
new file mode 100644
index 000..dd97b786668
--- /dev/null
+++ b/libgcc/unwind-dw2-execute_cfa.h
@@ -0,0 +1,322 @@
+/* DWARF2 exception handling CFA execution engine.
+   Copyright (C) 1997-2022 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This file is included from unwind-dw2.c to specialize the code for certain
+   values of DATA_ALIGN and CODE_ALIGN.  These macros must be defined prior to
+   including this file.  */
+
+{
+  struct frame_state_reg_info *unused_rs = NULL;
+
+  /* Don't allow remember/restore between CIE and FDE programs.  */
+  fs->regs.prev = NULL;
+
+  /* The comparison with the return address uses < rather than <= because
+ we are only interested in the effects of code before the call; for a
+ noreturn function, the return address may point to unrelated code with
+ a different stack configuration that we are not interested in.  We
+ assume that the call itself is unwind info-neutral; if not, or if
+ there are delay instructions that adjust the stack, these must be
+ reflected at the point immediately before the call insn.
+ In signal frames, return address is after last completed instruction,
+ so we add 1 to return address to make the comparison <=.  */
+  while (insn_ptr < insn_end
+&& fs->pc < context->ra + _Unwind_IsSignalFrame (context))
+{
+  unsigned char insn = *insn_ptr++;
+  _uleb128_t reg, utmp;
+  _sleb128_t offset, stmp;
+
+  if ((insn & 0xc0) == DW_CFA_advance_loc)
+   fs->pc += (insn & 0x3f) * CODE_ALIGN;
+  else if ((insn & 0xc0) == DW_CFA_offset)
+   {
+ reg = insn & 0x3f;
+ insn_ptr = read_uleb128 (insn_ptr, &utmp);
+ offset = (_Unwind_Sword) utmp * DATA_ALIGN;
+ reg = DWARF_REG_TO_UNWIND_COLUMN (reg);
+ if (UNWIND_COLUMN_IN_RANGE (reg))
+   {
+ fs->regs.how[reg] = REG_SAVED_OFFSET;
+ fs->regs.reg[reg].loc.offset = offset;
+   }
+   }
+  else if ((insn & 0xc0) == DW_CFA_restore)
+   {
+ reg = insn & 0x3f;
+ reg = DWARF_REG_TO_UNWIND_COLUMN (reg);
+ if (UNWIND_COLUMN_IN_RANGE (reg))
+   fs->regs.how[reg] = REG_UNSAVED;
+   }
+  else switch (insn)
+   {
+   case DW_CFA_set_loc:
+ {
+   _Unwind_Ptr pc;
+
+   insn_ptr = read_encoded_value (context, 

[PATCH 0/3] Further libgcc unwinder improvements

2022-11-08 Thread Florian Weimer via Gcc-patches
This series makes some further unwinder improvements.  Unfortunately,
not many targets define __LIBGCC_DWARF_REG_SIZES_CONSTANT__; x86-64
does, and it makes uw_install_context_1 quite a bit faster because GCC
no longer has to emit generic memcpy code for it.  In general, it may be
worthwhile to replace this code with target-specific implementations.

Tested on powerpc64le-linux-gnu, x86_64-linux-gnu; I didn't see any test
result differences.  Built GCC for msp430-elf, too.

The revision for the patch I posted earlier (using SWAR techniques for
get_cie_encoding) is not ready yet and probably won't make GCC 13.  It
requires some header cleanups first.

Thanks,
Florian

Florian Weimer (3):
  Compute a table of DWARF register sizes at compile
  Define __LIBGCC_DWARF_REG_SIZES_CONSTANT__ if DWARF register size is
constant
  libgcc: Specialize execute_cfa_program in DWARF unwinder for
alignments

 gcc/c-family/c-cppbuiltin.cc|   8 +
 gcc/config/msp430/msp430.cc |  11 +-
 gcc/config/rs6000/rs6000.cc |  14 +-
 gcc/debug.h |   2 +
 gcc/doc/tm.texi |   7 +-
 gcc/dwarf2cfi.cc| 116 +-
 gcc/target.def  |   8 +-
 libgcc/unwind-dw2-execute_cfa.h | 322 
 libgcc/unwind-dw2.c | 360 ++--
 9 files changed, 472 insertions(+), 376 deletions(-)
 create mode 100644 libgcc/unwind-dw2-execute_cfa.h


base-commit: 5d060d8b0477ff4911f41c816281daaa24b41a13
-- 
2.38.1



[PATCH 1/3] Compute a table of DWARF register sizes at compile

2022-11-08 Thread Florian Weimer via Gcc-patches
The sizes are compile-time constants.  Create a vector with them,
so that they can be inspected at compile time.

* gcc/dwarf2cfi.cc (init_return_column_size): Remove.
(init_one_dwarf_reg_size): Adjust.
(generate_dwarf_reg_sizes): New function.  Extracted
from expand_builtin_init_dwarf_reg_sizes.
(expand_builtin_init_dwarf_reg_sizes): Call
generate_dwarf_reg_sizes.
* gcc/target.def (init_dwarf_reg_sizes_extra): Adjust
hook signature.
* gcc/config/msp430/msp430.cc
(msp430_init_dwarf_reg_sizes_extra): Adjust.
* gcc/config/rs6000.cc (rs6000_init_dwarf_reg_sizes_extra):
Likewise.
* gcc/doc/tm.texi: Update.
---
 gcc/config/msp430/msp430.cc | 11 +
 gcc/config/rs6000/rs6000.cc | 14 +-
 gcc/doc/tm.texi |  7 +--
 gcc/dwarf2cfi.cc| 93 ++---
 gcc/target.def  |  8 ++--
 5 files changed, 58 insertions(+), 75 deletions(-)

diff --git a/gcc/config/msp430/msp430.cc b/gcc/config/msp430/msp430.cc
index 6c15780a2b6..dbea8d7f50f 100644
--- a/gcc/config/msp430/msp430.cc
+++ b/gcc/config/msp430/msp430.cc
@@ -3202,11 +3202,9 @@ msp430_expand_eh_return (rtx eh_handler)
 #undef  TARGET_INIT_DWARF_REG_SIZES_EXTRA
 #define TARGET_INIT_DWARF_REG_SIZES_EXTRA msp430_init_dwarf_reg_sizes_extra
 void
-msp430_init_dwarf_reg_sizes_extra (tree address)
+msp430_init_dwarf_reg_sizes_extra (poly_uint16 *sizes)
 {
   int i;
-  rtx addr = expand_normal (address);
-  rtx mem = gen_rtx_MEM (BLKmode, addr);
 
   /* This needs to match msp430_unwind_word_mode (above).  */
   if (!msp430x)
@@ -3218,12 +3216,7 @@ msp430_init_dwarf_reg_sizes_extra (tree address)
   unsigned int rnum = DWARF2_FRAME_REG_OUT (dnum, 1);
 
   if (rnum < DWARF_FRAME_REGISTERS)
-   {
- HOST_WIDE_INT offset = rnum * GET_MODE_SIZE (QImode);
-
- emit_move_insn (adjust_address (mem, QImode, offset),
- gen_int_mode (4, QImode));
-   }
+   sizes[rnum] = 4;
 }
 }
 
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a85d7630b41..fddb6a8a0f7 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -23783,27 +23783,17 @@ rs6000_initial_elimination_offset (int from, int to)
 /* Fill in sizes of registers used by unwinder.  */
 
 static void
-rs6000_init_dwarf_reg_sizes_extra (tree address)
+rs6000_init_dwarf_reg_sizes_extra (poly_uint16 *sizes)
 {
   if (TARGET_MACHO && ! TARGET_ALTIVEC)
 {
   int i;
-  machine_mode mode = TYPE_MODE (char_type_node);
-  rtx addr = expand_expr (address, NULL_RTX, VOIDmode, EXPAND_NORMAL);
-  rtx mem = gen_rtx_MEM (BLKmode, addr);
-  rtx value = gen_int_mode (16, mode);
 
   /* On Darwin, libgcc may be built to run on both G3 and G4/5.
 The unwinder still needs to know the size of Altivec registers.  */
 
   for (i = FIRST_ALTIVEC_REGNO; i < LAST_ALTIVEC_REGNO+1; i++)
-   {
- int column = DWARF_REG_TO_UNWIND_COLUMN
-   (DWARF2_FRAME_REG_OUT (DWARF_FRAME_REGNUM (i), true));
- HOST_WIDE_INT offset = column * GET_MODE_SIZE (mode);
-
- emit_move_insn (adjust_address (mem, mode, offset), value);
-   }
+   sizes[i] = 16;
 }
 }
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8572313b308..09a3ab3e55c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -9824,13 +9824,14 @@ used to return a smaller mode than the raw mode to 
prevent call
 clobbered parts of a register altering the frame register size
 @end deftypefn
 
-@deftypefn {Target Hook} void TARGET_INIT_DWARF_REG_SIZES_EXTRA (tree 
@var{address})
+@deftypefn {Target Hook} void TARGET_INIT_DWARF_REG_SIZES_EXTRA (poly_uint16 
*@var{sizes})
 If some registers are represented in Dwarf-2 unwind information in
 multiple pieces, define this hook to fill in information about the
 sizes of those pieces in the table used by the unwinder at runtime.
-It will be called by @code{expand_builtin_init_dwarf_reg_sizes} after
+It will be called by @code{generate_dwarf_reg_sizes} after
 filling in a single size corresponding to each hard register;
-@var{address} is the address of the table.
+@var{sizes} is the address of the table.  It will contain
+@code{DWARF_FRAME_REGISTERS} elements when this hook is called.
 @end deftypefn
 
 @deftypefn {Target Hook} bool TARGET_ASM_TTYPE (rtx @var{sym})
diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
index bef3165e691..b29173b2156 100644
--- a/gcc/dwarf2cfi.cc
+++ b/gcc/dwarf2cfi.cc
@@ -36,7 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "except.h"/* expand_builtin_dwarf_sp_column */
 #include "profile-count.h" /* For expr.h */
-#include "expr.h"  /* init_return_column_size */
+#include "expr.h"  /* expand_normal, emit_move_insn */
 #include "output.h"/* asm_out_file */
 #include "debug.h" /* dwarf2out_do_frame, 

[committed] libstdc++: Fix -Wsystem-headers warnings in tests

2022-11-08 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* testsuite/18_support/new_nothrow.cc: Add missing noexcept
to operator delete replacements.
* testsuite/20_util/any/cons/92156.cc: Disable
-Winit-list-lifetime warnings from instantiating invalid
specialization of manager function.
* testsuite/20_util/any/modifiers/92156.cc: Likewise.
* testsuite/20_util/default_delete/void_neg.cc: Prune additional
diagnostics.
* testsuite/20_util/headers/memory/synopsis.cc: Add missing
noexcept.
* testsuite/20_util/shared_ptr/cons/void_neg.cc: Prune
additional diagnostic.
* testsuite/20_util/unique_ptr/creation/for_overwrite.cc: Add
missing noexcept to operator delete replacements.
* testsuite/21_strings/basic_string/cons/char/103919.cc:
Likewise.
* testsuite/23_containers/map/modifiers/emplace/92300.cc:
Likewise.
* testsuite/23_containers/map/modifiers/insert/92300.cc:
Likewise.
* testsuite/24_iterators/headers/iterator/range_access_c++11.cc:
Add missing noexcept to synopsis declarations.
* testsuite/24_iterators/headers/iterator/range_access_c++14.cc:
Likewise.
* testsuite/24_iterators/headers/iterator/range_access_c++17.cc:
Likewise.
---
 libstdc++-v3/testsuite/18_support/new_nothrow.cc   | 14 ++
 libstdc++-v3/testsuite/20_util/any/cons/92156.cc   |  1 +
 .../testsuite/20_util/any/modifiers/92156.cc   |  1 +
 .../testsuite/20_util/default_delete/void_neg.cc   |  3 +++
 .../testsuite/20_util/headers/memory/synopsis.cc   |  2 +-
 .../testsuite/20_util/shared_ptr/cons/void_neg.cc  |  2 ++
 .../20_util/unique_ptr/creation/for_overwrite.cc   |  4 ++--
 .../21_strings/basic_string/cons/char/103919.cc|  4 ++--
 .../23_containers/map/modifiers/emplace/92300.cc   |  4 ++--
 .../23_containers/map/modifiers/insert/92300.cc|  4 ++--
 .../headers/iterator/range_access_c++11.cc |  4 ++--
 .../headers/iterator/range_access_c++14.cc | 12 ++--
 .../headers/iterator/range_access_c++17.cc | 12 ++--
 13 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/libstdc++-v3/testsuite/18_support/new_nothrow.cc 
b/libstdc++-v3/testsuite/18_support/new_nothrow.cc
index d5e7eb58782..37806122bd0 100644
--- a/libstdc++-v3/testsuite/18_support/new_nothrow.cc
+++ b/libstdc++-v3/testsuite/18_support/new_nothrow.cc
@@ -64,7 +64,13 @@ void* operator new (size_t n)
 }
 }
 
-void operator delete (void *p)
+#if __cplusplus >= 201103L
+#define NOEXCEPT noexcept
+#else
+#define NOEXCEPT
+#endif
+
+void operator delete (void *p) NOEXCEPT
 {
 ++delete_called;
 if (p)
@@ -77,18 +83,18 @@ void* operator new[] (size_t n)
 return operator new(n);
 }
 
-void operator delete[] (void *p)
+void operator delete[] (void *p) NOEXCEPT
 {
 ++delete_vec_called;
 operator delete(p);
 }
 
 #if __cplusplus >= 201402L
-void operator delete (void *p, std::size_t)
+void operator delete (void *p, std::size_t) noexcept
 {
   ::operator delete(p);
 }
-void operator delete[] (void *p, std::size_t)
+void operator delete[] (void *p, std::size_t) noexcept
 {
   ::operator delete[](p);
 }
diff --git a/libstdc++-v3/testsuite/20_util/any/cons/92156.cc 
b/libstdc++-v3/testsuite/20_util/any/cons/92156.cc
index 71e9dd94090..0e768df9a00 100644
--- a/libstdc++-v3/testsuite/20_util/any/cons/92156.cc
+++ b/libstdc++-v3/testsuite/20_util/any/cons/92156.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++17 } }
+// { dg-options "-Wno-init-list-lifetime" }
 
 // Copyright (C) 2020-2022 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/20_util/any/modifiers/92156.cc 
b/libstdc++-v3/testsuite/20_util/any/modifiers/92156.cc
index d8f9893667b..b98d0e8e92a 100644
--- a/libstdc++-v3/testsuite/20_util/any/modifiers/92156.cc
+++ b/libstdc++-v3/testsuite/20_util/any/modifiers/92156.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++17 } }
+// { dg-options "-Wno-init-list-lifetime" }
 
 // Copyright (C) 2020-2022 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/20_util/default_delete/void_neg.cc 
b/libstdc++-v3/testsuite/20_util/default_delete/void_neg.cc
index f6aefc0a7ff..04042c2d745 100644
--- a/libstdc++-v3/testsuite/20_util/default_delete/void_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/default_delete/void_neg.cc
@@ -27,3 +27,6 @@ void test01()
   d(nullptr);   // { dg-error "here" }
   // { dg-error "delete pointer to incomplete type" "" { target *-*-* } 0 }
 }
+
+// { dg-prune-output "invalid application of 'sizeof' to a void type" }
+// { dg-prune-output "deleting 'void*' is undefined" }
diff --git a/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc 
b/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc
index 15437c72ee0..b14c4278cd3 100644
--- a/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc
+++ 

[committed] libstdc++: Fix -Wsystem-headers warnings

2022-11-08 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

Fix some problems noticed with -Wsystem-headers.

libstdc++-v3/ChangeLog:

* include/bits/stl_tempbuf.h (_Temporary_buffer): Disable
warnings about get_temporary_buffer being deprecated.
* include/ext/functional (mem_fun1, mem_fun1_ref): Disable
warnings about mem_fun1_t, const_mem_fun1_t, mem_fun1_ref_t and
const_mem_fun1_ref_t being deprecated.
* include/std/array (__array_traits): Remove artificial
attributes which give warnings about being ignored.
* include/std/spanstream (basic_spanbuf::setbuf): Add assertion
and adjust to avoid narrowing warning.
* libsupc++/exception_ptr.h [!__cpp_rtti && !__cpp_exceptions]
(make_exception_ptr): Add missing inline specifier.
---
 libstdc++-v3/include/bits/stl_tempbuf.h | 3 +++
 libstdc++-v3/include/ext/functional | 4 ++--
 libstdc++-v3/include/std/array  | 4 ++--
 libstdc++-v3/include/std/spanstream | 3 ++-
 libstdc++-v3/libsupc++/exception_ptr.h  | 2 +-
 5 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_tempbuf.h 
b/libstdc++-v3/include/bits/stl_tempbuf.h
index b13aa3b0fcc..f3d4dd73073 100644
--- a/libstdc++-v3/include/bits/stl_tempbuf.h
+++ b/libstdc++-v3/include/bits/stl_tempbuf.h
@@ -257,6 +257,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __ucr(__first, __last, __seed);
 }
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
   template
 _Temporary_buffer<_ForwardIterator, _Tp>::
 _Temporary_buffer(_ForwardIterator __seed, size_type __original_len)
@@ -281,6 +283,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
}
 }
+#pragma GCC diagnostic pop
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
diff --git a/libstdc++-v3/include/ext/functional 
b/libstdc++-v3/include/ext/functional
index 9cf864d9290..a947ee6384d 100644
--- a/libstdc++-v3/include/ext/functional
+++ b/libstdc++-v3/include/ext/functional
@@ -396,8 +396,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { _M_initialize(161803398u); }
   };
 
-#pragma GCC diagnostic pop
-
   // Mem_fun adaptor helper functions mem_fun1 and mem_fun1_ref,
   // provided for backward compatibility, they are no longer part of
   // the C++ standard.
@@ -422,6 +420,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 mem_fun1_ref(_Ret (_Tp::*__f)(_Arg) const)
 { return std::const_mem_fun1_ref_t<_Ret, _Tp, _Arg>(__f); }
 
+#pragma GCC diagnostic pop
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array
index 7ba92d0e90d..e26390e6f80 100644
--- a/libstdc++-v3/include/std/array
+++ b/libstdc++-v3/include/std/array
@@ -64,11 +64,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  struct _Type
  {
// Indexing is undefined.
-   __attribute__((__always_inline__,__artificial__,__noreturn__))
+   __attribute__((__always_inline__,__noreturn__))
_Tp& operator[](size_t) const noexcept { __builtin_trap(); }
 
// Conversion to a pointer produces a null pointer.
-   __attribute__((__always_inline__,__artificial__))
+   __attribute__((__always_inline__))
operator _Tp*() const noexcept { return nullptr; }
  };
 
diff --git a/libstdc++-v3/include/std/spanstream 
b/libstdc++-v3/include/std/spanstream
index 6abf013d41b..483996b274f 100644
--- a/libstdc++-v3/include/std/spanstream
+++ b/libstdc++-v3/include/std/spanstream
@@ -136,7 +136,8 @@ template
 basic_streambuf<_CharT, _Traits>*
 setbuf(_CharT* __s, streamsize __n) override
 {
-  span({__s, __n});
+  __glibcxx_assert(__n >= 0);
+  this->span(std::span<_CharT>(__s, __n));
   return this;
 }
 
diff --git a/libstdc++-v3/libsupc++/exception_ptr.h 
b/libstdc++-v3/libsupc++/exception_ptr.h
index fd9ceec88d4..b0118102123 100644
--- a/libstdc++-v3/libsupc++/exception_ptr.h
+++ b/libstdc++-v3/libsupc++/exception_ptr.h
@@ -280,7 +280,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
   // instead of a working one compiled with RTTI and/or exceptions enabled.
   template
 __attribute__ ((__always_inline__))
-exception_ptr
+inline exception_ptr
 make_exception_ptr(_Ex) _GLIBCXX_USE_NOEXCEPT
 { return exception_ptr(); }
 #endif
-- 
2.38.1



[committed] libstdc++: Add always_inline to most allocator functions

2022-11-08 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

This reduces the abstraction penalty for allocator support in
unoptimized code. Constructing and using allocators in containers calls
many one-line (or completely empty) inline functions. Those can all be
inlined to reduce code size and function call overhead for -O0.

libstdc++-v3/ChangeLog:

* include/bits/alloc_traits.h (allocator_traits): Add
always_inline attribute to all member functions.
(__do_alloc_on_copy, __alloc_on_copy, __do_alloc_on_move)
(__alloc_on_move, __do_alloc_on_swap, __alloc_on_swap)
(_Destroy(FwdIter, FwdIter, allocator&)): Add
always_inline attribute.
* include/bits/allocator.h (allocator): Add always_inline
attribute to all member functions and equality operators.
* include/bits/new_allocator.h (__new_allocator): Likewise.
* include/ext/alloc_traits.h (__gnu_cxx::__alloc_traits):
Likewise.
---
 libstdc++-v3/include/bits/alloc_traits.h  | 40 ++-
 libstdc++-v3/include/bits/allocator.h | 13 ++--
 libstdc++-v3/include/bits/new_allocator.h | 13 ++--
 libstdc++-v3/include/ext/alloc_traits.h   | 21 ++--
 4 files changed, 72 insertions(+), 15 deletions(-)

diff --git a/libstdc++-v3/include/bits/alloc_traits.h 
b/libstdc++-v3/include/bits/alloc_traits.h
index 8479bfd612f..203988ab933 100644
--- a/libstdc++-v3/include/bits/alloc_traits.h
+++ b/libstdc++-v3/include/bits/alloc_traits.h
@@ -463,7 +463,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*
*  Calls @c a.allocate(n)
   */
-  _GLIBCXX_NODISCARD static _GLIBCXX20_CONSTEXPR pointer
+  [[__nodiscard__,__gnu__::__always_inline__]]
+  static _GLIBCXX20_CONSTEXPR pointer
   allocate(allocator_type& __a, size_type __n)
   { return __a.allocate(__n); }
 
@@ -477,7 +478,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*
*  Returns  a.allocate(n, hint) 
   */
-  _GLIBCXX_NODISCARD static _GLIBCXX20_CONSTEXPR pointer
+  [[__nodiscard__,__gnu__::__always_inline__]]
+  static _GLIBCXX20_CONSTEXPR pointer
   allocate(allocator_type& __a, size_type __n, const_void_pointer __hint)
   {
 #if __cplusplus <= 201703L
@@ -495,6 +497,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*
*  Calls  a.deallocate(p, n) 
   */
+  [[__gnu__::__always_inline__]]
   static _GLIBCXX20_CONSTEXPR void
   deallocate(allocator_type& __a, pointer __p, size_type __n)
   { __a.deallocate(__p, __n); }
@@ -511,6 +514,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  `std::construct_at(__p, std::forward<_Args>(__args)...)` instead.
   */
   template
+   [[__gnu__::__always_inline__]]
static _GLIBCXX20_CONSTEXPR void
construct(allocator_type& __a __attribute__((__unused__)), _Up* __p,
  _Args&&... __args)
@@ -531,6 +535,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  Calls @c __a.destroy(__p).
   */
   template
+   [[__gnu__::__always_inline__]]
static _GLIBCXX20_CONSTEXPR void
destroy(allocator_type& __a __attribute__((__unused__)), _Up* __p)
noexcept(is_nothrow_destructible<_Up>::value)
@@ -547,6 +552,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __a  An allocator.
*  @return @c __a.max_size()
   */
+  [[__gnu__::__always_inline__]]
   static _GLIBCXX20_CONSTEXPR size_type
   max_size(const allocator_type& __a __attribute__((__unused__))) noexcept
   {
@@ -562,6 +568,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __rhs  An allocator.
*  @return @c __rhs
   */
+  [[__gnu__::__always_inline__]]
   static _GLIBCXX20_CONSTEXPR allocator_type
   select_on_container_copy_construction(const allocator_type& __rhs)
   { return __rhs; }
@@ -633,6 +640,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  `std::construct_at(__p, std::forward<_Args>(__args)...)` instead.
   */
   template
+   [[__gnu__::__always_inline__]]
static _GLIBCXX20_CONSTEXPR void
construct(allocator_type&, _Up* __p, _Args&&... __args)
noexcept(std::is_nothrow_constructible<_Up, _Args...>::value)
@@ -646,6 +654,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  Invokes the destructor for `*__p`.
   */
   template
+   [[__gnu__::__always_inline__]]
static _GLIBCXX20_CONSTEXPR void
destroy(allocator_type&, _Up* __p)
noexcept(is_nothrow_destructible<_Up>::value)
@@ -660,6 +669,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __rhs  An allocator.
*  @return `__rhs`
   */
+  [[__gnu__::__always_inline__]]
   static _GLIBCXX20_CONSTEXPR allocator_type
   select_on_container_copy_construction(const allocator_type& __rhs)
   { return __rhs; }
@@ -669,22 +679,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// @cond undocumented
 #if __cplusplus < 201703L
   template
+[[__gnu__::__always_inline__]]
 

Re: [PATCH v2] libstdc++: basic_filebuf: don't flush more often than necessary.

2022-11-08 Thread Jonathan Wakely via Gcc-patches
On Mon, 7 Nov 2022 at 17:00, Jonathan Wakely wrote:
>
> On Thu, 6 Oct 2022 at 20:03, Charles-Francois Natali via Libstdc++
>  wrote:
> >
> > `basic_filebuf::xsputn` would bypass the buffer when passed a chunk of
> > size 1024 and above, seemingly as an optimisation.
> >
> > This can have a significant performance impact if the overhead of a
> > `write` syscall is non-negligible, e.g. on a slow disk, on network
> > filesystems, or simply during IO contention because instead of flushing
> > every `BUFSIZ` (by default), we can flush every 1024 char.
> > The impact is even greater with custom larger buffers, e.g. for network
> > filesystems, because the code could issue `write` for example 1000X more
> > often than necessary with respect to the buffer size.
> > It also introduces a significant discontinuity in performance when
> > writing chunks of size 1024 and above.
> >
> > See this reproducer which writes down a fixed number of chunks to a file
> > open with `O_SYNC` - to replicate high-latency `write` - for varying
> > size of chunks:
> >
> > ```
> > $ cat test_fstream_flush.cpp
> >
> > int
> > main(int argc, char* argv[])
> > {
> >   assert(argc == 3);
> >
> >   const auto* path = argv[1];
> >   const auto chunk_size = std::stoul(argv[2]);
> >
> >   const auto fd =
> > open(path, O_CREAT | O_TRUNC | O_WRONLY | O_SYNC | O_CLOEXEC, 0666);
> >   assert(fd >= 0);
> >
> >   auto filebuf = __gnu_cxx::stdio_filebuf(fd, std::ios_base::out);
> >   auto stream = std::ostream();
> >
> >   const auto chunk = std::vector(chunk_size);
> >
> >   for (auto i = 0; i < 1'000; ++i) {
> > stream.write(chunk.data(), chunk.size());
> >   }
> >
> >   return 0;
> > }
> > ```
> >
> > ```
> > $ g++ -o /tmp/test_fstream_flush test_fstream_flush.cpp -std=c++17
> > $ for i in $(seq 1021 1025); do echo -e "\n$i"; time 
> > /tmp/test_fstream_flush /tmp/foo $i; done
> >
> > 1021
> >
> > real0m0.997s
> > user0m0.000s
> > sys 0m0.038s
> >
> > 1022
> >
> > real0m0.939s
> > user0m0.005s
> > sys 0m0.032s
> >
> > 1023
> >
> > real0m0.954s
> > user0m0.005s
> > sys 0m0.034s
> >
> > 1024
> >
> > real0m7.102s
> > user0m0.040s
> > sys 0m0.192s
> >
> > 1025
> >
> > real0m7.204s
> > user0m0.025s
> > sys 0m0.209s
> > ```
> >
> > See the huge drop in performance at the 1024-boundary.
>
> I've finally found time to properly look at this, sorry for the delay.
>
> I thought I was unable to reproduce these numbers, then I realised I'd
> already installed a build with the patch, so was measuring the patched
> performance for both my "before" and "after" tests. Oops!
>
> My concern is that the patch doesn't only affect files on remote
> filesystems. I assume the original 1024-byte chunking behaviour is
> there for a reason, because for large writes the performance might be
> better if we just write directly instead of buffering and then writing
> again. Assuming we have a fast disk, writing straight to disk avoids
> copying in and out of the buffer. But if we have a slow disk, it's
> better to buffer and reduce the total number of disk writes. I'm
> concerned that the patch optimizes the slow disk case potentially at a
> cost for the fast disk case.
>
> I wonder whether it would make sense to check whether the buffer size
> has been manually changed, i.e. epptr() - pbase() != _M_buf_size. If
> the buffer has been explicitly set by the user, then we should assume
> they really want it to be used and so don't bypass it for writes >=
> 1024.
>
> In the absence of a better idea, I think I'm going to commit the patch
> as-is. I don't see it causing any measurable slowdown for very large
> writes on fast disks, and it's certainly a huge improvement for slow
> disks.

The patch has been pushed to trunk now, thanks for the contribution.

I removed the testcase and results from the commit message as they
don't need to be in the git log. I added a link to your email into
bugzilla though, so we can still find it easily.



Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-11-08 Thread Andrew Waterman
On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote:
> > As suggested upthread, I have also adjusted update_nan_sign() to drop
> > the NAN sign to VARYING if both operands are NAN.  As an optimization
> > I keep the sign if both operands are NAN and have the same sign.
>
> For NaNs this still relies on something IEEE754 doesn't guarantee,
> as I cited, after a binary operation the sign bit of the NaN is
> unspecified, whether there is one NaN operand or two.
> It might be that all CPUs handle it the way you've implemented
> (that for one NaN operand the sign of NaN result will be the same
> as that NaN operand and for two it will be the sign of one of the two
> NaNs operands, never something else), but I think we'd need to check
> more than one implementation for that (I've only tried x86_64 and thus
> SSE behavior in it), so one would need to test i387 long double behavior
> too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ...
> The guarantee given by IEEE754 is only for those copy, negate, abs, copySign
> operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*,
> __builtin_copysign*.

FWIW, RISC-V canonicalizes NaNs by clearing the sign bit; the signs of
the input NaNs do not factor in.

>
> Otherwise LGTM (but would be nice to get into GCC13 not just
> +, but also -, *, /, sqrt at least).
>
> Jakub
>


Re: [PATCH] libstdc++: Refactor implementation of operator+ for std::string

2022-11-08 Thread Jonathan Wakely via Gcc-patches
On Thu, 20 Oct 2022 at 01:06, Will Hawkins wrote:
>
> Sorry for the delay. Tested on x86-64 Linux.
>
> -->8--
>
> After consultation with Jonathan, it seemed like a good idea to create a
> single function that performed one-allocation string concatenation that
> could be used by various different version of operator+. This patch adds
> such a function and calls it from the relevant implementations.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/basic_string.h:
> Add common function that performs single-allocation string
> concatenation. (__str_cat)
> Use __str_cat to perform optimized operator+, where relevant.
> * include/bits/basic_string.tcc::
> Remove single-allocation implementation of operator+.
>
> Signed-off-by: Will Hawkins 

I've pushed this patch to trunk now. I changed the commit message
significantly though:

   libstdc++: Refactor implementation of operator+ for std::string

   Until now operator+(char*, string) and operator+(string, char*) had
   different performance characteristics. The former required a single
   memory allocation and the latter required two. This patch makes the
   performance equal.

   After consultation with Jonathan, it seemed like a good idea to create a
   single function that performed one-allocation string concatenation that
   could be used by various different version of operator+. This patch adds
   such a function and calls it from the relevant implementations.

   Co-authored-by: Jonathan Wakely 

   libstdc++-v3/ChangeLog:

   * include/bits/basic_string.h (__str_cat): Add common function
   that performs single-allocation string concatenation.
   (operator+): Use __str_cat.
   * include/bits/basic_string.tcc (operator+): Move to .h and
   define inline using __str_cat.

   Signed-off-by: Will Hawkins 

Specifically, I restored part of your earlier commit's message, which
gives the necessary context for the commit. Just starting with "After
consultation with Jonathan, ..." doesn't say anything about the patch
itself and is not very helpful without the earlier context from the
mailing list.

I added myself as Co-author, since the new patch was largely based on
a patch I sent in a private email.

And I changed the changelog part to better meet the format of GNU ChangeLogs.
https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html

The change is on trunk now (and I didn't see any libgomp test failures
this time).






> ---
>  libstdc++-v3/include/bits/basic_string.h   | 66 --
>  libstdc++-v3/include/bits/basic_string.tcc | 41 --
>  2 files changed, 49 insertions(+), 58 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/basic_string.h 
> b/libstdc++-v3/include/bits/basic_string.h
> index cd244191df4..9c2b57f5a1d 100644
> --- a/libstdc++-v3/include/bits/basic_string.h
> +++ b/libstdc++-v3/include/bits/basic_string.h
> @@ -3485,6 +3485,24 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>  _GLIBCXX_END_NAMESPACE_CXX11
>  #endif
>
> +  template
> +_GLIBCXX20_CONSTEXPR
> +inline _Str
> +__str_concat(typename _Str::value_type const* __lhs,
> +typename _Str::size_type __lhs_len,
> +typename _Str::value_type const* __rhs,
> +typename _Str::size_type __rhs_len,
> +typename _Str::allocator_type const& __a)
> +{
> +  typedef typename _Str::allocator_type allocator_type;
> +  typedef __gnu_cxx::__alloc_traits _Alloc_traits;
> +  _Str __str(_Alloc_traits::_S_select_on_copy(__a));
> +  __str.reserve(__lhs_len + __rhs_len);
> +  __str.append(__lhs, __lhs_len);
> +  __str.append(__rhs, __rhs_len);
> +  return __str;
> +}
> +
>// operator+
>/**
> *  @brief  Concatenate two strings.
> @@ -3494,13 +3512,14 @@ _GLIBCXX_END_NAMESPACE_CXX11
> */
>template
>  _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
> -basic_string<_CharT, _Traits, _Alloc>
> +inline basic_string<_CharT, _Traits, _Alloc>
>  operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
>   const basic_string<_CharT, _Traits, _Alloc>& __rhs)
>  {
> -  basic_string<_CharT, _Traits, _Alloc> __str(__lhs);
> -  __str.append(__rhs);
> -  return __str;
> +  typedef basic_string<_CharT, _Traits, _Alloc> _Str;
> +  return std::__str_concat<_Str>(__lhs.c_str(), __lhs.size(),
> +__rhs.c_str(), __rhs.size(),
> +__lhs.get_allocator());
>  }
>
>/**
> @@ -3511,9 +3530,16 @@ _GLIBCXX_END_NAMESPACE_CXX11
> */
>template
>  _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
> -basic_string<_CharT,_Traits,_Alloc>
> +inline basic_string<_CharT,_Traits,_Alloc>
>  operator+(const _CharT* __lhs,
> - const basic_string<_CharT,_Traits,_Alloc>& __rhs);
> + const basic_string<_CharT,_Traits,_Alloc>& __rhs)
> +{
> +  

RE: [PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization.

2022-11-08 Thread Tamar Christina via Gcc-patches
Ping.

> -Original Message-
> From: Tamar Christina
> Sent: Monday, October 31, 2022 11:35 AM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; nd ; jeffreya...@gmail.com
> Subject: RE: [PATCH 1/4]middle-end Support not decomposing specific
> divisions during vectorization.
> 
> >
> > The type of the expression should be available via the mode and the
> > signedness, no?  So maybe to avoid having both RTX and TREE on the
> > target hook pass it a wide_int instead for the divisor?
> >
> 
> Done.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * expmed.h (expand_divmod): Pass tree operands down in addition
> to RTX.
>   * expmed.cc (expand_divmod): Likewise.
>   * explow.cc (round_push, align_dynamic_address): Likewise.
>   * expr.cc (force_operand, expand_expr_divmod): Likewise.
>   * optabs.cc (expand_doubleword_mod,
> expand_doubleword_divmod):
>   Likewise.
>   * target.h: Include tree-core.
>   * target.def (can_special_div_by_const): New.
>   * targhooks.cc (default_can_special_div_by_const): New.
>   * targhooks.h (default_can_special_div_by_const): New.
>   * tree-vect-generic.cc (expand_vector_operation): Use it.
>   * doc/tm.texi.in: Document it.
>   * doc/tm.texi: Regenerate.
>   * tree-vect-patterns.cc (vect_recog_divmod_pattern): Check for
> support.
>   * tree-vect-stmts.cc (vectorizable_operation): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-div-bitmask-1.c: New test.
>   * gcc.dg/vect/vect-div-bitmask-2.c: New test.
>   * gcc.dg/vect/vect-div-bitmask-3.c: New test.
>   * gcc.dg/vect/vect-div-bitmask.h: New file.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index
> 92bda1a7e14a3c9ea63e151e4a49a818bf4d1bdb..a29f5c39be3f0927f8ef6e094
> c7a712c0604fb77 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -6112,6 +6112,22 @@ instruction pattern.  There is no need for the hook
> to handle these two  implementation approaches itself.
>  @end deftypefn
> 
> +@deftypefn {Target Hook} bool
> TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST
> +(enum @var{tree_code}, tree @var{vectype}, wide_int @var{constant}, rtx
> +*@var{output}, rtx @var{in0}, rtx @var{in1}) This hook is used to test
> +whether the target has a special method of division of vectors of type
> +@var{vectype} using the value @var{constant}, and producing a vector of
> type @var{vectype}.  The division will then not be decomposed and will be
> kept as a div.
> +
> +When the hook is being used to test whether the target supports a
> +special divide, @var{in0}, @var{in1}, and @var{output} are all null.
> +When the hook is being used to emit a division, @var{in0} and @var{in1}
> +are the source vectors of type @var{vectype} and @var{output} is the
> +destination vector of type @var{vectype}.
> +
> +Return true if the operation is possible, emitting instructions for it
> +if rtxes are provided and updating @var{output}.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} tree
> TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned
> @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})  This hook
> should return the decl of a function that implements the  vectorized variant
> of the function with the @code{combined_fn} code diff --git
> a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index
> 112462310b134705d860153294287cfd7d4af81d..d5a745a02acdf051ea1da1b04
> 076d058c24ce093 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -4164,6 +4164,8 @@ address;  but often a machine-dependent strategy
> can generate better code.
> 
>  @hook TARGET_VECTORIZE_VEC_PERM_CONST
> 
> +@hook TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST
> +
>  @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
> 
>  @hook TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION
> diff --git a/gcc/explow.cc b/gcc/explow.cc index
> ddb4d6ae3600542f8d2bb5617cdd3933a9fae6c0..568e0eb1a158c696458ae678f
> 5e346bf34ba0036 100644
> --- a/gcc/explow.cc
> +++ b/gcc/explow.cc
> @@ -1037,7 +1037,7 @@ round_push (rtx size)
>   TRUNC_DIV_EXPR.  */
>size = expand_binop (Pmode, add_optab, size, alignm1_rtx,
>  NULL_RTX, 1, OPTAB_LIB_WIDEN);
> -  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx,
> +  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, size,
> + align_rtx,
>   NULL_RTX, 1);
>size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1);
> 
> @@ -1203,7 +1203,7 @@ align_dynamic_address (rtx target, unsigned
> required_align)
>gen_int_mode (required_align / BITS_PER_UNIT - 1,
>  Pmode),
>NULL_RTX, 1, OPTAB_LIB_WIDEN);
> -  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
> +  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL,
> target,
>
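
For concreteness (this is not part of the posted patch): the two-phase
protocol described in the @deftypefn text above, a query call with null
rtxes followed by an emission call with real operands, might be implemented
by a backend roughly as follows.  All helper names are invented for
illustration.

/* Sketch only: shape of a backend hook following the documented protocol.  */
static bool
example_can_special_div_by_const (enum tree_code code, tree vectype,
                                  wide_int cst, rtx *output,
                                  rtx in0, rtx in1)
{
  if (code != TRUNC_DIV_EXPR
      || !example_divisor_supported_p (vectype, cst))
    return false;               /* Let the vectorizer decompose the division.  */

  if (in0 == NULL_RTX)          /* Query phase: only answer yes/no.  */
    return true;

  /* Emission phase: generate the target's special division sequence.  */
  *output = example_emit_special_div (vectype, in0, in1, cst);
  return true;
}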

Re: [PATCH] sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

2022-11-08 Thread Richard Biener via Gcc-patches
On Tue, Nov 8, 2022 at 5:37 PM Surya Kumari Jangala
 wrote:
>
> Hi Richard,
>
> On 21/09/22 1:03 pm, Richard Biener wrote:
> > On Tue, Sep 20, 2022 at 9:18 AM Surya Kumari Jangala via Gcc-patches
> >  wrote:
> >>
> >> Hi Jeff, Richard,
> >> Thank you for reviewing the patch!
> >> I have committed the patch to the gcc repo.
> >> Can I backport this patch to prior versions of gcc, as this is an easy 
> >> patch to backport and the issue exists in prior versions too?
> >
> > It doesn't seem to be a regression so I'd error on the safe side here.
>
> Can you please clarify, should this patch be backported? It is not very clear 
> what "safe side" means here.

Not backporting is the safe side.

Richard.

> Thanks!
> Surya
>
> >
> > Richard.
> >
> >> Regards,
> >> Surya
> >>
> >>
> >> On 31/08/22 9:09 pm, Jeff Law via Gcc-patches wrote:
> >>>
> >>>
> >>> On 8/23/2022 5:49 AM, Surya Kumari Jangala via Gcc-patches wrote:
>  sched1: Fix -fcompare-debug issue in schedule_region [PR105586]
> 
>  In schedule_region(), a basic block that does not contain any real insns
>  is not scheduled and the dfa state at the entry of the bb is not copied
>  to the fallthru basic block. However a DEBUG insn is treated as a real
>  insn, and if a bb contains non-real insns and a DEBUG insn, it's dfa
>  state is copied to the fallthru bb. This was resulting in
>  -fcompare-debug failure as the incoming dfa state of the fallthru block
>  is different with -g. We should always copy the dfa state of a bb to
>  it's fallthru bb even if the bb does not contain real insns.
> 
>  2022-08-22  Surya Kumari Jangala  
> 
>  gcc/
>  PR rtl-optimization/105586
>  * sched-rgn.cc (schedule_region): Always copy dfa state to
>  fallthru block.
> 
>  gcc/testsuite/
>  PR rtl-optimization/105586
>  * gcc.target/powerpc/pr105586.c: New test.
> >>> Interesting.  We may have stumbled over this bug internally a little 
> >>> while ago -- not from a compare-debug standpoint, but from a "why isn't 
> >>> the processor state copied to the fallthru block" point of view.  I had 
> >>> it on my to-investigate list, but hadn't gotten around to it yet.
> >>>
> >>> I think there were requests for ChangeLog updates and a function comment 
> >>> for save_state_for_fallthru_edge.  OK with those updates.
> >>>
> >>> jeff
> >>>


Re: [PATCH] Use toplevel configure for GMP and MPFR for gdb

2022-11-08 Thread Andrew Pinski via Gcc-patches
On Tue, Nov 8, 2022 at 8:46 AM Andreas Schwab via Gdb-patches
 wrote:
>
> On Nov 08 2022, apinski--- via Gcc-patches wrote:
>
> > diff --git a/configure b/configure
> > index 7bcb894d1fe..9ee7a1a3abe 100755
> > --- a/configure
> > +++ b/configure
> > @@ -769,6 +769,7 @@ infodir
> >  docdir
> >  oldincludedir
> >  includedir
> > +runstatedir
> >  localstatedir
> >  sharedstatedir
> >  sysconfdir
>
> Please avoid using a patched autoconf command.

Sorry about that. I have regenerated it with a plain autoconf 2.69
and will make sure I use that going forward; when the patch is
approved I will push it with that regeneration, but I don't see a
reason to resubmit the patch otherwise.

Thanks,
Andrew Pinski

>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."


Re: [PATCH] Use toplevel configure for GMP and MPFR for gdb

2022-11-08 Thread Andreas Schwab via Gcc-patches
On Nov 08 2022, apinski--- via Gcc-patches wrote:

> diff --git a/configure b/configure
> index 7bcb894d1fe..9ee7a1a3abe 100755
> --- a/configure
> +++ b/configure
> @@ -769,6 +769,7 @@ infodir
>  docdir
>  oldincludedir
>  includedir
> +runstatedir
>  localstatedir
>  sharedstatedir
>  sysconfdir

Please avoid using a patched autoconf command.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

2022-11-08 Thread Surya Kumari Jangala via Gcc-patches
Hi Richard,

On 21/09/22 1:03 pm, Richard Biener wrote:
> On Tue, Sep 20, 2022 at 9:18 AM Surya Kumari Jangala via Gcc-patches
>  wrote:
>>
>> Hi Jeff, Richard,
>> Thank you for reviewing the patch!
>> I have committed the patch to the gcc repo.
>> Can I backport this patch to prior versions of gcc, as this is an easy patch 
>> to backport and the issue exists in prior versions too?
> 
> It doesn't seem to be a regression so I'd error on the safe side here.

Can you please clarify, should this patch be backported? It is not very clear 
what "safe side" means here.

Thanks!
Surya

> 
> Richard.
> 
>> Regards,
>> Surya
>>
>>
>> On 31/08/22 9:09 pm, Jeff Law via Gcc-patches wrote:
>>>
>>>
>>> On 8/23/2022 5:49 AM, Surya Kumari Jangala via Gcc-patches wrote:
 sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

 In schedule_region(), a basic block that does not contain any real insns
 is not scheduled and the dfa state at the entry of the bb is not copied
 to the fallthru basic block. However a DEBUG insn is treated as a real
 insn, and if a bb contains non-real insns and a DEBUG insn, it's dfa
 state is copied to the fallthru bb. This was resulting in
 -fcompare-debug failure as the incoming dfa state of the fallthru block
 is different with -g. We should always copy the dfa state of a bb to
 it's fallthru bb even if the bb does not contain real insns.

 2022-08-22  Surya Kumari Jangala  

 gcc/
 PR rtl-optimization/105586
 * sched-rgn.cc (schedule_region): Always copy dfa state to
 fallthru block.

 gcc/testsuite/
 PR rtl-optimization/105586
 * gcc.target/powerpc/pr105586.c: New test.
>>> Interesting.  We may have stumbled over this bug internally a little 
>>> while ago -- not from a compare-debug standpoint, but from a "why isn't the 
>>> processor state copied to the fallthru block" point of view.  I had it on 
>>> my to-investigate list, but hadn't gotten around to it yet.
>>>
>>> I think there were requests for ChangeLog updates and a function comment 
>>> for save_state_for_fallthru_edge.  OK with those updates.
>>>
>>> jeff
>>>


[PATCH] Use toplevel configure for GMP and MPFR for gdb

2022-11-08 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

This patch uses the toplevel configure parts for GMP/MPFR for
gdb. The only thing is that gdb now requires MPFR for building.
Before it was a recommended but not required library.
Also this allows building of GMP and MPFR with the toplevel
directory just like how it is done for GCC.
The toplevel configure now errors out if the version
of GMP or MPFR is wrong.

OK? I built gdb 3 ways:
with GMP and MPFR in the toplevel (static libraries used at that point for both)
with only MPFR in the toplevel (distro GMP library used and MPFR built from
source)
with neither GMP nor MPFR in the toplevel (distro libraries used)

Thanks,
Andrew Pinski

ChangeLog:
* Makefile.def: Add configure-gdb dependencies
on all-gmp and all-mpfr.
* configure.ac: Split out MPC checking from MPFR.
Require GMP and MPFR if the gdb directory exist.
* Makefile.in: Regenerate.
* configure: Regenerate.

gdb/ChangeLog:
* configure.ac: Remove AC_LIB_HAVE_LINKFLAGS
for gmp and mpfr.
Use GMPLIBS and GMPINC which is provided by the
toplevel configure.
* Makefile.in (LIBGMP, LIBMPFR): Remove.
(GMPLIBS, GMPINC): Add definition.
(INTERNAL_CFLAGS_BASE): Add GMPINC.
(CLIBS): Exchange LIBMPFR and LIBGMP
for GMPLIBS.
* target-float.c: Make the code conditional on
HAVE_LIBMPFR unconditional.
* top.c: Remove code checking HAVE_LIBMPFR.
* configure: Regenerate.
* config.in: Regenerate.
---
 Makefile.def   |2 +
 Makefile.in|2 +
 configure  |   81 +++-
 configure.ac   |   45 +-
 gdb/Makefile.in|   12 +-
 gdb/config.in  |6 -
 gdb/configure  | 1036 ++--
 gdb/configure.ac   |   31 +-
 gdb/target-float.c |8 -
 gdb/top.c  |8 -
 10 files changed, 147 insertions(+), 1084 deletions(-)

diff --git a/Makefile.def b/Makefile.def
index acdcd625ed6..d5976e61d98 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -418,6 +418,8 @@ dependencies = { module=configure-isl; on=all-gmp; };
 dependencies = { module=all-intl; on=all-libiconv; };
 
 // Host modules specific to gdb.
+dependencies = { module=configure-gdb; on=all-gmp; };
+dependencies = { module=configure-gdb; on=all-mpfr; };
 dependencies = { module=configure-gdb; on=all-intl; };
 dependencies = { module=configure-gdb; on=configure-sim; };
 dependencies = { module=configure-gdb; on=all-bfd; };
diff --git a/Makefile.in b/Makefile.in
index cb39e4790d6..d0666c75b00 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -63748,6 +63748,8 @@ configure-libcc1: maybe-configure-gcc
 all-libcc1: maybe-all-gcc
 all-c++tools: maybe-all-gcc
 all-utils: maybe-all-libiberty
+configure-gdb: maybe-all-gmp
+configure-gdb: maybe-all-mpfr
 configure-gdb: maybe-all-intl
 configure-gdb: maybe-all-bfd
 configure-gdb: maybe-all-libiconv
diff --git a/configure b/configure
index 7bcb894d1fe..9ee7a1a3abe 100755
--- a/configure
+++ b/configure
@@ -769,6 +769,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -941,6 +942,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE}'
@@ -1193,6 +1195,15 @@ do
   | -silent | --silent | --silen | --sile | --sil)
 silent=yes ;;
 
+  -runstatedir | --runstatedir | --runstatedi | --runstated \
+  | --runstate | --runstat | --runsta | --runst | --runs \
+  | --run | --ru | --r)
+ac_prev=runstatedir ;;
+  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+  | --run=* | --ru=* | --r=*)
+runstatedir=$ac_optarg ;;
+
   -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
 ac_prev=sbindir ;;
   -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1330,7 +1341,7 @@ fi
 for ac_var in  exec_prefix prefix bindir sbindir libexecdir datarootdir \
datadir sysconfdir sharedstatedir localstatedir includedir \
oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
-   libdir localedir mandir
+   libdir localedir mandir runstatedir
 do
   eval ac_val=\$$ac_var
   # Remove trailing slashes.
@@ -1490,6 +1501,7 @@ Fine tuning of the installation directories:
   --sysconfdir=DIRread-only single-machine data [PREFIX/etc]
   --sharedstatedir=DIRmodifiable architecture-independent data [PREFIX/com]
   --localstatedir=DIR modifiable single-machine data [PREFIX/var]
+  --runstatedir=DIR   modifiable per-process data [LOCALSTATEDIR/run]
   --libdir=DIRobject code libraries [EPREFIX/lib]
   --includedir=DIRC header files [PREFIX/include]
   --oldincludedir=DIR C header files 

Re: Re: [PATCH] [PHIOPT] Add A ? B + CST : B match and simplify optimizations

2022-11-08 Thread 钟云德 via Gcc-patches
At 2022-11-08 22:58:34, "Richard Biener"  wrote:

>On Sat, Nov 5, 2022 at 10:03 AM Zhongyunde via Gcc-patches
> wrote:
>>
>>
>> > -Original Message-
>> > From: Andrew Pinski [mailto:pins...@gcc.gnu.org]
>> > Sent: Saturday, November 5, 2022 2:34 PM
>> > To: Zhongyunde 
>> > Cc: hongtao@intel.com; gcc-patches@gcc.gnu.org; Zhangwen(Esan)
>> > ; Weiwei (weiwei, Compiler)
>> > ; zhong_1985...@163.com
>> > Subject: Re: [PATCH] [PHIOPT] Add A ? B + CST : B match and simplify
>> > optimizations
>> >
>> > On Fri, Nov 4, 2022 at 11:17 PM Zhongyunde 
>> > wrote:
>> > >
>> > > hi,
>> > >   This patch is try to fix the issue
>> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107190,
>> > > would you like to give me some suggestion, thanks.
>> >
>> > This seems like a "simplified" version of
>> > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html
>> > which just handles power of 2 constants where we know the cond will be
>> > removed.
>> > We could do even more "simplified" of 1 if needed really.
>> > What is the IR before PHI-OPT? Is it just + 1?
>>
>> Thanks for your attention. It is + 4294967296 before PHI-OPT  (See detail 
>> https://gcc.godbolt.org/z/6zEc6ja1z)
>> So we should keep matching the power of 2 constants ?
>>
>> > Also your pattern can be simplified to use integer_pow2p in the match part
>> > instead of INTEGER_CST.
>> >
>> Apply your comment, thanks
>
>How does the patch fix the mentioned bug?  match.pd patterns should make things
>"simpler" but x + (a << C') isn't simpler than a ? x + C : x.  It
>looks you are targeting
>PHI-OPT here, so maybe instead extend value_replacement to handle this case,
>it does look similar to the case with neutral/absorbing element there?
>

>Richard.


Thanks. This patch tries to fix the first issue mentioned in PR107090,
"[aarch64] sequence logic should be combined with mul and umulh" (gnu.org).
Sure, I'll take a look at the function value_replacement.
I have also noticed that the function two_value_replacement is very close to
the patch I want to achieve, and it may be easy to extend.
It seems the transform can be expressed equally well in match.pd (called by
match_simplify_replacement), so how do we choose where it is better to
implement it?
```
  /* Do the replacement of conditional if it can be done.  */
  if (!early_p
      && !diamond_p
      && two_value_replacement (bb, bb1, e2, phi, arg0, arg1))
    cfgchanged = true;
  else if (!diamond_p
           && match_simplify_replacement (bb, bb1, e1, e2, phi,
                                          arg0, arg1, early_p))
    cfgchanged = true;
```
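
For concreteness, the source shape under discussion reduces to something like
the following minimal C illustration (the constant is an arbitrary power of
two, not taken from the PR):

/* a ? x + C : x with C a power of two; the question is whether to rewrite
   it in match.pd or in PHI-OPT's value_replacement.  */
unsigned long f (unsigned long x, int a)
{
  return a ? x + 0x100000000UL : x;
  /* Branchless candidate: x + ((unsigned long) (a != 0) << 32).  */
}
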
>> > Thanks,
>> > Andrew
>>
>>

[PATCH 1/2] Change the name of array_at_struct_end_p to array_ref_flexible_size_p

2022-11-08 Thread Qing Zhao via Gcc-patches
The name of the utility routine "array_at_struct_end_p" is misleading
and should be changed to a new name that more accurately reflects its
real meaning.

The routine "array_at_struct_end_p" is used to check whether an array
reference is to an array whose actual size might be larger than its
upper bound implies, which includes 3 different cases:

   A. a ref to a flexible array member at the end of a structure;
   B. a ref to an array with a different type against the original decl;
   C. a ref to an array that was passed as a parameter;

The old name only reflects the above case A, therefore very confusing
when reading the corresponding gcc source code.

In this patch, A new name "array_ref_flexible_size_p" is used to replace
the old name.

All the references to the routine "array_at_struct_end_p" was replaced
with this new name, and the corresponding comments were updated to make
them clean and consistent.

gcc/ChangeLog:

* gimple-array-bounds.cc (trailing_array): Replace
array_at_struct_end_p with new name and update comments.
* gimple-fold.cc (get_range_strlen_tree): Likewise.
* gimple-ssa-warn-restrict.cc (builtin_memref::builtin_memref):
Likewise.
* graphite-sese-to-poly.cc (bounds_are_valid): Likewise.
* tree-if-conv.cc (idx_within_array_bound): Likewise.
* tree-object-size.cc (addr_object_size): Likewise.
* tree-ssa-alias.cc (component_ref_to_zero_sized_trailing_array_p):
Likewise.
(stmt_kills_ref_p): Likewise.
* tree-ssa-loop-niter.cc (idx_infer_loop_bounds): Likewise.
* tree-ssa-strlen.cc (maybe_set_strlen_range): Likewise.
* tree.cc (array_at_struct_end_p): Rename to ...
(array_ref_flexible_size_p): ... this.
(component_ref_size): Replace array_at_struct_end_p with new name.
* tree.h (array_at_struct_end_p): Rename to ...
(array_ref_flexible_size_p): ... this.
---
 gcc/gimple-array-bounds.cc  |  4 ++--
 gcc/gimple-fold.cc  |  6 ++
 gcc/gimple-ssa-warn-restrict.cc |  5 +++--
 gcc/graphite-sese-to-poly.cc|  4 ++--
 gcc/tree-if-conv.cc |  7 +++
 gcc/tree-object-size.cc |  2 +-
 gcc/tree-ssa-alias.cc   |  8 
 gcc/tree-ssa-loop-niter.cc  | 15 +++
 gcc/tree-ssa-strlen.cc  |  2 +-
 gcc/tree.cc | 11 ++-
 gcc/tree.h  |  8 
 11 files changed, 35 insertions(+), 37 deletions(-)

diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index e190b93aa85..fbf448e045d 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -129,7 +129,7 @@ get_ref_size (tree arg, tree *pref)
 }
 
 /* Return true if REF is (likely) an ARRAY_REF to a trailing array member
-   of a struct.  It refines array_at_struct_end_p by detecting a pointer
+   of a struct.  It refines array_ref_flexible_size_p by detecting a pointer
to an array and an array parameter declared using the [N] syntax (as
opposed to a pointer) and returning false.  Set *PREF to the decl or
expression REF refers to.  */
@@ -167,7 +167,7 @@ trailing_array (tree arg, tree *pref)
return false;
 }
 
-  return array_at_struct_end_p (arg);
+  return array_ref_flexible_size_p (arg);
 }
 
 /* Checks one ARRAY_REF in REF, located at LOCUS. Ignores flexible
diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 9055cd8982d..cafd331ca98 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -1690,13 +1690,11 @@ get_range_strlen_tree (tree arg, bitmap visited, 
strlen_range_kind rkind,
  /* Handle a MEM_REF into a DECL accessing an array of integers,
 being conservative about references to extern structures with
 flexible array members that can be initialized to arbitrary
-numbers of elements as an extension (static structs are okay).
-FIXME: Make this less conservative -- see
-component_ref_size in tree.cc.  */
+numbers of elements as an extension (static structs are okay).  */
  tree ref = TREE_OPERAND (TREE_OPERAND (arg, 0), 0);
  if ((TREE_CODE (ref) == PARM_DECL || VAR_P (ref))
  && (decl_binds_to_current_def_p (ref)
- || !array_at_struct_end_p (arg)))
+ || !array_ref_flexible_size_p (arg)))
{
  /* Fail if the offset is out of bounds.  Such accesses
 should be diagnosed at some point.  */
diff --git a/gcc/gimple-ssa-warn-restrict.cc b/gcc/gimple-ssa-warn-restrict.cc
index b7ed15c8902..832456ba6fc 100644
--- a/gcc/gimple-ssa-warn-restrict.cc
+++ b/gcc/gimple-ssa-warn-restrict.cc
@@ -296,8 +296,9 @@ builtin_memref::builtin_memref (pointer_query , 
gimple *stmt, tree expr,
   tree basetype = TREE_TYPE (base);
   if (TREE_CODE (basetype) == ARRAY_TYPE)
 {
-  if (ref && array_at_struct_end_p (ref))
-   ;   /* Use the maximum possible 

Re: [OG12] [committed] amdgcn: Enable SIMD vectorization of math library functions

2022-11-08 Thread Kwok Cheung Yeung

Hello

These additional patches were pushed onto the devel/omp/gcc-12 branch to 
fix various issues with the SIMD math library:


ecf1603b7ad amdgcn: Fix expansion of GCN_BUILTIN_LDEXPV builtin
6c40e3f5daa amdgcn: Various fixes for SIMD math library
8e6c5b18e10 amdgcn: Fixed intermittent failure in vectorized version of rint

Kwok


[COMMITTED] amdgcn: Fix expansion of GCN_BUILTIN_LDEXPV builtin

2022-11-08 Thread Kwok Cheung Yeung

Hello

This patch fixes a bug in the expansion of GCN_BUILTIN_LDEXPV. As this 
is a double-precision operation, the first argument should be expanded 
as a V64DF expression (instead of V64SF).


Committed to trunk as obvious.

Kwok

From cb0a2b1f28cf0c231bf38fcd02c40689739df7bb Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Tue, 8 Nov 2022 14:38:23 +
Subject: [PATCH] amdgcn: Fix expansion of GCN_BUILTIN_LDEXPV builtin

2022-11-08  Kwok Cheung Yeung  

gcc/
* config/gcn/gcn.cc (gcn_expand_builtin_1): Expand first argument
of GCN_BUILTIN_LDEXPV to V64DFmode.
---
 gcc/config/gcn/gcn.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 9c5e3419748..5e6f3b8b74b 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4383,7 +4383,7 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx 
/*subtarget */ ,
  return target;
rtx arg1 = force_reg (V64DFmode,
  expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
-  V64SFmode,
+  V64DFmode,
   EXPAND_NORMAL));
rtx arg2 = force_reg (V64SImode,
  expand_expr (CALL_EXPR_ARG (exp, 1), NULL_RTX,
-- 
2.25.1



Re: [PATCH] CCP: handle division by a power of 2 as a right shift.

2022-11-08 Thread Richard Biener via Gcc-patches
On Tue, Nov 8, 2022 at 3:25 PM Aldy Hernandez  wrote:
>
> We have some code in range-ops that sets better maybe nonzero bits for
> TRUNC_DIV_EXPR by a power of 2 than CCP does, by just shifting the
> mask.  I'd like to offload this functionality into the CCP mask
> tracking code, which already does the right thing for right shifts.
>
> The testcase for this change is gcc.dg/tree-ssa/vrp123.c and
> gcc.dg/tree-ssa/pr107541.c.
>
> Tested on x86-64 Linux.
>
> OK?

LGTM

> gcc/ChangeLog:
>
> * range-op.cc (operator_div::fold_range): Call
> update_known_bitmask.
> * tree-ssa-ccp.cc (bit_value_binop): Handle divisions by powers of
> 2 as a right shift.
> ---
>  gcc/range-op.cc | 18 +-
>  gcc/tree-ssa-ccp.cc | 12 
>  2 files changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/range-op.cc b/gcc/range-op.cc
> index 846931ddcae..8ff5d5b4c78 100644
> --- a/gcc/range-op.cc
> +++ b/gcc/range-op.cc
> @@ -1995,23 +1995,7 @@ operator_div::fold_range (irange &r, tree type,
>if (!cross_product_operator::fold_range (r, type, lh, rh, trio))
>  return false;
>
> -  if (lh.undefined_p ())
> -return true;
> -
> -  tree t;
> -  if (code == TRUNC_DIV_EXPR
> -  && rh.singleton_p (&t)
> -  && !wi::neg_p (lh.lower_bound ()))
> -{
> -  wide_int wi = wi::to_wide (t);
> -  int shift = wi::exact_log2 (wi);
> -  if (shift != -1)
> -   {
> - wide_int nz = lh.get_nonzero_bits ();
> - nz = wi::rshift (nz, shift, TYPE_SIGN (type));
> - r.set_nonzero_bits (nz);
> -   }
> -}
> +  update_known_bitmask (r, code, lh, rh);
>return true;
>  }
>
> diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
> index 3a4b6bc1118..2bcd90646f6 100644
> --- a/gcc/tree-ssa-ccp.cc
> +++ b/gcc/tree-ssa-ccp.cc
> @@ -1934,6 +1934,18 @@ bit_value_binop (enum tree_code code, signop sgn, int 
> width,
>{
> widest_int r1max = r1val | r1mask;
> widest_int r2max = r2val | r2mask;
> +   if (r2mask == 0 && !wi::neg_p (r1max))
> + {
> +   widest_int shift = wi::exact_log2 (r2val);
> +   if (shift != -1)
> + {
> +   // Handle division by a power of 2 as an rshift.
> +   bit_value_binop (RSHIFT_EXPR, sgn, width, val, mask,
> +r1type_sgn, r1type_precision, r1val, r1mask,
> +r2type_sgn, r2type_precision, shift, r2mask);
> +   return;
> + }
> + }
> if (sgn == UNSIGNED
> || (!wi::neg_p (r1max) && !wi::neg_p (r2max)))
>   {
> --
> 2.38.1
>


Re: [PATCH] [PHIOPT] Add A ? B + CST : B match and simplify optimizations

2022-11-08 Thread Richard Biener via Gcc-patches
On Sat, Nov 5, 2022 at 10:03 AM Zhongyunde via Gcc-patches
 wrote:
>
>
> > -Original Message-
> > From: Andrew Pinski [mailto:pins...@gcc.gnu.org]
> > Sent: Saturday, November 5, 2022 2:34 PM
> > To: Zhongyunde 
> > Cc: hongtao@intel.com; gcc-patches@gcc.gnu.org; Zhangwen(Esan)
> > ; Weiwei (weiwei, Compiler)
> > ; zhong_1985...@163.com
> > Subject: Re: [PATCH] [PHIOPT] Add A ? B + CST : B match and simplify
> > optimizations
> >
> > On Fri, Nov 4, 2022 at 11:17 PM Zhongyunde 
> > wrote:
> > >
> > > hi,
> > >   This patch is try to fix the issue
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107190,
> > > would you like to give me some suggestion, thanks.
> >
> > This seems like a "simplified" version of
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html
> > which just handles power of 2 constants where we know the cond will be
> > removed.
> > We could do even more "simplified" of 1 if needed really.
> > What is the IR before PHI-OPT? Is it just + 1?
>
> Thanks for your attention. It is + 4294967296 before PHI-OPT  (See detail 
> https://gcc.godbolt.org/z/6zEc6ja1z)
> So we should keep matching the power of 2 constants ?
>
> > Also your pattern can be simplified to use integer_pow2p in the match part
> > instead of INTEGER_CST.
> >
> Apply your comment, thanks

How does the patch fix the mentioned bug?  match.pd patterns should make things
"simpler" but x + (a << C') isn't simpler than a ? x + C : x.  It
looks you are targeting
PHI-OPT here, so maybe instead extend value_replacement to handle this case,
it does look similar to the case with neutral/absorbing element there?

Richard.

>
> > Thanks,
> > Andrew
>
>


Re: [PATCH 9/15] arm: Set again stack pointer as CFA reg when popping if necessary

2022-11-08 Thread Richard Earnshaw via Gcc-patches




On 26/10/2022 09:49, Andrea Corallo via Gcc-patches wrote:

Richard Earnshaw  writes:


On 27/09/2022 16:24, Kyrylo Tkachov via Gcc-patches wrote:



-Original Message-
From: Andrea Corallo 
Sent: Tuesday, September 27, 2022 11:06 AM
To: Kyrylo Tkachov 
Cc: Andrea Corallo via Gcc-patches ; Richard
Earnshaw ; nd 
Subject: Re: [PATCH 9/15] arm: Set again stack pointer as CFA reg when
popping if necessary

Kyrylo Tkachov  writes:


Hi Andrea,


-Original Message-
From: Gcc-patches  On Behalf Of Andrea
Corallo via Gcc-patches
Sent: Friday, August 12, 2022 4:34 PM
To: Andrea Corallo via Gcc-patches 
Cc: Richard Earnshaw ; nd 
Subject: [PATCH 9/15] arm: Set again stack pointer as CFA reg when

popping

if necessary

Hi all,

this patch enables 'arm_emit_multi_reg_pop' to set again the stack
pointer as CFA reg when popping if this is necessary.



  From what I can tell from similar functions this is correct, but could you

elaborate on why this change is needed for my understanding please?

Thanks,
Kyrill


Hi Kyrill,

sure, if the frame pointer was set, then it is the current CFA register.
If we request to adjust the current CFA register offset indicating it
being SP (while it's actually FP), that is indeed not correct, and the
incoherence will be detected by an assertion in the dwarf emission
machinery.

Thanks,  the patch is ok
Kyrill



Best Regards

Andrea


Hmm, wait.  Why would a multi-reg pop be updating the stack pointer?


Hi Richard,

not sure I understand, isn't any pop updating SP by definition?


Yes, but the SP must already be the CFA before this instruction, since 
SP must be the base of the pop. So the reg note changing the CFA to SP 
can't be right.  I'm thinking there must be some earlier restore of SP 
that's missing a frame-related note.


R.



BR

   Andrea


[PATCH 0/2] Add a new warning option -Wstrict-flex-array

2022-11-08 Thread Qing Zhao via Gcc-patches


This patch series includes two changes:
  1. Change the name of array_at_struct_end_p to array_ref_flexible_size_p.
  2. Add a new warning option -Wstrict-flex-arrays and at the same time
keep -Warray-bounds unaffected by -fstrict-flex-arrays.

The new warning -Wstrict-flex-arrays is implemented at the same place as 
-Warray-bounds. Since we need to keep the old behaviors of
-Warray-bounds=[1|2], we refactor the routine
"array_bounds_checker::check_array_ref" to make it work for both
-Warray-bounds and -Wstrict-flex-arrays.  

If -Warray-bounds, -Wstrict-flex-arrays, and -fstrict-flex-arrays are present
at the same time:
  A. -Warray-bounds will not be controlled by -fstrict-flex-arrays;
  B. -Wstrict-flex-arrays will be controlled by -fstrict-flex-arrays;
  C. both -Warray-bounds and -Wstrict-flex-arrays will report warnings,
 each according to its own rules.

Bootstrapped and regression tested on both x86 and aarch64 with no issues.
Okay for commit?

thanks.

Qing




[PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

2022-11-08 Thread Qing Zhao via Gcc-patches
'-Wstrict-flex-arrays'
 Warn about improper uses of flexible array members according to
 the LEVEL of the 'strict_flex_array (LEVEL)' attribute attached to
 the trailing array field of a structure if it's available,
 otherwise according to the LEVEL of the option
 '-fstrict-flex-arrays=LEVEL'.

 This option is effective only when LEVEL is bigger than 0.
 Otherwise, it will be ignored with a warning.

 when LEVEL=1, warnings will be issued for a trailing array
 reference of a structure that has 2 or more elements if the
 trailing array is referenced as a flexible array member.

 when LEVEL=2, in addition to LEVEL=1, additional warnings will be
 issued for a trailing one-element array reference of a structure if
 the array is referenced as a flexible array member.

 when LEVEL=3, in addition to LEVEL=2, additional warnings will be
 issued for a trailing zero-length array reference of a structure if
 the array is referenced as a flexible array member.

At the same time, -Warray-bounds=[1|2] warnings are kept unaffected by
 -fstrict-flex-arrays.
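
To make the levels concrete, here is a small illustrative C example (struct
and field names invented, not from the testsuite) of which trailing arrays
each level treats as flexible when they are accessed past their declared
bound:

struct s_fam  { int n; int data[];  };  /* real flexible array member: never warned */
struct s_zero { int n; int data[0]; };  /* warned when LEVEL >= 3 */
struct s_one  { int n; int data[1]; };  /* warned when LEVEL >= 2 */
struct s_four { int n; int data[4]; };  /* warned when LEVEL >= 1 */

int get (struct s_four *p)
{
  /* data[] accessed as if it were a flexible array member:
     -Wstrict-flex-arrays diagnoses this for any LEVEL > 0.  */
  return p->data[10];
}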

gcc/ChangeLog:

* attribs.cc (strict_flex_array_level_of): New function.
* attribs.h (strict_flex_array_level_of): Prototype for new function.
* doc/invoke.texi: Document -Wstrict-flex-arrays option. Update
-fstrict-flex-arrays[=n] options.
* gimple-array-bounds.cc (array_bounds_checker::check_array_ref):
Issue warnings for -Wstrict-flex-arrays.
(get_up_bounds_for_array_ref): New function.
(check_out_of_bounds_and_warn): New function.
* opts.cc (finish_options): Issue warnings for unsupported combination
of -Warray-bounds and -fstrict-flex-arrays, -Wstrict_flex_arrays and
-fstrict-flex-array.
* tree-vrp.cc (execute_vrp): Enable the pass when
warn_strict_flex_array is true.
(execute_ranger_vrp): Likewise.
* tree.cc (array_ref_flexible_size_p): Add one new argument.
(component_ref_sam_type): New function.
(component_ref_size): Add one new argument,
* tree.h (array_ref_flexible_size_p): Update prototype.
(enum struct special_array_member): Add two new enum values.
(component_ref_sam_type): New prototype.
(component_ref_size): Update prototype.

gcc/c-family/ChangeLog:

* c.opt (Wstrict-flex-arrays): New option.

gcc/c/ChangeLog:

* c-decl.cc (is_flexible_array_member_p): Call new function
strict_flex_array_level_of.

gcc/testsuite/ChangeLog:

* c-c++-common/Wstrict-flex-arrays.c: New test.
* c-c++-common/Wstrict-flex-arrays_2.c: New test.
* gcc.dg/Wstrict-flex-arrays-2.c: New test.
* gcc.dg/Wstrict-flex-arrays-3.c: New test.
* gcc.dg/Wstrict-flex-arrays-4.c: New test.
* gcc.dg/Wstrict-flex-arrays-5.c: New test.
* gcc.dg/Wstrict-flex-arrays-6.c: New test.
* gcc.dg/Wstrict-flex-arrays-7.c: New test.
* gcc.dg/Wstrict-flex-arrays-8.c: New test.
* gcc.dg/Wstrict-flex-arrays-9.c: New test.
* gcc.dg/Wstrict-flex-arrays.c: New test.
---
 gcc/attribs.cc|  30 ++
 gcc/attribs.h |   2 +
 gcc/c-family/c.opt|   5 +
 gcc/c/c-decl.cc   |  22 +-
 gcc/doc/invoke.texi   |  33 ++-
 gcc/gimple-array-bounds.cc| 264 +-
 gcc/opts.cc   |  15 +
 .../c-c++-common/Wstrict-flex-arrays.c|   9 +
 .../c-c++-common/Wstrict-flex-arrays_2.c  |   9 +
 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-2.c  |  46 +++
 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-3.c  |  46 +++
 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-4.c  |  49 
 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-5.c  |  48 
 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-6.c  |  48 
 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-7.c  |  50 
 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-8.c  |  49 
 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-9.c  |  49 
 gcc/testsuite/gcc.dg/Wstrict-flex-arrays.c|  46 +++
 gcc/tree-vrp.cc   |   6 +-
 gcc/tree.cc   | 165 ---
 gcc/tree.h|  15 +-
 21 files changed, 870 insertions(+), 136 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/Wstrict-flex-arrays.c
 create mode 100644 gcc/testsuite/c-c++-common/Wstrict-flex-arrays_2.c
 create mode 100644 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-2.c
 create mode 100644 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-3.c
 create mode 100644 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-4.c
 create mode 100644 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-5.c
 create mode 100644 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-6.c
 create mode 100644 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-7.c
 create mode 100644 

Re: [PATCH] amdgcn: Add builtins for vectorized native versions of abs, floorf and floor

2022-11-08 Thread Andrew Stubbs

On 08/11/2022 14:35, Kwok Cheung Yeung wrote:

Hello

This patch adds three extra builtins for the vectorized forms of the 
abs, floorf and floor math functions, which are implemented by native 
GCN instructions. I have also added a test to check that they generate 
the expected assembler instructions.


Okay for trunk?


OK.

Andrew


[RFC PATCH] c++: Minimal handling of carries_dependency attribute

2022-11-08 Thread Jakub Jelinek via Gcc-patches
Hi!

A comment in D2552R1:
"The only questionable (but still conforming) case we found was
[[carries_dependency(some_argument)]] on GCC, where the emitted diagnostic said 
that the
carries_dependency attribute is not supported, but did not specifically call 
out the syntax error
in the argument clause."
made me try the following patch, where we'll error out at least
for arguments to the attribute, and for some uses of the attribute
appertaining to something not mentioned in the standard we warn
with different diagnostics (or should that be an error?; clang++
does that, but I think we never do that for any attribute, standard or not).
The diagnostic on a toplevel attribute declaration is still an
attribute-ignored warning, and on an empty statement it has different wording.

The paper additionally mentions
struct X { [[nodiscard]]; }; // no diagnostic on GCC
and 2 cases of missing diagnostics on [[fallthrough]] (guess I should
file a PR about those; one problem is that do { ... } while (0); there
is replaced during genericization just by ... and another is that
[[fallthrough]] there is followed by a label, but not a user/case/default
label, rather an artificial one created from while loop genericization).

Thoughts on this?

2022-11-08  Jakub Jelinek  

* tree.cc (handle_carries_dependency_attribute): New function.
(std_attribute_table): Add carries_dependency attribute.
* parser.cc (cp_parser_check_std_attribute): Add carries_dependency
attribute.

* g++.dg/cpp0x/attr-carries_dependency1.C: New test.

--- gcc/cp/tree.cc.jj   2022-11-07 10:30:42.758629740 +0100
+++ gcc/cp/tree.cc  2022-11-08 14:45:08.853864684 +0100
@@ -4923,6 +4923,32 @@ structural_type_p (tree t, bool explain)
   return true;
 }
 
+/* Partially handle the C++11 [[carries_dependency]] attribute.
+   Just emit different diagnostics when it is used on something the
+   spec doesn't allow vs. where it is allowed and we just choose to ignore
+   it.  */
+
+static tree
+handle_carries_dependency_attribute (tree *node, tree name,
+tree ARG_UNUSED (args),
+int ARG_UNUSED (flags),
+bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL
+  && TREE_CODE (*node) != PARM_DECL)
+{
+  warning (OPT_Wattributes, "%qE attribute can only be applied to "
+  "functions or parameters", name);
+  *no_add_attrs = true;
+}
+  else
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+  return NULL_TREE;
+}
+
 /* Handle the C++17 [[nodiscard]] attribute, which is similar to the GNU
warn_unused_result attribute.  */
 
@@ -5036,6 +5062,8 @@ const struct attribute_spec std_attribut
 handle_likeliness_attribute, attr_cold_hot_exclusions },
   { "noreturn", 0, 0, true, false, false, false,
 handle_noreturn_attribute, attr_noreturn_exclusions },
+  { "carries_dependency", 0, 0, true, false, false, false,
+handle_carries_dependency_attribute, NULL },
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
 
--- gcc/cp/parser.cc.jj 2022-11-04 18:11:41.523945997 +0100
+++ gcc/cp/parser.cc2022-11-08 13:41:35.075135139 +0100
@@ -29239,8 +29239,7 @@ cp_parser_std_attribute (cp_parser *pars
 
 /* Warn if the attribute ATTRIBUTE appears more than once in the
attribute-list ATTRIBUTES.  This used to be enforced for certain
-   attributes, but the restriction was removed in P2156.  Note that
-   carries_dependency ([dcl.attr.depend]) isn't implemented yet in GCC.
+   attributes, but the restriction was removed in P2156.
LOC is the location of ATTRIBUTE.  Returns true if ATTRIBUTE was not
found in ATTRIBUTES.  */
 
@@ -29249,7 +29248,7 @@ cp_parser_check_std_attribute (location_
 {
   static auto alist = { "noreturn", "deprecated", "nodiscard", "maybe_unused",
"likely", "unlikely", "fallthrough",
-   "no_unique_address" };
+   "no_unique_address", "carries_dependency" };
   if (attributes)
 for (const auto  : alist)
   if (is_attribute_p (a, get_attribute_name (attribute))
--- gcc/testsuite/g++.dg/cpp0x/attr-carries_dependency1.C.jj2022-11-08 
15:17:43.168238390 +0100
+++ gcc/testsuite/g++.dg/cpp0x/attr-carries_dependency1.C   2022-11-08 
15:16:39.695104787 +0100
@@ -0,0 +1,17 @@
+// { dg-do compile { target c++11 } }
+
+[[carries_dependency]] int *f1 (); // { dg-warning "attribute 
ignored" }
+int f2 (int *x [[carries_dependency]]);// { dg-warning 
"attribute ignored" }
+[[carries_dependency]] int f3 ();  // { dg-warning "attribute 
ignored" }
+int f4 (int x [[carries_dependency]]); // { dg-warning "attribute 
ignored" }
+[[carries_dependency(1)]] int f5 ();   // { dg-error 
"'carries_dependency' attribute does not take any arguments" }
+[[carries_dependency]] int v;  // { dg-warning 

Re: [PATCH] Optimize VEC_PERM_EXPR with same permutation index and operation [PR98167]

2022-11-08 Thread Richard Biener via Gcc-patches
On Fri, Nov 4, 2022 at 7:44 AM Prathamesh Kulkarni via Gcc-patches
 wrote:
>
> On Fri, 4 Nov 2022 at 05:36, Hongyu Wang via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > This is a follow-up patch for PR98167
> >
> > The sequence
> >  c1 = VEC_PERM_EXPR (a, a, mask)
> >  c2 = VEC_PERM_EXPR (b, b, mask)
> >  c3 = c1 op c2
> > can be optimized to
> >  c = a op b
> >  c3 = VEC_PERM_EXPR (c, c, mask)
> > for all integer vector operation, and float operation with
> > full permutation.
> >
> > Bootstrapped & regrtested on x86_64-pc-linux-gnu.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > PR target/98167
> > * match.pd: New perm + vector op patterns for int and fp vector.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/98167
> > * gcc.target/i386/pr98167.c: New test.
> > ---
> >  gcc/match.pd| 49 +
> >  gcc/testsuite/gcc.target/i386/pr98167.c | 44 ++
> >  2 files changed, 93 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr98167.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 194ba8f5188..b85ad34f609 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -8189,3 +8189,52 @@ and,
> >   (bit_and (negate @0) integer_onep@1)
> >   (if (!TYPE_OVERFLOW_SANITIZED (type))
> >(bit_and @0 @1)))
> > +
> > +/* Optimize
> > +   c1 = VEC_PERM_EXPR (a, a, mask)
> > +   c2 = VEC_PERM_EXPR (b, b, mask)
> > +   c3 = c1 op c2
> > +   -->
> > +   c = a op b
> > +   c3 = VEC_PERM_EXPR (c, c, mask)
> > +   For all integer non-div operations.  */
> > +(for op (plus minus mult bit_and bit_ior bit_xor
> > +lshift rshift)
> > + (simplify
> > +  (op (vec_perm @0 @0 VECTOR_CST@2) (vec_perm @1 @1 VECTOR_CST@2))
> > +(if (VECTOR_INTEGER_TYPE_P (type))
> > + (vec_perm (op @0 @1) (op @0 @1) @2
> Just wondering, why should mask be CST here ?
> I guess the transform should work as long as mask is same for both
> vectors even if it's
> not constant ?

Yes, please change accordingly (and maybe push separately).

> > +
> > +/* Similar for float arithmetic when permutation constant covers
> > +   all vector elements.  */
> > +(for op (plus minus mult)
> > + (simplify
> > +  (op (vec_perm @0 @0 VECTOR_CST@2) (vec_perm @1 @1 VECTOR_CST@2))
> > +(if (VECTOR_FLOAT_TYPE_P (type))
> > + (with
> > +  {
> > +   tree perm_cst = @2;
> > +   vec_perm_builder builder;
> > +   bool full_perm_p = false;
> > +   if (tree_to_vec_perm_builder (, perm_cst))
> > + {
> > +   /* Create a vec_perm_indices for the integer vector.  */
> > +   int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> If this transform is meant only for VLS vectors, I guess you should
> bail out if TYPE_VECTOR_SUBPARTS is not constant,
> otherwise it will crash for VLA vectors.

I suppose it's difficult to create a VLA permute that covers all elements
and that is not trivial though.  But indeed add ().is_constant to the
VECTOR_FLOAT_TYPE_P guard.

>
> Thanks,
> Prathamesh
> > +   vec_perm_indices sel (builder, 1, nelts);
> > +
> > +   /* Check if perm indices covers all vector elements.  */
> > +   int count = 0, i, j;
> > +   for (i = 0; i < nelts; i++)
> > + for (j = 0; j < nelts; j++)

Meh, that's quadratic!  I suggest to check .encoding ().encoded_full_vector_p ()
(as said I can't think of a non-full encoding that isn't trivial
but covers all elements) and then simply .qsort () the vector_builder
(it derives
from vec<>) so the scan is O(n log n).

Maybe Richard has a better idea here though.
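
A rough sketch of that sort-then-scan check in plain C (illustrative only;
GCC's own vec/qsort API differs):

#include <stdlib.h>

static int cmp_int (const void *a, const void *b)
{
  return *(const int *) a - *(const int *) b;
}

/* Return nonzero iff the permute indices SEL[0..NELTS-1] select every
   lane, i.e. sorted they are exactly 0, 1, ..., NELTS-1.  SEL is
   modified by the sort.  */
static int perm_covers_all_elements_p (int *sel, int nelts)
{
  qsort (sel, nelts, sizeof (int), cmp_int);
  for (int i = 0; i < nelts; i++)
    if (sel[i] != i)
      return 0;
  return 1;
}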

Otherwise looks OK, though with these kinds of (* (op ..) (op ..)) patterns it's
always the case that they explode the match decision tree; we'd ideally have a way to
match those with (op ..) (op ..) first to be able to share more of the matching
code.  That said, match.pd is a less than ideal place for these (but mostly
because of the way we code generate *-match.cc).

Richard.

> > +   {
> > + if (sel[j].to_constant () == i)
> > +   {
> > + count++;
> > + break;
> > +   }
> > +   }
> > +   full_perm_p = count == nelts;
> > + }
> > +   }
> > +   (if (full_perm_p)
> > +   (vec_perm (op @0 @1) (op @0 @1) @2))
> > diff --git a/gcc/testsuite/gcc.target/i386/pr98167.c 
> > b/gcc/testsuite/gcc.target/i386/pr98167.c
> > new file mode 100644
> > index 000..40e0ac11332
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr98167.c
> > @@ -0,0 +1,44 @@
> > +/* PR target/98167 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx2" } */
> > +
> > +/* { dg-final { scan-assembler-times "vpshufd\t" 8 } } */
> > +/* { dg-final { scan-assembler-times "vpermilps\t" 3 } } */
> > +
> > +#define VEC_PERM_4 \
> > +  2, 3, 1, 0
> > +#define VEC_PERM_8 \
> > +  4, 5, 6, 7, 3, 2, 1, 0
> 

[PATCH] amdgcn: Add builtins for vectorized native versions of abs, floorf and floor

2022-11-08 Thread Kwok Cheung Yeung

Hello

This patch adds three extra builtins for the vectorized forms of the 
abs, floorf and floor math functions, which are implemented by native 
GCN instructions. I have also added a test to check that they generate 
the expected assembler instructions.


Okay for trunk?

Thanks

Kwok

From 37f49b204d501327d0867b3e8a3f01b9445fb9bd Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Tue, 8 Nov 2022 11:59:58 +
Subject: [PATCH] amdgcn: Add builtins for vectorized native versions of abs,
 floorf and floor

2022-11-08  Kwok Cheung Yeung  

gcc/
* config/gcn/gcn-builtins.def (FABSV, FLOORVF, FLOORV): New builtins.
* config/gcn/gcn.cc (gcn_expand_builtin_1): Expand GCN_BUILTIN_FABSV,
GCN_BUILTIN_FLOORVF and GCN_BUILTIN_FLOORV.

gcc/testsuite/
* gcc.target/gcn/math-builtins-1.c: New test.
---
 gcc/config/gcn/gcn-builtins.def   | 15 +
 gcc/config/gcn/gcn.cc | 33 +++
 .../gcc.target/gcn/math-builtins-1.c  | 33 +++
 3 files changed, 81 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/gcn/math-builtins-1.c

diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index 27691909925..c50777bd3b0 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -64,6 +64,21 @@ DEF_BUILTIN (FABSVF, 3 /*CODE_FOR_fabsvf */,
 _A2 (GCN_BTI_V64SF, GCN_BTI_V64SF),
 gcn_expand_builtin_1)
 
+DEF_BUILTIN (FABSV, 3 /*CODE_FOR_fabsv */,
+"fabsv", B_INSN,
+_A2 (GCN_BTI_V64DF, GCN_BTI_V64DF),
+gcn_expand_builtin_1)
+
+DEF_BUILTIN (FLOORVF, 3 /*CODE_FOR_floorvf */,
+"floorvf", B_INSN,
+_A2 (GCN_BTI_V64SF, GCN_BTI_V64SF),
+gcn_expand_builtin_1)
+
+DEF_BUILTIN (FLOORV, 3 /*CODE_FOR_floorv */,
+"floorv", B_INSN,
+_A2 (GCN_BTI_V64DF, GCN_BTI_V64DF),
+gcn_expand_builtin_1)
+
 DEF_BUILTIN (LDEXPVF, 3 /*CODE_FOR_ldexpvf */,
 "ldexpvf", B_INSN,
 _A3 (GCN_BTI_V64SF, GCN_BTI_V64SF, GCN_BTI_V64SI),
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 1996115a686..9c5e3419748 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4329,6 +4329,39 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx 
/*subtarget */ ,
emit_insn (gen_absv64sf2 (target, arg));
return target;
   }
+case GCN_BUILTIN_FABSV:
+  {
+   if (ignore)
+ return target;
+   rtx arg = force_reg (V64DFmode,
+expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
+ V64DFmode,
+ EXPAND_NORMAL));
+   emit_insn (gen_absv64df2 (target, arg));
+   return target;
+  }
+case GCN_BUILTIN_FLOORVF:
+  {
+   if (ignore)
+ return target;
+   rtx arg = force_reg (V64SFmode,
+expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
+ V64SFmode,
+ EXPAND_NORMAL));
+   emit_insn (gen_floorv64sf2 (target, arg));
+   return target;
+  }
+case GCN_BUILTIN_FLOORV:
+  {
+   if (ignore)
+ return target;
+   rtx arg = force_reg (V64DFmode,
+expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
+ V64DFmode,
+ EXPAND_NORMAL));
+   emit_insn (gen_floorv64df2 (target, arg));
+   return target;
+  }
 case GCN_BUILTIN_LDEXPVF:
   {
if (ignore)
diff --git a/gcc/testsuite/gcc.target/gcn/math-builtins-1.c 
b/gcc/testsuite/gcc.target/gcn/math-builtins-1.c
new file mode 100644
index 000..e1aadfb40d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/math-builtins-1.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+typedef float v64sf __attribute__ ((vector_size (256)));
+typedef double v64df __attribute__ ((vector_size (512)));
+typedef int v64si __attribute__ ((vector_size (256)));
+typedef long v64di __attribute__ ((vector_size (512)));
+
+v64sf f (v64sf _x, v64si _y)
+{
+  v64sf x = _x;
+  v64si y = _y;
+  x = __builtin_gcn_fabsvf (x); /* { dg-final { scan-assembler 
"v_add_f32\\s+v\[0-9\]+, 0, |v\[0-9\]+|" } } */
+  x = __builtin_gcn_floorvf (x); /* { dg-final { scan-assembler 
"v_floor_f32\\s+v\[0-9\]+, v\[0-9\]+" } }*/
+  x = __builtin_gcn_frexpvf_mant (x); /* { dg-final { scan-assembler 
"v_frexp_mant_f32\\s+v\[0-9\]+, v\[0-9\]+" } }*/
+  y = __builtin_gcn_frexpvf_exp (x); /* { dg-final { scan-assembler 
"v_frexp_exp_i32_f32\\s+v\[0-9\]+, v\[0-9\]+" } }*/
+  x = __builtin_gcn_ldexpvf (x, y); /* { dg-final { scan-assembler 
"v_ldexp_f32\\s+v\[0-9\]+, v\[0-9\]+, v\[0-9\]+" } }*/
+
+  return x;
+}
+
+v64df g (v64df _x, v64si _y)
+{
+  v64df x = _x;
+  v64si y = _y;
+  x = 

[PATCH] CCP: handle division by a power of 2 as a right shift.

2022-11-08 Thread Aldy Hernandez via Gcc-patches
We have some code in range-ops that sets better maybe nonzero bits for
TRUNC_DIV_EXPR by a power of 2 than CCP does, by just shifting the
mask.  I'd like to offload this functionality into the CCP mask
tracking code, which already does the right thing for right shifts.
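
The intuition, as a standalone C sketch (not GCC code; names invented): for a
value known to be non-negative, dividing by 2**k only moves bits right by k,
so the maybe-nonzero-bits mask can be shifted the same way.

/* E.g. a value whose maybe-nonzero bits are 0xff, divided by 8, can only
   have bits within 0x1f.  Assumes DIVISOR is a power of two.  */
unsigned shift_nonzero_mask (unsigned nonzero_bits, unsigned divisor)
{
  int k = __builtin_ctz (divisor);
  return nonzero_bits >> k;
}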

The testcase for this change is gcc.dg/tree-ssa/vrp123.c and
gcc.dg/tree-ssa/pr107541.c.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

* range-op.cc (operator_div::fold_range): Call
update_known_bitmask.
* tree-ssa-ccp.cc (bit_value_binop): Handle divisions by powers of
2 as a right shift.
---
 gcc/range-op.cc | 18 +-
 gcc/tree-ssa-ccp.cc | 12 
 2 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 846931ddcae..8ff5d5b4c78 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1995,23 +1995,7 @@ operator_div::fold_range (irange &r, tree type,
   if (!cross_product_operator::fold_range (r, type, lh, rh, trio))
 return false;
 
-  if (lh.undefined_p ())
-return true;
-
-  tree t;
-  if (code == TRUNC_DIV_EXPR
-  && rh.singleton_p (&t)
-  && !wi::neg_p (lh.lower_bound ()))
-{
-  wide_int wi = wi::to_wide (t);
-  int shift = wi::exact_log2 (wi);
-  if (shift != -1)
-   {
- wide_int nz = lh.get_nonzero_bits ();
- nz = wi::rshift (nz, shift, TYPE_SIGN (type));
- r.set_nonzero_bits (nz);
-   }
-}
+  update_known_bitmask (r, code, lh, rh);
   return true;
 }
 
diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index 3a4b6bc1118..2bcd90646f6 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -1934,6 +1934,18 @@ bit_value_binop (enum tree_code code, signop sgn, int 
width,
   {
widest_int r1max = r1val | r1mask;
widest_int r2max = r2val | r2mask;
+   if (r2mask == 0 && !wi::neg_p (r1max))
+ {
+   widest_int shift = wi::exact_log2 (r2val);
+   if (shift != -1)
+ {
+   // Handle division by a power of 2 as an rshift.
+   bit_value_binop (RSHIFT_EXPR, sgn, width, val, mask,
+r1type_sgn, r1type_precision, r1val, r1mask,
+r2type_sgn, r2type_precision, shift, r2mask);
+   return;
+ }
+ }
if (sgn == UNSIGNED
|| (!wi::neg_p (r1max) && !wi::neg_p (r2max)))
  {
-- 
2.38.1



Re: [PATCH] diagnostics: Allow FEs to keep customizations for middle end [PR101551, PR106274]

2022-11-08 Thread Richard Biener via Gcc-patches
On Thu, Nov 3, 2022 at 9:07 PM Lewis Hyatt  wrote:
>
> On Fri, Oct 28, 2022 at 10:28:21AM +0200, Richard Biener wrote:
> > Yes, the idea was also to free up memory but then that part never
> > really materialized - the idea was to always run free-lang-data, not
> > just when later outputting LTO bytecode.  The reason is probably
> > mainly the diagnostic regressions you observe.
> >
> > Maybe a better strathegy than your patch would be to work towards
> > that goal but reduce the number of "freeings", instead adjusting the
> > LTO streamer to properly ignore frontend specific bits where clearing
> > conflicts with the intent to preserve accurate diagnostics throughout
> > the compilation.
> >
> > If you see bits that when not freed would fix some of the observed
> > issues we can see to replicate the freeing in the LTO output machinery.
> >
> > Richard.
>
> Thanks again for the suggestions. I took a look and it seems pretty doable to
> just stop resetting all the diagnostics hooks in free-lang-data. Once that's
> done, the only problematic part that I have been able to identify is here in
> ipa-free-lang-data.c around line 674:
>
> 
>   /* We need to keep field decls associated with their trees. Otherwise tree
>  merging may merge some fields and keep others disjoint which in turn will
>  not do well with TREE_CHAIN pointers linking them.
>
>  Also do not drop containing types for virtual methods and tables because
>  these are needed by devirtualization.
>  C++ destructors are special because C++ frontends sometimes produces
>  virtual destructor as an alias of non-virtual destructor.  In
>  devirutalization code we always walk through aliases and we need
>  context to be preserved too.  See PR89335  */
>   if (TREE_CODE (decl) != FIELD_DECL
>   && ((TREE_CODE (decl) != VAR_DECL && TREE_CODE (decl) != FUNCTION_DECL)
>   || (!DECL_VIRTUAL_P (decl)
>   && (TREE_CODE (decl) != FUNCTION_DECL
>   || !DECL_CXX_DESTRUCTOR_P (decl)
> DECL_CONTEXT (decl) = fld_decl_context (DECL_CONTEXT (decl));
> 
>
> The C++ implementations of the decl_printable_name langhook and the diagnostic
> starter callback do not work as-is when the DECL_CONTEXT for class member
> functions disappears.  So I did have a patch that changes the C++
> implementations to work in this case, but attached here is a new one along the
> lines of what you suggested, rather changing the above part of free-lang-data
> so it doesn't activate as often. The patch is pretty complete (other than
> missing a commit message) and bootstrap + regtest all languages looks good
> with no regressions. I tried the same with BUILD_CONFIG=bootstrap-lto as well,
> and that also looked good when it eventually finished. I added testcases for
> several frontends to verify that the diagnostics still work with -flto. I am
> not sure what are the implications for LTO itself, of changing this part of
> the pass, so I would have to ask you to weigh in on that aspect please. 
> Thanks!

First of all sorry for the delay and thanks for trying.  The effect on LTO is an
increase in the amount of streamed IL since we follow the DECL_CONTEXT
edge when streaming the tree graph.  So my solution for this would be to
reflect the case you remove in free-lang-data in both
lto-streamer-out.cc:DFS::DFS_write_tree_body where we do

  if (TREE_CODE (expr) != TRANSLATION_UNIT_DECL
  && ! DECL_CONTEXT (expr))
DFS_follow_tree_edge ((*all_translation_units)[0]);
  else
DFS_follow_tree_edge (DECL_CONTEXT (expr));

and in tree-streamer-out.cc:write_ts_decl_minimal_tree_pointers which
does

  if (TREE_CODE (expr) != TRANSLATION_UNIT_DECL
  && ! DECL_CONTEXT (expr))
stream_write_tree_ref (ob, (*all_translation_units)[0]);
  else
stream_write_tree_ref (ob, DECL_CONTEXT (expr));

that possibly boils down to "just" doing

   tree ctx = DECL_CONTEXT (..);
   if (TREE_CODE (..) == VAR_DECL || TREE_CODE (..) == FUNCTION_DECL)
 ctx = fld_decl_context (ctx);

and using 'ctx' for DECL_CONTEXT in those two places (and exporting the
fld_decl_context function).
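
Put together, the suggestion might look roughly like this at one of the two
streaming sites (an untested sketch reusing the names from the quoted code
above, not a finished patch):

  tree ctx = DECL_CONTEXT (expr);
  if (VAR_P (expr) || TREE_CODE (expr) == FUNCTION_DECL)
    ctx = fld_decl_context (ctx);  /* drop frontend-only context as fld does */
  if (TREE_CODE (expr) != TRANSLATION_UNIT_DECL && !ctx)
    DFS_follow_tree_edge ((*all_translation_units)[0]);
  else
    DFS_follow_tree_edge (ctx);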

As said the idea for this is that we want to avoid streaming type trees when
not necessary.  When doing an LTO bootstrap with your patch you should
see (slightly) larger object files.

Richard.

>
> -Lewis


  1   2   >