Re: [SVE] PR96463 - Optimise svld1rq from vectors

2021-12-14 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 7 Dec 2021 at 19:08, Richard Sandiford wrote:
>
> Prathamesh Kulkarni  writes:
> > On Thu, 2 Dec 2021 at 23:11, Richard Sandiford wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> > Hi Richard,
> >> > I have attached a WIP untested patch for PR96463.
> >> > IIUC, the PR suggests transforming
> >> > lhs = svld1rq ({-1, -1, ...}, &v[0])
> >> > into:
> >> > lhs = vec_perm_expr
> >> > if v is a vector of 4 elements, each 32 bits wide, on a little-endian
> >> > target?
> >> >
> >> > I am sorry if this sounds like a silly question, but I am not sure how
> >> > to convert a vector of type int32x4_t into svint32_t. In the patch, I
> >> > simply used NOP_EXPR (which I expected to fail), and it gave a type error
> >> > during GIMPLE verification:
> >>
> >> It should be possible in principle to have a VEC_PERM_EXPR in which
> >> the operands are Advanced SIMD vectors and the result is an SVE vector.
> >>
> >> E.g., the dup in the PR would be something like this:
> >>
> >> foo (int32x4_t a)
> >> {
> >>   svint32_t _2;
> >>
> >>   _2 = VEC_PERM_EXPR ;
> >>   return _2;
> >> }
> >>
> >> where the final operand can be built using:
> >>
> >>   int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
> >>   vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
> >>   for (int i = 0; i < source_nelts; ++i)
> >>     sel.quick_push (i);
> >>
> >> I'm not sure how well-tested that combination is though.  It might need
> >> changes to target-independent code.
> > Hi Richard,
> > Thanks for the suggestions.
> > I tried the above approach in the attached patch, but it still results in
> > an ICE due to a type mismatch:
> >
> > pr96463.c: In function ‘foo’:
> > pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’
> > 8 | }
> >   | ^
> > svint32_t
> > int32x4_t
> > int32x4_t
> > svint32_t
> > _3 = VEC_PERM_EXPR ;
> > during GIMPLE pass: ccp
> > dump file: pr96463.c.032t.ccp1
> > pr96463.c:8:1: internal compiler error: verify_gimple failed
> >
> > Should we perhaps add another tree code that "extends" a fixed-width
> > vector into its VLA equivalent?
>
> No, I think this is just an extreme example of the combination not being
> well-tested. :-)  Obviously it's worse than I thought.
>
> I think accepting this kind of VEC_PERM_EXPR is still the way to go.
> Richi, WDYT?
Hi Richi, ping?

Thanks,
Prathamesh
>
> Thanks,
> Richard


Re: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL

2021-12-14 Thread Xionghu Luo via Gcc-patches



On 2021/12/13 18:24, Jan Hubicka wrote:
>>> gcc/ChangeLog:
>>>
>>> * loop-invariant.c (find_invariants_bb): Check profile count
>>> before motion.
>>> (find_invariants_body): Add argument.
>>> ---
>>>  gcc/loop-invariant.c | 10 +++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>>> index 5eee2e5c9f8..c61c8612fae 100644
>>> --- a/gcc/loop-invariant.c
>>> +++ b/gcc/loop-invariant.c
>>> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
>>>     call.  */
>>>  
>>>  static void
>>> -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
>>> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
>>> +                    bool always_executed)
>>>  {
>>>    rtx_insn *insn;
>>> +  basic_block preheader = loop_preheader_edge (loop)->src;
>>> +
>>> +  if (preheader->count > bb->count)
>>> +    return;
>>
>> Please add a comment explaining the conditional and, if possible, also a
>> testcase.  Since profile updating and use is a sensitive topic and may
>> trigger regressions later, it is important to keep track of why a given
>> test was added.
 
OK. Would a comment like this work?

/* Don't move insns of a cold BB out of the loop into the preheader, to
   avoid extra computation and longer register live ranges in a hot loop
   containing a cold BB.  */


Some dump output may also help with tracking in xxx.c.271r.loop2_invariant:

--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
   basic_block preheader = loop_preheader_edge (loop)->src;
 
   if (preheader->count > bb->count)
-    return;
+    {
+      if (dump_file)
+        fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n",
+                 bb->index, loop->num);
+      return;
+    }


This test case shows the patch's effect:


gcc/testsuite/gcc.dg/loop-invariant-2.c
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */

volatile int x;
void
bar (int, char *, char *);
void
foo (int *a, int n, int k)
{
  int i;

  for (i = 0; i < n; i++)
    {
      if (__builtin_expect (x, 0))
        bar (k / 5, "one", "two");
      a[i] = k;
    }
}

/* { dg-final { scan-rtl-dump-not "Decided to move invariant" "loop2_invariant" } } */


Without the patch, insns 27, 28 and 29 are hoisted out of the loop; with the
count check, they are kept in the loop body.

 diff -U15 base/ssa-lim-23.c.271r.loop2_invariant patched/ssa-lim-23.c.271r.loop2_invariant

 *ending processing of loop 1 **
-Set in insn 27 is invariant (0), cost 16, depends on
-Set in insn 28 is invariant (1), cost 16, depends on
-Set in insn 29 is invariant (2), cost 8, depends on
-Set in insn 30 is invariant (3), cost 0, depends on 0
-Set in insn 31 is invariant (4), cost 0, depends on 1
-Set in insn 32 is invariant (5), cost 0, depends on 2
-Decided to move invariant 0 -- gain 16
-Decided to move invariant 1 -- gain 16
-Decided to move invariant 2 -- gain 8
-deferring rescan insn with uid = 27.
-deferring rescan insn with uid = 30.
-deferring rescan insn with uid = 61.
-changing bb of uid 27
-  from 5 to 3
-deferring rescan insn with uid = 28.
-deferring rescan insn with uid = 31.
-deferring rescan insn with uid = 62.
-changing bb of uid 28
-  from 5 to 3
-deferring rescan insn with uid = 29.
-deferring rescan insn with uid = 32.
-deferring rescan insn with uid = 63.
-changing bb of uid 29
-  from 5 to 3
 starting the processing of deferred insns
-rescanning insn with uid = 27.
-rescanning insn with uid = 28.
-rescanning insn with uid = 29.
-rescanning insn with uid = 30.
-rescanning insn with uid = 31.
-rescanning insn with uid = 32.
-rescanning insn with uid = 61.
-rescanning insn with uid = 62.
-rescanning insn with uid = 63.
 ending the processing of deferred insns
 starting the processing of deferred insns
 ending the processing of deferred insns

...

55: r138:DI=unspec[`*.LANCHOR0',%2:DI] 47
   REG_EQUAL `*.LANCHOR0'
-   27: r139:DI=unspec[`*.LC0',%2:DI] 47
-   28: r140:DI=unspec[`*.LC1',%2:DI] 47
-   29: r141:DI=sign_extend(r118:SI)
39: L39:
21: NOTE_INSN_BASIC_BLOCK 4
23: r117:SI=[r138:DI]
24: r133:CC=cmp(r117:SI,0)
   REG_DEAD r117:SI
25: pc={(r133:CC==0)?L34:pc}
   REG_DEAD r133:CC
   REG_BR_PROB 966367644
26: NOTE_INSN_BASIC_BLOCK 5
-   61: r134:DI=r139:DI
-   62: r135:DI=r140:DI
-   63: r136:DI=r141:DI
-   30: %5:DI=r139:DI
+   27: r134:DI=unspec[`*.LC0',%2:DI] 47
+  REG_EQUAL `*.LC0'
+   28: r135:DI=unspec[`*.LC1',%2:DI] 47
+  REG_EQUAL `*.LC1'
+   29: r136:DI=sign_extend(r118:SI)
+   30: %5:DI=r134:DI
   REG_DEAD r134:DI
   REG_EQUAL `*.LC0'
-   31: %4:DI=r140:DI
+   31: %4:DI=r135:DI
   REG_DEAD r135:DI
   REG_EQUAL `*.LC1'
-   32: %3:DI=r141:DI
+   32: %3:DI=r136:DI
   REG_DEAD r136:DI
33: call [`bar'] argc:0
   REG_DEAD %5:DI
   REG_DEAD %4:DI
   R

Re: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]

2021-12-14 Thread Xionghu Luo via Gcc-patches



On 2021/12/13 17:25, Jan Hubicka wrote:
>> r12-4526 cancelled jump-thread paths that rotate loops. This exposes an
>> issue in profile estimation: in predict_extra_loop_exits, an outer loop's
>> exit edge is marked as an extra exit of the inner loop and given an
>> incorrect prediction, so a hot inner loop eventually becomes a cold loop
>> after later optimizations. This patch adds a loop check when searching for
>> extra exit edges to avoid unexpected predict_edge calls from
>> predict_paths_for_bb.
>>
>> Regression tested on P8LE, OK for master?
>>
>> gcc/ChangeLog:
>>
>>  PR middle-end/103270
>>  * predict.c (predict_extra_loop_exits): Add loop parameter.
>>  (predict_loops): Call with loop argument.
> 
> With changes to branch predictors it is useful to re-test their
> effectiveness on SPEC and see whether their hit rates still match
> reality.  You can do this by building SPEC with -fprofile-generate,
> training it, then building with -fprofile-use -fdump-tree-ipa-profile-details
> and using contrib/analyze_brprob.py, which collects information on how
> they work.
> 
> This patch looks good to me, but it would be nice to have things
> reality-checked (and since we have not done the stats for some time, there
> may be surprises), so if you could run SPEC and post the results of
> analyze_brprob, it would be great.  I will also try to get to that soon,
> but currently I am a bit swamped by other problems I noticed on clang
> builds.
> 
> Thanks a lot for working on profile fixes - I am now trying to get
> things into shape.  With Martin we added basic testing infrastructure
> for keeping track of profile updates and I am trying to see how it works
> in practice now.  Hopefully it will make it easier to judge profile
> updating patches. I would welcome a list of patches I should look at.
> 
> I will write separate mail on this.
> Honza


With the patch, analyze_brprob.py produces the data below for a PGO build.
There is no verification code in the script, so how do I check whether the
result is correct?  Should I run it again without the patch and compare the
"extra loop exit" field?


./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
HEURISTICS                         BRANCHES  (REL)  BR. HITRATE          HITRATE     COVERAGE  COVERAGE  (REL)  predict.def (REL)  HOT branches (>10%)
noreturn call                             1   0.0%      100.00%  50.00% / 50.00%            2      2.00   0.0%                     100%:1
Fortran zero-sized array                  3   0.0%       66.67%  41.71% / 60.50%          362    362.00   0.0%                     100%:3
loop iv compare                          16   0.0%       93.75%  98.26% / 98.76%       279847   279.85k   0.0%                     93%:4
__builtin_expect                         35   0.0%       97.14%  78.09% / 78.35%     17079558    17.08M   0.0%
loop guard with recursion                45   0.1%       86.67%  85.13% / 85.14%   6722424412     6.72G   1.3%                     74%:4
extra loop exit                          80   0.1%       58.75%  81.49% / 89.21%    438470261   438.47M   0.1%                     86%:3
guess loop iv compare                   235   0.3%       80.85%  52.83% / 73.97%    148558247   148.56M   0.0%                     47%:3
negative return                         241   0.3%       71.37%  25.33% / 92.61%    250402383   250.40M   0.0%                     69%:2
loop exit with recursion                315   0.4%       74.60%  85.07% / 85.71%   9403136858     9.40G   1.8%                     59%:4
const return                            320   0.4%       51.88%  90.45% / 95.63%    925341727   925.34M   0.2%                     76%:5
indirect call                           377   0.5%       51.46%  84.72% / 91.14%   2133772848     2.13G   0.4%                     69%:1
polymorphic call                        410   0.5%       44.15%  31.26% / 79.37%   3272688244     3.27G   0.6%                     53%:2
recursive call                          506   0.7%       39.53%  44.97% / 83.92%   1211036806     1.21G   0.2%                     10%:1
goto                                    618   0.8%       64.24%  65.37% / 83.57%    702446178   702.45M   0.1%                     20%:1
null return                             800   1.1%       64.62%  56.59% / 77.70%    603952067   603.95M   0.1%                     28%:2
continue                                956   1.3%       63.70%  65.65% / 79.97%   3780303799     3.78G   0.7%                     52%:3
loop guard                             1177   1.6%       56.33%  42.54% / 80.32%   7373601457     7.37G   1.4%                     50%:2
opcode values positive (on trees)      2020   2.7%       62.38%  64.16% / 84.44%  31695571761    31.70G   6.0%                     21%:2
loop exit                              3293   4.4%       76.19%  87.18% / 88

RE: GCC 11 backport does not build (no "directly_supported_p") - was: Re: pr103523: Check for PLUS/MINUS support

2021-12-14 Thread Joel Hutton via Gcc-patches
Bootstrapped and regression tested on releases/gcc-11 on aarch64.

Ok for 11?

The previous commit broke the build because it relied on directly_supported_p,
which is not in GCC 11. This reworks the patch to avoid using directly_supported_p.

gcc/ChangeLog:

  PR bootstrap/103688
  * tree-vect-loop.c (vectorizable_induction): Rework to avoid
directly_supported_p

From: Joel Hutton
Sent: 13 December 2021 15:02
To: Richard Biener; gcc-patches@gcc.gnu.org; Tobias Burnus; Richard Sandiford
Cc: Richard Biener
Subject: Re: GCC 11 backport does not build (no "directly_supported_p") - was: Re: pr103523: Check for PLUS/MINUS support

My mistake; here is a reworked patch. Tests are still running.

From: Richard Biener <richard.guent...@gmail.com>
Sent: 13 December 2021 14:47
To: gcc-patches@gcc.gnu.org; Tobias Burnus <tob...@codesourcery.com>; Joel Hutton <joel.hut...@arm.com>; Richard Sandiford <richard.sandif...@arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Richard Biener <rguent...@suse.de>
Subject: Re: GCC 11 backport does not build (no "directly_supported_p") - was: Re: pr103523: Check for PLUS/MINUS support

On December 13, 2021 3:27:50 PM GMT+01:00, Tobias Burnus <tob...@codesourcery.com> wrote:
>Hi Joel,
>
>your patch fails here with:
>
>../../repos/gcc-11-commit/gcc/tree-vect-loop.c:8000:8: error:
>'directly_supported_p' was not declared in this scope
>  8000 |   if (!directly_supported_p (PLUS_EXPR, step_vectype)
>   |^~~~
>
>And "git grep" shows that this is only present in:
>
>gcc/tree-vect-loop.c:  if (!directly_supported_p (PLUS_EXPR, step_vectype)
>gcc/tree-vect-loop.c:  || !directly_supported_p (MINUS_EXPR,
>step_vectype))
>
>That's different on mainline, which offers that function.

Just as a reminder, backports need regular bootstrap and regtest validation on 
the respective branches.

Richard.

>Tobias
>
>On 10.12.21 14:24, Joel Hutton via Gcc-patches wrote:
>> ok for backport to 11?
>> 
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: 10 December 2021 10:22
>> To: Joel Hutton <joel.hut...@arm.com>
>> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Richard Biener <rguent...@suse.de>
>> Subject: Re: pr103523: Check for PLUS/MINUS support
>>
>> Joel Hutton <joel.hut...@arm.com> writes:
>>> Hi all,
>>>
>>> This is to address pr103523.
>>>
>>> bootstrapped and regression tested on aarch64.
>>>
>>> Check for PLUS_EXPR/MINUS_EXPR support in vectorizable_induction.
>>> PR103523 is an ICE on valid code:
>>>
>>> void d(float *a, float b, int c) {
>>>   float e;
>>>   for (; c; c--, e += b)
>>>     a[c] = e;
>>> }
>>>
>>> This is due to not checking for PLUS_EXPR support, which is missing in
>>> VNx2SF mode. This causes an ICE at expand time. This patch adds a check
>>> for support in vectorizable_induction.
>>>
>>> gcc/ChangeLog:
>>>
>>>  PR tree-optimization/PR103523
>> The bugzilla hook expects: PR tree-optimization/103523
>>
>>>  * tree-vect-loop.c (vectorizable_induction): Check for
>>>  PLUS_EXPR/MINUS_EXPR support.
>> OK, thanks.
>>
>> Richard
>-
>Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
>München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
>Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
>München, HRB 106955


0001-vect-loop-fix-build.patch
Description: 0001-vect-loop-fix-build.patch


[PATCH]AArch64 Fix the AAPCs for new partial and full SIMD structure types [PR103094]

2021-12-14 Thread Tamar Christina via Gcc-patches
Hi All,

The new partial and full vector structure types added to AArch64, e.g.
int8x8x2_t with mode V2x8QI, are incorrectly classified as short vectors
rather than as composite types.

This causes the layout code to incorrectly conclude that the registers are
packed, i.e. for V2x8QI it thinks those 16 bytes are in the same register.

Because of this, the code under !aarch64_composite_type_p is unreachable, and
it also lacked any extra checks that nregs is what we expect it to be.

I have also updated aarch64_advsimd_full_struct_mode_p and
aarch64_advsimd_partial_struct_mode_p to only consider vector modes as struct
modes.  Otherwise, modes such as OImode and friends would qualify, leading to
incorrect results.

This patch fixes up the issues and we now generate correct code.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar



gcc/ChangeLog:

PR target/103094
* config/aarch64/aarch64.c (aarch64_function_value, aarch64_layout_arg):
Fix unreachable code for partial vectors and re-order switch to perform
the simplest test first.
(aarch64_short_vector_p): Mark as not short vectors.
(aarch64_composite_type_p): Mark as composite types.
(aarch64_advsimd_partial_struct_mode_p,
aarch64_advsimd_full_struct_mode_p): Restrict to actual SIMD types.

gcc/testsuite/ChangeLog:

PR target/103094
* gcc.target/aarch64/pr103094.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fdf05505846721b02059df494d6395ae9423a8ef..d9104ddac3cdd44f7c2290b8725d05be4fd6468f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3055,15 +3055,17 @@ aarch64_advsimd_struct_mode_p (machine_mode mode)
 static bool
 aarch64_advsimd_partial_struct_mode_p (machine_mode mode)
 {
-  return (aarch64_classify_vector_mode (mode)
-          == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
+  return VECTOR_MODE_P (mode)
+         && (aarch64_classify_vector_mode (mode)
+             == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
 }
 
 /* Return true if MODE is an Advanced SIMD Q-register structure mode.  */
 static bool
 aarch64_advsimd_full_struct_mode_p (machine_mode mode)
 {
-  return (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_STRUCT));
+  return VECTOR_MODE_P (mode)
+         && (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_STRUCT));
 }
 
 /* Return true if MODE is any of the data vector modes, including
@@ -6468,17 +6470,21 @@ aarch64_function_value (const_tree type, const_tree func,
   NULL, false))
     {
       gcc_assert (!sve_p);
-      if (!aarch64_composite_type_p (type, mode))
+      if (aarch64_advsimd_full_struct_mode_p (mode))
+        {
+          gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 16), count));
+          return gen_rtx_REG (mode, V0_REGNUM);
+        }
+      else if (aarch64_advsimd_partial_struct_mode_p (mode))
+        {
+          gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 8), count));
+          return gen_rtx_REG (mode, V0_REGNUM);
+        }
+      else if (!aarch64_composite_type_p (type, mode))
         {
           gcc_assert (count == 1 && mode == ag_mode);
           return gen_rtx_REG (mode, V0_REGNUM);
         }
-      else if (aarch64_advsimd_full_struct_mode_p (mode)
-               && known_eq (GET_MODE_SIZE (ag_mode), 16))
-        return gen_rtx_REG (mode, V0_REGNUM);
-      else if (aarch64_advsimd_partial_struct_mode_p (mode)
-               && known_eq (GET_MODE_SIZE (ag_mode), 8))
-        return gen_rtx_REG (mode, V0_REGNUM);
       else
         {
           int i;
@@ -6745,6 +6751,7 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
 /* No frontends can create types with variable-sized modes, so we
shouldn't be asked to pass or return them.  */
 size = GET_MODE_SIZE (mode).to_constant ();
+
   size = ROUND_UP (size, UNITS_PER_WORD);
 
   allocate_ncrn = (type) ? !(FLOAT_TYPE_P (type)) : !FLOAT_MODE_P (mode);
@@ -6769,17 +6776,21 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
       if (nvrn + nregs <= NUM_FP_ARG_REGS)
         {
           pcum->aapcs_nextnvrn = nvrn + nregs;
-          if (!aarch64_composite_type_p (type, mode))
+          if (aarch64_advsimd_full_struct_mode_p (mode))
+            {
+              gcc_assert (nregs == size / 16);
+              pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
+            }
+          else if (aarch64_advsimd_partial_struct_mode_p (mode))
+            {
+              gcc_assert (nregs == size / 8);
+              pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
+            }
+          else if (!aarch64_composite_type_p (type, mode))
             {
               gcc_assert (nregs == 1);
               pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
             }
- 

[PATCH]middle-end: REE should always check all vector usages, even if it finds a defining def. [PR103350]

2021-12-14 Thread Tamar Christina via Gcc-patches
Hi All,

This and the report in PR103632 are caused by a bug in REE where it generates
incorrect code.

REE is trying to eliminate the following zero extension:

(insn 54 90 102 2 (set (reg:V4SI 33 v1)
(zero_extend:V4SI (reg/v:V4HI 40 v8)))
 (nil))

by folding it into the definition of `v8`:

(insn 2 5 104 2 (set (reg/v:V4HI 40 v8)
(reg:V4HI 32 v0 [156]))
 (nil))

which is fine, except that `v8` is also used by the extracts, e.g.:

(insn 11 10 12 2 (set (reg:SI 1 x1)
(zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8)
(parallel [
(const_int 3)
]
 (nil))

REE replaces insn 2 by folding insn 54 and placing it at the definition site of
insn 2, so before insn 11.

Trying to eliminate extension:
(insn 54 90 102 2 (set (reg:V4SI 33 v1)
(zero_extend:V4SI (reg/v:V4HI 40 v8)))
 (nil))
Tentatively merged extension with definition (copy needed):
(insn 2 5 104 2 (set (reg:V4SI 33 v1)
(zero_extend:V4SI (reg:V4HI 32 v0)))
 (nil))

to produce

(insn 2 5 110 2 (set (reg:V4SI 33 v1)
(zero_extend:V4SI (reg:V4HI 32 v0)))
 (nil))
(insn 110 2 104 2 (set (reg:V4SI 40 v8)
(reg:V4SI 33 v1))
 (nil))

The new insn 2 using v0 directly is correct, but the insn 110 it creates is
wrong: `v8` should still be V4HI.

Alternatively, REE would also need to eliminate the zero extension from the
extracts, so instead of

(insn 11 10 12 2 (set (reg:SI 1 x1)
(zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8)
(parallel [
(const_int 3)
]
 (nil))

it should be

(insn 11 10 12 2 (set (reg:SI 1 x1)
(vec_select:SI (reg/v:V4SI 40 v8)
(parallel [
(const_int 3)
])))
 (nil))

Without doing so, the element indices have been remapped by the extension, so
we extract the wrong elements.

At any optimization level other than -Os, REE seems to give up, so this
doesn't trigger:

Trying to eliminate extension:
(insn 54 90 101 2 (set (reg:V4SI 32 v0)
(zero_extend:V4SI (reg/v:V4HI 40 v8)))
 (nil))
Elimination opportunities = 2 realized = 0

This is purely due to the ordering of instructions. REE doesn't check the uses
of `v8` because it assumes that, with a zero-extended value, you still have
access to the lower bits by using the bottom part of the register.

This is true for scalars but not for vectors.  It would also have been fine if
REE had eliminated the zero_extend on insn 11 and the other extracts, but it
doesn't, since REE can only handle cases where the SRC value is REG_P.

It does try to do this in add_removable_extension:

  /* For vector mode extensions, ensure that all uses of the
     XEXP (src, 0) register are in insn or debug insns, as unlike
     integral extensions lowpart subreg of the sign/zero extended
     register are not equal to the original register, so we have
     to change all uses or none and the current code isn't able
     to change them all at once in one transaction.  */

However, this code doesn't trigger for the example because REE doesn't check
the uses if the defining instruction doesn't feed into another extension.

That is wrong: for vectors it should always check all uses.

r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3 simply exposed this, as it
now lowers VEC_SELECT 0 into the canonical RTL form subreg 0, which causes REE
to run more often.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR rtl-optimization/103350
* ree.c (add_removable_extension): Don't stop at first definition but
inspect all.

gcc/testsuite/ChangeLog:

PR rtl-optimization/103350
* gcc.target/aarch64/pr103350-1.c: New test.
* gcc.target/aarch64/pr103350-2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/ree.c b/gcc/ree.c
index e31ca2fa1a8073c09b054602c2fa19cfe0cb23c4..13debe8a4af1e8abe666d88b6694163172894030 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -1165,31 +1165,28 @@ add_removable_extension (const_rtx expr, rtx_insn *insn,
   to change them all at once in one transaction.  */
else if (VECTOR_MODE_P (GET_MODE (XEXP (src, 0
  {
-   if (idx == 0)
- {
-   struct df_link *ref_chain, *ref_link;
+   struct df_link *ref_chain, *ref_link;
 
-   ref_chain = DF_REF_CHAIN (def->ref);
-   for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+   ref_chain = DF_REF_CHAIN (def->ref);
+   for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+ {
+   if (ref_link->ref == NULL
+   || DF_REF_INSN_INFO (ref_link->ref) == NULL)
  {
-   if (ref_link->ref == NULL
-   || DF_REF_INSN_INFO (ref_link->ref) == NULL)
-  

Re: GCC 11 backport does not build (no "directly_supported_p") - was: Re: pr103523: Check for PLUS/MINUS support

2021-12-14 Thread Jakub Jelinek via Gcc-patches
On Tue, Dec 14, 2021 at 09:37:03AM +, Joel Hutton via Gcc-patches wrote:
> Bootstrapped and regression tested on releases/gcc-11 on aarch64.
> 
> Ok for 11?
> 
> Previous commit broke build as it relied on directly_supported_p which
> is not in 11. This reworks to avoid using directly_supported_p.
> 
> gcc/ChangeLog:
> 
>   PR bootstrap/103688
>   * tree-vect-loop.c (vectorizable_induction): Rework to avoid
> directly_supported_p

Missing . after directly_supported_p

--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7997,8 +7997,14 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   tree step_vectype = get_same_sized_vectype (TREE_TYPE (step_expr), vectype);
 
   /* Check for backend support of PLUS/MINUS_EXPR. */
-  if (!directly_supported_p (PLUS_EXPR, step_vectype)
-  || !directly_supported_p (MINUS_EXPR, step_vectype))
+  direct_optab ot_plus = optab_for_tree_code (tree_code (PLUS_EXPR),
+step_vectype, optab_default);
+  direct_optab ot_minus = optab_for_tree_code (tree_code (MINUS_EXPR),
+step_vectype, optab_default);

Why tree_code (PLUS_EXPR) instead of just PLUS_EXPR (ditto MINUS_EXPR)?
The formatting is off, step_vectype isn't aligned below tree_code.

+  if (ot_plus == unknown_optab
+  || ot_minus == unknown_optab
+  || optab_handler (ot_minus, TYPE_MODE (step_vectype)) == CODE_FOR_nothing
+  || optab_handler (ot_plus, TYPE_MODE (step_vectype)) == CODE_FOR_nothing)
 return false;

Won't optab_handler just return CODE_FOR_nothing for unknown_optab?

Anyway, I think best would be to write it as:
  if (!target_supports_op_p (step_vectype, PLUS_EXPR, optab_default)
      || !target_supports_op_p (step_vectype, MINUS_EXPR, optab_default))
    return false;

Jakub



[PATCH] i386: Fix emissing of __builtin_cpu_supports.

2021-12-14 Thread Martin Liška

The patch fixes __builtin_cpu_supports("avx512vbmi2"), which returns a
negative value (the documentation does not allow that).

I also checked the ppc target, which does the same, and __builtin_cpu_is;
both are fine.

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin  

PR target/103661

gcc/ChangeLog:

* config/i386/i386-builtins.c (fold_builtin_cpu): Compare to 0,
as the API expects non-zero values to be returned.  For the
"avx512vbmi2" argument we currently return 1 << 31, which is a
negative integer value.
---
 gcc/config/i386/i386-builtins.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
index 0fb14b55712..7e57b665c1e 100644
--- a/gcc/config/i386/i386-builtins.c
+++ b/gcc/config/i386/i386-builtins.c
@@ -2353,7 +2353,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
   /* Return __cpu_model.__cpu_features[0] & field_val  */
   final = build2 (BIT_AND_EXPR, unsigned_type_node, array_elt,
                   build_int_cstu (unsigned_type_node, field_val));
-  return build1 (CONVERT_EXPR, integer_type_node, final);
+  return build2 (NE_EXPR, integer_type_node, final,
+                 build_int_cst (unsigned_type_node, 0));
 }
   gcc_unreachable ();
 }
--
2.34.1



Re: [PATCH] Fix alignment of stack slots for overaligned types [PR103500]

2021-12-14 Thread Alex Coplan via Gcc-patches
Hi,

I'd just like to ping this for review:

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585785.html

Thanks,
Alex

On 30/11/2021 16:48, Alex Coplan via Gcc-patches wrote:
> Hi,
> 
> This fixes PR103500, i.e. it ensures that stack slots for
> passed-by-reference overaligned types are appropriately aligned. For the
> testcase:
> typedef struct __attribute__((aligned(32))) {
>   long x,y;
> } S;
> S x;
> void f(S);
> void g(void) { f(x); }
> 
> on AArch64, we currently generate (at -O2):
> 
> g:
>         adrp    x1, .LANCHOR0
>         add     x1, x1, :lo12:.LANCHOR0
>         stp     x29, x30, [sp, -48]!
>         mov     x29, sp
>         ldp     q0, q1, [x1]
>         add     x0, sp, 16
>         stp     q0, q1, [sp, 16]
>         bl      f
>         ldp     x29, x30, [sp], 48
>         ret
> 
> so the stack slot for the passed-by-reference copy of the structure is
> at sp + 16, and the sp is only guaranteed to be 16-byte aligned, so the
> structure is only 16-byte aligned. The PCS requires the structure to be
> 32-byte aligned. After this patch, we generate:
> 
> g:
> adrpx1, .LANCHOR0
> add x1, x1, :lo12:.LANCHOR0
> stp x29, x30, [sp, -64]!
> mov x29, sp
> add x0, sp, 47
> ldp q0, q1, [x1]
> and x0, x0, -32
> stp q0, q1, [x0]
> bl  f
> ldp x29, x30, [sp], 64
> ret
> 
> i.e. we ensure 32-byte alignment for the struct.
> 
> The approach taken here is similar to that in
> function.c:assign_parm_setup_block where it handles the case for
> DECL_ALIGN (parm) > MAX_SUPPORTED_STACK_ALIGNMENT. This in turn is
> similar to the approach taken in cfgexpand.c:expand_stack_vars (where
> the function calls get_dynamic_stack_size) which is the code that
> handles the alignment for overaligned structures as addressable local
> variables (see the related case discussed in the PR).
> 
> This patch also updates the aapcs64 test mentioned in the PR to avoid
> the frontend folding away the alignment check. I've confirmed that the
> execution test actually fails on aarch64-linux-gnu prior to the patch
> being applied and passes afterwards.
> 
> Bootstrapped and regtested on aarch64-linux-gnu, x86_64-linux-gnu, and
> arm-linux-gnueabihf: no regressions.
> 
> I'd appreciate any feedback. Is it OK for trunk?
> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   PR middle-end/103500
>   * function.c (get_stack_local_alignment): Align BLKmode overaligned
>   types to the alignment required by the type.
>   (assign_stack_temp_for_type): Handle BLKmode overaligned stack
>   slots by allocating a larger-than-necessary buffer and aligning
>   the address within appropriately.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/103500
>   * gcc.target/aarch64/aapcs64/rec_align-8.c (test_pass_by_ref):
>   Prevent the frontend from folding our alignment check away by
>   using snprintf to store the pointer into a string and recovering
>   it with sscanf.

> diff --git a/gcc/function.c b/gcc/function.c
> index 61b3bd036b8..5ed722ab959 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -278,7 +278,9 @@ get_stack_local_alignment (tree type, machine_mode mode)
>unsigned int alignment;
>  
>if (mode == BLKmode)
> -alignment = BIGGEST_ALIGNMENT;
> +alignment = (type && TYPE_ALIGN (type) > MAX_SUPPORTED_STACK_ALIGNMENT)
> +  ? TYPE_ALIGN (type)
> +  : BIGGEST_ALIGNMENT;
>else
>  alignment = GET_MODE_ALIGNMENT (mode);
>  
> @@ -872,21 +874,35 @@ assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type)
>  
>p = ggc_alloc ();
>  
> -  /* We are passing an explicit alignment request to assign_stack_local.
> -  One side effect of that is assign_stack_local will not round SIZE
> -  to ensure the frame offset remains suitably aligned.
> -
> -  So for requests which depended on the rounding of SIZE, we go ahead
> -  and round it now.  We also make sure ALIGNMENT is at least
> -  BIGGEST_ALIGNMENT.  */
> -  gcc_assert (mode != BLKmode || align == BIGGEST_ALIGNMENT);
> -  p->slot = assign_stack_local_1 (mode,
> -   (mode == BLKmode
> -? aligned_upper_bound (size,
> -   (int) align
> -   / BITS_PER_UNIT)
> -: size),
> -   align, 0);
> +  if (mode == BLKmode && align > MAX_SUPPORTED_STACK_ALIGNMENT)
> + {
> +   rtx allocsize = gen_int_mode (size, Pmode);
> +   get_dynamic_stack_size (&allocsize, 0, align, NULL);
> +   gcc_assert (CONST_INT_P (allocsize));
> +   size = UINTVAL (allocsize);
> +   p->slot = assign_stack_local_1 (mode,
> +   size,
> +

Re: [PATCH] i386: Fix emissing of __builtin_cpu_supports.

2021-12-14 Thread Jakub Jelinek via Gcc-patches
On Tue, Dec 14, 2021 at 10:55:01AM +0100, Martin Liška wrote:
> The patch fixes __builtin_cpu_supports("avx512vbmi2"), which returns a negative
> value (something the documentation does not allow).
> 
> I also checked the ppc target, which does the same thing, as well as
> __builtin_cpu_is; both are fine.
> 
> The patch bootstraps on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
>   PR target/103661
> 
> gcc/ChangeLog:
> 
>   * config/i386/i386-builtins.c (fold_builtin_cpu): Compare to 0
>   as the API expects non-zero values to be returned.  For the
>   "avx512vbmi2" argument, we currently return 1 << 31, which is a
>   negative integer value.
> ---
>  gcc/config/i386/i386-builtins.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
> index 0fb14b55712..7e57b665c1e 100644
> --- a/gcc/config/i386/i386-builtins.c
> +++ b/gcc/config/i386/i386-builtins.c
> @@ -2353,7 +2353,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
>/* Return __cpu_model.__cpu_features[0] & field_val  */
>final = build2 (BIT_AND_EXPR, unsigned_type_node, array_elt,
> build_int_cstu (unsigned_type_node, field_val));
> -  return build1 (CONVERT_EXPR, integer_type_node, final);
> +  return build2 (NE_EXPR, integer_type_node, final,
> +  build_int_cst (unsigned_type_node, 0));
>  }
>gcc_unreachable ();
>  }

Wouldn't this be better done only if field_val has the msb set
and keep the CONVERT_EXPR otherwise (why isn't it NOP_EXPR?)?

Jakub
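Jakub's point can be illustrated with a small model of the two folded forms. The model assumes, as the ChangeLog states, that the "avx512vbmi2" bit is 1u << 31, and its function names are hypothetical. Converting the unsigned AND result to int yields a negative value for that bit. The conversion is implementation-defined in ISO C, and GCC reduces it modulo 2^32. Comparing against zero, by contrast, always gives 0 or 1.

```c
#include <assert.h>

/* Hypothetical stand-ins for the folded form of
   __cpu_model.__cpu_features[0] & field_val.  */

/* Old folding: convert the unsigned result to int.  For
   field_val == 1u << 31 the value does not fit in int, and GCC
   wraps it to a negative number.  */
static int
fold_by_conversion (unsigned int features, unsigned int field_val)
{
  return (int) (features & field_val);
}

/* Patched folding: compare against zero, which always yields 0 or 1.  */
static int
fold_by_compare (unsigned int features, unsigned int field_val)
{
  return (features & field_val) != 0;
}
```

For bits below bit 31 both forms give a non-negative result with the same truth value, which is why the comparison is only strictly needed when the most significant bit of field_val is set.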



Re: [PATCH] x86: PR target/103611: Splitter for DST:DI = (HI:SI<<32)|LO:SI.

2021-12-14 Thread Richard Sandiford via Gcc-patches
"Roger Sayle"  writes:
> A common idiom is to create a DImode value from the "concat" of two SImode
> values, using "(long long)hi << 32 | (long long)lo", where the operation
> may be ior, xor or plus.  On x86, with -m32, the high and low parts of
> a DImode register are actually different SImode registers (typically %edx
> and %eax) so ideally this idiom should reduce to two move instructions
> (or optimally, just clever register allocation).
>
> Unfortunately, GCC currently performs the IOR operation above on -m32,
> and, worse, allocates DImode registers (split into SImode register pairs)
> for both the zero extended HI and LO values.
>
> Hence, for test1 from the new test case below:
>
> typedef int __v4si __attribute__ ((__vector_size__ (16)));
> long long test1(__v4si v) {
>   unsigned int loVal = (unsigned int)v[0];
>   unsigned int hiVal = (unsigned int)v[1];
>   return (long long)(loVal) | ((long long)(hiVal) << 32);
> }
>
> we currently generate (with -m32 -O2 -msse4.1):
>
> test1:  subl    $28, %esp
>         pextrd  $1, %xmm0, %eax
>         pmovzxdq        %xmm0, %xmm1
>         movq    %xmm1, 8(%esp)
>         movl    %eax, %edx
>         movl    8(%esp), %eax
>         orl     12(%esp), %edx
>         addl    $28, %esp
>         orb     $0, %ah
>         ret
>
> with this patch we now generate:
>
> test1:  pextrd  $1, %xmm0, %edx
>         movd    %xmm0, %eax
>         ret
>
> The fix is to recognize and split the idiom (hi<<32)|zext(lo) prior
> to register allocation on !TARGET_64BIT, simplifying this sequence to
> "highpart(dst) = hi; lowpart(dst) = lo".
>
> The one minor complication is that sse.md's define_insn for
> *vec_extractv4si_0_zext_sse4 can sometimes interfere with this
> optimization.  It turns out that on !TARGET_64BIT, the zero_extend:DI
> following vec_select:SI isn't free, and this insn gets split back
> into multiple instructions during later passes, but too late to
> be optimized away by this patch/reload.  Hence the last hunk of
> this patch is to restrict *vec_extractv4si_0_zext_sse4 to TARGET_64BIT.
> Checking PR target/80286, where *vec_extractv4si_0_zext_sse4 was
> first added, this seems reasonable (but this patch has been tested
> both with and without this last change, if it's considered controversial).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without "--target_board='unix{-m32}'"
> with no new failures.  OK for mainline?

To play devil's advocate: does this optimisation belong in
target-specific code?  It feels pretty generic and could be keyed
off BITS_PER_WORD.

Thanks,
Richard

>
>
> 2021-12-13  Roger Sayle  
>
> gcc/ChangeLog
>   PR target/103611
>   * config/i386/i386.md (any_or_plus): New code iterator.
>   (define_split): Split (HI<<32)|zext(LO) into piece-wise
>   move instructions on !TARGET_64BIT.
>   * config/i386/sse.md (*vec_extractv4si_0_zext_sse4):
>   Restrict to TARGET_64BIT.
>
> gcc/testsuite/ChangeLog
>   PR target/103611
>   * gcc.target/i386/pr103611-2.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
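IOR, XOR and PLUS are interchangeable in this idiom because the two operands have no set bits in common, which is what lets a single code iterator cover all three. A small check in C follows. It uses uint64_t rather than the testcase's long long so that shifting a full 32-bit high part is well defined, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The three forms of the (hi << 32) op zext (lo) idiom.  The shifted high
   part has all-zero low 32 bits and the zero-extended low part has all-zero
   high 32 bits, so no bit positions overlap and PLUS never carries.  */
static uint64_t concat_ior (uint32_t hi, uint32_t lo)
{ return ((uint64_t) hi << 32) | (uint64_t) lo; }

static uint64_t concat_xor (uint32_t hi, uint32_t lo)
{ return ((uint64_t) hi << 32) ^ (uint64_t) lo; }

static uint64_t concat_plus (uint32_t hi, uint32_t lo)
{ return ((uint64_t) hi << 32) + (uint64_t) lo; }

/* The split the patch performs: highpart(dst) = hi, lowpart(dst) = lo.  */
static void
check_halves (uint32_t hi, uint32_t lo)
{
  uint64_t v = concat_ior (hi, lo);
  assert ((uint32_t) (v >> 32) == hi);
  assert ((uint32_t) v == lo);
  assert (concat_xor (hi, lo) == v && concat_plus (hi, lo) == v);
}
```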


RE: GCC 11 backport does not build (no "directly_supported_p") - was: Re: pr103523: Check for PLUS/MINUS support

2021-12-14 Thread Joel Hutton via Gcc-patches
> +  if (ot_plus == unknown_optab
> +  || ot_minus == unknown_optab
> +  || optab_handler (ot_minus, TYPE_MODE (step_vectype)) == CODE_FOR_nothing
> +  || optab_handler (ot_plus, TYPE_MODE (step_vectype)) == CODE_FOR_nothing)
>  return false;
> 
> Won't optab_handler just return CODE_FOR_nothing for unknown_optab?

I was taking the check used in directly_supported_p

return (optab != unknown_optab
        && optab_handler (optab, TYPE_MODE (type)) != CODE_FOR_nothing);

> Anyway, I think best would be to write it as:
>   if (!target_supports_op_p (step_vectype, PLUS_EXPR, optab_default)
>   || !target_supports_op_p (step_vectype, MINUS_EXPR, optab_default))
> return false;
Looks good to me.

Patch attached.

Tests running on gcc-11 on aarch64. 

Ok for 11 once tests come back?


0001-vect-loop-fix-build.patch
Description: 0001-vect-loop-fix-build.patch


Re: [PATCH][1/4][committed] aarch64: Add support for Armv8.8-a memory operations and memcpy expansion

2021-12-14 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov via Gcc-patches  writes:
> @@ -23568,6 +23568,28 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
>*dst = aarch64_progress_pointer (*dst);
>  }
>  
> +/* Expand a cpymem using the MOPS extension.  OPERANDS are taken
> +   from the cpymem pattern.  Return true iff we succeeded.  */
> +static bool
> +aarch64_expand_cpymem_mops (rtx *operands)
> +{
> +  if (!TARGET_MOPS)
> +return false;
> +  rtx addr_dst = XEXP (operands[0], 0);
> +  rtx addr_src = XEXP (operands[1], 0);
> +  rtx sz_reg = operands[2];
> +
> +  if (!REG_P (sz_reg))
> +sz_reg = force_reg (DImode, sz_reg);
> +  if (!REG_P (addr_dst))
> +addr_dst = force_reg (DImode, addr_dst);
> +  if (!REG_P (addr_src))
> +addr_src = force_reg (DImode, addr_src);
> +  emit_insn (gen_aarch64_cpymemdi (addr_dst, addr_src, sz_reg));
> +
> +  return true;
> +}

On this, I think it would be better to adjust the original src and dst
MEMs if possible, since they contain metadata about the size of the
access and alias set information.  It looks like the code above
generates an instruction with a wild read and a wild write instead.

It should be possible to do that with a define_expand/define_insn
pair, where the define_expand takes two extra operands for the MEMs,
but the define_insn contains the same operands as now.

Since the instruction clobbers the three registers, I think we have to
use copy_to_reg (unconditionally) to force a fresh register.  The ultimate
caller is not expecting the values of the registers in the original
address to change.

Thanks,
Richard



> +
>  /* Expand cpymem, as if from a __builtin_memcpy.  Return true if
> we succeed, otherwise return false, indicating that a libcall to
> memcpy should be emitted.  */
> @@ -23581,19 +23603,25 @@ aarch64_expand_cpymem (rtx *operands)
>rtx base;
>machine_mode cur_mode = BLKmode;
>  
> -  /* Only expand fixed-size copies.  */
> +  /* Variable-sized memcpy can go through the MOPS expansion if available.  */
>if (!CONST_INT_P (operands[2]))
> -return false;
> +return aarch64_expand_cpymem_mops (operands);
>  
>unsigned HOST_WIDE_INT size = INTVAL (operands[2]);
>  
> -  /* Try to inline up to 256 bytes.  */
> -  unsigned HOST_WIDE_INT max_copy_size = 256;
> +  /* Try to inline up to 256 bytes or use the MOPS threshold if available.  */
> +  unsigned HOST_WIDE_INT max_copy_size
> += TARGET_MOPS ? aarch64_mops_memcpy_size_threshold : 256;
>  
>bool size_p = optimize_function_for_size_p (cfun);
>  
> +  /* Large constant-sized cpymem should go through MOPS when possible.
> + It should be a win even for size optimization in the general case.
> + For speed optimization the choice between MOPS and the SIMD sequence
> + depends on the size of the copy, rather than number of instructions,
> + alignment etc.  */
>if (size > max_copy_size)
> -return false;
> +return aarch64_expand_cpymem_mops (operands);
>  
>int copy_bits = 256;
>  
> @@ -23643,9 +23671,9 @@ aarch64_expand_cpymem (rtx *operands)
>nops += 2;
>n -= mode_bits;
>  
> -  /* Emit trailing copies using overlapping unaligned accesses - this is
> -  smaller and faster.  */
> -  if (n > 0 && n < copy_bits / 2)
> +  /* Emit trailing copies using overlapping unaligned accesses
> + (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
> +  if (n > 0 && n < copy_bits / 2 && !STRICT_ALIGNMENT)
>   {
> machine_mode next_mode = smallest_mode_for_size (n, MODE_INT);
> int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
> @@ -23657,9 +23685,25 @@ aarch64_expand_cpymem (rtx *operands)
>  }
>rtx_insn *seq = get_insns ();
>end_sequence ();
> +  /* MOPS sequence requires 3 instructions for the memory copying + 1 to move
> + the constant size into a register.  */
> +  unsigned mops_cost = 3 + 1;
> +
> +  /* If MOPS is available at this point we don't consider the libcall as it's
> + not a win even on code size.  At this point only consider MOPS if
> + optimizing for size.  For speed optimizations we will have chosen between
> + the two based on copy size already.  */
> +  if (TARGET_MOPS)
> +{
> +  if (size_p && mops_cost < nops)
> + return aarch64_expand_cpymem_mops (operands);
> +  emit_insn (seq);
> +  return true;
> +}
>  
>/* A memcpy libcall in the worst case takes 3 instructions to prepare the
> - arguments + 1 for the call.  */
> + arguments + 1 for the call.  When MOPS is not available and we're
> + optimizing for size a libcall may be preferable.  */
>unsigned libcall_cost = 4;
>if (size_p && libcall_cost < nops)
>  return false;
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 5297b2d3f95744ac72e36814c6676cc97478d48b..d623c1b00bf62bf24420813fb7a3a6bf09ff1dc0 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch6
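The strategy selection in the patched aarch64_expand_cpymem can be summarised as a pure function. This is a simplified model built on a few assumptions. The enum and function are hypothetical. NOPS stands for the instruction count of the inline sequence the expander would otherwise emit. Any other early exits in the real function are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the decision only; no RTL is emitted.  */
enum cpymem_strategy { CPYMEM_INLINE, CPYMEM_MOPS, CPYMEM_LIBCALL };

static enum cpymem_strategy
choose_cpymem (bool have_mops, bool const_size, unsigned long size,
               unsigned long mops_threshold, bool size_p, unsigned nops)
{
  /* Variable-sized copies and copies above the inline limit use MOPS when
     available; otherwise the expander fails and a memcpy libcall is used.  */
  unsigned long max_copy_size = have_mops ? mops_threshold : 256;
  if (!const_size || size > max_copy_size)
    return have_mops ? CPYMEM_MOPS : CPYMEM_LIBCALL;

  const unsigned mops_cost = 3 + 1;    /* 3 MOPS copy insns + 1 size move.  */
  const unsigned libcall_cost = 4;     /* 3 argument setups + the call.  */

  /* With MOPS, the libcall is never considered; MOPS replaces the inline
     sequence only when optimizing for size and it is shorter.  */
  if (have_mops)
    return (size_p && mops_cost < nops) ? CPYMEM_MOPS : CPYMEM_INLINE;
  return (size_p && libcall_cost < nops) ? CPYMEM_LIBCALL : CPYMEM_INLINE;
}
```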

Re: GCC 11 backport does not build (no "directly_supported_p") - was: Re: pr103523: Check for PLUS/MINUS support

2021-12-14 Thread Jakub Jelinek via Gcc-patches
On Tue, Dec 14, 2021 at 10:46:39AM +, Joel Hutton wrote:
> > +  if (ot_plus == unknown_optab
> > +  || ot_minus == unknown_optab
> > +  || optab_handler (ot_minus, TYPE_MODE (step_vectype)) == CODE_FOR_nothing
> > +  || optab_handler (ot_plus, TYPE_MODE (step_vectype)) == CODE_FOR_nothing)
> >  return false;
> > 
> > Won't optab_handler just return CODE_FOR_nothing for unknown_optab?
> 
> I was taking the check used in directly_supported_p
> 
> return (optab != unknown_optab
>   && optab_handler (optab, TYPE_MODE (type)) != CODE_FOR_nothing);
> 
> > Anyway, I think best would be to write it as:
> >   if (!target_supports_op_p (step_vectype, PLUS_EXPR, optab_default)
> >   || !target_supports_op_p (step_vectype, MINUS_EXPR, optab_default))
> > return false;
> Looks good to me.
> 
> Patch attached.
> 
> Tests running on gcc-11 on aarch64. 
> 
> Ok for 11 once tests come back?

Yes, thanks.

Jakub



Re: [RFC PATCH] tree-ssa-sink: do not sink to in front of setjmp

2021-12-14 Thread Алексей Нурмухаметов via Gcc-patches

On 13.12.2021 18:20, Alexander Monakov wrote:
> On Mon, 13 Dec 2021, Richard Biener wrote:
>
>> On December 13, 2021 3:25:47 PM GMT+01:00, Alexander Monakov wrote:
>>> Greetings!
>>>
>>> While testing our patch that reimplements -Wclobbered on GIMPLE we found
>>> a case where tree-ssa-sink moves a statement to a basic block in front
>>> of a setjmp call.
>>>
>>> I am confident that this is unintended and should be considered invalid
>>> GIMPLE.
>>
>> Does CFG validation not catch this? That is, doesn't setjmp force the start of
>> a new BB?
>
> Oh, good point. There's stmt_starts_bb_p which returns true for setjmp, but
> gimple_verify_flow_info doesn't check it. I guess we can try adding that
> and collect the fallout on bootstrap/regtest.

Bootstrap looks good, but the testsuite has some regressions (the applied
patch is below).

The overall number of unexpected failures and unresolved testcases is
around 100. The diff is attached.

>> I think sinking relies on dominance and post dominance here but post dominance
>> may be too fragile with the abnormal cycles which are likely not backwards
>> reachable from exit.
>>
>> That said, checking for abnormal preds is OK, I just want to make sure we
>> detect the invalid CFG - do we?
>
> As above, no, otherwise it would have been caught much earlier than ICE'ing
> our -Wclobbered patch :)
>
> Thank you.
> Alexander


The patch for CFG validation:

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index ebbd894ae03..92b08d1d6d8 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -5663,6 +5663,7 @@ gimple_verify_flow_info (void)
    }

   /* Verify that body of basic block BB is free of control flow.  */
+  gimple *prev_stmt = NULL;
   for (; !gsi_end_p (gsi); gsi_next (&gsi))
    {
  gimple *stmt = gsi_stmt (gsi);
@@ -5674,6 +5675,14 @@ gimple_verify_flow_info (void)
  err = 1;
    }

+ if (prev_stmt && stmt_starts_bb_p (stmt, prev_stmt))
+   {
+ error ("setjmp in the middle of basic block %d", bb->index);
+ err = 1;
+   }
+ if (!is_gimple_debug (stmt))
+   prev_stmt = stmt;
+
  if (stmt_ends_bb_p (stmt))
    found_ctrl_stmt = true;

7a8,9
> FAIL: gcc.c-torture/compile/930111-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
> FAIL: gcc.c-torture/compile/930111-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
85a88,148
> FAIL: c-c++-common/tm/cancel-1.c (internal compiler error)
> FAIL: c-c++-common/tm/cancel-1.c (test for excess errors)
> FAIL: gcc.dg/tm/data-2.c (internal compiler error)
> FAIL: gcc.dg/tm/data-2.c (test for excess errors)
> FAIL: gcc.dg/tm/debug-1.c (internal compiler error)
> FAIL: gcc.dg/tm/debug-1.c (test for excess errors)
> FAIL: gcc.dg/tm/memopt-10.c (internal compiler error)
> FAIL: gcc.dg/tm/memopt-10.c (test for excess errors)
> FAIL: gcc.dg/tm/memopt-11.c (internal compiler error)
> FAIL: gcc.dg/tm/memopt-11.c (test for excess errors)
> FAIL: gcc.dg/tm/memopt-12.c (internal compiler error)
> FAIL: gcc.dg/tm/memopt-12.c (test for excess errors)
> FAIL: gcc.dg/tm/memopt-15.c (internal compiler error)
> FAIL: gcc.dg/tm/memopt-15.c (test for excess errors)
> UNRESOLVED: gcc.dg/tm/memopt-15.c scan-assembler _ITM_LM128
> FAIL: gcc.dg/tm/memopt-16.c (internal compiler error)
> FAIL: gcc.dg/tm/memopt-16.c (test for excess errors)
> FAIL: gcc.dg/tm/memopt-3.c (internal compiler error)
> FAIL: gcc.dg/tm/memopt-3.c (test for excess errors)
> FAIL: gcc.dg/tm/memopt-4.c (internal compiler error)
> FAIL: gcc.dg/tm/memopt-4.c (test for excess errors)
> UNRESOLVED: gcc.dg/tm/memopt-4.c scan-tree-dump-times tmedge "tm_save.[0-9_]+ = lala.x[55]" 1
> UNRESOLVED: gcc.dg/tm/memopt-4.c scan-tree-dump-times tmedge "lala.x[55] = tm_save" 1
> FAIL: gcc.dg/tm/memopt-5.c (internal compiler error)
> FAIL: gcc.dg/tm/memopt-5.c (test for excess errors)
> UNRESOLVED: gcc.dg/tm/memopt-5.c scan-tree-dump-times tmedge "ITM_LU[0-9] (&lala.x[55]" 1
> FAIL: gcc.dg/tm/memopt-6.c (internal compiler error)
> FAIL: gcc.dg/tm/memopt-6.c (test for excess errors)
> UNRESOLVED: gcc.dg/tm/memopt-6.c scan-tree-dump-times tmedge "memcpyRtWn (.*, &lacopy" 1
> FAIL: gcc.dg/tm/memopt-7.c (internal compiler error)
> FAIL: gcc.dg/tm/memopt-7.c (test for excess errors)
> UNRESOLVED: gcc.dg/tm/memopt-7.c scan-tree-dump-times tmedge "tm_save.[0-9_]+ = lala" 1
> UNRESOLVED: gcc.dg/tm/memopt-7.c scan-tree-dump-times tmedge "lala = tm_save" 1
> FAIL: gcc.dg/tm/opt-1.c (internal compiler error)
> FAIL: gcc.dg/tm/opt-1.c (test for excess errors)
> FAIL: gcc.dg/tm/pr55401.c (internal compiler error)
> FAIL: gcc.dg/tm/pr55401.c (test for excess errors)
> UNRESOLVED: gcc.dg/tm/pr55401.c scan-tree-dump-times optimized "ITM_WU[0-9] (&george," 2
> FAIL: gcc.dg/tm/pr95569.c (internal compiler error)
> FAIL: gcc.dg/tm/pr95569.c (test for excess errors)
> FAIL: gcc.dg/torture/pr100053.c   -
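A toy model can show the rule that the proposed verifier check enforces. This is a sketch, not GCC code. The statement kinds are hypothetical, and "must start a block" is reduced to a setjmp-like call. As in the patch, debug statements are ignored when remembering the previous statement, so a setjmp preceded only by debug statements is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement kinds for the model.  */
enum stmt_kind { STMT_NORMAL, STMT_DEBUG, STMT_SETJMP };

/* Count statements in one basic block that must start a block but are
   preceded by a non-debug statement of the same block.  */
static int
count_bb_start_errors (const enum stmt_kind *stmts, size_t n)
{
  int errors = 0;
  bool have_prev = false;  /* Seen a non-debug statement yet?  */

  for (size_t i = 0; i < n; i++)
    {
      if (have_prev && stmts[i] == STMT_SETJMP)
        errors++;
      if (stmts[i] != STMT_DEBUG)
        have_prev = true;
    }
  return errors;
}
```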

Re: [vect] Re-analyze all modes for epilogues

2021-12-14 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Hi,
>
> Added an extra step to skip unusable epilogue modes when we know the 
> target does not support predication. This uses a new function 
> 'support_predication_p' that is generated at build time and checks 
> whether the target supports at least one optab that can be used for 
> predicated code-generation.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu.
>
> OK for trunk?

Looks good, but see the final comment below about whether we could
simplify this a bit.

> gcc/ChangeLog:
>
>      * tree-vect-loop.c (vect_better_loop_vinfo_p): Round factors up for
>      epilogue costing.
>      (vect_analyze_loop): Re-analyze all modes for epilogues, unless we are
>      guaranteed that no predication is possible.
>      (genopinit.c) (support_predication_p): Generate new function.
>
> gcc/testsuite/ChangeLog:
>
>      * gcc.target/aarch64/masked_epilogue.c: New test.
>
> diff --git a/gcc/genopinit.c b/gcc/genopinit.c
> index 195ddf74fa2b7d89760622073dcec9d5d339a097..e0958bc6c849911395341611a53b0fcb69565827 100644
> --- a/gcc/genopinit.c
> +++ b/gcc/genopinit.c
> @@ -321,6 +321,7 @@ main (int argc, const char **argv)
>  "  bool supports_vec_scatter_store_cached;\n"
>  "};\n"
>  "extern void init_all_optabs (struct target_optabs *);\n"
> +"extern bool support_predication_p (void);\n"

len_load and len_store aren't really predication (or masking).
So maybe:

  partial_vectors_supported_p

?

>  "\n"
>  "extern struct target_optabs default_target_optabs;\n"
>  "extern struct target_optabs *this_fn_optabs;\n"
> @@ -373,6 +374,33 @@ main (int argc, const char **argv)
>  fprintf (s_file, "  ena[%u] = HAVE_%s;\n", i, p->name);
>fprintf (s_file, "}\n\n");
>  
> +  fprintf (s_file,
> +"/* Returns TRUE if the target supports any of the predication\n"
> +"   specific optabs: while_ult_optab, len_load_optab or len_store_optab,\n"
> +"   for any mode.  */\n"

Similarly here, s/predication specific optabs/partial vector optabs/.

> +"bool\nsupport_predication_p (void)\n{\n");
> +  bool any_match = false;
> +  fprintf (s_file, "\treturn");
> +  bool first = true;
> +  for (i = 0; patterns.iterate (i, &p); ++i)
> +{
> +#define CMP_NAME(N) !strncmp (p->name, (N), strlen ((N)))
> +  if (CMP_NAME("while_ult") || CMP_NAME ("len_load")
> +   || CMP_NAME ("len_store"))
> + {
> +   if (first)
> + fprintf (s_file, " HAVE_%s", p->name);
> +   else
> + fprintf (s_file, " || HAVE_%s", p->name);
> +   first = false;
> +   any_match = true;
> + }
> +}
> +  if (!any_match)
> +fprintf (s_file, " false");
> +  fprintf (s_file, ";\n}\n");
> +
> +
>/* Perform a binary search on a pre-encoded optab+mode*2.  */
>/* ??? Perhaps even better to generate a minimal perfect hash.
>   Using gperf directly is awkward since it's so geared to working
> diff --git a/gcc/testsuite/gcc.target/aarch64/masked_epilogue.c b/gcc/testsuite/gcc.target/aarch64/masked_epilogue.c
> new file mode 100644
> index ..286a7be236f337fee4c4650f42da72000855c5e6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/masked_epilogue.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -march=armv8-a+sve -msve-vector-bits=scalable" } */
> +
> +void f(unsigned char y[restrict],
> +   unsigned char x[restrict], int n) {
> +  for (int i = 0; i < n; ++i)
> +y[i] = (y[i] + x[i] + 1) >> 1;
> +}
> +
> +/* { dg-final { scan-tree-dump {LOOP EPILOGUE VECTORIZED \(MODE=VNx} "vect" } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index a28bb6321d76b8222bc8cfdade151ca9b4dca406..86e0cb47aef2919fdf7d87228f7f6a8378893e68 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2824,11 +2824,13 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
>   {
> unsigned HOST_WIDE_INT main_vf_max
>   = estimated_poly_value (main_poly_vf, POLY_VALUE_MAX);
> +   unsigned HOST_WIDE_INT old_vf_max
> + = estimated_poly_value (old_vf, POLY_VALUE_MAX);
> +   unsigned HOST_WIDE_INT new_vf_max
> + = estimated_poly_value (new_vf, POLY_VALUE_MAX);
>  
> -   old_factor = main_vf_max / estimated_poly_value (old_vf,
> -POLY_VALUE_MAX);
> -   new_factor = main_vf_max / estimated_poly_value (new_vf,
> -POLY_VALUE_MAX);
> +   old_factor = CEIL (main_vf_max, old_vf_max);
> +   new_factor = CEIL (main_vf_max, new_vf_max);
>  
> /* If the loop is not using partial vectors then it will iterate one
>time less than one that does.  It is safe to subtract one here,
> @@ -3069,8 +3071,6 @@ vect_analyze_loop (class loop *loop, vec_in
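Small numbers show the effect of switching vect_better_loop_vinfo_p from truncating division to CEIL. GCC's CEIL macro computes `((x) + (y) - 1) / (y)`, and the sketch below mirrors it under hypothetical names. Take a main-loop maximum VF of 16 and an epilogue maximum VF of 12: truncation gives a factor of 1, while CEIL gives 2. An epilogue whose estimated maximum VF exceeds the main loop's gets a factor of 0 under truncation but 1 under CEIL.

```c
#include <assert.h>

/* Truncating division, as in the old factor computation.  */
static unsigned long
floor_div (unsigned long x, unsigned long y)
{
  assert (y != 0);
  return x / y;
}

/* Rounding-up division, mirroring CEIL (x, y) for positive operands.  */
static unsigned long
ceil_div (unsigned long x, unsigned long y)
{
  assert (y != 0);
  return (x + y - 1) / y;
}
```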

Re: [PATCH] vect: Add bias parameter for partial vectorization

2021-12-14 Thread Richard Sandiford via Gcc-patches
Robin Dapp  writes:
> Hi Richard,
>
> I incorporated all your remarks (sorry for the hunk from a different
> branch) except for this one:
>
>> Think it would be better to make it:
>> 
>>if (use_bias_adjusted_len)
>>  {
>>gcc_assert (i == 0);
>> 
>> But do we need to do this?  Code should only care about the final value,
>> so I didn't think we would need to keep the intermediate unbiased length
>> alongside the biased one.  (Or maybe we do.  My memory is a bit rusty,
>> sorry.)
>
> I'd agree that we generally don't need to keep the unbiased length.
> However "loop_len" being a phi node, I wasn't sure how or where to
> properly apply the bias (except via creating a new variable like I did).
> I would be grateful for a pointer here.

Ah, I see.  Yeah, I guess the loop manip stuff does still need
access to the unbiased value, so I agree we should keep both.

Thanks,
Richard


Re: [PATCH, v2] PR libfortran/103634 - Runtime crash with PACK on zero-sized arrays

2021-12-14 Thread Mikael Morin

On 13/12/2021 at 21:27, Harald Anlauf via Fortran wrote:
> Works better with patch attached...
>
> On 13.12.21 at 21:25, Harald Anlauf via Gcc-patches wrote:
>>
>> The code is so similar (for good reason) that it makes sense to keep
>> it in sync.  I added code for a 'zero_sized' array, with the minor
>> difference that I made it boolean instead of integer.
>>
>> I also extended the testcase so that it exercises PACK/pack_internal
>> a little, both with and without the 'vector' argument present.  (There are
>> existing tests for intrinsic types, but not for the issue at hand.)
>>
>> Regtested again, and checked the testcase (against other compilers
>> and also with valgrind).
>>
>> OK now?

Yes, thanks.


Re: [PATCH][GCC] aarch64: Add LS64 extension and intrinsics

2021-12-14 Thread Richard Sandiford via Gcc-patches
Przemyslaw Wirkus  writes:
> Hello Richard,
>
> I've updated my patch following all your comments. Thank you.
>
> Bootstrapped on aarch64-linux-gnu and all new ACLE tests pass.
>
> OK to install?

Thanks.  OK with a couple of formatting nits:

> @@ -2130,6 +2203,57 @@ aarch64_expand_builtin_tme (int fcode, tree exp, rtx target)
>  return target;
>  }
>  
> +/* Function to expand an expression EXP which calls one of the Load/Store
> +   64 Byte extension (LS64) builtins FCODE with the result going to TARGET.  */
> +static rtx
> +aarch64_expand_builtin_ls64 (int fcode, tree exp, rtx target)
> +{
> +  expand_operand ops[3];
> +
> +  switch (fcode)
> +{
> +case AARCH64_LS64_BUILTIN_LD64B:
> +  {
> +rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
> +create_output_operand (&ops[0], target, V8DImode);
> +create_input_operand (&ops[1], op0, DImode);
> +expand_insn (CODE_FOR_ld64b, 2, ops);
> +return ops[0].value;
> +  }
> +case AARCH64_LS64_BUILTIN_ST64B:
> +  {
> +rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
> +rtx op1 = expand_normal (CALL_EXPR_ARG (exp, 1));
> +create_output_operand (&ops[0], op0, DImode);
> +create_input_operand (&ops[1], op1, V8DImode);
> +expand_insn (CODE_FOR_st64b, 2, ops);
> +return const0_rtx;
> +  }
> +case AARCH64_LS64_BUILTIN_ST64BV:
> +  {
> +rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
> +rtx op1 = expand_normal (CALL_EXPR_ARG (exp, 1));
> +create_output_operand (&ops[0], target, DImode);
> +create_input_operand (&ops[1], op0, DImode);
> +create_input_operand (&ops[2], op1, V8DImode);
> +expand_insn (CODE_FOR_st64bv, 3, ops);
> +return ops[0].value;
> +  }
> +case AARCH64_LS64_BUILTIN_ST64BV0:
> +  {
> +rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
> +rtx op1 = expand_normal (CALL_EXPR_ARG (exp, 1));
> +create_output_operand (&ops[0], target, DImode);
> +create_input_operand (&ops[1], op0, DImode);
> +create_input_operand (&ops[2], op1, V8DImode);
> +expand_insn (CODE_FOR_st64bv0, 3, ops);
> +return ops[0].value;
> +  }
> +}
> +
> +gcc_unreachable ();

This line should be indented by 2 spaces rather than 4.

> +}
> +
>  /* Expand a random number builtin EXP with code FCODE, putting the result
> int TARGET.  If IGNORE is true the return value is ignored.  */
>  
> […]
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index be24b7320d28deed9a19a0451c96bd67d2fb3104..e0ceba68968a28a9fcf1ba6e3a3036783b0931b0 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10013,8 +10013,12 @@ aarch64_classify_address (struct aarch64_address_info *info,
>instruction memory accesses.  */
> if (mode == TImode || mode == TFmode)
>   return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
> - && (aarch64_offset_9bit_signed_unscaled_p (mode, offset)
> - || offset_12bit_unsigned_scaled_p (mode, offset)));
> + && (aarch64_offset_9bit_signed_unscaled_p (mode, offset)
> + || offset_12bit_unsigned_scaled_p (mode, offset)));

The original formatting was correct here.

> +
> +   if (mode == V8DImode)
> + return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
> + && aarch64_offset_7bit_signed_scaled_p (DImode, offset + 48));
>  
> /* A 7bit offset check because OImode will emit a ldp/stp
>instruction (only big endian will get here).
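The new V8DImode address check accepts an offset only if both OFFSET and OFFSET + 48 satisfy the DImode 7-bit signed scaled test. The model below assumes that test means "a multiple of 8 whose scaled value lies in [-64, 63]". It also assumes the 64-byte block is accessed as four 16-byte pairs at OFFSET, +16, +32 and +48, so checking the first and last pair is enough. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of a DImode 7-bit signed scaled offset: a multiple of 8 whose
   scaled value fits in [-64, 63], i.e. OFFSET in [-512, 504].  */
static bool
offset_7bit_signed_scaled_di (long offset)
{
  return offset % 8 == 0 && offset / 8 >= -64 && offset / 8 <= 63;
}

/* Model of the V8DImode check: first and last 16-byte pair must both be
   addressable, which limits OFFSET to [-512, 456].  */
static bool
v8di_offset_ok (long offset)
{
  return offset_7bit_signed_scaled_di (offset)
         && offset_7bit_signed_scaled_di (offset + 48);
}
```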


[PATCH] Drop the fpic multilib for VxWorks on powerpc

2021-12-14 Thread Olivier Hainque via Gcc-patches
The addition of fPIC for shared libraries is performed
independently from multilibs and fpic multilibs have
no other particular purpose for VxWorks at this stage.

They incur extra build time, complicate the install tree
and are a bit tricky because -fpic is not supported for kernel
mode.

Tested together with our recent cleanups in the shared-objects
support area, for a mix of builds/tests for vxworks 6.9 and 7.2
with our in-house gcc-11 based toolchains, including for powerpc64
with support for shared libraries enabled.

Sanity checked for mainline with a build for vxWorks 6.9.

Olivier

2020-11-06  Fred Konrad  

gcc/
* config/rs6000/t-vxworks: Drop the fPIC multilib.



0001-Drop-the-fpic-multilib-for-powerpc-vxworks.patch
Description: Binary data




Re: [PATCH]AArch64 Fix the AAPCs for new partial and full SIMD structure types [PR103094]

2021-12-14 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> The new partial and full vector types added to AArch64, e.g.
>
> int8x8x2_t with mode V2x8QI are incorrectly being defined as being short
> vectors and not being composite types.
>
> This causes the layout code to incorrectly conclude that the registers are
> packed, i.e. for V2x8QI it thinks those 16 bytes are in the same register.
>
> Because of this the code under !aarch64_composite_type_p is unreachable, but it
> also lacked any extra checks to see that nregs is what we expected it to be.
>
> I have also updated aarch64_advsimd_full_struct_mode_p and
> aarch64_advsimd_partial_struct_mode_p to only consider vector types as struct
> modes.  Otherwise types such as OImode and friends would qualify, leading to
> incorrect results.

How easy would it be to fix the bug without doing this last bit?
The idea was that OI, CI and XI should continue to be structure
modes until we remove them.  aarch64_advsimd_partial_struct_mode_p
and aarch64_advsimd_full_struct_mode_p are meant to be convenience
wrappers and so they shouldn't make different decisions from the
underlying aarch64_classify_vector_mode.

>
> This patch fixes up the issues and we now generate correct code.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
>
>
> gcc/ChangeLog:
>
>   PR target/103094
>   * config/aarch64/aarch64.c (aarch64_function_value, aarch64_layout_arg):
>   Fix unreachable code for partial vectors and re-order switch to perform
>   the simplest test first.
>   (aarch64_short_vector_p): Mark as not short vectors.
>   (aarch64_composite_type_p): Mark as composite types.
>   (aarch64_advsimd_partial_struct_mode_p,
>   aarch64_advsimd_full_struct_mode_p): Restrict to actual SIMD types.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/103094
>   * gcc.target/aarch64/pr103094.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index fdf05505846721b02059df494d6395ae9423a8ef..d9104ddac3cdd44f7c2290b8725d05be4fd6468f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3055,15 +3055,17 @@ aarch64_advsimd_struct_mode_p (machine_mode mode)
>  static bool
>  aarch64_advsimd_partial_struct_mode_p (machine_mode mode)
>  {
> -  return (aarch64_classify_vector_mode (mode)
> -   == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
> +  return VECTOR_MODE_P (mode)
> +  && (aarch64_classify_vector_mode (mode)
> + == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
>  }
>  
>  /* Return true if MODE is an Advanced SIMD Q-register structure mode.  */
>  static bool
>  aarch64_advsimd_full_struct_mode_p (machine_mode mode)
>  {
> -  return (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_STRUCT));
> +  return VECTOR_MODE_P (mode)
> +  && (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_STRUCT));
>  }
>  
>  /* Return true if MODE is any of the data vector modes, including
> @@ -6468,17 +6470,21 @@ aarch64_function_value (const_tree type, const_tree func,
>  NULL, false))
>  {
>gcc_assert (!sve_p);
> -  if (!aarch64_composite_type_p (type, mode))
> +  if (aarch64_advsimd_full_struct_mode_p (mode))
> + {
> +   gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 16), count));
> +   return gen_rtx_REG (mode, V0_REGNUM);
> + }
> +  else if (aarch64_advsimd_partial_struct_mode_p (mode))
> + {
> +   gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 8), count));
> +   return gen_rtx_REG (mode, V0_REGNUM);
> + }
> +  else if (!aarch64_composite_type_p (type, mode))
>   {
> gcc_assert (count == 1 && mode == ag_mode);
> return gen_rtx_REG (mode, V0_REGNUM);
>   }
> -  else if (aarch64_advsimd_full_struct_mode_p (mode)
> -&& known_eq (GET_MODE_SIZE (ag_mode), 16))
> - return gen_rtx_REG (mode, V0_REGNUM);
> -  else if (aarch64_advsimd_partial_struct_mode_p (mode)
> -&& known_eq (GET_MODE_SIZE (ag_mode), 8))
> - return gen_rtx_REG (mode, V0_REGNUM);
>else
>   {
> int i;
> @@ -6745,6 +6751,7 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
>  /* No frontends can create types with variable-sized modes, so we
> shouldn't be asked to pass or return them.  */
>  size = GET_MODE_SIZE (mode).to_constant ();
> +
>size = ROUND_UP (size, UNITS_PER_WORD);
>  
>allocate_ncrn = (type) ? !(FLOAT_TYPE_P (type)) : !FLOAT_MODE_P (mode);
> @@ -6769,17 +6776,21 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
>if (nvrn + nregs <= NUM_FP_ARG_REGS)
>   {
> pcum->aapcs_nextnvrn = nvrn + nregs;
> -   if (!aarch64_composite_type_p (type, mode))
> +   if (aarch64_advsimd_full_struct_mode_p (mode))
> +  

[PATCH] Remove fpic multilib on x86_64-vxworks

2021-12-14 Thread Olivier Hainque via Gcc-patches
The addition of fPIC for shared libraries is performed
independently from multilibs and fpic multilibs have
no other particular purpose for VxWorks at this stage.

Tested together with our recent cleanups in the shared-objects
support area, for a mix of builds/tests for vxworks 6.9 and 7.2
with our in-house gcc-11 based toolchains, including for x86_64
with support for shared libraries enabled.

Cheers,

Olivier

2021-12-14  Olivier Hainque  

* config/i386/t-vxworks: Drop the fPIC multilibs.



0001-Remove-fpic-multilib-on-x86_64-vxworks.patch




RE: [PATCH]AArch64 Fix the AAPCs for new partial and full SIMD structure types [PR103094]

2021-12-14 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, December 14, 2021 12:38 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH]AArch64 Fix the AAPCs for new partial and full SIMD
> structure types [PR103094]
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > The new partial and full vector types added to AArch64, e.g.
> >
> > int8x8x2_t with mode V2x8QI are incorrectly being defined as being
> > short vectors and not being composite types.
> >
> > This causes the layout code to incorrectly conclude that the registers
> > are packed, i.e. for V2x8QI it thinks those 16 bytes are in the same
> > register.
> >
> > Because of this, the code under !aarch64_composite_type_p is
> > unreachable, and it also lacked any checks that nregs is what we
> > expected it to be.
> >
> > I have also updated aarch64_advsimd_full_struct_mode_p and
> > aarch64_advsimd_partial_struct_mode_p to only consider vector types as
> > struct modes.  Otherwise types such as OImode and friends would
> > qualify, leading to incorrect results.
> 
> How easy would it be to fix the bug without doing this last bit?
> The idea was that OI, CI and XI should continue to be structure modes until
> we remove them.  aarch64_advsimd_partial_struct_mode_p
> and aarch64_advsimd_full_struct_mode_p are meant to be convenience
> wrappers and so they shouldn't make different decisions from the
> underlying aarch64_classify_vector_mode.

It can be done by moving the check higher in callers of these functions, but
the problem is that with, e.g., an OImode there's no real indication of how
many registers are used to create the OImode. It could be 4, 6 or 8, as it's
just a bag of bits.

My concern is that these functions are misleading without this. With any of
these opaque types returning true for both functions, it becomes harder to
make decisions between the two, in particular because we still expand to
these modes for certain structures.

> 
> >
> > This patch fixes up the issues and we now generate correct code.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> >
> >
> > gcc/ChangeLog:
> >
> > PR target/103094
> > * config/aarch64/aarch64.c (aarch64_function_value, aarch64_layout_arg):
> > Fix unreachable code for partial vectors and re-order switch to perform
> > the simplest test first.
> > (aarch64_short_vector_p): Mark as not short vectors.
> > (aarch64_composite_type_p): Mark as composite types.
> > (aarch64_advsimd_partial_struct_mode_p,
> > aarch64_advsimd_full_struct_mode_p): Restrict to actual SIMD types.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/103094
> > * gcc.target/aarch64/pr103094.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index fdf05505846721b02059df494d6395ae9423a8ef..d9104ddac3cdd44f7c2290b8725d05be4fd6468f 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -3055,15 +3055,17 @@ aarch64_advsimd_struct_mode_p (machine_mode mode)
> >  static bool
> >  aarch64_advsimd_partial_struct_mode_p (machine_mode mode)
> >  {
> > -  return (aarch64_classify_vector_mode (mode)
> > -          == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
> > +  return VECTOR_MODE_P (mode)
> > +         && (aarch64_classify_vector_mode (mode)
> > +             == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
> >  }
> >
> >  /* Return true if MODE is an Advanced SIMD Q-register structure mode.  */
> >  static bool
> >  aarch64_advsimd_full_struct_mode_p (machine_mode mode)
> >  {
> > -  return (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_STRUCT));
> > +  return VECTOR_MODE_P (mode)
> > +         && (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_STRUCT));
> >  }
> >
> >  /* Return true if MODE is any of the data vector modes, including
> > @@ -6468,17 +6470,21 @@ aarch64_function_value (const_tree type, const_tree func,
> >NULL, false))
> >  {
> >gcc_assert (!sve_p);
> > -  if (!aarch64_composite_type_p (type, mode))
> > +  if (aarch64_advsimd_full_struct_mode_p (mode))
> > +   {
> > + gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 16), count));
> > + return gen_rtx_REG (mode, V0_REGNUM);
> > +   }
> > +  else if (aarch64_advsimd_partial_struct_mode_p (mode))
> > +   {
> > + gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 8), count));
> > + return gen_rtx_REG (mode, V0_REGNUM);
> > +   }
> > +  else if (!aarch64_composite_type_p (type, mode))
> > {
> >   gcc_assert (count == 1 && mode == ag_mode);
> >   return gen_rtx_REG (mode, V0_REGNUM);
> > }
> > -  else if (aarch64_advsimd_full_struct_mode_p (mode)
> > -  && known_eq (GET_MODE_SIZE

Re: [committed] libstdc++: Specialize std::pointer_traits<__normal_iterator>

2021-12-14 Thread Jonathan Wakely via Gcc-patches
On Tue, 14 Dec 2021 at 06:53, François Dumont wrote:

> Hi
>
>  Any conclusion regarding this thread ?
>
> François
>
>
> On 06/10/21 7:25 pm, François Dumont wrote:
> > I forgot to ask if with this patch this overload:
> >
> >   template
> > constexpr auto
> > __to_address(const _Ptr& __ptr, _None...) noexcept
> > {
> >   if constexpr (is_base_of_v<__gnu_debug::_Safe_iterator_base, _Ptr>)
> > return std::__to_address(__ptr.base().operator->());
> >   else
> > return std::__to_address(__ptr.operator->());
> > }
> >
> > should be removed ?
>
>
No, definitely not.

That is the default overload for types that do not have a
pointer_traits::to_address specialization. If you remove it, __to_address
won't work for fancy pointers or any other pointer-like types. That would
completely break it.
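To make the role of that fallback concrete, here is a much-simplified sketch of what it does: a raw pointer is returned as is, and any other pointer-like type is unwrapped through operator-> until a raw pointer comes out, which is also why it works for contiguous iterators such as __normal_iterator. The name my_to_address is hypothetical, and unlike libstdc++'s std::__to_address this sketch does not first consult pointer_traits<Ptr>::to_address, nor does it have the debug-mode _Safe_iterator branch.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Simplified model of the fallback __to_address overload (hypothetical name).
// Unlike libstdc++, it skips the pointer_traits<Ptr>::to_address
// customization point entirely.

// Raw pointers are already addresses.
template<typename T>
constexpr T*
my_to_address (T* p) noexcept
{ return p; }

// Pointer-like class types (fancy pointers, contiguous iterators such as
// __gnu_cxx::__normal_iterator): unwrap via operator-> until a raw
// pointer comes out.  For raw pointers the overload above is more
// specialized, so recursion stops there.
template<typename Ptr>
constexpr auto
my_to_address (const Ptr& p) noexcept
{ return my_to_address (p.operator-> ()); }
```

Because the second overload only needs a usable operator->, the same code serves fancy pointers and vector iterators alike; that generality is what removing the default overload would lose.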

The purpose of C++20's std::to_address is to get a real pointer from a
pointer-like type. Using it with iterators is not the primary use case, but
it needs to work with contiguous iterators because those are pointer-like.
I made it work correctly with __normal_iterator because that was necessary
to support the uses of std::__to_address in  and , but I
removed those uses in:

https://gcc.gnu.org/g:247bac507e63b32d4dc23ef1c55f300aafea24c6
https://gcc.gnu.org/g:b83b810ac440f72e7551b6496539e60ac30c0d8a

So now we don't really need the C++17 version of std::__to_address to work
with __normal_iterator at all.

I think it's OK to add the overload for __normal_iterator though, but only
for C++11/14/17, because the default std::__to_address handles
__normal_iterator correctly in C++20.


> Or perhaps just the _Safe_iterator_base branch in it ?
>

Yes, you can just remove that branch, because your new overload handles it.


>

> > On 06/10/21 7:18 pm, François Dumont wrote:
> >> Here is another proposal with the __to_address overload.
> >>
> >> I preferred to leave it open to any kind of __normal_iterator
> >> instantiation because, as far as I can see, std::vector supports
> >> fancy pointer types. It is better if __to_address also works in this
> >> case, no?
>

If we intend to support that, then we should verify it in the testsuite,
using __gnu_test::CustomPointerAlloc.


> >> libstdc++: Overload std::__to_address for __gnu_cxx::__normal_iterator.
> >>
> >> Prefer overloading __to_address to partially specializing
> >> std::pointer_traits, because std::pointer_traits would be mostly
> >> useless. In the case of __gnu_debug::_Safe_iterator, the pointer_to
> >> method is even impossible to implement correctly because we are
> >> missing the parent container to associate the iterator with.
>

To record additional rationale in the git history, please add that the
partial specialization of pointer_traits<__normal_iterator> fails to
rebind C, so you get incorrect types like __normal_iterator>.


>>
> >> libstdc++-v3/ChangeLog:
> >>
> >> * include/bits/stl_iterator.h
> >> (std::pointer_traits<__gnu_cxx::__normal_iterator<>>): Remove.
>

OK to remove this (it's broken anyway).

> >> (std::__to_address(const __gnu_cxx::__normal_iterator<>&)): New.
>

Please make this only defined for C++11, 14 and 17.


> >> * include/debug/safe_iterator.h
> >> (std::__to_address(const
> >> __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<>,
> >> _Sequence>&)):
> >> New.
>

OK to add this (including for C++20), and remove the _Safe_iterator branch
from the C++20 std::__to_address in .

I think this new overload could return
std::__to_address(__it.base().base()) though. That saves a function call,
by going directly to the value stored in the __normal_iterator.



> >> * testsuite/24_iterators/normal_iterator/to_address.cc:
> >> Add check on std::vector::iterator
> >> to validate both __gnu_cxx::__normal_iterator<>
> >> __to_address overload in normal mode and the
>

Add similar checks for vector>.

OK with those changes, thanks.


Re: [PATCH]AArch64 Fix the AAPCs for new partial and full SIMD structure types [PR103094]

2021-12-14 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, December 14, 2021 12:38 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH]AArch64 Fix the AAPCs for new partial and full SIMD
>> structure types [PR103094]
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > The new partial and full vector types added to AArch64, e.g.
>> >
>> > int8x8x2_t with mode V2x8QI are incorrectly being defined as being
>> > short vectors and not being composite types.
>> >
>> > This causes the layout code to incorrectly conclude that the registers
>> > are packed, i.e. for V2x8QI it thinks those 16 bytes are in the same
>> > register.
>> >
>> > Because of this, the code under !aarch64_composite_type_p is
>> > unreachable, and it also lacked any checks that nregs is what we
>> > expected it to be.
>> >
>> > I have also updated aarch64_advsimd_full_struct_mode_p and
>> > aarch64_advsimd_partial_struct_mode_p to only consider vector types as
>> > struct modes.  Otherwise types such as OImode and friends would
>> > qualify, leading to incorrect results.
>> 
>> How easy would it be to fix the bug without doing this last bit?
>> The idea was that OI, CI and XI should continue to be structure modes until
>> we remove them.  aarch64_advsimd_partial_struct_mode_p
>> and aarch64_advsimd_full_struct_mode_p are meant to be convenience
>> wrappers and so they shouldn't make different decisions from the
>> underlying aarch64_classify_vector_mode.
>
> It can be done by moving the check higher in callers of these functions, but
> the problem is that with, e.g., an OImode there's no real indication of how
> many registers are used to create the OImode. It could be 4, 6 or 8, as it's
> just a bag of bits.

OImode is always 2 Q registers, etc.

Which bit of code are you concerned about?  Is it the parts where
we generate gen_rtx_REG?  If so, it was the case even before the
new modes that an OImode structure could have safely been classified
as (reg:OI V0) (say) rather than as a less efficient parallel.
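A rough model of the arithmetic in this exchange (the struct and function names are hypothetical, not GCC internals): each vector in an Advanced SIMD tuple gets its own V register, so the register count is the mode size divided by 16 for Q-register tuples or by 8 for D-register tuples. That is the relation the patch's new gcc_assert calls check, it is why OImode (32 bytes) always means two Q registers, and it is what the old "short vector" classification got wrong for V2x8QI.

```cpp
#include <cassert>

// Illustrative model only; these are not GCC's data structures.
struct simd_struct_mode
{
  const char *name;      // GCC mode name, for reference only
  unsigned size_bytes;   // GET_MODE_SIZE of the whole tuple
  unsigned vector_bytes; // 16 for Q-register tuples, 8 for D-register tuples
};

// One V register per constituent vector.  Mirrors the patch's
//   known_eq (exact_div (GET_MODE_SIZE (mode), 16 or 8), count)
// assertions.
constexpr unsigned
registers_used (const simd_struct_mode &m)
{
  return m.size_bytes / m.vector_bytes;
}

// The bug being fixed: classifying the tuple as a single short vector
// packs all of its bytes into one register.
constexpr unsigned
registers_if_packed (const simd_struct_mode &)
{
  return 1;
}

constexpr simd_struct_mode v2x8qi_mode  {"V2x8QI", 16, 8};   // int8x8x2_t
constexpr simd_struct_mode v2x16qi_mode {"V2x16QI", 32, 16}; // int8x16x2_t
constexpr simd_struct_mode oi_mode      {"OI", 32, 16};
constexpr simd_struct_mode ci_mode      {"CI", 48, 16};
constexpr simd_struct_mode xi_mode      {"XI", 64, 16};
```

For V2x8QI the correct answer is two registers (v0 and v1), while the packed view gives one; for OI, CI and XI the count is fixed by the mode size alone.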

Thanks,
Richard


>
> My concern is that these functions are misleading without this. With any of
> these opaque types returning true for both functions, it becomes harder to
> make decisions between the two, in particular because we still expand to
> these modes for certain structures.
>
>> 
>> >
>> > This patch fixes up the issues and we now generate correct code.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> >
>> >
>> > gcc/ChangeLog:
>> >
>> >PR target/103094
>> >* config/aarch64/aarch64.c (aarch64_function_value, aarch64_layout_arg):
>> >Fix unreachable code for partial vectors and re-order switch to perform
>> >the simplest test first.
>> >(aarch64_short_vector_p): Mark as not short vectors.
>> >(aarch64_composite_type_p): Mark as composite types.
>> >(aarch64_advsimd_partial_struct_mode_p,
>> >aarch64_advsimd_full_struct_mode_p): Restrict to actual SIMD types.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >PR target/103094
>> >* gcc.target/aarch64/pr103094.c: New test.
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> > index fdf05505846721b02059df494d6395ae9423a8ef..d9104ddac3cdd44f7c2290b8725d05be4fd6468f 100644
>> > --- a/gcc/config/aarch64/aarch64.c
>> > +++ b/gcc/config/aarch64/aarch64.c
>> > @@ -3055,15 +3055,17 @@ aarch64_advsimd_struct_mode_p (machine_mode mode)
>> >  static bool
>> >  aarch64_advsimd_partial_struct_mode_p (machine_mode mode)
>> >  {
>> > -  return (aarch64_classify_vector_mode (mode)
>> > -          == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
>> > +  return VECTOR_MODE_P (mode)
>> > +         && (aarch64_classify_vector_mode (mode)
>> > +             == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
>> >  }
>> >
>> >  /* Return true if MODE is an Advanced SIMD Q-register structure mode.  */
>> >  static bool
>> >  aarch64_advsimd_full_struct_mode_p (machine_mode mode)
>> >  {
>> > -  return (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_STRUCT));
>> > +  return VECTOR_MODE_P (mode)
>> > +         && (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_STRUCT));
>> >  }
>> >
>> >  /* Return true if MODE is any of the data vector modes, including
>> > @@ -6468,17 +6470,21 @@ aarch64_function_value (const_tree type, const_tree func,
>> >   NULL, false))
>> >  {
>> >gcc_assert (!sve_p);
>> > -  if (!aarch64_composite_type_p (type, mode))
>> > +  if (aarch64_advsimd_full_struct_mode_p (mode))
>> > +  {
>> > +gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 16), count));
>> > +return gen_rtx_REG (mode, V0_REGNUM);
>> > +  }
>> > +  else if (aarch64_advsimd_p

Re: [PATCH] rs6000: __builtin_darn[_raw] should be in [power9-64] (PR103624)

2021-12-14 Thread Bill Schmidt via Gcc-patches
Hi!

On 12/13/21 6:22 PM, Segher Boessenkool wrote:
> On Mon, Dec 13, 2021 at 02:37:43PM -0600, Bill Schmidt wrote:
>> On 12/13/21 10:54 AM, Segher Boessenkool wrote:
>>> On Mon, Dec 13, 2021 at 11:30:28AM -0500, David Edelsohn wrote:
 On Mon, Dec 13, 2021 at 10:48 AM Bill Schmidt wrote:
> PR103624 observes that we get segfaults for the 64-bit darn builtins when
> compiled on a 32-bit architecture.  The old built-in infrastructure
> requires TARGET_64BIT, and this was missed in the new support.  Moving
> these two builtins from the [power9] stanza to the [power9-64] stanza
> solves the problem.
>
> Tested the fix on a powerpc-e300c3-linux-gnu cross.  Bootstrapped and
> tested on powerpc64le-linux-gnu with no regressions.  Is this okay for
> trunk?
 Okay.
>>> No, as I said before this is not correct, not without a lot more
>>> explanation at least.  We should not copy errors in the old code into
>>> the new code.  That is negating one of the main advantages of
>>> reimplementing this in the first place!
>> Can you please be more specific?
>>
>> All I have from you before is "It should work for 32-bit though?"  I
>> responded in the bug report that __builtin_darn_32 was used for this
>> purpose.  I haven't seen a response to that.  What do you want to see
>> happen?
> That of course does not work for _raw.
>
> These builtins should just return a "long", just like __builtin_ppc_mftb
> does.  All three of them.

Well, that seems wrong for __builtin_darn_32, which maps to an SImode pattern.

So, I assume what you'd like to see is for the other two built-ins to return
long, and for the "&& TARGET_64BIT" to be removed from the darn_raw and darn
patterns?

>
>> The patterns in rs6000.md are darn_32, gated by TARGET_P9_MISC; darn_raw,
>> gated by TARGET_P9_MISC && TARGET_64BIT; and darn, gated by
>> TARGET_P9_MISC && TARGET_64BIT.
>> The builtins correspond to these patterns in the obvious way.
>>
>> If you think that these patterns should be enabled differently, that's
>> fine, but that's a completely different patch than fixing the incorrect
>> built-ins to match what the patterns do and thus avoid ICEing.
> Avoiding ICEs should not be a goal.  It should be a side effect of doing
> the right thing in the first place!


There's no reason to get snippy.  Given that you approved Kelvin's original
implementation of the darn patterns and built-in functions, I think I can be
forgiven for thinking that those were the desired semantics. :-)

Thanks,
Bill

>
>
> Segher


[PATCH] libstdc++: Poor man's case insensitive comparisons in time_get [PR71557]

2021-12-14 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch uses the same not completely correct case insensitive comparisons
as used elsewhere in the same header.  Proper comparisons that would handle
even multi-byte characters would be harder, but I don't see them implemented
in __ctype's methods.

Tested on x86_64-linux, ok for trunk?

2021-12-14  Jakub Jelinek  

PR libstdc++/71557
* include/bits/locale_facets_nonio.tcc (_M_extract_via_format):
Compare characters other than format specifiers and whitespace
case insensitively.
(_M_extract_name): Compare characters case insensitively.
* testsuite/22_locale/time_get/get/char/71557.cc: New test.
* testsuite/22_locale/time_get/get/wchar_t/71557.cc: New test.

--- libstdc++-v3/include/bits/locale_facets_nonio.tcc.jj	2021-12-10 17:04:35.224563127 +0100
+++ libstdc++-v3/include/bits/locale_facets_nonio.tcc	2021-12-14 13:10:40.845984740 +0100
@@ -910,7 +910,9 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
  else
{
  // Verify format and input match, extract and discard.
- if (__format[__i] == *__beg)
+ // TODO real case-insensitive comparison
+ if (__ctype.tolower(__format[__i]) == __ctype.tolower(*__beg)
+ || __ctype.toupper(__format[__i]) == __ctype.toupper(*__beg))
++__beg;
  else
__tmperr |= ios_base::failbit;
@@ -988,15 +990,15 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
   bool __begupdated = false;
 
   // Look for initial matches.
-  // NB: Some of the locale data is in the form of all lowercase
-  // names, and some is in the form of initially-capitalized
-  // names. Look for both.
   if (__beg != __end)
{
  const char_type __c = *__beg;
+ // TODO real case-insensitive comparison
+ const char_type __cl = __ctype.tolower(__c);
+ const char_type __cu = __ctype.toupper(__c);
  for (size_t __i1 = 0; __i1 < __indexlen; ++__i1)
-   if (__c == __names[__i1][0]
-   || __c == __ctype.toupper(__names[__i1][0]))
+   if (__cl == __ctype.tolower(__names[__i1][0])
+   || __cu == __ctype.toupper(__names[__i1][0]))
  {
__lengths[__nmatches]
  = __traits_type::length(__names[__i1]);
@@ -1023,15 +1025,22 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
  bool __match_longer = false;
 
  if (__beg != __end)
-   for (size_t __i3 = 0; __i3 < __nmatches; ++__i3)
- {
-   __name = __names[__matches[__i3]];
-   if (__lengths[__i3] > __pos && (__name[__pos] == *__beg))
- {
-   __match_longer = true;
-   break;
- }
- }
+   {
+ // TODO real case-insensitive comparison
+ const char_type __cl = __ctype.tolower(*__beg);
+ const char_type __cu = __ctype.toupper(*__beg);
+ for (size_t __i3 = 0; __i3 < __nmatches; ++__i3)
+   {
+ __name = __names[__matches[__i3]];
+ if (__lengths[__i3] > __pos
+ && (__ctype.tolower(__name[__pos]) == __cl
+ || __ctype.toupper(__name[__pos]) == __cu))
+   {
+ __match_longer = true;
+ break;
+   }
+   }
+   }
  for (size_t __i4 = 0; __i4 < __nmatches;)
if (__match_longer == (__lengths[__i4] == __pos))
  {
@@ -1069,17 +1078,23 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
}
}
  if (__pos < __minlen && __beg != __end)
-   for (size_t __i6 = 0; __i6 < __nmatches;)
- {
-   __name = __names[__matches[__i6]];
-   if (!(__name[__pos] == *__beg))
- {
-   __matches[__i6] = __matches[--__nmatches];
-   __lengths[__i6] = __lengths[__nmatches];
- }
-   else
- ++__i6;
- }
+   {
+ // TODO real case-insensitive comparison
+ const char_type __cl = __ctype.tolower(*__beg);
+ const char_type __cu = __ctype.toupper(*__beg);
+ for (size_t __i6 = 0; __i6 < __nmatches;)
+   {
+ __name = __names[__matches[__i6]];
+ if (__ctype.tolower(__name[__pos]) != __cl
+ && __ctype.toupper(__name[__pos]) != __cu)
+   {
+ __matches[__i6] = __matches[--__nmatches];
+ __lengths[__i6] = __lengths[__nmatches];
+   }
+ else
+   ++__i6;
+   }
+   }
  else
break;
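The per-character test the patch adds can be sketched outside the facet as follows. Two characters match if they agree after ctype::tolower or after ctype::toupper; checking both mappings presumably guards against characters whose upper- and lower-case mappings are not exact inverses of each other. The helper names are hypothetical, and like the patch this works on single CharT units, so it does not handle multi-byte characters.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>

// The "poor man's" comparison: equal after tolower() or after toupper().
template<typename CharT>
bool
poor_mans_ieq (const std::ctype<CharT>& ct, CharT a, CharT b)
{
  return ct.tolower (a) == ct.tolower (b)
         || ct.toupper (a) == ct.toupper (b);
}

// Hypothetical helper: does INPUT start with NAME, ignoring case?
// A much simplified version of the matching _M_extract_name performs.
template<typename CharT>
bool
iequal_prefix (const std::ctype<CharT>& ct,
               const std::basic_string<CharT>& input,
               const std::basic_string<CharT>& name)
{
  if (input.size () < name.size ())
    return false;
  for (std::size_t i = 0; i < name.size (); ++i)
    if (!poor_mans_ieq (ct, input[i], name[i]))
      return false;
  return true;
}
```

With this kind of comparison, input such as "MONDAY" can match the locale's "Monday" name, which is the behaviour PR71557 asks for.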

Re: [PATCH] libstdc++: Poor man's case insensitive comparisons in time_get [PR71557]

2021-12-14 Thread Jonathan Wakely via Gcc-patches
On Tue, 14 Dec 2021 at 13:50, Jakub Jelinek via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> Hi!
>
> This patch uses the same not completely correct case insensitive
> comparisons as used elsewhere in the same header.  Proper comparisons that
> would handle even multi-byte characters would be harder, but I don't see
> them implemented in __ctype's methods.
>
> Tested on x86_64-linux, ok for trunk?
>

OK, thanks.



>
> 2021-12-14  Jakub Jelinek  
>
> PR libstdc++/71557
> * include/bits/locale_facets_nonio.tcc (_M_extract_via_format):
> Compare characters other than format specifiers and whitespace
> case insensitively.
> (_M_extract_name): Compare characters case insensitively.
> * testsuite/22_locale/time_get/get/char/71557.cc: New test.
> * testsuite/22_locale/time_get/get/wchar_t/71557.cc: New test.
>
> --- libstdc++-v3/include/bits/locale_facets_nonio.tcc.jj	2021-12-10 17:04:35.224563127 +0100
> +++ libstdc++-v3/include/bits/locale_facets_nonio.tcc	2021-12-14 13:10:40.845984740 +0100
> @@ -910,7 +910,9 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
>   else
> {
>   // Verify format and input match, extract and discard.
> - if (__format[__i] == *__beg)
> + // TODO real case-insensitive comparison
> + if (__ctype.tolower(__format[__i]) == __ctype.tolower(*__beg)
> + || __ctype.toupper(__format[__i]) == __ctype.toupper(*__beg))
> ++__beg;
>   else
> __tmperr |= ios_base::failbit;
> @@ -988,15 +990,15 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
>bool __begupdated = false;
>
>// Look for initial matches.
> -  // NB: Some of the locale data is in the form of all lowercase
> -  // names, and some is in the form of initially-capitalized
> -  // names. Look for both.
>if (__beg != __end)
> {
>   const char_type __c = *__beg;
> + // TODO real case-insensitive comparison
> + const char_type __cl = __ctype.tolower(__c);
> + const char_type __cu = __ctype.toupper(__c);
>   for (size_t __i1 = 0; __i1 < __indexlen; ++__i1)
> -   if (__c == __names[__i1][0]
> -   || __c == __ctype.toupper(__names[__i1][0]))
> +   if (__cl == __ctype.tolower(__names[__i1][0])
> +   || __cu == __ctype.toupper(__names[__i1][0]))
>   {
> __lengths[__nmatches]
>   = __traits_type::length(__names[__i1]);
> @@ -1023,15 +1025,22 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
>   bool __match_longer = false;
>
>   if (__beg != __end)
> -   for (size_t __i3 = 0; __i3 < __nmatches; ++__i3)
> - {
> -   __name = __names[__matches[__i3]];
> -   if (__lengths[__i3] > __pos && (__name[__pos] == *__beg))
> - {
> -   __match_longer = true;
> -   break;
> - }
> - }
> +   {
> + // TODO real case-insensitive comparison
> + const char_type __cl = __ctype.tolower(*__beg);
> + const char_type __cu = __ctype.toupper(*__beg);
> + for (size_t __i3 = 0; __i3 < __nmatches; ++__i3)
> +   {
> + __name = __names[__matches[__i3]];
> + if (__lengths[__i3] > __pos
> + && (__ctype.tolower(__name[__pos]) == __cl
> + || __ctype.toupper(__name[__pos]) == __cu))
> +   {
> + __match_longer = true;
> + break;
> +   }
> +   }
> +   }
>   for (size_t __i4 = 0; __i4 < __nmatches;)
> if (__match_longer == (__lengths[__i4] == __pos))
>   {
> @@ -1069,17 +1078,23 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
> }
> }
>   if (__pos < __minlen && __beg != __end)
> -   for (size_t __i6 = 0; __i6 < __nmatches;)
> - {
> -   __name = __names[__matches[__i6]];
> -   if (!(__name[__pos] == *__beg))
> - {
> -   __matches[__i6] = __matches[--__nmatches];
> -   __lengths[__i6] = __lengths[__nmatches];
> - }
> -   else
> - ++__i6;
> - }
> +   {
> + // TODO real case-insensitive comparison
> + const char_type __cl = __ctype.tolower(*__beg);
> + const char_type __cu = __ctype.toupper(*__beg);
> + for (size_t __i6 = 0; __i6 < __nmatches;)
> +   {
> + __name = __names[__matches[__i6]];
> + if (__ctype.tolower(__name[__pos]) != __cl
> + && __cty

[committed] libstdc++: Fix non-reserved name in header

2021-12-14 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, committed to trunk.


libstdc++-v3/ChangeLog:

* include/bits/regex_compiler.tcc (_Compiler::_M_match_token):
Use reserved name for parameter.
* testsuite/17_intro/names.cc: Check "token".
---
 libstdc++-v3/include/bits/regex_compiler.tcc | 4 ++--
 libstdc++-v3/testsuite/17_intro/names.cc | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc b/libstdc++-v3/include/bits/regex_compiler.tcc
index 956262a12c9..0e2e1321376 100644
--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -580,9 +580,9 @@ namespace __detail
   template
 bool
 _Compiler<_TraitsT>::
-_M_match_token(_TokenT token)
+_M_match_token(_TokenT __token)
 {
-  if (token == _M_scanner._M_get_token())
+  if (__token == _M_scanner._M_get_token())
{
  _M_value = _M_scanner._M_get_value();
  _M_scanner._M_advance();
diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc
index 2a908ea9fc5..1341bed7a62 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -109,6 +109,7 @@
 #define func (
 #define tmp (
 #define sz (
+#define token (
 
 #if __cplusplus < 201103L
 #define uses_allocator  (
-- 
2.31.1
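Why the one-word rename matters: `token` is not a reserved name, so a user program may legitimately define it as a macro before including <regex>. As I understand it, testsuite/17_intro/names.cc checks for this by defining macros for common non-reserved names (now including `#define token (`) and then including the library headers. The sketch below uses a hypothetical function standing in for _Compiler::_M_match_token to show that a parameter spelled __token is unaffected by such a macro.

```cpp
#include <cassert>

// A user program may define this, because `token` is not reserved.
#define token (

// Library-style code (hypothetical stand-in, not the real _M_match_token).
// Had the parameter been named `token`, the macro above would expand it
// to `(` and this declaration would fail to compile.  __token is a
// different identifier, so the macro does not touch it.
template<typename _TokenT>
bool
match_token_sketch (_TokenT __token, _TokenT __current)
{
  return __token == __current;
}

#undef token
```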



[PATCH][pushed] testsuite: fix ASAN errors

2021-12-14 Thread Martin Liška

The tests failed on my machine as they contain out-of-bounds
access.

I'm going to push the fix.

Martin

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx2-psraq-1.c: Use ARRAY_SIZE.
* gcc.target/i386/m128-check.h: Move it to the top-level
context.
* gcc.target/i386/sse2-psraq-1.c: Use ARRAY_SIZE.
* gcc.target/i386/sse4_2-check.h: Include the header with
ARRAY_SIZE definition.
---
 gcc/testsuite/gcc.target/i386/avx2-psraq-1.c | 2 +-
 gcc/testsuite/gcc.target/i386/m128-check.h   | 8 
 gcc/testsuite/gcc.target/i386/sse2-psraq-1.c | 2 +-
 gcc/testsuite/gcc.target/i386/sse4_2-check.h | 1 +
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx2-psraq-1.c b/gcc/testsuite/gcc.target/i386/avx2-psraq-1.c
index e9051bf9fcb..96e5c4ccf76 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-psraq-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-psraq-1.c
@@ -41,7 +41,7 @@ TEST (void)
   V a = (V) { 0xdeadbeefcafebabeULL, 0x123456789abcdef0ULL,
  0x173a74be8a95134cULL, 0x817bae35ac0ebf12ULL };
   int i;
-  for (i = 0; tests[i].n; i++)
+  for (i = 0; i < ARRAY_SIZE (tests); i++)
 {
   V c = tests[i].fn (a);
   if (c[0] != a[0] >> tests[i].n || c[1] != a[1] >> tests[i].n
diff --git a/gcc/testsuite/gcc.target/i386/m128-check.h b/gcc/testsuite/gcc.target/i386/m128-check.h
index c468eac6a13..e90e1f02d32 100644
--- a/gcc/testsuite/gcc.target/i386/m128-check.h
+++ b/gcc/testsuite/gcc.target/i386/m128-check.h
@@ -1,6 +1,10 @@
 #include 
 #include 
 
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
+#endif
+
 #ifdef __SSE2__
 #include 
 
@@ -66,10 +70,6 @@ typedef union
   float a[4];
 } union128;
 
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
-#endif
-
 #ifdef DEBUG
 #define PRINTF printf
 #else
diff --git a/gcc/testsuite/gcc.target/i386/sse2-psraq-1.c b/gcc/testsuite/gcc.target/i386/sse2-psraq-1.c
index 9a08ee4f7fa..dfb0bb8435f 100644
--- a/gcc/testsuite/gcc.target/i386/sse2-psraq-1.c
+++ b/gcc/testsuite/gcc.target/i386/sse2-psraq-1.c
@@ -41,7 +41,7 @@ TEST (void)
   V a = (V) { 0xdeadbeefcafebabeULL, 0x123456789abcdef0ULL };
   V b = (V) { 0x173a74be8a95134cULL, 0x817bae35ac0ebf12ULL };
   int i;
-  for (i = 0; tests[i].n; i++)
+  for (i = 0; i < ARRAY_SIZE (tests); i++)
 {
   V c = tests[i].fn (a);
   if (c[0] != a[0] >> tests[i].n || c[1] != a[1] >> tests[i].n)
diff --git a/gcc/testsuite/gcc.target/i386/sse4_2-check.h b/gcc/testsuite/gcc.target/i386/sse4_2-check.h
index d10e6c7d7e2..c33cd1b4986 100644
--- a/gcc/testsuite/gcc.target/i386/sse4_2-check.h
+++ b/gcc/testsuite/gcc.target/i386/sse4_2-check.h
@@ -1,6 +1,7 @@
 #include 
 #include 
 
+#include "m128-check.h"
 #include "cpuid.h"
 
 static void sse4_2_test (void);

--
2.34.1
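The loop change follows a general pattern: a loop that stops at a sentinel entry reads past the end of an array that has none, which is what ASAN reported here (presumably the tests[] tables have no terminating zero entry). The sketch below uses a made-up table and a hypothetical function to show the bounded form; only the ARRAY_SIZE definition is taken from the patch.

```cpp
#include <cassert>
#include <cstddef>

#ifndef ARRAY_SIZE
#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
#endif

// Hypothetical stand-in for a tests[] table: every entry has a non-zero
// shift count and there is no {0} terminator.
struct shift_test { int n; };
static const shift_test tests[] = { {1}, {7}, {63} };

// The old form, `for (i = 0; tests[i].n; i++)`, only stops at an entry
// with n == 0; with no such entry it reads past the end of tests[].
// Bounding the loop by ARRAY_SIZE visits exactly the real entries.
int
sum_shift_counts ()
{
  int sum = 0;
  for (std::size_t i = 0; i < ARRAY_SIZE (tests); i++)
    sum += tests[i].n;
  return sum;
}
```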



RE: [PATCH][GCC] aarch64: Add LS64 extension and intrinsics

2021-12-14 Thread Przemyslaw Wirkus via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: 14 December 2021 11:58
> To: Przemyslaw Wirkus 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH][GCC] aarch64: Add LS64 extension and intrinsics
> 
> Przemyslaw Wirkus  writes:
> > Hello Richard,
> >
> > I've updated my patch following all your comments. Thank you.
> >
> > Bootstrapped on aarch64-linux-gnu and all new ACLE tests pass.
> >
> > OK to install?
> 
> Thanks.  OK with a couple of formatting nits:

Updated and committed:

commit fdcddba8f29ea3878851b8b4cd37d0fd3476d3bf

Thank you!

> > @@ -2130,6 +2203,57 @@ aarch64_expand_builtin_tme (int fcode, tree exp, rtx target)
> >  return target;
> >  }
> >
> > +/* Function to expand an expression EXP which calls one of the Load/Store
> > +   64 Byte extension (LS64) builtins FCODE with the result going to
> > +   TARGET.  */
> > +static rtx
> > +aarch64_expand_builtin_ls64 (int fcode, tree exp, rtx target)
> > +{
> > +  expand_operand ops[3];
> > +
> > +  switch (fcode)
> > +{
> > +case AARCH64_LS64_BUILTIN_LD64B:
> > +  {
> > +rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
> > +create_output_operand (&ops[0], target, V8DImode);
> > +create_input_operand (&ops[1], op0, DImode);
> > +expand_insn (CODE_FOR_ld64b, 2, ops);
> > +return ops[0].value;
> > +  }
> > +case AARCH64_LS64_BUILTIN_ST64B:
> > +  {
> > +rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
> > +rtx op1 = expand_normal (CALL_EXPR_ARG (exp, 1));
> > +create_output_operand (&ops[0], op0, DImode);
> > +create_input_operand (&ops[1], op1, V8DImode);
> > +expand_insn (CODE_FOR_st64b, 2, ops);
> > +return const0_rtx;
> > +  }
> > +case AARCH64_LS64_BUILTIN_ST64BV:
> > +  {
> > +rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
> > +rtx op1 = expand_normal (CALL_EXPR_ARG (exp, 1));
> > +create_output_operand (&ops[0], target, DImode);
> > +create_input_operand (&ops[1], op0, DImode);
> > +create_input_operand (&ops[2], op1, V8DImode);
> > +expand_insn (CODE_FOR_st64bv, 3, ops);
> > +return ops[0].value;
> > +  }
> > +case AARCH64_LS64_BUILTIN_ST64BV0:
> > +  {
> > +rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
> > +rtx op1 = expand_normal (CALL_EXPR_ARG (exp, 1));
> > +create_output_operand (&ops[0], target, DImode);
> > +create_input_operand (&ops[1], op0, DImode);
> > +create_input_operand (&ops[2], op1, V8DImode);
> > +expand_insn (CODE_FOR_st64bv0, 3, ops);
> > +return ops[0].value;
> > +  }
> > +}
> > +
> > +gcc_unreachable ();
> 
> This line should be indented by 2 spaces rather than 4.
> 
> > +}
> > +
> >  /* Expand a random number builtin EXP with code FCODE, putting the result
> >     int TARGET.  If IGNORE is true the return value is ignored.  */
> >
> > […]
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index be24b7320d28deed9a19a0451c96bd67d2fb3104..e0ceba68968a28a9fcf1ba6e3a3036783b0931b0 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -10013,8 +10013,12 @@ aarch64_classify_address (struct aarch64_address_info *info,
> >  instruction memory accesses.  */
> >   if (mode == TImode || mode == TFmode)
> > return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
> > -   && (aarch64_offset_9bit_signed_unscaled_p (mode, offset)
> > -   || offset_12bit_unsigned_scaled_p (mode, offset)));
> > +   && (aarch64_offset_9bit_signed_unscaled_p (mode, offset)
> > +   || offset_12bit_unsigned_scaled_p (mode, offset)));
> 
> The original formatting was correct here.
> 
> > +
> > + if (mode == V8DImode)
> > +   return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
> > +   && aarch64_offset_7bit_signed_scaled_p (DImode, offset + 48));
> >
> >   /* A 7bit offset check because OImode will emit a ldp/stp
> >  instruction (only big endian will get here).

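For readers checking the new V8DImode case in aarch64_classify_address: an offset is accepted only if both `offset` and `offset + 48` pass the 7-bit signed scaled test for DImode, presumably because a 64-byte access is split into four LDP/STP of X-register pairs at +0, +16, +32 and +48, and only the first and last need checking. The C sketch below models that arithmetic under the assumption that the scaled check means "multiple of the 8-byte access size, with OFFSET / 8 in [-64, 63]"; the function names are hypothetical stand-ins, not GCC's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 7-bit signed, scaled offset check for an access
   of SIZE bytes: OFFSET must be a multiple of SIZE and OFFSET / SIZE must
   fit in the LDP/STP immediate range [-64, 63].  */
static bool
offset_7bit_signed_scaled_p (int64_t size, int64_t offset)
{
  if (offset % size != 0)
    return false;
  int64_t scaled = offset / size;
  return scaled >= -64 && scaled <= 63;
}

/* Mirror of the patch's V8DImode condition: the first pair (at OFFSET) and
   the last pair (at OFFSET + 48) must both be encodable with DImode scale.  */
static bool
v8di_offset_ok (int64_t offset)
{
  return offset_7bit_signed_scaled_p (8, offset)
         && offset_7bit_signed_scaled_p (8, offset + 48);
}
```

Under these assumptions the accepted offsets are the multiples of 8 from -512 to 456 inclusive; 464 fails because 464 + 48 = 512 scales to 64.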

[COMMITTED] MAINTAINERS: Add myself to write after approval

2021-12-14 Thread Marc Poulhiès via Gcc-patches
Changelog:

* MAINTAINERS: Add myself to write after approval.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b74db64c1a2..8afbda71888 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -574,6 +574,7 @@ Nicolas Pitre   

 Michael Ploujnikov 
 Paul Pluzhnikov
 Antoniu Pop
+Marc Poulhiès  
 Siddhesh Poyarekar 
 Vidya Praveen  
 Thomas Preud'homme 
-- 
2.25.1



[committed] [PR99531] Do not scan push insn for ia32 in the test

2021-12-14 Thread Vladimir Makarov via Gcc-patches

This is one more patch for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99531

The following patch fixes the test failure on ia32.

commit 4ddeae2b2777aa5136fc2bb21c15b0fcccdafece
Author: Vladimir N. Makarov 
Date:   Tue Dec 14 08:57:30 2021 -0500

[PR99531] Do not scan push insn for ia32 in the test

The patch stops scanning for push insns on ia32, as pushes are expected to be absent only for the x86_64 Linux ABI.

gcc/testsuite/ChangeLog:

PR target/99531
* gcc.target/i386/pr99531.c: Do not scan for ia32.

diff --git a/gcc/testsuite/gcc.target/i386/pr99531.c b/gcc/testsuite/gcc.target/i386/pr99531.c
index 0e1a08b7c77..98536452488 100644
--- a/gcc/testsuite/gcc.target/i386/pr99531.c
+++ b/gcc/testsuite/gcc.target/i386/pr99531.c
@@ -4,4 +4,4 @@
 int func(int, int, int, int, int, int);
 int caller(int a, int b, int c, int d, int e) { return func(0, a, b, c, d, e); }
 
-/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "push"  { target { ! ia32 } } } } */


Re: [PATCH] i386: Fix emissing of __builtin_cpu_supports.

2021-12-14 Thread Martin Liška

On 12/14/21 11:28, Jakub Jelinek wrote:

> Wouldn't this be better done only if field_val has the msb set


Yes, updated in the attached patch.


> and keep the CONVERT_EXPR otherwise (why isn't it NOP_EXPR?)?


Dunno, but I can prepare a separate patch (likely stage1 material,
right?). Note that there are other places that also use CONVERT_EXPR.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

From 227450e9f3a506fdfcff67aa45135fe31f3f91f6 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 13 Dec 2021 15:34:30 +0100
Subject: [PATCH] i386: Fix emissing of __builtin_cpu_supports.

	PR target/103661

gcc/ChangeLog:

	* config/i386/i386-builtins.c (fold_builtin_cpu): Compare to 0
	as API expects that non-zero values are returned (do that
	if mask == 31).
	For "avx512vbmi2" argument, we return now 1 << 31, which is a
	negative integer value.
---
 gcc/config/i386/i386-builtins.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
index 0fb14b55712..bca244fc011 100644
--- a/gcc/config/i386/i386-builtins.c
+++ b/gcc/config/i386/i386-builtins.c
@@ -2353,7 +2353,11 @@ fold_builtin_cpu (tree fndecl, tree *args)
   /* Return __cpu_model.__cpu_features[0] & field_val  */
   final = build2 (BIT_AND_EXPR, unsigned_type_node, array_elt,
 		  build_int_cstu (unsigned_type_node, field_val));
-  return build1 (CONVERT_EXPR, integer_type_node, final);
+  if (isa_names_table[i].feature == 31)
+	return build2 (NE_EXPR, integer_type_node, final,
+		   build_int_cst (unsigned_type_node, 0));
+  else
+	return build1 (CONVERT_EXPR, integer_type_node, final);
 }
   gcc_unreachable ();
 }
-- 
2.34.1

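GCC documents __builtin_cpu_supports as returning a positive integer when the feature is supported. The C sketch below shows why the old folding breaks that for a feature stored in the top bit, and why the NE_EXPR form does not. FEATURES and BIT are hypothetical stand-ins for __cpu_model.__cpu_features[0] and the feature's bit number; TOP_BIT is derived from the width of unsigned int rather than hard-coded as 31.

```c
#include <limits.h>

/* Top bit of unsigned int (31 when int is 32 bits wide).  */
#define TOP_BIT ((unsigned) (sizeof (unsigned) * CHAR_BIT - 1))

/* Old folding: mask the bit, then convert the unsigned result to int.  For
   the top bit the value does not fit in int; C makes that conversion
   implementation-defined, and GCC wraps it to a negative number.  */
static int
cpu_supports_convert (unsigned features, unsigned bit)
{
  return (int) (features & (1u << bit));
}

/* New folding: compare the masked value against zero, so the result is
   always 0 or 1 and never negative.  */
static int
cpu_supports_ne (unsigned features, unsigned bit)
{
  return (features & (1u << bit)) != 0;
}
```

The patch only switches to the comparison for the top bit and keeps the conversion elsewhere, which preserves the existing "value of the bit" results for all other features.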


Re: [PATCH] rs6000: __builtin_darn[_raw] should be in [power9-64] (PR103624)

2021-12-14 Thread Bill Schmidt via Gcc-patches
On 12/14/21 7:32 AM, Bill Schmidt wrote:
> Hi!
>
> On 12/13/21 6:22 PM, Segher Boessenkool wrote:
>>
>> These builtins should just return a "long", just like __builtin_ppc_mftb
>> does.  All three of them.
> Well, that seems wrong for __builtin_darn_32, which maps to an SImode pattern.
>
> So, I assume what you'd like to see is for the other two built-ins to return
> long, and for the "&& TARGET_64BIT" to be removed from the darn_raw and darn
> patterns?
>
For the record, I don't see how this can work.  When I compile:

#include 

long get_raw_random ()
{
  return __builtin_darn_raw ();
}

with these changes, the compiler thinks that __builtin_darn_raw returns a
register pair, presumably due to it being a DImode pattern.  It then pulls
the second register of the pair as the actual result.

get_raw_random:
.LFB0:
darn 10,2
mr 3,11
blr

The vregs dump shows:

(insn 5 2 6 2 (set (reg:DI 118)
(unspec_volatile:DI [
(const_int 0 [0])
] UNSPECV_DARN_RAW)) "darn-thing.c":11:10 1043 {darn_raw}
 (nil))
(insn 6 5 10 2 (set (reg:SI 117 [  ])
(subreg:SI (reg:DI 118) 4)) "darn-thing.c":11:10 543 {*movsi_internal1}
 (nil))
(insn 10 6 11 2 (set (reg/i:SI 3 3)
(reg:SI 117 [  ])) "darn-thing.c":12:1 543 {*movsi_internal1}
 (nil))
(insn 11 10 0 2 (use (reg/i:SI 3 3)) "darn-thing.c":12:1 -1
 (nil))

So if you want to support these patterns for 32-bit mode, there's more work
required.
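The dump accounts for the assembly: on 32-bit big-endian the DImode pseudo lives in the pair r10/r11 with the most significant word in r10, and (subreg:SI (reg:DI 118) 4) selects byte offset 4, the least significant word, which is r11. darn names only r10 as its destination, so `mr 3,11` returns a register the instruction never wrote. A minimal C model of that byte-offset selection follows, treating a uint64_t as the register pair; this is an illustrative assumption, not GCC code.

```c
#include <stdint.h>

/* Word selected by (subreg:SI (reg:DI) OFFSET) when the 64-bit VALUE is laid
   out big-endian, as in a 32-bit big-endian register pair: byte offset 0 is
   the most significant word (first register), byte offset 4 the least
   significant word (second register).  */
static uint32_t
be_subreg_si (uint64_t value, unsigned offset)
{
  return offset == 0 ? (uint32_t) (value >> 32) : (uint32_t) value;
}
```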

Given this, I'd like to ask you to reconsider the original submitted patch
for now.

Thanks,
Bill



[PATCH] testsuite: fix ASAN errors in i386.exp tests

2021-12-14 Thread Martin Liška

The patch fixes various tests in i386.exp that fail with:

make check -k RUNTESTFLAGS="i386.exp --target_board=unix/-fsanitize=address"

Survives i386.exp w/o sanitizer.

Ready for master?
Thanks,
Martin

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx2-i32gatherpd256-4.c: Fix ASAN errors.
* gcc.target/i386/avx2-i32gatherq256-4.c: Likewise.
* gcc.target/i386/avx2-i64gatherpd256-4.c: Likewise.
* gcc.target/i386/avx2-i64gatherq256-4.c: Likewise.
* gcc.target/i386/avx2-vpabsb256-2.c: Likewise.
* gcc.target/i386/avx2-vpabsd256-2.c: Likewise.
* gcc.target/i386/avx2-vpabsw256-2.c: Likewise.
* gcc.target/i386/avx256-unaligned-load-7.c: Likewise.
* gcc.target/i386/avx256-unaligned-store-7.c: Likewise.
* gcc.target/i386/pr64291-1.c: Likewise.
---
 .../gcc.target/i386/avx2-i32gatherpd256-4.c| 14 +-
 .../gcc.target/i386/avx2-i32gatherq256-4.c | 14 +-
 .../gcc.target/i386/avx2-i64gatherpd256-4.c| 14 +-
 .../gcc.target/i386/avx2-i64gatherq256-4.c | 14 +-
 gcc/testsuite/gcc.target/i386/avx2-vpabsb256-2.c   |  2 +-
 gcc/testsuite/gcc.target/i386/avx2-vpabsd256-2.c   |  2 +-
 gcc/testsuite/gcc.target/i386/avx2-vpabsw256-2.c   |  2 +-
 .../gcc.target/i386/avx256-unaligned-load-7.c  |  8 
 .../gcc.target/i386/avx256-unaligned-store-7.c |  4 ++--
 gcc/testsuite/gcc.target/i386/pr64291-1.c  |  2 +-
 10 files changed, 46 insertions(+), 30 deletions(-)
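Why the original arrays were too small: in avx2-i32gatherpd256-4.c the index for lane i is (16 - (i + 1) * 8) >> 1 with scale 2, so the lanes address byte offsets 8, 0, -8 and -16 from the base, i.e. elements +1, 0, -1 and -2. Lane 2 is enabled by the mask (i % 2 == 0), so even a mask-respecting reference reads s1[-1] when the base is s1 itself. Pointing the base at s1 + 8 inside a 16-element array keeps every lane in bounds. A small C sketch of that arithmetic (the helper names are hypothetical):

```c
#include <stdbool.h>

#define NLANES 4
#define SCALE 2 /* scale argument passed to _mm256_mask_i32gather_pd */

/* Element offset (in doubles) that lane I reads relative to the base
   pointer, using the index formula from the test.  */
static long
lane_elt_offset (int i)
{
  /* The test writes (16 - (i + 1) * 8) >> 1; the operand is always even,
     so dividing by 2 gives the same value without relying on the
     implementation-defined right shift of a negative int.  */
  int idx = (16 - (i + 1) * 8) / 2;
  long byte_off = (long) idx * SCALE;
  return byte_off / (long) sizeof (double);
}

/* True if every lane stays inside an ARRAY_LEN-element array when the
   gather base is element BASE of that array.  */
static bool
lanes_in_bounds (long array_len, long base)
{
  for (int i = 0; i < NLANES; i++)
    {
      long pos = base + lane_elt_offset (i);
      if (pos < 0 || pos >= array_len)
        return false;
    }
  return true;
}
```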

diff --git a/gcc/testsuite/gcc.target/i386/avx2-i32gatherpd256-4.c b/gcc/testsuite/gcc.target/i386/avx2-i32gatherpd256-4.c
index f24acbd7fe1..17b0c40e4d4 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-i32gatherpd256-4.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-i32gatherpd256-4.c
@@ -25,15 +25,19 @@ avx2_test (void)
   int i;
   union128i_d idx;
   union256d res, src, mask;
-  double s1[4], res_ref[4] = { 0 };
+  double s1[16], res_ref[4] = { 0 };
+  double *s1_ptr = s1 + 8;
 
-  for (i = 0; i < 4; ++i)
+  for (i = 0; i < ARRAY_SIZE (s1); i++)
 {
   /* Set some stuff */
   s1[i] = 2.718281828459045 * (i + 1) * (i + 2);
+}
 
+  for (i = 0; i < 4; ++i)
+{
   /* Set src as something different from s1 */
-  src.a[i] = -s1[i];
+  src.a[i] = -s1_ptr[i];
 
   /* Mask out evens */
   ((long long *) mask.a)[i] = i % 2 ? 0 : -1;
@@ -43,9 +47,9 @@ avx2_test (void)
   idx.a[i] = (16 - (i + 1) * 8) >> 1;
 }
 
-  res.x = _mm256_mask_i32gather_pd (src.x, s1, idx.x, mask.x, 2);
+  res.x = _mm256_mask_i32gather_pd (src.x, s1_ptr, idx.x, mask.x, 2);
 
-  compute_i32gatherpd256 (src.a, s1, idx.a, mask.a, 2, res_ref);
+  compute_i32gatherpd256 (src.a, s1_ptr, idx.a, mask.a, 2, res_ref);
 
   if (check_union256d (res, res_ref) != 0)
 abort ();
diff --git a/gcc/testsuite/gcc.target/i386/avx2-i32gatherq256-4.c b/gcc/testsuite/gcc.target/i386/avx2-i32gatherq256-4.c
index 3eab9be5c96..77ebf1fc198 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-i32gatherq256-4.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-i32gatherq256-4.c
@@ -25,15 +25,19 @@ avx2_test (void)
   long long i;
   union128i_d idx;
   union256i_q res, src, mask;
-  long long s1[4], res_ref[4] = { 0 };
+  long long s1[16], res_ref[4] = { 0 };
+  long long *s1_ptr = s1 + 8;
 
-  for (i = 0; i < 4; ++i)
+  for (i = 0; i < ARRAY_SIZE (s1); i++)
 {
   /* Set some stuff */
   s1[i] = 1983 * (i + 1) * (i + 2);
+}
 
+  for (i = 0; i < 4; ++i)
+{
   /* Set src as something different from s1 */
-  src.a[i] = -s1[i];
+  src.a[i] = -s1_ptr[i];
 
   /* Mask out evens */
   mask.a[i] = i % 2 ? 0 : -1;
@@ -44,10 +48,10 @@ avx2_test (void)
 }
 
   res.x = _mm256_mask_i32gather_epi64 (src.x,
-  (long long int *) s1,
+  (long long int *) s1_ptr,
   idx.x, mask.x, 2);
 
-  compute_i32gatherpd256 (src.a, s1, idx.a, mask.a, 2, res_ref);
+  compute_i32gatherpd256 (src.a, s1_ptr, idx.a, mask.a, 2, res_ref);
 
   if (check_union256i_q (res, res_ref) != 0)
 abort ();
diff --git a/gcc/testsuite/gcc.target/i386/avx2-i64gatherpd256-4.c b/gcc/testsuite/gcc.target/i386/avx2-i64gatherpd256-4.c
index 09a5f8a14e0..cf94256e1da 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-i64gatherpd256-4.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-i64gatherpd256-4.c
@@ -25,15 +25,19 @@ avx2_test (void)
   int i;
   union256i_q idx;
   union256d res, src, mask;
-  double s1[4], res_ref[4] = { 0 };
+  double s1[16], res_ref[4] = { 0 };
+  double *s1_ptr = s1 + 8;
 
-  for (i = 0; i < 4; ++i)
+  for (i = 0; i < ARRAY_SIZE (s1); i++)
 {
   /* Set some stuff */
   s1[i] = 2.718281828459045 * (i + 1) * (i + 2);
+}
 
+  for (i = 0; i < 4; ++i)
+{
   /* Set src as something different from s1 */
-  src.a[i] = -s1[i];
+  src.a[i] = -s1_ptr[i];
 
   /* Mask out even

Re: [PATCH] i386: Fix emissing of __builtin_cpu_supports.

2021-12-14 Thread Jakub Jelinek via Gcc-patches
On Tue, Dec 14, 2021 at 04:07:55PM +0100, Martin Liška wrote:
> On 12/14/21 11:28, Jakub Jelinek wrote:
> > Wouldn't this be better done only if field_val has the msb set
> 
> Yes, updated in the attached patch.
> 
> > and keep the CONVERT_EXPR otherwise (why isn't it NOP_EXPR?)?
> 
> Dunno, but I can prepare a separate patch (likely stage1 material,
> right)? Note that there are other places that also use CONVERT_EXPR.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin

> From 227450e9f3a506fdfcff67aa45135fe31f3f91f6 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Mon, 13 Dec 2021 15:34:30 +0100
> Subject: [PATCH] i386: Fix emissing of __builtin_cpu_supports.
> 
>   PR target/103661
> 
> gcc/ChangeLog:
> 
>   * config/i386/i386-builtins.c (fold_builtin_cpu): Compare to 0
>   as API expects that non-zero values are returned (do that
>   if mask == 31).
>   For "avx512vbmi2" argument, we return now 1 << 31, which is a
>   negative integer value.
> ---
>  gcc/config/i386/i386-builtins.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
> index 0fb14b55712..bca244fc011 100644
> --- a/gcc/config/i386/i386-builtins.c
> +++ b/gcc/config/i386/i386-builtins.c
> @@ -2353,7 +2353,11 @@ fold_builtin_cpu (tree fndecl, tree *args)
>/* Return __cpu_model.__cpu_features[0] & field_val  */
>final = build2 (BIT_AND_EXPR, unsigned_type_node, array_elt,
> build_int_cstu (unsigned_type_node, field_val));
> -  return build1 (CONVERT_EXPR, integer_type_node, final);
> +  if (isa_names_table[i].feature == 31)
> + return build2 (NE_EXPR, integer_type_node, final,
> +build_int_cst (unsigned_type_node, 0));
> +  else
> + return build1 (CONVERT_EXPR, integer_type_node, final);
>  }
>gcc_unreachable ();
>  }

I'd use INT_TYPE_SIZE - 1 instead of 31.  Otherwise LGTM.

Jakub



Re: [PATCH take #2] PR target/43892: Some carry flag (CA) optimizations on PowerPC.

2021-12-14 Thread David Edelsohn via Gcc-patches
Hi, Roger!

Thanks very much for investigating this issue and developing a patch
to leverage this feature of the PowerPC architecture.

2021-12-03  Roger Sayle  

gcc/ChangeLog
PR target/43892
* config/rs6000/rs6000.md (*add3_carry_in_0_2): New
define_insn to recognize commutative form of add3_carry_in_0.
(*add3_geu, *add3_leu, *subf3_carry_in_xx_subf,
*add3_carry_in_addc): New define_insn_and_split patterns.

It might be easier to read if each of the define_insn_and_split
ChangeLog entries were on a separate line and the latter ones said
"Same" or "Likewise", but up to you. Segher can be more pedantic.

gcc/testsuite/ChangeLog
PR target/43892
* gcc.target/powerpc/addcmp.c: New test case.
* gcc.target/powerpc/pr43892.c: New test case.
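For context on the kind of source these patterns target: a carry-out written as an unsigned comparison, such as `sum < w` right after `sum += w`, is mathematically the carry bit of that addition, which is what PowerPC's CA-based instructions (addc, adde, addze) compute directly. The loop below is an illustrative example of that idiom (a ones'-complement style sum with end-around carry), not the testcase from PR target/43892.

```c
#include <stddef.h>
#include <stdint.h>

/* Add each word and fold the carry-out back into the sum.  (sum < words[i])
   is exactly the unsigned carry of the preceding addition; after a wrap
   sum <= words[i] - 1, so adding it back cannot overflow again.  */
static uint32_t
sum_with_end_around_carry (const uint32_t *words, size_t n)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      sum += words[i];
      sum += (sum < words[i]);
    }
  return sum;
}
```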

This patch is okay.

Thanks, David


Re: [Patch]Enable -Wuninitialized + -ftrivial-auto-var-init for address taken variables

2021-12-14 Thread Qing Zhao via Gcc-patches
Hi,

> On Dec 9, 2021, at 12:13 PM, Qing Zhao via Gcc-patches 
>  wrote:
>> 
>>> + return;
>>> +
>>> + /* Get the variable declaration location from the def_stmt.  */
>>> + var_decl_loc = gimple_location (def_stmt);
>>> +
>>> + /* The LHS of the call is a temporary variable, we use it as a
>>> +placeholder to record the information on whether the warning
>>> +has been issued or not.  */
>>> + repl_var = gimple_call_lhs (def_stmt);
>>> +   }
>>> }
>>> -  if (var == NULL_TREE)
>>> +  if (var == NULL_TREE && var_name == NULL_TREE)
>>> return;
>>> /* Avoid warning if we've already done so or if the warning has been
>>> @@ -207,36 +245,56 @@ warn_uninit (opt_code opt, tree t, tree var, const char *gmsgid,
>>>   if (((warning_suppressed_p (context, OPT_Wuninitialized)
>>> || (gimple_assign_single_p (context)
>>> && get_no_uninit_warning (gimple_assign_rhs1 (context)
>>> -  || get_no_uninit_warning (var))
>>> +  || (var && get_no_uninit_warning (var))
>>> +  || (repl_var && get_no_uninit_warning (repl_var)))
>>> return;
>>> /* Use either the location of the read statement or that of the PHI
>>>  argument, or that of the uninitialized variable, in that order,
>>>  whichever is valid.  */
>>> -  location_t location;
>>> +  location_t location = UNKNOWN_LOCATION;
>>>   if (gimple_has_location (context))
>>> location = gimple_location (context);
>>>   else if (phi_arg_loc != UNKNOWN_LOCATION)
>>> location = phi_arg_loc;
>>> -  else
>>> +  else if (var)
>>> location = DECL_SOURCE_LOCATION (var);
>>> +  else if (var_name)
>>> +location = var_decl_loc;
>>> +
>>>   location = linemap_resolve_location (line_table, location,
>>>LRK_SPELLING_LOCATION, NULL);
>>> auto_diagnostic_group d;
>>> -  if (!warning_at (location, opt, gmsgid, var))
>>> +  char *gmsgid_final = XNEWVEC (char, strlen (gmsgid) + 5);
>>> +  gmsgid_final[0] = 0;
>>> +  if (var)
>>> +strcat (gmsgid_final, "%qD ");
>>> +  else if (var_name)
>>> +strcat (gmsgid_final, "%qs ");
>>> +  strcat (gmsgid_final, gmsgid);
>>> +
>>> +  if (var && !warning_at (location, opt, gmsgid_final, var))
>>> +return;
>>> +  else if (var_name && !warning_at (location, opt, gmsgid_final, var_name_str))
>>> return;
>> 
>> Dynamically creating the string seems quite cumbersome here, and
>> it leaks the allocated block.  I wonder if it might be better to
>> remove the gmsgid argument from the function and assign it to
>> one of the literals based on the other arguments.
>> 
>> Since only one of var and var_name is used, I also wonder if
>> the %qs form could be used for both to simplify the overall
>> logic.  (I.e., get the IDENTIFIER_POINTER string from var and
>> use it instead of %qD).
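A minimal sketch of the first suggestion, assuming snprintf as a stand-in for warning_at and a plain %s in place of GCC's %qD/%qs directives: the message is one of a few fixed literals selected from the arguments, so no buffer is allocated and nothing can leak. The function name and the MAYBE flag are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Format an uninitialized-use warning for NAME into BUF.  Each branch passes
   a string literal as the format, instead of building one at run time with
   XNEWVEC and strcat.  */
static int
format_uninit_warning (char *buf, size_t len, const char *name, int maybe)
{
  if (maybe)
    return snprintf (buf, len, "'%s' may be used uninitialized", name);
  return snprintf (buf, len, "'%s' is used uninitialized", name);
}
```

Keeping each format a literal also lets -Wformat check the arguments against it.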

It looks like using “%qs” with the IDENTIFIER_POINTER string from var does
not work very well for the following test case:

  1 /* PR tree-optimization/45083 */
  2 /* { dg-do compile } */
  3 /* { dg-options "-O2 -Wuninitialized" } */
  4 
  5 struct S { char *a; unsigned b; unsigned c; };
  6 extern int foo (const char *);
  7 extern void bar (int, int);
  8 
  9 static void
 10 baz (void)
 11 {
 12   struct S cs[1];   /* { dg-message "was declared here" } */
 13   switch (cs->b)/* { dg-warning "cs\[^\n\r\]*\\.b\[^\n\r\]*is used uninitialized" } */
 14 {
 15 case 101:
 16   if (foo (cs->a))  /* { dg-warning "cs\[^\n\r\]*\\.a\[^\n\r\]*may be used uninitialized" } */
 17 bar (cs->c, cs->b); /* { dg-warning "cs\[^\n\r\]*\\.c\[^\n\r\]*may be used uninitialized" } */
 18 }
 19 }
 20 
 21 void
 22 test (void)
 23 {
 24   baz ();
 25 }


For the uninitialized uses at lines 13, 16, and 17, the IDENTIFIER_POINTER strings
of var are:
cs$0$b, cs$0$a, cs$0$c

With %qD, they are printed as cs[0].b, cs[0].a, cs[0].c,
but with %qs, they are printed as cs$0$b, cs$0$a, cs$0$c.

It looks like %qD does not simply print the IDENTIFIER_POINTER string
directly; it handles some cases specially.

I tried to find how %qD handles these strings specially, but haven't figured it out so far.

Do you know where %qD handles this case specially?

Thanks.

Qing


> Both are good suggestions, I will try to update the code based on this.
> 
> Thanks again.
> 
> Qing
>> 
> 



Re: [PATCH] Remove an invalid assert. [PR103619]

2021-12-14 Thread Jakub Jelinek via Gcc-patches
On Thu, Dec 09, 2021 at 05:32:02PM +, Hafiz Abid Qadeer wrote:
> Commit 13b6c7639cf assumed that registers in a span will be in a certain
> order. But that assumption is not true at least for the big endian targets.
> Currently amdgcn is probably the only target where CFA is split into multiple
> registers, so build_span_loc only gets called for it. However, the
> dwf_cfa_reg function where this ICE was seen can be called for any
> architecture from the comparison dwf_cfa_reg (src) == cur_cfa->reg in
> dwarf2out_frame_debug_expr. So dwf_cfa_reg should not assume certain
> order of registers.
> 
> I was tempted to modify the assert to handle big-endian cases but that will
> still be error prone and may fail on some other targets.

Do you have a preprocessed testcase on which the ICE triggers on ARM EB?
I think I'd like to see what we were emitting in debug info for it before
r12-5833 and what we emit with this assert removed after it.
I assumed it wouldn't affect anything but GCN during the review.
Seems the arm span hook for EB will swap pairs in the list:
  for (i = 0; i < nregs; i += 2)
    if (TARGET_BIG_END)
      {
        parts[i] = gen_rtx_REG (SImode, regno + i + 1);
        parts[i + 1] = gen_rtx_REG (SImode, regno + i);
      }
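A small C model of why that swap trips the removed assert: plain ints stand in for the REG rtxes, the big-endian branch copies the loop quoted above, and the little-endian branch (ascending order) is an assumption since only the EB branch is quoted. The check assumes result.reg corresponds to the first span element, so the assert amounts to "element i is element 0 plus i".

```c
#include <stdbool.h>

/* Fill PARTS with the register numbers of an NREGS-register span starting
   at REGNO (NREGS even).  The big-endian branch swaps each pair, as in the
   quoted ARM hook; the little-endian branch is an assumed ascending order.  */
static void
build_span (int *parts, int nregs, int regno, bool big_endian)
{
  for (int i = 0; i < nregs; i += 2)
    if (big_endian)
      {
        parts[i] = regno + i + 1;
        parts[i + 1] = regno + i;
      }
    else
      {
        parts[i] = regno + i;
        parts[i + 1] = regno + i + 1;
      }
}

/* The removed gcc_assert, modelled as: element I is element 0 plus I.  */
static bool
span_matches_removed_assert (const int *parts, int nregs)
{
  for (int i = 0; i < nregs; i++)
    if (parts[i] != parts[0] + i)
      return false;
  return true;
}
```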

BTW, I wonder about those
  dw_stack_pointer_regnum = dwf_cfa_reg (gen_rtx_REG (Pmode,
  STACK_POINTER_REGNUM));
and
  dw_frame_pointer_regnum
= dwf_cfa_reg (gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM));
Can't those use
  dw_stack_pointer_regnum = dwf_cfa_reg (stack_pointer_rtx);
and
  dw_frame_pointer_regnum = dwf_cfa_reg (hard_frame_pointer_rtx);
?

> gcc/ChangeLog:
> 
>   PR debug/103619
>   * dwarf2cfi.c (dwf_cfa_reg): Remove gcc_assert.
> ---
>  gcc/dwarf2cfi.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
> index 9dd1dfe71b7..55ae172eda2 100644
> --- a/gcc/dwarf2cfi.c
> +++ b/gcc/dwarf2cfi.c
> @@ -1136,7 +1136,6 @@ dwf_cfa_reg (rtx reg)
> gcc_assert (GET_MODE_SIZE (GET_MODE (XVECEXP (span, 0, i)))
> .to_constant ()  == result.span_width);
> gcc_assert (REG_P (XVECEXP (span, 0, i)));
> -   gcc_assert (dwf_regno (XVECEXP (span, 0, i)) == result.reg + i);
>   }
>   }
>  }

Jakub



Re: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)

2021-12-14 Thread David Edelsohn via Gcc-patches
On Fri, Nov 5, 2021 at 2:01 PM Michael Meissner  wrote:
>
> On Fri, Nov 05, 2021 at 12:52:51PM -0500, will schmidt wrote:
> > > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> > > index 956e42bc514..e0d1c718e9f 100644
> > > --- a/gcc/config/rs6000/predicates.md
> > > +++ b/gcc/config/rs6000/predicates.md
> > > @@ -601,6 +601,14 @@ (define_predicate "easy_fp_constant"
> > >if (TARGET_VSX && op == CONST0_RTX (mode))
> > >  return 1;
> > >
> > > +  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
> >
> > Easy is relative, but OK.
>
> The name of the function is easy_fp_constant.
>
> > > +  vec_const_128bit_type vsx_const;
> > > +  if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
> > > +{
> > > +  if (constant_generates_lxvkq (&vsx_const) != 0)
> > > +   return true;
> > > +}
> > > +
> > >/* Otherwise consider floating point constants hard, so that the
> > >   constant gets pushed to memory during the early RTL phases.  This
> > >   has the advantage that double precision constants that can be
> > > @@ -609,6 +617,23 @@ (define_predicate "easy_fp_constant"
> > > return 0;
> > >  })
> > >
> > > +;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
> > > +;; via the LXVKQ instruction.
> > > +
> > > +(define_predicate "easy_vector_constant_ieee128"
> > > +  (match_code "const_vector,const_double")
> > > +{
> > > +  vec_const_128bit_type vsx_const;
> > > +
> > > +  /* Can we generate the LXVKQ instruction?  */
> > > +  if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10
> > > +  || !TARGET_VSX)
> > > +return false;
> >
> > Presumably all of the checks there are valid.  (Can we have power10
> > without float128_hw or ieee128_constant flags set?)  I do notice the
> > addition of an ieee128_constant flag below.
>
> Yes, we can have power10 without float128_hw.  At the moment, 32-bit big endian
> does not enable the 128-bit IEEE instructions.  Also when we are building the
> bits in libgcc that can switch between compiling the software routines and the
> routines used for IEEE hardware, and when we are building the IEEE 128-bit
> software emulation functions we need to explicitly turn off IEEE 128-bit
> hardware support.
>
> Similarly for VSX, if the user explicitly says -mno-vsx, then we can't enable
> this instruction.
>
> > Ok.  I did look at this a bit before it clicked, so would suggest a
> > comment something like "All of the constants that can be loaded by lxvkq will have
> > zero in the bottom 3 words, so ensure those are zero before we use a
> > switch based on the nonzero portion of the constant."
> >
> > It would be fine as-is too.  :-)
>
> Ok.
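Will's suggested comment describes a two-step filter: reject anything with nonzero low words, then switch on the high word. A minimal C sketch of that shape follows. The two high-word values in the switch (the IEEE binary128 encodings of +1.0 and +2.0) are illustrative assumptions only; the real set of LXVKQ constants should be taken from the ISA 3.1 description of the instruction and the patch's constant_generates_lxvkq.

```c
#include <stdbool.h>
#include <stdint.h>

/* WORDS holds a 128-bit constant as four 32-bit words, most significant
   first (big-endian order, as vec_const_128bit_type keeps it).  */
static bool
maybe_lxvkq_constant (const uint32_t words[4])
{
  /* Every candidate has zeros in the low three words, so reject anything
     else before switching on the nonzero high word.  */
  if (words[1] != 0 || words[2] != 0 || words[3] != 0)
    return false;

  switch (words[0])
    {
    case 0x3FFF0000u: /* binary128 +1.0 (illustrative entry) */
    case 0x40000000u: /* binary128 +2.0 (illustrative entry) */
      return true;
    default:
      return false;
    }
}
```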

Okay.

Thanks, David


Re: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)

2021-12-14 Thread David Edelsohn via Gcc-patches
On Fri, Nov 5, 2021 at 2:13 PM Michael Meissner  wrote:
>
> On Fri, Nov 05, 2021 at 12:01:43PM -0500, will schmidt wrote:
> > On Fri, 2021-11-05 at 00:04 -0400, Michael Meissner wrote:
> > > Add new constant data structure.
> > >
> > > This patch provides the data structure and function to convert a constant
> > > (CONST_INT, CONST_DOUBLE, CONST_VECTOR, or VEC_DUPLICATE of a constant) to
> > > an array of bytes, half-words, words, and double words that can be loaded
> > > into a 128-bit vector register.
> > >
> > > The next patches will use this data structure to generate code that
> > > loads the vector/floating point registers using the XXSPLTIDP,
> > > XXSPLTIW, and LXVKQ instructions that were added in power10.
> > >
> > > 2021-11-05  Michael Meissner  
> > >
>
> Whoops, it should be meiss...@linux.ibm.com.
>
> > comment to be explicit on the structure name being copied to/from.
> > (vec_const_128bit_type is easy to search for, vector or constant or
> > structure are not as unique)
>
> Yes, the original name was more generic (rs6000_const).  Originally it could
> potentially handle vector constants that were greater than 128-bits if we ever
> have support for larger vectors.  But I thought that extra generality hindered
> the code (since you had to check whether the size was exactly 128-bits, etc.).
> So I made the data structure tailored to the problem at hand.
>
> > > +
> > > +/* Copy an floating point constant to the vector constant structure.  */
> > > +
> >
> > s/an/a/
>
> Ok.
>
> > > +static void
> > > +constant_fp_to_128bit_vector (rtx op,
> > > + machine_mode mode,
> > > + size_t byte_num,
> > > + vec_const_128bit_type *info)
> > > +{
> > > +  unsigned bitsize = GET_MODE_BITSIZE (mode);
> > > +  unsigned num_words = bitsize / 32;
> > > +  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
> > > +  long real_words[VECTOR_128BIT_WORDS];
> > > +
> > > +  /* Make sure we don't overflow the real_words array and that it is
> > > + filled completely.  */
> > > +  gcc_assert (num_words <= VECTOR_128BIT_WORDS && (bitsize % 32) == 0);
> >
> > Not clear to me on the potential to partially fill the real_words
> > array.
>
> At the moment we don't support a 16-bit floating point type in the compiler
> (the Power10 has limited 16-bit floating point support, but we don't make a
> special type for it).  If/when we add the 16-bit floating point, we will
> possibly need to revisit this.
>
> > > +
> > > +  real_to_target (real_words, rtype, mode);
> > > +
> > > +  /* Iterate over each 32-bit word in the floating point constant.  The
> > > + real_to_target function puts out words in endian fashion.  We need
> >
> > Meaning host-endian fashion, or is that meant to be big-endian?
>
> Real_to_target puts out the 32-bit values in endian fashion.  This data
> structure wants to hold everything in big endian fashion to make checking
> things simpler.
>
> > Perhaps also rephrase or move the comment up to indicate that
> > real_to_target will have placed or has already placed the words in
> >  endian fashion.
> > As stated I was expecting to see a call to real_to_target() below the
> > comment.
>
> Yes, I probably should move the real_to_target call after the comment.
>
> > > +
> > > +  /* Possibly splat the constant to fill a vector size.  */
> >
> >
> > Suggest "Splat the constant to fill a vector size if ..."
>
> Ok.
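The two review points above (real_to_target's word order versus the structure's big-endian layout, and splatting a smaller constant across the vector) can be illustrated with a small C sketch. It assumes the words arrive either most- or least-significant first, selected by a hypothetical words_le flag, and that the constant size divides 16; the helper names are hypothetical and this is not the patch's constant_fp_to_128bit_vector.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VECTOR_128BIT_BYTES 16

/* Store NUM_WORDS 32-bit words into BYTES in big-endian order starting at
   BYTE_NUM.  WORDS_LE says whether WORDS arrive least-significant word
   first (an assumed stand-in for the target's word order).  */
static void
words_to_be_bytes (const uint32_t *words, size_t num_words, bool words_le,
                   uint8_t *bytes, size_t byte_num)
{
  for (size_t w = 0; w < num_words; w++)
    {
      /* Pick words so the most significant one is stored first.  */
      uint32_t word = words[words_le ? num_words - 1 - w : w];
      for (int b = 0; b < 4; b++)
        bytes[byte_num + 4 * w + b] = (uint8_t) (word >> (24 - 8 * b));
    }
}

/* Replicate the first SIZE bytes across the whole 16-byte vector.
   SIZE is assumed to divide 16.  */
static void
splat_bytes (uint8_t *bytes, size_t size)
{
  if (size == 0)
    return;
  for (size_t i = size; i < VECTOR_128BIT_BYTES; i += size)
    memcpy (bytes + i, bytes, size);
}
```

For example, DFmode 1.0 is 0x3FF0000000000000; supplied low word first as {0, 0x3FF00000} with words_le set, it is stored as 3F F0 00 00 00 00 00 00 and then splatted into both doublewords.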

Okay.

Thanks, David


Re: [PATCH 3/5] Add Power10 XXSPLTIW

2021-12-14 Thread David Edelsohn via Gcc-patches
On Fri, Nov 5, 2021 at 2:50 PM will schmidt  wrote:
>
> On Fri, 2021-11-05 at 00:09 -0400, Michael Meissner wrote:
> > Generate XXSPLTIW on power10.
> >
>
> Hi,
>
>
> > This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
> > instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
> > adding support for vector constants that can be used, and adding a
> > VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
> >
> > The eP constraint was added to recognize constants that can be loaded into
> > vector registers with a single prefixed instruction.
>
> Perhaps swap "... the eP constraint was added ..." for "Add the eP
> constraint to ..."
>
>
> >
> > I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
> > constants.
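The basic condition for loading a vector constant with XXSPLTIW is that its 16-byte image is one 32-bit word repeated four times; for V8HI that means every halfword is equal, or the halfwords alternate between two values. A minimal C sketch, assuming a big-endian byte image like the one vec_const_128bit_type holds; it ignores the patch's extra heuristics (for example preferring XXSPLTIB where that applies) and is not the patch's constant_generates_xxspltiw.

```c
#include <stdbool.h>
#include <stdint.h>

/* BYTES is a 128-bit constant in big-endian byte order.  Return true and
   set *WORD if all four 32-bit words are identical.  */
static bool
splat_word_p (const uint8_t bytes[16], uint32_t *word)
{
  for (int i = 4; i < 16; i++)
    if (bytes[i] != bytes[i % 4])
      return false;
  *word = (uint32_t) bytes[0] << 24 | (uint32_t) bytes[1] << 16
          | (uint32_t) bytes[2] << 8 | (uint32_t) bytes[3];
  return true;
}
```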
>
>
> >
> > 2021-11-05  Michael Meissner  
> >
> > gcc/
> >
> >   * config/rs6000/constraints.md (eP): Update comment.
> >   * config/rs6000/predicates.md (easy_fp_constant): Add support for
> >   generating XXSPLTIW.
> >   (vsx_prefixed_constant): New predicate.
> >   (easy_vector_constant): Add support for
> >   generating XXSPLTIW.
> >   * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
> >   declaration.
> >   (constant_generates_xxspltiw): Likewise.
> >   * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
> >   XXSPLTIW, don't do XXSPLTIB and sign extend.
>
> Perhaps just 'generate XXSPLTIW if possible'.
>
> >   (output_vec_const_move): Add support for XXSPLTIW.
> >   (prefixed_xxsplti_p): New function.
> >   (constant_generates_xxspltiw): New function.
> >   * config/rs6000/rs6000.md (prefixed attribute): Add support to
> >   mark XXSPLTI* instructions as being prefixed.
> >   * config/rs6000/rs6000.opt (-msplat-word-constant): New debug
> >   switch.
> >   * config/rs6000/vsx.md (vsx_mov_64bit): Add support for
> >   generating XXSPLTIW or XXSPLTIDP.
> >   (vsx_mov_32bit): Likewise.
> >   * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
> >   eP constraint.
> >
> > gcc/testsuite/
> >
> >   * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
> >   * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
> > ---
> >  gcc/config/rs6000/constraints.md  |  6 ++
> >  gcc/config/rs6000/predicates.md   | 46 ++-
> >  gcc/config/rs6000/rs6000-protos.h |  2 +
> >  gcc/config/rs6000/rs6000.c| 81 +++
> >  gcc/config/rs6000/rs6000.md   |  5 ++
> >  gcc/config/rs6000/rs6000.opt  |  4 +
> >  gcc/config/rs6000/vsx.md  | 28 +++
> >  gcc/doc/md.texi   |  4 +
> >  .../powerpc/vec-splat-constant-v16qi.c| 27 +++
> >  .../powerpc/vec-splat-constant-v4sf.c | 67 +++
> >  .../powerpc/vec-splat-constant-v4si.c | 51 
> >  .../powerpc/vec-splat-constant-v8hi.c | 62 ++
> >  .../gcc.target/powerpc/vec-splati-runnable.c  |  4 +-
> >  13 files changed, 369 insertions(+), 18 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
> >
> > diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> > index e72132b4c28..a4b05837fa6 100644
> > --- a/gcc/config/rs6000/constraints.md
> > +++ b/gcc/config/rs6000/constraints.md
> > @@ -213,6 +213,12 @@ (define_constraint "eI"
> >"A signed 34-bit integer constant if prefixed instructions are supported."
> >(match_operand 0 "cint34_operand"))
> >
> > +;; A SF/DF scalar constant or a vector constant that can be loaded into vector
> > +;; registers with one prefixed instruction such as XXSPLTIDP or XXSPLTIW.
> > +(define_constraint "eP"
> > +  "A constant that can be loaded into a VSX register with one prefixed insn."
> > +  (match_operand 0 "vsx_prefixed_constant"))
> > +
> >  ;; A TF/KF scalar constant or a vector constant that can load certain IEEE
> >  ;; 128-bit constants into vector registers using LXVKQ.
> >  (define_constraint "eQ"
> > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> > index e0d1c718e9f..ed6252bd0c4 100644
> > --- a/gcc/config/rs6000/predicates.md
> > +++ b/gcc/config/rs6000/predicates.md
> > @@ -605,7 +605,10 @@ (define_predicate "easy_fp_constant"
> >vec_const_128bit_type vsx_const;
> >if (TARGET_POWER10 && v

Re: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants

2021-12-14 Thread David Edelsohn via Gcc-patches
On Fri, Nov 5, 2021 at 3:24 PM will schmidt  wrote:
>
> On Fri, 2021-11-05 at 00:10 -0400, Michael Meissner wrote:
> > Generate XXSPLTIDP for vectors on power10.
> >
> > This patch implements XXSPLTIDP support for all vector constants.  The
> > XXSPLTIDP instruction is given a 32-bit immediate that is converted to a vector
> > of two DFmode constants.  The immediate is in SFmode format, so only constants
> > that fit as SFmode values can be loaded with XXSPLTIDP.
> >
> > The constraint (eP) added in the previous patch for XXSPLTIW is also used
> > for XXSPLTIDP.
> >
>
> ok
>
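The SFmode-fit rule above can be sketched in C, assuming the 32-bit immediate is simply the IEEE binary32 bit pattern of the value and that a constant qualifies when converting it to float and back reproduces it exactly. The helper name `xxspltidp_immediate` is hypothetical; GCC's real `constant_generates_xxspltidp` works on a `vec_const_128bit_type` and may apply further restrictions (for example on special values) that this sketch does not model.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (assumes IEEE binary32 float and binary64 double):
   if D is exactly representable as a float, store that float's 32-bit
   pattern in *IMM and return true.  Splatting the widened value into both
   doublewords would then recreate D in each element.  */
static bool
xxspltidp_immediate (double d, uint32_t *imm)
{
  if (imm == NULL || isnan (d))
    return false;  /* NaNs are rejected here for simplicity.  */

  /* Finite values outside float's range cannot fit; checking first also
     avoids an out-of-range conversion.  Infinities convert exactly.  */
  if (!isinf (d) && (d > FLT_MAX || d < -FLT_MAX))
    return false;

  float f = (float) d;
  if ((double) f != d)
    return false;  /* needs more than float's 24-bit significand */

  memcpy (imm, &f, sizeof *imm);
  return true;
}
```

For example, 1.0 yields the immediate 0x3F800000, while 0.1 is rejected because its binary64 value has no exact binary32 counterpart.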
>
> > DImode scalar constants are not handled.  This is because the majority of DImode
> > constants will be in GPRs.  With vector registers, you have the
> > problem that XXSPLTIDP splats the double word into both elements of the
> > vector.  However, if TImode is loaded with an integer constant, it wants a full
> > 128-bit constant.
>
> This may be worth adding as a todo somewhere in the code.
>
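The TImode point can be illustrated with a small C model, assuming a 128-bit constant is viewed as its high and low 64-bit doublewords: a doubleword splat writes the same value into both halves, so a V2DI splat fits, but a typical TImode integer such as 1 (upper half zero, lower half one) does not. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 128-bit constant held as its high and low 64-bit doublewords.  */
struct const128
{
  uint64_t hi;
  uint64_t lo;
};

/* A doubleword splat writes one 64-bit value into both halves, so the
   only 128-bit constants it can reproduce have identical halves.  */
static bool
dw_splat_can_produce (struct const128 c)
{
  return c.hi == c.lo;
}

/* Zero-extend a 64-bit integer to the 128-bit value a TImode integer
   constant would hold (non-negative values only in this sketch).  */
static struct const128
timode_from_u64 (uint64_t v)
{
  struct const128 c = { 0, v };
  return c;
}
```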
> >
> > SFmode and DFmode scalar constants are not handled in this patch.  The
> > support for those constants will be in the next patch.
>
> ok
>
> >
> > I have added a temporary switch (-msplat-float-constant) to control whether or
> > not the XXSPLTIDP instruction is generated.
> >
> > I added 2 new tests to test loading up V2DI and V2DF vector constants.
>
>
>
>
> >
> > 2021-11-05  Michael Meissner  
> >
> > gcc/
> >
> >   * config/rs6000/predicates.md (easy_fp_constant): Add support for
> >   generating XXSPLTIDP.
> >   (vsx_prefixed_constant): Likewise.
> >   (easy_vector_constant): Likewise.
> >   * config/rs6000/rs6000-protos.h (constant_generates_xxspltidp):
> >   New declaration.
> >   * config/rs6000/rs6000.c (output_vec_const_move): Add support for
> >   generating XXSPLTIDP.
> >   (prefixed_xxsplti_p): Likewise.
> >   (constant_generates_xxspltidp): New function.
> >   * config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.
> >
> > gcc/testsuite/
> >
> >   * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
> >   regex for power10.
> >   * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
> > ---
>
>
> ok
>
> >  gcc/config/rs6000/predicates.md   |   9 ++
> >  gcc/config/rs6000/rs6000-protos.h |   1 +
> >  gcc/config/rs6000/rs6000.c| 108 ++
> >  gcc/config/rs6000/rs6000.opt  |   4 +
> >  .../powerpc/pr86731-fwrapv-longlong.c |   9 +-
> >  .../powerpc/vec-splat-constant-v2df.c |  64 +++
> >  .../powerpc/vec-splat-constant-v2di.c |  50 
> >  7 files changed, 241 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
> >
> > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> > index ed6252bd0c4..d748b11857c 100644
> > --- a/gcc/config/rs6000/predicates.md
> > +++ b/gcc/config/rs6000/predicates.md
> > @@ -610,6 +610,9 @@ (define_predicate "easy_fp_constant"
> >
> >if (constant_generates_xxspltiw (&vsx_const))
> >   return true;
> > +
> > +  if (constant_generates_xxspltidp (&vsx_const))
> > + return true;
> >  }
> >
> >/* Otherwise consider floating point constants hard, so that the
> > @@ -653,6 +656,9 @@ (define_predicate "vsx_prefixed_constant"
> >if (constant_generates_xxspltiw (&vsx_const))
> >  return true;
> >
> > +  if (constant_generates_xxspltidp (&vsx_const))
> > +return true;
> > +
> >return false;
> >  })
> >
> > @@ -727,6 +733,9 @@ (define_predicate "easy_vector_constant"
> >
> > if (constant_generates_xxspltiw (&vsx_const))
> >   return true;
> > +
> > +   if (constant_generates_xxspltidp (&vsx_const))
> > + return true;
> >   }
>
>
> ok
>
> >
> >if (TARGET_P9_VECTOR
> > diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> > index 99c6a671289..2d28df7442d 100644
> > --- a/gcc/config/rs6000/rs6000-protos.h
> > +++ b/gcc/config/rs6000/rs6000-protos.h
> > @@ -253,6 +253,7 @@ extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
> >  vec_const_128bit_type *);
> >  extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
> >  extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *);
> > +extern unsigned constant_generates_xxspltidp (vec_const_128bit_type *);
> >  #endif /* RTX_CODE */
> >
> >  #ifdef TREE_CODE
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index be24f56eb31..8fde48cf2b3 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -7012,6 +7012,13 @@ output_vec_const_move (rtx 
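With this patch and the previous one, the predicate hunks above try XXSPLTIW before XXSPLTIDP. A hypothetical C model of that ordering, assuming the 128-bit constant is given as two 64-bit doublewords and that `double` is IEEE binary64. The real predicates operate on `vec_const_128bit_type`, also consider other sequences (such as LXVKQ or cheaper non-prefixed instructions), and may reject cases this model accepts.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum splat_insn { SPLAT_NONE, SPLAT_XXSPLTIW, SPLAT_XXSPLTIDP };

/* True if the 64-bit pattern DW, read as a double, is exactly
   representable as a float.  */
static bool
dw_fits_sfmode (uint64_t dw)
{
  double d;
  memcpy (&d, &dw, sizeof d);
  if (isnan (d))
    return false;
  if (!isinf (d) && (d > FLT_MAX || d < -FLT_MAX))
    return false;
  return (double) (float) d == d;
}

/* Hypothetical classifier, mirroring the predicate order above.  */
static enum splat_insn
classify_const128 (uint64_t dw0, uint64_t dw1)
{
  if (dw0 != dw1)
    return SPLAT_NONE;  /* neither instruction can produce unequal halves */

  /* XXSPLTIW: all four 32-bit words are equal.  */
  if ((uint32_t) (dw0 >> 32) == (uint32_t) dw0)
    return SPLAT_XXSPLTIW;

  /* XXSPLTIDP: both doublewords hold a double that fits in SFmode.  */
  if (dw_fits_sfmode (dw0))
    return SPLAT_XXSPLTIDP;

  return SPLAT_NONE;
}
```

For example, a V2DF splat of 1.0 (0x3FF0000000000000 in each doubleword) lands on XXSPLTIDP because its 32-bit words differ, while a V4SI splat of 7 is caught first by XXSPLTIW.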

Re: [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.

2021-12-14 Thread David Edelsohn via Gcc-patches
On Fri, Nov 5, 2021 at 3:38 PM will schmidt  wrote:
>
> On Fri, 2021-11-05 at 00:11 -0400, Michael Meissner wrote:
> > Generate XXSPLTIDP for scalars on power10.
> >
> > This patch implements XXSPLTIDP support for SF and DF scalar constants.
> > The previous patch added support for vector constants.  This patch adds
> > the support for SFmode and DFmode scalar constants.
> >
> > I added 2 new tests to test loading up SF and DF scalar constants.
>
>
> ok
>
> >
> > 2021-11-05  Michael Meissner  
> >
> > gcc/
> >
> >   * config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
> >   (UNSPEC_XXSPLTIW_CONST): New unspec.
> >   (movsf_hardfloat): Add support for generating XXSPLTIDP.
> >   (mov_hardfloat32): Likewise.
> >   (mov_hardfloat64): Likewise.
> >   (xxspltidp__internal): New insns.
> >   (xxspltiw__internal): New insns.
> >   (splitters for SF/DFmode): Add new splitters for XXSPLTIDP.
> >
> > gcc/testsuite/
> >
> >   * gcc.target/powerpc/vec-splat-constant-df.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
> > ---
>
> ok
>
>
> >  gcc/config/rs6000/rs6000.md   | 97 +++
> >  .../powerpc/vec-splat-constant-df.c   | 60 
> >  .../powerpc/vec-splat-constant-sf.c   | 60 
> >  3 files changed, 199 insertions(+), 18 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> >
> > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> > index 3a7bcd2426e..4122acb98cf 100644
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -156,6 +156,8 @@ (define_c_enum "unspec"
> > UNSPEC_PEXTD
> > UNSPEC_HASHST
> > UNSPEC_HASHCHK
> > +   UNSPEC_XXSPLTIDP_CONST
> > +   UNSPEC_XXSPLTIW_CONST
> >])
> >
> >  ;;
> > @@ -7764,17 +7766,17 @@ (define_split
> >  ;;
> >  ;;   LWZ  LFSLXSSP   LXSSPX STFS   STXSSP
> >  ;;   STXSSPX  STWXXLXOR  LI FMRXSCPSGNDP
> > -;;   MR   MT  MF   NOP
> > +;;   MR   MT  MF   NOPXXSPLTIDP
> >
> >  (define_insn "movsf_hardfloat"
> >[(set (match_operand:SF 0 "nonimmediate_operand"
> >"=!r,   f, v,  wa,m, wY,
> > Z, m, wa, !r,f, wa,
> > -   !r,*c*l,  !r, *h")
> > +   !r,*c*l,  !r, *h,wa")
> >   (match_operand:SF 1 "input_operand"
> >"m, m, wY, Z, f, v,
> > wa,r, j,  j, f, wa,
> > -   r, r, *h, 0"))]
> > +   r, r, *h, 0, eP"))]
> >"(register_operand (operands[0], SFmode)
> > || register_operand (operands[1], SFmode))
> > && TARGET_HARD_FLOAT
> > @@ -7796,15 +7798,16 @@ (define_insn "movsf_hardfloat"
> > mr %0,%1
> > mt%0 %1
> > mf%1 %0
> > -   nop"
> > +   nop
> > +   #"
> >[(set_attr "type"
> >   "load,   fpload,fpload, fpload,fpstore,   fpstore,
> >fpstore,store, veclogical, integer,   fpsimple,  fpsimple,
> > -  *,  mtjmpr,mfjmpr, *")
> > +  *,  mtjmpr,mfjmpr, *, vecperm")
> > (set_attr "isa"
> >   "*,  *, p9v,p8v,   *, p9v,
> >p8v,*, *,  *, *, *,
> > -  *,  *, *,  *")])
> > +  *,  *, *,  *, p10")])
> >
> >  ;;   LWZ  LFIWZX STWSTFIWX MTVSRWZMFVSRWZ
> >  ;;   FMR  MR MT%0   MF%1   NOP
> > @@ -8064,18 +8067,18 @@ (define_split
> >
> >  ;;   STFD LFD FMR LXSDSTXSD
> >  ;;   LXSD STXSD   XXLOR   XXLXOR  GPR<-0
> > -;;   LWZ  STW MR
> > +;;   LWZ  STW MR  XXSPLTIDP
> >
> >
> >  (define_insn "*mov_hardfloat32"
> >[(set (match_operand:FMOVE64 0 "nonimmediate_operand"
> >  "=m,  d,  d,  ,   wY,
> >,   Z,  ,  ,  !r,
> > -  Y,  r,  !r")
> > +  Y,  r,  !r, wa")
> >   (match_operand:FMOVE64 1 "input_operand"
> >   "d,  m,  d,  wY, ,
> >Z,  ,   ,  ,  ,
> > -  r,  Y,  r"))]
> > +  r,  Y,  r,  eP"))]
> >"! TARGET_POWERPC64 && TARGET_HARD_FLOAT
> > && (gpc_reg_operand (operands[0], mode)
> > || gpc_reg_operand (operands[1], mode))"
> > @@ -8092,20 +8095,21 @@ (define_insn "*mov

Re: [PATCH v2 0/6] Remove "old" built-in function infrastructure

2021-12-14 Thread Bill Schmidt via Gcc-patches
Hi!  I'd like to ping patches 2 through 6 of this series.  Much obliged!

Thanks,
Bill


On 12/6/21 2:49 PM, Bill Schmidt via Gcc-patches wrote:
> Hi!
>
> Now that the new built-in function support is all upstream and enabled, it
> seems safe and prudent to remove the old code to avoid confusion.  I broke this
> up to the extent possible, but a couple of patches are still pretty large.
>
> David Edelsohn found that I had broken some C++ library functions for AIX, and
> his fix for that required me to re-spin the patches.  I also generated the diff
> with a more efficient algorithm to reduce the patch size.  Otherwise this
> series is identical to V1.
>
> Thanks!
> Bill
>
> Bill Schmidt (6):
>   rs6000: Remove new_builtins_are_live and dead code it was guarding
>   rs6000: Remove altivec_overloaded_builtins array and initialization
>   rs6000: Rename rs6000-builtin-new.def to rs6000-builtins.def
>   rs6000: Remove rs6000-builtin.def and associated data and functions
>   rs6000: Rename functions with "new" in their names
>   rs6000: Rename arrays to remove temporary _x suffix
>
>  gcc/config/rs6000/darwin.h| 8 +-
>  gcc/config/rs6000/rs6000-builtin.def  |  3350 -
>  ...00-builtin-new.def => rs6000-builtins.def} | 0
>  gcc/config/rs6000/rs6000-c.c  |  1266 +-
>  gcc/config/rs6000/rs6000-call.c   | 11964 +---
>  gcc/config/rs6000/rs6000-gen-builtins.c   |   115 +-
>  gcc/config/rs6000/rs6000-internal.h   | 2 +-
>  gcc/config/rs6000/rs6000-protos.h | 3 -
>  gcc/config/rs6000/rs6000.c|   334 +-
>  gcc/config/rs6000/rs6000.h|58 -
>  gcc/config/rs6000/t-rs6000| 7 +-
>  11 files changed, 224 insertions(+), 16883 deletions(-)
>  delete mode 100644 gcc/config/rs6000/rs6000-builtin.def
>  rename gcc/config/rs6000/{rs6000-builtin-new.def => rs6000-builtins.def} (100%)
>


Re: [PATCH] Remove an invalid assert. [PR103619]

2021-12-14 Thread Jeff Law via Gcc-patches



On 12/14/2021 9:53 AM, Jakub Jelinek via Gcc-patches wrote:

On Thu, Dec 09, 2021 at 05:32:02PM +, Hafiz Abid Qadeer wrote:

Commit 13b6c7639cf assumed that registers in a span will be in a certain
order. But that assumption is not true at least for the big endian targets.
Currently amdgcn is probably the only target where CFA is split into multiple
registers, so build_span_loc only gets called for it. However, the
dwf_cfa_reg function where this ICE was seen can be called for any
architecture from the comparison dwf_cfa_reg (src) == cur_cfa->reg in
dwarf2out_frame_debug_expr. So dwf_cfa_reg should not assume certain
order of registers.

I was tempted to modify the assert to handle big-endian cases but that will
still be error prone and may fail on some other targets.

Do you have a preprocessed testcase on which the ICE triggers on ARM EB?
I think I'd like to see what we were emitting in debug info for it before
r12-5833 and what we emit with this assert removed after it.
I assumed it wouldn't affect anything but GCN during the review.
Seems the arm span hook for EB will swap pairs in the list:
  for (i = 0; i < nregs; i += 2)
    if (TARGET_BIG_END)
      {
        parts[i] = gen_rtx_REG (SImode, regno + i + 1);
        parts[i + 1] = gen_rtx_REG (SImode, regno + i);
      }
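A plain C model of why a fixed-order assumption breaks on big-endian, using register numbers in place of RTL. It assumes the little-endian branch (not shown in the quote) emits each pair in ascending order, and `span_is_ascending` is a hypothetical stand-in for the kind of ordering check the commit message describes, not the removed assert itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the quoted arm hook: fill PARTS with the register numbers
   making up a span of NREGS 32-bit registers starting at REGNO.  On
   big-endian, each pair is listed with the higher register first.  */
static void
build_span (int regno, int nregs, bool big_endian, int *parts)
{
  assert (nregs % 2 == 0);
  for (int i = 0; i < nregs; i += 2)
    if (big_endian)
      {
        parts[i] = regno + i + 1;
        parts[i + 1] = regno + i;
      }
    else
      {
        parts[i] = regno + i;
        parts[i + 1] = regno + i + 1;
      }
}

/* Hypothetical ordering check: does the span list consecutive registers
   in ascending order starting from its first entry?  */
static bool
span_is_ascending (const int *parts, int nregs)
{
  for (int i = 1; i < nregs; i++)
    if (parts[i] != parts[0] + i)
      return false;
  return true;
}
```

With a four-register span at 16, the little-endian order is 16, 17, 18, 19, but the big-endian order is 17, 16, 19, 18, so any check tied to ascending order fails even though the span is valid.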

BTW, I wonder about those
   dw_stack_pointer_regnum = dwf_cfa_reg (gen_rtx_REG (Pmode,
   STACK_POINTER_REGNUM));
and
   dw_frame_pointer_regnum
 = dwf_cfa_reg (gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM));
Can't those use
   dw_stack_pointer_regnum = dwf_cfa_reg (stack_pointer_rtx);
and
   dw_frame_pointer_regnum = dwf_cfa_reg (hard_frame_pointer_rtx);
?

I think the attached testcase should trigger on c6x with -mbig-endian -O2 -g

Jeff
# 0 "../../../../../../..//newlib-cygwin/newlib/libc/argz/argz_append.c"
# 1 "//home/jlaw/jenkins/workspace/c6x-elf/c6x-elf-obj/newlib/c6x-elf/be/newlib/libc/argz//"
# 0 ""
# 0 ""
# 1 "../../../../../../..//newlib-cygwin/newlib/libc/argz/argz_append.c"






# 1 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/argz.h" 1 3 4
# 10 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/argz.h" 3 4
# 1 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/errno.h" 1 3 4





# 5 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/errno.h" 3 4
typedef int error_t;



# 1 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/sys/errno.h" 1 3 4
# 11 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/sys/errno.h" 3 4
# 1 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/sys/reent.h" 1 3 4
# 13 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/sys/reent.h" 3 4
# 1 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/_ansi.h" 1 3 4
# 10 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/_ansi.h" 3 4
# 1 "/home/jlaw/jenkins/workspace/c6x-elf/c6x-elf-obj/newlib/c6x-elf/be/newlib/targ-include/newlib.h" 1 3 4
# 14 "/home/jlaw/jenkins/workspace/c6x-elf/c6x-elf-obj/newlib/c6x-elf/be/newlib/targ-include/newlib.h" 3 4
# 1 "/home/jlaw/jenkins/workspace/c6x-elf/c6x-elf-obj/newlib/c6x-elf/be/newlib/targ-include/_newlib_version.h" 1 3 4
# 15 "/home/jlaw/jenkins/workspace/c6x-elf/c6x-elf-obj/newlib/c6x-elf/be/newlib/targ-include/newlib.h" 2 3 4
# 11 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/_ansi.h" 2 3 4
# 1 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/sys/config.h" 1 3 4



# 1 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/machine/ieeefp.h" 1 3 4
# 5 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/sys/config.h" 2 3 4
# 1 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/sys/features.h" 1 3 4
# 6 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/sys/config.h" 2 3 4
# 12 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/_ansi.h" 2 3 4
# 14 "/home/jlaw/jenkins/workspace/c6x-elf/newlib-cygwin/newlib/libc/include/sys/reent.h" 2 3 4
# 1 "/home/jlaw/jenkins/workspace/c6x-elf/c6x-elf-installed/lib/gcc/c6x-elf/12.0.0/include/stddef.h" 1 3 4
# 143 "/home/jlaw/jenkins/workspace/c6x-elf/c6x-elf-installed/lib/gcc/c6x-elf/12.0.0/include/stddef.h" 3 4
typedef int ptrdiff_t;
# 209 "/home/jlaw/jenkins/workspace/c6x-elf/c6x-elf-installed/lib/gcc/c6x-elf/12.0.0/include/stddef.h" 3 4
typedef unsigned int size_t;
# 321 "/home/jlaw/jenkins/workspace/c6x-elf/c6x-elf-installed/lib/gcc/c6x-elf/12.0.0/include/stddef.h" 3 4
typedef int wchar_t;
# 415 "/home/jlaw/jenkins/workspace/c6x-elf/c6x-elf-installed/lib/gcc/c6x-elf/12.0.0/include/stddef.h" 3 4
typedef struct {
  long long __max_align_ll __attribute__((__aligned__(__alignof__(l

[PATCH] i386: Implement VxHF vector set/insert/extract with lower ABI levels

2021-12-14 Thread Uros Bizjak via Gcc-patches
This is a preparation patch that moves VxHF vector set/insert/extract
expansions from AVX512FP16 ABI to lower ABIs.  There are no functional
changes for -mavx512fp16 and a follow-up patch is needed to actually
enable VxHF vector modes for lower ABIs.

2021-12-14  Uroš Bizjak  

gcc/ChangeLog:

PR target/103571
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate)
: Implement for TARGET_SSE2.
: Implement for TARGET_AVX.
: Implement for TARGET_AVX512F.
(ix86_expand_vector_set_var): Handle V32HFmode
without TARGET_AVX512BW.
(ix86_expand_vector_extract)
: Implement for TARGET_SSE2.
: Implement for TARGET_AVX.
: Implement for TARGET_AVX512BW.
(expand_vec_perm_broadcast_1) : New.
* config/i386/sse.md (VI12HF_AVX512VL): Remove
TARGET_AVX512FP16 condition.
(V): Ditto.
(V_256_512): Ditto.
(avx_vbroadcastf128_): Use V_256H mode iterator.

Bootstrapped and regression tested on x86_64-linux-gnu {-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 2bbb28e5317..7013c20a97a 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -14855,6 +14855,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
   goto widen;
 
 case E_V8HImode:
+case E_V8HFmode:
   if (TARGET_AVX2)
return ix86_vector_duplicate_value (mode, target, val);
 
@@ -14871,15 +14872,22 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
  dperm.op0 = dperm.op1 = gen_reg_rtx (mode);
  dperm.one_operand_p = true;
 
- /* Extend to SImode using a paradoxical SUBREG.  */
- tmp1 = gen_reg_rtx (SImode);
- emit_move_insn (tmp1, gen_lowpart (SImode, val));
-
- /* Insert the SImode value as low element of a V4SImode vector.  */
- tmp2 = gen_reg_rtx (V4SImode);
- emit_insn (gen_vec_setv4si_0 (tmp2, CONST0_RTX (V4SImode), tmp1));
- emit_move_insn (dperm.op0, gen_lowpart (mode, tmp2));
+ if (mode == V8HFmode)
+   tmp1 = lowpart_subreg (V8HFmode, force_reg (HFmode, val), HFmode);
+ else
+   {
+ /* Extend to SImode using a paradoxical SUBREG.  */
+ tmp1 = gen_reg_rtx (SImode);
+ emit_move_insn (tmp1, gen_lowpart (SImode, val));
+
+ /* Insert the SImode value as
+low element of a V4SImode vector.  */
+ tmp2 = gen_reg_rtx (V4SImode);
+ emit_insn (gen_vec_setv4si_0 (tmp2, CONST0_RTX (V4SImode), tmp1));
+ tmp1 = gen_lowpart (mode, tmp2);
+   }
 
+ emit_move_insn (dperm.op0, tmp1);
  ok = (expand_vec_perm_1 (&dperm)
|| expand_vec_perm_broadcast_1 (&dperm));
  gcc_assert (ok);
@@ -14926,12 +14934,15 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
   }
 
 case E_V16HImode:
+case E_V16HFmode:
 case E_V32QImode:
   if (TARGET_AVX2)
return ix86_vector_duplicate_value (mode, target, val);
   else
{
- machine_mode hvmode = (mode == V16HImode ? V8HImode : V16QImode);
+ machine_mode hvmode = (mode == V16HImode ? V8HImode
+: mode == V16HFmode ? V8HFmode
+: V16QImode);
  rtx x = gen_reg_rtx (hvmode);
 
  ok = ix86_expand_vector_init_duplicate (false, hvmode, x, val);
@@ -14942,13 +14953,16 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
}
   return true;
 
-case E_V64QImode:
 case E_V32HImode:
+case E_V32HFmode:
+case E_V64QImode:
   if (TARGET_AVX512BW)
return ix86_vector_duplicate_value (mode, target, val);
   else
{
- machine_mode hvmode = (mode == V32HImode ? V16HImode : V32QImode);
+ machine_mode hvmode = (mode == V32HImode ? V16HImode
+: mode == V32HFmode ? V16HFmode
+: V32QImode);
  rtx x = gen_reg_rtx (hvmode);
 
  ok = ix86_expand_vector_init_duplicate (false, hvmode, x, val);
@@ -14959,11 +14973,6 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
}
   return true;
 
-case E_V8HFmode:
-case E_V16HFmode:
-case E_V32HFmode:
-  return ix86_vector_duplicate_value (mode, target, val);
-
 default:
   return false;
 }
@@ -15912,7 +15921,8 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
   /* 512-bits vector byte/word broadcast and comparison only available
  under TARGET_AVX512BW, break 512-bits vector into two 256-bits vector
  when without TARGET_AVX512BW.  */
-  if ((mode == V32HImode || mode == V64QImode) && !TARGET_AVX512BW)
+  if ((mode == V32HImode || mode == V32HFmode || mode == V64QImode)
+  && !TARGET_AVX512BW)
 {
   gcc_assert (TARGET_AVX512F);
   rtx vhi, vlo, idx_hi;
@@ -15926,6 +15
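The new V16HF and V32HF cases in ix86_expand_vector_init_duplicate reuse the V16HI/V32HI fallback: when the full-width broadcast is unavailable, broadcast into the half-width mode (`hvmode`) recursively and use that result for both halves. A plain C model of that idea, assuming the lines after the recursive call (outside the quoted hunks) duplicate the half into both halves of the wider vector. `max_native` is a hypothetical stand-in for the widest broadcast the enabled ISA provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model a vector as an array of 16-bit lanes (e.g. HFmode bit patterns).
   Broadcast VAL into NLANES lanes of DST.  If NLANES exceeds MAX_NATIVE,
   broadcast into the low half recursively, then copy it to the high half.  */
static void
broadcast_u16 (uint16_t *dst, size_t nlanes, size_t max_native, uint16_t val)
{
  assert (nlanes > 0 && max_native > 0);
  if (nlanes <= max_native)
    {
      for (size_t i = 0; i < nlanes; i++)
        dst[i] = val;
      return;
    }

  assert (nlanes % 2 == 0);
  size_t half = nlanes / 2;
  broadcast_u16 (dst, half, max_native, val);
  memcpy (dst + half, dst, half * sizeof *dst);
}
```

For example, `broadcast_u16 (v, 32, 8, 0x3C00)` models a 512-bit V32HF splat of 1.0 (binary16 pattern 0x3C00) built from 128-bit pieces: two levels of halving, each followed by a copy.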

Re: [PATCH v2 3/6] rs6000: Rename rs6000-builtin-new.def to rs6000-builtins.def

2021-12-14 Thread Bill Schmidt via Gcc-patches
Ping.  Thanks!
Bill

On 12/6/21 2:49 PM, Bill Schmidt via Gcc-patches wrote:
> Hi!
>
> This patch just renames a file and updates the build machinery accordingly.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
>   * config/rs6000/rs6000-builtin-new.def: Rename to...
>   * config/rs6000/rs6000-builtins.def: ...this.
>   * config/rs6000/rs6000-gen-builtins.c: Adjust header commentary.
>   * config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Rename
>   rs6000-builtin-new.def to rs6000-builtins.def.
>   (rs6000-builtins.c): Likewise.
> ---
>  .../rs6000/{rs6000-builtin-new.def => rs6000-builtins.def}  | 0
>  gcc/config/rs6000/rs6000-gen-builtins.c | 4 ++--
>  gcc/config/rs6000/t-rs6000  | 6 +++---
>  3 files changed, 5 insertions(+), 5 deletions(-)
>  rename gcc/config/rs6000/{rs6000-builtin-new.def => rs6000-builtins.def} (100%)
>
> diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtins.def
> similarity index 100%
> rename from gcc/config/rs6000/rs6000-builtin-new.def
> rename to gcc/config/rs6000/rs6000-builtins.def
> diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c b/gcc/config/rs6000/rs6000-gen-builtins.c
> index 78b2486aafc..9c61b7d9fe6 100644
> --- a/gcc/config/rs6000/rs6000-gen-builtins.c
> +++ b/gcc/config/rs6000/rs6000-gen-builtins.c
> @@ -22,7 +22,7 @@ along with GCC; see the file COPYING3.  If not see
> recognition code for Power targets, based on text files that
> describe the built-in functions and vector overloads:
>
> - rs6000-builtin-new.def Table of built-in functions
> + rs6000-builtins.def    Table of built-in functions
>   rs6000-overload.def    Table of overload functions
>
> Both files group similar functions together in "stanzas," as
> @@ -125,7 +125,7 @@ along with GCC; see the file COPYING3.  If not see
>
> The second line contains the  that this particular instance of
> the overloaded function maps to.  It must match a token that appears in
> -   rs6000-builtin-new.def.  Optionally, a second token may appear.  If only
> +   rs6000-builtins.def.  Optionally, a second token may appear.  If only
> one token is on the line, it is also used to build the unique identifier
> for the overloaded function.  If a second token is present, the second
> token is used instead for this purpose.  This is necessary in cases
> diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
> index d48a4b1be6c..3d3143a171d 100644
> --- a/gcc/config/rs6000/t-rs6000
> +++ b/gcc/config/rs6000/t-rs6000
> @@ -22,7 +22,7 @@ TM_H += $(srcdir)/config/rs6000/rs6000-builtin.def
>  TM_H += $(srcdir)/config/rs6000/rs6000-cpus.def
>  TM_H += $(srcdir)/config/rs6000/rs6000-modes.h
>  PASSES_EXTRA += $(srcdir)/config/rs6000/rs6000-passes.def
> -EXTRA_GTYPE_DEPS += $(srcdir)/config/rs6000/rs6000-builtin-new.def
> +EXTRA_GTYPE_DEPS += $(srcdir)/config/rs6000/rs6000-builtins.def
>
>  rs6000-pcrel-opt.o: $(srcdir)/config/rs6000/rs6000-pcrel-opt.c
>   $(COMPILE) $<
> @@ -59,10 +59,10 @@ build/rs6000-gen-builtins$(build_exeext): build/rs6000-gen-builtins.o \
>  # For now, the header files depend on rs6000-builtins.c, which avoids
>  # races because the .c file is closed last in rs6000-gen-builtins.c.
>  rs6000-builtins.c: build/rs6000-gen-builtins$(build_exeext) \
> -$(srcdir)/config/rs6000/rs6000-builtin-new.def \
> +$(srcdir)/config/rs6000/rs6000-builtins.def \
>  $(srcdir)/config/rs6000/rs6000-overload.def
>   $(RUN_GEN) ./build/rs6000-gen-builtins$(build_exeext) \
> - $(srcdir)/config/rs6000/rs6000-builtin-new.def \
> + $(srcdir)/config/rs6000/rs6000-builtins.def \
>   $(srcdir)/config/rs6000/rs6000-overload.def rs6000-builtins.h \
>   rs6000-builtins.c rs6000-vecdefines.h
>
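The overload-file rule in the rs6000-gen-builtins.c comment hunk above (the first token names the built-in, and an optional second token replaces it when building the overload's unique identifier) can be sketched as follows. This is a hypothetical parser, not the one rs6000-gen-builtins.c actually uses, and the tokens in the usage note are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct ovld_target
{
  char bif_id[64];     /* must match a token in rs6000-builtins.def */
  char unique_id[64];  /* used to build the overload's unique identifier */
};

/* Parse the second line of an overload instance.  Return the number of
   tokens found (1 or 2), or 0 if the line has no tokens.  */
static int
parse_ovld_target (const char *line, struct ovld_target *out)
{
  char first[64], second[64];

  assert (line != NULL && out != NULL);
  int n = sscanf (line, "%63s %63s", first, second);
  if (n < 1)
    return 0;  /* empty or whitespace-only line (sscanf returns EOF) */

  strcpy (out->bif_id, first);
  /* With one token, it doubles as the unique identifier; with two, the
     second token is used for that purpose instead.  */
  strcpy (out->unique_id, n == 2 ? second : first);
  return n;
}
```

For example, the made-up line `FOO_V4SI FOO_V4SI_ALT` maps to built-in `FOO_V4SI` with unique identifier `FOO_V4SI_ALT`, while `FOO_V4SI` alone uses `FOO_V4SI` for both.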


Re: [PATCH v2 4/6] rs6000: Remove rs6000-builtin.def and associated data and functions

2021-12-14 Thread Bill Schmidt via Gcc-patches
Ping.  Thanks!

Bill

On 12/6/21 2:49 PM, Bill Schmidt via Gcc-patches wrote:
> Hi!
>
> The old rs6000-builtin.def file is no longer needed.  Remove it and the code
> that depends on it.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
>   * config/rs6000/rs6000-builtin.def: Delete.
>   * config/rs6000/rs6000-call.c (builtin_compatibility): Delete.
>   (builtin_description): Delete.
>   (builtin_hash_struct): Delete.
>   (builtin_hasher): Delete.
>   (builtin_hash_table): Delete.
>   (builtin_hasher::hash): Delete.
>   (builtin_hasher::equal): Delete.
>   (rs6000_builtin_info_type): Delete.
>   (rs6000_builtin_info): Delete.
>   (bdesc_compat): Delete.
>   (bdesc_3arg): Delete.
>   (bdesc_4arg): Delete.
>   (bdesc_dst): Delete.
>   (bdesc_2arg): Delete.
>   (bdesc_altivec_preds): Delete.
>   (bdesc_abs): Delete.
>   (bdesc_1arg): Delete.
>   (bdesc_0arg): Delete.
>   (bdesc_htm): Delete.
>   (bdesc_mma): Delete.
>   (rs6000_overloaded_builtin_p): Delete.
>   (rs6000_overloaded_builtin_name): Delete.
>   (htm_spr_num): Delete.
>   (rs6000_builtin_is_supported_p): Delete.
>   (rs6000_gimple_fold_mma_builtin): Delete.
>   (gt-rs6000-call.h): Remove include directive.
>   * config/rs6000/rs6000-protos.h (rs6000_overloaded_builtin_p): Delete.
>   (rs6000_builtin_is_supported_p): Delete.
>   (rs6000_overloaded_builtin_name): Delete.
>   * config/rs6000/rs6000.c (rs6000_builtin_decls): Delete.
>   (rs6000_debug_reg_global): Remove reference to RS6000_BUILTIN_COUNT.
>   * config/rs6000/rs6000.h (rs6000_builtins): Delete.
>   (altivec_builtin_types): Delete.
>   (rs6000_builtin_decls): Delete.
>   * config/rs6000/t-rs6000 (TM_H): Don't add rs6000-builtin.def.
> ---
>  gcc/config/rs6000/rs6000-builtin.def | 3350 --
>  gcc/config/rs6000/rs6000-call.c  |  712 --
>  gcc/config/rs6000/rs6000-protos.h|3 -
>  gcc/config/rs6000/rs6000.c   |3 -
>  gcc/config/rs6000/rs6000.h   |   57 -
>  gcc/config/rs6000/t-rs6000   |1 -
>  6 files changed, 4126 deletions(-)
>  delete mode 100644 gcc/config/rs6000/rs6000-builtin.def
>
> diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
> deleted file mode 100644
> index 9dbf16f48c4..000
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 86054f75756..a5ee06c991f 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -89,20 +89,6 @@
>  #define TARGET_NO_PROTOTYPE 0
>  #endif
>
> -struct builtin_compatibility
> -{
> -  const enum rs6000_builtins code;
> -  const char *const name;
> -};
> -
> -struct builtin_description
> -{
> -  const HOST_WIDE_INT mask;
> -  const enum insn_code icode;
> -  const char *const name;
> -  const enum rs6000_builtins code;
> -};
> -
>  /* Used by __builtin_cpu_is(), mapping from PLATFORM names to values.  */
>  static const struct
>  {
> @@ -184,127 +170,6 @@ static const struct
>
>  static rtx rs6000_expand_new_builtin (tree, rtx, rtx, machine_mode, int);
>  static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);
> -
> -
> -/* Hash table to keep track of the argument types for builtin functions.  */
> -
> -struct GTY((for_user)) builtin_hash_struct
> -{
> -  tree type;
> -  machine_mode mode[4];  /* return value + 3 arguments.  */
> -  unsigned char uns_p[4];/* and whether the types are unsigned.  */
> -};
> -
> -struct builtin_hasher : ggc_ptr_hash
> -{
> -  static hashval_t hash (builtin_hash_struct *);
> -  static bool equal (builtin_hash_struct *, builtin_hash_struct *);
> -};
> -
> -static GTY (()) hash_table *builtin_hash_table;
> -
> -/* Hash function for builtin functions with up to 3 arguments and a return
> -   type.  */
> -hashval_t
> -builtin_hasher::hash (builtin_hash_struct *bh)
> -{
> -  unsigned ret = 0;
> -  int i;
> -
> -  for (i = 0; i < 4; i++)
> -{
> -  ret = (ret * (unsigned)MAX_MACHINE_MODE) + ((unsigned)bh->mode[i]);
> -  ret = (ret * 2) + bh->uns_p[i];
> -}
> -
> -  return ret;
> -}
> -
> -/* Compare builtin hash entries H1 and H2 for equivalence.  */
> -bool
> -builtin_hasher::equal (builtin_hash_struct *p1, builtin_hash_struct *p2)
> -{
> -  return ((p1->mode[0] == p2->mode[0])
> -   && (p1->mode[1] == p2->mode[1])
> -   && (p1->mode[2] == p2->mode[2])
> -   && (p1->mode[3] == p2->mode[3])
> -   && (p1->uns_p[0] == p2->uns_p[0])
> -   && (p1->uns_p[1] == p2->uns_p[1])
> -   && (p1->uns_p[2] == p2->uns_p[2])
> -   && (p1->uns_p[3] == p2->uns_p[3]));
> -}
> -
> -
> -/* Table that classifies rs6000 builtin functions (pure, const, etc.).  */
> -#undef RS6000_BUILTIN_0
> -#undef RS6000_BUILTIN_1
> -#undef RS6000_BUILTIN_2
> -#unde

Re: [PATCH v2 /6] rs6000: Rename functions with "new" in their names

2021-12-14 Thread Bill Schmidt via Gcc-patches
Ping.  Thanks!

Bill

On 12/6/21 2:49 PM, Bill Schmidt via Gcc-patches wrote:
> Hi!
>
> While we had two sets of built-in functionality at the same time, I put "new"
> in the names of quite a few functions.  Time to undo that.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
>   * config/rs6000/rs6000-c.c (altivec_resolve_new_overloaded_builtin):
>   Remove forward declaration.
>   (rs6000_new_builtin_type_compatible): Rename to
>   rs6000_builtin_type_compatible.
>   (rs6000_builtin_type_compatible): Remove.
>   (altivec_resolve_overloaded_builtin): Remove.
>   (altivec_build_new_resolved_builtin): Rename to
>   altivec_build_resolved_builtin.
>   (altivec_resolve_new_overloaded_builtin): Rename to
>   altivec_resolve_overloaded_builtin.  Remove static keyword.  Adjust
>   called function names.
>   * config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): Remove
>   forward declaration.
>   (rs6000_gimple_fold_new_builtin): Likewise.
>   (rs6000_invalid_new_builtin): Rename to rs6000_invalid_builtin.
>   (rs6000_gimple_fold_builtin): Remove.
>   (rs6000_new_builtin_valid_without_lhs): Rename to
>   rs6000_builtin_valid_without_lhs.
>   (rs6000_new_builtin_is_supported): Rename to
>   rs6000_builtin_is_supported.
>   (rs6000_gimple_fold_new_mma_builtin): Rename to
>   rs6000_gimple_fold_mma_builtin.
>   (rs6000_gimple_fold_new_builtin): Rename to
>   rs6000_gimple_fold_builtin.  Remove static keyword.  Adjust called
>   function names.
>   (rs6000_expand_builtin): Remove.
>   (new_cpu_expand_builtin): Rename to cpu_expand_builtin.
>   (new_mma_expand_builtin): Rename to mma_expand_builtin.
>   (new_htm_spr_num): Rename to htm_spr_num.
>   (new_htm_expand_builtin): Rename to htm_expand_builtin.  Change name
>   of called function.
>   (rs6000_expand_new_builtin): Rename to rs6000_expand_builtin.  Remove
>   static keyword.  Adjust called function names.
>   (rs6000_new_builtin_decl): Rename to rs6000_builtin_decl.  Remove
>   static keyword.
>   (rs6000_builtin_decl): Remove.
>   * config/rs6000/rs6000-gen-builtins.c (write_decls): In generated code,
>   rename rs6000_new_builtin_is_supported to rs6000_builtin_is_supported.
>   * config/rs6000/rs6000-internal.h (rs6000_invalid_new_builtin): Rename
>   to rs6000_invalid_builtin.
>   * config/rs6000/rs6000.c (rs6000_new_builtin_vectorized_function):
>   Rename to rs6000_builtin_vectorized_function.
>   (rs6000_new_builtin_md_vectorized_function): Rename to
>   rs6000_builtin_md_vectorized_function.
>   (rs6000_builtin_vectorized_function): Remove.
>   (rs6000_builtin_md_vectorized_function): Remove.
> ---
>  gcc/config/rs6000/rs6000-c.c| 120 +---
>  gcc/config/rs6000/rs6000-call.c |  99 ++-
>  gcc/config/rs6000/rs6000-gen-builtins.c |   3 +-
>  gcc/config/rs6000/rs6000-internal.h |   2 +-
>  gcc/config/rs6000/rs6000.c  |  31 ++
>  5 files changed, 80 insertions(+), 175 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index d44edf585aa..f790c72d621 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -37,9 +37,6 @@
>
>  #include "rs6000-internal.h"
>
> -static tree altivec_resolve_new_overloaded_builtin (location_t, tree, void *);
> -
> -
>  /* Handle the machine specific pragma longcall.  Its syntax is
>
> # pragma longcall ( TOGGLE )
> @@ -817,7 +814,7 @@ is_float128_p (tree t)
>
>  /* Return true iff ARGTYPE can be compatibly passed as PARMTYPE.  */
>  static bool
> -rs6000_new_builtin_type_compatible (tree parmtype, tree argtype)
> +rs6000_builtin_type_compatible (tree parmtype, tree argtype)
>  {
>if (parmtype == error_mark_node)
>  return false;
> @@ -840,23 +837,6 @@ rs6000_new_builtin_type_compatible (tree parmtype, tree argtype)
>return lang_hooks.types_compatible_p (parmtype, argtype);
>  }
>
> -static inline bool
> -rs6000_builtin_type_compatible (tree t, int id)
> -{
> -  tree builtin_type;
> -  builtin_type = rs6000_builtin_type (id);
> -  if (t == error_mark_node)
> -return false;
> -  if (INTEGRAL_TYPE_P (t) && INTEGRAL_TYPE_P (builtin_type))
> -return true;
> -  else if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
> -&& is_float128_p (t) && is_float128_p (builtin_type))
> -return true;
> -  else
> -return lang_hooks.types_compatible_p (t, builtin_type);
> -}
> -
> -
>  /* In addition to calling fold_convert for EXPR of type TYPE, also
> call c_fully_fold to remove any C_MAYBE_CONST_EXPRs that could be
> hiding there (PR47197).  */
> @@ -873,16 +853,6 @@ fully_fold_convert (tree type, tree expr)
>return result;
>  }
>
> -/* Implementation of the resol

Re: [PATCH v2 6/6] rs6000: Rename arrays to remove temporary _x suffix

2021-12-14 Thread Bill Schmidt via Gcc-patches
Ping.  Thanks!

Bill

On 12/6/21 2:49 PM, Bill Schmidt via Gcc-patches wrote:
> Hi!
>
> While we had two sets of built-in infrastructure at once, I added _x as a
> suffix to two arrays to disambiguate the old and new versions.  Time to fix
> that also.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-06  Bill Schmidt  
>
> gcc/
>   * config/rs6000/rs6000-c.c (altivec_build_resolved_builtin): Rename
>   rs6000_builtin_decls_x to rs6000_builtin_decls.
>   (altivec_resolve_overloaded_builtin): Likewise.  Also rename
>   rs6000_builtin_info_x to rs6000_builtin_info.
>   * config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Rename
>   rs6000_builtin_info_x to rs6000_builtin_info.
>   (rs6000_builtin_is_supported): Likewise.
>   (rs6000_gimple_fold_mma_builtin): Likewise.  Also rename
>   rs6000_builtin_decls_x to rs6000_builtin_decls.
>   (rs6000_gimple_fold_builtin): Rename rs6000_builtin_info_x to
>   rs6000_builtin_info.
>   (cpu_expand_builtin): Likewise.
>   (rs6000_expand_builtin): Likewise.
>   (rs6000_init_builtins): Likewise.  Also rename rs6000_builtin_decls_x
>   to rs6000_builtin_decls.
>   (rs6000_builtin_decl): Rename rs6000_builtin_decls_x to
>   rs6000_builtin_decls.
>   * config/rs6000/rs6000-gen-builtins.c (write_decls): In generated code,
>   rename rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
>   rs6000_builtin_info_x to rs6000_builtin_info.
>   (write_bif_static_init): In generated code, rename
>   rs6000_builtin_info_x to rs6000_builtin_info.
>   (write_init_bif_table): In generated code, rename
>   rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
>   rs6000_builtin_info_x to rs6000_builtin_info.
>   (write_init_ovld_table): In generated code, rename
>   rs6000_builtin_decls_x to rs6000_builtin_decls.
>   (write_init_file): Likewise.
>   * config/rs6000/rs6000.c (rs6000_builtin_vectorized_function):
>   Likewise.
>   (rs6000_builtin_md_vectorized_function): Likewise.
>   (rs6000_builtin_reciprocal): Likewise.
>   (add_condition_to_bb): Likewise.
>   (rs6000_atomic_assign_expand_fenv): Likewise.
> ---
>  gcc/config/rs6000/rs6000-c.c| 64 -
>  gcc/config/rs6000/rs6000-call.c | 46 +-
>  gcc/config/rs6000/rs6000-gen-builtins.c | 27 +--
>  gcc/config/rs6000/rs6000.c  | 58 +++---
>  4 files changed, 96 insertions(+), 99 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index f790c72d621..e0ebdeed548 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -867,7 +867,7 @@ altivec_build_resolved_builtin (tree *args, int n, tree fntype, tree ret_type,
>  {
>tree argtypes = TYPE_ARG_TYPES (fntype);
>tree arg_type[MAX_OVLD_ARGS];
> -  tree fndecl = rs6000_builtin_decls_x[bif_id];
> +  tree fndecl = rs6000_builtin_decls[bif_id];
>
>for (int i = 0; i < n; i++)
>  {
> @@ -1001,13 +1001,13 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
> case E_SFmode:
>   {
> /* For floats use the xvmulsp instruction directly.  */
> -   tree call = rs6000_builtin_decls_x[RS6000_BIF_XVMULSP];
> +   tree call = rs6000_builtin_decls[RS6000_BIF_XVMULSP];
> return build_call_expr (call, 2, arg0, arg1);
>   }
> case E_DFmode:
>   {
> /* For doubles use the xvmuldp instruction directly.  */
> -   tree call = rs6000_builtin_decls_x[RS6000_BIF_XVMULDP];
> +   tree call = rs6000_builtin_decls[RS6000_BIF_XVMULDP];
> return build_call_expr (call, 2, arg0, arg1);
>   }
> /* Other types are errors.  */
> @@ -1066,7 +1066,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>   vec_safe_push (params, arg0);
>   vec_safe_push (params, arg1);
>   tree call = altivec_resolve_overloaded_builtin
> -   (loc, rs6000_builtin_decls_x[RS6000_OVLD_VEC_CMPEQ],
> +   (loc, rs6000_builtin_decls[RS6000_OVLD_VEC_CMPEQ],
>  params);
>   /* Use save_expr to ensure that operands used more than once
>  that may have side effects (like calls) are only evaluated
> @@ -1076,7 +1076,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>   vec_safe_push (params, call);
>   vec_safe_push (params, call);
>   return altivec_resolve_overloaded_builtin
> -   (loc, rs6000_builtin_decls_x[RS6000_OVLD_VEC_NOR], params);
> +   (loc, rs6000_builtin_decls[RS6000_OVLD_VEC_NOR], params);
> }
> /* Other types are errors.  */
>   default:
> @@ -1129,9 +1129,9 @@ 

Re: [PATCH RFC] c++: add color to function decl printing

2021-12-14 Thread Martin Sebor via Gcc-patches

On 12/13/21 10:41 PM, Jason Merrill wrote:

On 12/13/21 14:22, Martin Sebor wrote:

On 12/11/21 10:39 PM, Jason Merrill via Gcc-patches wrote:
In reading C++ diagnostics, it's often hard to find the name of the function
in the middle of the template header, return type, parameters, and template
arguments.  So let's colorize it, and maybe the template argument bindings
while we're at it.

I've somewhat arbitrarily chosen bold green for the function name, and
non-bold magenta for the template arguments.  I'm not at all attached to
these choices.

A side-effect is that when this happens in a quote (i.e. %qD), the
rest of the quote after the function name is no longer bold.  I think that's
acceptable; returning to the bold would require maintaining a colorize stack
instead of the on/off controls we have now.
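To make the "colorize stack" idea concrete, here is a minimal sketch. It assumes ANSI SGR escape sequences, and `color_stack` is a hypothetical class, not part of GCC's pretty-printer. Popping a color resets all attributes and then re-applies the enclosing one, so a bold %qD quote would turn bold again after a colored function name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical colorize stack.  Each entry is an SGR parameter string
// such as "01" (bold) or "01;32" (bold green).
class color_stack
{
public:
  // Start a new color and return the escape sequence to emit.
  std::string push (const std::string &sgr)
  {
    m_stack.push_back (sgr);
    return "\033[" + sgr + "m";
  }

  // End the innermost color: reset all attributes, then re-apply the
  // enclosing color (if any) instead of leaving the text plain.
  std::string pop ()
  {
    assert (!m_stack.empty ());
    m_stack.pop_back ();
    std::string out = "\033[m";
    if (!m_stack.empty ())
      out += "\033[" + m_stack.back () + "m";
    return out;
  }

  bool empty () const { return m_stack.empty (); }

private:
  std::vector<std::string> m_stack;
};
```

For example, if you push "01" for the quote and "01;32" for the function name, the first pop emits ESC[m followed by ESC[01m. The rest of the quote is then bold again, which the current on/off controls cannot express.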

Any thoughts?


I appreciate the problem but I can't say I find this solution
much of an improvement.  We end up with the same name in up to
four colors: cyan, magenta, green, and black, plus bold versions
of each, depending on where in the text the name appears.  It's
not apparent to me what the different colors mean or how they
help.


You can get the same name in different colors because the diagnostic is 
telling you something different about it, if it's e.g. the name of a 
function we're printing or the source text being indicated as the source 
of the problem.  Is it really unclear what the different colors mean?  I 
find it much easier to read the output for your testcase after this 
patch, as highlighting the function name and lowlighting the template 
args means that


map(_InputIterator, _InputIterator)

stands out as the problematic function.


I understand why you want to draw attention to some parts of
the message and I think that could be useful if done without
relying on color as the sole attribute (think of users of
monochrome or low color terminals or the color-blind), with
more consistency, and without "overloading" existing colors
to mean something subtly different in different contexts (green
is already used for insertion hints, and magenta for warnings).

Given a simple case like this:

  struct A
  {
A (T, T);
  };

  A a (1.0);

t.C:7:14: error: no matching function for call to ‘A::A(double)’
7 | A a (1.0);
  |  ^
t.C:4:3: note: candidate: ‘A::A(T, T) [with T = int]’
4 |   A (T, T);
  |   ^
t.C:4:3: note:   candidate expects 2 arguments, 1 provided
t.C:2:8: note: candidate: ‘constexpr A::A(const A&)’
2 | struct A
  |^
t.C:2:8: note:   no known conversion for argument 1 from ‘double’ to ‘const A&’
t.C:2:8: note: candidate: ‘constexpr A::A(A&&)’
t.C:2:8: note:   no known conversion for argument 1 from ‘double’ to ‘A&&’


In the attached screenshot of the output it's reasonably clear
that the green A in the messages names the candidate function.
But even there the function's name is rendered in three colors:
black in the error message, green in the notes, and cyan in
the source code quoted in the notes.  What is not clear is
why. (I can guess that it's simply an artifact of how the GCC
diagnostic machinery works, but that's neither intuitive to
users nor helpful.)

But simple cases are clear even with no colors.  What you'd like
to do is improve the not-so-simple cases like the one with
std::map and that's where the color choices become much less
clear: in long messages with lots of the same names highlighted
in different colors.  It seems especially unhelpful that some
of the text in the same color is bold while other such text is
not (the [with T = ...] parts).  Using the same (or very similar)
colors for entirely different things (warnings and fix-it hints)
compounds the problem.




IMO, the underlying root cause for why relevant details are so
hard to find in G++ messages is that there's so much redundancy
and irrelevant context in the output.  For instance, for this
test case:

#include 

std::map m ("123", "456");

GCC produces 10 screenfuls of output (more than 10 times as many
as Clang).  GCC produces so much more output because it repeats
the full set of included files before each candidate (even though
the headers are the same in each),


Yes, unfortunately the explanation of why each candidate is non-viable 
switches files.  Perhaps we should remember files that we've already 
listed the include path for and avoid repeating it.


I was thinking the same thing.  Paths to system headers could
also be abbreviated (or common prefixes replaced by some symbol).

Perhaps even the repetitive [with T = ...] could be removed in
subsequent messages to reduce the clutter (I notice Clang does
away with either all or most of it altogether in some messages
involving standard containers like std::string or std::vector).
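The two mitigations discussed here are remembering which include chains have already been printed and abbreviating system header paths. They could look roughly like the sketch below. `include_chain_printer`, its output format, and the prefix-to-symbol substitution are all hypothetical; this is not how GCC's diagnostic machinery is structured.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: prints each include chain only once and replaces
// a long system-header prefix with a short symbol.
class include_chain_printer
{
public:
  include_chain_printer (std::string sys_prefix, std::string symbol)
    : m_prefix (std::move (sys_prefix)), m_symbol (std::move (symbol)) {}

  // Return the "In file included from" lines for CHAIN, or an empty
  // vector if the identical chain was already printed.
  std::vector<std::string> print (const std::vector<std::string> &chain)
  {
    std::string key;
    for (const std::string &f : chain)
      key += f + '\n';
    if (!m_seen.insert (key).second)
      return {};
    std::vector<std::string> lines;
    for (const std::string &f : chain)
      lines.push_back ("In file included from " + abbreviate (f));
    return lines;
  }

private:
  // Replace m_prefix at the start of PATH with m_symbol.
  std::string abbreviate (const std::string &path) const
  {
    if (path.compare (0, m_prefix.size (), m_prefix) == 0)
      return m_symbol + path.substr (m_prefix.size ());
    return path;
  }

  std::string m_prefix, m_symbol;
  std::set<std::string> m_seen;
};
```

Keying on the whole chain, not on individual files, keeps the output unambiguous. A note reached through a different include path still gets its own chain.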




and also because it repeats
the full set of template arguments each time.  E.g., like so:
In file included from 
/build/gcc-master/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:64, 

 

Re: [PATCH 2/6] rs6000: Remove altivec_overloaded_builtins array and initialization

2021-12-14 Thread David Edelsohn via Gcc-patches
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> This patch just removes the huge altivec_overloaded_builtins array.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Remove.
> * config/rs6000/rs6000.h (altivec_overloaded_builtins): Remove.

Okay.

Thanks, David


Re: [PATCH 3/6] rs6000: Rename rs6000-builtin-new.def to rs6000-builtins.def

2021-12-14 Thread David Edelsohn via Gcc-patches
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> This patch just renames a file and updates the build machinery accordingly.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-builtin-new.def: Rename to...
> * config/rs6000/rs6000-builtins.def: ...this.
> * config/rs6000/rs6000-gen-builtins.c: Adjust header commentary.
> * config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Rename
> rs6000-builtin-new.def to rs6000-builtins.def.
> (rs6000-builtins.c): Likewise.

Okay.

Thanks, David


Re: [PATCH 4/6] rs6000: Remove rs6000-builtin.def and associated data and functions

2021-12-14 Thread David Edelsohn via Gcc-patches
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> The old rs6000-builtin.def file is no longer needed.  Remove it and the code
> that depends on it.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-builtin.def: Delete.
> * config/rs6000/rs6000-call.c (builtin_compatibility): Delete.
> (builtin_description): Delete.
> (builtin_hash_struct): Delete.
> (builtin_hasher): Delete.
> (builtin_hash_table): Delete.
> (builtin_hasher::hash): Delete.
> (builtin_hasher::equal): Delete.
> (rs6000_builtin_info_type): Delete.
> (rs6000_builtin_info): Delete.
> (bdesc_compat): Delete.
> (bdesc_3arg): Delete.
> (bdesc_4arg): Delete.
> (bdesc_dst): Delete.
> (bdesc_2arg): Delete.
> (bdesc_altivec_preds): Delete.
> (bdesc_abs): Delete.
> (bdesc_1arg): Delete.
> (bdesc_0arg): Delete.
> (bdesc_htm): Delete.
> (bdesc_mma): Delete.
> (rs6000_overloaded_builtin_p): Delete.
> (rs6000_overloaded_builtin_name): Delete.
> (htm_spr_num): Delete.
> (rs6000_builtin_is_supported_p): Delete.
> (rs6000_gimple_fold_mma_builtin): Delete.
> (gt-rs6000-call.h): Remove include directive.
> * config/rs6000/rs6000-protos.h (rs6000_overloaded_builtin_p): Delete.
> (rs6000_builtin_is_supported_p): Delete.
> (rs6000_overloaded_builtin_name): Delete.
> * config/rs6000/rs6000.c (rs6000_builtin_decls): Delete.
> (rs6000_debug_reg_global): Remove reference to RS6000_BUILTIN_COUNT.
> * config/rs6000/rs6000.h (rs6000_builtins): Delete.
> (altivec_builtin_types): Delete.
> (rs6000_builtin_decls): Delete.
> * config/rs6000/t-rs6000 (TM_H): Don't add rs6000-builtin.def.

Okay.

Thanks, David


Re: [PATCH 5/6] rs6000: Rename functions with "new" in their names

2021-12-14 Thread David Edelsohn via Gcc-patches
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> While we had two sets of built-in functionality at the same time, I put "new"
> in the names of quite a few functions.  Time to undo that.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-c.c (altivec_resolve_new_overloaded_builtin):
> Remove forward declaration.
> (rs6000_new_builtin_type_compatible): Rename to
> rs6000_builtin_type_compatible.
> (rs6000_builtin_type_compatible): Remove.
> (altivec_resolve_overloaded_builtin): Remove.
> (altivec_build_new_resolved_builtin): Rename to
> altivec_build_resolved_builtin.
> (altivec_resolve_new_overloaded_builtin): Rename to
> altivec_resolve_overloaded_builtin.  Remove static keyword.  Adjust
> called function names.
> * config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): Remove
> forward declaration.
> (rs6000_gimple_fold_new_builtin): Likewise.
> (rs6000_invalid_new_builtin): Rename to rs6000_invalid_builtin.
> (rs6000_gimple_fold_builtin): Remove.
> (rs6000_new_builtin_valid_without_lhs): Rename to
> rs6000_builtin_valid_without_lhs.
> (rs6000_new_builtin_is_supported): Rename to
> rs6000_builtin_is_supported.
> (rs6000_gimple_fold_new_mma_builtin): Rename to
> rs6000_gimple_fold_mma_builtin.
> (rs6000_gimple_fold_new_builtin): Rename to
> rs6000_gimple_fold_builtin.  Remove static keyword.  Adjust called
> function names.
> (rs6000_expand_builtin): Remove.
> (new_cpu_expand_builtin): Rename to cpu_expand_builtin.
> (new_mma_expand_builtin): Rename to mma_expand_builtin.
> (new_htm_spr_num): Rename to htm_spr_num.
> (new_htm_expand_builtin): Rename to htm_expand_builtin.  Change name
> of called function.
> (rs6000_expand_new_builtin): Rename to rs6000_expand_builtin.  Remove
> static keyword.  Adjust called function names.
> (rs6000_new_builtin_decl): Rename to rs6000_builtin_decl.  Remove
> static keyword.
> (rs6000_builtin_decl): Remove.
> * config/rs6000/rs6000-gen-builtins.c (write_decls): In generated code,
> rename rs6000_new_builtin_is_supported to rs6000_builtin_is_supported.
> * config/rs6000/rs6000-internal.h (rs6000_invalid_new_builtin): Rename
> to rs6000_invalid_builtin.
> * config/rs6000/rs6000.c (rs6000_new_builtin_vectorized_function):
> Rename to rs6000_builtin_vectorized_function.
> (rs6000_new_builtin_md_vectorized_function): Rename to
> rs6000_builtin_md_vectorized_function.
> (rs6000_builtin_vectorized_function): Remove.
> (rs6000_builtin_md_vectorized_function): Remove.

Okay.

Thanks, David


Re: [PATCH 6/6] rs6000: Rename arrays to remove temporary _x suffix

2021-12-14 Thread David Edelsohn via Gcc-patches
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> While we had two sets of built-in infrastructure at once, I added _x as a
> suffix to two arrays to disambiguate the old and new versions.  Time to fix
> that also.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-06  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-c.c (altivec_build_resolved_builtin): Rename
> rs6000_builtin_decls_x to rs6000_builtin_decls.
> (altivec_resolve_overloaded_builtin): Likewise.  Also rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> * config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> (rs6000_builtin_is_supported): Likewise.
> (rs6000_gimple_fold_mma_builtin): Likewise.  Also rename
> rs6000_builtin_decls_x to rs6000_builtin_decls.
> (rs6000_gimple_fold_builtin): Rename rs6000_builtin_info_x to
> rs6000_builtin_info.
> (cpu_expand_builtin): Likewise.
> (rs6000_expand_builtin): Likewise.
> (rs6000_init_builtins): Likewise.  Also rename rs6000_builtin_decls_x
> to rs6000_builtin_decls.
> (rs6000_builtin_decl): Rename rs6000_builtin_decls_x to
> rs6000_builtin_decls.
> * config/rs6000/rs6000-gen-builtins.c (write_decls): In generated code,
> rename rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> (write_bif_static_init): In generated code, rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> (write_init_bif_table): In generated code, rename
> rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> (write_init_ovld_table): In generated code, rename
> rs6000_builtin_decls_x to rs6000_builtin_decls.
> (write_init_file): Likewise.
> * config/rs6000/rs6000.c (rs6000_builtin_vectorized_function):
> Likewise.
> (rs6000_builtin_md_vectorized_function): Likewise.
> (rs6000_builtin_reciprocal): Likewise.
> (add_condition_to_bb): Likewise.
> (rs6000_atomic_assign_expand_fenv): Likewise.

Okay.

Thanks, David


Re: [PATCH] gcc/diagnostic.c: make -Werror message more helpful

2021-12-14 Thread Eric Gallager via Gcc-patches
On Mon, Dec 13, 2021 at 1:17 PM Martin Sebor via Gcc-patches
 wrote:
>
> On 12/12/21 3:13 AM, Andrea Monaco via Gcc-patches wrote:
> >
> > Hello.
> >
> >
> > I propose to make that message more verbose.  It sure would have helped
> > me once.  You don't always have a Web search available :)
>
> Warnings turned into errors have the [-Werror=...] tag at the end
> so I'm not sure I see when reiterating -Werror at the end of output
> would be helpful.  Can you explain the circumstances when it would
> have helped you?
>
> For what it's worth, a change here that I think might be more useful
> is printing the number of diagnostics of each kind (e.g., 2 warnings
> and 5 errors found).
>

I swear we already had a bug open for this suggestion, but after much
searching I can't seem to find it anymore, so if anyone has any ideas
of what keywords I forgot to try, feel free to send them...

> > Andrea Monaco
> >
> >
> >
> > diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
> > index 4ded1760705..8b67662390e 100644
> > --- a/gcc/diagnostic.c
> > +++ b/gcc/diagnostic.c
> > @@ -156,7 +156,7 @@ default_diagnostic_final_cb (diagnostic_context *context)
> > /* -Werror was given.  */
> > if (context->warning_as_error_requested)
> >  pp_verbatim (context->printer,
> > -_("%s: all warnings being treated as errors"),
> > +_("%s: all warnings being treated as errors (-Werror; disable with -Wno-error)"),
>
> If this change should move forward, -Werror needs to be quoted
> (e.g., passed as an argument to %qs or surrounded in a pair of
> %< and %> directives).  The "disable with -Wno-error" part
> is superfluous and would not be entirely accurate for warnings
> promoted to errors by #pragma GCC diagnostic (those cannot be
> demoted back to warnings by -Wno-error).
>
> Martin
>
> >   progname);
> > /* At least one -Werror= was given.  */
> > else
> >
>
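Martin's alternative suggestion is a closing summary of how many diagnostics of each kind were issued. It might be sketched as follows. The `diagnostic_counter` struct and the message wording are hypothetical. A real implementation in GCC would also need translatable, plural-aware messages rather than simply appending "s".

```cpp
#include <cassert>
#include <string>

// Hypothetical per-kind counters for a final summary line.
struct diagnostic_counter
{
  unsigned warnings = 0;
  unsigned errors = 0;

  // Pluralize NOUN according to N, e.g. "1 warning", "2 warnings".
  static std::string count (unsigned n, const std::string &noun)
  {
    return std::to_string (n) + " " + noun + (n == 1 ? "" : "s");
  }

  // Build a summary such as "2 warnings and 5 errors found", or an
  // empty string when nothing was diagnosed.
  std::string summary () const
  {
    if (warnings == 0 && errors == 0)
      return "";
    if (warnings == 0)
      return count (errors, "error") + " found";
    if (errors == 0)
      return count (warnings, "warning") + " found";
    return count (warnings, "warning") + " and "
           + count (errors, "error") + " found";
  }
};
```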


Re: [PATCH] [Gimple] Fix ICE. [PR103682]

2021-12-14 Thread Jeff Law via Gcc-patches




On 12/13/2021 10:17 PM, liuhongt via Gcc-patches wrote:

This testcase should just go in gcc.c-torture/compile and remove the
dg-options too.  The main reason is that there is nothing specific to x86 here.


Thanks, here's the updated patch.


Check is_gimple_assign before gimple_assign_rhs_code.
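The fix follows a general pattern: check a statement's kind before calling an accessor that is only valid for that kind, relying on && short-circuiting. Here is a minimal sketch with a hypothetical, much-simplified `stmt` type; these are not GCC's `gimple` classes. The `assert` in the accessor stands in for the checking failure that shows up as an ICE.

```cpp
#include <cassert>

// Hypothetical, much-simplified statement kinds and RHS codes.
enum class stmt_kind { assign, call, cond };
enum class rhs_code { plus, bit_and, bit_ior };

struct stmt
{
  stmt_kind kind;
  rhs_code code;  // Only meaningful when kind == stmt_kind::assign.
};

bool is_assign (const stmt &s)
{
  return s.kind == stmt_kind::assign;
}

// Assignment-only accessor: asking any other statement for its RHS
// code is a bug, so assert (standing in for a checking-enabled ICE).
rhs_code assign_rhs_code (const stmt &s)
{
  assert (is_assign (s));
  return s.code;
}

// Guarded query: && short-circuits, so the accessor is never reached
// for calls or conditions.
bool is_bit_and_assign (const stmt &s)
{
  return is_assign (s) && assign_rhs_code (s) == rhs_code::bit_and;
}
```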

gcc/ChangeLog:

PR target/103682
* tree-ssa-ccp.c (optimize_atomic_bit_test_and): Check
is_gimple_assign before gimple_assign_rhs_code.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr103682.c: New test.

OK
jeff



Re: Dominators question

2021-12-14 Thread Andrew MacLeod via Gcc-patches

On 12/3/21 11:46, Richard Biener wrote:
> On December 3, 2021 3:15:25 PM GMT+01:00, Andrew MacLeod wrote:
>> When something like the loop unswitching code adds elements to the CFGs,
>> does this invalidate the dominators?  Or are they updated?  Or are they in
>> an in-between state?
>>
>> I'm curious because a) the relation code uses them under the covers, and b)
>> I'm looking to add a ranger caching improvement which also uses
>> dominators if they are available.
>>
>> When blocks are added, I wonder what will happen to
>>
>>    1) dom_info_available_p (CDI_DOMINATORS)  (is it still true), and
>> then what happens to
>>
>>    2) get_immediate_dominator (CDI_DOMINATORS, bb);  for one of the
>> newly added BBs.
>
> Dominators are generally updated by most high-level CFG manipulations; just
> the fast queries are invalidated.  If a pass uses CFG manipulation that does
> not update dominators, you will get ICEs or silent wrong code...
>
> Richard.

Are post-dominators generally available as well?  I.e., does the presence
of one imply the other?  Or do I need to call calculate_dominance_info
with CDI_POST_DOMINATORS?


Thanks

Andrew
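A toy dominator tree can illustrate Richard's remark that only the fast queries are invalidated. One common technique is to number the tree in DFS order, so that "does B dominate A" becomes an interval check. This sketch assumes, without confirmation from the thread, that this is roughly GCC's fast/slow query distinction. Once a block is inserted, the numbering is stale, and queries fall back to walking the immediate-dominator chain. `dom_tree` is hypothetical and is not GCC's dominance.c.

```cpp
#include <cassert>
#include <vector>

// Toy dominator tree (hypothetical).  Block 0 is the entry; m_idom[b]
// is the immediate dominator of block b.
class dom_tree
{
public:
  dom_tree () : m_idom{-1} {}

  // Add a block whose immediate dominator is IDOM.  The tree stays
  // correct, but the DFS numbering no longer covers the new block, so
  // fast queries are disabled until renumber () runs again.
  int add_block (int idom)
  {
    m_idom.push_back (idom);
    m_fast_ok = false;
    return static_cast<int> (m_idom.size ()) - 1;
  }

  // Recompute DFS entry/exit numbers, re-enabling fast queries.
  void renumber ()
  {
    int n = static_cast<int> (m_idom.size ());
    std::vector<std::vector<int>> kids (n);
    for (int b = 1; b < n; ++b)
      kids[m_idom[b]].push_back (b);
    m_in.assign (n, 0);
    m_out.assign (n, 0);
    int counter = 0;
    number (0, kids, counter);
    m_fast_ok = true;
  }

  bool fast_queries_ok () const { return m_fast_ok; }

  // Does B dominate A?
  bool dominated_by (int a, int b) const
  {
    if (m_fast_ok)
      // Fast query: B dominates A iff A's DFS interval nests in B's.
      return m_in[b] <= m_in[a] && m_out[a] <= m_out[b];
    // Slow query: walk up the immediate-dominator chain from A.
    for (int x = a; x != -1; x = m_idom[x])
      if (x == b)
        return true;
    return false;
  }

private:
  void number (int b, const std::vector<std::vector<int>> &kids, int &counter)
  {
    m_in[b] = counter++;
    for (int k : kids[b])
      number (k, kids, counter);
    m_out[b] = counter++;
  }

  std::vector<int> m_idom, m_in, m_out;
  bool m_fast_ok = false;
};
```

In this toy, add_block is handed the correct immediate dominator, which mirrors CFG routines that keep the dominator tree itself up to date. Only the cached numbering goes stale.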




Re: Dominators question

2021-12-14 Thread Jeff Law via Gcc-patches




On 12/14/2021 12:51 PM, Andrew MacLeod via Gcc-patches wrote:
>
> On 12/3/21 11:46, Richard Biener wrote:
>> On December 3, 2021 3:15:25 PM GMT+01:00, Andrew MacLeod wrote:
>>> When something like the loop unswitching code adds elements to the CFGs,
>>> does this invalidate the dominators?  Or are they updated?  Or are they
>>> in an in-between state?
>>>
>>> I'm curious because a) the relation code uses them under the covers, and
>>> b) I'm looking to add a ranger caching improvement which also uses
>>> dominators if they are available.
>>>
>>> When blocks are added, I wonder what will happen to
>>>
>>>    1) dom_info_available_p (CDI_DOMINATORS)  (is it still true), and
>>> then what happens to
>>>
>>>    2) get_immediate_dominator (CDI_DOMINATORS, bb);  for one of the
>>> newly added BBs.
>>
>> Dominators are generally updated by most high-level CFG manipulations;
>> just the fast queries are invalidated.  If a pass uses CFG manipulation
>> that does not update dominators, you will get ICEs or silent wrong code...
>>
>> Richard.
>
> Are post-dominators generally available as well?  I.e., does the
> presence of one imply the other?  Or do I need to call
> calculate_dominance_info with CDI_POST_DOMINATORS?

They're built and maintained separately.

jeff
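Their definitions show why post-dominators are a separate structure. Post-dominators are the dominators of the reversed CFG, rooted at the exit block instead of the entry. For clarity, the sketch below uses the simple iterative data-flow algorithm, whereas a production compiler would use a faster one. It runs on a hypothetical diamond-shaped CFG, and nothing here is GCC's API.

```cpp
#include <cassert>
#include <set>
#include <vector>

using cfg = std::vector<std::vector<int>>;  // successor lists, one per block

// Reverse every edge of G.
cfg reverse_edges (const cfg &g)
{
  cfg rev (g.size ());
  for (int b = 0; b < static_cast<int> (g.size ()); ++b)
    for (int s : g[b])
      rev[s].push_back (b);
  return rev;
}

// Simple iterative data-flow solution:
//   dom(root) = {root}
//   dom(b)    = {b} plus the intersection of dom(p) over predecessors p.
std::vector<std::set<int>> dominators (const cfg &succ, int root)
{
  int n = static_cast<int> (succ.size ());
  cfg pred = reverse_edges (succ);
  std::set<int> all;
  for (int b = 0; b < n; ++b)
    all.insert (b);
  std::vector<std::set<int>> dom (n, all);
  dom[root] = {root};

  for (bool changed = true; changed;)
    {
      changed = false;
      for (int b = 0; b < n; ++b)
        {
          if (b == root)
            continue;
          std::set<int> d = all;
          for (int p : pred[b])
            {
              std::set<int> keep;
              for (int x : d)
                if (dom[p].count (x))
                  keep.insert (x);
              d = keep;
            }
          d.insert (b);
          if (d != dom[b])
            {
              dom[b] = d;
              changed = true;
            }
        }
    }
  return dom;
}

// Post-dominators: the same computation on the reversed CFG, rooted at
// the exit block, so it yields an independent result.
std::vector<std::set<int>> post_dominators (const cfg &succ, int exit_block)
{
  return dominators (reverse_edges (succ), exit_block);
}
```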



[PATCH] PR fortran/103718 & PR fortran/103719 - [11/12 Regression] ICE in doloop_contained_procedure_code

2021-12-14 Thread Harald Anlauf via Gcc-patches
Dear all,

there are several pretty obvious NULL pointer dereferences on
valid and invalid code when checking do-loop contained stuff.
Reported by Gerhard.
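All of the added guards rely on && short-circuiting. Every link in a chain such as co->ext.dt->iostat->symtree->n.sym is tested before it is followed. Below is a minimal stand-alone sketch with heavily simplified hypothetical structs in place of gfortran's gfc_code, gfc_expr and gfc_symtree. For instance, the n.sym union member is flattened to a plain sym field.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for the gfortran types.
struct symbol { const char *name; };
struct symtree { symbol *sym; };
struct expr { symtree *st; };        // st may be null, e.g. for a constant
struct dt_info { expr *iostat; };    // iostat is null if IOSTAT= is absent
struct code { dt_info *dt; };        // dt may be null on some paths

// Return true iff the IOSTAT= variable of C is DO_VAR.  Each pointer is
// checked before it is dereferenced, so missing links yield false
// instead of a crash.
bool iostat_is_do_var (const code *c, const symbol *do_var)
{
  return c->dt && c->dt->iostat && c->dt->iostat->st
         && c->dt->iostat->st->sym == do_var;
}
```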

Regtested on x86_64-pc-linux-gnu.  OK for mainline/11-branch?

Thanks,
Harald

From 89bf4b17022890b539cd4b5dbe9bd9142ff8706c Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 14 Dec 2021 21:02:04 +0100
Subject: [PATCH] Fortran: prevent NULL pointer dereferences checking do-loop
 contained stuff

gcc/fortran/ChangeLog:

	PR fortran/103718
	PR fortran/103719
	* frontend-passes.c (doloop_contained_procedure_code): Add several
	checks to prevent NULL pointer dereferences on valid and invalid
	code called within do-loops.

gcc/testsuite/ChangeLog:

	PR fortran/103718
	PR fortran/103719
	* gfortran.dg/do_check_18.f90: New test.
---
 gcc/fortran/frontend-passes.c | 17 --
 gcc/testsuite/gfortran.dg/do_check_18.f90 | 27 +++
 2 files changed, 37 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/do_check_18.f90

diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
index 57b24a11cbe..c106ee0957a 100644
--- a/gcc/fortran/frontend-passes.c
+++ b/gcc/fortran/frontend-passes.c
@@ -2390,7 +2390,7 @@ doloop_contained_procedure_code (gfc_code **c,
   switch (co->op)
 {
 case EXEC_ASSIGN:
-  if (co->expr1->symtree->n.sym == do_var)
+  if (co->expr1->symtree && co->expr1->symtree->n.sym == do_var)
 	gfc_error_now (errmsg, do_var->name, &co->loc, info->procedure->name,
 		   &info->where_do);
   break;
@@ -2411,14 +2411,14 @@ doloop_contained_procedure_code (gfc_code **c,
   break;

 case EXEC_OPEN:
-  if (co->ext.open->iostat
+  if (co->ext.open && co->ext.open->iostat
 	  && co->ext.open->iostat->symtree->n.sym == do_var)
 	gfc_error_now (errmsg, do_var->name, &co->ext.open->iostat->where,
 		   info->procedure->name, &info->where_do);
   break;

 case EXEC_CLOSE:
-  if (co->ext.close->iostat
+  if (co->ext.close && co->ext.close->iostat
 	  && co->ext.close->iostat->symtree->n.sym == do_var)
 	gfc_error_now (errmsg, do_var->name, &co->ext.close->iostat->where,
 		   info->procedure->name, &info->where_do);
@@ -2429,7 +2429,8 @@ doloop_contained_procedure_code (gfc_code **c,
 	{

 	case EXEC_INQUIRE:
-#define CHECK_INQ(a) do { if (co->ext.inquire->a &&			\
+#define CHECK_INQ(a) do { if (co->ext.inquire&&			\
+			  co->ext.inquire->a &&			\
 			  co->ext.inquire->a->symtree->n.sym == do_var) \
 	  gfc_error_now (errmsg, do_var->name,			\
 			 &co->ext.inquire->a->where,		\
@@ -2448,21 +2449,23 @@ doloop_contained_procedure_code (gfc_code **c,
 #undef CHECK_INQ

 	case EXEC_READ:
-	  if (co->expr1 && co->expr1->symtree->n.sym == do_var)
+	  if (co->expr1 && co->expr1->symtree
+	  && co->expr1->symtree->n.sym == do_var)
 	gfc_error_now (errmsg, do_var->name, &co->expr1->where,
 			   info->procedure->name, &info->where_do);

 	  /* Fallthrough.  */

 	case EXEC_WRITE:
-	  if (co->ext.dt->iostat
+	  if (co->ext.dt && co->ext.dt->iostat && co->ext.dt->iostat->symtree
 	  && co->ext.dt->iostat->symtree->n.sym == do_var)
 	gfc_error_now (errmsg, do_var->name, &co->ext.dt->iostat->where,
 			   info->procedure->name, &info->where_do);
 	  break;

 	case EXEC_IOLENGTH:
-	  if (co->expr1 && co->expr1->symtree->n.sym == do_var)
+	  if (co->expr1 && co->expr1->symtree
+	  && co->expr1->symtree->n.sym == do_var)
 	gfc_error_now (errmsg, do_var->name, &co->expr1->where,
 			   info->procedure->name, &info->where_do);
 	  break;
diff --git a/gcc/testsuite/gfortran.dg/do_check_18.f90 b/gcc/testsuite/gfortran.dg/do_check_18.f90
new file mode 100644
index 000..b06112aa68f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/do_check_18.f90
@@ -0,0 +1,27 @@
+! { dg-do compile }
+! PR103718,
+! PR103719 - ICE in doloop_contained_procedure_code
+! Contributed by G.Steinmetz
+
+subroutine s1
+  integer :: i
+  do i = 1, 2
+ call s
+  end do
+contains
+  subroutine s
+integer :: n
+inquire (iolength=n) 0  ! valid
+  end
+end
+
+subroutine s2
+  integer :: i
+  do i = 1, 2
+ call s
+  end do
+contains
+  subroutine s
+shape(1) = 0! { dg-error "Non-variable expression" }
+  end
+end
--
2.26.2



Re: [PATCH] PR fortran/103718 & PR fortran/103719 - [11/12 Regression] ICE in doloop_contained_procedure_code

2021-12-14 Thread Thomas Koenig via Gcc-patches



Hi Harald,


there are several pretty obvious NULL pointer dereferences on
valid and invalid code when checking do-loop contained stuff.
Reported by Gerhard.

Regtested on x86_64-pc-linux-gnu.  OK for mainline/11-branch?


OK for both.  Thanks for cleaning this up!

Regards

Thomas


[PATCH] dwarf2cfi: Improve cfa_reg comparisons [PR103619]

2021-12-14 Thread Jakub Jelinek via Gcc-patches
On Tue, Dec 14, 2021 at 10:32:21AM -0700, Jeff Law wrote:
> I think the attached testcase should trigger on c6x with -mbig-endian -O2 -g

Thanks.  Finally I see what's going on.  c6x doesn't really need the CFA
with span > 1 (and I bet neither does armbe), the only reason why
dwf_cfa_reg is called is that the code in 13 cases just tries to compare
the CFA against dwf_cfa_reg (some_reg).  And that dwf_cfa_reg on some reg
that usually isn't a CFA reg results in targetm.dwarf_register_span hook
call, which on targets like c6x or armeb and others for some registers
creates a PARALLEL with various REGs in it, then the loop with the assertion
and finally operator== which just notes that the reg is different and fails.

This seems inefficient, both in compile-time memory and in compile time.

The following so far untested patch instead adds extra operator== and !=
overloads for comparing a cfa_reg with an rtx; they handle the most common
case, a different register number, early and without actually invoking
dwf_cfa_reg.  This means the assertion in dwf_cfa_reg can stay as is (at
least until some big endian target needs to have hard frame pointer or stack
pointer with span > 1 as well).
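The new operators compare the cheap register number first and build the full description only when the numbers match. That idea can be sketched in isolation. This cfa_reg is a cut-down stand-in for the one in dwarf2cfi.c. hard_reg and expensive_describe are hypothetical, with the latter playing the role of dwf_cfa_reg and its targetm.dwarf_register_span call. A counter records how often the expensive path runs.

```cpp
#include <cassert>

// Cut-down stand-in for dwarf2cfi's cfa_reg.
struct cfa_reg
{
  unsigned reg;
  unsigned span;
  unsigned span_width;

  bool operator== (const cfa_reg &o) const
  {
    return reg == o.reg && span == o.span && span_width == o.span_width;
  }
};

// Hypothetical hard register; only its number matters here.
struct hard_reg { unsigned regno; };

// How many times the expensive path ran (for illustration only).
static int expensive_calls = 0;

// Stands in for dwf_cfa_reg, which may consult the target's register
// span hook and allocate a PARALLEL.
cfa_reg expensive_describe (const hard_reg &r)
{
  ++expensive_calls;
  return cfa_reg{r.regno, 1, 0};
}

// Cheap check first; build the full description only on a regno match.
bool operator== (const cfa_reg &cfa, const hard_reg &r)
{
  if (cfa.reg != r.regno)
    return false;
  return cfa == expensive_describe (r);
}

bool operator!= (const cfa_reg &cfa, const hard_reg &r)
{
  return !(cfa == r);
}
```

Comparing the CFA register against an unrelated register therefore never reaches expensive_describe. In the common "different register" case the cost is a single integer comparison.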
I've removed a different assertion there because it is redundant - dwf_regno
already has exactly that assertion in it too.

And I've included those 2 tweaks to avoid creating a REG in GC memory when
we can use {stack,hard_frame}_pointer_rtx which is already initialized to
the same REG we need by init_emit_regs.

Ok for trunk if it passes bootstrap/regtest?

2021-12-14  Jakub Jelinek  

PR debug/103619
* dwarf2cfi.c (dwf_cfa_reg): Remove gcc_assert.
(operator==, operator!=): New overloaded operators.
(dwarf2out_frame_debug_adjust_cfa, dwarf2out_frame_debug_cfa_offset,
dwarf2out_frame_debug_expr): Compare vars with cfa_reg type directly
with REG rtxes rather than with dwf_cfa_reg results on those REGs.
(create_cie_data): Use stack_pointer_rtx instead of
gen_rtx_REG (Pmode, STACK_POINTER_REGNUM).
(execute_dwarf2_frame): Use hard_frame_pointer_rtx instead of
gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM).

--- gcc/dwarf2cfi.c.jj  2021-12-14 19:00:49.067607884 +0100
+++ gcc/dwarf2cfi.c 2021-12-14 20:29:19.138677618 +0100
@@ -1113,8 +1113,6 @@ dwf_cfa_reg (rtx reg)
 {
   struct cfa_reg result;
 
-  gcc_assert (REGNO (reg) < FIRST_PSEUDO_REGISTER);
-
   result.reg = dwf_regno (reg);
   result.span = 1;
   result.span_width = 0;
@@ -1144,6 +1142,25 @@ dwf_cfa_reg (rtx reg)
   return result;
 }
 
+/* More efficient comparisons that don't call targetm.dwarf_register_span
+   unnecessarily.  */
+
+static bool
+operator== (cfa_reg &cfa, rtx reg)
+{
+  unsigned int regno = dwf_regno (reg);
+  if (cfa.reg != regno)
+return false;
+  struct cfa_reg other = dwf_cfa_reg (reg);
+  return cfa == other;
+}
+
+static inline bool
+operator!= (cfa_reg &cfa, rtx reg)
+{
+  return !(cfa == reg);
+}
+
 /* Compare X and Y for equivalence.  The inputs may be REGs or PC_RTX.  */
 
 static bool
@@ -1313,7 +1330,7 @@ dwarf2out_frame_debug_adjust_cfa (rtx pa
   switch (GET_CODE (src))
 {
 case PLUS:
-  gcc_assert (dwf_cfa_reg (XEXP (src, 0)) == cur_cfa->reg);
+  gcc_assert (cur_cfa->reg == XEXP (src, 0));
   cur_cfa->offset -= rtx_to_poly_int64 (XEXP (src, 1));
   break;
 
@@ -1346,11 +1363,11 @@ dwarf2out_frame_debug_cfa_offset (rtx se
   switch (GET_CODE (addr))
 {
 case REG:
-  gcc_assert (dwf_cfa_reg (addr) == cur_cfa->reg);
+  gcc_assert (cur_cfa->reg == addr);
   offset = -cur_cfa->offset;
   break;
 case PLUS:
-  gcc_assert (dwf_cfa_reg (XEXP (addr, 0)) == cur_cfa->reg);
+  gcc_assert (cur_cfa->reg == XEXP (addr, 0));
   offset = rtx_to_poly_int64 (XEXP (addr, 1)) - cur_cfa->offset;
   break;
 default:
@@ -1797,7 +1814,7 @@ dwarf2out_frame_debug_expr (rtx expr)
{
  /* Setting FP from SP.  */
case REG:
- if (cur_cfa->reg == dwf_cfa_reg (src))
+ if (cur_cfa->reg == src)
{
  /* Rule 1 */
  /* Update the CFA rule wrt SP or FP.  Make sure src is
@@ -1828,7 +1845,7 @@ dwarf2out_frame_debug_expr (rtx expr)
{
  gcc_assert (REGNO (dest) == HARD_FRAME_POINTER_REGNUM
  && fde->drap_reg != INVALID_REGNUM
- && cur_cfa->reg != dwf_cfa_reg (src)
+ && cur_cfa->reg != src
  && fde->rule18);
  fde->rule18 = 0;
  /* The save of hard frame pointer has been deferred
@@ -1852,8 +1869,7 @@ dwarf2out_frame_debug_expr (rtx expr)
  /* Adjusting SP.  */
  if (REG_P (XEXP (src, 1)))
{
- gcc_assert (dwf_cfa_reg (XEXP (src, 1))
- == cur_trace->cfa_temp.reg);
+ gcc_assert (cur_trace->cfa_temp.reg == XEX

[PATCH, committed] PR fortran/103717 - ICE in doloop_code, at fortran/frontend-passes.c:2656

2021-12-14 Thread Harald Anlauf via Gcc-patches
Dear all,

the attached patch fixes an obvious NULL pointer dereference.
Committed as obvious after regtesting on x86_64-pc-linux-gnu.

Will "backport" to 11-branch after waiting a few days unless
someone protests.

Thanks,
Harald

From ca39102e10643a6b3f07d06934cc0907ba83d9ee Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 14 Dec 2021 21:57:04 +0100
Subject: [PATCH] Fortran: prevent NULL pointer dereference in check of passed
 do-loop variable

gcc/fortran/ChangeLog:

	PR fortran/103717
	* frontend-passes.c (doloop_code): Prevent NULL pointer
	dereference when checking for passing a do-loop variable to a
	contained procedure with an interface mismatch.

gcc/testsuite/ChangeLog:

	PR fortran/103717
	* gfortran.dg/do_check_19.f90: New test.
---
 gcc/fortran/frontend-passes.c |  2 +-
 gcc/testsuite/gfortran.dg/do_check_19.f90 | 21 +
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/do_check_19.f90

diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
index c106ee0957a..6ffe072b185 100644
--- a/gcc/fortran/frontend-passes.c
+++ b/gcc/fortran/frontend-passes.c
@@ -2653,7 +2653,7 @@ doloop_code (gfc_code **c, int *walk_subtrees ATTRIBUTE_UNUSED,

 	  do_sym = cl->ext.iterator->var->symtree->n.sym;

-	  if (a->expr && a->expr->symtree
+	  if (a->expr && a->expr->symtree && f->sym
 		  && a->expr->symtree->n.sym == do_sym)
 		{
 		  if (f->sym->attr.intent == INTENT_OUT)
diff --git a/gcc/testsuite/gfortran.dg/do_check_19.f90 b/gcc/testsuite/gfortran.dg/do_check_19.f90
new file mode 100644
index 000..1373a7374ce
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/do_check_19.f90
@@ -0,0 +1,21 @@
+! { dg-do compile }
+! { dg-prune-output "Obsolescent feature: Alternate-return argument" }
+! PR fortran/103717 - ICE in doloop_code
+! Contributed by G.Steinmetz
+
+program p
+  integer :: i
+  do i = 1, 2
+ call s(i) ! { dg-error "Missing alternate return specifier" }
+  end do
+contains
+  subroutine s(*)
+  end
+end
+
+recursive subroutine s(*)
+  integer :: i
+  do i = 1, 2
+ call s(i) ! { dg-error "Missing alternate return specifier" }
+  end do
+end
--
2.26.2
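The one-line change to doloop_code adds an `f->sym` test before `f->sym->attr.intent` is read. In the new test case, `subroutine s(*)` has an alternate-return dummy and `call s(i)` passes the do-loop variable to it. The sketch below shows the guarded pattern in isolation. It assumes, as gfortran does elsewhere, that an alternate-return dummy `*` appears in the formal argument list with a NULL `sym`. The types and function names are hypothetical simplifications, not the real frontend structures:

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for gfortran's symbol and
// argument-list types.
enum class Intent { Unknown, In, Out, InOut };

struct Symbol
{
  Intent intent = Intent::Unknown;
};

struct FormalArg
{
  Symbol* sym;       // assumed to be nullptr for an alternate-return dummy '*'
  FormalArg* next;
};

struct ActualArg
{
  Symbol* var;       // variable passed as the actual argument, if any
  ActualArg* next;
};

// Returns true if do_sym is passed to an INTENT(OUT) dummy argument.
// The `f->sym` test mirrors the NULL check added by the patch.
bool passes_to_intent_out(const ActualArg* a, const FormalArg* f,
                          const Symbol* do_sym)
{
  for (; a && f; a = a->next, f = f->next)
    if (a->var && f->sym && a->var == do_sym
        && f->sym->intent == Intent::Out)
      return true;
  return false;
}
```

Without the `f->sym` test, the first call in the usage below would dereference a null pointer, which is the ICE reported in the PR.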



Re: [PATCH] gcc/diagnostic.c: make -Werror message more helpful

2021-12-14 Thread Eric Gallager via Gcc-patches
On Tue, Dec 14, 2021 at 1:33 PM Eric Gallager  wrote:
>
> On Mon, Dec 13, 2021 at 1:17 PM Martin Sebor via Gcc-patches
>  wrote:
> >
> > On 12/12/21 3:13 AM, Andrea Monaco via Gcc-patches wrote:
> > >
> > > Hello.
> > >
> > >
> > > I propose to make that message more verbose.  It sure would have helped
> > > me once.  You don't always have a Web search available :)
> >
> > Warnings turned into errors have the [-Werror=...] tag at the end
> > so I'm not sure I see when reiterating -Werror at the end of output
> > would be helpful.  Can you explain the circumstances when it would
> > have helped you?
> >
> > For what it's worth, a change here that I think might be more useful
> > is printing the number of diagnostics of each kind (e.g., 2 warnings
> > and 5 errors found).
> >
>
> I swear we already had a bug open for this suggestion, but after much
> searching I can't seem to find it anymore, so if anyone has any ideas
> of what keywords I forgot to try, feel free to send them...

Never mind, I managed to find it after all: it's bug 26061:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26061

> > > Andrea Monaco
> > >
> > >
> > >
> > > diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
> > > index 4ded1760705..8b67662390e 100644
> > > --- a/gcc/diagnostic.c
> > > +++ b/gcc/diagnostic.c
> > > @@ -156,7 +156,7 @@ default_diagnostic_final_cb (diagnostic_context *context)
> > > /* -Werror was given.  */
> > > if (context->warning_as_error_requested)
> > >  pp_verbatim (context->printer,
> > > -_("%s: all warnings being treated as errors"),
> > > +_("%s: all warnings being treated as errors (-Werror; disable with -Wno-error)"),
> >
> > If this change should move forward, -Werror needs to be quoted
> > (e.g., passed as an argument to %qs or surrounded in a pair of
> > %< and %> directives).  The "disable with -Wno-error" part
> > is superfluous and would not be entirely accurate for warnings
> > promoted to errors by #pragma GCC diagnostic (those cannot be
> > demoted back to warnings by -Wno-error).
> >
> > Martin
> >
> > >   progname);
> > > /* At least one -Werror= was given.  */
> > > else
> > >
> >
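Martin's alternative suggestion is to report how many diagnostics of each kind were emitted, for example "2 warnings and 5 errors found". It might look roughly like the standalone sketch below. The function names and the "no diagnostics" wording are hypothetical. A real change to diagnostic.c would go through GCC's pretty-printer and plural-aware translation machinery rather than std::string:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: "1 warning", "2 warnings", ...
static std::string count_phrase(unsigned n, const char* singular,
                                const char* plural)
{
  return std::to_string(n) + " " + (n == 1 ? singular : plural);
}

// Builds a summary such as "2 warnings and 5 errors found".
std::string diagnostic_summary(unsigned warnings, unsigned errors)
{
  if (warnings == 0 && errors == 0)
    return "no diagnostics";
  std::string s;
  if (warnings)
    s += count_phrase(warnings, "warning", "warnings");
  if (warnings && errors)
    s += " and ";
  if (errors)
    s += count_phrase(errors, "error", "errors");
  return s + " found";
}
```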


[committed] libstdc++: Simplify definition of std::regex_constants variables

2021-12-14 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


This removes the __syntax_option and __match_flag enumeration types,
which are only used to define enumerators with successive values that
are then used to initialize the std::regex_constants global variables.

By defining enumerators in the syntax_option_type and match_flag_type
enumeration types with the correct values for the globals we get rid of
two useless enumeration types that just count from 0 to N, and we
improve the debugging experience. Because the enumeration types now have
enumerators defined, GDB will print values in terms of those enumerators
e.g.

$6 = (std::regex_constants::_S_ECMAScript | std::regex_constants::_S_multiline)

Previously this would have been shown as simply 0x810 because there were
no enumerators of that type.

This changes the type and value of enumerators such as _S_grep, but
users should never be referring to them directly anyway.
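Here is a reduced, standalone sketch of the idea, assuming C++14 or later. The enumerators carry the real bit values, the bitwise operators are constexpr (including the compound assignment, as the ChangeLog mentions), and the globals are initialized straight from the enumerators. The names mirror the patch, but this is an illustration in a hypothetical `demo` namespace rather than the libstdc++ header. Only three flags are shown, and plain `constexpr` stands in for `_GLIBCXX17_INLINE constexpr`:

```cpp
#include <cassert>

namespace demo
{
  // Enumerators carry the real bit values (subset of the patch's list).
  enum syntax_option_type : unsigned int
  {
    _S_icase      = 1 << 0,
    _S_ECMAScript = 1 << 4,
    _S_multiline  = 1 << 11
  };

  constexpr syntax_option_type
  operator|(syntax_option_type a, syntax_option_type b)
  {
    return syntax_option_type(static_cast<unsigned int>(a)
                              | static_cast<unsigned int>(b));
  }

  // A constexpr compound assignment needs C++14 relaxed constexpr.
  constexpr syntax_option_type&
  operator|=(syntax_option_type& a, syntax_option_type b)
  { return a = a | b; }

  // Globals initialized directly from enumerators, no static_cast needed.
  constexpr syntax_option_type icase = _S_icase;
  constexpr syntax_option_type ECMAScript = _S_ECMAScript;
  constexpr syntax_option_type multiline = _S_multiline;

  // Usable in a constant expression because operator|= is constexpr.
  constexpr syntax_option_type
  ecmascript_multiline()
  {
    syntax_option_type f = ECMAScript;
    f |= multiline;
    return f;
  }
}

// Bit 4 (0x10) plus bit 11 (0x800) gives the 0x810 quoted above.
static_assert(demo::ecmascript_multiline() == 0x810, "bits 4 and 11");
```

Because the combined value is built from named enumerators, a debugger can print 0x810 as `_S_ECMAScript | _S_multiline`, as described above.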

libstdc++-v3/ChangeLog:

* include/bits/regex_constants.h (__syntax_option, __match_flag):
Remove.
(syntax_option_type, match_flag_type): Define enumerators.
Use to initialize globals. Add constexpr to compound assignment
operators.
* include/bits/regex_error.h (error_type): Add comment.
* testsuite/28_regex/constants/constexpr.cc: Remove comment.
* testsuite/28_regex/constants/error_type.cc: Improve comment.
* testsuite/28_regex/constants/match_flag_type.cc: Check bitmask
requirements.
* testsuite/28_regex/constants/syntax_option_type.cc: Likewise.
---
 libstdc++-v3/include/bits/regex_constants.h   | 148 --
 libstdc++-v3/include/bits/regex_error.h   |   2 +-
 .../testsuite/28_regex/constants/constexpr.cc |   2 -
 .../28_regex/constants/error_type.cc  |   2 +-
 .../28_regex/constants/match_flag_type.cc |  25 ++-
 .../28_regex/constants/syntax_option_type.cc  |  26 ++-
 6 files changed, 114 insertions(+), 91 deletions(-)

diff --git a/libstdc++-v3/include/bits/regex_constants.h b/libstdc++-v3/include/bits/regex_constants.h
index 0fd2879c817..9be14292519 100644
--- a/libstdc++-v3/include/bits/regex_constants.h
+++ b/libstdc++-v3/include/bits/regex_constants.h
@@ -51,21 +51,6 @@ namespace regex_constants
* @name 5.1 Regular Expression Syntax Options
*/
   ///@{
-  enum __syntax_option
-  {
-_S_icase,
-_S_nosubs,
-_S_optimize,
-_S_collate,
-_S_ECMAScript,
-_S_basic,
-_S_extended,
-_S_awk,
-_S_grep,
-_S_egrep,
-_S_polynomial,
-_S_multiline
-  };
 
   /**
* @brief This is a bitmask type indicating how to interpret the regex.
@@ -78,22 +63,34 @@ namespace regex_constants
* elements @c ECMAScript, @c basic, @c extended, @c awk, @c grep, @c egrep
* %set.
*/
-  enum syntax_option_type : unsigned int { };
+  enum syntax_option_type : unsigned int
+  {
+_S_icase   = 1 << 0,
+_S_nosubs  = 1 << 1,
+_S_optimize= 1 << 2,
+_S_collate = 1 << 3,
+_S_ECMAScript  = 1 << 4,
+_S_basic   = 1 << 5,
+_S_extended= 1 << 6,
+_S_awk = 1 << 7,
+_S_grep= 1 << 8,
+_S_egrep   = 1 << 9,
+_S_polynomial  = 1 << 10,
+_S_multiline   = 1 << 11
+  };
 
   /**
* Specifies that the matching of regular expressions against a character
* sequence shall be performed without regard to case.
*/
-  _GLIBCXX17_INLINE constexpr syntax_option_type icase =
-static_cast<syntax_option_type>(1 << _S_icase);
+  _GLIBCXX17_INLINE constexpr syntax_option_type icase = _S_icase;
 
   /**
* Specifies that when a regular expression is matched against a character
* container sequence, no sub-expression matches are to be stored in the
* supplied match_results structure.
*/
-  _GLIBCXX17_INLINE constexpr syntax_option_type nosubs =
-static_cast<syntax_option_type>(1 << _S_nosubs);
+  _GLIBCXX17_INLINE constexpr syntax_option_type nosubs = _S_nosubs;
 
   /**
* Specifies that the regular expression engine should pay more attention to
@@ -101,15 +98,13 @@ namespace regex_constants
* speed with which regular expression objects are constructed. Otherwise
* it has no detectable effect on the program output.
*/
-  _GLIBCXX17_INLINE constexpr syntax_option_type optimize =
-static_cast<syntax_option_type>(1 << _S_optimize);
+  _GLIBCXX17_INLINE constexpr syntax_option_type optimize = _S_optimize;
 
   /**
* Specifies that character ranges of the form [a-b] should be locale
* sensitive.
*/
-  _GLIBCXX17_INLINE constexpr syntax_option_type collate =
-static_cast<syntax_option_type>(1 << _S_collate);
+  _GLIBCXX17_INLINE constexpr syntax_option_type collate = _S_collate;
 
   /**
* Specifies that the grammar recognized by the regular expression engine is
@@ -119,8 +114,7 @@ namespace regex_constants
* in the PERL scripting language but extended with elements found in the
* POSIX regular expression grammar.
*/
-  _GLIBCXX17_I

[committed] libstdc++: Simplify typedefs by using __UINTPTR_TYPE__

2021-12-14 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


libstdc++-v3/ChangeLog:

* include/ext/pointer.h (_Relative_pointer_impl::_UIntPtrType):
Rename to uintptr_t and define as __UINTPTR_TYPE__.
---
 libstdc++-v3/include/ext/pointer.h | 50 --
 1 file changed, 19 insertions(+), 31 deletions(-)

diff --git a/libstdc++-v3/include/ext/pointer.h b/libstdc++-v3/include/ext/pointer.h
index 6bed55f642d..5bf638a0c28 100644
--- a/libstdc++-v3/include/ext/pointer.h
+++ b/libstdc++-v3/include/ext/pointer.h
@@ -120,7 +120,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 if (_M_diff == 1)
   return 0;
 else
-  return reinterpret_cast<_Tp*>(reinterpret_cast<_UIntPtrType>(this)
+  return reinterpret_cast<_Tp*>(reinterpret_cast<uintptr_t>(this)
+ _M_diff);
   }
   
@@ -130,30 +130,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 if (!__arg)
   _M_diff = 1;
 else
-  _M_diff = reinterpret_cast<_UIntPtrType>(__arg) 
-- reinterpret_cast<_UIntPtrType>(this);
+  _M_diff = reinterpret_cast<uintptr_t>(__arg)
+- reinterpret_cast<uintptr_t>(this);
   }
   
   // Comparison of pointers
   inline bool
   operator<(const _Relative_pointer_impl& __rarg) const
-  { return (reinterpret_cast<_UIntPtrType>(this->get())
-   < reinterpret_cast<_UIntPtrType>(__rarg.get())); }
+  { return (reinterpret_cast<uintptr_t>(this->get())
+   < reinterpret_cast<uintptr_t>(__rarg.get())); }
 
   inline bool
   operator==(const _Relative_pointer_impl& __rarg) const
-  { return (reinterpret_cast<_UIntPtrType>(this->get())
-   == reinterpret_cast<_UIntPtrType>(__rarg.get())); }
+  { return (reinterpret_cast<uintptr_t>(this->get())
+   == reinterpret_cast<uintptr_t>(__rarg.get())); }
 
 private:
-#ifdef _GLIBCXX_USE_LONG_LONG
-  typedef __gnu_cxx::__conditional_type<
-(sizeof(unsigned long) >= sizeof(void*)),
-unsigned long, unsigned long long>::__type _UIntPtrType;
-#else
-  typedef unsigned long _UIntPtrType;
-#endif
-  _UIntPtrType _M_diff;
+  typedef __UINTPTR_TYPE__ uintptr_t;
+  uintptr_t _M_diff;
 };
   
   /**
@@ -173,7 +167,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return 0;
 else
   return reinterpret_cast
- (reinterpret_cast<_UIntPtrType>(this) + _M_diff);
+ (reinterpret_cast<uintptr_t>(this) + _M_diff);
   }
   
   void 
@@ -182,30 +176,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 if (!__arg)
   _M_diff = 1;
 else
-  _M_diff = reinterpret_cast<_UIntPtrType>(__arg) 
-- reinterpret_cast<_UIntPtrType>(this);
+  _M_diff = reinterpret_cast<uintptr_t>(__arg)
+- reinterpret_cast<uintptr_t>(this);
   }
   
   // Comparison of pointers
   inline bool
   operator<(const _Relative_pointer_impl& __rarg) const
-  { return (reinterpret_cast<_UIntPtrType>(this->get())
-   < reinterpret_cast<_UIntPtrType>(__rarg.get())); }
+  { return (reinterpret_cast<uintptr_t>(this->get())
+   < reinterpret_cast<uintptr_t>(__rarg.get())); }
 
   inline bool
   operator==(const _Relative_pointer_impl& __rarg) const
-  { return (reinterpret_cast<_UIntPtrType>(this->get())
-   == reinterpret_cast<_UIntPtrType>(__rarg.get())); }
+  { return (reinterpret_cast<uintptr_t>(this->get())
+   == reinterpret_cast<uintptr_t>(__rarg.get())); }
   
 private:
-#ifdef _GLIBCXX_USE_LONG_LONG
-  typedef __gnu_cxx::__conditional_type<
-(sizeof(unsigned long) >= sizeof(void*)),
-unsigned long, unsigned long long>::__type _UIntPtrType;
-#else
-  typedef unsigned long _UIntPtrType;
-#endif
-   _UIntPtrType _M_diff;
+  typedef __UINTPTR_TYPE__ uintptr_t;
+  uintptr_t _M_diff;
 };
 
   /**
@@ -597,7 +585,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 using rebind = typename __gnu_cxx::_Pointer_adapter<
-   typename pointer_traits<_Storage_policy>::template rebind<_Up>>;
+ typename pointer_traits<_Storage_policy>::template rebind<_Up>>;
 
   static pointer pointer_to(typename pointer::reference __r) noexcept
   { return pointer(std::addressof(__r)); }
-- 
2.31.1
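_Relative_pointer_impl stores its target as an offset from its own address and uses 1 as a sentinel for a null pointer. That is why it needs an unsigned integer type wide enough to hold a pointer. The sketch below shows the same technique in a standalone form. It uses std::uintptr_t from <cstdint> in place of GCC's predefined `__UINTPTR_TYPE__` macro. The class name and the copy behaviour are hypothetical illustration choices, and it relies on pointer/integer round-trips behaving as they do with GCC:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical self-relative pointer using the same technique as
// _Relative_pointer_impl; not the libstdc++ class itself.
template<typename T>
class relative_ptr
{
public:
  explicit relative_ptr(T* p = nullptr) noexcept { set(p); }

  // The offset is relative to this object's address, so a copy must
  // recompute it rather than copy the raw difference.
  relative_ptr(const relative_ptr& other) noexcept { set(other.get()); }

  relative_ptr& operator=(const relative_ptr& other) noexcept
  {
    set(other.get());
    return *this;
  }

  T* get() const noexcept
  {
    // 1 is the null sentinel, so a target exactly one byte past this
    // object cannot be represented.
    if (diff_ == 1)
      return nullptr;
    return reinterpret_cast<T*>(reinterpret_cast<std::uintptr_t>(this) + diff_);
  }

  void set(T* p) noexcept
  {
    if (!p)
      diff_ = 1;
    else  // Unsigned subtraction wraps, so targets below `this` round-trip too.
      diff_ = reinterpret_cast<std::uintptr_t>(p)
              - reinterpret_cast<std::uintptr_t>(this);
  }

private:
  std::uintptr_t diff_;
};
```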



[committed] libstdc++: Fix handling of invalid ranges in std::regex [PR102447]

2021-12-14 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


std::regex currently allows invalid bracket ranges such as [\w-a] which
are only allowed by ECMAScript when in web browser compatibility mode.
It should be an error, because the start of the range is a character
class, not a single character. The current implementation of
_Compiler::_M_expression_term does not provide a way to reject this,
because we only remember a previous character, not whether we just
processed a character class (or collating symbol etc.)

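This patch replaces the pair<bool, _CharT> used to emulate
optional<_CharT> with a custom class, _BracketState, that records
whether the last atom was nothing, a single character, or a character
class, along with the character itself. Tracking this third state lets
us tell when we've just seen a character class.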

With this additional state the code in _M_expression_term for processing
the _S_token_bracket_dash can be improved to correctly reject the [\w-a]
case, without regressing for valid cases such as [\w-] and [].
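The three-state idea can be illustrated with a toy validator for the text between `[` and `]`. This is a hypothetical sketch, not the libstdc++ _Compiler. It only understands plain characters, the `\w` class and `-`. Its dash rules are simplified: a dash is treated as a literal when it is the first atom, the last atom, or immediately follows a completed range.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical three-state record of the last atom in a bracket expression.
struct BracketState
{
  enum class Type : char { None, Char, Class } type = Type::None;
  char ch = 0;

  void set_char(char c) noexcept { type = Type::Char; ch = c; }
  void set_class() noexcept { type = Type::Class; }
  void reset() noexcept { type = Type::None; }
};

// True if the text of `s` starting at `i` is the class escape "\w".
static bool is_class_at(const std::string& s, std::size_t i)
{
  return i + 1 < s.size() && s[i] == '\\' && s[i + 1] == 'w';
}

// Validates the body of a bracket expression (toy grammar, see above).
bool valid_bracket_body(const std::string& body)
{
  BracketState last;
  std::size_t i = 0;
  while (i < body.size())
    {
      if (is_class_at(body, i))
        {
          last.set_class();
          i += 2;
        }
      else if (body[i] == '-' && last.type != BracketState::Type::None
               && i + 1 < body.size())
        {
          // A dash between two atoms denotes a range.
          if (last.type == BracketState::Type::Class)
            return false;                 // e.g. [\w-a]: range starts at a class
          if (is_class_at(body, i + 1))
            return false;                 // e.g. [a-\w]: range ends at a class
          if (body[i + 1] < last.ch)
            return false;                 // e.g. [z-a]: reversed range
          last.reset();                   // both endpoints consumed
          i += 2;
        }
      else
        {
          // Ordinary character, or a dash that is first, last, or follows a range.
          last.set_char(body[i]);
          i += 1;
        }
    }
  return true;
}
```

A pair<bool, char> could only say "there is a previous character" or "there is not". The extra Class state is what lets `[\w-a]` be rejected while `[\w-]` is still accepted.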

libstdc++-v3/ChangeLog:

PR libstdc++/102447
* include/bits/regex_compiler.h (_Compiler::_BracketState): New
class.
(_Compiler::_BracketMatcher): New alias template.
(_Compiler::_M_expression_term): Change pair<bool, _CharT>
parameter to _BracketState. Process first character for
ECMAScript syntax as well as POSIX.
* include/bits/regex_compiler.tcc
(_Compiler::_M_insert_bracket_matcher): Pass _BracketState.
(_Compiler::_M_expression_term): Use _BracketState to store
state between calls. Improve handling of dashes in ranges.
* testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc:
Add more tests for ranges containing dashes. Check invalid
ranges with character class at the beginning.
---
 libstdc++-v3/include/bits/regex_compiler.h|  40 +-
 libstdc++-v3/include/bits/regex_compiler.tcc  | 118 --
 .../regex_match/cstring_bracket_01.cc |  62 -
 3 files changed, 152 insertions(+), 68 deletions(-)

diff --git a/libstdc++-v3/include/bits/regex_compiler.h b/libstdc++-v3/include/bits/regex_compiler.h
index 88c60c2bed7..06cb48f2b6d 100644
--- a/libstdc++-v3/include/bits/regex_compiler.h
+++ b/libstdc++-v3/include/bits/regex_compiler.h
@@ -121,13 +121,45 @@ namespace __detail
void
_M_insert_bracket_matcher(bool __neg);
 
-  // Returns true if successfully matched one term and should continue.
+  // Cache of the last atom seen in a bracketed range expression.
+  struct _BracketState
+  {
+   enum class _Type : char { _None, _Char, _Class } _M_type = _Type::_None;
+   _CharT _M_char;
+
+   void
+   set(_CharT __c) noexcept { _M_type = _Type::_Char; _M_char = __c; }
+
+   _GLIBCXX_NODISCARD _CharT
+   get() const noexcept { return _M_char; }
+
+   void
+   reset(_Type __t = _Type::_None) noexcept { _M_type = __t; }
+
+   explicit operator bool() const noexcept
+   { return _M_type != _Type::_None; }
+
+   // Previous token was a single character.
+   _GLIBCXX_NODISCARD bool
+   _M_is_char() const noexcept { return _M_type == _Type::_Char; }
+
+   // Previous token was a character class, equivalent class,
+   // collating symbol etc.
+   _GLIBCXX_NODISCARD bool
+   _M_is_class() const noexcept { return _M_type == _Type::_Class; }
+  };
+
+  template<bool __icase, bool __collate>
+   using _BracketMatcher
+ = std::__detail::_BracketMatcher<_TraitsT, __icase, __collate>;
+
+  // Returns true if successfully parsed one term and should continue
+  // compiling a bracket expression.
   // Returns false if the compiler should move on.
   template<bool __icase, bool __collate>
bool
-   _M_expression_term(pair<bool, _CharT>& __last_char,
-  _BracketMatcher<_TraitsT, __icase, __collate>&
-  __matcher);
+   _M_expression_term(_BracketState& __last_char,
+  _BracketMatcher<__icase, __collate>& __matcher);
 
   int
   _M_cur_int_value(int __radix);
diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc b/libstdc++-v3/include/bits/regex_compiler.tcc
index 0e2e1321376..9000aec8e25 100644
--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -403,7 +403,7 @@ namespace __detail
 _M_insert_character_class_matcher()
 {
   __glibcxx_assert(_M_value.size() == 1);
-  _BracketMatcher<_TraitsT, __icase, __collate> __matcher
+  _BracketMatcher<__icase, __collate> __matcher
(_M_ctype.is(_CtypeT::upper, _M_value[0]), _M_traits);
   __matcher._M_add_character_class(_M_value, false);
   __matcher._M_ready();
@@ -417,26 +417,17 @@ namespace __detail
 _Compiler<_TraitsT>::
 _M_insert_bracket_matcher(bool __neg)
 {
-  _BracketMatcher<_TraitsT, __icase, __collate> __matcher(__neg, _M_traits);
-  pair<bool, _CharT> __last_char; // Optional<_CharT>
-  __last_char.first = false;
-  if (!(_M_flags & regex_constants::ECMAScript))
-   {
- if (_M_t

Re: [PATCH v2] regrename: Skip renaming if instruction is noop move.

2021-12-14 Thread Jeff Law via Gcc-patches




On 12/13/2021 6:40 PM, Jojo R wrote:
> Hi,
>
> Thank you for your review & help.
>
> I could not find the merged patch on the gcc master branch in git.
>
> Is there any problem with it?

I assumed you'd commit the change.  I thought you had commit
privileges.  I'll go ahead and push it momentarily.


Thanks for following up.

Jeff


Re: [PR100843] store by mult pieces: punt on max_len < min_len

2021-12-14 Thread Jeff Law via Gcc-patches




On 12/10/2021 10:18 PM, Alexandre Oliva wrote:
> On Dec 10, 2021, Jeff Law  wrote:
>
>> The patch is clearly safe.  My question is should we have caught this
>> earlier in the call chain?
>
> Callers will call try_store_by_multiple_pieces if set_storage_via_setmem
> fails.  setmem doesn't necessarily need min and max len to do its job,
> so if we were to modify callers, it would be just guarding the calls of
> try_store_by_multiple_pieces with max_len >= min_len: 3 callers in 2
> files, which didn't seem appealing to me.

Thanks for the additional info.  OK for the trunk.

Jeff


Re: [PATCH] configure: Account CXXFLAGS in gcc-plugin.m4.

2021-12-14 Thread Jeff Law via Gcc-patches




On 12/10/2021 4:23 PM, Iain Sandoe via Gcc-patches wrote:
> While doing tests of the PCH changes, I noticed that all the
> plugin tests were being omitted from m32 Darwin under some
> permutations of flags.  It turned out to be a broken config
> test - it was not removing -mdynamic-no-pic properly.
>
> We now use a C++ compiler, so we need to process CXXFLAGS
> as well as CFLAGS in the gcc-plugin config fragment.
>
> Tested on i686, x86_64-darwin, x86_64, powerpc64le-linux
> OK for master?
> backports?
>
> Signed-off-by: Iain Sandoe
>
> config/ChangeLog:
>
> * gcc-plugin.m4: Save and process CXXFLAGS.
>
> gcc/ChangeLog:
>
> * configure: Regenerate.
>
> libcc1/ChangeLog:
>
> * configure: Regenerate.

OK
jeff



[r12-5960 Regression] FAIL: gfortran.dg/unlimited_polymorphic_3.f03 -Os execution test on Linux/x86_64

2021-12-14 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

3305135c29e1c3e988bd9bad40aefc01d138aaca is the first bad commit
commit 3305135c29e1c3e988bd9bad40aefc01d138aaca
Author: Jan Hubicka 
Date:   Tue Dec 14 16:50:27 2021 +0100

Determine global memory accesses in ipa-modref

caused

FAIL: gfortran.dg/unlimited_polymorphic_3.f03   -Os  execution test

with GCC configured with

../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5960/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/unlimited_polymorphic_3.f03 --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/unlimited_polymorphic_3.f03 --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/unlimited_polymorphic_3.f03 --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/unlimited_polymorphic_3.f03 --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH] dwarf2cfi: Improve cfa_reg comparisons [PR103619]

2021-12-14 Thread Jeff Law via Gcc-patches




On 12/14/2021 1:18 PM, Jakub Jelinek wrote:
> On Tue, Dec 14, 2021 at 10:32:21AM -0700, Jeff Law wrote:
>> I think the attached testcase should trigger on c6x with -mbig-endian -O2 -g
>
> Thanks.  Finally I see what's going on.  c6x doesn't really need the CFA
> with span > 1 (and I bet neither does armeb), the only reason why
> dwf_cfa_reg is called is that the code in 13 cases just tries to compare
> the CFA against dwf_cfa_reg (some_reg).  And that dwf_cfa_reg on some reg
> that usually isn't a CFA reg results in targetm.dwarf_register_span hook
> call, which on targets like c6x or armeb and others for some registers
> creates a PARALLEL with various REGs in it, then the loop with the assertion
> and finally operator== which just notes that the reg is different and fails.
>
> This seems compile time memory and time inefficient.
>
> The following so far untested patch instead adds an extra operator== and !=
> for comparison of cfa_reg with rtx, which has the most common case where it
> is a different register number done early without actually invoking
> dwf_cfa_reg.  This means the assertion in dwf_cfa_reg can stay as is (at
> least until some big endian target needs to have hard frame pointer or stack
> pointer with span > 1 as well).
> I've removed a different assertion there because it is redundant - dwf_regno
> already has exactly that assertion in it too.
>
> And I've included those 2 tweaks to avoid creating a REG in GC memory when
> we can use {stack,hard_frame}_pointer_rtx which is already initialized to
> the same REG we need by init_emit_regs.
>
> Ok for trunk if it passes bootstrap/regtest?
>
> 2021-12-14  Jakub Jelinek
>
> PR debug/103619
> * dwarf2cfi.c (dwf_cfa_reg): Remove gcc_assert.
> (operator==, operator!=): New overloaded operators.
> (dwarf2out_frame_debug_adjust_cfa, dwarf2out_frame_debug_cfa_offset,
> dwarf2out_frame_debug_expr): Compare vars with cfa_reg type directly
> with REG rtxes rather than with dwf_cfa_reg results on those REGs.
> (create_cie_data): Use stack_pointer_rtx instead of
> gen_rtx_REG (Pmode, STACK_POINTER_REGNUM).
> (execute_dwarf2_frame): Use hard_frame_pointer_rtx instead of
> gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM).

So if someone is unfamiliar with the underlying issues here and needs to
twiddle dwarf2cfi, how are they supposed to know if they should compare
directly or use dwf_cfa_reg?


I'm not saying the patch is wrong, just wondering if we're setting 
ourselves up for a maintenance problem going forward.


jeff



Re: [Patch]Enable -Wuninitialized + -ftrivial-auto-var-init for address taken variables

2021-12-14 Thread Martin Sebor via Gcc-patches

On 12/14/21 9:43 AM, Qing Zhao wrote:

Hi,


On Dec 9, 2021, at 12:13 PM, Qing Zhao via Gcc-patches 
 wrote:



+ return;
+
+ /* Get the variable declaration location from the def_stmt.  */
+ var_decl_loc = gimple_location (def_stmt);
+
+ /* The LHS of the call is a temporary variable, we use it as a
+placeholder to record the information on whether the warning
+has been issued or not.  */
+ repl_var = gimple_call_lhs (def_stmt);
+   }
 }
-  if (var == NULL_TREE)
+  if (var == NULL_TREE && var_name == NULL_TREE)
 return;
 /* Avoid warning if we've already done so or if the warning has been
@@ -207,36 +245,56 @@ warn_uninit (opt_code opt, tree t, tree var, const char *gmsgid,
   if (((warning_suppressed_p (context, OPT_Wuninitialized)
|| (gimple_assign_single_p (context)
&& get_no_uninit_warning (gimple_assign_rhs1 (context)
-  || get_no_uninit_warning (var))
+  || (var && get_no_uninit_warning (var))
+  || (repl_var && get_no_uninit_warning (repl_var)))
 return;
 /* Use either the location of the read statement or that of the PHI
  argument, or that of the uninitialized variable, in that order,
  whichever is valid.  */
-  location_t location;
+  location_t location = UNKNOWN_LOCATION;
   if (gimple_has_location (context))
 location = gimple_location (context);
   else if (phi_arg_loc != UNKNOWN_LOCATION)
 location = phi_arg_loc;
-  else
+  else if (var)
 location = DECL_SOURCE_LOCATION (var);
+  else if (var_name)
+location = var_decl_loc;
+
   location = linemap_resolve_location (line_table, location,
   LRK_SPELLING_LOCATION, NULL);
 auto_diagnostic_group d;
-  if (!warning_at (location, opt, gmsgid, var))
+  char *gmsgid_final = XNEWVEC (char, strlen (gmsgid) + 5);
+  gmsgid_final[0] = 0;
+  if (var)
+strcat (gmsgid_final, "%qD ");
+  else if (var_name)
+strcat (gmsgid_final, "%qs ");
+  strcat (gmsgid_final, gmsgid);
+
+  if (var && !warning_at (location, opt, gmsgid_final, var))
+return;
+  else if (var_name && !warning_at (location, opt, gmsgid_final, var_name_str))
 return;


Dynamically creating the string seems quite cumbersome here, and
it leaks the allocated block.  I wonder if it might be better to
remove the gmsgid argument from the function and assign it to
one of the literals based on the other arguments.

Since only one of var and var_name is used, I also wonder if
the %qs form could be used for both to simplify the overall
logic.  (I.e., get the IDENTIFIER_POINTER string from var and
use it instead of %qD).


It looks like using “%qs” with the IDENTIFIER_POINTER string from var does
not work very well for the following test case:

   1 /* PR tree-optimization/45083 */
   2 /* { dg-do compile } */
   3 /* { dg-options "-O2 -Wuninitialized" } */
   4
   5 struct S { char *a; unsigned b; unsigned c; };
   6 extern int foo (const char *);
   7 extern void bar (int, int);
   8
   9 static void
  10 baz (void)
  11 {
  12   struct S cs[1];   /* { dg-message "was declared here" } */
  13   switch (cs->b)/* { dg-warning "cs\[^\n\r\]*\\.b\[^\n\r\]*is used uninitialized" } */
  14 {
  15 case 101:
  16   if (foo (cs->a))  /* { dg-warning "cs\[^\n\r\]*\\.a\[^\n\r\]*may be used uninitialized" } */
  17 bar (cs->c, cs->b); /* { dg-warning "cs\[^\n\r\]*\\.c\[^\n\r\]*may be used uninitialized" } */
  18 }
  19 }
  20
  21 void
  22 test (void)
  23 {
  24   baz ();
  25 }


For the uninitialized uses at lines 13, 16 and 17, the IDENTIFIER_POINTER
strings of var are:
cs$0$b, cs$0$a, cs$0$c

With %qD they are printed as cs[0].b, cs[0].a, cs[0].c,
but with %qs they are printed as cs$0$b, cs$0$a, cs$0$c.

It looks like %qD does not simply print the IDENTIFIER_POINTER string
directly; it handles some cases specially.

I tried to find how %qD handles these strings, but haven't figured it out
so far.

Do you know where %qD handles this case specially?


In the front end's pretty printer where it handles %D (e.g.,
for C in c_tree_printer in c/c-objc-common.c).  For VARs with
DECL_HAS_DEBUG_EXPR_P (temp) the code uses DECL_DEBUG_EXPR().

There's also print_generic_expr_to_str(tree) that formats a decl
or an expression to a dynamically allocated string (the string
needs to be freed).

Martin



Thanks.

Qing



Both are good suggestions, I will try to update the code based on this.

Thanks again.

Qing
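Martin's first suggestion was to choose one of several complete message literals instead of allocating a buffer and prepending "%qD " or "%qs " to gmsgid, which leaked. It can be sketched outside GCC as below. GCC's %qD and %qs directives are only understood by its own pretty-printer, so this hypothetical standalone version uses std::snprintf with a plain %s and ASCII quotes. The enum and function names are illustrative and are not GCC's warn_uninit:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical: the two flavours of uninitialized-use warning.
enum class UninitKind { Is, MayBe };

// Choose a complete, fixed format string instead of building one at run
// time: nothing is allocated, so nothing can leak, and each message
// remains a literal that translation tools can extract.
static const char* uninit_format(UninitKind kind)
{
  return kind == UninitKind::Is ? "'%s' is used uninitialized"
                                : "'%s' may be used uninitialized";
}

// Formats the message for a variable whose printable name is `name`.
std::string format_uninit(UninitKind kind, const char* name)
{
  char buf[256];
  std::snprintf(buf, sizeof buf, uninit_format(kind), name);
  return buf;
}
```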










[PATCH] c++: local_specializations and recursive constrained fn [PR103714]

2021-12-14 Thread Patrick Palka via Gcc-patches
Here during constraint checking for the recursive call to 'f',
substitution into the PARM_DECL 'd' in the atomic constraint gives us
the wrong local specialization because local_specializations at this
point contains entities associated with the _outer_ call to 'f'.

This patch fixes this by calling push_to_top_level during constraint
checking, which'll clear local_specializations for us.
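The fix relies on push_to_top_level saving and clearing local_specializations and on pop_from_top_level restoring it. The bug and the fix can be modeled in a few lines. The model uses a hypothetical global table from a parameter name to its substituted value, plus an RAII guard that stands in for the push/pop pair. None of these names are GCC's:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: maps a parameter name to the value substituted for it
// in the instantiation currently being processed.
using SpecMap = std::map<std::string, int>;
SpecMap* local_specs = nullptr;

// RAII stand-in for push_to_top_level/pop_from_top_level: hides the
// enclosing table during a nested check and restores it afterwards.
class TopLevelScope
{
public:
  TopLevelScope() : saved_(local_specs) { local_specs = nullptr; }
  ~TopLevelScope() { local_specs = saved_; }
  TopLevelScope(const TopLevelScope&) = delete;
  TopLevelScope& operator=(const TopLevelScope&) = delete;

private:
  SpecMap* saved_;
};

// Substitutes parameter `name`: a registered local specialization wins,
// otherwise `fresh` (the value for the current check) is used.
int substitute_param(const std::string& name, int fresh)
{
  if (local_specs)
    {
      auto it = local_specs->find(name);
      if (it != local_specs->end())
        return it->second;
    }
  return fresh;
}
```

Without the guard, a nested check for the recursive call picks up the outer call's binding for `d`. Inside the guard it starts from an empty table, and the outer table is back in place once the check finishes.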

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?  Also tested on cmcstl2 and range-v3.

PR c++/103714

gcc/cp/ChangeLog:

* constraint.cc (satisfy_declaration_constraints): Do
push_to_top_level and pop_from_top_level around the call to
satisfy_normalized_constraints.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-uneval5.C: New test.
---
 gcc/cp/constraint.cc  |  4 
 gcc/testsuite/g++.dg/cpp2a/concepts-uneval5.C | 17 +
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-uneval5.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 566f4e38fac..9b0348dfdd6 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3186,9 +3186,11 @@ satisfy_declaration_constraints (tree t, sat_info info)
 {
   if (!push_tinst_level (t))
return result;
+  push_to_top_level ();
   push_access_scope (t);
   result = satisfy_normalized_constraints (norm, args, info);
   pop_access_scope (t);
+  pop_from_top_level ();
   pop_tinst_level ();
 }
 
@@ -3244,9 +3246,11 @@ satisfy_declaration_constraints (tree t, tree args, sat_info info)
   if (!push_tinst_level (t, args))
return result;
   tree pattern = DECL_TEMPLATE_RESULT (t);
+  push_to_top_level ();
   push_access_scope (pattern);
   result = satisfy_normalized_constraints (norm, args, info);
   pop_access_scope (pattern);
+  pop_from_top_level ();
   pop_tinst_level ();
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-uneval5.C b/gcc/testsuite/g++.dg/cpp2a/concepts-uneval5.C
new file mode 100644
index 000..a315a59b828
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-uneval5.C
@@ -0,0 +1,17 @@
+// PR c++/103714
+// { dg-do compile { target c++20 } }
+
+template
+struct A {
+  static const int i = I;
+
+  template
+  void f(A d = {}) requires (d.i != i) {
+f(); // { dg-error "no match" }
+  }
+};
+
+int main() {
+  A<0> a;
+  a.f<1>();
+}
-- 
2.34.1.182.ge773545c7f



Re: [PATCH] dwarf2cfi: Improve cfa_reg comparisons [PR103619]

2021-12-14 Thread Jakub Jelinek via Gcc-patches
On Tue, Dec 14, 2021 at 03:05:37PM -0700, Jeff Law wrote:
> > 2021-12-14  Jakub Jelinek  
> > 
> > PR debug/103619
> > * dwarf2cfi.c (dwf_cfa_reg): Remove gcc_assert.
> > (operator==, operator!=): New overloaded operators.
> > (dwarf2out_frame_debug_adjust_cfa, dwarf2out_frame_debug_cfa_offset,
> > dwarf2out_frame_debug_expr): Compare vars with cfa_reg type directly
> > with REG rtxes rather than with dwf_cfa_reg results on those REGs.
> > (create_cie_data): Use stack_pointer_rtx instead of
> > gen_rtx_REG (Pmode, STACK_POINTER_REGNUM).
> > (execute_dwarf2_frame): Use hard_frame_pointer_rtx instead of
> > gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM).
> So if someone is unfamiliar with the underlying issues here and needs to
> twiddle dwarf2cfi, how are they supposed to know if they should compare
> directly or use dwf_cfa_reg?

Comparison without dwf_cfa_reg should be used whenever possible, because
for registers which are never CFA related that won't call
targetm.dwarf_register_span uselessly.
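
To make the cost argument concrete, here is a minimal C++ model under loudly simplified assumptions: `reg_rtx_model`, `cfa_reg_model`, `expensive_register_span` and the `span_queries` counter are hypothetical stand-ins, not GCC's `rtx`, `struct cfa_reg` or `targetm.dwarf_register_span`. It only shows that converting every register first always pays for the span query, whereas an overloaded `operator==` that compares register numbers first can reject unrelated registers without it.

```cpp
#include <cassert>

// Hypothetical stand-in for a REG rtx: just a hard register number.
struct reg_rtx_model
{
  unsigned regno;
};

// Hypothetical, simplified stand-in for dwarf2cfi's cfa_reg.
struct cfa_reg_model
{
  unsigned reg;
  unsigned span;  // number of DWARF registers the value occupies
};

// Counts how often the (assumed costly) span query runs.
static int span_queries = 0;

// Stand-in for the target hook: pretend every register has span 1.
static unsigned expensive_register_span (const reg_rtx_model &)
{
  ++span_queries;
  return 1;
}

// Analogue of dwf_cfa_reg: always performs the span query.
static cfa_reg_model make_cfa_reg (const reg_rtx_model &r)
{
  return cfa_reg_model { r.regno, expensive_register_span (r) };
}

static bool operator== (const cfa_reg_model &a, const cfa_reg_model &b)
{
  return a.reg == b.reg && a.span == b.span;
}

// Direct comparison: the span is only queried when the numbers match.
static bool operator== (const cfa_reg_model &c, const reg_rtx_model &r)
{
  if (c.reg != r.regno)
    return false;
  return c.span == expensive_register_span (r);
}
```

With a CFA register 7 and an unrelated register 19, `cfa == make_cfa_reg (unrelated)` costs one span query, while `cfa == unrelated` costs none.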

The only comparisons with dwf_cfa_reg I've kept are the:
regno = dwf_cfa_reg (XEXP (XEXP (dest, 0), 0));
  
if (cur_cfa->reg == regno)
  offset -= cur_cfa->offset;
else if (cur_trace->cfa_store.reg == regno)
  offset -= cur_trace->cfa_store.offset;
else
  {   
gcc_assert (cur_trace->cfa_temp.reg == regno);
offset -= cur_trace->cfa_temp.offset;
  }
and
struct cfa_reg regno = dwf_cfa_reg (XEXP (dest, 0));
  
if (cur_cfa->reg == regno)
  offset = -cur_cfa->offset;
else if (cur_trace->cfa_store.reg == regno)
  offset = -cur_trace->cfa_store.offset;
else
  {
gcc_assert (cur_trace->cfa_temp.reg == regno);
offset = -cur_trace->cfa_temp.offset;
  }
and there are 2 reasons for it:
1) there is an assertion, which guarantees it must compare equal to one of
those 3 cfa related struct cfa_reg structs, so it must be some CFA related
register (so, right now, targetm.dwarf_register_span shouldn't return
non-NULL in those on anything but gcn)
2) it is compared 3 times in a row, so for the GCN case doing
if (cur_cfa->reg == XEXP (XEXP (dest, 0), 0))
  offset -= cur_cfa->offset;
else if (cur_trace->cfa_store.reg == XEXP (XEXP (dest, 0), 0))
  offset -= cur_trace->cfa_store.offset;
else
  {   
gcc_assert (cur_trace->cfa_temp.reg == XEXP (XEXP (dest, 0), 0));
offset -= cur_trace->cfa_temp.offset;
  }
could actually create more GC allocated garbage than the way it is written
now.  But doing it that way would work fine.

I think for most of the comparisons even comparing with dwf_cfa_reg would
work but be less compile time/memory efficient (e.g. those assertions that
it is equal to some CFA related cfa_reg or in any spots where only the CFA
related regs may appear in the frame related patterns).

I'm aware of just a single spot where comparison with dwf_cfa_reg doesn't
work (when the assert is in dwf_cfa_reg); that is the spot that was ICEing
on your testcase, where we save an arbitrary call-saved register:
  if (REG_P (src)
  && REGNO (src) != STACK_POINTER_REGNUM
  && REGNO (src) != HARD_FRAME_POINTER_REGNUM
  && cur_cfa->reg == src)

Jakub



[PATCH] Check for class type before assuming a type is one [PR103703]

2021-12-14 Thread Martin Sebor via Gcc-patches

The attached patch avoids an ICE when using
the CLASSTYPE_IMPLICIT_INSTANTIATION() macro with an argument
that is not a class type but rather a typename_type.

The test case should trigger a warning but doesn't because
the code doesn't fully handle explicit instantiations.
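
A minimal sketch of the guard pattern the patch uses, assuming a toy type representation: `type_model`, `class_type_p` and `classtype_implicit_instantiation` are hypothetical, simplified analogues of GCC's type trees, `CLASS_TYPE_P` and `CLASSTYPE_IMPLICIT_INSTANTIATION`, and the exception stands in for a checking-enabled tree-check ICE. Because `&&` short-circuits, the class-only accessor is never evaluated for a `typename_type`.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, much-simplified model of a few tree codes.
enum class tree_code_model { record_type, union_type, typename_type };

struct type_model
{
  tree_code_model code;
  // Only meaningful for class types (records and unions).
  bool implicit_instantiation;
};

// Simplified analogue of CLASS_TYPE_P.
static bool class_type_p (const type_model &t)
{
  return t.code == tree_code_model::record_type
         || t.code == tree_code_model::union_type;
}

// Analogue of CLASSTYPE_IMPLICIT_INSTANTIATION with checking enabled:
// reading the class-only field of any other kind of type is an
// internal error (modelled here as an exception).
static bool classtype_implicit_instantiation (const type_model &t)
{
  if (!class_type_p (t))
    throw std::logic_error ("tree check: expected class type");
  return t.implicit_instantiation;
}

// The guarded form from the patch.
static bool is_implicit_instantiation (const type_model &t)
{
  return class_type_p (t) && classtype_implicit_instantiation (t);
}
```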

Martin
Check for class type before assuming a type is one [PR103703].

Resolves:
PR c++/103703 - ICE with -Wmismatched-tags with friends and templates

gcc/cp/ChangeLog:

	PR c++/103703
	* parser.c (class_decl_loc_t::diag_mismatched_tags): Check for
	class type before using CLASSTYPE_IMPLICIT_INSTANTIATION.

gcc/testsuite/ChangeLog:

	PR c++/103703
	* g++.dg/warn/Wmismatched-tags-9.C: New test.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 52225d46d4e..d21e1d9de6d 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -33536,7 +33536,7 @@ class_decl_loc_t::diag_mismatched_tags (tree type_decl)
   class_decl_loc_t *cdlguide = this;
 
   tree type = TREE_TYPE (type_decl);
-  if (CLASSTYPE_IMPLICIT_INSTANTIATION (type))
+  if (CLASS_TYPE_P (type) && CLASSTYPE_IMPLICIT_INSTANTIATION (type))
 {
   /* For implicit instantiations of a primary template look up
 	 the primary or partial specialization and use it as
diff --git a/gcc/testsuite/g++.dg/warn/Wmismatched-tags-9.C b/gcc/testsuite/g++.dg/warn/Wmismatched-tags-9.C
new file mode 100644
index 000..2712c4de1f6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wmismatched-tags-9.C
@@ -0,0 +1,32 @@
+/* PR c++/103703 - ICE with -Wmismatched-tags with friends and templates
+   { dg-do compile }
+   { dg-options "-Wall -Wmismatched-tags" } */
+
+template 
+struct A
+{
+  struct B { };
+};
+
+template 
+struct C
+{
+  friend class A::B;   // { dg-warning "-Wmismatched-tags" "pr102036" { xfail *-*-* } }
+};
+
+template struct C;
+
+
+template 
+struct D
+{
+  friend class A::B;   // okay, specialized as class below
+};
+
+template <>
+struct A
+{
+  class B { };
+};
+
+template struct D;


Re: testsuite: Be more informative for ICEs

2021-12-14 Thread Jeff Law via Gcc-patches




On 12/10/2021 3:42 PM, Thomas Schwinge wrote:

Hi!

OK to push the attached "testsuite: Be more informative for ICEs"?


Grüße
  Thomas



0001-testsuite-Be-more-informative-for-ICEs.patch

 From 5ffc2cfc9c6ec6ed0937311377118efd648f0297 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 10 Dec 2021 19:08:26 +0100
Subject: [PATCH] testsuite: Be more informative for ICEs

For example, for the two (FAIL, XFAIL)
'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-test-1' cases:

 -FAIL: g++.dg/modules/xtreme-header-3_a.H -std=c++17 (internal compiler error)
 +FAIL: g++.dg/modules/xtreme-header-3_a.H -std=c++17 (internal compiler error: tree check: expected var_decl or function_decl or field_decl or type_decl or concept_decl or template_decl, have namespace_decl in get_merge_kind, at cp/module.cc:10072)

 -FAIL: gfortran.dg/gomp/clauses-1.f90   -O  (internal compiler error)
 +FAIL: gfortran.dg/gomp/clauses-1.f90   -O  (internal compiler error: Segmentation fault)

 -XFAIL: c-c++-common/goacc/kernels-decompose-ice-1.c (internal compiler error)
 +XFAIL: c-c++-common/goacc/kernels-decompose-ice-1.c (internal compiler error: in lower_omp_target, at omp-low.c:13147)

 -XFAIL: g++.dg/cpp1z/constexpr-lambda26.C  -std=c++17 (internal compiler error)
 +XFAIL: g++.dg/cpp1z/constexpr-lambda26.C  -std=c++17 (internal compiler error: in cxx_eval_constant_expression, at cp/constexpr.c:6954)

That makes it easier to spot when, during development, you're trading one
ICE for another.
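
The idea can be sketched in C++ (the actual change lives in the Tcl testsuite drivers listed below, so this is only an illustration): a hypothetical `ice_summary` helper scans the compiler output for the first line mentioning an internal compiler error and reuses the rest of that line in the parenthesised FAIL/XFAIL suffix, falling back to the old terse text if nothing more specific is found.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: build the "(internal compiler error...)" suffix
// for a FAIL/XFAIL line from the raw compiler output.  The caller is
// assumed to have already detected that an ICE happened.
static std::string ice_summary (const std::string &compiler_output)
{
  static const std::string marker = "internal compiler error";
  std::istringstream lines (compiler_output);
  std::string line;
  while (std::getline (lines, line))
    {
      std::string::size_type pos = line.find (marker);
      if (pos != std::string::npos)
        // Keep the marker and everything after it on that line,
        // e.g. "internal compiler error: Segmentation fault".
        return "(" + line.substr (pos) + ")";
    }
  // Nothing more specific found: use the old, terse form.
  return "(internal compiler error)";
}
```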

gcc/testsuite/
* lib/fortran-torture.exp (fortran-torture-compile)
(fortran-torture-execute): Be more informative for ICEs.
* lib/gcc-defs.exp (${tool}_check_compile): Likewise.
* lib/gcc-dg.exp (gcc-dg-test-1): Likewise.
* lib/go-torture.exp (go-torture-compile, go-torture-execute):
Likewise.

OK.

jeff



Re: [PATCH] stddef.h: add support for musl typedef macro guards

2021-12-14 Thread Jeff Law via Gcc-patches




On 12/2/2021 11:24 PM, Sören Tempel wrote:

Hi,

Jeff Law  wrote:

So what doesn't make sense here is how both stddef.h files get
included.  That's the core problem I think you need to resolve.

The libgo/sysinfo.c file includes stddef.h (for which the GCC version in
ginclude is used on my system) and stdlib.h which, on musl, causes an
include of /usr/include/bits/alltypes.h [1] which then defines size_t
and other types, which were already defined by GCC's stddef.h, again [2].

As such, the two stddef.h files are not both included; only the GCC
one is used. The alternative here would be to have libgo/sysinfo.c
include the stddef.h provided by the system libc, but I am not sure if that
is intended here. I am personally not very familiar with the GCC codebase.

I can send you a pre-processed version of sysinfo.c if you want to
reproduce this yourself.
Thanks.  What was confusing was the original explanation indicated it 
was two different versions of stddef.h being included, when in fact it 
was stddef.h (from gcc) and stdlib.h (from musl) getting included that 
caused the problem.


I replaced the instance of stddef.h with stdlib.h in the original 
explanation and included that in the commit log when I pushed this fix 
to the trunk.
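
For readers unfamiliar with the mechanism, here is a minimal C++ sketch of typedef guard macros of the kind musl's bits/alltypes.h relies on (for example `__DEFINED_size_t`). Both "headers" are inlined in one file, and the macro and type names (`DEMO_DEFINED_size_type`, `demo_size_type`) are hypothetical. Whichever header defines the type first also defines the guard, so the later definition is skipped; the second typedef deliberately uses a different type to show that it never takes effect.

```cpp
#include <type_traits>

// "Header 1": stands in for the compiler-provided stddef.h.
#ifndef DEMO_DEFINED_size_type
#define DEMO_DEFINED_size_type
typedef unsigned long demo_size_type;
#endif

// "Header 2": stands in for the libc type header pulled in later via
// stdlib.h.  The guard is already defined, so this conflicting typedef
// is skipped; without the guard it would be a hard error.
#ifndef DEMO_DEFINED_size_type
#define DEMO_DEFINED_size_type
typedef unsigned int demo_size_type;
#endif

static_assert (std::is_same<demo_size_type, unsigned long>::value,
               "the first header's definition wins");
```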


Thanks for your patience,
Jeff



Re: [PATCH 1/2] Sync with binutils: GCC: Pass --plugin to AR and RANLIB

2021-12-14 Thread Jeff Law via Gcc-patches




On 11/22/2021 7:29 PM, H.J. Lu wrote:

On Mon, Nov 22, 2021 at 4:29 PM Jeff Law  wrote:



On 11/13/2021 9:33 AM, H.J. Lu via Gcc-patches wrote:

Sync with binutils for building binutils with LTO:

  From 50ad1254d5030d0804cbf89c758359ae202e8d55 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 9 Jan 2021 06:43:11 -0800
Subject: [PATCH] GCC: Pass --plugin to AR and RANLIB

Detect GCC LTO plugin.  Pass --plugin to AR and RANLIB to support LTO
build.

   * Makefile.tpl (AR): Add @AR_PLUGIN_OPTION@
   (RANLIB): Add @RANLIB_PLUGIN_OPTION@.
   * configure.ac: Include config/gcc-plugin.m4.
   AC_SUBST AR_PLUGIN_OPTION and RANLIB_PLUGIN_OPTION.
   * libtool.m4 (_LT_CMD_OLD_ARCHIVE): Pass --plugin to AR and
   RANLIB if possible.
   * Makefile.in: Regenerated.
   * configure: Likewise.

config/

   * gcc-plugin.m4 (GCC_PLUGIN_OPTION): New.

libiberty/

   * Makefile.in (AR): Add @AR_PLUGIN_OPTION@
   (RANLIB): Add @RANLIB_PLUGIN_OPTION@.
   (configure_deps): Depend on ../config/gcc-plugin.m4.
   * configure.ac: AC_SUBST AR_PLUGIN_OPTION and
   RANLIB_PLUGIN_OPTION.
   * aclocal.m4: Regenerated.
   * configure: Likewise.

zlib/

   * configure: Regenerated.

I thought the plugins were automatically loaded if they're in the right
place in the filesystem.  Wouldn't that make this patch unnecessary?  Am
I missing something?


It only works for system GCC and binutils.  It doesn't work for a non-system
GCC or binutils, since in either case the GCC plugin isn't installed in the
binutils plugin search path.
Ah.  So this is primarily useful if GCC was installed into a path 
different than the system binutils expects to find the plugin? Does it 
work properly in cross environments or at least do no harm in those 
kinds of builds?


Jeff


Re: [PATCH] PR target/32803: Add -Oz option for improved clang compatibility.

2021-12-14 Thread Jeff Law via Gcc-patches




On 12/13/2021 5:27 PM, Joseph Myers wrote:

This is missing an invoke.texi update for the new option.
And that update should probably note that -Oz turns on -O2.  OK with that
change.


jeff


Re: [PATCH] dwarf2cfi: Improve cfa_reg comparisons [PR103619]

2021-12-14 Thread Jeff Law via Gcc-patches




On 12/14/2021 3:27 PM, Jakub Jelinek wrote:

On Tue, Dec 14, 2021 at 03:05:37PM -0700, Jeff Law wrote:

2021-12-14  Jakub Jelinek  

PR debug/103619
* dwarf2cfi.c (dwf_cfa_reg): Remove gcc_assert.
(operator==, operator!=): New overloaded operators.
(dwarf2out_frame_debug_adjust_cfa, dwarf2out_frame_debug_cfa_offset,
dwarf2out_frame_debug_expr): Compare vars with cfa_reg type directly
with REG rtxes rather than with dwf_cfa_reg results on those REGs.
(create_cie_data): Use stack_pointer_rtx instead of
gen_rtx_REG (Pmode, STACK_POINTER_REGNUM).
(execute_dwarf2_frame): Use hard_frame_pointer_rtx instead of
gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM).

So if someone is unfamiliar with the underlying issues here and needs to
twiddle dwarf2cfi, how are they supposed to know if they should compare
directly or use dwf_cfa_reg?

Comparison without dwf_cfa_reg should be used whenever possible, because
for registers which are never CFA related that won't call
targetm.dwarf_register_span uselessly.
So it's easy enough to articulate.  Is there anywhere you could put a
comment to that effect where it's likely to be seen in that file?


OK with that change.

Jeff



[committed] libstdc++: Support old and new T_FMT for en_HK locale [PR103687]

2021-12-14 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux (glibc 2.33) and powerpc64le-linux (glibc 2.17).
Pushed to trunk.


This checks whether the locale data for en_HK includes %p and adjusts
the string being tested accordingly. To account for Jakub's fix to make
%I parse "12" as 0 instead of 12, we need to change the expected value
for the case where the locale format doesn't include %p. Also change the
time from 12:00:00 to 12:02:01 so we can tell if the minutes and seconds
get mixed up.
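
The check in the patch can be read as a general pattern: save the current LC_TIME name, switch to the locale of interest, read `nl_langinfo (T_FMT)`, then restore the saved name before returning. Below is a minimal C++ sketch of that pattern under a hypothetical name, `locale_time_uses_ampm`, assuming a POSIX system that provides <langinfo.h>. The saved name is copied into a `std::string` because the string returned by `setlocale` may be overwritten by the next call.

```cpp
#include <cassert>
#include <clocale>
#include <string>
#include <langinfo.h>  // POSIX: nl_langinfo, T_FMT

// Hypothetical generalisation of the patch's ampm_time_format():
// report whether the named locale's T_FMT contains %p, restoring the
// caller's LC_TIME setting before returning.
static bool locale_time_uses_ampm (const char *name)
{
  // Copy the current name before switching locales.
  const std::string orig = std::setlocale (LC_TIME, nullptr);
  if (std::setlocale (LC_TIME, name) == nullptr)
    return false;  // locale not available; LC_TIME is left unchanged
  const std::string t_fmt = nl_langinfo (T_FMT);
  std::setlocale (LC_TIME, orig.c_str ());
  return t_fmt.find ("%p") != std::string::npos;
}
```

In the POSIX "C" locale T_FMT is "%H:%M:%S", so the function returns false there, and LC_TIME is unchanged afterwards.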

libstdc++-v3/ChangeLog:

PR libstdc++/103687
* testsuite/22_locale/time_get/get_date/wchar_t/4.cc: Restore
original locale before returning.
* testsuite/22_locale/time_get/get_time/char/2.cc: Check for %p
in locale's T_FMT and adjust accordingly.
* testsuite/22_locale/time_get/get_time/wchar_t/2.cc: Likewise.
---
 .../22_locale/time_get/get_date/wchar_t/4.cc  |  9 ++---
 .../22_locale/time_get/get_time/char/2.cc | 33 +--
 .../22_locale/time_get/get_time/wchar_t/2.cc  | 33 +--
 3 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/testsuite/22_locale/time_get/get_date/wchar_t/4.cc b/libstdc++-v3/testsuite/22_locale/time_get/get_date/wchar_t/4.cc
index d227d4b1ce0..f6de882e4bd 100644
--- a/libstdc++-v3/testsuite/22_locale/time_get/get_date/wchar_t/4.cc
+++ b/libstdc++-v3/testsuite/22_locale/time_get/get_date/wchar_t/4.cc
@@ -39,7 +39,7 @@ void test01()
 
   wistringstream iss;
   iss.imbue(loc_tw);
-  const time_get& tim_get = use_facet >(iss.getloc()); 
+  const time_get& tim_get = use_facet >(iss.getloc());
 
   const ios_base::iostate good = ios_base::goodbit;
   ios_base::iostate errorstate = good;
@@ -66,13 +66,14 @@ void test01()
 static bool debian_date_format()
 {
 #ifdef D_FMT
+  std::string orig = setlocale(LC_TIME, NULL);
   if (setlocale(LC_TIME, "zh_TW.UTF-8") != NULL)
   {
 // See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31413
 // and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71641#c2
-if (*nl_langinfo(D_FMT) == '%')
-  return true;
-setlocale(LC_TIME, "C");
+std::string d_fmt = nl_langinfo(D_FMT);
+setlocale(LC_TIME, orig.c_str());
+return d_fmt[0] == '%';
   }
 #endif
   return false;
diff --git a/libstdc++-v3/testsuite/22_locale/time_get/get_time/char/2.cc b/libstdc++-v3/testsuite/22_locale/time_get/get_time/char/2.cc
index a847748dc27..b40971a9bf7 100644
--- a/libstdc++-v3/testsuite/22_locale/time_get/get_time/char/2.cc
+++ b/libstdc++-v3/testsuite/22_locale/time_get/get_time/char/2.cc
@@ -25,6 +25,8 @@
 #include 
 #include 
 
+static bool ampm_time_format();
+
 void test02()
 {
   using namespace std;
@@ -36,19 +38,23 @@ void test02()
   locale loc_hk = locale(ISO_8859(1,en_HK));
   VERIFY( loc_hk != loc_c );
 
+  const int pm = ampm_time_format() ? 12 : 0;
   const string empty;
-  const tm time_bday = __gnu_test::test_tm(0, 0, 12, 4, 3, 71, 0, 93, 0);
+  const tm time_bday = __gnu_test::test_tm(1, 2, 0+pm, 4, 3, 71, 0, 93, 0);
 
   // create an ostream-derived object, cache the time_get facet
   iterator_type end;
   istringstream iss;
-  const time_get& tim_get = use_facet >(iss.getloc()); 
+  const time_get& tim_get = use_facet >(iss.getloc());
   const ios_base::iostate good = ios_base::goodbit;
   ios_base::iostate errorstate = good;
 
   // inspection of named locales, en_HK
   iss.imbue(loc_hk);
-  iss.str("12:00:00 PM PST"); 
+  if (pm)
+iss.str("12:02:01 PM PST");
+  else
+iss.str("12:02:01 PST"); // %I means 12-hour clock, so parsed as 12am
   // Hong Kong in California! Well, they have Paris in Vegas... this
   // is all a little disney-esque anyway. Besides, you can get decent
   // Dim Sum in San Francisco.
@@ -62,6 +68,27 @@ void test02()
   VERIFY( errorstate == ios_base::eofbit );
 }
 
+#include 
+#if __has_include()
+# include 
+#endif
+#include 
+
+static bool ampm_time_format()
+{
+#ifdef T_FMT
+  std::string orig = setlocale(LC_TIME, NULL);
+  if (setlocale(LC_TIME, ISO_8859(1,en_HK)) != NULL)
+  {
+// See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103687
+std::string t_fmt = nl_langinfo(T_FMT);
+setlocale(LC_TIME, orig.c_str());
+return t_fmt.find("%p") != std::string::npos;
+  }
+#endif
+  return false;
+}
+
 int main()
 {
   test02();
diff --git a/libstdc++-v3/testsuite/22_locale/time_get/get_time/wchar_t/2.cc b/libstdc++-v3/testsuite/22_locale/time_get/get_time/wchar_t/2.cc
index b5d61e1afdb..b74604ef472 100644
--- a/libstdc++-v3/testsuite/22_locale/time_get/get_time/wchar_t/2.cc
+++ b/libstdc++-v3/testsuite/22_locale/time_get/get_time/wchar_t/2.cc
@@ -25,6 +25,8 @@
 #include 
 #include 
 
+static bool ampm_time_format();
+
 void test02()
 {
   using namespace std;
@@ -36,19 +38,23 @@ void test02()
   locale loc_hk = locale(ISO_8859(1,en_HK));
   VERIFY( loc_hk != loc_c );
 
+  const int pm = ampm_time_format() ? 12 : 0;
   const wstring empty;
-  const tm time_bday = __gnu_test::test_tm(0, 0, 12, 4, 3, 71, 0, 93, 0);
+  const tm time_bd

Re: [PATCH 5/5] gcc: Pass sysroot options to cpp for preprocessed source

2021-12-14 Thread Jeff Law via Gcc-patches




On 10/27/2021 2:05 PM, Richard Purdie via Gcc-patches wrote:

OpenEmbedded/Yocto Project extensively uses the --sysroot support within gcc.
We discovered that when compiling preprocessed source (.i or .ii files), the
compiler tries to access the built-in sysroot location rather than the one
given by the --sysroot option on the command line. If that directory is
unreadable (permission denied), gcc errors out. This is particularly problematic
when ccache is involved.

This patch adds %I to the cpp-output spec macro so the default substitutions for
-iprefix, -isystem, -isysroot happen and the correct sysroot is used.

2021-10-27 Richard Purdie 

gcc/cp/ChangeLog:

 * lang-specs.h: Pass sysroot options to cpp for preprocessed source

gcc/ChangeLog:

 * gcc.c: Pass sysroot options to cpp for preprocessed source
So generally OK, though I think this is incomplete.  If I understand the 
underlying bits correctly a similar change is needed in:


{lto,objc,fortran,ada/gcc-interface,objcp}/lang-specs.h.

I think d/lang-specs.h is OK, though it'd probably be useful to double 
check that.


jeff



Re: [PATCH]middle-end: REE should always check all vector usages, even if it finds a defining def. [PR103350]

2021-12-14 Thread Jeff Law via Gcc-patches




On 12/14/2021 2:43 AM, Tamar Christina wrote:

Hi All,

This and the report in PR103632 are caused by a bug in REE where it generates
incorrect code.

It's trying to eliminate the following zero extension

(insn 54 90 102 2 (set (reg:V4SI 33 v1)
 (zero_extend:V4SI (reg/v:V4HI 40 v8)))
  (nil))

by folding it into the definition of `v8`:

(insn 2 5 104 2 (set (reg/v:V4HI 40 v8)
 (reg:V4HI 32 v0 [156]))
  (nil))

which is fine, except that `v8` is also used by the extracts, e.g.:

(insn 11 10 12 2 (set (reg:SI 1 x1)
 (zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8)
 (parallel [
 (const_int 3)
 ]
  (nil))

REE replaces insn 2 by folding insn 54 and placing it at the definition site of
insn 2, so before insn 11.

Trying to eliminate extension:
(insn 54 90 102 2 (set (reg:V4SI 33 v1)
 (zero_extend:V4SI (reg/v:V4HI 40 v8)))
  (nil))
Tentatively merged extension with definition (copy needed):
(insn 2 5 104 2 (set (reg:V4SI 33 v1)
 (zero_extend:V4SI (reg:V4HI 32 v0)))
  (nil))

to produce

(insn 2 5 110 2 (set (reg:V4SI 33 v1)
 (zero_extend:V4SI (reg:V4HI 32 v0)))
  (nil))
(insn 110 2 104 2 (set (reg:V4SI 40 v8)
 (reg:V4SI 33 v1))
  (nil))

The new insn 2 using v0 directly is correct, but the insn 110 it creates is
wrong, `v8` should still be V4HI.

Alternatively, it would also need to eliminate the zero extension from the
extracts, so instead of

(insn 11 10 12 2 (set (reg:SI 1 x1)
 (zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8)
 (parallel [
 (const_int 3)
 ]
  (nil))

it should be

(insn 11 10 12 2 (set (reg:SI 1 x1)
 (vec_select:SI (reg/v:V4SI 40 v8)
 (parallel [
 (const_int 3)
 ])))
  (nil))

Without doing so, the lane indices have been remapped by the extension, and so
we extract the wrong elements.

At any optimization level other than -Os, REE seems to abort, so this doesn't
trigger:

Trying to eliminate extension:
(insn 54 90 101 2 (set (reg:V4SI 32 v0)
 (zero_extend:V4SI (reg/v:V4HI 40 v8)))
  (nil))
Elimination opportunities = 2 realized = 0

This is purely due to the ordering of instructions. REE doesn't check the uses
of `v8` because it assumes that, given a zero-extended value, you still have
access to the lower bits by using the bottom part of the register.

This is true for scalars but not for vectors.  This would also have been fine
if REE had eliminated the zero_extend on insn 11 and the rest, but it doesn't,
since REE can only handle cases where the SRC value is REG_P.
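
The scalar/vector difference can be checked on the host with a small C++ model. It is not RTL: it assumes a little-endian layout (as on aarch64 little-endian) and uses `memcpy` to mimic reading the low 64 bits of the widened register as a V4HI value, which is the view a lowpart subreg gives. For a scalar, the low half of the zero-extended value is still the original value; for a vector, lane 3 of that view is no longer the original lane 3. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar case: the low 16 bits of a zero-extended value are the
// original value, so other uses can keep reading the lowpart.
static bool scalar_lowpart_survives (std::uint16_t x)
{
  std::uint32_t extended = x;  // zero_extend:SI
  return static_cast<std::uint16_t> (extended) == x;
}

// Vector case (little-endian assumed): zero-extend V4HI to V4SI, then
// read the first 8 bytes of the result back as V4HI, which is what a
// lowpart use of the widened register sees.
static std::uint16_t lane_after_extension (const std::uint16_t (&v4hi)[4],
                                           int lane)
{
  std::uint32_t v4si[4];
  for (int i = 0; i < 4; ++i)
    v4si[i] = v4hi[i];                          // zero_extend:V4SI
  std::uint16_t lowpart[4];
  std::memcpy (lowpart, v4si, sizeof lowpart);  // low 64 bits only
  return lowpart[lane];                         // vec_select:HI
}
```

With `v = {1, 2, 3, 4}` the lowpart view on a little-endian host is `{1, 0, 2, 0}`, so selecting lane 3 yields 0 instead of 4.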

It does try to do this in add_removable_extension:

  /* For vector mode extensions, ensure that all uses of the
     XEXP (src, 0) register are in insn or debug insns, as unlike
     integral extensions lowpart subreg of the sign/zero extended
     register are not equal to the original register, so we have
     to change all uses or none and the current code isn't able
     to change them all at once in one transaction.  */

However, this code doesn't trigger for the example because REE doesn't check the
uses if the defining instruction doesn't feed into another extension.

That is bogus. For vectors it should always check all uses.

r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3 simply exposed this as it now
lowers VEC_SELECT 0 into the RTL canonical form subreg 0 which causes REE to run
more often.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR rtl-optimization/103350
* ree.c (add_removable_extension): Don't stop at first definition but
inspect all.

gcc/testsuite/ChangeLog:

PR rtl-optimization/103350
* gcc.target/aarch64/pr103350-1.c: New test.
* gcc.target/aarch64/pr103350-2.c: New test.

OK.
jeff



Re: [PATCH 1/2] Sync with binutils: GCC: Pass --plugin to AR and RANLIB

2021-12-14 Thread H.J. Lu via Gcc-patches
On Tue, Dec 14, 2021 at 3:30 PM Jeff Law  wrote:
>
>
>
> On 11/22/2021 7:29 PM, H.J. Lu wrote:
> > On Mon, Nov 22, 2021 at 4:29 PM Jeff Law  wrote:
> >>
> >>
> >> On 11/13/2021 9:33 AM, H.J. Lu via Gcc-patches wrote:
> >>> Sync with binutils for building binutils with LTO:
> >>>
> >>>   From 50ad1254d5030d0804cbf89c758359ae202e8d55 Mon Sep 17 00:00:00 2001
> >>> From: "H.J. Lu" 
> >>> Date: Sat, 9 Jan 2021 06:43:11 -0800
> >>> Subject: [PATCH] GCC: Pass --plugin to AR and RANLIB
> >>>
> >>> Detect GCC LTO plugin.  Pass --plugin to AR and RANLIB to support LTO
> >>> build.
> >>>
> >>>* Makefile.tpl (AR): Add @AR_PLUGIN_OPTION@
> >>>(RANLIB): Add @RANLIB_PLUGIN_OPTION@.
> >>>* configure.ac: Include config/gcc-plugin.m4.
> >>>AC_SUBST AR_PLUGIN_OPTION and RANLIB_PLUGIN_OPTION.
> >>>* libtool.m4 (_LT_CMD_OLD_ARCHIVE): Pass --plugin to AR and
> >>>RANLIB if possible.
> >>>* Makefile.in: Regenerated.
> >>>* configure: Likewise.
> >>>
> >>> config/
> >>>
> >>>* gcc-plugin.m4 (GCC_PLUGIN_OPTION): New.
> >>>
> >>> libiberty/
> >>>
> >>>* Makefile.in (AR): Add @AR_PLUGIN_OPTION@
> >>>(RANLIB): Add @RANLIB_PLUGIN_OPTION@.
> >>>(configure_deps): Depend on ../config/gcc-plugin.m4.
> >>>* configure.ac: AC_SUBST AR_PLUGIN_OPTION and
> >>>RANLIB_PLUGIN_OPTION.
> >>>* aclocal.m4: Regenerated.
> >>>* configure: Likewise.
> >>>
> >>> zlib/
> >>>
> >>>* configure: Regenerated.
> >> I thought the plugins were automatically loaded if they're in the right
> >> place in the filesystem.  Wouldn't that make this patch unnecessary?  Am
> >> I missing something?
> >>
> > It only works for system GCC and binutils.  It doesn't work for a non-system
> > GCC or binutils, since in either case the GCC plugin isn't installed in the
> > binutils plugin search path.
> Ah.  So this is primarily useful if GCC was installed into a path
> different than the system binutils expects to find the plugin? Does it

Yes.

> work properly in cross environments or at least do no harm in those
> kinds of builds?
>

I believe so.

-- 
H.J.


[PATCH] [i386][avx512]Add combine splitter to transform vpternlogd/vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0

2021-12-14 Thread Haochen Jiang via Gcc-patches
Hi all,

This patch fixes the regression previously reported for the combine splitter
with the '-m32 -march=cascadelake' options.

Regtested on x86_64-pc-linux-gnu.

BRs,
Haochen

gcc/ChangeLog:

PR target/100738
* config/i386/sse.md (*avx_cmp3_lt, *avx_cmp3_ltint):
Remove MEM_P restriction and add force_reg for operands[2].
(*avx_cmp3_ltint_not): Add new define_insn_and_split.

gcc/testsuite/ChangeLog:

PR target/100738
* g++.target/i386/avx512vl-pr100738-1.C: New test.
---
 gcc/config/i386/sse.md| 44 +--
 .../g++.target/i386/avx512vl-pr100738-1.C |  8 
 2 files changed, 48 insertions(+), 4 deletions(-)
 create mode 100755 gcc/testsuite/g++.target/i386/avx512vl-pr100738-1.C

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5421fb51684..8ec9fb075d0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3528,8 +3528,7 @@
 UNSPEC_PCMP)))]
   "TARGET_AVX512VL && ix86_pre_reload_split ()
   /* LT or GE 0 */
-  && ((INTVAL (operands[5]) == 1 && !MEM_P (operands[2]))
-  || (INTVAL (operands[5]) == 5 && !MEM_P (operands[1])))"
+  && ((INTVAL (operands[5]) == 1) || (INTVAL (operands[5]) == 5))"
   "#"
   "&& 1"
   [(set (match_dup 0)
@@ -3543,6 +3542,7 @@
 {
   if (INTVAL (operands[5]) == 5)
 std::swap (operands[1], operands[2]);
+  operands[2] = force_reg (mode, operands[2]);
 })
 
 (define_insn_and_split "*avx_cmp3_ltint"
@@ -3557,8 +3557,7 @@
 UNSPEC_PCMP)))]
   "TARGET_AVX512VL && ix86_pre_reload_split ()
   /* LT or GE 0 */
-  && ((INTVAL (operands[5]) == 1 && !MEM_P (operands[2]))
-  || (INTVAL (operands[5]) == 5 && !MEM_P (operands[1])))"
+  && ((INTVAL (operands[5]) == 1) || (INTVAL (operands[5]) == 5))"
   "#"
   "&& 1"
   [(set (match_dup 0)
@@ -3575,7 +3574,44 @@
 std::swap (operands[1], operands[2]);
   operands[0] = gen_lowpart (mode, operands[0]);
   operands[1] = gen_lowpart (mode, operands[1]);
+  operands[2] = force_reg (mode,
+ gen_lowpart (mode, operands[2]));
+})
+
+(define_insn_and_split "*avx_cmp3_ltint_not"
+ [(set (match_operand:VI48_AVX  0 "register_operand")
+   (vec_merge:VI48_AVX
+(match_operand:VI48_AVX 1 "vector_operand")
+(match_operand:VI48_AVX 2 "vector_operand")
+(unspec:
+  [(subreg:VI48_AVX
+   (not:
+ (match_operand: 3 "vector_operand")) 0)
+   (match_operand:VI48_AVX 4 "const0_operand")
+   (match_operand:SI 5 "const_0_to_7_operand")]
+   UNSPEC_PCMP)))]
+  "TARGET_AVX512VL && ix86_pre_reload_split ()
+  /* not LT or GE 0 */
+  && ((INTVAL (operands[5]) == 1) || (INTVAL (operands[5]) == 5))"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:
+ [(match_dup 1)
+  (match_dup 2)
+  (subreg:
+(lt:VI48_AVX
+ (match_dup 3)
+ (match_dup 4)) 0)]
+   UNSPEC_BLENDV))]
+{
+  if (INTVAL (operands[5]) == 5)
+std::swap (operands[1], operands[2]);
+  operands[0] = gen_lowpart (mode, operands[0]);
+  operands[1] = force_reg (mode,
+ gen_lowpart (mode, operands[1]));
   operands[2] = gen_lowpart (mode, operands[2]);
+  operands[3] = lowpart_subreg (mode, operands[3], mode);
 })
 
 (define_insn "avx_vmcmp3"
diff --git a/gcc/testsuite/g++.target/i386/avx512vl-pr100738-1.C b/gcc/testsuite/g++.target/i386/avx512vl-pr100738-1.C
new file mode 100755
index 000..ac4d62b94d1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/avx512vl-pr100738-1.C
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=cascadelake" } */
+/* { dg-final {scan-assembler-times "vblendvps\[ \\t\]" 2 } } */
+/* { dg-final {scan-assembler-not "vpcmpeqd\[ \\t\]" } } */
+/* { dg-final {scan-assembler-not "vpxor\[ \\t\]" } } */
+/* { dg-final {scan-assembler-not "vpternlogd\[ \\t\]" } } */
+
+#include "pr100738-1.C"
-- 
2.18.1
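
As a rough per-lane model of the new `ltint_not` define_insn_and_split for the LT case (predicate 1), the C++ sketch below uses hypothetical helpers: `vec_merge_lane` follows RTL `vec_merge` (take the first operand where the selector is set) and `blendv_lane` follows vblendvps (take the second source where the mask element's sign bit is set). Selecting on `~op3 < 0` with vec_merge gives the same lane as a blendv on `op3` itself, which is why the NOT can be dropped without swapping the data operands.

```cpp
#include <cassert>
#include <cstdint>

// vec_merge semantics for one lane: pick A when the selector is set.
static float vec_merge_lane (float a, float b, bool sel)
{
  return sel ? a : b;
}

// vblendvps semantics for one lane: pick the second source when the
// mask element's sign bit is set, otherwise the first source.
static float blendv_lane (float a, float b, std::int32_t mask)
{
  return mask < 0 ? b : a;
}

// Left-hand side of the new pattern: the selector is (~op3 < 0).
static float merge_with_not_mask (float a, float b, std::int32_t op3)
{
  return vec_merge_lane (a, b, static_cast<std::int32_t> (~op3) < 0);
}
```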



[PATCH] Verbose support in analyze_brprob_spec

2021-12-14 Thread Xionghu Luo via Gcc-patches
Also add verbose argument support, as in analyze_brprob.py.

contrib/ChangeLog:

* analyze_brprob_spec.py: Add verbose argument.
---
 contrib/analyze_brprob_spec.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/analyze_brprob_spec.py b/contrib/analyze_brprob_spec.py
index e621853ba4e..063bd11d99c 100755
--- a/contrib/analyze_brprob_spec.py
+++ b/contrib/analyze_brprob_spec.py
@@ -31,6 +31,7 @@ parser.add_argument('-s', '--sorting', dest = 'sorting',
 choices = ['branches', 'branch-hitrate', 'hitrate', 'coverage', 'name'],
 default = 'branches')
 parser.add_argument('-d', '--def-file', help = 'path to predict.def')
+parser.add_argument('-v', '--verbose', action = 'store_true', help = 'Print verbose information')
 
 args = parser.parse_args()
 
-- 
2.27.0.90.geebb51ba8c



Re: [PATCH take #2] PR target/43892: Some carry flag (CA) optimizations on PowerPC.

2021-12-14 Thread Segher Boessenkool
Hi!

On Fri, Dec 03, 2021 at 07:42:52PM -, Roger Sayle wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/addcmp.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +unsigned long add_leu(unsigned long a, unsigned long b, unsigned long c) {
> +  return a + (b <= c);
> +}
> +
> +unsigned long add_geu(unsigned long a, unsigned long b, unsigned long c) {
> +  return a + (b >= c);
> +}
> +
> +/* { dg-final { scan-assembler-times "addze " 2 } } */

Does this work with -mcpu=power10 after your patch?  (It doesn't before
it :-) )

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr43892.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +unsigned long foo(unsigned long sum, unsigned long x)
> +{
> +  unsigned long z = sum + x;
> +  if (sum + x < x)
> +z++;
> +  return z;
> +}
> +
> +/* { dg-final { scan-assembler "addze " } } */

Same question here.

Thanks!


Segher
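
Both tests rely on a comparison being equivalent to a carry bit, which is what addze adds to its operand. The identities can be checked in C++ with GCC's `__builtin_add_overflow` and `__builtin_sub_overflow`; the helper names `add_carries` and `sub_no_borrow` are hypothetical, and this sketch says nothing about the code any particular -mcpu actually generates.

```cpp
#include <cassert>

// pr43892.c: "sum + x < x" is exactly the carry out of the addition.
static bool add_carries (unsigned long sum, unsigned long x)
{
  unsigned long result;
  return __builtin_add_overflow (sum, x, &result);
}

// addcmp.c: for unsigned operands, (b <= c) holds exactly when c - b
// does not borrow, so it too can be materialised as a carry bit.
static bool sub_no_borrow (unsigned long b, unsigned long c)
{
  unsigned long result;
  return !__builtin_sub_overflow (c, b, &result);
}
```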


Re: [PATCH] rs6000: __builtin_darn[_raw] should be in [power9-64] (PR103624)

2021-12-14 Thread Segher Boessenkool
On Tue, Dec 14, 2021 at 07:32:30AM -0600, Bill Schmidt wrote:
> On 12/13/21 6:22 PM, Segher Boessenkool wrote:
> > On Mon, Dec 13, 2021 at 02:37:43PM -0600, Bill Schmidt wrote:
> >> On 12/13/21 10:54 AM, Segher Boessenkool wrote:
> >>> On Mon, Dec 13, 2021 at 11:30:28AM -0500, David Edelsohn wrote:
>  On Mon, Dec 13, 2021 at 10:48 AM Bill Schmidt  
>  wrote:
> > PR103624 observes that we get segfaults for the 64-bit darn builtins when compiled
> > on a 32-bit architecture.  The old built-in infrastructure requires TARGET_64BIT, and
> > this was missed in the new support.  Moving these two builtins from the [power9]
> > stanza to the [power9-64] stanza solves the problem.
> >
> > Tested the fix on a powerpc-e300c3-linux-gnu cross.  Bootstrapped and tested on
> > powerpc64le-linux-gnu with no regressions.  Is this okay for trunk?
>  Okay.
> >>> No, as I said before this is not correct, not without a lot more
> >>> explanation at least.  We should not copy errors in the old code into
> >>> the new code.  That is negating one of the main advantages of
> >>> reimplementing this in the first place!
> >> Can you please be more specific?
> >>
> >> All I have from you before is "It should work for 32-bit though?"  I responded in the
> >> bug report that __builtin_darn_32 was used for this purpose.  I haven't seen a
> >> response to that.  What do you want to see happen?
> > That of course does not work for _raw.
> >
> > These builtins should just return a "long", just like __builtin_ppc_mftb
> > does.  All three of them.
> 
> Well, that seems wrong for __builtin_darn_32, which maps to an SImode pattern.

That is Yet Another Bug, then.

The insn returns a full register.  The patterns should use either :P or
:GPR (the latter if SImode makes sense for it, so we could have that for
all darn variants).  :DI and :SI never make sense for this.

> So, I assume what you'd like to see is for the other two built-ins to return
> long, and for the "&& TARGET_64BIT" to be removed from the darn_raw and darn
> patterns?

No, all builtins should work in either mode, and always return long.
If the patterns are broken, the *patterns* should be fixed :-)

> > Avoiding ICEs should not be a goal.  It should be a side effect of doing
> > the right thing in the first place!
> 
> There's no reason to get snippy.  Given that you approved Kelvin's original
> implementation of the darn patterns and built-in functions, I think I can be
> forgiven for thinking that those were the desired semantics. :-)

Sorry if I sound annoyed.  I am annoyed, but not with you.  Just with
the world in general I suppose.

With the new builtins representation it is much easier to spot problems,
it is a great success already!


Segher


Re: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]

2021-12-14 Thread Xionghu Luo via Gcc-patches



On 2021/12/14 17:27, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/12/13 17:25, Jan Hubicka wrote:
>>> r12-4526 cancelled a jump-thread path that rotates the loop.  This exposes
>>> an issue in profile estimation: in predict_extra_loop_exits, the outer
>>> loop's exit edge is marked as an extra exit of the inner loop and given an
>>> incorrect prediction, so a hot inner loop eventually becomes a cold loop
>>> after later optimizations.  This patch adds a loop check when searching
>>> for extra exit edges, to avoid an unexpected predict_edge from
>>> predict_paths_for_bb.
>>>
>>> Regression tested on P8LE, OK for master?
>>>
>>> gcc/ChangeLog:
>>>
>>> PR middle-end/103270
>>> * predict.c (predict_extra_loop_exits): Add loop parameter.
>>> (predict_loops): Call with loop argument.
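The fix described in this patch amounts to asking, before an edge is treated as an extra exit of a loop, whether it really leaves *that* loop. The sketch below models this with hypothetical toy types (`toy_loop`, `toy_bb`, `toy_edge`), not GCC's real loop, basic-block and edge structures; GCC has its own helpers for such tests (for example flow_bb_inside_loop_p and loop_exit_edge_p in cfgloop.h), and the patch itself may implement the check differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: each loop points to its enclosing loop, and
   each block records the innermost loop containing it.  */
struct toy_loop
{
  int id;
  const struct toy_loop *outer; /* NULL for the outermost (function) level */
};

struct toy_bb
{
  int index;
  const struct toy_loop *loop_father;
};

struct toy_edge
{
  const struct toy_bb *src, *dest;
};

/* True if BB is in LOOP or in a loop nested inside LOOP.  */
static bool
toy_bb_inside_loop_p (const struct toy_loop *loop, const struct toy_bb *bb)
{
  for (const struct toy_loop *l = bb->loop_father; l != NULL; l = l->outer)
    if (l == loop)
      return true;
  return false;
}

/* An edge exits LOOP only if its source is inside LOOP and its
   destination is outside.  An exit edge of the outer loop whose source
   lies outside the inner loop therefore never counts as an (extra) exit
   of the inner loop.  */
static bool
toy_exit_edge_of_loop_p (const struct toy_loop *loop, const struct toy_edge *e)
{
  return toy_bb_inside_loop_p (loop, e->src)
         && !toy_bb_inside_loop_p (loop, e->dest);
}
```

With an inner loop nested in an outer loop, the edge from the outer body to the code after the outer loop is an exit of the outer loop but is rejected for the inner one, which is the misclassification the patch description mentions.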
>>
>> With changes to branch predictors it is useful to re-test their
>> effectiveness on SPEC and see whether their hit rates still match
>> reality.  You can do this by building SPEC with -fprofile-generate,
>> training it, and then building with -fprofile-use
>> -fdump-tree-ipa-profile-details and running contrib/analyze_brprob.py,
>> which will collect information on how the predictors work.
>>
>> This patch looks good to me, but it would be nice to have things
>> reality-checked (and since we have not done the stats for some time,
>> there may be surprises), so if you could run SPEC and post the
>> analyze_brprob results, it would be great.  I will also try to get to
>> that soon, but currently I am a bit swamped by other problems I noticed
>> on clang builds.
>>
>> Thanks a lot for working on profile fixes; I am now trying to get
>> things into shape.  With Martin we added basic testing infrastructure
>> for keeping track of profile updates, and I am now trying to see how it
>> works in practice.  Hopefully it will make it easier to judge
>> profile-updating patches.  I would welcome a list of patches I should
>> look at.
>>
>> I will write separate mail on this.
>> Honza
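To interpret the analyze_brprob.py table quoted in the reply, it helps to see how such columns can be derived from per-branch profile counts. The sketch below is an assumption-based reconstruction, not the script's code: we assume BR. HITRATE is the share of static branches whose predicted direction was taken at least half the time, the first HITRATE figure is the share of executions that went the predicted way, the second is the best a fixed per-branch prediction could reach, and the second COVERAGE column is the execution count with an SI suffix (e.g. 279847 shown as 279.85k). The struct and function names are hypothetical, and the script's exact rules may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-branch record: how often the branch ran and how often
   it went the way the heuristic predicted.  */
struct branch_profile
{
  unsigned long long count;
  unsigned long long hits;
};

/* Hypothetical per-heuristic totals, one per table row.  */
struct heuristic_summary
{
  size_t branches;          /* BRANCHES */
  size_t correct_branches;  /* numerator of BR. HITRATE */
  unsigned long long count; /* COVERAGE */
  unsigned long long hits;  /* numerator of the first HITRATE figure */
  unsigned long long fits;  /* numerator of the second HITRATE figure */
};

static void
summarize (const struct branch_profile *b, size_t n,
           struct heuristic_summary *s)
{
  memset (s, 0, sizeof *s);
  for (size_t i = 0; i < n; i++)
    {
      assert (b[i].hits <= b[i].count);
      unsigned long long misses = b[i].count - b[i].hits;
      s->branches++;
      /* Assumption: a tie counts as a correctly predicted branch.  */
      if (b[i].hits >= misses)
        s->correct_branches++;
      s->count += b[i].count;
      s->hits += b[i].hits;
      /* Best a fixed prediction could do: always pick the majority side.  */
      s->fits += b[i].hits > misses ? b[i].hits : misses;
    }
}

/* NUM as a percentage of DEN.  */
static double
percent (unsigned long long num, unsigned long long den)
{
  assert (den != 0);
  return 100.0 * (double) num / (double) den;
}

/* Format V with two decimals and an SI suffix, e.g. 279847 -> "279.85k".  */
static void
format_count (unsigned long long v, char *buf, size_t len)
{
  static const char *const units[] = { "", "k", "M", "G", "T", "P" };
  size_t nunits = sizeof units / sizeof units[0];
  double d = (double) v;
  size_t u = 0;
  while (d >= 1000.0 && u + 1 < nunits)
    {
      d /= 1000.0;
      u++;
    }
  snprintf (buf, len, "%.2f%s", d, units[u]);
}
```

Under these definitions, a row with one branch executed twice that went the predicted way once reads 100.00% and 50.00% / 50.00%, which is what the "noreturn call" row shows.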
> 
> 
> With the patch, analyze_brprob.py produces the data below for a PGO
> build.  There is no verification code in the script, so how should I
> check whether the result is correct?  Run it again without the patch and
> compare the "extra loop exit" row?
> 
> 
> ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
> HEURISTICS                 BRANCHES  (REL) BR. HITRATE           HITRATE   COVERAGE COVERAGE  (REL) predict.def (REL)  HOT branches (>10%)
> noreturn call                     1   0.0%     100.00%   50.00% / 50.00%          2     2.00   0.0%                    100%:1
> Fortran zero-sized array          3   0.0%      66.67%   41.71% / 60.50%        362   362.00   0.0%                    100%:3
> loop iv compare                  16   0.0%      93.75%   98.26% / 98.76%     279847  279.85k   0.0%                    93%:4
> __builtin_expect                 35   0.0%      97.14%   78.09% / 78.35%   17079558   17.08M   0.0%
> loop guard with recursion        45   0.1%      86.67%   85.13% / 85.14% 6722424412    6.72G   1.3%                    74%:4
> extra loop exit                  80   0.1%      58.75%   81.49% / 89.21%  438470261  438.47M   0.1%                    86%:3
> guess loop iv compare           235   0.3%      80.85%   52.83% / 73.97%  148558247  148.56M   0.0%                    47%:3
> negative return                 241   0.3%      71.37%   25.33% / 92.61%  250402383  250.40M   0.0%                    69%:2
> loop exit with recursion        315   0.4%      74.60%   85.07% / 85.71% 9403136858    9.40G   1.8%                    59%:4
> const return                    320   0.4%      51.88%   90.45% / 95.63%  925341727  925.34M   0.2%                    76%:5
> indirect call                   377   0.5%      51.46%   84.72% / 91.14% 2133772848    2.13G   0.4%                    69%:1
> polymorphic call                410   0.5%      44.15%   31.26% / 79.37% 3272688244    3.27G   0.6%                    53%:2
> recursive call                  506   0.7%      39.53%   44.97% / 83.92% 1211036806    1.21G   0.2%                    10%:1
> goto                            618   0.8%      64.24%   65.37% / 83.57%  702446178  702.45M   0.1%                    20%:1
> null return                     800   1.1%      64.62%   56.59% / 77.70%  603952067  603.95M   0.1%                    28%:2
> continue                        956   1.3%      63.70%   65.65% / 79.97% 3780303799    3.78G   0.7%                    52%:3
> loop guard                     1177   1.6%      56.33%   42.54% / 80.32% 7373601457    7.37G   1.4%                    50%:2
> opcode values positive (on trees)