[PATCH] RISC-V: Define __riscv_cmodel_medany for PIC mode.

2020-09-24 Thread Kito Cheng
 - According to the conclusion in the RISC-V C API document [1], we decided
   to deprecate the __riscv_cmodel_pic macro.

 - __riscv_cmodel_pic is deprecated and will be removed in the next GCC
   release.

[1] https://github.com/riscv/riscv-c-api-doc/pull/11
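
For illustration only (not part of this patch), user code that currently keys
off the deprecated macro can switch to the medany macro, which PIC mode now
also defines; a sketch of the intended migration:

/* Hypothetical user code: prefer __riscv_cmodel_medany, now also defined in
   PIC mode, and keep the deprecated macro only as a fallback for older
   compilers.  */
#if defined(__riscv_cmodel_medany)
  /* medany or PIC code model */
#elif defined(__riscv_cmodel_pic)  /* deprecated, to be removed */
  /* older compilers that only define the PIC macro */
#endif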
---
 gcc/config/riscv/riscv-c.c                | 7 ++++---
 gcc/testsuite/gcc.target/riscv/predef-3.c | 6 +++---
 gcc/testsuite/gcc.target/riscv/predef-6.c | 6 +++---
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/riscv-c.c b/gcc/config/riscv/riscv-c.c
index 735f2f2f513f..9221fcbaca5d 100644
--- a/gcc/config/riscv/riscv-c.c
+++ b/gcc/config/riscv/riscv-c.c
@@ -90,12 +90,13 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define ("__riscv_cmodel_medlow");
   break;
 
+case CM_PIC:
+  builtin_define ("__riscv_cmodel_pic");
+  /* FALLTHROUGH. */
+
 case CM_MEDANY:
   builtin_define ("__riscv_cmodel_medany");
   break;
 
-case CM_PIC:
-  builtin_define ("__riscv_cmodel_pic");
-  break;
 }
 }
diff --git a/gcc/testsuite/gcc.target/riscv/predef-3.c 
b/gcc/testsuite/gcc.target/riscv/predef-3.c
index 6f4f2e219941..d7c9793b3d7c 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-3.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-3.c
@@ -55,11 +55,11 @@ int main () {
 #if defined(__riscv_cmodel_medlow)
 #error "__riscv_cmodel_medlow"
 #endif
-#if defined(__riscv_cmodel_medany)
-#error "__riscv_cmodel_medlow"
+#if !defined(__riscv_cmodel_medany)
+#error "__riscv_cmodel_medany"
 #endif
 #if !defined(__riscv_cmodel_pic)
-#error "__riscv_cmodel_medlow"
+#error "__riscv_cmodel_pic"
 #endif
 
   return 0;
diff --git a/gcc/testsuite/gcc.target/riscv/predef-6.c 
b/gcc/testsuite/gcc.target/riscv/predef-6.c
index ee4e02bcb63e..7530f9598aeb 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-6.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-6.c
@@ -55,11 +55,11 @@ int main () {
 #if defined(__riscv_cmodel_medlow)
 #error "__riscv_cmodel_medlow"
 #endif
-#if defined(__riscv_cmodel_medany)
-#error "__riscv_cmodel_medlow"
+#if !defined(__riscv_cmodel_medany)
+#error "__riscv_cmodel_medany"
 #endif
 #if !defined(__riscv_cmodel_pic)
-#error "__riscv_cmodel_medlow"
+#error "__riscv_cmodel_pic"
 #endif
 
   return 0;
-- 
2.28.0



Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-24 Thread Richard Biener via Gcc-patches
On September 25, 2020 5:50:40 AM GMT+02:00, xionghu luo  
wrote:
>Hi,
>
>On 2020/9/24 21:27, Richard Biener wrote:
>> On Thu, Sep 24, 2020 at 10:21 AM xionghu luo 
>wrote:
>> 
>> I'll just comment that
>> 
>>  xxperm 34,34,33
>>  xxinsertw 34,0,12
>>  xxperm 34,34,32
>> 
>> doesn't look like a variable-position insert instruction but
>> this is a variable whole-vector rotate plus an insert at index zero
>> followed by a variable whole-vector rotate.  I'm not fluent in
>> ppc assembly but
>> 
>>  rlwinm 6,6,2,28,29
>>  mtvsrwz 0,5
>>  lvsr 1,0,6
>>  lvsl 0,0,6
>> 
>> possibly computes the shift masks for r33/r32?  though
>> I do not see those registers mentioned...
>
>For V4SI:
>   rlwinm 6,6,2,28,29  // r6*4
>   mtvsrwz 0,5 // vs0   <- r5  (0xfe)
>   lvsr 1,0,6  // vs33  <- lvsr[r6]
>   lvsl 0,0,6  // vs32  <- lvsl[r6] 
>   xxperm 34,34,33   
>   xxinsertw 34,0,12
>   xxperm 34,34,32
>   blr
>
>
>idx = idx * 4; 
>00  0x4000300020001  
>xxperm:0x4000300020001  
>vs33:0x101112131415161718191a1b1c1d1e1f 
>vs32:0x102030405060708090a0b0c0d0e0f
>14  0x4000300020001  
>xxperm:0x1000400030002  
>vs33:0xc0d0e0f101112131415161718191a1b  
>vs32:0x405060708090a0b0c0d0e0f10111213
>28  0x4000300020001  
>xxperm:0x2000100040003  
>vs33:0x8090a0b0c0d0e0f1011121314151617  
>vs32:0x8090a0b0c0d0e0f1011121314151617
>312 0x4000300020001  
>xxperm:0x3000200010004  
>vs33:0x405060708090a0b0c0d0e0f10111213  
>vs32:0xc0d0e0f101112131415161718191a1b
>
>vs34:
> 0x40003000200fe
> 0x4000300fe0001
> 0x400fe00020001
>0xfe000300020001
>
>
>"xxinsertw 34,0,12" will always insert vs0[32:63] content to the forth
>word of
>target vector, bits[96:127].  Then the second xxperm rotate the
>modified vector
>back. 
>
>All the instructions are register-based operations.  As Segher replied,
>power9 supports only fixed-position inserts, so we need to do some
>trickery here to support it instead of generating a short store
>followed by a wide load.

OK, fair enough - I've heard power10 does support those inserts. 

>
>> 
>> This might be a generic viable expansion strategy btw,
>> which is why I asked before whether the CPU supports
>> inserts at a variable position ...  the building blocks are
>> already there with vec_set at constant zero position
>> plus vec_perm_const for the rotates.
>> 
>> But well, I did ask this question.  Multiple times.
>> 
>> ppc does _not_ have a VSX instruction
>> like xxinsertw r34, r8, r12 where r8 denotes
>> the vector element (or byte position or whatever).
>> 
>> So I don't think vec_set with a variable index is the
>> best approach.
>> Xionghu - you said even without the patch the stack
>> storage is eventually elided but
>> 
>>  addi 9,1,-16
>>  rldic 6,6,2,60
>>  stxv 34,-16(1)
>>  stwx 5,9,6
>>  lxv 34,-16(1)
>> 
>> still shows stack(?) store/load with a bad STLF penalty.
>
>
>Sorry if I didn't describe it clearly and you misunderstood me.  I mean
>that if many instructions are inserted (tested with an inserted loop)
>between "stwx 5,9,6" and "lxv 34,-16(1)", the store-hit-load
>performance issue can be avoided, but this is not the solution we want.
>
>I also changed your test as below and built it for x86; it seems it
>also generates inefficient code.  Maybe what my patch handles is a
>different usage from your pasted case?
>
>#define N 32
>typedef int T;
>typedef T V __attribute__((vector_size(N)));
>  V setg3 (V v, int idx, T val)
>{
>v[idx&31] = val;
>return v;
>}
>
>-O2 -S -mavx -march=znver2:
>
>setg3:
>push    rbp
>and edi, 31
>mov rbp, rsp
>and rsp, -32
>vmovdqa YMMWORD PTR [rsp-32], ymm0
>mov DWORD PTR [rsp-32+rdi*4], esi
>vmovdqa ymm0, YMMWORD PTR [rsp-32]
>leave
>ret
>
>
>While idx is constant: 
>
>setg3:
>vpinsrd xmm1, xmm0, esi, 3
>vinserti128 ymm0, ymm0, xmm1, 0x0
>ret
>
>And ARM with -O2 -S -march=armv8.2-a+sve (N change to 16): 
>
>setg3:
>sub sp, sp, #16
>and x0, x0, 15
>str q0, [sp]
>str w1, [sp, x0, lsl 2]
>ldr q0, [sp]
>add sp, sp, 16
>ret
>
>While idx is constant: 
>
>setg3:
>ins v0.s[3], w1
>ret
>
>
>Though I've no idea how to optimize this on x86 and ARM with vector
>instructions to avoid the short store followed by a wide load on the
>stack.

Yes, all targets suffer from this.  On x86 you can play similar tricks as on
power9; see my attempt in the other mail (and Segher's fix). 

Richard. 

>
>Thanks,
>Xionghu



Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-24 Thread xionghu luo via Gcc-patches
Hi,

On 2020/9/24 21:27, Richard Biener wrote:
> On Thu, Sep 24, 2020 at 10:21 AM xionghu luo  wrote:
> 
> I'll just comment that
> 
>  xxperm 34,34,33
>  xxinsertw 34,0,12
>  xxperm 34,34,32
> 
> doesn't look like a variable-position insert instruction but
> this is a variable whole-vector rotate plus an insert at index zero
> followed by a variable whole-vector rotate.  I'm not fluent in
> ppc assembly but
> 
>  rlwinm 6,6,2,28,29
>  mtvsrwz 0,5
>  lvsr 1,0,6
>  lvsl 0,0,6
> 
> possibly computes the shift masks for r33/r32?  though
> I do not see those registers mentioned...

For V4SI:
   rlwinm 6,6,2,28,29  // r6*4
   mtvsrwz 0,5 // vs0   <- r5  (0xfe)
   lvsr 1,0,6  // vs33  <- lvsr[r6]
   lvsl 0,0,6  // vs32  <- lvsl[r6] 
   xxperm 34,34,33   
   xxinsertw 34,0,12
   xxperm 34,34,32
   blr


idx = idx * 4; 
0  0   0x4000300020001   xxperm:0x4000300020001
       vs33:0x101112131415161718191a1b1c1d1e1f  vs32:0x102030405060708090a0b0c0d0e0f
1  4   0x4000300020001   xxperm:0x1000400030002
       vs33:0xc0d0e0f101112131415161718191a1b   vs32:0x405060708090a0b0c0d0e0f10111213
2  8   0x4000300020001   xxperm:0x2000100040003
       vs33:0x8090a0b0c0d0e0f1011121314151617   vs32:0x8090a0b0c0d0e0f1011121314151617
3  12  0x4000300020001   xxperm:0x3000200010004
       vs33:0x405060708090a0b0c0d0e0f10111213   vs32:0xc0d0e0f101112131415161718191a1b

vs34:
 0x40003000200fe
 0x4000300fe0001
 0x400fe00020001
0xfe000300020001


"xxinsertw 34,0,12" will always insert vs0[32:63] content to the forth word of
target vector, bits[96:127].  Then the second xxperm rotate the modified vector
back. 

All the instructions are register-based operations.  As Segher replied, power9
supports only fixed-position inserts, so we need to do some trickery here to
support it instead of generating a short store followed by a wide load.
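
For illustration, here is a rough C-level sketch (not the actual expander) of
the same strategy using GCC's generic vector extensions: rotate the vector so
the selected lane lands at a fixed position, insert there, then rotate back.
The function and mask names are made up for this sketch.

typedef int v4si __attribute__ ((vector_size (16)));

v4si
insert_var_idx (v4si v, int idx, int val)
{
  idx &= 3;
  /* Permute masks for a whole-vector rotate by IDX and its inverse.  */
  v4si rot = { (0 + idx) & 3, (1 + idx) & 3, (2 + idx) & 3, (3 + idx) & 3 };
  v4si inv = { (0 - idx) & 3, (1 - idx) & 3, (2 - idx) & 3, (3 - idx) & 3 };

  v4si t = __builtin_shuffle (v, rot);  /* variable whole-vector rotate */
  t[0] = val;                           /* insert at a fixed element    */
  return __builtin_shuffle (t, inv);    /* rotate back                  */
}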


> 
> This might be a generic viable expansion strategy btw,
> which is why I asked before whether the CPU supports
> inserts at a variable position ...  the building blocks are
> already there with vec_set at constant zero position
> plus vec_perm_const for the rotates.
> 
> But well, I did ask this question.  Multiple times.
> 
> ppc does _not_ have a VSX instruction
> like xxinsertw r34, r8, r12 where r8 denotes
> the vector element (or byte position or whatever).
> 
> So I don't think vec_set with a variable index is the
> best approach.
> Xionghu - you said even without the patch the stack
> storage is eventually elided but
> 
>  addi 9,1,-16
>  rldic 6,6,2,60
>  stxv 34,-16(1)
>  stwx 5,9,6
>  lxv 34,-16(1)
> 
> still shows stack(?) store/load with a bad STLF penalty.


Sorry if I didn't describe it clearly and you misunderstood me.  I mean that if
many instructions are inserted (tested with an inserted loop) between
"stwx 5,9,6" and "lxv 34,-16(1)", the store-hit-load performance issue can be
avoided, but this is not the solution we want.

I also changed your test as below and built it for x86; it seems it also
generates inefficient code.  Maybe what my patch handles is a different usage
from your pasted case?

#define N 32
typedef int T;
typedef T V __attribute__((vector_size(N)));
  V setg3 (V v, int idx, T val)
{
v[idx&31] = val;
return v;
}

-O2 -S -mavx -march=znver2:

setg3:
push    rbp
and edi, 31
mov rbp, rsp
and rsp, -32
vmovdqa YMMWORD PTR [rsp-32], ymm0
mov DWORD PTR [rsp-32+rdi*4], esi
vmovdqa ymm0, YMMWORD PTR [rsp-32]
leave
ret


While idx is constant: 

setg3:
vpinsrd xmm1, xmm0, esi, 3
vinserti128 ymm0, ymm0, xmm1, 0x0
ret

And ARM with -O2 -S -march=armv8.2-a+sve (N change to 16): 

setg3:
sub sp, sp, #16
and x0, x0, 15
str q0, [sp]
str w1, [sp, x0, lsl 2]
ldr q0, [sp]
add sp, sp, 16
ret

While idx is constant: 

setg3:
ins v0.s[3], w1
ret


Though I've no idea how to optimize this on x86 and ARM with vector
instructions to avoid the short store followed by a wide load on the stack.


Thanks,
Xionghu


[PATCH v2] PR target/96759 - Handle global variable assignment from misaligned structure/PARALLEL return values.

2020-09-24 Thread Kito Cheng
In g:70cdb21e579191fe9f0f1d45e328908e59c0179e, misaligned stores to a
DECL/global variable were handled, but PARALLEL values were not.  Referring to
the other parts of this function, I found that PARALLEL values need to be
handled by the emit_group_* functions, so I added a check and use
emit_group_store when storing a PARALLEL value.  I also checked that this
change doesn't break the testcase (gcc.target/arm/unaligned-argument-3.c)
added by the original change.

For the riscv64 target, struct S {int a; double b;} is packed into a PARALLEL
value for the return, and it has TImode when misaligned access is supported.
However, TImode requires 16-byte alignment while the value is only 8-byte
aligned, so it goes through the misaligned-stores handling, which then tries
to generate a move instruction from a PARALLEL value.

Tested on the following targets without introducing new regressions:
  - riscv32/riscv64 elf
  - x86_64-linux
  - arm-eabi

v2 changes:
  - Use maybe_emit_group_store instead of emit_group_store.
  - Remove push_temp_slots/pop_temp_slots; emit_group_store only requires a
stack temp slot when dst is CONCAT or PARALLEL, and maybe_emit_group_store
will always use a REG for dst if needed.

gcc/ChangeLog:

PR target/96759
* expr.c (expand_assignment): Handle misaligned stores with PARALLEL
value.

gcc/testsuite/ChangeLog:

PR target/96759
* g++.target/riscv/pr96759.C: New.
* gcc.target/riscv/pr96759.c: New.
---
 gcc/expr.c   |  2 ++
 gcc/testsuite/g++.target/riscv/pr96759.C |  8 
 gcc/testsuite/gcc.target/riscv/pr96759.c | 13 +
 3 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/riscv/pr96759.C
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr96759.c

diff --git a/gcc/expr.c b/gcc/expr.c
index 1a15f24b3979..6eb13a12c8c5 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -5168,6 +5168,8 @@ expand_assignment (tree to, tree from, bool nontemporal)
   rtx reg, mem;
 
   reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+  /* Handle PARALLEL.  */
+  reg = maybe_emit_group_store (reg, TREE_TYPE (from));
   reg = force_not_mem (reg);
   mem = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE);
   if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to))
diff --git a/gcc/testsuite/g++.target/riscv/pr96759.C 
b/gcc/testsuite/g++.target/riscv/pr96759.C
new file mode 100644
index ..673999a4baf7
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/pr96759.C
@@ -0,0 +1,8 @@
+/* { dg-options "-mno-strict-align -std=gnu++17" } */
+/* { dg-do compile } */
+struct S {
+  int a;
+  double b;
+};
+S GetNumbers();
+auto [globalC, globalD] = GetNumbers();
diff --git a/gcc/testsuite/gcc.target/riscv/pr96759.c 
b/gcc/testsuite/gcc.target/riscv/pr96759.c
new file mode 100644
index ..621c39196fca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr96759.c
@@ -0,0 +1,13 @@
+/* { dg-options "-mno-strict-align" } */
+/* { dg-do compile } */
+
+struct S {
+  int a;
+  double b;
+};
+struct S GetNumbers();
+struct S g;
+
+void foo(){
+  g = GetNumbers();
+}
-- 
2.28.0



[r11-3434 Regression] FAIL: gcc.dg/ipa/ipa-pta-13.c scan-tree-dump fre3 " = x;" on Linux/x86_64 (-m64)

2020-09-24 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

c33f474239308d81bf96cfdb2520d25488ad8724 is the first bad commit
commit c33f474239308d81bf96cfdb2520d25488ad8724
Author: Jan Hubicka 
Date:   Thu Sep 24 15:09:17 2020 +0200

Add access through parameter derference tracking to modref

caused

FAIL: gcc.dg/ipa/ipa-pta-13.c scan-tree-dump fre3 " = x;"

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3434/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-pta-13.c --target_board='unix{-m64}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3436 Regression] FAIL: g++.dg/template/local-fn4.C -std=c++98 (test for excess errors) on Linux/x86_64 (-m64)

2020-09-24 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

2e66e53b1efb98f5cf6b0a123990c1ca999affd7 is the first bad commit
commit 2e66e53b1efb98f5cf6b0a123990c1ca999affd7
Author: Nathan Sidwell 
Date:   Thu Sep 24 06:17:00 2020 -0700

c++: local-decls are never member fns [PR97186]

caused

FAIL: g++.dg/template/local-fn4.C  -std=c++98  (test for errors, line 11)
FAIL: g++.dg/template/local-fn4.C  -std=c++98 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3436/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/template/local-fn4.C --target_board='unix{-m64}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3434 Regression] FAIL: gcc.target/i386/sse2-mmx-pinsrw.c execution test on Linux/x86_64 (-m64)

2020-09-24 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

c33f474239308d81bf96cfdb2520d25488ad8724 is the first bad commit
commit c33f474239308d81bf96cfdb2520d25488ad8724
Author: Jan Hubicka 
Date:   Thu Sep 24 15:09:17 2020 +0200

Add access through parameter derference tracking to modref

caused

FAIL: gcc.target/i386/sse2-mmx-pinsrw.c execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3434/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse2-mmx-pinsrw.c 
--target_board='unix{-m64}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH][testsuite] Add effective target ident_directive

2020-09-24 Thread Mike Stump via Gcc-patches
On Sep 24, 2020, at 2:38 AM, Tom de Vries  wrote:
> 
> Fix this by adding an effective target ident_directive, and requiring
> it in both test-cases.

> OK for trunk?

Ok.


Re: [PATCH 2/2, rs6000] VSX load/store rightmost element operations

2020-09-24 Thread Segher Boessenkool
On Thu, Sep 24, 2020 at 11:04:38AM -0500, will schmidt wrote:
> [PATCH 2/2, rs6000] VSX load/store rightmost element operations
> 
> Hi,
>   This adds support for the VSX load/store rightmost element operations.
> This includes the instructions lxvrbx, lxvrhx, lxvrwx, lxvrdx,
> stxvrbx, stxvrhx, stxvrwx, stxvrdx; And the builtins
> vec_xl_sext() /* vector load sign extend */
> vec_xl_zext() /* vector load zero extend */
> vec_xst_trunc() /* vector store truncate */.
> 
> Testcase results show that the instructions added with this patch show
> up at low/no optimization (-O0), with a number of those being replaced
> with other load and store instructions at higher optimization levels.
> For consistency I've left the tests at -O0.
> 
> Regtested OK for Linux on power8,power9 targets.  Sniff-regtested OK on
> power10 simulator.
> OK for trunk?
> 
> Thanks,
> -Will
> 
> gcc/ChangeLog:
>   * config/rs6000/altivec.h (vec_xl_zest, vec_xl_sext, vec_xst_trunc): New
>   defines.

vec_xl_zext (no humour there :-) ).

> +BU_P10V_OVERLOAD_X (SE_LXVRX,   "se_lxvrx")
> +BU_P10V_OVERLOAD_X (ZE_LXVRX,   "ze_lxvrx")
> +BU_P10V_OVERLOAD_X (TR_STXVRX,  "tr_stxvrx")

I'm not a fan of the cryptic names.  I guess I'll get used to them ;-)

> +  if (op0 == const0_rtx)
> + addr = gen_rtx_MEM (blk ? BLKmode : tmode, op1);

That indent is broken.

> +  else
> + {
> + op0 = copy_to_mode_reg (mode0, op0);

And so is this.  Should be two spaces, not three.

> + addr = gen_rtx_MEM (blk ? BLKmode : smode,
> +   gen_rtx_PLUS (Pmode, op1, op0));

"gen_rtx_PLUS" should line up with "blk".

> +  if (sign_extend)
> +{
> + rtx discratch = gen_reg_rtx (DImode);
> + rtx tiscratch = gen_reg_rtx (TImode);

More broken indentation.  (And more later.)

> + // emit the lxvr*x insn.

Use only /* comments */ please, don't mix them.  Emit with a capital E.

> + pat = GEN_FCN (icode) (tiscratch, addr);
> + if (! pat)

No space after "!" (or any other unary op other than casts and sizeof
and the like).

> + // Emit a sign extention from QI,HI,WI to double.

"extension"

> +;; Store rightmost element into store_data
> +;; using stxvrbx, stxvrhx, strvxwx, strvxdx.
> +(define_insn "vsx_stxvrx"
> +   [(set
> +  (match_operand:INT_ISA3 0 "memory_operand" "=Z")
> +  (truncate:INT_ISA3 (match_operand:TI 1 "vsx_register_operand" "wa")))]
> +  "TARGET_POWER10"
> +  "stxvrx %1,%y0"

%x1 I think?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-char.c
> @@ -0,0 +1,168 @@
> +/*
> + * Test of vec_xl_sext and vec_xl_zext (load into rightmost
> + * vector element and zero/sign extend). */
> +
> +/* { dg-do compile {target power10_ok} } */
> +/* { dg-do run {target power10_hw} } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */

If you dg_require it, why test it on the "dg-do compile" line?  It will
*work* with it of course, but it is puzzling :-)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
> @@ -0,0 +1,165 @@
> +/*
> + * Test of vec_xl_sext and vec_xl_zext (load into rightmost
> + * vector element and zero/sign extend). */
> +
> +/* { dg-do compile {target power10_ok} } */
> +/* { dg-do run {target power10_hw} } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O0" } */

Please comment here what that -O0 is for?  So that we still know when we
read it decades from now ;-)

> +/* { dg-final { scan-assembler-times {\mlxvrwx\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mlwax\M} 0 } } */

Maybe all of  {\mlwa}  here?


Segher


[r11-3434 Regression] FAIL: gcc.dg/ipa/ipa-pta-13.c scan-tree-dump fre3 " = x;" on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-24 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

c33f474239308d81bf96cfdb2520d25488ad8724 is the first bad commit
commit c33f474239308d81bf96cfdb2520d25488ad8724
Author: Jan Hubicka 
Date:   Thu Sep 24 15:09:17 2020 +0200

Add access through parameter derference tracking to modref

caused

FAIL: gcc.dg/ipa/ipa-pta-13.c scan-tree-dump fre3 " = x;"

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3434/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-pta-13.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3434 Regression] FAIL: gcc.target/i386/sse2-mmx-pinsrw.c execution test on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-24 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

c33f474239308d81bf96cfdb2520d25488ad8724 is the first bad commit
commit c33f474239308d81bf96cfdb2520d25488ad8724
Author: Jan Hubicka 
Date:   Thu Sep 24 15:09:17 2020 +0200

Add access through parameter derference tracking to modref

caused

FAIL: gcc.target/i386/sse2-mmx-pinsrw.c execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3434/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse2-mmx-pinsrw.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH] correct/improve handling of null VLA arguments (PR 97188)

2020-09-24 Thread Martin Sebor via Gcc-patches

The machinery recently added to support -Warray-parameter and
-Wvla-parameter also results in enhanced detection of null
pointer arguments to VLA function parameters.  This enhancement
wasn't tested as comprehensively as it should have been and
so has some bugs.  The attached patch fixes one that leads
to an ICE.  It also restructures the function and improves
the warnings issued in this case.

The fix is slightly bigger than what I would normally commit
without a review but since it's all in code I just wrote and in
my view low risk I will go ahead and push it in a few days unless
I hear requests for changes by then.

Martin
PR middle-end/97188 - ICE passing a null VLA to a function expecting at least one element

gcc/ChangeLog:

	PR middle-end/97188
	* calls.c (maybe_warn_rdwr_sizes): Simplify warning messages.
	Correct handling of VLA arguments.

gcc/testsuite/ChangeLog:

	PR middle-end/97188
	* gcc.dg/Wstringop-overflow-23.c: Adjust text of expected warnings.
	* gcc.dg/Wnonnull-4.c: New test.

diff --git a/gcc/calls.c b/gcc/calls.c
index 0e5c696c463..ed4363811c8 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
+#define INCLUDE_STRING
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -1924,7 +1925,10 @@ static void
 maybe_warn_rdwr_sizes (rdwr_map *rwm, tree fndecl, tree fntype, tree exp)
 {
   auto_diagnostic_group adg;
-  bool warned = false;
+
+  /* Set if a warning has been issued for any argument (used to decide
+ whether to emit an informational note at the end).  */
+  bool any_warned = false;
 
   /* A string describing the attributes that the warnings issued by this
  function apply to.  Used to print one informational note per function
@@ -1974,27 +1978,60 @@ maybe_warn_rdwr_sizes (rdwr_map *rwm, tree fndecl, tree fntype, tree exp)
   else
 	access_size = rwm->get (sizidx)->size;
 
-  bool warned = false;
+  /* Format the value or range to avoid an explosion of messages.  */
+  char sizstr[80];
+  tree sizrng[2] = { size_zero_node, build_all_ones_cst (sizetype) };
+  if (get_size_range (access_size, sizrng, true))
+	{
+	  const char *s0 = print_generic_expr_to_str (sizrng[0]);
+	  if (tree_int_cst_equal (sizrng[0], sizrng[1]))
+	{
+	  gcc_checking_assert (strlen (s0) < sizeof sizstr);
+	  strcpy (sizstr, s0);
+	}
+	  else
+	{
+	  const char *s1 = print_generic_expr_to_str (sizrng[1]);
+	  gcc_checking_assert (strlen (s0) + strlen (s1)
+   < sizeof sizstr - 4);
+	  sprintf (sizstr, "[%s, %s]", s0, s1);
+	}
+	}
+  else
+	*sizstr = '\0';
+
+  /* Set if a warning has been issued for the current argument.  */
+  bool arg_warned = false;
   location_t loc = EXPR_LOCATION (exp);
   tree ptr = access.second.ptr;
-  tree sizrng[2] = { size_zero_node, build_all_ones_cst (sizetype) };
-  if (get_size_range (access_size, sizrng, true)
+  if (*sizstr
 	  && tree_int_cst_sgn (sizrng[0]) < 0
 	  && tree_int_cst_sgn (sizrng[1]) < 0)
 	{
 	  /* Warn about negative sizes.  */
-	  if (tree_int_cst_equal (sizrng[0], sizrng[1]))
-	warned = warning_at (loc, OPT_Wstringop_overflow_,
- "%Kargument %i value %E is negative",
- exp, sizidx + 1, access_size);
+	  if (access.second.internal_p)
+	{
+	  const std::string argtypestr
+		= access.second.array_as_string (ptrtype);
+
+	  arg_warned = warning_at (loc, OPT_Wstringop_overflow_,
+   "%Kbound argument %i value %s is "
+   "negative for a variable length array "
+   "argument %i of type %s",
+   exp, sizidx + 1, sizstr,
+   ptridx + 1, argtypestr.c_str ());
+	}
 	  else
-	warned = warning_at (loc, OPT_Wstringop_overflow_,
- "%Kargument %i range [%E, %E] is negative",
- exp, sizidx + 1, sizrng[0], sizrng[1]);
-	  if (warned)
+	arg_warned = warning_at (loc, OPT_Wstringop_overflow_,
+ "%Kargument %i value %s is negative",
+ exp, sizidx + 1, sizstr);
+
+	  if (arg_warned)
 	{
 	  append_attrname (access, attrstr, sizeof attrstr);
-	  /* Avoid warning again for the same attribute.  */
+	  /* Remember a warning has been issued and avoid warning
+		 again below for the same attribute.  */
+	  any_warned = true;
 	  continue;
 	}
 	}
@@ -2006,7 +2043,6 @@ maybe_warn_rdwr_sizes (rdwr_map *rwm, tree fndecl, tree fntype, tree exp)
 	  /* Multiply ACCESS_SIZE by the size of the type the pointer
 		 argument points to.  If it's incomplete the size is used
 		 as is.  */
-	  access_size = NULL_TREE;
 	  if (tree argsize = TYPE_SIZE_UNIT (argtype))
 		if (TREE_CODE (argsize) == INTEGER_CST)
 		  {
@@ -2028,35 +2064,44 @@ maybe_warn_rdwr_sizes (rdwr_map *rwm, tree fndecl, tree fntype, tree exp)
 		 different from also declaring the pointer argument with
 

[PATCH] c++: Implement -Wrange-loop-construct [PR94695]

2020-09-24 Thread Marek Polacek via Gcc-patches
This new warning can be used to prevent expensive copies inside range-based
for-loops, for instance:

  struct S { char arr[128]; };
  void fn () {
S arr[5];
for (const auto x : arr) {  }
  }

where auto deduces to S and then we copy the big S in every iteration.
Using "const auto " would not incur such a copy.  With this patch the
compiler will warn:

q.C:4:19: warning: loop variable 'x' creates a copy from type 'const S' 
[-Wrange-loop-construct]
4 |   for (const auto x : arr) {  }
  |   ^
q.C:4:19: note: use reference type 'const S&' to prevent copying
4 |   for (const auto x : arr) {  }
  |   ^
  |   &

As per Clang, this warning is suppressed for trivially copyable types
whose size does not exceed 64B.  The tricky part of the patch was how
to figure out if using a reference would have prevented a copy.  I've
used perform_implicit_conversion to perform the imaginary conversion.
Then if the conversion doesn't have any side-effects, I assume it does
not call any functions or create any TARGET_EXPRs, and is just a simple
assignment like this one:

  const T &x = (const T &) <__for_begin>;

But it can also be a CALL_EXPR:

  x = (const T &) Iterator::operator* (&__for_begin)

which is still fine -- we just use the return value and don't create
any copies.

This warning is enabled by -Wall.  Further warnings of similar nature
should follow soon.
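
For illustration, here are hypothetical examples (not taken from the new test)
of what does and does not warn under the rules described above:

struct Big { char arr[128]; };
struct Small { int i; };              // trivially copyable and <= 64B

void
loops (Big (&big)[4], Small (&small)[4])
{
  for (const auto b : big) { }        // warns: copies 128 bytes per iteration
  for (const auto &b : big) { }       // no warning: binds a reference
  for (const auto s : small) { }      // no warning: small trivially copyable
}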

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c-family/ChangeLog:

PR c++/94695
* c.opt (Wrange-loop-construct): New option.

gcc/cp/ChangeLog:

PR c++/94695
* parser.c (warn_for_range_copy): New function.
(cp_convert_range_for): Call it.

gcc/ChangeLog:

PR c++/94695
* doc/invoke.texi: Document -Wrange-loop-construct.

gcc/testsuite/ChangeLog:

PR c++/94695
* g++.dg/warn/Wrange-loop-construct.C: New test.
---
 gcc/c-family/c.opt|   4 +
 gcc/cp/parser.c   |  77 ++-
 gcc/doc/invoke.texi   |  21 +-
 .../g++.dg/warn/Wrange-loop-construct.C   | 207 ++
 4 files changed, 304 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wrange-loop-construct.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 7761eefd203..bbf7da89658 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -800,6 +800,10 @@ Wpacked-not-aligned
 C ObjC C++ ObjC++ Var(warn_packed_not_aligned) Warning LangEnabledBy(C ObjC 
C++ ObjC++,Wall)
 Warn when fields in a struct with the packed attribute are misaligned.
 
+Wrange-loop-construct
+C++ ObjC++ Var(warn_range_loop_construct) Warning LangEnabledBy(C++ 
ObjC++,Wall)
+Warn when a range-based for-loop is creating unnecessary copies.
+
 Wredundant-tags
 C++ ObjC++ Var(warn_redundant_tags) Warning
 Warn when a class or enumerated type is referenced using a redundant class-key.
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index fba3fcc0c4c..d233279ac62 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -12646,6 +12646,73 @@ do_range_for_auto_deduction (tree decl, tree 
range_expr)
 }
 }
 
+/* Warns when the loop variable should be changed to a reference type to
+   avoid unnecessary copying.  I.e., from
+
+ for (const auto x : range)
+
+   where range returns a reference, to
+
+ for (const auto &x : range)
+
+   if this version doesn't make a copy.  DECL is the RANGE_DECL; EXPR is the
+   *__for_begin expression.
+   This function is never called when processing_template_decl is on.  */
+
+static void
+warn_for_range_copy (tree decl, tree expr)
+{
+  if (!warn_range_loop_construct
+  || decl == error_mark_node)
+return;
+
+  location_t loc = DECL_SOURCE_LOCATION (decl);
+  tree type = TREE_TYPE (decl);
+
+  if (from_macro_expansion_at (loc))
+return;
+
+  if (TYPE_REF_P (type))
+{
+  /* TODO: Implement reference warnings.  */
+  return;
+}
+  else if (!CP_TYPE_CONST_P (type))
+return;
+
+  /* Since small trivially copyable types are cheap to copy, we suppress the
+ warning for them.  64B is a common size of a cache line.  */
+  if (TREE_CODE (TYPE_SIZE_UNIT (type)) != INTEGER_CST
+  || (tree_to_uhwi (TYPE_SIZE_UNIT (type)) <= 64
+ && trivially_copyable_p (type)))
+return;
+
+  tree rtype = cp_build_reference_type (type, /*rval*/false);
+  /* See what it would take to convert the expr if we used a reference.  */
+  expr = perform_implicit_conversion (rtype, expr, tf_none);
+  if (!TREE_SIDE_EFFECTS (expr))
+/* No calls/TARGET_EXPRs.  */;
+  else
+{
+  /* If we could initialize the reference directly from the call, it
+wouldn't involve any copies.  */
+  STRIP_NOPS (expr);
+  if (TREE_CODE (expr) != CALL_EXPR
+ || !reference_related_p (non_reference (TREE_TYPE (expr)), type))
+  return;
+}
+
+  auto_diagnostic_group d;
+  if (warning_at 

[r11-3436 Regression] FAIL: g++.dg/template/local-fn4.C -std=c++98 (test for excess errors) on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-24 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

2e66e53b1efb98f5cf6b0a123990c1ca999affd7 is the first bad commit
commit 2e66e53b1efb98f5cf6b0a123990c1ca999affd7
Author: Nathan Sidwell 
Date:   Thu Sep 24 06:17:00 2020 -0700

c++: local-decls are never member fns [PR97186]

caused

FAIL: g++.dg/template/local-fn4.C  -std=c++98  (test for errors, line 11)
FAIL: g++.dg/template/local-fn4.C  -std=c++98 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3436/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/template/local-fn4.C --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH 2/2] rs6000: Add tests for _mm_insert_epi{8,32,64}

2020-09-24 Thread Segher Boessenkool
On Wed, Sep 23, 2020 at 05:12:45PM -0500, Paul A. Clarke wrote:
> Copied from gcc.target/i386.

Okay for trunk then.  Thanks!

(I peeked, it is just fine ;-) )


Segher


Re: [PATCH 1/2] rs6000: Support _mm_insert_epi{8,32,64}

2020-09-24 Thread Segher Boessenkool
Hi!

On Wed, Sep 23, 2020 at 05:12:44PM -0500, Paul A. Clarke wrote:
> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
> +{
> +  __v16qi result = (__v16qi)__A;
> +
> +  result [(__N & 0b1111)] = __D;

Hrm, GCC supports binary constants like this since 2007, so okay.  But I
have to wonder if this improves anything over hex (or decimal even!)
The parens are superfluous (and only hinder legibility), fwiw.

> +_mm_insert_epi64 (__m128i const __A, long long const __D, int const __N)
> +{
> +  __v2di result = (__v2di)__A;
> +
> +  result [(__N & 0b1)] = __D;

Especially single-digit numbers look really goofy (like 0x0, but even
worse for binary somehow).

Anyway, okay for trunk, with or without those things improved.  Thanks!


Segher


Re: [PATCH 1/2, rs6000] int128 sign extention instructions (partial prereq)

2020-09-24 Thread Segher Boessenkool
Hi!

On Thu, Sep 24, 2020 at 10:59:09AM -0500, will schmidt wrote:
>   This is a sub-set of the 128-bit sign extension support patch series
> that I believe will be fully implemented in a subsequent patch from Carl.
> This is a necessary pre-requisite for the vector-load/store rightmost
> element patch that follows in this thread.

>   * config/rs6000/rs6000.md (enum c_enum): Add UNSPEC_EXTENDDITI2
>   and UNSPEC_MTVSRD_DITI_W1 entries.

(The define_c_enum is called "unspec", not "c_enum".)

These should really be coded not as unspecs, but as normal RTL code?
That way, it can be optimised.

> +;; Move DI value from GPR to TI mode in VSX register, word 1.
> +(define_insn "mtvsrdd_diti_w1"
> +  [(set (match_operand:TI 0 "register_operand" "=wa")
> + (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
> +UNSPEC_MTVSRD_DITI_W1))]
> +  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> +  "mtvsrdd %x0,0,%1"
> +  [(set_attr "type" "vecsimple")])

(Hrm, we should have had an extended mnemonic for this, "mtvsrld".  Oh
well.)

This should be in vsx.md?

And, please just extend vsx_concat for this?  Maybe using
reg_or_zero_operand?

> +;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
> +(define_insn "extendditi2_vector"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
> +(unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")]
> + UNSPEC_EXTENDDITI2))]
> +  "TARGET_POWER10"
> +  "vextsd2q %0,%1"
> +  [(set_attr "type" "exts")])

This should use something with sign_extend.

Okay for trunk.  Thanks!  But the unspecs really need to go sooner
rather than later (these are by far not the only ones, so :-( ).


Segher


Re: [PATCH] generalized range_query class for multiple contexts

2020-09-24 Thread Martin Sebor via Gcc-patches

On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
As part of the ranger work, we have been trying to clean up and 
generalize interfaces whenever possible.  This not only helps in 
reducing the maintenance burden going forward, but provides mechanisms 
for backwards compatibility between ranger and other providers/users of 
ranges throughout the compiler like evrp and VRP.


One such interface is the range_query class in vr_values.h, which 
provides a range query mechanism for use in the simplify_using_ranges 
module.  With it, simplify_using_ranges can be used with the ranger, or 
the VRP twins by providing a get_value_range() method.  This has helped 
us in comparing apples to apples while doing our work, and has also 
future proofed the interface so that asking for a range can be done 
within the context in which it appeared.  For example, get_value_range 
now takes a gimple statement which provides context.  We are no longer 
tied to asking for a global SSA range, but can ask for the range of an 
SSA within a statement.  Granted, this functionality is currently only 
in the ranger, but evrp/vrp could be adapted to pass such context.


The range_query is a good first step, but what we really want is a 
generic query mechanism that can ask for SSA ranges within an 
expression, a statement, an edge, or anything else that may come up.  We 
think that a generic mechanism can be used not only for range producers, 
but consumers such as the substitute_and_fold_engine (see get_value 
virtual) and possibly the gimple folder (see valueize).


The attached patchset provides such an interface.  It is meant to be a 
replacement for range_query that can be used for vr_values, 
substitute_and_fold, the subsitute_and_fold_engine, as well as the 
ranger.  The general API is:


class value_query
{
public:
   // Return the singleton expression for NAME at a gimple statement,
   // or NULL if none found.
   virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
   // Return the singleton expression for NAME at an edge, or NULL if
   // none found.
   virtual tree value_on_edge (edge, tree name);
   // Return the singleton expression for the LHS of a gimple
   // statement, assuming an (optional) initial value of NAME.  Returns
   // NULL if none found.
   //
   // Note this method calculates the range the LHS would have *after*
   // the statement has executed.
   virtual tree value_of_stmt (gimple *, tree name = NULL);
};

class range_query : public value_query
{
public:
   range_query ();
   virtual ~range_query ();

   virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
   virtual tree value_on_edge (edge, tree name) OVERRIDE;
   virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;

   // These are the range equivalents of the value_* methods.  Instead
   // of returning a singleton, they calculate a range and return it in
   // R.  TRUE is returned on success or FALSE if no range was found.
   virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0;
   virtual bool range_on_edge (irange &r, edge, tree name);
   virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);

   // DEPRECATED: This method is used from vr-values.  The plan is to
   // rewrite all uses of it to the above API.
   virtual const class value_range_equiv *get_value_range (const_tree,
   gimple * = NULL);
};

The duality of the API (value_of_* and range_on_*) is because some 
passes are interested in a singleton value 
(substitute_and_fold_engine), while others are interested in ranges 
(vr_values).  Passes that are only interested in singletons can take a 
value_query, while passes that are interested in full ranges, can take a 
range_query.  Of course, for future proofing, we would recommend taking 
a range_query, since if you provide a default range_of_expr, sensible 
defaults will be provided for the others in terms of range_of_expr.


Note, that the absolute bare minimum that must be provided is a 
value_of_expr and a range_of_expr respectively.
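
As a purely illustrative consumer sketch (a hypothetical helper, using only
the methods shown above and assuming an irange implementation such as
int_range_max is available):

static tree
singleton_at_stmt (range_query &q, tree name, gimple *stmt)
{
  int_range_max r;
  // Range of NAME on entry to STMT; value_of_expr should agree with it.
  if (q.range_of_expr (r, name, stmt) && r.singleton_p ())
    return q.value_of_expr (name, stmt);
  return NULL_TREE;
}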


One piece of the API which is missing is a method  to return the range 
of an arbitrary SSA_NAME *after* a statement.  Currently range_of_expr 
calculates the range of an expression upon entry to the statement, 
whereas range_of_stmt calculates the range of *only* the LHS of a 
statement AFTER the statement has executed.


This would allow for complete representation of the ranges/values in 
something like:


     d_4 = *g_7;

Here the range of g_7 upon entry could be VARYING, but after the 
dereference we know it must be non-zero.  Well for sane targets anyhow.


Choices would be to:

   1) add a 4th method such as "range_after_stmt", or

   2) merge that functionality with the existing range_of_stmt method to 
provide "after" functionality for any ssa_name.  Currently the SSA_NAME 
must be the same as the LHS if specified.  It also does not need to be 
specified to allow evaluation of statements without a 

Re: [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move

2020-09-24 Thread Michael Meissner via Gcc-patches
On Thu, Sep 24, 2020 at 10:24:52AM +0200, Florian Weimer wrote:
> * Michael Meissner via Gcc-patches:
> 
> > These patches are my latest versions of the patches to add IEEE 128-bit min,
> > max, and conditional move to GCC.  They correspond to the earlier patches #3
> > and #4 (patches #1 and #2 have been installed).
> 
> Is this about IEEE min or IEEE minimum?  My understanding is that they
> are not the same (or that the behavior depends on the standard version,
> but I think min was replaced with minimum in the 2019 standard or
> something like that).
> 
> Thanks,
> Florian

ISA 3.0 added two min/max variants in addition to the original variant in
power7 (ISA 2.06).

xsmaxdp   Maximum value
xsmaxcdp  Maximum value with "C" semantics
xsmaxjdp  Maximum value with "Java" semantics

Due to the NaN rules, the compiler won't generate these by default unless you
use -ffast-math.  However, with the compare-and-set-mask instruction that was
also introduced in ISA 3.0, the compiler can use compare and set mask to
implement maximum and minimum in some cases that would return the 'right'
value with NaNs.

In ISA 3.1 (power10) the decision was made to only provide the "C" form of
maximum and minimum.  Hence the test in the first patch uses -ffast-math to
get XSMAXCQP generated.  The second patch adds the conditional move support,
which, like for SF/DF modes, can generate maximums and minimums in some cases.
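
For illustration, this is the kind of source-level maximum that can map to the
"C"-semantics instruction once -ffast-math relaxes the NaN rules (a sketch,
not taken from the testsuite):

__float128
max_c_semantics (__float128 a, __float128 b)
{
  /* "C" semantics: when a NaN is involved the result simply follows the
     comparison, rather than the IEEE maximum/minimum rules.  */
  return a > b ? a : b;
}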

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH 9/9] PowerPC: Use __builtin_pack_ieee128 if long double is IEEE 128-bit.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: Use __builtin_pack_ieee128 if long double is IEEE 128-bit.

This patch changes the __ibm128 emulator to use __builtin_pack_ieee128
instead of __builtin_pack_longdouble if long double is IEEE 128-bit, and
we need to use the __ibm128 type.

libgcc/
2020-09-23  Michael Meissner  

* config/rs6000/ibm-ldouble.c (pack_ldouble): Use
__builtin_pack_ieee128 if long double is IEEE 128-bit.
---
 libgcc/config/rs6000/ibm-ldouble.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/libgcc/config/rs6000/ibm-ldouble.c 
b/libgcc/config/rs6000/ibm-ldouble.c
index dd2a02373f2..767fdd72683 100644
--- a/libgcc/config/rs6000/ibm-ldouble.c
+++ b/libgcc/config/rs6000/ibm-ldouble.c
@@ -102,9 +102,17 @@ __asm__ (".symver __gcc_qadd,_xlqadd@GCC_3.4\n\t"
 static inline IBM128_TYPE
 pack_ldouble (double dh, double dl)
 {
+  /* If we are building on a non-VSX system, the __ibm128 type is not defined.
+ This means we can't always use __builtin_pack_ibm128.  Instead, we use
+ __builtin_pack_longdouble if long double uses the IBM extended double
+ 128-bit format, and use the explicit __builtin_pack_ibm128 if long double
+ is IEEE 128-bit.  */
 #if defined (__LONG_DOUBLE_128__) && defined (__LONG_DOUBLE_IBM128__)  \
 && !(defined (_SOFT_FLOAT) || defined (__NO_FPRS__))
   return __builtin_pack_longdouble (dh, dl);
+#elif defined (__LONG_DOUBLE_128__) && defined (__LONG_DOUBLE_IEEE128__) \
+&& !(defined (_SOFT_FLOAT) || defined (__NO_FPRS__))
+  return __builtin_pack_ibm128 (dh, dl);
 #else
   union
   {
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH 8/9] PowerPC: Change tests to use __float128 instead of __ieee128.

2020-09-24 Thread Michael Meissner via Gcc-patches
>From e4114c9c13067b356f9ab5c5bb4c6a928771aef8 Mon Sep 17 00:00:00 2001
From: Michael Meissner 
Date: Wed, 23 Sep 2020 17:12:56 -0400
Subject: [PATCH 8/9] PowerPC: Change tests to use __float128 instead of 
__ieee128.

Two of the tests used the __ieee128 keyword instead of __float128.  This
patch changes those cases to use the official keyword.

gcc/testsuite/
2020-09-23  Michael Meissner  

* gcc.target/powerpc/float128-cmp2-runnable.c: Use __float128
keyword instead of __ieee128.
* gcc.target/powerpc/pr92796.c: Use __float128 keyword instead of
__ieee128.
---
 gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pr92796.c| 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c
index 93dd1128a3f..fbec5289063 100644
--- a/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c
@@ -16,7 +16,7 @@ int main(void)
 {
   int result;
   double a_dble, b_dble;
-  __ieee128 a_ieee128, b_ieee128;
+  __float128 a_ieee128, b_ieee128;
   
   a_dble = 3.10;
   b_dble = 3.10;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr92796.c 
b/gcc/testsuite/gcc.target/powerpc/pr92796.c
index 1e671e175de..f2c6b8b7f5c 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr92796.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr92796.c
@@ -4,14 +4,14 @@
 
 typedef union
 {
-  __ieee128 a;
+  __float128 a;
   int b;
 } c;
 
-__ieee128
-d (__ieee128 x)
+__float128
+d (__float128 x)
 {
-  __ieee128 g;
+  __float128 g;
   c h;
   h.a = x;
   g = h.b & 5;
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH 7/9] PowerPC: Update IEEE 128-bit built-in functions to work if long double is IEEE 128-bit.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: Update IEEE 128-bit built-in functions to work if long double is IEEE 
128-bit.

This patch adds long double variants of the power10 __float128 built-in
functions.  This is needed because __float128 uses TFmode in this case
instead of KFmode.

gcc/
2020-09-23  Michael Meissner  

* config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Add
built-in functions for long double built-ins that use IEEE
128-bit.
(rs6000_expand_builtin): Change the KF IEEE 128-bit comparison
insns to TF if long double is IEEE 128-bit.
* config/rs6000/rs6000-builtin.def (scalar_extract_exptf): Add
support for long double being IEEE 128-bit built-in functions.
(scalar_extract_sigtf): Likewise.
(scalar_test_neg_tf): Likewise.
(scalar_insert_exp_tf): Likewise.
(scalar_insert_exp_tfp): Likewise.
(scalar_cmp_exp_tf_gt): Likewise.
(scalar_cmp_exp_tf_lt): Likewise.
(scalar_cmp_exp_tf_eq): Likewise.
(scalar_cmp_exp_tf_unordered): Likewise.
(scalar_test_data_class_tf): Likewise.
---
 gcc/config/rs6000/rs6000-builtin.def | 11 
 gcc/config/rs6000/rs6000-call.c  | 40 
 2 files changed, 51 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index e91a48ddf5f..7d52961c8cf 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2401,8 +2401,11 @@ BU_P9V_64BIT_VSX_1 (VSESDP,  "scalar_extract_sig",   
CONST,  xsxsigdp)
 
 BU_FLOAT128_HW_VSX_1 (VSEEQP,  "scalar_extract_expq",  CONST,  xsxexpqp_kf)
 BU_FLOAT128_HW_VSX_1 (VSESQP,  "scalar_extract_sigq",  CONST,  xsxsigqp_kf)
+BU_FLOAT128_HW_VSX_1 (VSEETF,  "scalar_extract_exptf", CONST,  xsxexpqp_tf)
+BU_FLOAT128_HW_VSX_1 (VSESTF,  "scalar_extract_sigtf", CONST,  xsxsigqp_tf)
 
 BU_FLOAT128_HW_VSX_1 (VSTDCNQP, "scalar_test_neg_qp",  CONST,  xststdcnegqp_kf)
+BU_FLOAT128_HW_VSX_1 (VSTDCNTF, "scalar_test_neg_tf",  CONST,  xststdcnegqp_tf)
 BU_P9V_VSX_1 (VSTDCNDP,"scalar_test_neg_dp",   CONST,  xststdcnegdp)
 BU_P9V_VSX_1 (VSTDCNSP,"scalar_test_neg_sp",   CONST,  xststdcnegsp)
 
@@ -2420,6 +2423,8 @@ BU_P9V_64BIT_VSX_2 (VSIEDPF,  "scalar_insert_exp_dp", 
CONST,  xsiexpdpf)
 
 BU_FLOAT128_HW_VSX_2 (VSIEQP,  "scalar_insert_exp_q",  CONST,  xsiexpqp_kf)
 BU_FLOAT128_HW_VSX_2 (VSIEQPF, "scalar_insert_exp_qp", CONST,  xsiexpqpf_kf)
+BU_FLOAT128_HW_VSX_2 (VSIETF,  "scalar_insert_exp_tf", CONST,  xsiexpqp_tf)
+BU_FLOAT128_HW_VSX_2 (VSIETFF, "scalar_insert_exp_tfp", CONST, xsiexpqpf_tf)
 
 BU_P9V_VSX_2 (VSCEDPGT,"scalar_cmp_exp_dp_gt", CONST,  xscmpexpdp_gt)
 BU_P9V_VSX_2 (VSCEDPLT,"scalar_cmp_exp_dp_lt", CONST,  xscmpexpdp_lt)
@@ -2431,7 +2436,13 @@ BU_P9V_VSX_2 (VSCEQPLT,  "scalar_cmp_exp_qp_lt", CONST,  
xscmpexpqp_lt_kf)
 BU_P9V_VSX_2 (VSCEQPEQ,"scalar_cmp_exp_qp_eq", CONST,  
xscmpexpqp_eq_kf)
 BU_P9V_VSX_2 (VSCEQPUO,"scalar_cmp_exp_qp_unordered",  CONST,  
xscmpexpqp_unordered_kf)
 
+BU_P9V_VSX_2 (VSCETFGT,"scalar_cmp_exp_tf_gt", CONST,  
xscmpexpqp_gt_tf)
+BU_P9V_VSX_2 (VSCETFLT,"scalar_cmp_exp_tf_lt", CONST,  
xscmpexpqp_lt_tf)
+BU_P9V_VSX_2 (VSCETFEQ,"scalar_cmp_exp_tf_eq", CONST,  
xscmpexpqp_eq_tf)
+BU_P9V_VSX_2 (VSCETFUO,"scalar_cmp_exp_tf_unordered", CONST, 
xscmpexpqp_unordered_tf)
+
 BU_FLOAT128_HW_VSX_2 (VSTDCQP, "scalar_test_data_class_qp",CONST,  
xststdcqp_kf)
+BU_FLOAT128_HW_VSX_2 (VSTDCTF, "scalar_test_data_class_tf",CONST,  
xststdcqp_tf)
 BU_P9V_VSX_2 (VSTDCDP, "scalar_test_data_class_dp",CONST,  xststdcdp)
 BU_P9V_VSX_2 (VSTDCSP, "scalar_test_data_class_sp",CONST,  xststdcsp)
 
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index a8b520834c7..8dc779df1f9 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -4587,6 +4587,8 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
 RS6000_BTI_bool_int, RS6000_BTI_double, RS6000_BTI_INTSI, 0 },
   { P9V_BUILTIN_VEC_VSTDC, P9V_BUILTIN_VSTDCQP,
 RS6000_BTI_bool_int, RS6000_BTI_ieee128_float, RS6000_BTI_INTSI, 0 },
+  { P9V_BUILTIN_VEC_VSTDC, P9V_BUILTIN_VSTDCTF,
+RS6000_BTI_bool_int, RS6000_BTI_long_double, RS6000_BTI_INTSI, 0 },
 
   { P9V_BUILTIN_VEC_VSTDCSP, P9V_BUILTIN_VSTDCSP,
 RS6000_BTI_bool_int, RS6000_BTI_float, RS6000_BTI_INTSI, 0 },
@@ -4594,6 +4596,8 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
 RS6000_BTI_bool_int, RS6000_BTI_double, RS6000_BTI_INTSI, 0 },
   { P9V_BUILTIN_VEC_VSTDCQP, P9V_BUILTIN_VSTDCQP,
 RS6000_BTI_bool_int, RS6000_BTI_ieee128_float, RS6000_BTI_INTSI, 0 },
+  { P9V_BUILTIN_VEC_VSTDCQP, P9V_BUILTIN_VSTDCTF,
+RS6000_BTI_bool_int, RS6000_BTI_long_double, RS6000_BTI_INTSI, 0 },
 
   { P9V_BUILTIN_VEC_VSTDCN, P9V_BUILTIN_VSTDCNSP,
 RS6000_BTI_bool_int, RS6000_BTI_float, 0, 0 },
@@ -4601,6 +4605,8 @@ const 

[PATCH 6/9] PowerPC: If long double is IEEE 128-bit, map q built-ins to *l instead of *f128.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: If long double is IEEE 128-bit, map q built-ins to *l instead of *f128.

If we map nanq to nanf128 when long double is IEEE, it seems to lose the
special signaling vs. non-signaling NAN support.  This patch maps the functions
to the long double version if long double is IEEE 128-bit.

gcc/
2020-09-23  Michael Meissner  

* config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): If long
double is IEEE-128 map the nanq built-in functions to the long
double function, not the f128 function.
---
 gcc/config/rs6000/rs6000-c.c | 31 ---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index f5982907e90..8f7a8eec740 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -681,15 +681,32 @@ rs6000_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define ("__builtin_vsx_xvnmsubmsp=__builtin_vsx_xvnmsubsp");
 }
 
-  /* Map the old _Float128 'q' builtins into the new 'f128' builtins.  */
+  /* Map the old _Float128 'q' builtins into the new 'f128' builtins if long
+ double is IBM or 64-bit.
+
+ However, if long double is IEEE 128-bit, map both sets of built-in
+ functions to the normal long double version.  This shows up in nansf128
+ vs. nanf128.  */
   if (TARGET_FLOAT128_TYPE)
 {
-  builtin_define ("__builtin_fabsq=__builtin_fabsf128");
-  builtin_define ("__builtin_copysignq=__builtin_copysignf128");
-  builtin_define ("__builtin_nanq=__builtin_nanf128");
-  builtin_define ("__builtin_nansq=__builtin_nansf128");
-  builtin_define ("__builtin_infq=__builtin_inff128");
-  builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
+  if (FLOAT128_IEEE_P (TFmode))
+   {
+ builtin_define ("__builtin_fabsq=__builtin_fabsl");
+ builtin_define ("__builtin_copysignq=__builtin_copysignl");
+ builtin_define ("__builtin_nanq=__builtin_nanl");
+ builtin_define ("__builtin_nansq=__builtin_nansl");
+ builtin_define ("__builtin_infq=__builtin_infl");
+ builtin_define ("__builtin_huge_valq=__builtin_huge_vall");
+   }
+  else
+   {
+ builtin_define ("__builtin_fabsq=__builtin_fabsf128");
+ builtin_define ("__builtin_copysignq=__builtin_copysignf128");
+ builtin_define ("__builtin_nanq=__builtin_nanf128");
+ builtin_define ("__builtin_nansq=__builtin_nansf128");
+ builtin_define ("__builtin_infq=__builtin_inff128");
+ builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
+   }
 }
 
   /* Tell users they can use __builtin_bswap{16,64}.  */
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH 5/9] PowerPC: Update tests to run if long double is IEEE 128-bit.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: Update tests to run if long double is IEEE 128-bit.

gcc/testsuite/
2020-09-23  Michael Meissner  

* c-c++-common/dfp/convert-bfp-11.c: If long double is IEEE
128-bit, skip the test.
* gcc.dg/nextafter-2.c: On PowerPC, if long double is IEEE
128-bit, include math.h to get the built-in mapped correctly.
* gcc.target/powerpc/pr70117.c: Add support for long double being
IEEE 128-bit.
---
 gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c |  7 +++
 gcc/testsuite/gcc.dg/nextafter-2.c  | 10 ++
 gcc/testsuite/gcc.target/powerpc/pr70117.c  |  6 --
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c 
b/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
index 95c433d2c24..6ee0c1c6ae9 100644
--- a/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
+++ b/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
@@ -5,6 +5,7 @@
Don't force 128-bit long doubles because runtime support depends
on glibc.  */
 
+#include <float.h>
 #include "convert.h"
 
 volatile _Decimal32 sd;
@@ -39,6 +40,12 @@ main ()
   if (sizeof (long double) != 16)
 return 0;
 
+  /* This test is written to test IBM extended double, which is a pair of
+ doubles.  If long double can hold a larger value than a double can, such
+ as when long double is IEEE 128-bit, just exit immediately.  */
+  if (LDBL_MAX_10_EXP > DBL_MAX_10_EXP)
+return 0;
+
   convert_101 ();
   convert_102 ();
 
diff --git a/gcc/testsuite/gcc.dg/nextafter-2.c 
b/gcc/testsuite/gcc.dg/nextafter-2.c
index e51ae94be0c..64e9e3c485f 100644
--- a/gcc/testsuite/gcc.dg/nextafter-2.c
+++ b/gcc/testsuite/gcc.dg/nextafter-2.c
@@ -13,4 +13,14 @@
 #  define NO_LONG_DOUBLE 1
 # endif
 #endif
+
+#if defined(_ARCH_PPC) && defined(__LONG_DOUBLE_IEEE128__)
+/* On PowerPC systems, long double uses either the IBM long double format, or
+   IEEE 128-bit format.  The compiler switches the long double built-in
+   function names and glibc switches the names when math.h is included.
+   Because this test is run with -fno-builtin, include math.h so that the
+   appropriate nextafter functions are called.  */
+#include <math.h>
+#endif
+
 #include "nextafter-1.c"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70117.c 
b/gcc/testsuite/gcc.target/powerpc/pr70117.c
index 3bbd2c595e0..928efe39c7b 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr70117.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr70117.c
@@ -9,9 +9,11 @@
128-bit floating point, because the type is not enabled on those
systems.  */
 #define LDOUBLE __ibm128
+#define IBM128_MAX ((__ibm128) 1.79769313486231580793728971405301199e+308L)
 
 #elif defined(__LONG_DOUBLE_IBM128__)
 #define LDOUBLE long double
+#define IBM128_MAX LDBL_MAX
 
 #else
 #error "long double must be either IBM 128-bit or IEEE 128-bit"
@@ -75,10 +77,10 @@ main (void)
   if (__builtin_isnormal (ld))
 __builtin_abort ();
 
-  ld = LDBL_MAX;
+  ld = IBM128_MAX;
   if (!__builtin_isnormal (ld))
 __builtin_abort ();
-  ld = -LDBL_MAX;
+  ld = -IBM128_MAX;
   if (!__builtin_isnormal (ld))
 __builtin_abort ();
 
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH 4/9] PowerPC: Add IEEE 128-bit <-> Decimal conversions.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: Add IEEE 128-bit <-> Decimal conversions.

This patch adds the basic support for converting between IEEE 128-bit floating
point and Decimal types.

libgcc/
2020-09-23  Michael Meissner  

* config/rs6000/_dd_to_kf.c: New file.
* config/rs6000/_kf_to_dd.c: New file.
* config/rs6000/_kf_to_sd.c: New file.
* config/rs6000/_kf_to_td.c: New file.
* config/rs6000/_sd_to_kf.c: New file.
* config/rs6000/_td_to_kf.c: New file.
* config/rs6000/t-float128: Build __floating conversions to/from
Decimal support functions.  By default compile with long double
being IBM extended double.
* dfp-bit.c: Add support for building the PowerPC _Float128
to/from Decimal conversion functions.
* dfp-bit.h: Likewise.
---
 libgcc/config/rs6000/_dd_to_kf.c | 30 ++
 libgcc/config/rs6000/_kf_to_dd.c | 30 ++
 libgcc/config/rs6000/_kf_to_sd.c | 30 ++
 libgcc/config/rs6000/_kf_to_td.c | 30 ++
 libgcc/config/rs6000/_sd_to_kf.c | 30 ++
 libgcc/config/rs6000/_td_to_kf.c | 30 ++
 libgcc/config/rs6000/t-float128  | 30 +-
 libgcc/dfp-bit.c | 10 +++--
 libgcc/dfp-bit.h | 37 +---
 9 files changed, 251 insertions(+), 6 deletions(-)
 create mode 100644 libgcc/config/rs6000/_dd_to_kf.c
 create mode 100644 libgcc/config/rs6000/_kf_to_dd.c
 create mode 100644 libgcc/config/rs6000/_kf_to_sd.c
 create mode 100644 libgcc/config/rs6000/_kf_to_td.c
 create mode 100644 libgcc/config/rs6000/_sd_to_kf.c
 create mode 100644 libgcc/config/rs6000/_td_to_kf.c

diff --git a/libgcc/config/rs6000/_dd_to_kf.c b/libgcc/config/rs6000/_dd_to_kf.c
new file mode 100644
index 000..081415fd393
--- /dev/null
+++ b/libgcc/config/rs6000/_dd_to_kf.c
@@ -0,0 +1,30 @@
+/* Copyright (C) 1989-2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/* Decimal64 -> _Float128 conversion.  */
+#define FINE_GRAINED_LIBRARIES 1
+#define L_dd_to_kf 1
+#define WIDTH  64
+
+/* Use dfp-bit.c to do the real work.  */
+#include "dfp-bit.c"
diff --git a/libgcc/config/rs6000/_kf_to_dd.c b/libgcc/config/rs6000/_kf_to_dd.c
new file mode 100644
index 000..09a62cbe629
--- /dev/null
+++ b/libgcc/config/rs6000/_kf_to_dd.c
@@ -0,0 +1,30 @@
+/* Copyright (C) 1989-2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/* _Float128 -> Decimal64 conversion.  */
+#define FINE_GRAINED_LIBRARIES 1
+#define L_kf_to_dd 1
+#define WIDTH  64
+
+/* Use dfp-bit.c to do the real work.  */
+#include "dfp-bit.c"
diff --git a/libgcc/config/rs6000/_kf_to_sd.c b/libgcc/config/rs6000/_kf_to_sd.c
new file mode 100644
index 000..f35b68eb4d9
--- /dev/null
+++ b/libgcc/config/rs6000/_kf_to_sd.c
@@ -0,0 +1,30 @@
+/* Copyright (C) 1989-2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software 

[PATCH, rs6000] correct an erroneous BTM value in the BU_P10_MISC define

2020-09-24 Thread will schmidt via Gcc-patches
[PATCH, rs6000] correct an erroneous blip in the BU_P10_MISC define

Hi, 
We have an extraneous BTM entry (RS6000_BTM_POWERPC64) in the define for
our P10 MISC 2 builtin definition.  This does not exist for the '0',
'1' or '3' definitions.  It appears to me that this was erroneously
copied from the P7 version of the define, which contains a version of the
BU macro both with and without that element.  Removing the
RS6000_BTM_POWERPC64 portion of the define does not introduce any obvious
failures, so I believe this extra line can be safely removed.

OK for trunk?

Thanks
-Will

diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index e91a48ddf5fe..62c9b77cb76d 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1112,12 +1112,11 @@
CODE_FOR_ ## ICODE) /* ICODE */
 
 #define BU_P10_MISC_2(ENUM, NAME, ATTR, ICODE) \
   RS6000_BUILTIN_2 (P10_BUILTIN_ ## ENUM,  /* ENUM */  \
"__builtin_" NAME,  /* NAME */  \
-   RS6000_BTM_P10  \
-   | RS6000_BTM_POWERPC64, /* MASK */  \
+   RS6000_BTM_P10, /* MASK */  \
(RS6000_BTC_ ## ATTR/* ATTR */  \
 | RS6000_BTC_BINARY),  \
CODE_FOR_ ## ICODE) /* ICODE */
 
 #define BU_P10_MISC_3(ENUM, NAME, ATTR, ICODE) \



One issue with default implementation of zero_call_used_regs

2020-09-24 Thread Qing Zhao via Gcc-patches
Hi, Richard,

As you suggested, I added a default implementation of the target hook 
“zero_cal_used_regs (HARD_REG_SET)” as following in my latest patch


/* The default hook for TARGET_ZERO_CALL_USED_REGS.  */

void
default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
{
  gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));

  /* This array holds the zero rtx with the corresponding machine mode.  */
  rtx zero_rtx[(int)MAX_MACHINE_MODE];
  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
zero_rtx[i] = NULL_RTX;

  expand_asm_memory_blockage ();

  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
  {
rtx reg, tmp;
machine_mode mode = reg_raw_mode[regno];

reg = gen_rtx_REG (mode, regno);

/* update the data flow information.  */
expand_asm_reg_clobber_blockage (reg);
df_update_zeroed_reg_set (regno);

if (zero_rtx[(int)mode] == NULL_RTX)
  {
zero_rtx[(int)mode] = reg;
tmp = gen_rtx_SET (reg, const0_rtx);
emit_insn (tmp);
  }
else
  emit_move_insn (reg, zero_rtx[(int)mode]);
  }
  return;
}

I tested this default implementation on aarch64 with a small test case;
-fzero-call-used-regs=all-gpr|used-gpr|used-gpr-arg|used-arg|used work well,
however,
-fzero-call-used-regs=all-arg or -fzero-call-used-regs=all hit an internal
compiler error as follows:

t1.c:15:1: internal compiler error: in gen_highpart, at emit-rtl.c:1631
   15 | }
  | ^
0xcff58b gen_highpart(machine_mode, rtx_def*)
../../hjl-caller-saved-gcc/gcc/emit-rtl.c:1631
0x174b373 aarch64_split_128bit_move(rtx_def*, rtx_def*)
../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.c:3390
0x1d8b087 gen_split_11(rtx_insn*, rtx_def**)
../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.md:1394

As I studied this today, I found that the major issue behind this bug is the
following statement:

machine_mode mode = reg_raw_mode[regno];

“reg_raw_mode” returns E_TImode for aarch64 register V0 (which is a vector
register on aarch64); as a result, the zeroing insn for this register is:

(insn 112 111 113 7 (set (reg:TI 32 v0)
(const_int 0 [0])) "t1.c":15:1 -1
 (nil))


However, it looks like the above RTL has to be split into two sub-register
moves on aarch64, and the splitting has some issues.

So, I guess that on aarch64, zeroing vector registers might need other modes 
than the one returned by “reg_raw_mode”.  

My questions are:

1. Is there another available utility routine that returns the proper MODE for
the hard registers that can be readily used to zero the hard register?
2. If not, should I add one more target hook for this purpose? i.e.

/* Return the proper machine mode that can be used to zero this hard register 
specified by REGNO.  */
machine_mode zero-call-used-regs-mode (unsigned int REGNO)
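
For reference, here is a rough sketch of how such a hook might plug into the
default implementation above.  The hook name, its placement in targetm, and
the default body are only illustrative assumptions, not existing GCC
interfaces:

  /* Hypothetical hook sketch: let the target choose the machine mode used to
     zero hard register REGNO.  The default keeps today's reg_raw_mode
     behavior, which is exactly what triggers the TImode splitting ICE.  */
  machine_mode
  default_zero_call_used_regs_mode (unsigned int regno)
  {
    return reg_raw_mode[regno];
  }

  /* ... and in default_zero_call_used_regs, instead of reg_raw_mode[regno]:  */
  machine_mode mode = targetm.zero_call_used_regs_mode (regno);

A target like aarch64 could then override the hook to return a mode it can
zero directly (for example a vector mode) for its SIMD registers.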


Thanks.

Qing





[PATCH 3/9] PowerPC: Update IEEE <-> IBM 128-bit floating point conversions.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: Update IEEE <-> IBM 128-bit floating point conversions.

This patch changes the code for doing conversions between IEEE 128-bit floating
point and IBM 128-bit extended double floating point.  It moves the conversion
functions to a separate file.  It uses explicit __ibm128 instead of long
double to allow the long double type to be set to IEEE 128-bit.

libgcc/
2020-09-23  Michael Meissner  

* config/rs6000/extendkftf2-sw.c: Move __float128 to __ibm128
conversion into float128-convert.h.
* config/rs6000/float128-convert.h: New file.
* config/rs6000/float128-hw.c: Move conversions between __float128
and __ibm128 into float128-convert.h.
* config/rs6000/quad-float128.h: Move conversions between
__float128 and __ibm128 into float128-convert.h.
* config/rs6000/trunctfkf2-sw.c: Move __ibm128 to __float128
conversion to float128-convert.h.
---
 libgcc/config/rs6000/extendkftf2-sw.c   |  6 +-
 libgcc/config/rs6000/float128-convert.h | 77 +
 libgcc/config/rs6000/float128-hw.c  | 11 +---
 libgcc/config/rs6000/quad-float128.h| 58 ---
 libgcc/config/rs6000/trunctfkf2-sw.c|  6 +-
 5 files changed, 84 insertions(+), 74 deletions(-)
 create mode 100644 libgcc/config/rs6000/float128-convert.h

diff --git a/libgcc/config/rs6000/extendkftf2-sw.c 
b/libgcc/config/rs6000/extendkftf2-sw.c
index f0de1784c43..80b48c20d9c 100644
--- a/libgcc/config/rs6000/extendkftf2-sw.c
+++ b/libgcc/config/rs6000/extendkftf2-sw.c
@@ -38,6 +38,7 @@
 
 #include "soft-fp.h"
 #include "quad-float128.h"
+#include "float128-convert.h"
 
 #ifndef FLOAT128_HW_INSNS
 #define __extendkftf2_sw __extendkftf2
@@ -46,8 +47,5 @@
 IBM128_TYPE
 __extendkftf2_sw (TFtype value)
 {
-  IBM128_TYPE ret;
-
-  CVT_FLOAT128_TO_IBM128 (ret, value);
-  return ret;
+  return convert_float128_to_ibm128 (value);
 }
diff --git a/libgcc/config/rs6000/float128-convert.h 
b/libgcc/config/rs6000/float128-convert.h
new file mode 100644
index 000..bb6b3d71889
--- /dev/null
+++ b/libgcc/config/rs6000/float128-convert.h
@@ -0,0 +1,77 @@
+/* Convert between IEEE 128-bit and IBM 128-bit floating point types.
+   Copyright (C) 2016-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Michael Meissner (meiss...@linux.ibm.com).
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Implementation of conversions between __ibm128 and __float128, to allow the
+   same code to be used on systems with IEEE 128-bit emulation and with IEEE
+   128-bit hardware support.
+
+   These functions are called by the actual conversion functions called by the
+   compiler.  This code is here to allow building power8 (no hardware
+   float128) and power9 (hardware float128) variants that are selected by an
+   IFUNC function.  */
+
+static inline __ibm128 convert_float128_to_ibm128 (__float128);
+static inline __float128 convert_ibm128_to_float128 (__ibm128);
+
+static inline __ibm128
+convert_float128_to_ibm128 (__float128 value)
+{
+  double high, high_temp, low;
+
+  high = (double) value;
+  if (__builtin_isnan (high) || __builtin_isinf (high))
+low = 0.0;
+
+  else
+{
+  low = (double) (value - (__float128) high);
+  /* Renormalize low/high and move them into canonical IBM long
+double form.  */
+  high_temp = high + low;
+  low = (high - high_temp) + low;
+  high = high_temp;
+}
+
+  return __builtin_pack_ibm128 (high, low);
+}
+
+static inline __float128
+convert_ibm128_to_float128 (__ibm128 value)
+{
+  double high = __builtin_unpack_ibm128 (value, 0);
+  double low = __builtin_unpack_ibm128 (value, 1);
+
+  /* Handle the special cases of NAN and infinity.  Similarly, if low is 0.0,
+ there no need to do 

[PATCH 2/9] PowerPC: Update __float128 and __ibm128 error messages.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: Update __float128 and __ibm128 error messages.

This patch attempts to make the error messages for intermixing IEEE 128-bit
floating point with IBM 128-bit extended double types clearer if the long
double type uses the IEEE 128-bit format.

gcc/
2020-09-23  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_invalid_binary_op): Update error
messages about mixing IBM long double and IEEE 128-bit.

gcc/testsuite/
2020-09-23  Michael Meissner  

* gcc.target/powerpc/bfp/scalar-extract-exp-4.c: Update failure
messages.
* gcc.target/powerpc/bfp/scalar-extract-sig-4.c: Update failure
messages.
* gcc.target/powerpc/bfp/scalar-test-data-class-11.c: Update
failure messages.
* gcc.target/powerpc/bfp/scalar-test-neg-5.c: Update failure
messages.
* gcc.target/powerpc/float128-mix-2.c: New test.
* gcc.target/powerpc/float128-mix-3.c: New test.
* gcc.target/powerpc/float128-mix.c: Update failure messages.
---
 gcc/config/rs6000/rs6000.c| 20 ---
 .../powerpc/bfp/scalar-extract-exp-4.c|  4 +---
 .../powerpc/bfp/scalar-extract-sig-4.c|  2 +-
 .../powerpc/bfp/scalar-test-data-class-11.c   |  2 +-
 .../powerpc/bfp/scalar-test-neg-5.c   |  2 +-
 .../gcc.target/powerpc/float128-mix-2.c   | 17 
 .../gcc.target/powerpc/float128-mix-3.c   | 17 
 .../gcc.target/powerpc/float128-mix.c | 19 ++
 8 files changed, 53 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-mix-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-mix-3.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 0ff0f31d552..97f535f0018 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -14352,22 +14352,10 @@ rs6000_invalid_binary_op (int op ATTRIBUTE_UNUSED,
 
   if (!TARGET_FLOAT128_CVT)
 {
-  if ((mode1 == KFmode && mode2 == IFmode)
- || (mode1 == IFmode && mode2 == KFmode))
-   return N_("__float128 and __ibm128 cannot be used in the same "
- "expression");
-
-  if (TARGET_IEEEQUAD
- && ((mode1 == IFmode && mode2 == TFmode)
- || (mode1 == TFmode && mode2 == IFmode)))
-   return N_("__ibm128 and long double cannot be used in the same "
- "expression");
-
-  if (!TARGET_IEEEQUAD
- && ((mode1 == KFmode && mode2 == TFmode)
- || (mode1 == TFmode && mode2 == KFmode)))
-   return N_("__float128 and long double cannot be used in the same "
- "expression");
+  if ((FLOAT128_IEEE_P (mode1) && FLOAT128_IBM_P (mode2))
+ || (FLOAT128_IBM_P (mode1) && FLOAT128_IEEE_P (mode2)))
+   return N_("Invalid mixing of IEEE 128-bit and IBM 128-bit floating "
+ "point types");
 }
 
   return NULL;
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-4.c 
b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-4.c
index 850ff620490..2065a287bb3 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-4.c
@@ -11,7 +11,5 @@ get_exponent (__ieee128 *p)
 {
   __ieee128 source = *p;
 
-  return __builtin_vec_scalar_extract_exp (source); /* { dg-error 
"'__builtin_vsx_scalar_extract_expq' requires" } */
+  return __builtin_vec_scalar_extract_exp (source); /* { dg-error 
"'__builtin_vsx_scalar_extract_exp.*' requires" } */
 }
-
-
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-4.c 
b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-4.c
index 32a53c6fffd..37bc8332961 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-4.c
@@ -11,5 +11,5 @@ get_significand (__ieee128 *p)
 {
   __ieee128 source = *p;
 
-  return __builtin_vec_scalar_extract_sig (source);/* { dg-error 
"'__builtin_vsx_scalar_extract_sigq' requires" } */
+  return __builtin_vec_scalar_extract_sig (source);/* { dg-error 
"'__builtin_vsx_scalar_extract_sig.*' requires" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-11.c 
b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-11.c
index 7c6fca2b729..ec3118792c4 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-11.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-11.c
@@ -10,5 +10,5 @@ test_data_class (__ieee128 *p)
 {
   __ieee128 source = *p;
 
-  return __builtin_vec_scalar_test_data_class (source, 3); /* { dg-error 
"'__builtin_vsx_scalar_test_data_class_qp' requires" } */
+  return __builtin_vec_scalar_test_data_class (source, 3); /* { dg-error 
"'__builtin_vsx_scalar_test_data_class_.*' requires" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-5.c 

[PATCH 1/9] PowerPC: Map long double built-in functions if IEEE 128-bit long double.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: Map long double built-in functions if IEEE 128-bit long double.

This patch goes through the built-in functions and changes the name that is
used to the name used for __float128 and _Float128 support in glibc if the
PowerPC long double type is IEEE 128-bit instead of IBM extended double.

Normally the mapping is done in the math.h and stdio.h files.  However, not
everybody uses these files, which means we also need to change the external
name for the built-in function within the compiler.

In addition, changing the name in GCC allows the Fortran compiler to
automatically use the correct name.

To map the math functions, typically this patch changes <name>l to <name>f128.
However, there are some exceptions that are handled with this patch.

To map the printf functions, <name> is mapped to __<name>ieee128.

To map the scanf functions, <name> is mapped to __isoc99_<name>ieee128.
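
As a concrete illustration of the mapping (this snippet is an editorial
example, not part of the patch; the expected assembler names simply follow
the rules above):

    /* With long double being IEEE 128-bit, calls like these are expected to
       reference the remapped glibc symbols.  */
    long double x = __builtin_sinl (1.0L);   /* sinl is emitted as sinf128 */
    __builtin_printf ("%Lf\n", x);           /* printf -> __printfieee128 */
    /* scanf-family calls map as scanf -> __isoc99_scanfieee128 */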

gcc/
2020-09-23  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_mangle_decl_assembler_name): Add
support for mapping built-in function names for long double
built-in functions if long double is IEEE 128-bit.

gcc/testsuite/
2020-09-23  Michael Meissner  

* gcc.target/powerpc/float128-longdouble-math.c: New test.
* gcc.target/powerpc/float128-longdouble-stdio.c: New test.
---
 gcc/config/rs6000/rs6000.c| 153 -
 .../powerpc/float128-longdouble-math.c| 559 ++
 .../powerpc/float128-longdouble-stdio.c   |  37 ++
 3 files changed, 718 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-longdouble-math.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-longdouble-stdio.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b589f4566c2..0ff0f31d552 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -26909,56 +26909,147 @@ rs6000_globalize_decl_name (FILE * stream, tree decl)
library before you can switch the real*16 type at compile time.
 
We use the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change this name.  We
-   only do this if the default is that long double is IBM extended double, and
-   the user asked for IEEE 128-bit.  */
+   only do this transformation if the __float128 type is enabled.  This
+   prevents us from doing the transformation on older 32-bit ports that might
+   have enabled using IEEE 128-bit floating point as the default long double
+   type.  */
 
 static tree
 rs6000_mangle_decl_assembler_name (tree decl, tree id)
 {
-  if (!TARGET_IEEEQUAD_DEFAULT && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
-  && TREE_CODE (decl) == FUNCTION_DECL && DECL_IS_BUILTIN (decl) )
+  if (TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
+  && TREE_CODE (decl) == FUNCTION_DECL
+  && fndecl_built_in_p (decl, BUILT_IN_NORMAL))
 {
   size_t len = IDENTIFIER_LENGTH (id);
   const char *name = IDENTIFIER_POINTER (id);
+  const char *newname = NULL;
 
-  if (name[len - 1] == 'l')
+  /* See if it is one of the built-in functions with an unusual name.  */
+  switch (DECL_FUNCTION_CODE (decl))
{
- bool uses_ieee128_p = false;
- tree type = TREE_TYPE (decl);
- machine_mode ret_mode = TYPE_MODE (type);
+   default:
+ break;
 
- /* See if the function returns a IEEE 128-bit floating point type or
-complex type.  */
- if (ret_mode == TFmode || ret_mode == TCmode)
-   uses_ieee128_p = true;
- else
+   case BUILT_IN_DREML:
+ newname = "remainderf128";
+ break;
+
+   case BUILT_IN_GAMMAL:
+ newname = "lgammaf128";
+ break;
+
+   case BUILT_IN_GAMMAL_R:
+   case BUILT_IN_LGAMMAL_R:
+ newname = "__lgammaieee128_r";
+ break;
+
+   case BUILT_IN_NEXTTOWARD:
+ newname = "__nexttoward_to_ieee128";
+ break;
+
+   case BUILT_IN_NEXTTOWARDF:
+ newname = "__nexttowardf_to_ieee128";
+ break;
+
+   case BUILT_IN_NEXTTOWARDL:
+ newname = "__nexttowardieee128";
+ break;
+
+   case BUILT_IN_POW10L:
+ newname = "exp10f128";
+ break;
+
+   case BUILT_IN_SCALBL:
+ newname = "__scalbnieee128";
+ break;
+
+   case BUILT_IN_SIGNIFICANDL:
+ newname = "__significandieee128";
+ break;
+
+   case BUILT_IN_SINCOSL:
+ newname = "__sincosieee128";
+ break;
+   }
+
+  /* Update the __builtin_*printf && __builtin_*scanf functions.  */
+  if (!newname)
+   {
+ const size_t printf_len = sizeof ("printf") - 1;
+ const size_t scanf_len = sizeof ("scanf") - 1;
+ const size_t printf_extra
+   = sizeof ("__") - 1 + sizeof ("ieee128") - 1;
+ const size_t scanf_extra
+   = sizeof ("__isoc99_") - 1 + sizeof ("ieee128") - 1;
+
+ if (len >= printf_len
+ && strcmp (name + len - printf_len, 

[PATCH 0/9] PowerPC: Patches to enable changing the long double default to IEEE 128-bit on little endian PowerPC 64-bit Linux systems

2020-09-24 Thread Michael Meissner via Gcc-patches
This series of 9 patches is an attempt to gather together all of the patches
that are needed to be able to configure and build a little endian 64-bit
PowerPC Linux GCC compiler where the default long double format uses the IEEE
128-bit representation.

I have created an IBM vendor branch that includes these patches (along with
the other outstanding patches that I have for IEEE 128-bit min/max/cmove on
power10, and power10 PCREL_OPT support):

vendors/ibm/ieee-longdouble-001

You will need a new enough GLIBC in order to do this configuration.  The
Advance Toolchain AT14.0 from IBM includes the changes in the library that are
needed to build a compiler with this default.

Note, with these patches, we need the libstdc++ work that was begun last year
to be finished and committed.  This shows up in trying to build the Spec 2017
511.parest_r (rate) benchmark when long double uses the IEEE representation.

Using the steps outlined below, I have built and bootstrapped current GCC
sources, comparing builds where the default long double is the current IBM
extended double to builds where long double uses the IEEE 128-bit
representation.  The only differences in the C, C++, LTO, and Fortran tests
are 3 Fortran tests that either were marked as XFAIL or previously failed and
now pass.

The patches that will be posted include:

#1  Map built-in function names for long double;
#2  Update error messages intermixing the 2 128-bit types;
#3  Fixes libgcc conversions between the 2 128-bit types;
#4  Add support for converting IEEE 128-bit <-> Decimal;
#5  Update tests to run with IEEE 128-bit long double;
#6  Map nanq, nansq, etc. to long double if long double is IEEE;
#7  Update power10 __float128 tests to work with IEEE long double;
#8  Use __float128 in some of the tests instead of __ieee128; (and)
#9  Use __builtin_pack_ieee128 in libgcc if IEEE long double.

I put the following file in the branch:

gcc/config/rs6000/gcc-with-ieee-128bit-longdouble.txt

This is a short memo of how to build a GCC 11 compiler where the long double
type is IEEE 128-bit instead of using the IBM extended double format on the
PowerPC 64-bit little endian Linux environment.

You will likely need the Advance Toolchain AT14.0 library, as it has all of the
changes to support switching the long double default to IEEE 128-bit.

*   https://www.ibm.com/support/pages/advance-toolchain-linux-power

You will need a recent version of binutils.  I've used the binutils that I
downloaded via git on September 14th, 2020:

*   git clone git://sourceware.org/git/binutils-gdb.git

You will need appropriate versions of the gmp, mpfr, and mpc libraries:

*   http://gcc.gnu.org/pub/gcc/infrastructure/gmp-6.1.0.tar.bz2
*   http://gcc.gnu.org/pub/gcc/infrastructure/mpfr-3.1.4.tar.bz2
*   http://gcc.gnu.org/pub/gcc/infrastructure/mpc-1.0.3.tar.gz

Currently, I use --without-ppl --without-cloog --without-isl so I haven't used
those libraries.

I currently disable plug-in support.  If you want plug-in support, you will
likely need to build a binutils with the first compiler, to use with the second
and third compilers.  If you use a binutils compiled with a compiler where the
long double format is IBM extended double, it may not work.

I found I needed the configuration option --with-system-zlib to avoid some
issues when doing a bootstrap build.

Build the first PowerPC GCC compiler (non-bootstrap) using at least the
following options:

--prefix=
--enable-stage1-languages=c,c++,fortran
--disable-bootstrap
--disable-plugin
--with-long-double-format=ieee
--with-advance-toolchain=at14.0
--with-system-zlib
--with-native-system-header-dir=/opt/at14.0/include
--without-ppl
--without-cloog
--without-isl

Other configuration options that I use but may not affect switching the long
double default include:

--enable-checking
--enable-languages=c,c++,fortran
--enable-stage1-checking
--enable-gnu-indirect-function
--enable-decimal-float
--with-long-double-128
--enable-secureplt
--enable-threads=posix
--enable-__cxa_atexit
--with-as=
--with-ld=
--with-gnu-as=
--with-gnu-ld=
--with-cpu=power9   (or --with-cpu=power8)

Build and install the first compiler.

Configure, build, and install gmp 6.1.0 using the first compiler built above
with the following configuration options:

--prefix=
--enable-static
--disable-shared
--enable-cxx
CPPFLAGS=-fexceptions

Configure, build, and install mpfr 3.1.4 using the first compiler built above
with the following configuration options:

--prefix=
--enable-static
--disable-shared
--with-gmp=

Configure, build, and install mpc 1.0.3 using the first compiler built above
with the following configuration options:


c++: Cleanup some decl pushing apis

2020-09-24 Thread Nathan Sidwell


In cleaning up local decl handling, here's an initial patch that takes
advantage of C++'s default args for the is_friend parm of pushdecl,
duplicate_decls and push_template_decl_real and the scope & tpl_header
parms of xref_tag.  Then many of the calls simply don't mention these.
I also rename push_template_decl_real to push_template_decl, deleting
the original forwarding function.  This'll make my later patches
changing their types less intrusive.  There are 2 functional changes:

1) push_template_decl requires is_friend to be correct; it doesn't go
checking for a friend function (an assert is added).

2) debug_overload prints out Hidden and Using markers for the overload set.

gcc/cp/
* cp-tree.h (duplicate_decls): Default is_friend to false.
(xref_tag): Default tag_scope & tpl_header_p to ts_current & false.
(push_template_decl_real): Default is_friend to false.  Rename to
...
(push_template_decl): ... here.  Delete original decl.
* name-lookup.h (pushdecl_namespace_level): Default is_friend to
false.
(pushtag): Default tag_scope to ts_current.
* coroutine.cc (morph_fn_to_coro): Drop default args to xref_tag.
* decl.c (start_decl): Drop default args to duplicate_decls.
(start_enum): Drop default arg to pushtag & xref_tag.
(start_preparsed_function): Pass DECL_FRIEND_P to
push_template_decl.
(grokmethod): Likewise.
* friend.c (do_friend): Rename push_template_decl_real calls.
* lambda.c (begin_lambda_type): Drop default args to xref_tag.
(vla_capture_type): Likewise.
* name-lookup.c (maybe_process_template_type_declaration): Rename
push_template_decl_real call.
(pushdecl_top_level_and_finish): Drop default arg to
pushdecl_namespace_level.
* pt.c (push_template_decl_real): Assert no surprising friend
functions.  Rename to ...
(push_template_decl): ... here.  Delete original function.
(lookup_template_class_1): Drop default args from pushtag.
(instantiate_class_template_1): Likewise.
* ptree.c (debug_overload): Print hidden and using markers.
* rtti.c (init_rtti_processing): Drop default args from xref_tag.
* semantics.c (begin_class_definition): Drop default args to
pushtag.
gcc/objcp/
* objcp-decl.c (objcp_start_struct): Drop default args to
xref_tag.
(objcp_xref_tag): Likewise.
libcc1/
* libcp1plugin.cc (supplement_binding): Drop default args to
duplicate_decls.
(safe_pushtag): Drop scope parm.  Drop default args to pushtag.
(safe_pushdecl_maybe_friend): Rename to ...
(safe_pushdecl): ... here. Drop is_friend parm.  Drop default args
to pushdecl.
(plugin_build_decl): Adjust safe_pushdecl & safe_pushtag calls.
(plugin_build_constant): Adjust safe_pushdecl call.


pushing to trunk

nathan
--
Nathan Sidwell
diff --git i/gcc/cp/coroutines.cc w/gcc/cp/coroutines.cc
index 898b88b7075..ba813454a0b 100644
--- i/gcc/cp/coroutines.cc
+++ w/gcc/cp/coroutines.cc
@@ -4011,7 +4011,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
   /* 2. Types we need to define or look up.  */
 
   tree fr_name = get_fn_local_identifier (orig, "frame");
-  tree coro_frame_type = xref_tag (record_type, fr_name, ts_current, false);
+  tree coro_frame_type = xref_tag (record_type, fr_name);
   DECL_CONTEXT (TYPE_NAME (coro_frame_type)) = current_scope ();
   tree coro_frame_ptr = build_pointer_type (coro_frame_type);
   tree act_des_fn_type
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 029a165a3e8..3ae48749b3d 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -6461,7 +6461,8 @@ extern void note_iteration_stmt_body_end	(bool);
 extern void determine_local_discriminator	(tree);
 extern int decls_match(tree, tree, bool = true);
 extern bool maybe_version_functions		(tree, tree, bool);
-extern tree duplicate_decls			(tree, tree, bool);
+extern tree duplicate_decls			(tree, tree,
+		 bool is_friend = false);
 extern tree declare_local_label			(tree);
 extern tree define_label			(location_t, tree);
 extern void check_goto(tree);
@@ -6501,7 +6502,9 @@ extern tree get_scope_of_declarator		(const cp_declarator *);
 extern void grok_special_member_properties	(tree);
 extern bool grok_ctor_properties		(const_tree, const_tree);
 extern bool grok_op_properties			(tree, bool);
-extern tree xref_tag(enum tag_types, tree, tag_scope, bool);
+extern tree xref_tag(tag_types, tree,
+		 tag_scope = ts_current,
+		 bool tpl_header_p = false);
 extern void xref_basetypes			(tree, tree);
 extern tree start_enum(tree, tree, tree, tree, bool, bool *);
 extern void finish_enum_value_list		(tree);
@@ -6849,8 +6852,7 @@ extern void end_template_parm_list		(void);
 extern void end_template_decl			(void);
 extern tree maybe_update_decl_type		(tree, tree);
 

Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-24 Thread Segher Boessenkool
Hi!

On Thu, Sep 24, 2020 at 04:55:21PM +0200, Richard Biener wrote:
> Btw, on x86_64 the following produces sth reasonable:
> 
> #define N 32
> typedef int T;
> typedef T V __attribute__((vector_size(N)));
> V setg (V v, int idx, T val)
> {
>   V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
>   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
>   v = (v & ~mask) | (valv & mask);
>   return v;
> }
> 
> vmovd   %edi, %xmm1
> vpbroadcastd%xmm1, %ymm1
> vpcmpeqd.LC0(%rip), %ymm1, %ymm2
> vpblendvb   %ymm2, %ymm1, %ymm0, %ymm0
> ret
> 
> I'm quite sure you could do sth similar on power?

This only allows inserting aligned elements.  Which is probably fine
of course (we don't allow elements that straddle vector boundaries
either, anyway).

And yes, we can do that :-)

That should be
  #define N 32
  typedef int T;
  typedef T V __attribute__((vector_size(N)));
  V setg (V v, int idx, T val)
  {
V valv = (V){val, val, val, val, val, val, val, val};
V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
v = (v & ~mask) | (valv & mask);
return v;
  }

after which I get (-march=znver2)

setg:
vmovd   %edi, %xmm1
vmovd   %esi, %xmm2
vpbroadcastd%xmm1, %ymm1
vpbroadcastd%xmm2, %ymm2
vpcmpeqd.LC0(%rip), %ymm1, %ymm1
vpandn  %ymm0, %ymm1, %ymm0
vpand   %ymm2, %ymm1, %ymm1
vpor%ymm0, %ymm1, %ymm0
ret

.LC0:
.long   0
.long   1
.long   2
.long   3
.long   4
.long   5
.long   6
.long   7

and for powerpc (changing it to 16B vectors, -mcpu=power9) it is

setg:
addis 9,2,.LC0@toc@ha
mtvsrws 32,5
mtvsrws 33,6
addi 9,9,.LC0@toc@l
lxv 45,0(9)
vcmpequw 0,0,13
xxsel 34,34,33,32
blr

.LC0:
.long   0
.long   1
.long   2
.long   3

(We can generate that 0..3 vector without doing loads; I guess x86 can
do that as well?  But it takes more than one insn to do (of course we
have to set up the memory address first *with* the load, heh).)

For power8 it becomes (we need to splat in separate insns):

setg:
addis 9,2,.LC0@toc@ha
mtvsrwz 32,5
mtvsrwz 33,6
addi 9,9,.LC0@toc@l
lxvw4x 45,0,9
xxspltw 32,32,1
xxspltw 33,33,1
vcmpequw 0,0,13
xxsel 34,34,33,32
blr


Segher


Re: [PATCH] tree-optimization/97151 - improve PTA for C++ operator delete

2020-09-24 Thread Jason Merrill via Gcc-patches

On 9/24/20 3:43 AM, Richard Biener wrote:

On Wed, 23 Sep 2020, Jason Merrill wrote:


On 9/23/20 2:42 PM, Richard Biener wrote:

On September 23, 2020 7:53:18 PM GMT+02:00, Jason Merrill 
wrote:

On 9/23/20 4:14 AM, Richard Biener wrote:

C++ operator delete, when DECL_IS_REPLACEABLE_OPERATOR_DELETE_P,
does not cause the deleted object to be escaped.  It also has no
other interesting side-effects for PTA so skip it like we do
for BUILT_IN_FREE.


Hmm, this is true of the default implementation, but since the function

is replaceable, we don't know what a user definition might do with the
pointer.


But can the object still be 'used' after delete? Can delete fail / throw?

What guarantee does the predicate give us?


The deallocation function is called as part of a delete expression in order to
release the storage for an object, ending its lifetime (if it was not ended by
a destructor), so no, the object can't be used afterward.


OK, but the delete operator can access the object contents if there
wasn't a destructor ...



A deallocation function that throws has undefined behavior.


OK, so it seems the 'replaceable' operators are the global ones
(for user-defined/class-specific placement variants I see arbitrary
extra arguments that we'd possibly need to handle).

I'm happy to revert but I'd like to have a testcase that FAILs
with the patch ;)

Now, the following aborts:

struct X {
   static struct X saved;
   int *p;
   X() { __builtin_memcpy (this, &saved, sizeof (X)); }
};
void operator delete (void *p)
{
   __builtin_memcpy (&X::saved, p, sizeof (X));
}
int main()
{
   int y = 1;
   X *p = new X;
   p->p = &y;
   delete p;
   X *q = new X;
   *(q->p) = 2;
   if (y != 2)
 __builtin_abort ();
}

and I could fix this by not making *p but what *p points to escape.
The testcase is of course maximally awkward, but hey ... ;)

Now this would all be moot if operator delete may not access
the object (or if the object contents are undefined at that point).

Oh, and the testcase segfaults when compiled with GCC 10 because
there we elide the new X / delete p pair ... which is invalid then?
Hmm, we emit

   MEM[(struct X *)_8] ={v} {CLOBBER};
   operator delete (_8, 8);

so the object contents are undefined _before_ calling delete
even when I do not have a DTOR?  That is, the above,
w/o -fno-lifetime-dse, makes the PTA patch OK for the testcase.


Yes, all classes have a destructor, even if it's trivial, so the 
object's lifetime definitely ends before the call to operator delete. 
This is less clear for scalar objects, but treating them similarly would 
be consistent with other recent changes, so I think it's fine for us to 
assume that scalar objects are also invalidated before the call to 
operator delete.  But of course this doesn't apply to explicit calls to 
operator delete outside of a delete expression.
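
A small editorial illustration of that last distinction (not from the patch;
it only restates the point in code):

    // The invalidation assumption applies to the delete expression itself,
    // not to a direct call of the deallocation function.
    struct A { int i; };
    void f ()
    {
      A *p = new A;
      delete p;               // object invalidated before operator delete runs
      A *q = new A;
      ::operator delete (q);  // explicit call: no destructor, and the
                              // delete-expression assumption above does not apply
    }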



Richard.


Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

2020-09-23  Richard Biener  

  PR tree-optimization/97151
  * tree-ssa-structalias.c (find_func_aliases_for_call):
  DECL_IS_REPLACEABLE_OPERATOR_DELETE_P has no effect on
  arguments.

* g++.dg/cpp1y/new1.C: Adjust for two more handled transforms.
---
gcc/testsuite/g++.dg/cpp1y/new1.C | 4 ++--
gcc/tree-ssa-structalias.c| 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/cpp1y/new1.C

b/gcc/testsuite/g++.dg/cpp1y/new1.C

index aa5f647d535..fec0088cb40 100644
--- a/gcc/testsuite/g++.dg/cpp1y/new1.C
+++ b/gcc/testsuite/g++.dg/cpp1y/new1.C
@@ -69,5 +69,5 @@ test_unused() {
  delete p;
}

-/* { dg-final { scan-tree-dump-times "Deleting : operator delete" 5

"cddce1"} } */

-/* { dg-final { scan-tree-dump-times "Deleting : _\\d+ = operator

new" 7 "cddce1"} } */

+/* { dg-final { scan-tree-dump-times "Deleting : operator delete" 6

"cddce1"} } */

+/* { dg-final { scan-tree-dump-times "Deleting : _\\d+ = operator

new" 8 "cddce1"} } */

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 44fe52e0f65..f676bf91e95 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4857,6 +4857,8 @@ find_func_aliases_for_call (struct function

*fn, gcall *t)

  point for reachable memory of their arguments.  */
   else if (flags & (ECF_PURE|ECF_LOOPING_CONST_OR_PURE))
handle_pure_call (t, &rhsc);
+  else if (fndecl && DECL_IS_REPLACEABLE_OPERATOR_DELETE_P

(fndecl))

+   ;
   else
 handle_rhs_call (t, &rhsc);
  if (gimple_call_lhs (t))












Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-24 Thread Segher Boessenkool
Hi!

On Thu, Sep 24, 2020 at 03:27:48PM +0200, Richard Biener wrote:
> On Thu, Sep 24, 2020 at 10:21 AM xionghu luo  wrote:
> I'll just comment that
> 
> xxperm 34,34,33
> xxinsertw 34,0,12
> xxperm 34,34,32
> 
> doesn't look like a variable-position insert instruction but
> this is a variable whole-vector rotate plus an insert at index zero
> followed by a variable whole-vector rotate.  I'm not fluend in
> ppc assembly but
> 
> rlwinm 6,6,2,28,29
> mtvsrwz 0,5
> lvsr 1,0,6
> lvsl 0,0,6
> 
> possibly computes the shift masks for r33/r32?  though
> I do not see those registers mentioned...

v0/v1 (what the lvs[lr] write to) are the same as vs32/vs33.

The low half of the VSRs (vector-scalar registers) are the FP registers
(expanded to 16B each), and the high half are the original VRs (vector
registers).  AltiVec insns (like lvsl, lvsr) naturally only work on VRs,
as do some newer insns for which there wasn't enough budget in the
opcode space to have for VSRs (which take 6 bits each, while VRs take
only 5, just like FPRs and GPRs).

> This might be a generic viable expansion strathegy btw,
> which is why I asked before whether the CPU supports
> inserts at a variable position ...

ISA 3.1 (Power10) supports variable position inserts.  Power9 supports
fixed position inserts.  Older CPUs can of course construct it some
other way.

> ppc does _not_ have a VSX instruction
> like xxinsertw r34, r8, r12 where r8 denotes
> the vector element (or byte position or whatever).

vins[bhwd][v][lr]x does this.  Those are Power10 instructions.


Segher


libgo patch committed: Don't build __go_ptrace on AIX

2020-09-24 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Clément Chigot removes __go_ptrace on AIX.  AIX
ptrace syscalls don't have the same semantics as the glibc one.
The syscall package already handles it correctly, so disable the
new __go_ptrace C function for AIX.  Bootstrapped and ran Go testsuite
on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
763460e4776ce2d1ca2fe87678fc233f27f70e64
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index f51dac55365..daa0d2d6177 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-6a7648c97c3e0cdbecbec7e760b30246521a6d90
+2357468ae9b071de0e2ebe6574d78572967b7183
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/runtime/go-varargs.c b/libgo/runtime/go-varargs.c
index f9270a97bfd..9cb4a7e79bd 100644
--- a/libgo/runtime/go-varargs.c
+++ b/libgo/runtime/go-varargs.c
@@ -114,7 +114,9 @@ __go_syscall6(uintptr_t flag, uintptr_t a1, uintptr_t a2, 
uintptr_t a3,
 
 #endif
 
-#ifdef HAVE_SYS_PTRACE_H
+// AIX ptrace is really different from Linux ptrace. Let syscall
+// package handles it.
+#if defined(HAVE_SYS_PTRACE_H) && !defined(_AIX)
 
 // Despite documented appearances, this is actually implemented as
 // a variadic function within glibc.


Re: [PATCH 5/5] Conversions between 128-bit integer and floating point values.

2020-09-24 Thread will schmidt via Gcc-patches
On Mon, 2020-09-21 at 16:57 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 5 adds the 128-bit integer to/from 128-floating point
> conversions.  This patch has to invoke the routines to use the 128-
> bit
> hardware instructions if on Power 10 or use software routines if
> running on a pre Power 10 system via the resolve function.
> 
> Add ifunc resolves for __fixkfti, __floatuntikf_sw, __fixkfti_swn,
> __fixunskfti_sw.
> 
> The following changes were made to the previous version of the
> patch: 
> 
> Fixed typos in ChangeLog noted by Will.
> 
> Turned off debug in test case.
> 
> Removed extra blank lines, fixed spacing of #else in the test case.
> 
> Added comment to fixunskfti-sw.c about changes made from the original
> file fixunskfti.c.
> 
> The patch has been tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The P10 tests were run by hand on Mambo.
> 
> Carl Love
> -
> 
> gcc/ChangeLog
> 
> 2020-09-21  Carl Love  
>   config/rs6000/rs6000.md (floatti2, floatunsti2,
>   fix_truncti2, fixuns_truncti2): Add
>   define_insn for mode IEEE 128.
ok

>   libgcc/config/rs6000/fixkfi-sw.c: New file.
>   libgcc/config/rs6000/fixkfi.c: Remove file.

Should that be fixkfti-sw.c (missing t).

Adjust to indicate this is a rename
libgcc/config/rs6000/fixkfti.c: Rename to
libgcc/config/rs6000/fixkfti-sw.c


>   libgcc/config/rs6000/fixunskfti-sw.c: New file.
>   libgcc/config/rs6000/fixunskfti.c: Remove file.
>   libgcc/config/rs6000/float128-hw.c (__floattikf_hw,
>   __floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw):
>   New functions.
>   libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1):
>   New macro.
>   (__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
>   __fixunskfti_resolve): Add resolve functions.
>   (__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New
>   functions.
>   libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
>   __fixtfti, __fixunstfti): Add editor commands to change
>   names.
>   libgcc/config/rs6000/float128-sed-hw (__floattitf,
>   __floatuntitf, __fixtfti, __fixunstfti): Add editor commands
>   to change names.
>   libgcc/config/rs6000/floattikf-sw.c: New file.
>   libgcc/config/rs6000/floattikf.c: Remove file.
>   libgcc/config/rs6000/floatuntikf-sw.c: New file.
>   libgcc/config/rs6000/floatuntikf.c: Remove file.
>   libgcc/config/rs6000/floatuntikf-sw.c: New file.
>   libgcc/config/rs6000/quaad-float128.h (__floattikf_sw,
>   __floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw,
> __floattikf_hw,
>   __floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
>   __floatuntikf, __fixkfti, __fixunskfti): New extern
> declarations.
>   libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
>   fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
>   (floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
>   file names to fp128_ppc_funcs.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  
>   gcc.target/powerpc/fl128_conversions.c: New file.
> ---
>  gcc/config/rs6000/rs6000.md   |  36 +++
>  .../gcc.target/powerpc/fp128_conversions.c| 286
> ++
>  .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
>  .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   7 +-
>  libgcc/config/rs6000/float128-hw.c|  24 ++
>  libgcc/config/rs6000/float128-ifunc.c |  44 ++-
>  libgcc/config/rs6000/float128-sed |   4 +
>  libgcc/config/rs6000/float128-sed-hw  |   4 +
>  .../rs6000/{floattikf.c => floattikf-sw.c}|   4 +-
>  .../{floatuntikf.c => floatuntikf-sw.c}   |   4 +-
>  libgcc/config/rs6000/quad-float128.h  |  17 +-
>  libgcc/config/rs6000/t-float128   |   3 +-
>  12 files changed, 417 insertions(+), 20 deletions(-)
>  create mode 100644
> gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
>  rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
>  rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (90%)
>  rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
>  rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c}
> (96%)
> 
> diff --git a/gcc/config/rs6000/rs6000.md
> b/gcc/config/rs6000/rs6000.md
> index 694ff70635e..5db5d0b4505 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6390,6 +6390,42 @@
> xscvsxddp %x0,%x1"
>[(set_attr "type" "fp")])
> 
> +(define_insn "floatti2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +   (float:IEEE128 (match_operand:TI 1 "vsx_register_operand"
> "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvsqqp %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "floatunsti2"
> +  [(set (match_operand:IEEE128 0 

Re: [PATCH 4/5] Test 128-bit shifts for just the int128 type.

2020-09-24 Thread will schmidt via Gcc-patches
On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 4 adds the vector 128-bit integer shift instruction support for
> the V1TI type.
> 
> The following changes were made from the previous version.
> 
> Renamed VSX_TI to VEC_TI, put def in vector.md.  Didn't get it
> separated into a different patch.
> 
> Reworked the XXSWAPD_V1TI to not use UNSPEC.
> 
> Test suite program cleanups, removed "//" comments that were not
> needed.
> 
> The patch has been tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The P10 test was run by hand on Mambo.
> 
> 
> Carl Love
> 
> --
> 
> 
> gcc/ChangeLog
> 
> 2020-09-21  Carl Love  
>   * config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
>   Rename altivec_vslq_, altivec_vsrq_, mode VEC_TI.

Nit "Rename to"

>   * config/rs6000/vector.md (VEC_TI): New mode iterator.
>   (vashlv1ti3): Change to vashl3, mode VEC_TI.
>   (vlshrv1ti3): Change to vlshr3, mode VEC_TI.
s/Change/Rename to/

'New' isn't quite right for the mode iterator, since it's renamed from
the VSX_TI iterator.
perhaps something like

* config/rs6000/vector.md (VEC_TI): New name for VSX_TI 
iterator from vsx.md.

>   * config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator.
>   (VSX_TI): Renamed VEC_TI.


Just the Remove.  VEC_TI doesn't exist in vsx.md. 



> 
> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  
>   gcc.target/powerpc/int_128bit-runnable.c: Add shift_right,
> shift_left
>   tests.
> ---
>  gcc/config/rs6000/altivec.md  | 16 -
>  gcc/config/rs6000/vector.md   | 27 ---
>  gcc/config/rs6000/vsx.md  | 33 +--
> 
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++--
>  4 files changed, 52 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/rs6000/altivec.md
> b/gcc/config/rs6000/altivec.md
> index 34a4731342a..5db3de3cc9f 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2219,10 +2219,10 @@
>"vsl %0,%1,%2"
>[(set_attr "type" "vecsimple")])
> 
> -(define_insn "altivec_vslq"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> - (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -  (match_operand:V1TI 2 "vsx_register_operand"
> "v")))]
> +(define_insn "altivec_vslq_"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> + (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand"
> "v")
> +  (match_operand:VEC_TI 2 "vsx_register_operand"
> "v")))]
>"TARGET_POWER10"
>/* Shift amount in needs to be in bits[57:63] of 128-bit operand.
> */
>"vslq %0,%1,%2"
> @@ -2236,10 +2236,10 @@
>"vsr %0,%1,%2"
>[(set_attr "type" "vecsimple")])
> 
> -(define_insn "altivec_vsrq"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> - (lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand"
> "v")
> -(match_operand:V1TI 2 "vsx_register_operand"
> "v")))]
> +(define_insn "altivec_vsrq_"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> + (lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand"
> "v")
> +(match_operand:VEC_TI 2
> "vsx_register_operand" "v")))]
>"TARGET_POWER10"
>/* Shift amount in needs to be in bits[57:63] of 128-bit operand.
> */
>"vsrq %0,%1,%2"
> diff --git a/gcc/config/rs6000/vector.md
> b/gcc/config/rs6000/vector.md
> index 0cca4232619..3ea3a91845a 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; 128-bit int modes
> +(define_mode_iterator VEC_TI [V1TI TI])
> +
>  ;; Vector int modes for parity
>  (define_mode_iterator VEC_IP [V8HI
> V4SI
> @@ -1627,17 +1630,17 @@
>"")
> 
>  ;; No immediate version of this 128-bit instruction
> -(define_expand "vashlv1ti3"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> - (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -  (match_operand:V1TI 2 "vsx_register_operand"
> "v")))]
> +(define_expand "vashl3"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> + (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
> +  (match_operand:VEC_TI 2
> "vsx_register_operand")))]
>"TARGET_POWER10"
>  {
>/* Shift amount in needs to be put in bits[57:63] of 128-bit
> operand2. */
> -  rtx tmp = gen_reg_rtx (V1TImode);
> +  rtx tmp = gen_reg_rtx (mode);
> 
>emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> -  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
> +  emit_insn(gen_altivec_vslq_ (operands[0], operands[1],
> tmp));
>DONE;
>  })
> 
> @@ 

Re: [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support

2020-09-24 Thread will schmidt via Gcc-patches
On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Segher, Will:
> 
> Add support for converting to/from 128-bit integers and 128-bit 
> decimal floating point formats.

A more wordy blurb here clarifying what the patch does would be useful.

i.e. this adds support for the dcffixqq and dctfixqq instructions.

> 
> The updates from the previous version of the patch:
> 
> Removed stray ";; carll" comment.  
> 
> Removed #if 1 and #endif in the test case.
> 
> Replaced TARGET_TI_VECTOR_OPS with POWER10.
> 
> The patch has been tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The P10 test was run by hand on Mambo.
> 
> 
>  Carl Love
> 
> 
> ---
> 
> gcc/ChangeLog
> 
> 2020-09-21  Carl Love  
>   * config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
> 

ok.


Need changelog blurb to reflect the rs6000-call changes.
(this may have leaked in from previous or subsequent patch?)


> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  
>   * gcc.target/powerpc/int_128bit-runnable.c:  Update test.


ok.


> ---


>  gcc/config/rs6000/dfp.md  | 14 +
>  gcc/config/rs6000/rs6000-call.c   |  4 ++
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 62 +++
>  3 files changed, 80 insertions(+)
> 
> diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
> index 8f822732bac..0e82e315fee 100644
> --- a/gcc/config/rs6000/dfp.md
> +++ b/gcc/config/rs6000/dfp.md
> @@ -222,6 +222,13 @@
>"dcffixq %0,%1"
>[(set_attr "type" "dfp")])
> 
> +(define_insn "floattitd2"
> +  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
> + (float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
> +  "TARGET_POWER10"
> +  "dcffixqq %0,%1"
> +  [(set_attr "type" "dfp")])
> +
>  ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
>  ;; This is the first stage of converting it to an integer type.
> 
> @@ -241,6 +248,13 @@
>"TARGET_DFP"
>"dctfix %0,%1"
>[(set_attr "type" "dfp")])
> +
> +(define_insn "fixtdti2"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
> + (fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
> +  "TARGET_POWER10"
> +  "dctfixqq %0,%1"
> +  [(set_attr "type" "dfp")])
> 
>  ;; Decimal builtin support
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index e1d9c2e8729..9c50cd3c5a7 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -4967,6 +4967,8 @@ const struct altivec_builtin_types 
> altivec_overloaded_builtins[] = {
>  RS6000_BTI_bool_V2DI, 0 },
>{ P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
>  RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
> +RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 
> },
> 
>{ P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
>  RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
> @@ -5074,6 +5076,8 @@ const struct altivec_builtin_types 
> altivec_overloaded_builtins[] = {
>  RS6000_BTI_bool_V2DI, 0 },
>{ P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
>  RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
> +RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 
> },
>{ P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
>  RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>{ P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> index 85ad544e22b..ec3dcf3dff1 100644
> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -38,6 +38,7 @@
>  #if DEBUG
>  #include 
>  #include 
> +#include 
> 
> 
>  void print_i128(__int128_t val)
> @@ -59,6 +60,13 @@ int main ()
>__int128_t arg1, result;
>__uint128_t uarg2;
> 
> +  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
> +
> +  struct conv_t {
> +__uint128_t u128;
> +_Decimal128 d128;
> +  } conv, conv2;
> +
>vector signed long long int vec_arg1_di, vec_arg2_di;
>vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
>vector unsigned long long int vec_uresult_di;
> @@ -2249,6 +2257,60 @@ int main ()
>  abort();
>  #endif
>}
> +  
> +  /* DFP to __int128 and __int128 to DFP conversions */
> +  /* Can't get printing of DFP values to work.  Print the DFP value as an
> + unsigned int so we can see the bit patterns.  */
> +  conv.u128 = 0x2208ULL;
> +  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
> +  expected_result_dfp128 = conv.d128;
> 
> +  arg1 = 4;
> +
> +  conv.d128 = (_Decimal128) 

Re: [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations

2020-09-24 Thread will schmidt via Gcc-patches
On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 1 adds the 128-bit sign extension instruction support and
> corresponding builtin support.
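
As a usage illustration (hypothetical sketch only; the argument and result
types follow the overload table added in rs6000-call.c below):

#include <altivec.h>

vector signed int
extend_bytes_to_words (vector signed char v)
{
  return vec_signexti (v);      /* expands via vsignextsb2w (vextsb2w) */
}

vector signed long long
extend_words_to_doublewords (vector signed int v)
{
  return vec_signextll (v);     /* expands via vsignextsw2d (vextsw2d) */
}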
> 
> No changes from the previous version.
> 
> The patch has been tested on 
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> Fixed the issues in the ChangeLog noted by Will.
> 
>  Carl Love
> 
> ---
> 
> gcc/ChangeLog
> 
> 2020-09-21  Carl Love  
>   * config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define
>   for new builtins.
>   * config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
>   overloaded builtin definitions.
>   (VSIGNEXTSB2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D): Add builtin
>   expansions.

+VSIGNEXTSH2W


>   * config/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
>   P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions.
>   * config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di
>   visible.
>   * doc/extend.texi:  Add documentation for the vec_signexti and
>   vec_signextll builtins.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  
>   * gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
> ---
>  gcc/config/rs6000/altivec.h   |   3 +
>  gcc/config/rs6000/rs6000-builtin.def  |   9 ++
>  gcc/config/rs6000/rs6000-call.c   |  13 ++
>  gcc/config/rs6000/vsx.md  |   2 +-
>  gcc/doc/extend.texi   |  15 ++
>  .../powerpc/p9-sign_extend-runnable.c | 128 ++
>  6 files changed, 169 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index 8a2dcda0144..acc365612be 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -494,6 +494,9 @@
> 
>  #define vec_xlx __builtin_vec_vextulx
>  #define vec_xrx __builtin_vec_vexturx
> +#define vec_signexti  __builtin_vec_vsignexti
> +#define vec_signextll __builtin_vec_vsignextll
> +
>  #endif
> 
>  /* Predicates.
> diff --git a/gcc/config/rs6000/rs6000-builtin.def 
> b/gcc/config/rs6000/rs6000-builtin.def
> index e91a48ddf5f..4c2e9460949 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2715,6 +2715,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD, "vprtybd")
>  BU_P9V_OVERLOAD_1 (VPRTYBQ,  "vprtybq")
>  BU_P9V_OVERLOAD_1 (VPRTYBW,  "vprtybw")
>  BU_P9V_OVERLOAD_1 (VPARITY_LSBB, "vparity_lsbb")
> +BU_P9V_OVERLOAD_1 (VSIGNEXTI,"vsignexti")
> +BU_P9V_OVERLOAD_1 (VSIGNEXTLL,   "vsignextll")
> 
>  /* 2 argument functions added in ISA 3.0 (power9).  */
>  BU_P9_2 (CMPRB,  "byte_in_range",CONST,  cmprb)
> @@ -2726,6 +2728,13 @@ BU_P9_OVERLOAD_2 (CMPRB,   "byte_in_range")
>  BU_P9_OVERLOAD_2 (CMPRB2,"byte_in_either_range")
>  BU_P9_OVERLOAD_2 (CMPEQB,"byte_in_set")
>  
> +/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1. 
>  */
> +BU_P9V_AV_1 (VSIGNEXTSB2W,   "vsignextsb2w", CONST,  
> vsx_sign_extend_qi_v4si)
> +BU_P9V_AV_1 (VSIGNEXTSH2W,   "vsignextsh2w", CONST,  
> vsx_sign_extend_hi_v4si)
> +BU_P9V_AV_1 (VSIGNEXTSB2D,   "vsignextsb2d", CONST,  
> vsx_sign_extend_qi_v2di)
> +BU_P9V_AV_1 (VSIGNEXTSH2D,   "vsignextsh2d", CONST,  
> vsx_sign_extend_hi_v2di)
> +BU_P9V_AV_1 (VSIGNEXTSW2D,   "vsignextsw2d", CONST,  
> vsx_sign_extend_si_v2di)
> +
>  /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
>  BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
>  BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index a8b520834c7..9e514a01012 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -5527,6 +5527,19 @@ const struct altivec_builtin_types 
> altivec_overloaded_builtins[] = {
>  RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>  RS6000_BTI_INTSI, RS6000_BTI_INTSI },
> 
> +  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 
> */
> +  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W,
> +RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 },
> +  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W,
> +RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
> +
> +  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D,
> +RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 },
> +  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D,
> +RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 },
> +  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D,
> +RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
> +
>/* Overloaded built-in functions for ISA3.1 (power10). */
>{ P10_BUILTIN_VEC_CLRL, P10V_BUILTIN_VCLRLB,
>  RS6000_BTI_V16QI, 

Re: [PATCH 2/5] RS6000 add 128-bit Integer Operations

2020-09-24 Thread will schmidt via Gcc-patches
On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Will, Segher:
> 
> Add support for divide, modulo, shift, compare of 128-bit
> integers instructions and builtin support.
> 
> The following are the changes from the previous version of the patch.
> 
> The TARGET_TI_VECTOR_OPS was removed per comments for patch 3.  Just
> using TARGET_POWER10.
> 
> Removed extra comment.
> 
> Note the change
> 
> -#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
> +#define
> vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
> 
> is a bug fix. Added missing comment to ChangeLog.
> 
> Removed vector_eqv1ti, used eqvv1ti3 instead.
> 
> Test case, put ppc_native_128bit in the dg-require command.
> 
> The patch has been tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The P10 test was run by hand on Mambo.
> 
> 
> Carl Love
> 
> 
> ---
> 
> gcc/ChangeLog
> 
>   2020-09-21  Carl Love  
>   * config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
>   for new builtins.
>   (vec_rlnm): Fix bug in argument generation.


If there is a delay, this bugfix could (and probably should) be broken out into
its own patch.


>   * config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
>   UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
>   (altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
>   altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
>   altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
>   altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
>   altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
>   define_insn.

altivec_vrlqnm, altivec_vrlqmi should be in the define_expand list.


>   (vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
>   vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
>   altivec_vrlqnm): New define_expands.

Actually, they are here... so just need to be removed from the
define_insn list.  :-)


>   * config/rs6000/rs6000-builtin.def (BU_P10_P, BU_P10_128BIT_1,
>   BU_P10_128BIT_2, BU_P10_128BIT_3): New macro definitions.

Question below.

>   (VCMPEQUT_P, VCMPGTST_P, VCMPGTUT_P): Add macro expansions.
>   (VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
>   CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
>   VCMPAET_P): New macro expansions.
>   (VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ, VSLQ,
>   VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
>   MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
>   (VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD, SIGNEXT): New overload expansions.
>   * config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
>   P10_BUILTIN_CMPGE_1TI, P10_BUILTIN_CMPGE_U1TI,
>   P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPGTST,
>   P10_BUILTIN_CMPLE_1TI, P10_BUILTIN_VCMPLE_U1TI,
>   P10_BUILTIN_128BIT_DIV_V1TI, P10_BUILTIN_128BIT_UDIV_V1TI,
>   P10_BUILTIN_128BIT_VMULESD, P10_BUILTIN_128BIT_VMULEUD,
>   P10_BUILTIN_128BIT_VMULOSD, P10_BUILTIN_128BIT_VMULOUD,
>   P10_BUILTIN_VNOR_V1TI, P10_BUILTIN_VNOR_V1TI_UNS,
>   P10_BUILTIN_128BIT_VRLQ, P10_BUILTIN_128BIT_VRLQMI,
>   P10_BUILTIN_128BIT_VRLQNM, P10_BUILTIN_128BIT_VSLQ,
>   P10_BUILTIN_128BIT_VSRQ, P10_BUILTIN_128BIT_VSRAQ,
>   P10_BUILTIN_VCMPGTUT_P, P10_BUILTIN_VCMPGTST_P,
>   P10_BUILTIN_VCMPEQUT_P, P10_BUILTIN_VCMPGTUT_P,
>   P10_BUILTIN_VCMPGTST_P, P10_BUILTIN_CMPNET,
>   P10_BUILTIN_VCMPNET_P, P10_BUILTIN_VCMPAET_P,
>   P10_BUILTIN_128BIT_VSIGNEXTSD2Q, P10_BUILTIN_128BIT_DIVES_V1TI,
>   P10_BUILTIN_128BIT_MODS_V1TI, P10_BUILTIN_128BIT_MODU_V1TI):
>   New overloaded definitions.

Looks like those should (all?) now be P10V_BUILTIN_.


>   (int_ftype_int_v1ti_v1ti) [P10_BUILTIN_VCMPEQUT,

?  That appears to be the (rs6000_gimple_fold_builtin) function.

>   P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
>   P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
>   P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
>   P10_BUILTIN_CMPLE_U1TI, E_V1TImode]: New case statements.

Also should be P10V_BUILTIN_  ?

I see both P10_BUILTIN_CMPNET and P10V_BUILTIN_CMPNET references in the
patch.  

>   (int_ftype_int_v1ti_v1ti) [bool_V1TI_type_node, 
> int_ftype_int_v1ti_v1ti]:
>   New assignments.

Thats in the (rs6000_init_builtins) function.

>   (altivec_init_builtins): New E_V1TImode case statement.
>   (builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
>   P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
>   P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
>   P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.

May need a refresh with respect to the P10_BUILTIN_ vs
P10V_BUILTIN_ ... 


>   * config/rs6000/r6000.c 

Re: *PING* Re: [PATCH] Fortran : ICE in build_field PR95614

2020-09-24 Thread Thomas König

Hi Mark,


I haven't yet committed this.


I am unfamiliar with Andre; I've checked MAINTAINERS and I find Andre in
the "Write after approval" section.


Is Andre's approval sufficient? If so MAINTAINERS needs to be updated.


The official list of people who can review is at

https://gcc.gnu.org/fortran/

but that is clearly not sufficient.  We need to update it to reflect
current realities: there are people who have approved patches (with
nobody objecting) for a long time, and many people who are on that list
are no longer active.

I'm not 100% sure what is needed to update that file, or whether we need an OK
from the steering committee.  I had already taken a straw poll; I think I will
simply do it tomorrow morning.  If anybody strenuously objects, we can always
revert the patch :-)


If not: OK to commit and backport?


OK from my side, and thanks for the patch.

Best regards

Thomas


Re: [PATCH] PR libstdc++/71579 assert that type traits are not misused with an incomplete type

2020-09-24 Thread Jonathan Wakely via Gcc-patches

On 20/08/20 18:31 +0300, Antony Polukhin via Libstdc++ wrote:

Wed, 19 Aug 2020 at 14:29, Jonathan Wakely :
<...>

Do we also want to check
(std::__is_complete_or_unbounded(__type_identity<_ArgTypes>{}) && ...)
for invoke_result and the is_invocable traits?


Done.

Changelog:

2020-08-20  Antony Polukhin  

   PR libstdc++/71579
   * include/std/type_traits (invoke_result, is_invocable, is_invocable_r)
   (is_nothrow_invocable, is_nothrow_invocable_r): Add static_asserts
   to make sure that the arguments of the type traits are not misused
   with incomplete types.
   * testsuite/20_util/invoke_result/incomplete_args_neg.cc: New test.
   * testsuite/20_util/is_invocable/incomplete_args_neg.cc: New test.
   * testsuite/20_util/is_invocable/incomplete_neg.cc: New test.
   * testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc: New test.
   * testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc: Check for
   error on incomplete response type usage in trait.


Committed with some tweaks to the static assert messages to say:

"each argument type must be a complete class or an unbounded array"

Thanks!




Re: [PATCH] libiberty: Add get_DW_UT_name and update include/dwarf2.{def, h}

2020-09-24 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 23, 2020 at 04:51:01PM +0200, Mark Wielaard wrote:
> This adds a get_DW_UT_name function to dwarfnames using dwarf2.def
> for use in binutils readelf to show the unit types in a DWARF5 header.
> 
> Also remove DW_CIE_VERSION which was already removed in binutils/gdb
> and is not used in gcc.
> 
> include/ChangeLog:
> 
>   * dwarf2.def: Add DWARF5 Unit type header encoding macros
>   DW_UT_FIRST, DW_UT and DW_UT_END.
>   * dwarf2.h (enum dwarf_unit_type): Removed and define using
>   DW_UT_FIRST, DW_UT and DW_UT_END macros.
>   (DW_CIE_VERSION): Removed.
>   (get_DW_UT_name): New function declaration.
> 
> libiberty/ChangeLog:
> 
>   * dwarfnames.c (get_DW_UT_name): Define using DW_UT_FIRST, DW_UT
>   and DW_UT_END.

LGTM, thanks.

Jakub



Re: [GCC 8] [PATCH] Ignore the clobbered stack pointer in asm statement

2020-09-24 Thread H.J. Lu via Gcc-patches
On Thu, Sep 24, 2020 at 9:48 AM H.J. Lu  wrote:
>
> On Wed, Sep 16, 2020 at 4:47 AM Jakub Jelinek  wrote:
> >
> > On Wed, Sep 16, 2020 at 12:34:50PM +0100, Richard Sandiford wrote:
> > > Jakub Jelinek via Gcc-patches  writes:
> > > > On Mon, Sep 14, 2020 at 08:57:18AM -0700, H.J. Lu via Gcc-patches wrote:
> > > >> Something like this for GCC 8 and 9.
> > > >
> > > > Guess my preference would be to do this everywhere and then let's 
> > > > discuss if
> > > > we change the warning into error there or keep it being deprecated.
> > >
> > > Agreed FWIW.  On turning it into an error: I think it might be better
> > > to wait a bit longer if we can.
> >
> > Ok.  The patch is ok for trunk and affected release branches after a week.
> >
>
> I cherry-picked it to GCC 9 and 10 branches.   GCC 8 needs some
> changes.  I am enclosing the backported patch for GCC 8.  I will check
> it in if there are no regressions on Linux/x86-64.
>

No regression.  I am checking it into GCC 8 branch.

-- 
H.J.


[patch] Fix the VX_CPU selection for -mcpu=xscale on arm-vxworks

2020-09-24 Thread Olivier Hainque

This fixlet makes sure -mcpu=xscale selects the correct VX_CPU.

Fixes a number of tests for arm-vxworks.

Committing to mainline shortly.

Olivier


2020-09-24  Olivier Hainque  

* config/arm/vxworks.h (TARGET_OS_CPP_BUILTINS): Fix
the VX_CPU selection for -mcpu=xscale on arm-vxworks.



0009-Fix-thinko-in-TARGET_OS_CPP_BUILTINS-for-arm-vxworks.diff
Description: Binary data


[patch] Fallback to default CPP spec for C++ on VxWorks

2020-09-24 Thread Olivier Hainque
Arrange to inhibit the effects of CPLUSPLUS_CPP_SPEC in gnu-user.h,
which #defines _GNU_SOURCE, which is invalid for VxWorks (possibly
not providing ::mkstemp, for example).

This has been used in gcc-9 based production compilers for several targets
for a year, passed a build & test sequence for powerpc-vxworks7 with gcc-10
and a sanity check build with a recent mainline.

Olivier

2020-09-24  Olivier Hainque  

* config/vxworks.h: #undef CPLUSPLUS_CPP_SPEC.



0005-Fallback-to-default-CPP-spec-for-C-on-VxWorks.diff
Description: Binary data





[patch] Honor $(MULTISUBDIR) in -I directives for libgcc on VxWorks

2020-09-24 Thread Olivier Hainque

To handle ports where we might arrange to use different
sets of fixed headers for different multilibs.

This has been used in gcc-9 based production compilers for several targets
for a year, passed a build & test sequence for powerpc-vxworks7 with gcc-10
and a sanity check build with a recent mainline.

Olivier

2020-09-24  Olivier Hainque 

* libgcc/config/t-vxworks (LIBGCC2_INCLUDES): Append
$(MULTISUBDIR) to the -I path for fixed headers.



0004-Honor-MULTISUBDIR-in-I-directives-for-libgcc-on-VxWo.diff
Description: Binary data


Re: [PATCH] generalized range_query class for multiple contexts

2020-09-24 Thread Martin Sebor via Gcc-patches

On 9/24/20 12:46 AM, Aldy Hernandez wrote:



On 9/24/20 1:53 AM, Martin Sebor wrote:


Finally, unless both a type and function with the same name exist
in the same scope there is no reason to mention the class-id when
referencing a class name.  I.e., this

   value_range_equiv *allocate_value_range_equiv ();
   void free_value_range_equiv (value_range_equiv *);

is the same as this:

   class value_range_equiv *allocate_value_range_equiv ();
   void free_value_range_equiv (class value_range_equiv *);

but the former is shorter and clearer (and in line with existing
practice).


value_range_equiv may not be defined in the scope of range-query.h, so 
that is why the class specifier is there.


I see.  It's probably a reflection of my C++ background that this
style catches my eye.  In C++ I think it's more common to introduce
a forward declaration of a class before using it.

Just as a side note, the first declaration of a type introduces it
into the enclosing namespace so that from that point forward it can
be used without the class-id.  E.g., this is valid:

  struct A
  {
// Need class here...
class B *f ();
// ...but not here...
void g (B *);
  };

 // ...or even here:
 B* A::f () { return 0; }

Either way, the code is correct as is and I don't object to it,
just noting that (at least some of) the class-ids are redundant.

Martin


[patch] Add include-fixed to include search paths for libgcc on VxWorks

2020-09-24 Thread Olivier Hainque

The special vxworks rules for the compilation of libgcc had
-I.../gcc/include and not .../gcc/include-fixed, causing build
failure of our arm-vxworks7r2 port because of indirect dependencies
on limits.h.

The omission was just an oversight, and this change adds the
missing -I.

This fixes the aforementioned build failure, has been used in gcc-9
based production compilers for several targets for a year, passed a build
& test sequence for powerpc-vxworks7 with gcc-10 and a sanity check build
with a recent mainline.

Committing to mainline shortly.

Olivier

2020-09-24  Olivier Hainque  

libgcc/
* config/t-vxworks: Add include-fixed to include search
paths for libgcc on VxWorks.
* config/t-vxworks7: Likewise.



0003-Add-include-fixed-to-include-search-paths-for-libgcc.diff
Description: Binary data




[patch] Adjust the VxWorks alternative LIMITS_H guard for glimits.h

2020-09-24 Thread Olivier Hainque

This is a minor adjustment to the vxworks specific macro name
used to guard the header file contents, to make it closer to the
original one and easier to search for.

We have been using this in gcc-9 based compilers for a while now,
I was able to build and test a gcc-10 based toolchain for ppc-vxworks7
with it, and performed a sanity check build with a recent mainline.

Committing to mainline shortly,

Olivier

2020-09-24  Olivier Hainque  

* config/t-vxworks: Adjust the VxWorks alternative LIMITS_H guard
for glimits.h, make it both closer to the previous one and easier to
search for.



0002-Adjust-the-VxWorks-alternative-LIMITS_H-guard-for-gl.diff
Description: Binary data


[GCC 8] [PATCH] Ignore the clobbered stack pointer in asm statement

2020-09-24 Thread H.J. Lu via Gcc-patches
On Wed, Sep 16, 2020 at 4:47 AM Jakub Jelinek  wrote:
>
> On Wed, Sep 16, 2020 at 12:34:50PM +0100, Richard Sandiford wrote:
> > Jakub Jelinek via Gcc-patches  writes:
> > > On Mon, Sep 14, 2020 at 08:57:18AM -0700, H.J. Lu via Gcc-patches wrote:
> > >> Something like this for GCC 8 and 9.
> > >
> > > Guess my preference would be to do this everywhere and then let's discuss 
> > > if
> > > we change the warning into error there or keep it being deprecated.
> >
> > Agreed FWIW.  On turning it into an error: I think it might be better
> > to wait a bit longer if we can.
>
> Ok.  The patch is ok for trunk and affected release branches after a week.
>

I cherry-picked it to GCC 9 and 10 branches.   GCC 8 needs some
changes.  I am enclosing the backported patch for GCC 8.  I will check
it in if there are no regressions on Linux/x86-64.

Thanks.

H.J.
From 97c34eb5f57bb1d37f3feddefefa5f553bcea9fc Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 14 Sep 2020 08:52:27 -0700
Subject: [PATCH] rtl_data: Add sp_is_clobbered_by_asm

Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
pointer is clobbered by asm statement.

gcc/

	PR target/97032
	* cfgexpand.c (expand_asm_stmt): Set sp_is_clobbered_by_asm to
	true if the stack pointer is clobbered by asm statement.
	* emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
	* config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
	if the stack pointer is clobbered by asm statement.

gcc/testsuite/

	PR target/97032
	* gcc.target/i386/pr97032.c: New test.

(cherry picked from commit 453a20c65722719b9e2d84339f215e7ec87692dc)
---
 gcc/cfgexpand.c |  3 +++
 gcc/config/i386/i386.c  |  6 --
 gcc/emit-rtl.h  |  3 +++
 gcc/testsuite/gcc.target/i386/pr97032.c | 22 ++
 4 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97032.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 18565bf1dab..dcf491954f1 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2972,6 +2972,9 @@ expand_asm_stmt (gasm *stmt)
 			   regname);
 		return;
 		  }
+		/* Clobbering the stack pointer register.  */
+		else if (reg == (int) STACK_POINTER_REGNUM)
+		  crtl->sp_is_clobbered_by_asm = true;
 
 	SET_HARD_REG_BIT (clobbered_regs, reg);
 	rtx x = gen_rtx_REG (reg_raw_mode[reg], reg);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f3c722b51e9..ce20bc2ab4e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12528,10 +12528,12 @@ ix86_update_stack_boundary (void)
 static rtx
 ix86_get_drap_rtx (void)
 {
-  /* We must use DRAP if there are outgoing arguments on stack and
+  /* We must use DRAP if there are outgoing arguments on stack or
+ the stack pointer register is clobbered by asm statment and
  ACCUMULATE_OUTGOING_ARGS is false.  */
   if (ix86_force_drap
-  || (cfun->machine->outgoing_args_on_stack
+  || ((cfun->machine->outgoing_args_on_stack
+	   || crtl->sp_is_clobbered_by_asm)
 	  && !ACCUMULATE_OUTGOING_ARGS))
 crtl->need_drap = true;
 
diff --git a/gcc/emit-rtl.h b/gcc/emit-rtl.h
index 4e7bd1ec26d..55dc3e84e9c 100644
--- a/gcc/emit-rtl.h
+++ b/gcc/emit-rtl.h
@@ -265,6 +265,9 @@ struct GTY(()) rtl_data {
  pass_stack_ptr_mod has run.  */
   bool sp_is_unchanging;
 
+  /* True if the stack pointer is clobbered by asm statement.  */
+  bool sp_is_clobbered_by_asm;
+
   /* Nonzero if function being compiled doesn't contain any calls
  (ignoring the prologue and epilogue).  This is set prior to
  register allocation in IRA and is valid for the remaining
diff --git a/gcc/testsuite/gcc.target/i386/pr97032.c b/gcc/testsuite/gcc.target/i386/pr97032.c
new file mode 100644
index 000..b9ef2ad0c05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97032.c
@@ -0,0 +1,22 @@
+/* { dg-do compile { target { ia32 && fstack_protector } } } */
+/* { dg-options "-O2 -mincoming-stack-boundary=2 -fstack-protector-all" } */
+
+#include 
+
+extern int *__errno_location (void);
+
+long
+sys_socketcall (int op, ...)
+{
+  long int res;
+  va_list ap;
+  va_start (ap, op);
+  asm volatile ("push %%ebx; movl %2, %%ebx; int $0x80; pop %%ebx"
+		: "=a" (res) : "0" (102), "ri" (16), "c" (ap) : "memory", "esp");
+  if (__builtin_expect (res > 4294963200UL, 0))
+*__errno_location () = -res;
+  va_end (ap);
+  return res;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]*_?__errno_location" } } */
-- 
2.26.2



[committed] libstdc++: Fix misnamed configure option in manual

2020-09-24 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* doc/xml/manual/configure.xml: Correct name of option.
* doc/html/*: Regenerate.

Committed to trunk.

commit 61f7995398a719f2ff91d07e8f8ed6d4413db697
Author: Jonathan Wakely 
Date:   Thu Sep 24 17:33:16 2020

libstdc++: Fix misnamed configure option in manual

libstdc++-v3/ChangeLog:

* doc/xml/manual/configure.xml: Correct name of option.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/configure.xml 
b/libstdc++-v3/doc/xml/manual/configure.xml
index 58587e858a4..74d6db71ab4 100644
--- a/libstdc++-v3/doc/xml/manual/configure.xml
+++ b/libstdc++-v3/doc/xml/manual/configure.xml
@@ -204,7 +204,8 @@
 
  --enable-libstdcxx-debug-flags=FLAGS
 
- This option is only valid when  --enable-debug 
+ This option is only valid when
+   --enable-libstdcxx-debug
is also specified, and applies to the debug builds only. With
this option, you can pass a specific string of flags to the
compiler to use when building the debug versions of libstdc++.


RE: [PATCH 2/2] arm: Add support for Neoverse N2 CPU

2020-09-24 Thread Kyrylo Tkachov
Hi Alex,

> -Original Message-
> From: Alex Coplan 
> Sent: 24 September 2020 17:01
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw ;
> Ramana Radhakrishnan ; Kyrylo
> Tkachov 
> Subject: [PATCH 2/2] arm: Add support for Neoverse N2 CPU
> 
> This adds support for Arm's Neoverse N2 CPU to the AArch32 backend.
> Neoverse N2 builds AArch32 at EL0 and therefore needs support in AArch32
> GCC.
> 
> Testing:
>  * Bootstrapped and regtested on arm-none-linux-gnueabihf.
> 
> OK for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Alex
> 
> ---
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm-cpus.in (neoverse-n2): New.
>   * config/arm/arm-tables.opt: Regenerate.
>   * config/arm/arm-tune.md: Regenerate.
>   * doc/invoke.texi: Document support for Neoverse N2.


RE: [PATCH 1/2] aarch64: Add support for Neoverse N2 CPU

2020-09-24 Thread Kyrylo Tkachov
Hi Alex,

> -Original Message-
> From: Alex Coplan 
> Sent: 24 September 2020 17:00
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Richard Sandiford
> ; Kyrylo Tkachov 
> Subject: [PATCH 1/2] aarch64: Add support for Neoverse N2 CPU
> 
> This patch adds support for Arm's Neoverse N2 CPU to the AArch64
> backend.
> 
> Testing:
>  * Bootstrapped and regtested on aarch64-none-linux-gnu.
> 
> OK for trunk?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Alex
> 
> ---
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-cores.def: Add Neoverse N2.
>   * config/aarch64/aarch64-tune.md: Regenerate.
>   * doc/invoke.texi: Document AArch64 support for Neoverse N2.


[PATCH 2/2, rs6000] VSX load/store rightmost element operations

2020-09-24 Thread will schmidt via Gcc-patches
[PATCH 2/2, rs6000] VSX load/store rightmost element operations

Hi,
  This adds support for the VSX load/store rightmost element operations.
This includes the instructions lxvrbx, lxvrhx, lxvrwx, lxvrdx,
stxvrbx, stxvrhx, stxvrwx, stxvrdx; and the builtins
vec_xl_sext() /* vector load sign extend */
vec_xl_zext() /* vector load zero extend */
vec_xst_trunc() /* vector store truncate */.
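
A hypothetical usage sketch follows (the argument and result types are
assumed from the overload entries and the new tests, so treat them as
approximate rather than authoritative):

#include <altivec.h>

vector signed __int128
load_sext_word (signed long long offset, signed int *p)
{
  /* lxvrwx: load one word into the rightmost element, sign extend.  */
  return vec_xl_sext (offset, p);
}

vector unsigned __int128
load_zext_word (signed long long offset, unsigned int *p)
{
  /* lxvrwx: load one word into the rightmost element, zero extend.  */
  return vec_xl_zext (offset, p);
}

void
store_trunc_word (vector signed __int128 v, signed long long offset,
                  signed int *p)
{
  /* stxvrwx: truncate and store the rightmost word element.  */
  vec_xst_trunc (v, offset, p);
}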

Testcase results show that the instructions added with this patch show
up at low/no optimization (-O0), with a number of those being replaced
with other load and store instructions at higher optimization levels.
For consistency I've left the tests at -O0.

Regtested OK for Linux on power8,power9 targets.  Sniff-regtested OK on
power10 simulator.
OK for trunk?

Thanks,
-Will

gcc/ChangeLog:
* config/rs6000/altivec.h (vec_xl_zext, vec_xl_sext, vec_xst_trunc): New
defines.
* config/rs6000/rs6000-builtin.def (BU_P10V_OVERLOAD_X): New builtin 
macro.
(BU_P10V_AV_X): New builtin macro.
(se_lxvrbx, se_lxvrhx, se_lxvrwx, se_lxvrdx): Define internal names for
load and sign extend vector element.
(ze_lxvrbx, ze_lxvrhx, ze_lxvrwx, ze_lxvrdx): Define internal names for
load and zero extend vector element.
(tr_stxvrbx, tr_stxvrhx, tr_stxvrwx, tr_stxvrdx): Define internal names
for truncate and store vector element.
(se_lxvrx, ze_lxvrx, tr_stxvrx): Define internal names for overloaded
load/store rightmost element.
* config/rs6000/rs6000-call.c (altivec_builtin_types): Define the 
internal
monomorphs P10_BUILTIN_SE_LXVRBX, P10_BUILTIN_SE_LXVRHX,
P10_BUILTIN_SE_LXVRWX, P10_BUILTIN_SE_LXVRDX,
P10_BUILTIN_ZE_LXVRBX, P10_BUILTIN_ZE_LXVRHX, P10_BUILTIN_ZE_LXVRWX,
P10_BUILTIN_ZE_LXVRDX,
P10_BUILTIN_TR_STXVRBX, P10_BUILTIN_TR_STXVRHX, P10_BUILTIN_TR_STXVRWX,
P10_BUILTIN_TR_STXVRDX,
(altivec_expand_lxvr_builtin): New expansion for load element builtins.
(altivec_expand_stv_builtin): Update to support truncate and store 
builtins.
(altivec_expand_builtin): Add cases for the load/store rightmost
builtins.
(altivec_init_builtins): Add def_builtin entries for
__builtin_altivec_se_lxvrbx, __builtin_altivec_se_lxvrhx,
__builtin_altivec_se_lxvrwx, __builtin_altivec_se_lxvrdx,
__builtin_altivec_ze_lxvrbx, __builtin_altivec_ze_lxvrhx,
__builtin_altivec_ze_lxvrwx, __builtin_altivec_ze_lxvrdx,
__builtin_altivec_tr_stxvrbx, __builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrdx,
__builtin_vec_se_lxvrx, __builtin_vec_ze_lxvrx, __builtin_vec_tr_stxvrx.
* config/rs6000/vsx.md (vsx_lxvrx, vsx_stxvrx, vsx_stxvrx):
New define_insn entries.
* doc/extend.texi: Add documentation for vec_xl_sext, vec_xl_zext,
and vec_xst_trunc.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-load-element-extend-char.c: New test.
* gcc.target/powerpc/vsx-load-element-extend-int.c: New test.
* gcc.target/powerpc/vsx-load-element-extend-longlong.c: New test.
* gcc.target/powerpc/vsx-load-element-extend-short.c: New test.
* gcc.target/powerpc/vsx-store-element-truncate-char.c: New test.
* gcc.target/powerpc/vsx-store-element-truncate-int.c: New test.
* gcc.target/powerpc/vsx-store-element-truncate-longlong.c: New test.
* gcc.target/powerpc/vsx-store-element-truncate-short.c: New test.

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 8a2dcda..df10a8c 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -234,10 +234,13 @@
 #define vec_lde __builtin_vec_lde
 #define vec_ldl __builtin_vec_ldl
 #define vec_lvebx __builtin_vec_lvebx
 #define vec_lvehx __builtin_vec_lvehx
 #define vec_lvewx __builtin_vec_lvewx
+#define vec_xl_zext __builtin_vec_ze_lxvrx
+#define vec_xl_sext __builtin_vec_se_lxvrx
+#define vec_xst_trunc __builtin_vec_tr_stxvrx
 #define vec_neg __builtin_vec_neg
 #define vec_pmsum_be __builtin_vec_vpmsum
 #define vec_shasigma_be __builtin_crypto_vshasigma
 /* Cell only intrinsics.  */
 #ifdef __PPU__
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index e91a48d..c481e81 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1143,10 +1143,18 @@
(RS6000_BTC_ ## ATTR/* ATTR */  \
 | RS6000_BTC_BINARY),  \
CODE_FOR_ ## ICODE) /* ICODE */
 #endif
 
+#define BU_P10V_OVERLOAD_X(ENUM, NAME) \
+  RS6000_BUILTIN_X (P10_BUILTIN_VEC_ ## ENUM,  /* ENUM */  \
+   "__builtin_vec_" NAME,  /* NAME */  \
+   RS6000_BTM_P10, /* MASK */  \
+ 

[PATCH 2/2] arm: Add support for Neoverse N2 CPU

2020-09-24 Thread Alex Coplan
This adds support for Arm's Neoverse N2 CPU to the AArch32 backend.
Neoverse N2 builds AArch32 at EL0 and therefore needs support in AArch32
GCC.

Testing:
 * Bootstrapped and regtested on arm-none-linux-gnueabihf.

OK for master?

Thanks,
Alex

---

gcc/ChangeLog:

* config/arm/arm-cpus.in (neoverse-n2): New.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* doc/invoke.texi: Document support for Neoverse N2.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 4550694e138..be563b7f807 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1459,6 +1459,17 @@ begin cpu neoverse-n1
  part d0c
 end cpu neoverse-n1
 
+begin cpu neoverse-n2
+  cname neoversen2
+  tune for cortex-a57
+  tune flags LDSCHED
+  architecture armv8.5-a+fp16+bf16+i8mm
+  option crypto add FP_ARMv8 CRYPTO
+  costs cortex_a57
+  vendor 41
+  part 0xd49
+end cpu neoverse-n2
+
 # ARMv8.2 A-profile ARM DynamIQ big.LITTLE implementations
 begin cpu cortex-a75.cortex-a55
  cname cortexa75cortexa55
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 1a7c3191784..b57206313e2 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -243,6 +243,9 @@ Enum(processor_type) String(cortex-a77) Value( 
TARGET_CPU_cortexa77)
 EnumValue
 Enum(processor_type) String(neoverse-n1) Value( TARGET_CPU_neoversen1)
 
+EnumValue
+Enum(processor_type) String(neoverse-n2) Value( TARGET_CPU_neoversen2)
+
 EnumValue
 Enum(processor_type) String(cortex-a75.cortex-a55) Value( 
TARGET_CPU_cortexa75cortexa55)
 
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index 3874f42a26b..2377037bf7d 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -45,7 +45,8 @@ (define_attr "tune"
cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,
cortexa73cortexa53,cortexa55,cortexa75,
cortexa76,cortexa76ae,cortexa77,
-   neoversen1,cortexa75cortexa55,cortexa76cortexa55,
-   neoversev1,cortexm23,cortexm33,
-   cortexm35p,cortexm55,cortexr52"
+   neoversen1,neoversen2,cortexa75cortexa55,
+   cortexa76cortexa55,neoversev1,cortexm23,
+   cortexm33,cortexm35p,cortexm55,
+   cortexr52"
(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 75203ba2420..7948ed4fa95 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -19365,9 +19365,9 @@ Permissible names are: @samp{arm7tdmi}, 
@samp{arm7tdmi-s}, @samp{arm710t},
 @samp{cortex-m35p}, @samp{cortex-m55},
 @samp{cortex-m1.small-multiply}, @samp{cortex-m0.small-multiply},
 @samp{cortex-m0plus.small-multiply}, @samp{exynos-m1}, @samp{marvell-pj4},
-@samp{neoverse-n1}, @samp{neoverse-v1}, @samp{xscale}, @samp{iwmmxt},
-@samp{iwmmxt2}, @samp{ep9312}, @samp{fa526}, @samp{fa626}, @samp{fa606te},
-@samp{fa626te}, @samp{fmp626}, @samp{fa726te}, @samp{xgene1}.
+@samp{neoverse-n1}, @samp{neoverse-n2}, @samp{neoverse-v1}, @samp{xscale},
+@samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}, @samp{fa526}, @samp{fa626},
+@samp{fa606te}, @samp{fa626te}, @samp{fmp626}, @samp{fa726te}, @samp{xgene1}.
 
 Additionally, this option can specify that GCC should tune the performance
 of the code for a big.LITTLE system.  Permissible names are:


[PATCH 1/2] aarch64: Add support for Neoverse N2 CPU

2020-09-24 Thread Alex Coplan
This patch adds support for Arm's Neoverse N2 CPU to the AArch64
backend.

Testing:
 * Bootstrapped and regtested on aarch64-none-linux-gnu.

OK for trunk?

Thanks,
Alex

---

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Add Neoverse N2.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document AArch64 support for Neoverse N2.
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 04dc587681e..469ee99824c 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -136,6 +136,9 @@ AARCH64_CORE("thunderx3t110",  thunderx3t110,  
thunderx3t110, 8_3A,  AARCH64_FL_
 AARCH64_CORE("zeus", zeus, cortexa57, 8_4A,  AARCH64_FL_FOR_ARCH8_4 | 
AARCH64_FL_SVE | AARCH64_FL_RCPC | AARCH64_FL_I8MM | AARCH64_FL_BF16 | 
AARCH64_FL_F16 | AARCH64_FL_PROFILE | AARCH64_FL_SSBS | AARCH64_FL_RNG, 
neoversen1, 0x41, 0xd40, -1)
 AARCH64_CORE("neoverse-v1", neoversev1, cortexa57, 8_4A,  
AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_SVE | AARCH64_FL_RCPC | AARCH64_FL_I8MM | 
AARCH64_FL_BF16 | AARCH64_FL_F16 | AARCH64_FL_PROFILE | AARCH64_FL_SSBS | 
AARCH64_FL_RNG, neoversen1, 0x41, 0xd40, -1)
 
+/* Armv8.5-A Architecture Processors.  */
+AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, 8_5A, 
AARCH64_FL_FOR_ARCH8_5 | AARCH64_FL_I8MM | AARCH64_FL_BF16 | AARCH64_FL_F16 | 
AARCH64_FL_SVE | AARCH64_FL_SVE2 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_RNG | 
AARCH64_FL_MEMTAG, neoversen1, 0x41, 0xd49, -1)
+
 /* Qualcomm ('Q') cores. */
 AARCH64_CORE("saphira", saphira,saphira,8_4A,  
AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 
0xC01, -1)
 
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 729eb3ec2c7..3cf69ceadaf 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa65,cortexa65ae,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa65,cortexa65ae,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoversen2,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 75203ba2420..f420da6c9f8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17517,8 +17517,8 @@ performance of the code.  Permissible values for this 
option are:
 @samp{cortex-a76}, @samp{cortex-a76ae}, @samp{cortex-a77},
 @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34},
 @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
-@samp{neoverse-e1},@samp{neoverse-n1},@samp{neoverse-v1},@samp{qdf24xx},
-@samp{saphira},
+@samp{neoverse-e1}, @samp{neoverse-n1}, @samp{neoverse-n2},
+@samp{neoverse-v1}, @samp{qdf24xx}, @samp{saphira},
 @samp{phecda}, @samp{xgene1}, @samp{vulcan}, @samp{octeontx},
 @samp{octeontx81},  @samp{octeontx83},
 @samp{octeontx2}, @samp{octeontx2t98}, @samp{octeontx2t96}


[PATCH 1/2, rs6000] int128 sign extension instructions (partial prereq)

2020-09-24 Thread will schmidt via Gcc-patches
[PATCH, rs6000] int128 sign extension instructions (partial prereq)

Hi
  This is a sub-set of the 128-bit sign extension support patch series
that I believe will be fully implemented in a subsequent patch from Carl.
This is a necessary pre-requisite for the vector-load/store rightmost
element patch that follows in this thread.

Thanks,
-Will

gcc/ChangeLog:
* config/rs6000/rs6000.md (enum c_enum): Add UNSPEC_EXTENDDITI2
and UNSPEC_MTVSRD_DITI_W1 entries.
(mtvsrdd_diti_w1, extendditi2_vector): New define_insns.
(extendditi2): New define_expand.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 9c5a228..7d0b296 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -150,10 +150,12 @@
UNSPEC_PLTSEQ
UNSPEC_PLT16_HA
UNSPEC_CFUGED
UNSPEC_CNTLZDM
UNSPEC_CNTTZDM
+   UNSPEC_EXTENDDITI2
+   UNSPEC_MTVSRD_DITI_W1
UNSPEC_PDEPD
UNSPEC_PEXTD
   ])
 
 ;;
@@ -963,10 +965,41 @@
   ""
   [(set_attr "type" "shift")
(set_attr "dot" "yes")
(set_attr "length" "4,8")])
 
+;; Move DI value from GPR to TI mode in VSX register, word 1.
+(define_insn "mtvsrdd_diti_w1"
+  [(set (match_operand:TI 0 "register_operand" "=wa")
+   (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
+  UNSPEC_MTVSRD_DITI_W1))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "mtvsrdd %x0,0,%1"
+  [(set_attr "type" "vecsimple")])
+
+;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
+(define_insn "extendditi2_vector"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
+(unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")]
+ UNSPEC_EXTENDDITI2))]
+  "TARGET_POWER10"
+  "vextsd2q %0,%1"
+  [(set_attr "type" "exts")])
+
+(define_expand "extendditi2"
+  [(set (match_operand:TI 0 "gpc_reg_operand")
+(sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
+  "TARGET_POWER10"
+  {
+/* Move 64-bit src from GPR to vector reg and sign extend to 128-bits */
+rtx temp = gen_reg_rtx (TImode);
+emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
+emit_insn (gen_extendditi2_vector (operands[0], temp));
+DONE;
+  }
+  [(set_attr "type" "exts")])
+
 
 (define_insn "extendqi2"
   [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,?*v")
(sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,?*v")))]
   ""



[PATCH][AArch64][GCC 8] Add support for __jcvt intrinsic

2020-09-24 Thread Kyrylo Tkachov
Hi all,

I'd like to backport support for the __jcvt intrinsic to the active branches as 
it's an Armv8.3-a intrinsic that should have been supported there.
This is a squashed commit of the initial support and a couple of follow-up
fixes from Andrea.
This is the GCC 8 version.

Bootstrapped and tested on the branch.

This patch implements the __jcvt ACLE intrinsic [1] that maps down to the 
FJCVTZS [2] instruction from Armv8.3-a.
No fancy mode iterators or anything; just a single builtin, UNSPEC and
define_insn, and the associated plumbing.
This patch also defines __ARM_FEATURE_JCVT to indicate when the intrinsic is 
available.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] 
https://developer.arm.com/docs/ddi0596/latest/simd-and-floating-point-instructions-alphabetic-order/fjcvtzs-floating-point-javascript-convert-to-signed-fixed-point-rounding-toward-zero
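
For reference, a minimal usage sketch of the intrinsic (hypothetical example,
not part of the patch):

#include <stdint.h>
#include <arm_acle.h>

#ifdef __ARM_FEATURE_JCVT
int32_t
js_to_int32 (double d)
{
  /* Maps to FJCVTZS: JavaScript-style convert to a signed 32-bit
     integer, rounding toward zero.  */
  return __jcvt (d);
}
#endif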

gcc/
PR target/71233
* config/aarch64/aarch64.md (UNSPEC_FJCVTZS): Define.
(aarch64_fjcvtzs): New define_insn.
* config/aarch64/aarch64.h (TARGET_JSCVT): Define.
* config/aarch64/aarch64-builtins.c (aarch64_builtins):
Add AARCH64_JSCVT.
(aarch64_init_builtins): Initialize __builtin_aarch64_jcvtzs.
(aarch64_expand_builtin): Handle AARCH64_JSCVT.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_JCVT where appropriate.
* config/aarch64/arm_acle.h (__jcvt): Define.
* doc/sourcebuild.texi (aarch64_fjcvtzs_hw) Document new
target supports option.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/acle/jcvt_1.c: New test.
* gcc.target/aarch64/acle/jcvt_2.c: New testcase.
* lib/target-supports.exp
(check_effective_target_aarch64_fjcvtzs_hw): Add new check for
FJCVTZS hw.

Co-Authored-By: Andrea Corallo  

(cherry picked from commit e1d5d19ec4f84b67ac693fef5b2add7dc9cf056d)
(cherry picked from commit 2c62952f8160bdc8d4111edb34a4bc75096c1e05)
(cherry picked from commit d2b86e14c14020f3e119ab8f462e2a91bd7d46e5)
(cherry picked from commit 58ae77d3ba70a2b9ccc90a90f3f82cf46239d5f1)


jcvt-8.patch
Description: jcvt-8.patch


[PATCH][AArch64][GCC 9] Add support for __jcvt intrinsic

2020-09-24 Thread Kyrylo Tkachov
Hi all,

I'd like to backport support for the __jcvt intrinsic to the active branches as 
it's an Armv8.3-a intrinsic that should have been supported there.
This is a squashed commit of the initial support and a couple of follow-up
fixes from Andrea.
This is the GCC 9 version.

Bootstrapped and tested on the branch.

This patch implements the __jcvt ACLE intrinsic [1] that maps down to the 
FJCVTZS [2] instruction from Armv8.3-a.
No fancy mode iterators or anything; just a single builtin, UNSPEC and
define_insn, and the associated plumbing.
This patch also defines __ARM_FEATURE_JCVT to indicate when the intrinsic is 
available.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] 
https://developer.arm.com/docs/ddi0596/latest/simd-and-floating-point-instructions-alphabetic-order/fjcvtzs-floating-point-javascript-convert-to-signed-fixed-point-rounding-toward-zero

gcc/
PR target/71233
* config/aarch64/aarch64.md (UNSPEC_FJCVTZS): Define.
(aarch64_fjcvtzs): New define_insn.
* config/aarch64/aarch64.h (TARGET_JSCVT): Define.
* config/aarch64/aarch64-builtins.c (aarch64_builtins):
Add AARCH64_JSCVT.
(aarch64_init_builtins): Initialize __builtin_aarch64_jcvtzs.
(aarch64_expand_builtin): Handle AARCH64_JSCVT.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_JCVT where appropriate.
* config/aarch64/arm_acle.h (__jcvt): Define.
* doc/sourcebuild.texi (aarch64_fjcvtzs_hw) Document new
target supports option.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/acle/jcvt_1.c: New test.
* gcc.target/aarch64/acle/jcvt_2.c: New testcase.
* lib/target-supports.exp
(check_effective_target_aarch64_fjcvtzs_hw): Add new check for
FJCVTZS hw.

Co-Authored-By: Andrea Corallo  

(cherry picked from commit e1d5d19ec4f84b67ac693fef5b2add7dc9cf056d)
(cherry picked from commit 2c62952f8160bdc8d4111edb34a4bc75096c1e05)
(cherry picked from commit d2b86e14c14020f3e119ab8f462e2a91bd7d46e5)


jcvt-9.patch
Description: jcvt-9.patch


Re: [PATCH] Add cgraph_edge::debug function.

2020-09-24 Thread Jan Hubicka
> I find it handy to debug cgraph_edge, and I've dumped it manually many times.
> Maybe it's time to come up with the function? Example output:
> 
> (gdb) p e->debug()
> ag/9 -> h/3 (1 (adjusted),0.25 per call)
> 
> ag/9 (ag) @0x7773eca8
>   Type: function definition analyzed
>   Visibility: public
>   next sharing asm name: 7
>   References: table/5 (addr)
>   Referring:
>   Function ag/9 is inline copy in ap/4
>   Clone of ag/7
>   Availability: local
>   Function flags: count:2 (adjusted) first_run:6 body local hot
>   Called by: ai/8 (inlined) (indirect_inlining) (4 (adjusted),1.00 per call)
>   Calls: h/3 (1 (adjusted),0.25 per call)
> h/3 (h) @0x7772b438
>   Type: function definition analyzed
>   Visibility: externally_visible public
>   References: ap/4 (addr)
>   Referring:
>   Availability: available
>   Profile id: 1806506296
>   Function flags: count:4 (precise) first_run:3 body hot
>   Called by: ag/9 (1 (adjusted),0.25 per call) ag/7 (1 (adjusted),0.25 per 
> call) ag/0 (2 (estimated locally, globally 0 adjusted),0.50 per call) bug/2 
> (1 (precise),1.00 per call) bug/2 (1 (precise),1.00 per call)
>   Calls: ai/1 (4 (precise),1.00 per call)
> 
> (gdb) p ie->debug()
> ai/1 -> (null) (speculative) (0 (adjusted),0.00 per call)
> 
> ai/1 (ai) @0x7772b168
>   Type: function definition analyzed
>   Visibility: prevailing_def_ironly
>   previous sharing asm name: 8
>   References: table/5 (addr) ap/4 (addr) (speculative) ag/0 (addr) 
> (speculative)
>   Referring:
>   Function ai/1 is inline copy in h/3
>   Availability: local
>   Profile id: 1923518911
>   Function flags: count:4 (precise) first_run:4 body local hot
>   Called by: h/3 (inlined) (4 (precise),1.00 per call)
>   Calls: ag/7 (speculative) (inlined) (2 (adjusted),0.50 per call) ap/4 
> (speculative) (2 (adjusted),0.50 per call) PyErr_Format/6 (0 (precise),0.00 
> per call)
>Indirect call(speculative) (0 (adjusted),0.00 per call)  of param:1 (vptr 
> maybe changed) Num speculative call targets: 2
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>   * cgraph.c (cgraph_edge::debug): New.
>   * cgraph.h (cgraph_edge::debug): New.
OK,
Honza
> ---
>  gcc/cgraph.c | 14 ++
>  gcc/cgraph.h |  3 +++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index b43adaac7c0..46c3b124b1a 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2072,6 +2072,20 @@ cgraph_edge::dump_edge_flags (FILE *f)
>  fprintf (f, "(can throw external) ");
>  }
> +/* Dump edge to stderr.  */
> +
> +void
> +cgraph_edge::debug (void)
> +{
> +  fprintf (stderr, "%s -> %s ", caller->dump_asm_name (),
> +callee == NULL ? "(null)" : callee->dump_asm_name ());
> +  dump_edge_flags (stderr);
> +  fprintf (stderr, "\n\n");
> +  caller->debug ();
> +  if (callee != NULL)
> +callee->debug ();
> +}
> +
>  /* Dump call graph node to file F.  */
>  void
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 0211f08964f..96d6cf609fe 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -2022,6 +2022,9 @@ private:
>/* Output flags of edge to a file F.  */
>void dump_edge_flags (FILE *f);
> +  /* Dump edge to stderr.  */
> +  void DEBUG_FUNCTION debug (void);
> +
>/* Verify that call graph edge corresponds to DECL from the associated
>   statement.  Return true if the verification should fail.  */
>bool verify_corresponds_to_fndecl (tree decl);
> -- 
> 2.28.0
> 


Re: [PATCH] aarch64: Do not alter force_reg returned rtx expanding pauth builtins

2020-09-24 Thread Andrea Corallo
Hi Richard,

thanks for reviewing

Richard Sandiford  writes:

> Andrea Corallo  writes:
>> Hi all,
>>
>> Having a look for cases where the rtx returned by force_reg is later
>> modified, I've found this other case in `aarch64_general_expand_builtin`
>> while expanding pointer authentication builtins.
>>
>> Regtested and bootstrapped on aarch64-linux-gnu.
>>
>> Okay for trunk?
>>
>>   Andrea
>>
>> From 8869ee04e3788fdec86aa7e5a13e2eb477091d0e Mon Sep 17 00:00:00 2001
>> From: Andrea Corallo 
>> Date: Mon, 21 Sep 2020 13:52:45 +0100
>> Subject: [PATCH] aarch64: Do not alter force_reg returned rtx expanding pauth
>>  builtins
>>
>> 2020-09-21  Andrea Corallo  
>>
>>  * config/aarch64/aarch64-builtins.c
>>  (aarch64_general_expand_builtin): Do not alter value on a
>>  force_reg returned rtx.
>> ---
>>  gcc/config/aarch64/aarch64-builtins.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
>> b/gcc/config/aarch64/aarch64-builtins.c
>> index b787719cf5e..a77718ccfac 100644
>> --- a/gcc/config/aarch64/aarch64-builtins.c
>> +++ b/gcc/config/aarch64/aarch64-builtins.c
>> @@ -2079,10 +2079,10 @@ aarch64_general_expand_builtin (unsigned int fcode, 
>> tree exp, rtx target,
>>arg0 = CALL_EXPR_ARG (exp, 0);
>>op0 = force_reg (Pmode, expand_normal (arg0));
>>  
>> -  if (!target)
>> +  if (!(target
>> +&& REG_P (target)
>> +&& GET_MODE (target) == Pmode))
>>  target = gen_reg_rtx (Pmode);
>> -  else
>> -target = force_reg (Pmode, target);
>>  
>>emit_move_insn (target, op0);
>
> Do we actually use the result of this move?  It looked like we always
> use op0 rather than target (good) and overwrite target with a later move.
>
> If so, I think we should delete the move

Good point agree.

> and convert the later code to use expand_insn.

I'm not sure I understand the suggestion correctly: the xpaclri patterns
are written with hardcoded in/out regs.  Is the suggestion to just use something
like 'expand_insn (CODE_FOR_xpaclri, 0, NULL)' in place of GEN_FCN+emit_insn?

Thanks!

  Andrea


Re: [PATCH] add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges

2020-09-24 Thread Richard Biener
On Thu, 24 Sep 2020, Jonathan Wakely wrote:

> On 24/09/20 11:11 +0200, Richard Biener wrote:
> >On Wed, 26 Aug 2020, Richard Biener wrote:
> >
> >> On Thu, 6 Aug 2020, Richard Biener wrote:
> >>
> >> > On Thu, 6 Aug 2020, Richard Biener wrote:
> >> >
> >> > > This adds a move CTOR to auto_vec and makes use of a
> >> > > auto_vec return value for get_loop_exit_edges denoting
> >> > > that lifetime management of the vector is handed to the caller.
> >> > >
> >> > > The move CTOR prompted the hash_table change because it apparently
> >> > > makes the copy CTOR implicitely deleted (good) and hash-table
> >> > > expansion of the odr_enum_map which is
> >> > > hash_map  where odr_enum has an
> >> > > auto_vec member triggers this.  Not sure if
> >> > > there's a latent bug there before this (I think we're not
> >> > > invoking DTORs, but we're invoking copy-CTORs).
> >> > >
> >> > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >> > >
> >> > > Does this all look sensible and is it a good change
> >> > > (the get_loop_exit_edges one)?
> >> >
> >> > Regtest went OK, here's an update with a complete ChangeLog
> >> > (how useful..) plus the move assign operator deleted, copy
> >> > assign wouldn't work as auto-generated and at the moment
> >> > there's no use of assigning.  I guess if we'd have functions
> >> > that take an auto_vec<> argument meaning they will destroy
> >> > the vector that will become useful and we can implement it.
> >> >
> >> > OK for trunk?
> >>
> >> Ping.
> >
> >Ping^2.
> 
> Looks good to me as far as the use of C++ features goes.

Thanks, now pushed after re-testing.

Richard.


Re: [gcc-7-arm] Backport -moutline-atomics flag

2020-09-24 Thread Pop, Sebastian via Gcc-patches
Thanks Richard for your recommendations.
I am still discussing with Kyrill about a good name for the branch.
Once we agree on a name we will commit the patches to that branch.

Sebastian

On 9/24/20, 4:10 AM, "Richard Biener"  wrote:

On Fri, Sep 11, 2020 at 12:38 AM Pop, Sebastian via Gcc-patches
 wrote:
>
> Hi,
>
> the attached patches are back-porting the flag -moutline-atomics to the 
gcc-7-arm vendor branch.
> The flag enables a very important performance optimization for 
Neoverse N1 processors.
> The patches pass bootstrap and make check on Graviton2 aarch64-linux.
>
> Ok to commit to the gcc-7-arm vendor branch?

Given the branch doesn't exist yet, can you eventually push this series to
a user branch (or an Amazon vendor branch)?

You failed to CC arm folks so your mail might have been lost in the noise.

Thanks,
Richard.

> Thanks,
> Sebastian
>



Re: [PATCH] libstdc++: Specialize ranges::__detail::__box for semiregular types

2020-09-24 Thread Jonathan Wakely via Gcc-patches

On 24/09/20 09:04 -0400, Patrick Palka via Libstdc++ wrote:

The class template semiregular-box defined in [range.semi.wrap] is
used by a number of views to accommodate non-semiregular subobjects
while ensuring that the overall view remains semiregular.  It provides
a stand-in default constructor, copy assignment operator and move
assignment operator whenever the underlying type lacks them.  The
wrapper derives from std::optional to support default construction
when T is not default constructible.

It would be nice for this wrapper to essentially be a no-op when the
underlying type is already semiregular, but this is currently not the
case due to its use of std::optional, which incurs space overhead
compared to storing just T.

To that end, this patch specializes the semiregular wrapper for
semiregular T.  Compared to the primary template, this specialization
uses less space and it allows [[no_unique_address]] to optimize away
wrapped data members whose underlying type is empty and semiregular
(e.g. a non-capturing lambda).  This patch also applies
[[no_unique_address]] to the five data members that currently use the
wrapper.

Tested on x86_64-pc-linux-gnu, does this look OK to commit?

libstdc++-v3/ChangeLog:

* include/std/ranges (__detail::__boxable): Split out the
associated constraints of __box into here.
(__detail::__box): Use the __boxable concept.  Define a leaner
partial specialization for semiregular types.
(single_view::_M_value): Mark it [[no_unique_address]].
(filter_view::_M_pred): Likewise.
(transform_view::_M_fun): Likewise.
(take_while_view::_M_pred): Likewise.
(drop_while_view::_M_pred):: Likewise.
* testsuite/std/ranges/adaptors/detail/semiregular_box.cc: New
test.
---
libstdc++-v3/include/std/ranges   | 68 +++--
.../ranges/adaptors/detail/semiregular_box.cc | 73 +++
2 files changed, 135 insertions(+), 6 deletions(-)
create mode 100644 
libstdc++-v3/testsuite/std/ranges/adaptors/detail/semiregular_box.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index e7fa4493612..8a302a7918f 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -86,7 +86,10 @@ namespace ranges

  namespace __detail
  {
-template requires is_object_v<_Tp>
+template
+  concept __boxable = copy_constructible<_Tp> && is_object_v<_Tp>;
+
+template<__boxable _Tp>
  struct __box : std::optional<_Tp>
  {
using std::optional<_Tp>::optional;
@@ -130,6 +133,59 @@ namespace ranges
}
  };

+// For types which are already semiregular, this specialization of the
+// semiregular wrapper stores the object directly without going through
+// std::optional.  It provides the subset of the primary template's API
+// that we currently use.
+template<__boxable _Tp> requires semiregular<_Tp>
+  struct __box<_Tp>
+  {
+  private:
+   [[no_unique_address]] _Tp _M_value;
+
+  public:
+   __box() = default;
+
+   constexpr
+   __box(const _Tp& __t)
+   noexcept(is_nothrow_copy_constructible_v<_Tp>)
+   : _M_value{__t}
+   { }
+
+   constexpr
+   __box(_Tp&& __t)


To be consistent with optional, these constructors should be
conditionally explicit (and since we're in C++20 code here, we can
actually use explicit(bool) rather than needing two overloads of each
constructor).

But I think we could just make them unconditionally explicit, since we
only ever construct them explicitly. No need to allow implicit
conversions if we never need them.
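
For reference, a rough sketch of the explicit(bool) form (illustrative
only, untested; the conditions just mirror what optional does for its
converting constructors) would be something like:

	constexpr explicit(!is_convertible_v<const _Tp&, _Tp>)
	__box(const _Tp& __t)
	noexcept(is_nothrow_copy_constructible_v<_Tp>)
	: _M_value{__t}
	{ }

	constexpr explicit(!is_convertible_v<_Tp, _Tp>)
	__box(_Tp&& __t)
	noexcept(is_nothrow_move_constructible_v<_Tp>)
	: _M_value{std::move(__t)}
	{ }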

Otherwise this looks great, please push. It's an ABI change for the
types using __box, so isn't appropriate for backporting to gcc-10
(unlike most changes to <ranges>).




Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 3:27 PM Richard Biener
 wrote:
>
> On Thu, Sep 24, 2020 at 10:21 AM xionghu luo  wrote:
> >
> > Hi Segher,
> >
> > The attached two patches are updated and split from
> >  "[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple 
> > [PR79251]"
> > as your comments.
> >
> >
> > [PATCH v3 2/3] rs6000: Fix lvsl mode and change 
> > rs6000_expand_vector_set param
> >
> > This one is preparation work: fix the lvsl arg mode and make
> > rs6000_expand_vector_set support both constant and variable index input.
> >
> >
> > [PATCH v3 2/3] rs6000: Support variable insert and Expand vec_insert in 
> > expander [PR79251]
> >
> > This one builds the VIEW_CONVERT_EXPR and expands the IFN VEC_SET to fast code.
>
> I'll just comment that
>
> xxperm 34,34,33
> xxinsertw 34,0,12
> xxperm 34,34,32

Btw, on x86_64 the following produces sth reasonable:

#define N 32
typedef int T;
typedef T V __attribute__((vector_size(N)));
V setg (V v, int idx, T val)
{
  V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
  v = (v & ~mask) | (valv & mask);
  return v;
}

vmovd   %edi, %xmm1
vpbroadcastd%xmm1, %ymm1
vpcmpeqd.LC0(%rip), %ymm1, %ymm2
vpblendvb   %ymm2, %ymm1, %ymm0, %ymm0
ret

I'm quite sure you could do sth similar on power?
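
For illustration only (untested), the same splat/compare/select shape
written with AltiVec intrinsics could look like:

#include <altivec.h>

/* Hypothetical sketch, not compiler output: splat the index, compare it
   against the lane numbers, then select the new value in that lane.  */
vector int
setg_vsel (vector int v, int idx, int val)
{
  vector int validx = vec_splats (idx);
  vector int lanes = { 0, 1, 2, 3 };
  vector bool int mask = vec_cmpeq (lanes, validx);
  return vec_sel (v, vec_splats (val), mask);
}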

> doesn't look like a variable-position insert instruction but
> this is a variable whole-vector rotate plus an insert at index zero
> followed by a variable whole-vector rotate.  I'm not fluent in
> ppc assembly but
>
> rlwinm 6,6,2,28,29
> mtvsrwz 0,5
> lvsr 1,0,6
> lvsl 0,0,6
>
> possibly computes the shift masks for r33/r32?  though
> I do not see those registers mentioned...
>
> This might be a generic viable expansion strategy btw,
> which is why I asked before whether the CPU supports
> inserts at a variable position ...  the building blocks are
> already there with vec_set at constant zero position
> plus vec_perm_const for the rotates.
>
> But well, I did ask this question.  Multiple times.
>
> ppc does _not_ have a VSX instruction
> like xxinsertw r34, r8, r12 where r8 denotes
> the vector element (or byte position or whatever).
>
> So I don't think vec_set with a variable index is the
> best approach.
> Xionghu - you said even without the patch the stack
> storage is eventually elided but
>
> addi 9,1,-16
> rldic 6,6,2,60
> stxv 34,-16(1)
> stwx 5,9,6
> lxv 34,-16(1)
>
> still shows stack(?) store/load with a bad STLF penalty.
>
> Richard.
>
> >
> > Thanks,
> > Xionghu


[PATCH] Add cgraph_edge::debug function.

2020-09-24 Thread Martin Liška

I find it handy to be able to debug a cgraph_edge, and I've dumped it manually many times.
Maybe it's time to add such a function? Example output:

(gdb) p e->debug()
ag/9 -> h/3 (1 (adjusted),0.25 per call)

ag/9 (ag) @0x7773eca8
  Type: function definition analyzed
  Visibility: public
  next sharing asm name: 7
  References: table/5 (addr)
  Referring:
  Function ag/9 is inline copy in ap/4
  Clone of ag/7
  Availability: local
  Function flags: count:2 (adjusted) first_run:6 body local hot
  Called by: ai/8 (inlined) (indirect_inlining) (4 (adjusted),1.00 per call)
  Calls: h/3 (1 (adjusted),0.25 per call)
h/3 (h) @0x7772b438
  Type: function definition analyzed
  Visibility: externally_visible public
  References: ap/4 (addr)
  Referring:
  Availability: available
  Profile id: 1806506296
  Function flags: count:4 (precise) first_run:3 body hot
  Called by: ag/9 (1 (adjusted),0.25 per call) ag/7 (1 (adjusted),0.25 per 
call) ag/0 (2 (estimated locally, globally 0 adjusted),0.50 per call) bug/2 (1 
(precise),1.00 per call) bug/2 (1 (precise),1.00 per call)
  Calls: ai/1 (4 (precise),1.00 per call)

(gdb) p ie->debug()
ai/1 -> (null) (speculative) (0 (adjusted),0.00 per call)

ai/1 (ai) @0x7772b168
  Type: function definition analyzed
  Visibility: prevailing_def_ironly
  previous sharing asm name: 8
  References: table/5 (addr) ap/4 (addr) (speculative) ag/0 (addr) (speculative)
  Referring:
  Function ai/1 is inline copy in h/3
  Availability: local
  Profile id: 1923518911
  Function flags: count:4 (precise) first_run:4 body local hot
  Called by: h/3 (inlined) (4 (precise),1.00 per call)
  Calls: ag/7 (speculative) (inlined) (2 (adjusted),0.50 per call) ap/4 
(speculative) (2 (adjusted),0.50 per call) PyErr_Format/6 (0 (precise),0.00 per 
call)
   Indirect call(speculative) (0 (adjusted),0.00 per call)  of param:1 (vptr 
maybe changed) Num speculative call targets: 2

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* cgraph.c (cgraph_edge::debug): New.
* cgraph.h (cgraph_edge::debug): New.
---
 gcc/cgraph.c | 14 ++
 gcc/cgraph.h |  3 +++
 2 files changed, 17 insertions(+)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index b43adaac7c0..46c3b124b1a 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2072,6 +2072,20 @@ cgraph_edge::dump_edge_flags (FILE *f)
 fprintf (f, "(can throw external) ");
 }
 
+/* Dump edge to stderr.  */

+
+void
+cgraph_edge::debug (void)
+{
+  fprintf (stderr, "%s -> %s ", caller->dump_asm_name (),
+  callee == NULL ? "(null)" : callee->dump_asm_name ());
+  dump_edge_flags (stderr);
+  fprintf (stderr, "\n\n");
+  caller->debug ();
+  if (callee != NULL)
+callee->debug ();
+}
+
 /* Dump call graph node to file F.  */
 
 void

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0211f08964f..96d6cf609fe 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -2022,6 +2022,9 @@ private:
   /* Output flags of edge to a file F.  */
   void dump_edge_flags (FILE *f);
 
+  /* Dump edge to stderr.  */

+  void DEBUG_FUNCTION debug (void);
+
   /* Verify that call graph edge corresponds to DECL from the associated
  statement.  Return true if the verification should fail.  */
   bool verify_corresponds_to_fndecl (tree decl);
--
2.28.0



[PATCH] libstdc++: Fix Unicode codecvt and add tests [PR86419]

2020-09-24 Thread Dimitrij Mijoski via Gcc-patches
Fixes the conversion from UTF-8 to UTF-16 to properly return partial
instead of ok.
Fixes the conversion from UTF-16 to UTF-8 to properly return partial
instead of ok.
Fixes the conversion from UTF-8 to UCS-2 to properly return partial
instead of error.
Fixes the conversion from UTF-8 to UCS-2 to treat 4-byte UTF-8 sequences
as an error just by seeing the leading byte.
Fixes UTF-8 decoding for all codecvts so they detect an error at the end of
the input range when the last code point is also incomplete.

The testsuite is large and may need splitting into multiple files.

libstdc++-v3/ChangeLog:
PR libstdc++/86419
* src/c++11/codecvt.cc: Fix bugs.
* testsuite/22_locale/codecvt/codecvt_unicode.cc: New tests.
---
 libstdc++-v3/src/c++11/codecvt.cc |   25 +-
 .../22_locale/codecvt/codecvt_unicode.cc  | 1310 +
 2 files changed, 1323 insertions(+), 12 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc

diff --git a/libstdc++-v3/src/c++11/codecvt.cc 
b/libstdc++-v3/src/c++11/codecvt.cc
index 0311b15177d0..4545ba1b5933 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -277,13 +277,15 @@ namespace
 }
 else if (c1 < 0xF0) // 3-byte sequence
 {
-  if (avail < 3)
+  if (avail < 2)
return incomplete_mb_character;
   unsigned char c2 = from[1];
   if ((c2 & 0xC0) != 0x80)
return invalid_mb_sequence;
   if (c1 == 0xE0 && c2 < 0xA0) // overlong
return invalid_mb_sequence;
+  if (avail < 3)
+   return incomplete_mb_character;
   unsigned char c3 = from[2];
   if ((c3 & 0xC0) != 0x80)
return invalid_mb_sequence;
@@ -292,9 +294,9 @@ namespace
from += 3;
   return c;
 }
-else if (c1 < 0xF5) // 4-byte sequence
+else if (c1 < 0xF5 && maxcode > 0xFFFF) // 4-byte sequence
 {
-  if (avail < 4)
+  if (avail < 2)
return incomplete_mb_character;
   unsigned char c2 = from[1];
   if ((c2 & 0xC0) != 0x80)
@@ -302,10 +304,14 @@ namespace
   if (c1 == 0xF0 && c2 < 0x90) // overlong
return invalid_mb_sequence;
   if (c1 == 0xF4 && c2 >= 0x90) // > U+10FFFF
-  return invalid_mb_sequence;
+   return invalid_mb_sequence;
+  if (avail < 3)
+   return incomplete_mb_character;
   unsigned char c3 = from[2];
   if ((c3 & 0xC0) != 0x80)
return invalid_mb_sequence;
+  if (avail < 4)
+   return incomplete_mb_character;
   unsigned char c4 = from[3];
   if ((c4 & 0xC0) != 0x80)
return invalid_mb_sequence;
@@ -540,12 +546,7 @@ namespace
auto orig = from;
const char32_t codepoint = read_utf8_code_point(from, maxcode);
if (codepoint == incomplete_mb_character)
- {
-   if (s == surrogates::allowed)
- return codecvt_base::partial;
-   else
- return codecvt_base::error; // No surrogates in UCS2
- }
+ return codecvt_base::partial;
if (codepoint > maxcode)
  return codecvt_base::error;
if (!write_utf16_code_point(to, codepoint, mode))
@@ -554,7 +555,7 @@ namespace
return codecvt_base::partial;
  }
   }
-return codecvt_base::ok;
+return from.size() ? codecvt_base::partial : codecvt_base::ok;
   }
 
   // utf16 -> utf8 (or ucs2 -> utf8 if s == surrogates::disallowed)
@@ -576,7 +577,7 @@ namespace
  return codecvt_base::error; // No surrogates in UCS-2
 
if (from.size() < 2)
- return codecvt_base::ok; // stop converting at this point
+ return codecvt_base::partial; // stop converting at this point
 
const char32_t c2 = from[1];
if (is_low_surrogate(c2))
diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc 
b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc
new file mode 100644
index ..88afd49206d1
--- /dev/null
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc
@@ -0,0 +1,1310 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+
+#include 
+#include 
+#include 
+#include 
+
+using namespace std;
+
+template 

Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 10:21 AM xionghu luo  wrote:
>
> Hi Segher,
>
> The attached two patches are updated and split from
>  "[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple 
> [PR79251]"
> as your comments.
>
>
> [PATCH v3 2/3] rs6000: Fix lvsl mode and change rs6000_expand_vector_set 
> param
>
> This one is preparation work: fix the lvsl arg mode and make
> rs6000_expand_vector_set support both constant and variable index input.
>
>
> [PATCH v3 2/3] rs6000: Support variable insert and Expand vec_insert in 
> expander [PR79251]
>
> This one builds the VIEW_CONVERT_EXPR and expands the IFN VEC_SET to fast code.

I'll just comment that

xxperm 34,34,33
xxinsertw 34,0,12
xxperm 34,34,32

doesn't look like a variable-position insert instruction but
this is a variable whole-vector rotate plus an insert at index zero
followed by a variable whole-vector rotate.  I'm not fluent in
ppc assembly but

rlwinm 6,6,2,28,29
mtvsrwz 0,5
lvsr 1,0,6
lvsl 0,0,6

possibly computes the shift masks for r33/r32?  though
I do not see those registers mentioned...

This might be a generic viable expansion strategy btw,
which is why I asked before whether the CPU supports
inserts at a variable position ...  the building blocks are
already there with vec_set at constant zero position
plus vec_perm_const for the rotates.

But well, I did ask this question.  Multiple times.

ppc does _not_ have a VSX instruction
like xxinsertw r34, r8, r12 where r8 denotes
the vector element (or byte position or whatever).

So I don't think vec_set with a variable index is the
best approach.
Xionghu - you said even without the patch the stack
storage is eventually elided but

addi 9,1,-16
rldic 6,6,2,60
stxv 34,-16(1)
stwx 5,9,6
lxv 34,-16(1)

still shows stack(?) store/load with a bad STLF penalty.

Richard.

>
> Thanks,
> Xionghu


c++: local-decls are never member fns [PR97186]

2020-09-24 Thread Nathan Sidwell


This fixes an ICE in noexcept instantiation.  It was presuming
functions always have template_info, but that changed with my
DECL_LOCAL_DECL_P changes.  Fortunately DECL_LOCAL_DECL_P fns are
never member fns, so we don't need to go fishing out a this pointer.

Also I realized I'd misnamed local10.C, so renaming it local-fn3.C,
and while there adding the effective-target lto that David E pointed
out was missing.

PR c++/97186
gcc/cp/
* pt.c (maybe_instantiate_noexcept): Local externs are never
member fns.
gcc/testsuite/
* g++.dg/template/local10.C: Rename ...
* g++.dg/template/local-fn3.C: .. here.  Require lto.
* g++.dg/template/local-fn4.C: New.

pushing to trunk

nathan
--
Nathan Sidwell
diff --git c/gcc/cp/pt.c w/gcc/cp/pt.c
index 1ec039d0793..62e85095bc4 100644
--- c/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -25397,15 +25397,20 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain)
 	  push_deferring_access_checks (dk_no_deferred);
 	  input_location = DECL_SOURCE_LOCATION (fn);
 
-	  /* If needed, set current_class_ptr for the benefit of
-	 tsubst_copy/PARM_DECL.  */
-	  tree tdecl = DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (fn));
-	  if (DECL_NONSTATIC_MEMBER_FUNCTION_P (tdecl))
+	  if (!DECL_LOCAL_DECL_P (fn))
 	{
-	  tree this_parm = DECL_ARGUMENTS (tdecl);
-	  current_class_ptr = NULL_TREE;
-	  current_class_ref = cp_build_fold_indirect_ref (this_parm);
-	  current_class_ptr = this_parm;
+	  /* If needed, set current_class_ptr for the benefit of
+		 tsubst_copy/PARM_DECL.  The exception pattern will
+		 refer to the parm of the template, not the
+		 instantiation.  */
+	  tree tdecl = DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (fn));
+	  if (DECL_NONSTATIC_MEMBER_FUNCTION_P (tdecl))
+		{
+		  tree this_parm = DECL_ARGUMENTS (tdecl);
+		  current_class_ptr = NULL_TREE;
+		  current_class_ref = cp_build_fold_indirect_ref (this_parm);
+		  current_class_ptr = this_parm;
+		}
 	}
 
 	  /* If this function is represented by a TEMPLATE_DECL, then
diff --git c/gcc/testsuite/g++.dg/template/local10.C w/gcc/testsuite/g++.dg/template/local-fn3.C
similarity index 87%
rename from gcc/testsuite/g++.dg/template/local10.C
rename to gcc/testsuite/g++.dg/template/local-fn3.C
index a2ffc1e7306..2affe235bd3 100644
--- c/gcc/testsuite/g++.dg/template/local10.C
+++ w/gcc/testsuite/g++.dg/template/local-fn3.C
@@ -1,4 +1,6 @@
 // PR c++/97171
+
+// { dg-require-effective-target lto }
 // { dg-additional-options -flto }
 
 template 
diff --git c/gcc/testsuite/g++.dg/template/local-fn4.C w/gcc/testsuite/g++.dg/template/local-fn4.C
new file mode 100644
index 000..4699012accc
--- /dev/null
+++ w/gcc/testsuite/g++.dg/template/local-fn4.C
@@ -0,0 +1,21 @@
+// PR c++/97186
+// ICE in exception spec substitution
+
+
+template 
+struct no {
+  static void
+  tg ()
+  {
+void
+  hk () noexcept (tg); // { dg-error "convert" }
+
+hk ();
+  }
+};
+
+void
+os ()
+{
+  no ().tg ();
+}


Re: [PATCH 1/1] arm: [testsuite] Skip thumb2-cond-cmp tests on Cortex-M [PR94595]

2020-09-24 Thread Christophe Lyon via Gcc-patches
Ping?

On Mon, 7 Sep 2020 at 18:13, Christophe Lyon  wrote:
>
> Since r204778 (g571880a0a4c512195aa7d41929ba6795190887b2), we favor
> branches over IT blocks on Cortex-M. As a result, instead of
> generating two nested IT blocks in thumb2-cond-cmp-[1234].c, we
> generate either a single IT block, or use branches depending on
> conditions tested by the program.
>
> Since this was a deliberate change and the tests still pass as
> expected on Cortex-A, this patch skips them when targeting
> Cortex-M. This avoids the failures on Cortex M3, M4, and M33.  This
> patch makes the testcases unsupported on Cortex-M7 although they pass
> in this case because this CPU has different branch costs.
>
> I tried to relax the scan-assembler directives using eg. cmpne|subne
> or cmpgt|ble but that seemed fragile.
>
> OK?
>
> 2020-09-07  Christophe Lyon  
>
> gcc/testsuite/
> PR target/94595
> * gcc.target/arm/thumb2-cond-cmp-1.c: Skip if arm_cortex_m.
> * gcc.target/arm/thumb2-cond-cmp-2.c: Skip if arm_cortex_m.
> * gcc.target/arm/thumb2-cond-cmp-3.c: Skip if arm_cortex_m.
> * gcc.target/arm/thumb2-cond-cmp-4.c: Skip if arm_cortex_m.
> ---
>  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c | 2 +-
>  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c | 2 +-
>  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c | 2 +-
>  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c 
> b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> index 45ab605..36204f4 100644
> --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> @@ -1,6 +1,6 @@
>  /* Use conditional compare */
>  /* { dg-options "-O2" } */
> -/* { dg-skip-if "" { arm_thumb1_ok } } */
> +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
>  /* { dg-final { scan-assembler "cmpne" } } */
>
>  int f(int i, int j)
> diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c 
> b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> index 17d9a8f..108d1c3 100644
> --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> @@ -1,6 +1,6 @@
>  /* Use conditional compare */
>  /* { dg-options "-O2" } */
> -/* { dg-skip-if "" { arm_thumb1_ok } } */
> +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
>  /* { dg-final { scan-assembler "cmpeq" } } */
>
>  int f(int i, int j)
> diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c 
> b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> index 6b2a79b..ca7fd9f 100644
> --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> @@ -1,6 +1,6 @@
>  /* Use conditional compare */
>  /* { dg-options "-O2" } */
> -/* { dg-skip-if "" { arm_thumb1_ok } } */
> +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
>  /* { dg-final { scan-assembler "cmpgt" } } */
>
>  int f(int i, int j)
> diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c 
> b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> index 80e1076..91cc8f4 100644
> --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> @@ -1,6 +1,6 @@
>  /* Use conditional compare */
>  /* { dg-options "-O2" } */
> -/* { dg-skip-if "" { arm_thumb1_ok } } */
> +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
>  /* { dg-final { scan-assembler "cmpgt" } } */
>
>  int f(int i, int j)
> --
> 2.7.4
>


Re: [PATCH] add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges

2020-09-24 Thread Jonathan Wakely via Gcc-patches

On 24/09/20 11:11 +0200, Richard Biener wrote:

On Wed, 26 Aug 2020, Richard Biener wrote:


On Thu, 6 Aug 2020, Richard Biener wrote:

> On Thu, 6 Aug 2020, Richard Biener wrote:
>
> > This adds a move CTOR to auto_vec and makes use of a
> > auto_vec return value for get_loop_exit_edges denoting
> > that lifetime management of the vector is handed to the caller.
> >
> > The move CTOR prompted the hash_table change because it apparently
> > makes the copy CTOR implicitly deleted (good) and hash-table
> > expansion of the odr_enum_map which is
> > hash_map  where odr_enum has an
> > auto_vec member triggers this.  Not sure if
> > there's a latent bug there before this (I think we're not
> > invoking DTORs, but we're invoking copy-CTORs).
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > Does this all look sensible and is it a good change
> > (the get_loop_exit_edges one)?
>
> Regtest went OK, here's an update with a complete ChangeLog
> (how useful..) plus the move assign operator deleted, copy
> assign wouldn't work as auto-generated and at the moment
> there's no use of assigning.  I guess if we'd have functions
> that take an auto_vec<> argument meaning they will destroy
> the vector that will become useful and we can implement it.
>
> OK for trunk?

Ping.


Ping^2.


Looks good to me as far as the use of C++ features goes.



[PATCH] libstdc++: Specialize ranges::__detail::__box for semiregular types

2020-09-24 Thread Patrick Palka via Gcc-patches
The class template semiregular-box defined in [range.semi.wrap] is
used by a number of views to accommodate non-semiregular subobjects
while ensuring that the overall view remains semiregular.  It provides
a stand-in default constructor, copy assignment operator and move
assignment operator whenever the underlying type lacks them.  The
wrapper derives from std::optional to support default construction
when T is not default constructible.

It would be nice for this wrapper to essentially be a no-op when the
underlying type is already semiregular, but this is currently not the
case due to its use of std::optional, which incurs space overhead
compared to storing just T.

To that end, this patch specializes the semiregular wrapper for
semiregular T.  Compared to the primary template, this specialization
uses less space and it allows [[no_unique_address]] to optimize away
wrapped data members whose underlying type is empty and semiregular
(e.g. a non-capturing lambda).  This patch also applies
[[no_unique_address]] to the five data members that currently use the
wrapper.
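
As a rough illustration of the intended effect (a hand-written sketch,
not the new test file verbatim):

  #include <ranges>
  #include <concepts>

  namespace detail = std::ranges::__detail;

  constexpr auto pred = [](int i) { return i == 42; };

  // With the specialization, wrapping an empty semiregular closure costs
  // nothing: the box is as small as the closure itself and stays semiregular.
  static_assert(sizeof(detail::__box<decltype(pred)>) == sizeof(decltype(pred)));
  static_assert(std::semiregular<detail::__box<decltype(pred)>>);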

Tested on x86_64-pc-linux-gnu, does this look OK to commit?

libstdc++-v3/ChangeLog:

* include/std/ranges (__detail::__boxable): Split out the
associated constraints of __box into here.
(__detail::__box): Use the __boxable concept.  Define a leaner
partial specialization for semiregular types.
(single_view::_M_value): Mark it [[no_unique_address]].
(filter_view::_M_pred): Likewise.
(transform_view::_M_fun): Likewise.
(take_while_view::_M_pred): Likewise.
(drop_while_view::_M_pred):: Likewise.
* testsuite/std/ranges/adaptors/detail/semiregular_box.cc: New
test.
---
 libstdc++-v3/include/std/ranges   | 68 +++--
 .../ranges/adaptors/detail/semiregular_box.cc | 73 +++
 2 files changed, 135 insertions(+), 6 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/std/ranges/adaptors/detail/semiregular_box.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index e7fa4493612..8a302a7918f 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -86,7 +86,10 @@ namespace ranges
 
   namespace __detail
   {
-template<typename _Tp> requires is_object_v<_Tp>
+template<typename _Tp>
+  concept __boxable = copy_constructible<_Tp> && is_object_v<_Tp>;
+
+template<__boxable _Tp>
   struct __box : std::optional<_Tp>
   {
using std::optional<_Tp>::optional;
@@ -130,6 +133,59 @@ namespace ranges
}
   };
 
+// For types which are already semiregular, this specialization of the
+// semiregular wrapper stores the object directly without going through
+// std::optional.  It provides the subset of the primary template's API
+// that we currently use.
+template<__boxable _Tp> requires semiregular<_Tp>
+  struct __box<_Tp>
+  {
+  private:
+   [[no_unique_address]] _Tp _M_value;
+
+  public:
+   __box() = default;
+
+   constexpr
+   __box(const _Tp& __t)
+   noexcept(is_nothrow_copy_constructible_v<_Tp>)
+   : _M_value{__t}
+   { }
+
+   constexpr
+   __box(_Tp&& __t)
+   noexcept(is_nothrow_move_constructible_v<_Tp>)
+   : _M_value{std::move(__t)}
+   { }
+
+   template<typename... _Args>
+ requires constructible_from<_Tp, _Args...>
+ constexpr
+ __box(in_place_t, _Args&&... __args)
+ noexcept(is_nothrow_constructible_v<_Tp, _Args...>)
+ : _M_value{std::forward<_Args>(__args)...}
+ { }
+
+   constexpr bool
+   has_value() const noexcept
+   { return true; };
+
+   constexpr _Tp&
+   operator*() noexcept
+   { return _M_value; }
+
+   constexpr const _Tp&
+   operator*() const noexcept
+   { return _M_value; }
+
+   constexpr _Tp*
+   operator->() noexcept
+   { return &_M_value; }
+
+   constexpr const _Tp*
+   operator->() const noexcept
+   { return &_M_value; }
+  };
   } // namespace __detail
 
   /// A view that contains exactly one element.
@@ -185,7 +241,7 @@ namespace ranges
   { return _M_value.operator->(); }
 
 private:
-  __detail::__box<_Tp> _M_value;
+  [[no_unique_address]] __detail::__box<_Tp> _M_value;
 };
 
   namespace __detail
@@ -1195,7 +1251,7 @@ namespace views
   };
 
   _Vp _M_base = _Vp();
-  __detail::__box<_Pred> _M_pred;
+  [[no_unique_address]] __detail::__box<_Pred> _M_pred;
   [[no_unique_address]] __detail::_CachedPosition<_Vp> _M_cached_begin;
 
 public:
@@ -1533,7 +1589,7 @@ namespace views
};
 
   _Vp _M_base = _Vp();
-  __detail::__box<_Fp> _M_fun;
+  [[no_unique_address]] __detail::__box<_Fp> _M_fun;
 
 public:
   transform_view() = default;
@@ -1787,7 +1843,7 @@ namespace views
};
 
   _Vp _M_base = _Vp();
-  __detail::__box<_Pred> _M_pred;
+  

Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-09-24 Thread Richard Biener
On Thu, 24 Sep 2020, Tom de Vries wrote:

> On 9/24/20 1:42 PM, Richard Biener wrote:
> > On Wed, 23 Sep 2020, Tom de Vries wrote:
> > 
> >> On 9/23/20 9:28 AM, Richard Biener wrote:
> >>> On Tue, 22 Sep 2020, Tom de Vries wrote:
> >>>
>  [ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
>  with SIMT LANE [PR95654] ]
> 
>  On 9/16/20 8:20 PM, Alexander Monakov wrote:
> >
> >
> > On Wed, 16 Sep 2020, Tom de Vries wrote:
> >
> >> [ cc-ing author omp support for nvptx. ]
> >
> > The issue looks familiar. I recognized it back in 2017 (and LLVM people
> > recognized it too for their GPU targets). In an attempt to get agreement
> > to fix the issue "properly" for GCC I found a similar issue that affects
> > all targets, not just offloading, and filed it as PR 80053.
> >
> > (yes, there are no addressable labels involved in offloading, but 
> > nevertheless
> > the nature of the middle-end issue is related)
> 
>  Hi Alexander,
> 
>  thanks for looking into this.
> 
>  Seeing that the attempt to fix things properly is stalled, for now I'm
>  proposing a point-fix, similar to the original patch proposed by Tobias.
> 
>  Richi, Jakub, OK for trunk?
> >>>
> >>> I notice that we call ignore_bb_p many times in tracer.c but one call
> >>> is conveniently early in tail_duplicate (void):
> >>>
> >>>   int n = count_insns (bb);
> >>>   if (!ignore_bb_p (bb))
> >>> blocks[bb->index] = heap.insert (-bb->count.to_frequency (cfun), 
> >>> bb);
> >>>
> >>> where count_insns already walks all stmts in the block.  It would be
> >>> nice to avoid repeatedly walking all stmts, maybe adjusting the above
> >>> call is enough and/or count_insns can compute this and/or the ignore_bb_p
> >>> result can be cached (optimize_bb_for_size_p might change though,
> >>> but maybe all other ignore_bb_p calls effectively just are that,
> >>> checks for blocks that became optimize_bb_for_size_p).
> >>>
> >>
> >> This untested follow-up patch tries something in that direction.
> >>
> >> Is this what you meant?
> > 
> > Yeah, sort of.
> > 
> > +static bool
> > +cached_can_duplicate_bb_p (const_basic_block bb)
> > +{
> > +  if (can_duplicate_bb)
> > 
> > is there any path where can_duplicate_bb would be NULL?
> > 
> 
> Yes, ignore_bb_p is called from gimple-ssa-split-paths.c.

Oh, that was probably done because of the very same OMP issue ...

> > +{
> > +  unsigned int size = SBITMAP_SIZE (can_duplicate_bb);
> > +  /* Assume added bb's should be ignored.  */
> > +  if ((unsigned int)bb->index < size
> > + && bitmap_bit_p (can_duplicate_bb_computed, bb->index))
> > +   return !bitmap_bit_p (can_duplicate_bb, bb->index);
> > 
> > yes, newly added bbs should be ignored so,
> > 
> >  }
> >  
> > -  return false;
> > +  bool val = compute_can_duplicate_bb_p (bb);
> > +  if (can_duplicate_bb)
> > +cache_can_duplicate_bb_p (bb, val);
> > 
> > no need to compute & cache for them, just return true (because
> > we did duplicate them)?
> > 
> 
> Also the case for gimple-ssa-split-paths.c.?

If it had the bitmap then yes ... since it doesn't the early
out should be in the conditional above only.

Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer


Re: [PATCH] Add if-chain to switch conversion pass.

2020-09-24 Thread Richard Biener via Gcc-patches
On Wed, Sep 2, 2020 at 1:53 PM Martin Liška  wrote:
>
> On 9/1/20 4:50 PM, David Malcolm wrote:
> > Hope this is constructive
> > Dave
>
> Thank you David. All of them very very useful!
>
> There's updated version of the patch.

I noticed several functions without a function-level comment.

-  cluster (tree case_label_expr, basic_block case_bb, profile_probability prob,
-  profile_probability subtree_prob);
+  inline cluster (tree case_label_expr, basic_block case_bb,
+ profile_probability prob, profile_probability subtree_prob);

I thought we generally leave this to the compiler ...

+@item -fconvert-if-to-switch
+@opindex fconvert-if-to-switch
+Perform conversion of an if cascade into a switch statement.
+Do so if the switch can be later transformed using a jump table
+or a bit test.  The transformation can help to produce faster code for
+the switch statement.  This flag is enabled by default
+at @option{-O2} and higher.

this mentions we do this only when we later can convert the
switch again but both passes (we still have two :/) have
independent guards.

+  /* For now, just wipe the dominator information.  */
+  free_dominance_info (CDI_DOMINATORS);

could at least be conditional on the vop renaming condition...

+  if (!all_candidates.is_empty ())
+mark_virtual_operands_for_renaming (fun);

+  if (bitmap_bit_p (*visited_bbs, bb->index))
+   break;
+  bitmap_set_bit (*visited_bbs, bb->index);

since you are using a bitmap and not a sbitmap (why?)
you can combine those into

   if (!bitmap_set_bit (*visited_bbs, bb->index))
break;

+  /* Currently we support the following patterns (situations):
+
+1) if condition with equal operation:
+
...

did you see whether using

   register_edge_assert_for (lhs, true_edge, code, lhs, rhs, asserts);

works equally well?  It fills the 'asserts' vector with relations
derived from 'lhs'.  There's also
vr_values::extract_range_for_var_from_comparison_expr
to compute the case_range

+  /* If it's not the first condition, then we need a BB without
+any statements.  */
+  if (!first)
+   {
+ unsigned stmt_count = 0;
+ for (gimple_stmt_iterator gsi = gsi_start_nondebug_bb (bb);
+  !gsi_end_p (gsi); gsi_next_nondebug ())
+   ++stmt_count;
+
+ if (stmt_count - visited_stmt_count != 0)
+   break;

hmm, OK, this might be a bit iffy to get correct then, still it's a lot
of pattern matching code that is there elsewhere already.
ifcombine simply hoists any stmts without side-effects up the
dominator tree and thus only requires BBs without side-effects
(IIRC there's a predicate fn for that).

+  /* Prevent losing information for a PHI node where 2 edges will
+be folded into one.  Note that we must do the same also for false_edge
+(for last BB in a if-elseif chain).  */
+  if (!chain->record_phi_arguments (true_edge)
+ || !chain->record_phi_arguments (false_edge))

I don't really get this - looking at record_phi_arguments it seems
we're requiring that all edges into the same PHI from inside the case
(irrespective of from which case label) have the same value for the
PHI arg?

+ if (arg != *v)
+   return false;

should use operand_equal_p at least, REAL_CSTs are for example
not shared tree nodes.  I'll also notice that if record_phi_arguments
fails we still may have altered its hash-map even though the particular
edge will not participate in the current chain, so it will affect other
chains ending in the same BB.  Overall this looks a bit too conservative
(and random, based on visiting order).

+expanded_location loc
+= expand_location (gimple_location (chain->m_first_condition));
+  if (dump_file)
+   {
+ fprintf (dump_file, "Condition chain (at %s:%d) with %d conditions "
+  "(%d BBs) transformed into a switch statement.\n",
+  loc.file, loc.line, total_case_values,
+  chain->m_entries.length ());

Use dump_printf_loc and you can pass a gimple * stmt as location.

+  /* Follow if-elseif-elseif chain.  */
+  bb = false_edge->dest;

so that means the code doesn't handle a tree, right?  But what
makes us sure the chain doesn't continue on the true_edge instead,
guess this degenerate tree isn't handled either.

I was thinking on whether doing the switch discovery in a reverse
CFG walk, recording for each BB what case_range(s) it represents
for a particular variable(s) so when visiting a dominator you
can quickly figure what's the relevant children (true, false or both).
It would also make the matching a BB-local operation where you'd
do the case_label discovery based on the single-pred BBs gimple-cond.

+  output = bit_test_cluster::find_bit_tests (filtered_clusters);
+  r = output.length () < filtered_clusters.length ();
+  if (r)
+dump_clusters (, "BT can be built");

so as of the very above comment - this might be guarded 

Re: [PATCH v3 1/2] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR

2020-09-24 Thread Richard Sandiford
xionghu luo  writes:
> @@ -2658,6 +2659,43 @@ expand_vect_cond_mask_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>  
>  #define expand_vec_cond_mask_optab_fn expand_vect_cond_mask_optab_fn
>  
> +/* Expand VEC_SET internal functions.  */
> +
> +static void
> +expand_vec_set_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree op0 = gimple_call_arg (stmt, 0);
> +  tree op1 = gimple_call_arg (stmt, 1);
> +  tree op2 = gimple_call_arg (stmt, 2);
> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  rtx src = expand_expr (op0, NULL_RTX, VOIDmode, EXPAND_WRITE);

I'm not sure about the expand_expr here.  ISTM that op0 is a normal
input and so should be expanded by expand_normal rather than
EXPAND_WRITE.  Also:

> +
> +  machine_mode outermode = TYPE_MODE (TREE_TYPE (op0));
> +  scalar_mode innermode = GET_MODE_INNER (outermode);
> +
> +  rtx value = expand_expr (op1, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> +  rtx pos = expand_expr (op2, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> +
> +  class expand_operand ops[3];
> +  enum insn_code icode = optab_handler (optab, outermode);
> +
> +  if (icode != CODE_FOR_nothing)
> +{
> +  pos = convert_to_mode (E_SImode, pos, 0);
> +
> +  create_fixed_operand ([0], src);

...this would mean that if SRC happens to be a MEM, the pattern
must also accept a MEM.

ISTM that we're making more work for ourselves by not “fixing” the optab
to have a natural pure-input + pure-output interface. :-)  But if we
stick with the current optab interface, I think we need to:

- create a temporary register
- move SRC into the temporary register before the insn
- use create_fixed_operand with the temporary register for operand 0
- move the temporary register into TARGET after the insn

> +  create_input_operand ([1], value, innermode);
> +  create_input_operand ([2], pos, GET_MODE (pos));

For this I think we should use convert_operand_from on the original “pos”,
so that the target gets to choose what the mode of the operand is.
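
Putting those pieces together, a rough sketch (untested, reusing the
names from the patch above) would be:

      /* Sketch only: copy the input vector into a fresh pseudo, let the
	 vec_set pattern update that pseudo in place, then move the result
	 into the real target.  */
      rtx reg = gen_reg_rtx (outermode);
      emit_move_insn (reg, expand_normal (op0));

      class expand_operand ops[3];
      create_fixed_operand (&ops[0], reg);
      create_input_operand (&ops[1], value, innermode);
      create_convert_operand_from (&ops[2], pos,
				   TYPE_MODE (TREE_TYPE (op2)),
				   TYPE_UNSIGNED (TREE_TYPE (op2)));
      expand_insn (icode, 3, ops);

      emit_move_insn (target, reg);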

Thanks,
Richard


[committed][testsuite, nvptx] Fix gcc.dg/tls/thr-cse-1.c

2020-09-24 Thread Tom de Vries
Hi,

With nvptx, we run into:
...
FAIL: gcc.dg/tls/thr-cse-1.c scan-assembler-not \
  emutls_get_address.*emutls_get_address.*
...
because the nvptx assembly looks like:
...
  call (%value_in), __emutls_get_address, (%out_arg1);
  ...
// BEGIN GLOBAL FUNCTION DECL: __emutls_get_address
.extern .func (.param.u64 %value_out) __emutls_get_address (.param.u64 %in_ar0);
...

Fix this by checking the slim final dump instead, where we have just:
...
   12: r35:DI=call [`__emutls_get_address'] argc:0
...

Committed to trunk.

Thanks,
- Tom

[testsuite, nvptx] Fix gcc.dg/tls/thr-cse-1.c

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* gcc.dg/tls/thr-cse-1.c: Scan final dump instead of assembly for
nvptx.

---
 gcc/testsuite/gcc.dg/tls/thr-cse-1.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tls/thr-cse-1.c 
b/gcc/testsuite/gcc.dg/tls/thr-cse-1.c
index 84eedfdb226..7145671eb95 100644
--- a/gcc/testsuite/gcc.dg/tls/thr-cse-1.c
+++ b/gcc/testsuite/gcc.dg/tls/thr-cse-1.c
@@ -4,6 +4,7 @@
registers and thus getting the counts wrong.  */
 /* { dg-additional-options "-mshort-calls" { target epiphany-*-* } } */
 /* { dg-require-effective-target tls_emulated } */
+/* { dg-additional-options "-fdump-rtl-final-slim" { target nvptx-*-* } }*/
 
 /* Test that we only get one call to emutls_get_address when CSE is
active.  Note that the var _must_ be initialized for the scan asm
@@ -18,10 +19,12 @@ int foo (int b, int c, int d)
   return a;
 }
 
-/* { dg-final { scan-assembler-not "emutls_get_address.*emutls_get_address.*" 
{ target { ! { "*-wrs-vxworks"  "*-*-darwin8"  "hppa*-*-hpux*" "i?86-*-mingw*" 
"x86_64-*-mingw*" visium-*-* } } } } } */
+/* { dg-final { scan-assembler-not "emutls_get_address.*emutls_get_address.*" 
{ target { ! { "*-wrs-vxworks"  "*-*-darwin8"  "hppa*-*-hpux*" "i?86-*-mingw*" 
"x86_64-*-mingw*" visium-*-* nvptx-*-* } } } } } */
 /* { dg-final { scan-assembler-not 
"call\tL___emutls_get_address.stub.*call\tL___emutls_get_address.stub.*" { 
target "*-*-darwin8" } } } */
 /* { dg-final { scan-assembler-not "(b,l|bl) __emutls_get_address.*(b,l|bl) 
__emutls_get_address.*" { target "hppa*-*-hpux*" } } } */
 /* { dg-final { scan-assembler-not "tls_lookup.*tls_lookup.*" { target 
*-wrs-vxworks } } } */
 /* { dg-final { scan-assembler-not 
"call\t___emutls_get_address.*call\t___emutls_get_address" { target 
"i?86-*-mingw*" } } } */
 /* { dg-final { scan-assembler-not 
"call\t__emutls_get_address.*call\t__emutls_get_address" { target 
"x86_64-*-mingw*" } } } */
 /* { dg-final { scan-assembler-not "%l __emutls_get_address.*%l 
__emutls_get_address" { target visium-*-* } } } */
+
+/* { dg-final { scan-rtl-dump-times "emutls_get_address" 1 "final" { target 
nvptx-*-* } } } */


Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-09-24 Thread Tom de Vries
On 9/24/20 1:42 PM, Richard Biener wrote:
> On Wed, 23 Sep 2020, Tom de Vries wrote:
> 
>> On 9/23/20 9:28 AM, Richard Biener wrote:
>>> On Tue, 22 Sep 2020, Tom de Vries wrote:
>>>
 [ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
 with SIMT LANE [PR95654] ]

 On 9/16/20 8:20 PM, Alexander Monakov wrote:
>
>
> On Wed, 16 Sep 2020, Tom de Vries wrote:
>
>> [ cc-ing author omp support for nvptx. ]
>
> The issue looks familiar. I recognized it back in 2017 (and LLVM people
> recognized it too for their GPU targets). In an attempt to get agreement
> to fix the issue "properly" for GCC I found a similar issue that affects
> all targets, not just offloading, and filed it as PR 80053.
>
> (yes, there are no addressable labels involved in offloading, but 
> nevertheless
> the nature of the middle-end issue is related)

 Hi Alexander,

 thanks for looking into this.

 Seeing that the attempt to fix things properly is stalled, for now I'm
 proposing a point-fix, similar to the original patch proposed by Tobias.

 Richi, Jakub, OK for trunk?
>>>
>>> I notice that we call ignore_bb_p many times in tracer.c but one call
>>> is conveniently early in tail_duplicate (void):
>>>
>>>   int n = count_insns (bb);
>>>   if (!ignore_bb_p (bb))
>>> blocks[bb->index] = heap.insert (-bb->count.to_frequency (cfun), 
>>> bb);
>>>
>>> where count_insns already walks all stmts in the block.  It would be
>>> nice to avoid repeatedly walking all stmts, maybe adjusting the above
>>> call is enough and/or count_insns can compute this and/or the ignore_bb_p
>>> result can be cached (optimize_bb_for_size_p might change though,
>>> but maybe all other ignore_bb_p calls effectively just are that,
>>> checks for blocks that became optimize_bb_for_size_p).
>>>
>>
>> This untested follow-up patch tries something in that direction.
>>
>> Is this what you meant?
> 
> Yeah, sort of.
> 
> +static bool
> +cached_can_duplicate_bb_p (const_basic_block bb)
> +{
> +  if (can_duplicate_bb)
> 
> is there any path where can_duplicate_bb would be NULL?
> 

Yes, ignore_bb_p is called from gimple-ssa-split-paths.c.

> +{
> +  unsigned int size = SBITMAP_SIZE (can_duplicate_bb);
> +  /* Assume added bb's should be ignored.  */
> +  if ((unsigned int)bb->index < size
> + && bitmap_bit_p (can_duplicate_bb_computed, bb->index))
> +   return !bitmap_bit_p (can_duplicate_bb, bb->index);
> 
> yes, newly added bbs should be ignored so,
> 
>  }
>  
> -  return false;
> +  bool val = compute_can_duplicate_bb_p (bb);
> +  if (can_duplicate_bb)
> +cache_can_duplicate_bb_p (bb, val);
> 
> no need to compute & cache for them, just return true (because
> we did duplicate them)?
> 

Also the case for gimple-ssa-split-paths.c.?

Thanks,
- Tom


Re: [PATCH PR96757] aarch64: ICE during GIMPLE pass: vect

2020-09-24 Thread Richard Sandiford
Hi,

"duanbo (C)"  writes:
> Sorry for the late reply.

My time to apologise for the late reply.

> Thanks for your suggestions. I have modified accordingly.
> Attached please find the v1 patch. 

Thanks, the logic to choose which precision we pick looks good.
But I think the build_mask_conversions should be deferred until
after we've decided to make the transform.  So…

> @@ -4340,16 +4342,91 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
>  
>it is better for b1 and b2 to use the mask type associated
>with int elements rather bool (byte) elements.  */
> -   rhs1_type = integer_type_for_mask (TREE_OPERAND (rhs1, 0), vinfo);
> -   if (!rhs1_type)
> - rhs1_type = TREE_TYPE (TREE_OPERAND (rhs1, 0));
> +   rhs1_op0 = TREE_OPERAND (rhs1, 0);
> +   rhs1_op1 = TREE_OPERAND (rhs1, 1);
> +   if (!rhs1_op0 || !rhs1_op1)
> + return NULL;
> +   rhs1_op0_type = integer_type_for_mask (rhs1_op0, vinfo);
> +   rhs1_op1_type = integer_type_for_mask (rhs1_op1, vinfo);
> +
> +   if (!rhs1_op0_type && !rhs1_op1_type)
> + {
> +   rhs1_type = TREE_TYPE (rhs1_op0);
> +   vectype2 = get_mask_type_for_scalar_type (vinfo, rhs1_type);

…here we should just be able to set rhs1_type, and leave vectype2
to the code below.

> + }
> +   else if (!rhs1_op0_type && rhs1_op1_type)
> + {
> +   rhs1_type = TREE_TYPE (rhs1_op0);
> +   vectype2 = get_mask_type_for_scalar_type (vinfo, rhs1_type);
> +   if (!vectype2)
> + return NULL;
> +   rhs1_op1 = build_mask_conversion (vinfo, rhs1_op1,
> + vectype2, stmt_vinfo);
> + }
> +   else if (rhs1_op0_type && !rhs1_op1_type)
> + {
> +   rhs1_type = TREE_TYPE (rhs1_op1);
> +   vectype2 = get_mask_type_for_scalar_type (vinfo, rhs1_type);
> +   if (!vectype2)
> + return NULL;
> +   rhs1_op0 = build_mask_conversion (vinfo, rhs1_op0,
> + vectype2, stmt_vinfo);

Same for these two.

> + }
> +   else if (TYPE_PRECISION (rhs1_op0_type)
> +!= TYPE_PRECISION (rhs1_op1_type))
> + {
> +   int tmp1 = (int)TYPE_PRECISION (rhs1_op0_type)
> +  - (int)TYPE_PRECISION (TREE_TYPE (lhs));
> +   int tmp2 = (int)TYPE_PRECISION (rhs1_op1_type)
> +  - (int)TYPE_PRECISION (TREE_TYPE (lhs));
> +   if ((tmp1 > 0 && tmp2 > 0)||(tmp1 < 0 && tmp2 < 0))

Minor formatting nit, sorry, but: GCC style is to put a space after
(int) and on either side of ||.

Might be good to use the same numbering as the operands: tmp0 and tmp1
instead of tmp1 and tmp2.

> + {
> +   if (abs (tmp1) > abs (tmp2))
> + {
> +   vectype2 = get_mask_type_for_scalar_type (vinfo,
> + rhs1_op1_type);
> +   if (!vectype2)
> + return NULL;
> +   rhs1_op0 = build_mask_conversion (vinfo, rhs1_op0,
> + vectype2, stmt_vinfo);
> + }
> +   else
> + {
> +   vectype2 = get_mask_type_for_scalar_type (vinfo,
> + rhs1_op0_type);
> +   if (!vectype2)
> + return NULL;
> +   rhs1_op1 = build_mask_conversion (vinfo, rhs1_op1,
> + vectype2, stmt_vinfo);
> + }
> +   rhs1_type = integer_type_for_mask (rhs1_op0, vinfo);

Here I think we can just go with rhs1_type = rhs1_op1_type if
abs (tmp1) > abs (tmp2) (i.e. op1 is closer to the final type
than op0) and rhs1_op0_type otherwise.

> + }
> +   else
> + {
> +   rhs1_op0 = build_mask_conversion (vinfo, rhs1_op0,
> + vectype1, stmt_vinfo);
> +   rhs1_op1 = build_mask_conversion (vinfo, rhs1_op1,
> + vectype1, stmt_vinfo);
> +   rhs1_type = integer_type_for_mask (rhs1_op0, vinfo);
> +   if (!rhs1_type)
> + return NULL;
> +   vectype2 = get_mask_type_for_scalar_type (vinfo, rhs1_type);

and here I think rhs1_type can be:

  build_nonstandard_integer_type (TYPE_PRECISION (lhs_type), 1);

> + }
> + }
> +   else
> + {
> +   rhs1_type = integer_type_for_mask (rhs1_op0, vinfo);
> +   if (!rhs1_type)
> + return NULL;
> +   vectype2 = get_mask_type_for_scalar_type (vinfo, rhs1_type);

Here either rhs1_op0_type or rhs1_op1_type should be OK.

> + }
> +   tmp = build2 (TREE_CODE (rhs1), 

Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-09-24 Thread Richard Biener
On Wed, 23 Sep 2020, Tom de Vries wrote:

> On 9/23/20 9:28 AM, Richard Biener wrote:
> > On Tue, 22 Sep 2020, Tom de Vries wrote:
> > 
> >> [ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
> >> with SIMT LANE [PR95654] ]
> >>
> >> On 9/16/20 8:20 PM, Alexander Monakov wrote:
> >>>
> >>>
> >>> On Wed, 16 Sep 2020, Tom de Vries wrote:
> >>>
>  [ cc-ing author omp support for nvptx. ]
> >>>
> >>> The issue looks familiar. I recognized it back in 2017 (and LLVM people
> >>> recognized it too for their GPU targets). In an attempt to get agreement
> >>> to fix the issue "properly" for GCC I found a similar issue that affects
> >>> all targets, not just offloading, and filed it as PR 80053.
> >>>
> >>> (yes, there are no addressable labels involved in offloading, but 
> >>> nevertheless
> >>> the nature of the middle-end issue is related)
> >>
> >> Hi Alexander,
> >>
> >> thanks for looking into this.
> >>
> >> Seeing that the attempt to fix things properly is stalled, for now I'm
> >> proposing a point-fix, similar to the original patch proposed by Tobias.
> >>
> >> Richi, Jakub, OK for trunk?
> > 
> > I notice that we call ignore_bb_p many times in tracer.c but one call
> > is conveniently early in tail_duplicate (void):
> > 
> >   int n = count_insns (bb);
> >   if (!ignore_bb_p (bb))
> > blocks[bb->index] = heap.insert (-bb->count.to_frequency (cfun), 
> > bb);
> > 
> > where count_insns already walks all stmts in the block.  It would be
> > nice to avoid repeatedly walking all stmts, maybe adjusting the above
> > call is enough and/or count_insns can compute this and/or the ignore_bb_p
> > result can be cached (optimize_bb_for_size_p might change though,
> > but maybe all other ignore_bb_p calls effectively just are that,
> > checks for blocks that became optimize_bb_for_size_p).
> > 
> 
> This untested follow-up patch tries something in that direction.
> 
> Is this what you meant?

Yeah, sort of.

+static bool
+cached_can_duplicate_bb_p (const_basic_block bb)
+{
+  if (can_duplicate_bb)

is there any path where can_duplicate_bb would be NULL?

+{
+  unsigned int size = SBITMAP_SIZE (can_duplicate_bb);
+  /* Assume added bb's should be ignored.  */
+  if ((unsigned int)bb->index < size
+ && bitmap_bit_p (can_duplicate_bb_computed, bb->index))
+   return !bitmap_bit_p (can_duplicate_bb, bb->index);

yes, newly added bbs should be ignored so,

 }
 
-  return false;
+  bool val = compute_can_duplicate_bb_p (bb);
+  if (can_duplicate_bb)
+cache_can_duplicate_bb_p (bb, val);

no need to compute & cache for them, just return true (because
we did duplicate them)?

Thanks,
Richard.


> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer


[committed][testsuite] Scan final instead of asm in independent-cloneids-1.c

2020-09-24 Thread Tom de Vries
Hi,

When running test-case gcc.dg/independent-cloneids-1.c for nvptx, we get:
...
FAIL: scan-assembler-times (?n)^_*bar[.$_]constprop[.$_]0: 1
FAIL: scan-assembler-times (?n)^_*bar[.$_]constprop[.$_]1: 1
FAIL: scan-assembler-times (?n)^_*bar[.$_]constprop[.$_]2: 1
FAIL: scan-assembler-times (?n)^_*foo[.$_]constprop[.$_]0: 1
FAIL: scan-assembler-times (?n)^_*foo[.$_]constprop[.$_]1: 1
FAIL: scan-assembler-times (?n)^_*foo[.$_]constprop[.$_]2: 1
...

The test expects to find something like:
...
bar.constprop.0:
...
but instead on nvptx we have:
...
.func (.param.u32 %value_out) bar$constprop$0
...

Fix this by rewriting the scans to use the final dump instead.

Tested on x86_64.

Committed to trunk.

Thanks,
- Tom

[testsuite] Scan final instead of asm in independent-cloneids-1.c

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* gcc.dg/independent-cloneids-1.c: Use scan-rtl-dump instead of
scan-assembler.

---
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/independent-cloneids-1.c 
b/gcc/testsuite/gcc.dg/independent-cloneids-1.c
index 516211a6e86..efbc1c51da0 100644
--- a/gcc/testsuite/gcc.dg/independent-cloneids-1.c
+++ b/gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fipa-cp -fipa-cp-clone"  } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-rtl-final"  } */
 /* { dg-skip-if "Odd label definition syntax" { mmix-*-* } } */
 
 extern int printf (const char *, ...);
@@ -29,11 +29,11 @@ baz (int arg)
   return foo (8);
 }
 
-/* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]0:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]1:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]2:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*foo[.$_]constprop[.$_]0:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*foo[.$_]constprop[.$_]1:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*foo[.$_]constprop[.$_]2:} 1 } } */
-/* { dg-final { scan-assembler-not {(?n)^_*foo[.$_]constprop[.$_]3:} } } */
-/* { dg-final { scan-assembler-not {(?n)^_*foo[.$_]constprop[.$_]4:} } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function bar.constprop 
\(bar[.$_]constprop[.$_]0,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function bar.constprop 
\(bar[.$_]constprop[.$_]1,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function bar.constprop 
\(bar[.$_]constprop[.$_]2,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop 
\(foo[.$_]constprop[.$_]0,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop 
\(foo[.$_]constprop[.$_]1,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop 
\(foo[.$_]constprop[.$_]2,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop 
\(foo[.$_]constprop[.$_]3,} 0 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop 
\(foo[.$_]constprop[.$_]4,} 0 "final" } } */


Re: Add access through parameter derference tracking to modref

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 1:26 PM Jan Hubicka  wrote:
>
> > >
> > > I will do (but need to think a bit about the redundancy between the comment in
> > > ipa-modref and ipa-modref-tree)
> >
> > One place is enough - just add a pointer to the other place.
> Here is the updated patch I am testing.  It adds documentation into
> ipa-modref-tree.h, which is perhaps the more natural place, and links it from
> the ipa-modref.c documentation.
>
> Also note that loads/stores are distinguished since for every function
> we have both a decision tree for loads and a different decision tree for
> stores.
>
> I do not plan to add more levels to the tree (at least for the time being).
> I think that forming groups by alias sets is quite effective because
> TBAA oracle lookup is cheap and has a good chance to disambiguate.  For
> the remaining info tracked I plan a simple flat (and capped by a small
> constant) list of accesses.
>
> It indeed seems that ptr_deref_may_alias_ref_p_1 is precisely what I
> need, which simplifies the patch.  I did not know about it and simply
> followed what the main oracle does.
>
> Once the base/offset tracking is added I will need to figure out when
> the base pointers are the same (or with a known offset), which is not readily
> available from ptr_deref_may_alias_ref_p_1, but we can do that step by
> step.
>
> Thanks a lot.
> I am re-testing the patch attached. OK if testing passes?
OK.

Richard.

> 2020-09-24  Jan Hubicka  
>
> * doc/invoke.texi: Document -fipa-modref, ipa-modref-max-bases,
> ipa-modref-max-refs, ipa-modref-max-accesses, ipa-modref-max-tests.
> * ipa-modref-tree.c (test_insert_search_collapse): Update.
> (test_merge): Update.
> (gt_ggc_mx): New function.
> * ipa-modref-tree.h (struct modref_access_node): New structure.
> (struct modref_ref_node): Add every_access and accesses array.
> (modref_ref_node::modref_ref_node): Update ctor.
> (modref_ref_node::search): New member function.
> (modref_ref_node::collapse): New member function.
> (modref_ref_node::insert_access): New member function.
> (modref_base_node::insert_ref): Do not collapse base if ref is 0.
> (modref_base_node::collapse): Collapse also refs.
> (modref_tree): Add accesses.
> (modref_tree::modref_tree): Initialize max_accesses.
> (modref_tree::insert): Add access parameter.
> (modref_tree::cleanup): New member function.
> (modref_tree::merge): Add parm_map; merge accesses.
> (modref_tree::copy_from): New member function.
> (modref_tree::create_ggc): Add max_accesses.
> * ipa-modref.c (dump_access): New function.
> (dump_records): Dump accesses.
> (dump_lto_records): Dump accesses.
> (get_access): New function.
> (record_access): Record access.
> (record_access_lto): Record access.
> (analyze_call): Compute parm_map.
> (analyze_function): Update construction of modref records.
> (modref_summaries::duplicate): Likewise; use copy_from.
> (write_modref_records): Stream accesses.
> (read_modref_records): Stream accesses.
> (pass_ipa_modref::execute): Update call of merge.
> * params.opt (-param=modref-max-accesses): New.
> * tree-ssa-alias.c (alias_stats): Add modref_baseptr_tests.
> (dump_alias_stats): Update.
> (modref_may_conflict): Check accesses.
> (ref_maybe_used_by_call_p_1): Update call to modref_may_conflict.
> (call_may_clobber_ref_p_1): Update call to modref_may_conflict.
>
> gcc/testsuite/ChangeLog:
>
> 2020-09-24  Jan Hubicka  
>
> * gcc.dg/tree-ssa/modref-1.c: New test.
>
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 75203ba2420..623dfb8ac28 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -486,7 +486,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fgcse-sm  -fhoist-adjacent-loads  -fif-conversion @gol
>  -fif-conversion2  -findirect-inlining @gol
>  -finline-functions  -finline-functions-called-once  -finline-limit=@var{n} 
> @gol
> --finline-small-functions  -fipa-cp  -fipa-cp-clone @gol
> +-finline-small-functions -fipa-modref -fipa-cp  -fipa-cp-clone @gol
>  -fipa-bit-cp  -fipa-vrp  -fipa-pta  -fipa-profile  -fipa-pure-const @gol
>  -fipa-reference  -fipa-reference-addressable @gol
>  -fipa-stack-alignment  -fipa-icf  -fira-algorithm=@var{algorithm} @gol
> @@ -9688,6 +9688,7 @@ compilation time.
>  -fif-conversion @gol
>  -fif-conversion2 @gol
>  -finline-functions-called-once @gol
> +-fipa-modref @gol
>  -fipa-profile @gol
>  -fipa-pure-const @gol
>  -fipa-reference @gol
> @@ -10783,11 +10784,18 @@ default at any optimization level.
>  @opindex fipa-profile
>  Perform interprocedural profile propagation.  The functions called only from
>  cold functions are marked as cold. Also functions executed once (such as
> -@code{cold}, @code{noreturn}, static constructors or destructors) are 
> 

[PATCH] target/97192 - new testcase for fixed PR

2020-09-24 Thread Richard Biener
This adds another testcase for the PR97085 fix.

Pushed.

2020-09-24  Richard Biener  

PR tree-optimization/97085
* gcc.dg/pr97192.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr97192.c | 16 
 1 file changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr97192.c

diff --git a/gcc/testsuite/gcc.dg/pr97192.c b/gcc/testsuite/gcc.dg/pr97192.c
new file mode 100644
index 000..16647ca67a3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr97192.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ftracer" } */
+/* { dg-additional-options "-mavx512vl" { target x86_64-*-* i?86-*-* } } */
+
+typedef int __attribute__ ((__vector_size__ (32))) V;
+
+int a, b;
+V v;
+
+int
+foo (void)
+{
+  b -= 4 - !a;
+  V u = 0 != v == a;
+  return u[0];
+}
-- 
2.26.2


Re: Add access through parameter derference tracking to modref

2020-09-24 Thread Jan Hubicka
> >
> > I will do that (but I need to think a bit about the redundancy between
> > the comments in ipa-modref and ipa-modref-tree)
> 
> One place is enough - just add a pointer to the other place.
Here is the updated patch I am testing.  It adds documentation to
ipa-modref-tree.h, which is perhaps the more natural place, and links it from
the ipa-modref.c documentation.

Also note that loads/stores are distinguished since for every function
we have both a decision tree for loads and a different decision tree for
stores.

I do not plan to add more levels to the tree (at least for the time being).
I think that forming groups by alias sets is quite effective because
TBAA oracle lookup is cheap and has a good chance to disambiguate.  For
the remaining info tracked I plan a simple flat list of accesses (capped
by a small constant), as illustrated below.
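
To make the payoff concrete, here is a sketch in the spirit of the new
gcc.dg/tree-ssa/modref-1.c test added below (the exact test body may differ):
write only dereferences its parameter, so the call with &a cannot touch b and
the oracle can keep b in a register.

  /* Sketch only; compile with -O2.  */
  __attribute__((noinline))
  void write (int *a)
  {
    *a = 1;
  }

  int test (void)
  {
    int a, b = 42;
    write (&a);   /* Modref records: only dereferences of parameter 0.  */
    return b;     /* The call is known not to clobber b.  */
  }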

It indeed seems that ptr_deref_may_alias_ref_p_1 is precisely what I
need, which simplifies the patch.  I did not know about it and simply
followed what the main oracle does.
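
Roughly, the per-access check then boils down to something like the following
(a sketch reusing the names from the posted patch - access_node, stmt, ref,
num_tests - rather than the final code):

  /* Sketch only: one points-to query per recorded access, without any
     manual stripping of the base.  */
  tree op = gimple_call_arg (stmt, access_node->parm_index);
  alias_stats.modref_baseptr_tests++;
  if (ptr_deref_may_alias_ref_p_1 (op, ref))
    return true;   /* Possible conflict through this parameter.  */
  num_tests++;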

Once the base/offset tracking is added I will need to figure out when
the base pointers are the same (or differ by a known offset), which is not readily
available from ptr_deref_may_alias_ref_p_1, but we can do that step by
step.

Thanks a lot.
I am re-testing the patch attached. OK if testing passes?

2020-09-24  Jan Hubicka  

* doc/invoke.texi: Document -fipa-modref, ipa-modref-max-bases,
ipa-modref-max-refs, ipa-modref-max-accesses, ipa-modref-max-tests.
* ipa-modref-tree.c (test_insert_search_collapse): Update.
(test_merge): Update.
(gt_ggc_mx): New function.
* ipa-modref-tree.h (struct modref_access_node): New structure.
(struct modref_ref_node): Add every_access and accesses array.
(modref_ref_node::modref_ref_node): Update ctor.
(modref_ref_node::search): New member function.
(modref_ref_node::collapse): New member function.
(modref_ref_node::insert_access): New member function.
(modref_base_node::insert_ref): Do not collapse base if ref is 0.
(modref_base_node::collapse): Also collapse refs.
(modref_tree): Add accesses.
(modref_tree::modref_tree): Initialize max_accesses.
(modref_tree::insert): Add access parameter.
(modref_tree::cleanup): New member function.
(modref_tree::merge): Add parm_map; merge accesses.
(modref_tree::copy_from): New member function.
(modref_tree::create_ggc): Add max_accesses.
* ipa-modref.c (dump_access): New function.
(dump_records): Dump accesses.
(dump_lto_records): Dump accesses.
(get_access): New function.
(record_access): Record access.
(record_access_lto): Record access.
(analyze_call): Compute parm_map.
(analyze_function): Update construction of modref records.
(modref_summaries::duplicate): Likewise; use copy_from.
(write_modref_records): Stream accesses.
(read_modref_records): Stream accesses.
(pass_ipa_modref::execute): Update call of merge.
* params.opt (-param=modref-max-accesses): New.
* tree-ssa-alias.c (alias_stats): Add modref_baseptr_tests.
(dump_alias_stats): Update.
(modref_may_conflict): Check accesses.
(ref_maybe_used_by_call_p_1): Update call to modref_may_conflict.
(call_may_clobber_ref_p_1): Update call to modref_may_conflict.

gcc/testsuite/ChangeLog:

2020-09-24  Jan Hubicka  

* gcc.dg/tree-ssa/modref-1.c: New test.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 75203ba2420..623dfb8ac28 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -486,7 +486,7 @@ Objective-C and Objective-C++ Dialects}.
 -fgcse-sm  -fhoist-adjacent-loads  -fif-conversion @gol
 -fif-conversion2  -findirect-inlining @gol
 -finline-functions  -finline-functions-called-once  -finline-limit=@var{n} @gol
--finline-small-functions  -fipa-cp  -fipa-cp-clone @gol
+-finline-small-functions -fipa-modref -fipa-cp  -fipa-cp-clone @gol
 -fipa-bit-cp  -fipa-vrp  -fipa-pta  -fipa-profile  -fipa-pure-const @gol
 -fipa-reference  -fipa-reference-addressable @gol
 -fipa-stack-alignment  -fipa-icf  -fira-algorithm=@var{algorithm} @gol
@@ -9688,6 +9688,7 @@ compilation time.
 -fif-conversion @gol
 -fif-conversion2 @gol
 -finline-functions-called-once @gol
+-fipa-modref @gol
 -fipa-profile @gol
 -fipa-pure-const @gol
 -fipa-reference @gol
@@ -10783,11 +10784,18 @@ default at any optimization level.
 @opindex fipa-profile
 Perform interprocedural profile propagation.  The functions called only from
 cold functions are marked as cold. Also functions executed once (such as
-@code{cold}, @code{noreturn}, static constructors or destructors) are 
identified. Cold
-functions and loop less parts of functions executed once are then optimized for
-size.
+@code{cold}, @code{noreturn}, static constructors or destructors) are
+identified. Cold functions and loop less parts of functions executed once are
+then optimized for 

Re: Add access through parameter derference tracking to modref

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 12:54 PM Jan Hubicka  wrote:
>
> > > +  else if (TREE_CODE (op) == SSA_NAME
> > > +  && POINTER_TYPE_P (TREE_TYPE (op)))
> > > +{
> > > +  if (DECL_P (base) && !ptr_deref_may_alias_decl_p (op, base))
> > > +   return false;
> > > +  if (TREE_CODE (base) == SSA_NAME
> > > + && !ptr_derefs_may_alias_p (op, base))
> > > +   return false;
> > > +}
> >
> > this all looks redundant - why is it important to look at the
> > base of ref, why not simply ask below (*)
> >
> > > + modref_access_node *access_node;
> > > + FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> > > +   {
> > > + if (num_tests >= max_tests)
> > > +   return true;
> > > +
> > > + if (access_node->parm_index == -1
> > > + || (unsigned)access_node->parm_index
> > > +>= gimple_call_num_args (stmt))
> > > +   return true;
> > > +
> > > + tree op = gimple_call_arg (stmt, access_node->parm_index);
> > > +
> > > + alias_stats.modref_baseptr_tests++;
> > > +
> > > + /* Lookup base, if this is the first time we compare bases. 
> > >  */
> > > + if (!base)
> >
> > Meh, so this function is a bit confusing with base_node, ref_node,
> > access_node and now 'base' and 'op'.  The loop now got a
> > new nest as well.
> >
> > I'm looking for a high-level description of the modref_tree <>
> > but cannot find any which makes reviewing this quite difficult...
>
> There is a description in ipa-modref.c though it may need a bit of expanding.
> Basically the modref summary represents a decision tree for
> tree-ssa-alias that has three levels
>   1) base which records base alias set,
>   2) ref which records ref alias set and
>   3) access which presently records info on whether the access is a
>   dereference of a pointer passed as a parameter. In the future I will re-add
>   info about offset/size and base type. It would be possible to record
>   the access path though I am not sure it is worth the effort
> >
> > > +   {
> > > + base = ref->ref;
> > > + while (handled_component_p (base))
> > > +   base = TREE_OPERAND (base, 0);
> >
> ao_ref_base (ref)?  OK, that might strip an inner
> MEM_REF, yielding a decl, but ...
> >
> > > + if (TREE_CODE (base) == MEM_REF
> > > + || TREE_CODE (base) == TARGET_MEM_REF)
> > > +   base = TREE_OPERAND (base, 0);
> >
> > that might happen here, too.  But in the MEM_REF case base
> > is a pointer.
> >
> > > +   }
> > > +
> > > + if (base_may_alias_with_dereference_p (base, op))
> >
> > So this is a query purely at the caller side - whether 'ref' may
> > alias 'op'.
> >
> > ---
> >
> > (*) ptr_deref_may_alias_ref_p_1 (op, ref)
> >
> > without any of the magic?
>
> Hmm, it may actually just work; I did not know that it looks through
> memrefs, let me re-test the patch.
> >
> > Can you please amend ipa-modref-tree.h/c with a toplevel comment
> > laying out the data structure and what is recorded?
>
> I will do that (but I need to think a bit about the redundancy between
> the comments in ipa-modref and ipa-modref-tree)

One place is enough - just add a pointer to the other place.

Richard.

> Honza
> >
> > Thanks,
> > Richard.
> >
> > > +   return true;
> > > + num_tests++;
> > > +   }
> > > }
> > >  }
> > >return false;
> > > @@ -2510,7 +2584,7 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref 
> > > *ref, bool tbaa_p)
> > >   modref_summary *summary = get_modref_function_summary (node);
> > >   if (summary)
> > > {
> > > - if (!modref_may_conflict (summary->loads, ref, tbaa_p))
> > > + if (!modref_may_conflict (call, summary->loads, ref, 
> > > tbaa_p))
> > > {
> > >   alias_stats.modref_use_no_alias++;
> > >   if (dump_file && (dump_flags & TDF_DETAILS))
> > > @@ -2934,7 +3008,7 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, 
> > > bool tbaa_p)
> > >   modref_summary *summary = get_modref_function_summary (node);
> > >   if (summary)
> > > {
> > > - if (!modref_may_conflict (summary->stores, ref, tbaa_p))
> > > + if (!modref_may_conflict (call, summary->stores, ref, 
> > > tbaa_p))
> > > {
> > >   alias_stats.modref_clobber_no_alias++;
> > >   if (dump_file && (dump_flags & TDF_DETAILS))


Re: Add access through parameter derference tracking to modref

2020-09-24 Thread Jan Hubicka
> > +  else if (TREE_CODE (op) == SSA_NAME
> > +  && POINTER_TYPE_P (TREE_TYPE (op)))
> > +{
> > +  if (DECL_P (base) && !ptr_deref_may_alias_decl_p (op, base))
> > +   return false;
> > +  if (TREE_CODE (base) == SSA_NAME
> > + && !ptr_derefs_may_alias_p (op, base))
> > +   return false;
> > +}
> 
> this all looks redundant - why is it important to look at the
> base of ref, why not simply ask below (*)
> 
> > + modref_access_node *access_node;
> > + FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> > +   {
> > + if (num_tests >= max_tests)
> > +   return true;
> > +
> > + if (access_node->parm_index == -1
> > + || (unsigned)access_node->parm_index
> > +>= gimple_call_num_args (stmt))
> > +   return true;
> > +
> > + tree op = gimple_call_arg (stmt, access_node->parm_index);
> > +
> > + alias_stats.modref_baseptr_tests++;
> > +
> > + /* Lookup base, if this is the first time we compare bases.  
> > */
> > + if (!base)
> 
> Meh, so this function is a bit confusing with base_node, ref_node,
> access_node and now 'base' and 'op'.  The loop now got a
> new nest as well.
> 
> I'm looking for a high-level description of the modref_tree <>
> but cannot find any which makes reviewing this quite difficult...

There is a description in ipa-modref.c though it may need a bit of expanding.
Basically the modref summary represents a decision tree for
tree-ssa-alias that has three levels:
  1) base, which records the base alias set,
  2) ref, which records the ref alias set, and
  3) access, which presently records whether the access is a
  dereference of a pointer passed as a parameter (see the sketch below).
  In the future I will re-add info about offset/size and base type.
  It would be possible to record the access path, though I am not sure
  it is worth the effort
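
For orientation, a much-simplified sketch of how those levels nest (the real
types in ipa-modref-tree.h are GTY templates over the base/ref representation,
and every function carries one such tree for loads and another for stores):

  /* Sketch only - field and template details trimmed.  */
  struct modref_access_node
  {
    int parm_index;                          /* -1 if the base is unknown.  */
  };
  struct modref_ref_node
  {
    alias_set_type ref;                      /* Level 2: ref alias set.  */
    bool every_access;                       /* Give up on level 3.  */
    vec <modref_access_node, va_gc> *accesses;
  };
  struct modref_base_node
  {
    alias_set_type base;                     /* Level 1: base alias set.  */
    vec <modref_ref_node *, va_gc> *refs;
  };
  struct modref_tree
  {
    vec <modref_base_node *, va_gc> *bases;
  };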
> 
> > +   {
> > + base = ref->ref;
> > + while (handled_component_p (base))
> > +   base = TREE_OPERAND (base, 0);
> 
> ao_ref_base (ref)?  OK, that might strip an inner
> MEM_REF, yielding a decl, but ...
> 
> > + if (TREE_CODE (base) == MEM_REF
> > + || TREE_CODE (base) == TARGET_MEM_REF)
> > +   base = TREE_OPERAND (base, 0);
> 
> that might happen here, too.  But in the MEM_REF case base
> is a pointer.
> 
> > +   }
> > +
> > + if (base_may_alias_with_dereference_p (base, op))
> 
> So this is a query purely at the caller side - whether 'ref' may
> alias 'op'.
> 
> ---
> 
> (*) ptr_deref_may_alias_ref_p_1 (op, ref)
> 
> without any of the magic?

Hmm, it may actually just work; I did not know that it looks through
memrefs, let me re-test the patch.
> 
> Can you please amend ipa-modref-tree.h/c with a toplevel comment
> laying out the data structure and what is recorded?

I will do that (but I need to think a bit about the redundancy between
the comments in ipa-modref and ipa-modref-tree)

Honza
> 
> Thanks,
> Richard.
> 
> > +   return true;
> > + num_tests++;
> > +   }
> > }
> >  }
> >return false;
> > @@ -2510,7 +2584,7 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, 
> > bool tbaa_p)
> >   modref_summary *summary = get_modref_function_summary (node);
> >   if (summary)
> > {
> > - if (!modref_may_conflict (summary->loads, ref, tbaa_p))
> > + if (!modref_may_conflict (call, summary->loads, ref, tbaa_p))
> > {
> >   alias_stats.modref_use_no_alias++;
> >   if (dump_file && (dump_flags & TDF_DETAILS))
> > @@ -2934,7 +3008,7 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, 
> > bool tbaa_p)
> >   modref_summary *summary = get_modref_function_summary (node);
> >   if (summary)
> > {
> > - if (!modref_may_conflict (summary->stores, ref, tbaa_p))
> > + if (!modref_may_conflict (call, summary->stores, ref, tbaa_p))
> > {
> >   alias_stats.modref_clobber_no_alias++;
> >   if (dump_file && (dump_flags & TDF_DETAILS))


[committed][testsuite, nvptx] Fix string matching in gcc.dg/pr87314-1.c

2020-09-24 Thread Tom de Vries
Hi,

with nvptx we run into:
...
FAIL: gcc.dg/pr87314-1.c scan-assembler hellooo
...

The required string is part of the assembly, just in a different format than
expected:
...
.const .align 1 .u8 $LC0[12] =
  { 104, 101, 108, 108, 111, 111, 111, 111, 98, 121, 101, 0 };
...

Fix this by adding an nvptx-specific scan-assembler directive.

Tested on nvptx and x86_64.

Committed to trunk.

Thanks,
- Tom

[testsuite, nvptx] Fix string matching in gcc.dg/pr87314-1.c

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* gcc.dg/pr87314-1.c: Add nvptx-specific scan-assembler directive.

---
 gcc/testsuite/gcc.dg/pr87314-1.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr87314-1.c b/gcc/testsuite/gcc.dg/pr87314-1.c
index 9bc905612b5..0cb9c07e32c 100644
--- a/gcc/testsuite/gcc.dg/pr87314-1.c
+++ b/gcc/testsuite/gcc.dg/pr87314-1.c
@@ -8,4 +8,6 @@ int h() { return "bye"=="hellbye"+8; }
 /* { dg-final { scan-tree-dump-times "hello" 1 "original" } } */
 /* The test in h() should be retained because the result depends on
string merging.  */
-/* { dg-final { scan-assembler "hellooo" } } */
+/* { dg-final { scan-assembler "hellooo" { target { ! nvptx*-*-* } } } } */
+/* { dg-final { scan-assembler "104, 101, 108, 108, 111, 111, 111" { target { 
nvptx*-*-* } } } } */
+


Re: Add access through parameter derference tracking to modref

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 11:06 AM Jan Hubicka  wrote:
>
> Hi,
> this patch re-adds tracking of accesses which was unfinished in David's patch.
> At the moment I only implemented tracking of the fact that the access is based
> on a dereference of the parameter (so we track THIS pointers).
> The patch does not implement IPA propagation since it needs a bit more work,
> which I will post shortly: ipa-fnsummary needs to track when a parameter points
> to local memory, summaries need to be merged when a function is inlined (because
> jump functions are) and propagation needs to be turned into iterative dataflow
> on SCC components.
>
> The patch also adds documentation of -fipa-modref and params that were left
> uncommitted in my branch :(.
>
> Even without this change it does lead to nice increase of disambiguations
> for cc1plus build.
>
> Alias oracle query stats:
>   refs_may_alias_p: 62758323 disambiguations, 72935683 queries
>   ref_maybe_used_by_call_p: 139511 disambiguations, 63654045 queries
>   call_may_clobber_ref_p: 23502 disambiguations, 29242 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 37654 queries
>   nonoverlapping_refs_since_match_p: 19417 disambiguations, 5 must 
> overlaps, 75721 queries
>   aliasing_component_refs_p: 54665 disambiguations, 752449 queries
>   TBAA oracle: 21917926 disambiguations 53054678 queries
>15763411 are in alias set 0
>10162238 queries asked about the same object
>124 queries asked about the same alias set
>0 access volatile
>3681593 are dependent in the DAG
>1529386 are artificially in conflict with void *
>
> Modref stats:
>   modref use: 8311 disambiguations, 32527 queries
>   modref clobber: 742126 disambiguations, 1036986 queries
>   1987054 tbaa queries (1.916182 per modref query)
>   125479 base compares (0.121004 per modref query)
>
> PTA query stats:
>   pt_solution_includes: 968314 disambiguations, 13609584 queries
>   pt_solutions_intersect: 1019136 disambiguations, 13147139 queries
>
> So compared to
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554605.html
> we get 41% more use disambiguations (with similar number of queries) and 8% 
> more
> clobber disambiguations.
>
> For tramp3d:
> Alias oracle query stats:
>   refs_may_alias_p: 2052256 disambiguations, 2312703 queries
>   ref_maybe_used_by_call_p: 7122 disambiguations, 2089118 queries
>   call_may_clobber_ref_p: 234 disambiguations, 234 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 4299 queries
>   nonoverlapping_refs_since_match_p: 329 disambiguations, 10200 must 
> overlaps, 10616 queries
>   aliasing_component_refs_p: 857 disambiguations, 34555 queries
>   TBAA oracle: 885546 disambiguations 1677080 queries
>132105 are in alias set 0
>469030 queries asked about the same object
>0 queries asked about the same alias set
>0 access volatile
>190084 are dependent in the DAG
>315 are artificially in conflict with void *
>
> Modref stats:
>   modref use: 426 disambiguations, 1881 queries
>   modref clobber: 10042 disambiguations, 16202 queries
>   19405 tbaa queries (1.197692 per modref query)
>   2775 base compares (0.171275 per modref query)
>
> PTA query stats:
>   pt_solution_includes: 313908 disambiguations, 526183 queries
>   pt_solutions_intersect: 130510 disambiguations, 416084 queries
>
> Here uses decrease by 4 disambiguations and clobbers improve by 3.5%.  I think
> the difference is caused by the fact that gcc has many more alias set 0 accesses
> originating from gimple and tree unions, as I mentioned in the original mail.
>
> After pushing out the IPA propagation I will re-add code to track offsets and
> sizes, which further improves disambiguation. On tramp3d it enables a lot of DSE
> for structure fields not accessed by the uninlined function.
>
> Bootstrapped/regtested on x86_64-linux, also lto-bootstrapped without checking
> (to get the stats). OK?
>
> Richi, all aliasing related changes are in base_may_alias_with_dereference_p.
>
> Honza
>
> 2020-09-24  Jan Hubicka  
>
> * doc/invoke.texi: Document -fipa-modref, ipa-modref-max-bases,
> ipa-modref-max-refs, ipa-modref-max-accesses, ipa-modref-max-tests.
> * ipa-modref-tree.c (test_insert_search_collapse): Update.
> (test_merge): Update.
> (gt_ggc_mx): New function.
> * ipa-modref-tree.h (struct modref_access_node): New structure.
> (struct modref_ref_node): Add every_access and accesses array.
> (modref_ref_node::modref_ref_node): Update ctor.
> (modref_ref_node::search): New member function.
> (modref_ref_node::collapse): New member function.
> (modref_ref_node::insert_access): New member function.
> (modref_base_node::insert_ref): Do not collapse base if ref is 0.
> (modref_base_node::collapse): Also collapse refs.
> 

[PATCH][GCC 8] AArch64: Update Armv8.4-a's FP16 FML intrinsics

2020-09-24 Thread Kyrylo Tkachov
Hi all,

I'd like to backport this fix from Tamar to the GCC 8 branch to avoid having 
incorrectly-named intrinsics.
Tested on aarch64-none-elf.

Committing to the branch.

This patch updates the Armv8.4-a FP16 FML intrinsics' suffixes from u32 to f16
to be more consistent with the naming convention for intrinsics.

The specifications for these intrinsics have not been published yet so we do
not need to maintain the old names.

The patch was created with the following script:

grep -lIE "(vfml[as].+)_u32" -r gcc/ | grep -iEv ".+Changelog.*" \
  | xargs sed -i -E -e "s/(vfml[as].+)_u32/\1_f16/g"
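
For illustration, after the rename the low-half FMLAL intrinsic is spelled as
below (a sketch, not part of the patch; assumes a target with the fp16fml
extension enabled, e.g. -march=armv8.2-a+fp16fml):

  #include <arm_neon.h>

  /* Previously vfmlal_low_u32; argument and result types are unchanged.  */
  float32x2_t
  fmlal_low (float32x2_t acc, float16x4_t a, float16x4_t b)
  {
    return vfmlal_low_f16 (acc, a, b);
  }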

gcc/
PR target/71233
* config/aarch64/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32,
vfmlalq_low_u32, vfmlslq_low_u32, vfmlal_high_u32, vfmlsl_high_u32,
vfmlalq_high_u32, vfmlslq_high_u32, vfmlal_lane_low_u32,
vfmlsl_lane_low_u32, vfmlal_laneq_low_u32, vfmlsl_laneq_low_u32,
vfmlalq_lane_low_u32, vfmlslq_lane_low_u32, vfmlalq_laneq_low_u32,
vfmlslq_laneq_low_u32, vfmlal_lane_high_u32, vfmlsl_lane_high_u32,
vfmlal_laneq_high_u32, vfmlsl_laneq_high_u32, vfmlalq_lane_high_u32,
vfmlslq_lane_high_u32, vfmlalq_laneq_high_u32, vfmlslq_laneq_high_u32):
Rename ...
(vfmlal_low_f16, vfmlsl_low_f16, vfmlalq_low_f16, vfmlslq_low_f16,
vfmlal_high_f16, vfmlsl_high_f16, vfmlalq_high_f16, vfmlslq_high_f16,
vfmlal_lane_low_f16, vfmlsl_lane_low_f16, vfmlal_laneq_low_f16,
vfmlsl_laneq_low_f16, vfmlalq_lane_low_f16, vfmlslq_lane_low_f16,
vfmlalq_laneq_low_f16, vfmlslq_laneq_low_f16, vfmlal_lane_high_f16,
vfmlsl_lane_high_f16, vfmlal_laneq_high_f16, vfmlsl_laneq_high_f16,
vfmlalq_lane_high_f16, vfmlslq_lane_high_f16, vfmlalq_laneq_high_f16,
vfmlslq_laneq_high_f16): ... To this.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/fp16_fmul_high.h (test_vfmlal_high_u32,
test_vfmlalq_high_u32, test_vfmlsl_high_u32, test_vfmlslq_high_u32):
Rename ...
(test_vfmlal_high_f16, test_vfmlalq_high_f16, test_vfmlsl_high_f16,
test_vfmlslq_high_f16): ... To this.
* gcc.target/aarch64/fp16_fmul_lane_high.h (test_vfmlal_lane_high_u32,
tets_vfmlsl_lane_high_u32, test_vfmlal_laneq_high_u32,
test_vfmlsl_laneq_high_u32, test_vfmlalq_lane_high_u32,
test_vfmlslq_lane_high_u32, test_vfmlalq_laneq_high_u32,
test_vfmlslq_laneq_high_u32): Rename ...
(test_vfmlal_lane_high_f16, tets_vfmlsl_lane_high_f16,
test_vfmlal_laneq_high_f16, test_vfmlsl_laneq_high_f16,
test_vfmlalq_lane_high_f16, test_vfmlslq_lane_high_f16,
test_vfmlalq_laneq_high_f16, test_vfmlslq_laneq_high_f16): ... To this.
* gcc.target/aarch64/fp16_fmul_lane_low.h (test_vfmlal_lane_low_u32,
test_vfmlsl_lane_low_u32, test_vfmlal_laneq_low_u32,
test_vfmlsl_laneq_low_u32, test_vfmlalq_lane_low_u32,
test_vfmlslq_lane_low_u32, test_vfmlalq_laneq_low_u32,
test_vfmlslq_laneq_low_u32): Rename ...
(test_vfmlal_lane_low_f16, test_vfmlsl_lane_low_f16,
test_vfmlal_laneq_low_f16, test_vfmlsl_laneq_low_f16,
test_vfmlalq_lane_low_f16, test_vfmlslq_lane_low_f16,
test_vfmlalq_laneq_low_f16, test_vfmlslq_laneq_low_f16): ... To this.
* gcc.target/aarch64/fp16_fmul_low.h (test_vfmlal_low_u32,
test_vfmlalq_low_u32, test_vfmlsl_low_u32, test_vfmlslq_low_u32):
Rename ...
(test_vfmlal_low_f16, test_vfmlalq_low_f16, test_vfmlsl_low_f16,
test_vfmlslq_low_f16): ... To This.
* lib/target-supports.exp
(check_effective_target_arm_fp16fml_neon_ok_nocache): Update test.

(cherry picked from commit 9d04c986b6faed878dbcc86d2f9392a721a3936e)


fmla-rename.patch
Description: fmla-rename.patch


[PATCH][GCC 8] Add missing AArch64 NEON intrinsics for Armv8.2-a to Armv8.4-a

2020-09-24 Thread Kyrylo Tkachov
Hi all,

I'd like to backport this patch to the GCC 8 branch that implements intrinsics 
that were (erroneously) missed out
of the initial implementation in GCC 8.

Bootstrapped and tested on aarch64-none-linux-gnu on the branch.

This patch adds the missing NEON intrinsics for all 128-bit vector integer modes
for the three-way XOR (EOR3) and bit clear and XOR (BCAX) instructions for
Armv8.2-a to Armv8.4-a.
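
As a usage sketch (not part of the patch; assumes the sha3 extension is
enabled, e.g. -march=armv8.2-a+sha3):

  #include <arm_neon.h>

  /* Three-way XOR; with the extension enabled this can map to a single
     EOR3 instruction.  */
  uint32x4_t
  xor3 (uint32x4_t a, uint32x4_t b, uint32x4_t c)
  {
    return veor3q_u32 (a, b, c);      /* a ^ b ^ c */
  }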

gcc/
2018-05-21  Tamar Christina  

PR target/71233
* config/aarch64/aarch64-simd.md (aarch64_eor3qv8hi): Change to
eor3q4.
(aarch64_bcaxqv8hi): Change to bcaxq4.
* config/aarch64/aarch64-simd-builtins.def (veor3q_u8, veor3q_u32,
veor3q_u64, veor3q_s8, veor3q_s16, veor3q_s32, veor3q_s64, vbcaxq_u8,
vbcaxq_u32, vbcaxq_u64, vbcaxq_s8, vbcaxq_s16, vbcaxq_s32,
vbcaxq_s64): New.
* config/aarch64/arm_neon.h: Likewise.
* config/aarch64/iterators.md (VQ_I): New.

gcc/testsuite/
2018-05-21  Tamar Christina  

PR target/71233
* gcc.target/aarch64/sha3.h (veor3q_u8, veor3q_u32,
veor3q_u64, veor3q_s8, veor3q_s16, veor3q_s32, veor3q_s64, vbcaxq_u8,
vbcaxq_u32, vbcaxq_u64, vbcaxq_s8, vbcaxq_s16, vbcaxq_s32,
vbcaxq_s64): New.
* gcc.target/aarch64/sha3_1.c: Likewise.
* gcc.target/aarch64/sha3_2.c: Likewise.
* gcc.target/aarch64/sha3_3.c: Likewise.

(cherry picked from commit d21052ebd7ac9d545a26dde3229c57f872c1d5f3)


bcax.patch
Description: bcax.patch


[PATCH][testsuite] Add effective target ident_directive

2020-09-24 Thread Tom de Vries
Hi,

On nvptx we run into:
...
FAIL: c-c++-common/ident-1b.c  -Wc++-compat   scan-assembler GCC:
FAIL: c-c++-common/ident-2b.c  -Wc++-compat   scan-assembler GCC:
...

Using a scan-assembler directive adds -fno-ident to the compile options.
The test c-c++-common/ident-1b.c adds dg-options "-fident", and intends to
check that the -fident overrides the -fno-ident, by means of the
scan-assembler.  But for nvptx, there's no .ident directive, either with -fident
or with -fno-ident.

Fix this by adding an effective target ident_directive, and requiring
it in both test-cases.

Tested on nvptx and x86_64.

OK for trunk?

Thanks,
- Tom

[testsuite] Add effective target ident_directive

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* lib/target-supports.exp (check_effective_target_ident_directive):
New proc.
* c-c++-common/ident-1b.c: Require effective target ident_directive.
* c-c++-common/ident-2b.c: Same.

---
 gcc/testsuite/c-c++-common/ident-1b.c | 1 +
 gcc/testsuite/c-c++-common/ident-2b.c | 1 +
 gcc/testsuite/lib/target-supports.exp | 9 +
 3 files changed, 11 insertions(+)

diff --git a/gcc/testsuite/c-c++-common/ident-1b.c 
b/gcc/testsuite/c-c++-common/ident-1b.c
index 69567442a03..b8b83e64ad2 100644
--- a/gcc/testsuite/c-c++-common/ident-1b.c
+++ b/gcc/testsuite/c-c++-common/ident-1b.c
@@ -2,6 +2,7 @@
  * Make sure scan-assembler turns off .ident unless -fident in testcase */
 /* { dg-do compile } */
 /* { dg-options "-fident" } */
+/* { dg-require-effective-target ident_directive }*/
 int i;
 
 /* { dg-final { scan-assembler "GCC: " { xfail { { hppa*-*-hpux* && { ! lp64 } 
} || { powerpc-ibm-aix* || powerpc*-*-darwin* } } } } } */
diff --git a/gcc/testsuite/c-c++-common/ident-2b.c 
b/gcc/testsuite/c-c++-common/ident-2b.c
index fae6a031571..52f0693e164 100644
--- a/gcc/testsuite/c-c++-common/ident-2b.c
+++ b/gcc/testsuite/c-c++-common/ident-2b.c
@@ -2,6 +2,7 @@
  * Make sure scan-assembler-times turns off .ident unless -fident in testcase 
*/
 /* { dg-do compile } */
 /* { dg-options "-fident" } */
+/* { dg-require-effective-target ident_directive }*/
 int ident;
 
 /* { dg-final { scan-assembler "GCC: " { xfail { { hppa*-*-hpux* && { ! lp64 } 
} || { powerpc-ibm-aix* || powerpc*-*-darwin* } } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 5cbe32ffbd6..0a00972edb5 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -10510,3 +10510,12 @@ proc check_symver_available { } {
}
}]
 }
+
+# Return 1 if emitted assembly contains .ident directive.
+
+proc check_effective_target_ident_directive {} {
+return [check_no_messages_and_pattern ident_directive \
+   "(?n)^\[\t\]+\\.ident" assembly {
+   int i;
+}]
+}


Tighten flag_pic processing in vxworks_override_options

2020-09-24 Thread Olivier Hainque

This fixes spurious complaints about PIC mode not being supported
on "gcc --help=..." on VxWorks without -mrtp. The spurious message
is emitted by vxworks_override_options, called with flag_pic == -1
when we're running for --help.

The change simply adjusts the check testing for "we're generating pic code"
to "flag_pic > 0" instead of just "flag_pic". We're not generating code at
all when reaching here with -1.
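
In sketch form (exact wording of the diagnostic aside, and assuming the usual
TARGET_VXWORKS_RTP guard), the check becomes:

  /* Sketch only: flag_pic may still be -1 here, e.g. when invoked for
     --help, in which case no code is generated and no complaint is due.  */
  if (flag_pic > 0 && !TARGET_VXWORKS_RTP)
    error ("PIC is only supported for RTPs");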

Tested by verifying that the spurious message goes away in
production gcc-9 based toolchains for more than a year now, and
sanity checked that I can build a mainline compiler with the
patch applied.

Committing to mainline shortly.

Olivier

2020-09-24  Olivier Hainque  

* config/vxworks.c (vxworks_override_options): Guard pic checks with
flag_pic > 0 instead of just flag_pic.




0001-Tigthen-flag_pic-processing-in-vxworks_override_o.diff
Description: Binary data


Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 11:50 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Thu, Sep 24, 2020 at 11:41:00AM +0200, Tobias Burnus wrote:
> > Following Jakub's suggestion, I also added
> >   __attribute__((used))
> > to the tree belonging to both tables in omp-offload.c's omp_finish
> > but that did not help, either.
>
> That is really DECL_PRESERVED_P, the attribute is turned into that, so no
> need to have attribute around after setting it.
> That is needed (but already done), but clearly not sufficient.
> What we need to emulate is the effect of all those decls being referenced
> from a single (preserved) initializer, which would need to refer to their
> names too.  Except we don't really have such a var and initializer
> constructed early enough probably.
> Now, for vars with initializers I think there is
> record_references_in_initializer to remember those references, so do we need
> to emulate that behavior?
> Or, see what effects it has on the partitioning, and if it means forcing all
> the referenced decls that aren't TREE_PUBLIC into the same partition, do it
> for the offloading funcs and vars too?

Create the offload table at WPA time so we get to see it during partitioning?

> Jakub
>


Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 11:41 AM Tobias Burnus  wrote:
>
> On 9/24/20 10:03 AM, Richard Biener wrote:
>
> >> The symbols are added to offload_vars + offload_funcs.
> >> In lto-cgraph.c's output_offload_tables there is the last chance
> >> to remove now unused nodes ? as once the tables are streamed
> >> for device usage, they cannot be changed. Hence, there one
> >> has
> >> node->force_output = 1;
> >> [Unrelated: this prevents later optimizations, which still
> >> could be done; cf. PR95622]
> >>
> >>
> >> The table itself is written in omp-offload.c's omp_finish_file.
> > But this is called at LTRANS time only, in particular we seem
> > to stream the offload_funcs/vars array, marking streamed nodes
> > as force_output but we do not make the offload table visible
> > to the partitioner.  But force_output should make the
> > nodes not renamed.  But then output_offload_tables is called at
> > the very end and we likely do not stream the altered
> > force_output state.
> >
> > So - can you try, in prune_offload_funcs, in addition to
> > setting DECL_PRESERVE_P, mark the cgraph node ->force_output
> > so this happens early?  I guess the same is needed for
> > variables (there's no prune_offloar_vars ...).
>
> As it accesses global variables, I could do just the same
> with the variables – but it did not seem to have an effect.
>
> Following Jakub's suggestion, I also added
>__attribute__((used))
> to the tree belonging to both tables in omp-offload.c's omp_finish
> but that did not help, either.
>
> I think both the 'used' and 'force_output' are red herrings:
> after all, the tables and the referenced funcs/vars are output;
> the problem is 'just' that they end up in different ltrans
> while not being public. – Thus, some property is wrong
> during building the cgraph or when it is partitioned into ltrans.
>
> Any additional suggestion to try?

As I said the table itself is only created _after_ partitioning
so LTO doesn't see they are referenced from outside of their
LTRANS unit (in the unit that has the offload table).

I think we need to create the offload table during WPA
instead.

Richard.

> Tobias
>
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, 
> Alexander Walter


Re: [PATCH] aarch64: Do not alter value on a force_reg returned rtx expanding __jcvt

2020-09-24 Thread Andrea Corallo
Andrea Corallo  writes:

> Kyrylo Tkachov  writes:
[...]
>>
>> Can you please also backport it to the appropriate branches as well after 
>> some time on trunk.
>> Thanks,
>> Kyrill
>
> Ciao Kyrill,
>
> Sure happy to do that.  For now into trunk as 2c62952f816.

Backported into gcc-10 as aa47c987340.

Thanks

  Andrea


Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)

2020-09-24 Thread Jakub Jelinek via Gcc-patches
On Thu, Sep 24, 2020 at 11:41:00AM +0200, Tobias Burnus wrote:
> Following Jakub's suggestion, I also added
>   __attribute__((used))
> to the tree belonging to both tables in omp-offload.c's omp_finish
> but that did not help, either.

That is really DECL_PRESERVED_P, the attribute is turned into that, so no
need to have attribute around after setting it.
That is needed (but already done), but clearly not sufficient.
What we need to emulate is the effect of all those decls being referenced
from a single (preserved) initializer, which would need to refer to their
names too.  Except we don't really have such a var and initializer
constructed early enough probably.
Now, for vars with initializers I think there is
record_references_in_initializer to remember those references, so do we need
to emulate that behavior?
Or, see what effects it has on the partitioning, and if it means forcing all
the referenced decls that aren't TREE_PUBLIC into the same partition, do it
for the offloading funcs and vars too?

Jakub



[ping] move and adjust PROBE_STACK_*_REG on aarch64

2020-09-24 Thread Olivier Hainque
Hello,

After
  https://gcc.gnu.org/pipermail/gcc-patches/2020-January/537843.html
and
  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-12/msg01398.html

Re-proposing this patch after re-testing with a recent
mainline on aarch64-linux (bootstrap and regression test
with --enable-languages=all), and more than a year of in-house
use in production for a few aarch64 ports on a gcc-9 base.

The change moves the definitions of PROBE_STACK_FIRST_REG
and PROBE_STACK_SECOND_REG to a more appropriate place for such
items (here, in aarch64.md as suggested by Richard), and adjusts
their value from r9/r10 to r10/r11 to free r9 for a possibly
more general purpose (e.g. as a static chain at least on targets
which have a private use of r18, such as Windows or Vxworks).

OK to commit?

Thanks in advance,

With Kind Regards,

Olivier

2020-11-07  Olivier Hainque  

* config/aarch64/aarch64.md: Define PROBE_STACK_FIRST_REGNUM
and PROBE_STACK_SECOND_REGNUM constants, designating r10/r11.
Replacements for the PROBE_STACK_FIRST/SECOND_REG constants in
aarch64.c.
* config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG): Remove.
(PROBE_STACK_SECOND_REG): Remove.
(aarch64_emit_probe_stack_range): Adjust to the _REG -> _REGNUM
suffix update for PROBE_STACK register numbers.



aarch64-regnum.diff
Description: Binary data


Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)

2020-09-24 Thread Tobias Burnus

On 9/24/20 10:03 AM, Richard Biener wrote:


The symbols are added to offload_vars + offload_funcs.
In lto-cgraph.c's output_offload_tables there is the last chance
to remove now unused nodes - as once the tables are streamed
for device usage, they cannot be changed. Hence, there one
has
node->force_output = 1;
[Unrelated: this prevents later optimizations, which still
could be done; cf. PR95622]


The table itself is written in omp-offload.c's omp_finish_file.

But this is called at LTRANS time only, in particular we seem
to stream the offload_funcs/vars array, marking streamed nodes
as force_output but we do not make the offload table visible
to the partitioner.  But force_output should make the
nodes not renamed.  But then output_offload_tables is called at
the very end and we likely do not stream the altered
force_output state.

So - can you try, in prune_offload_funcs, in addition to
setting DECL_PRESERVE_P, mark the cgraph node ->force_output
so this happens early?  I guess the same is needed for
variables (there's no prune_offloar_vars ...).


As it accesses global variables, I could do just the same
with the variables – but it did not seem to have an effect.

Following Jakub's suggestion, I also added
  __attribute__((used))
to the tree belonging to both tables in omp-offload.c's omp_finish
but that did not help, either.

I think both the 'used' and 'force_output' are red herrings:
after all, the tables and the referenced funcs/vars are output;
the problem is 'just' that they end up in different ltrans
while not being public. – Thus, some property is wrong
during building the cgraph or when it is partitioned into ltrans.

Any additional suggestion to try?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [PATCH] PR libstdc++/71579 assert that type traits are not misused with an incomplete type

2020-09-24 Thread Jonathan Wakely via Gcc-patches

On 24/09/20 10:15 +0300, Antony Polukhin via Libstdc++ wrote:

Looks like the last patch was not applied. Do I have to change something in
it?


No, it just hasn't been reviewed yet.



