[PATCH] i386: Fix splitters that call extract_insn_cached [PR93611]

2020-02-06 Thread Jakub Jelinek
Hi!

The following testcase ICEs.  The generated split_insns starts
with recog_data.insn = NULL and then tries to put various operands into
the recog_data.operand array and checks various splitter conditions.
The problem is that some Atom-related tuning splitters indirectly call
extract_insn_cached on the insn they are used in.  This can change
recog_data.operand (most likely it will just keep it as is), and it
sets recog_data.insn to the current instruction.  If that splitter doesn't
match, we continue trying other split conditions and modify the
recog_data.operand array again.  If even that doesn't find any usable
splitter, we punt, but at that point recog_data.insn says that recog_data
is valid for that particular instruction, even though the recog_data.operand
array can contain anything.
The safest thing would be to copy the whole recog_data to a temporary object
before the calls that can invoke extract_insn_cached and restore it
afterwards, but that would also be very costly, as recog_data has 1280 bytes.
So, this patch just makes sure to clear recog_data.insn if it has changed
during the extract_insn_cached call.  That way, if we call extract_insn_cached
later, we'll extract the insn properly, while if we are called from some
context other than splitter conditions and the insn is already cached,
we don't reset the cache.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-02-07  Jakub Jelinek  

PR target/93611
* config/i386/i386.c (ix86_lea_outperforms): Make sure to clear
recog_data.insn if distance_non_agu_define changed it.

* gcc.target/i386/pr93611.c: New test.

--- gcc/config/i386/i386.c.jj   2020-01-28 08:45:56.781090684 +0100
+++ gcc/config/i386/i386.c  2020-02-06 16:29:35.548663197 +0100
@@ -14459,9 +14459,18 @@ ix86_lea_outperforms (rtx_insn *insn, un
   return true;
 }
 
+  rtx_insn *rinsn = recog_data.insn;
+
   dist_define = distance_non_agu_define (regno1, regno2, insn);
   dist_use = distance_agu_use (regno0, insn);
 
+  /* distance_non_agu_define can call extract_insn_cached.  If this function
+     is called from define_split conditions, that can break insn splitting,
+     because split_insns works by clearing recog_data.insn and then modifying
+     recog_data.operand array and match the various split conditions.  */
+  if (recog_data.insn != rinsn)
+    recog_data.insn = NULL;
+
   if (dist_define < 0 || dist_define >= LEA_MAX_STALL)
     {
       /* If there is no non AGU operand definition, no AGU
--- gcc/testsuite/gcc.target/i386/pr93611.c.jj  2020-02-06 12:24:28.005976435 +0100
+++ gcc/testsuite/gcc.target/i386/pr93611.c 2020-02-06 12:24:17.685131826 +0100
@@ -0,0 +1,5 @@
+/* PR target/93611 */
+/* { dg-do compile } */
+/* { dg-options "-fira-algorithm=priority -O3 -mtune=bonnell" } */
+
+#include "../../gcc.dg/vect/pr58508.c"

Jakub



Re: [PATCH, rs6000]: mark clobber for registers changed by untyped_call

2020-02-06 Thread Jiufu Guo
Segher Boessenkool  writes:

> Hi!
>
> On Thu, Feb 06, 2020 at 10:49:36AM +0800, Jiufu Guo wrote:
>> >   emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx));
>> >
>> >   for (i = 0; i < XVECLEN (operands[2], 0); i++)
>> > {
>> >   rtx set = XVECEXP (operands[2], 0, i);
>> >   emit_move_insn (SET_DEST (set), SET_SRC (set));
>> > }
>> >
>> > ... and nothing in the rtl stream says that those return registers are
>> > actually set by that call.  Maybe we should use gen_call_value?  Can we
>> > ever be asked to return more than a single thing here?
>> I was also thinking about using "gen_call_value" or "emit_clobber (r3)"
>> which could generate rtl: "%3:DI=call [foo]" or "call [foo]; clobber
>> r3".  This could tell optimizer that %3 is changed.
>
> The problem with "call ; clobber r3" is that some set+use of a pseudo can
> be moved between these, and then rnreg can rename that to r3 again.  We
> really need to show the call sets r3, in the general case (or that r3 is
> live after the call, at least).
Thanks! After more careful thought, you are right: a set+use may be able to
move between "call ; clobber".  "%3=call" does not have this issue.

>
>> While there are
>> potential issues that untyped_call may change other registers.  So, mark
>> clobber for all touched registers maybe more safe.
>
> Well, we can derive what registers it sets, perhaps?  What does x86 do
> here?  It does something, I know that, haven't looked much deeper yet
> though :-)
For x86, it generates something like "%c=call":
  ix86_expand_call ((TARGET_FLOAT_RETURNS_IN_80387
                     ? gen_rtx_REG (XCmode, FIRST_FLOAT_REG) : NULL),
                    operands[0], const0_rtx,...
The first argument of ix86_expand_call is the 'set of call'.  As the comment
on untyped_call in i386.md says:
  /* In order to give reg-stack an easier job in validating two
     coprocessor registers as containing a possible return value,
     simply pretend the untyped call returns a complex long double
     value. 

For ppc, %3:DI, %3:TI, %1:SF, ... may be set by untyped_call, right?
And from the trunk source code (builtins.c and the .md files for targets):
  for (i = 0; i < XVECLEN (operands[2], 0); i++)
    {
      rtx set = XVECEXP (operands[2], 0, i);
      emit_move_insn (SET_DEST (set), SET_SRC (set));
    }

The above code may mean that all registers in operands[2] are stored/moved
to the stack, so those registers may be altered.  Any corrections?

Thanks for your comments and suggestions!
Jiufu

>
> In general: this is not a problem for us only; some other archs may have
> found a good solution already.
>
>
> Segher


PING^6: [PATCH] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move

2020-02-06 Thread H.J. Lu
On Mon, Jan 27, 2020 at 10:59 AM H.J. Lu  wrote:
>
> On Mon, Jul 8, 2019 at 8:19 AM H.J. Lu  wrote:
> >
> > On Tue, Jun 18, 2019 at 8:59 AM H.J. Lu  wrote:
> > >
> > > On Fri, May 31, 2019 at 10:38 AM H.J. Lu  wrote:
> > > >
> > > > On Tue, May 21, 2019 at 2:43 PM H.J. Lu  wrote:
> > > > >
> > > > > On Fri, Feb 22, 2019 at 8:25 AM H.J. Lu  wrote:
> > > > > >
> > > > > > Hi Jan, Uros,
> > > > > >
> > > > > > This patch fixes the wrong code bug:
> > > > > >
> > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89229
> > > > > >
> > > > > > Tested on AVX2 and AVX512 with and without --with-arch=native.
> > > > > >
> > > > > > OK for trunk?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > H.J.
> > > > > > --
> > > > > > i386 backend has
> > > > > >
> > > > > > INT_MODE (OI, 32);
> > > > > > INT_MODE (XI, 64);
> > > > > >
> > > > > > So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation,
> > > > > > in case of const_1, all 512 bits set.
> > > > > >
> > > > > > We can load zeros with narrower instruction, (e.g. 256 bit by inherent
> > > > > > zeroing of highpart in case of 128 bit xor), so TImode in this case.
> > > > > >
> > > > > > Some targets prefer V4SF mode, so they will emit float xorps for zeroing.
> > > > > >
> > > > > > sse.md has
> > > > > >
> > > > > > (define_insn "mov_internal"
> > > > > >   [(set (match_operand:VMOVE 0 "nonimmediate_operand"
> > > > > >  "=v,v ,v ,m")
> > > > > > (match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
> > > > > >  " C,BC,vm,v"))]
> > > > > > 
> > > > > >   /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
> > > > > >      in avx512f, so we need to use workarounds, to access sse registers
> > > > > >      16-31, which are evex-only. In avx512vl we don't need workarounds.  */
> > > > > >   if (TARGET_AVX512F &&  < 64 && !TARGET_AVX512VL
> > > > > >       && (EXT_REX_SSE_REG_P (operands[0])
> > > > > >           || EXT_REX_SSE_REG_P (operands[1])))
> > > > > >     {
> > > > > >       if (memory_operand (operands[0], mode))
> > > > > >         {
> > > > > >           if ( == 32)
> > > > > >             return "vextract64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
> > > > > >           else if ( == 16)
> > > > > >             return "vextract32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
> > > > > >           else
> > > > > >             gcc_unreachable ();
> > > > > >         }
> > > > > > ...
> > > > > >
> > > > > > However, since ix86_hard_regno_mode_ok has
> > > > > >
> > > > > >  /* TODO check for QI/HI scalars.  */
> > > > > >   /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
> > > > > >   if (TARGET_AVX512VL
> > > > > >   && (mode == OImode
> > > > > >   || mode == TImode
> > > > > >   || VALID_AVX256_REG_MODE (mode)
> > > > > >   || VALID_AVX512VL_128_REG_MODE (mode)))
> > > > > > return true;
> > > > > >
> > > > > >   /* xmm16-xmm31 are only available for AVX-512.  */
> > > > > >   if (EXT_REX_SSE_REGNO_P (regno))
> > > > > > return false;
> > > > > >
> > > > > >   if (TARGET_AVX512F &&  < 64 && !TARGET_AVX512VL
> > > > > >   && (EXT_REX_SSE_REG_P (operands[0])
> > > > > >   || EXT_REX_SSE_REG_P (operands[1])))
> > > > > >
> > > > > > is a dead code.
> > > > > >
> > > > > > Also for
> > > > > >
> > > > > > long long *p;
> > > > > > volatile __m256i yy;
> > > > > >
> > > > > > void
> > > > > > foo (void)
> > > > > > {
> > > > > >   _mm256_store_epi64 (p, yy);
> > > > > > }
> > > > > >
> > > > > > with AVX512VL, we should generate
> > > > > >
> > > > > > vmovdqa %ymm0, (%rax)
> > > > > >
> > > > > > not
> > > > > >
> > > > > > vmovdqa64   %ymm0, (%rax)
> > > > > >
> > > > > > All TYPE_SSEMOV vector moves are consolidated to ix86_output_ssemov:
> > > > > >
> > > > > > 1. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE/AVX vector
> > > > > > moves will be generated.
> > > > > > 2. If xmm16-xmm31/ymm16-ymm31 registers are used:
> > > > > >a. With AVX512VL, AVX512VL vector moves will be generated.
> > > > > >b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
> > > > > >   move will be done with zmm register move.
> > > > > >
> > > > > > ext_sse_reg_operand is removed since it is no longer needed.
> > > > > >
> > > > > > Tested on AVX2 and AVX512 with and without --with-arch=native.
> > > > > >
> > > > > > gcc/
> > > > > >
> > > > > > PR target/89229
> > > > > > PR target/89346
> > > > > > * config/i386/i386-protos.h (ix86_output_ssemov): New prototype.
> > > > > > * config/i386/i386.c (ix86_get_ssemov): New function.
> > > > > > (ix86_output_ssemov): Likewise.
> > > > > > * config/i386/i386.md (*movxi_internal_avx512

Re: [PATCH] x86-64: Pass aggregates with only float/double in GPRs for MS_ABI

2020-02-06 Thread H.J. Lu
On Wed, Feb 05, 2020 at 09:51:14PM +0100, Uros Bizjak wrote:
> On Wed, Feb 5, 2020 at 6:59 PM H.J. Lu  wrote:
> >
> > MS_ABI requires passing aggregates with only float/double in integer
> > registers.  Checked gcc outputs against Clang and fixed:
> >
> > FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=54
> > -Wno-unused-variable -Wno-unused-parameter
> > -Wno-unused-but-set-variable -Wno-uninitialized -O0
> > -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
> > FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=54
> > -Wno-unused-variable -Wno-unused-parameter
> > -Wno-unused-but-set-variable -Wno-uninitialized -O2
> > -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
> > FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=55
> > -Wno-unused-variable -Wno-unused-parameter
> > -Wno-unused-but-set-variable -Wno-uninitialized -O0
> > -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
> > FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=55
> > -Wno-unused-variable -Wno-unused-parameter
> > -Wno-unused-but-set-variable -Wno-uninitialized -O2
> > -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
> > FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=56
> > -Wno-unused-variable -Wno-unused-parameter
> > -Wno-unused-but-set-variable -Wno-uninitialized -O0
> > -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
> > FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=56
> > -Wno-unused-variable -Wno-unused-parameter
> > -Wno-unused-but-set-variable -Wno-uninitialized -O2
> > -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
> >
> > in libffi testsuite.
> >
> > OK for master and backports to GCC 8/9 branches?
> >
> > gcc/
> >
> > PR target/85667
> > * config/i386/i386.c (function_arg_ms_64): Add a type argument.
> > Don't return aggregates with only SFmode and DFmode in SSE
> > register.
> > (ix86_function_arg): Pass arg.type to function_arg_ms_64.
> >
> > gcc/testsuite/
> >
> > PR target/85667
> > * gcc.target/i386/pr85667-10.c: New test.
> > * gcc.target/i386/pr85667-7.c: Likewise.
> > * gcc.target/i386/pr85667-8.c: Likewise.
> > * gcc.target/i386/pr85667-9.c: Likewise.
> 
> LGTM, but should really be reviewed by cygwin, mingw-w64 maintainer (CC'd).
> 

I checked the result against MSVC v19.10 at

https://godbolt.org/z/2NPygd

My patch matches MSVC v19.10.  I am checking it in tomorrow unless
mingw-w64 maintainer objects.

Thanks.

H.J.


[PATCH] Use the section flag 'o' for __patchable_function_entries

2020-02-06 Thread H.J. Lu
This commit in GNU binutils 2.35:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=b7d072167715829eed0622616f6ae0182900de3e

added the section flag 'o' to .section directive:

.section __patchable_function_entries,"awo",@progbits,foo

which specifies the symbol name which the section references.  The assembler
creates a unique __patchable_function_entries section whose linked-to section
is the section where foo is defined.  The linker keeps a section if its
linked-to section is kept during garbage collection.

This patch checks assembler support for the section flag 'o' and uses
it to implement the __patchable_function_entries section.  Solaris may
use the GNU assembler with the Solaris ld, and even if the GNU assembler
supports the section flag 'o', that doesn't mean the Solaris ld supports it.
Therefore, this feature is disabled for Solaris targets.

gcc/

PR middle-end/93195
PR middle-end/93197
* configure.ac (HAVE_GAS_SECTION_LINK_ORDER): New.  Define if
the assembler supports the section flag 'o' for specifying
section with link-order.
* dwarf2out.c (output_comdat_type_unit): Pass 0 as flags2
to targetm.asm_out.named_section.
* config/sol2.c (solaris_elf_asm_comdat_section): Likewise.
* output.h (SECTION2_LINK_ORDER): New.
(switch_to_section): Add an unsigned int argument.
(default_no_named_section): Likewise.
(default_elf_asm_named_section): Likewise.
* target.def (asm_out.named_section): Likewise.
* targhooks.c (default_print_patchable_function_entry): Pass
current_function_decl to get_section and SECTION2_LINK_ORDER
to switch_to_section.
* varasm.c (default_no_named_section): Add an unsigned int
argument.
(default_elf_asm_named_section): Add an unsigned int argument,
flags2.  Use 'o' flag for SECTION2_LINK_ORDER if assembler
supports it.
(switch_to_section): Add an unsigned int argument and pass it
to targetm.asm_out.named_section.
(handle_vtv_comdat_section): Pass 0 to
targetm.asm_out.named_section.
* config.in: Regenerated.
* configure: Likewise.
* doc/tm.texi: Likewise.

gcc/testsuite/

PR middle-end/93195
* g++.dg/pr93195a.C: New test.
* g++.dg/pr93195b.C: Likewise.
* lib/target-supports.exp
(check_effective_target_o_flag_in_section): New proc.
---
 gcc/config.in |  6 
 gcc/config/sol2.c |  3 +-
 gcc/configure | 52 +++
 gcc/configure.ac  | 22 
 gcc/doc/tm.texi   |  5 +--
 gcc/dwarf2out.c   |  4 +--
 gcc/output.h  | 11 --
 gcc/target.def|  5 +--
 gcc/targhooks.c   |  4 ++-
 gcc/testsuite/g++.dg/pr93195a.C   | 27 ++
 gcc/testsuite/g++.dg/pr93195b.C   | 14 
 gcc/testsuite/lib/target-supports.exp | 40 +
 gcc/varasm.c  | 25 ++---
 13 files changed, 202 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr93195a.C
 create mode 100644 gcc/testsuite/g++.dg/pr93195b.C

diff --git a/gcc/config.in b/gcc/config.in
index 48292861842..d1ecc5b15a6 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1313,6 +1313,12 @@
 #endif
 
 
+/* Define if your assembler supports 'o' flag in .section directive. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GAS_SECTION_LINK_ORDER
+#endif
+
+
 /* Define 0/1 if your assembler supports marking sections with SHF_MERGE flag.
*/
 #ifndef USED_FOR_TARGET
diff --git a/gcc/config/sol2.c b/gcc/config/sol2.c
index cf9d9f1f684..62bbdec2f97 100644
--- a/gcc/config/sol2.c
+++ b/gcc/config/sol2.c
@@ -224,7 +224,8 @@ solaris_elf_asm_comdat_section (const char *name, unsigned int flags, tree decl)
  emits this as a regular section.  Emit section before .group
  directive since Sun as treats undeclared sections as @progbits,
  which conflicts with .bss* sections which are @nobits.  */
-  targetm.asm_out.named_section (section, flags & ~SECTION_LINKONCE, decl);
+  targetm.asm_out.named_section (section, flags & ~SECTION_LINKONCE,
+                                 0, decl);
   
   /* Sun as separates declaration of a group section and of the group
  itself, using the .group directive and the #comdat flag.  */
diff --git a/gcc/configure b/gcc/configure
index 5fa565a40a4..a7315e33a62 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -24185,6 +24185,58 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
+# Test if the assembler supports the section flag 'o' for specifying
+# section with link-order.
+case "${target}" in
+  # Solaris may use GNU assembler with Solaris ld.  Even if GNU
+  # assembler supports the section flag 'o', it doesn't mean that
+  # Solaris ld supports it.
+  *-*-solaris2*)
+gcc_cv

RE: [PATCH][AARCH64] Fix for PR86901

2020-02-06 Thread Modi Mo via gcc-patches
> I did a quick bootstrap, this shows several failures like:
> 
> gcc/builtins.c:9427:1: error: unrecognizable insn:
>  9427 | }
>   | ^
> (insn 212 211 213 24 (set (reg:SI 207)
> (zero_extract:SI (reg:SI 206)
> (const_int 26 [0x1a])
> (const_int 6 [0x6]))) "/gcc/builtins.c":9413:44 -1
>  (nil))
> 
> The issue here is that 26+6 = 32 and that's not a valid ubfx encoding.
> Currently cases like this are split into a right shift in aarch64.md around
> line 5569:

Appreciate you taking a look and the validation. I've gotten access to an 
aarch64 server and the bootstrap demonstrated the issue you saw. This was 
caused by my re-definition of the pattern to:
+  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (mode))
+    FAIL;

This meant that for SImode only a sum greater than 32 bits actually triggers
the FAIL condition of the define_expand, whereas the existing define_insn fails
on a sum of 32 bits or more.  I looked into the Architecture Reference Manual:
the encoding bits are available for ubfx/sbfx in that case, and the
documentation says you can use [lsb, 32-lsb] as a legal pair for SImode.
Checking with the GNU assembler, it does accept a sum of 32 but transforms it
into an LSR:

Assembly file:
ubfx    w0, w0, 24, 8

Disassembly of section .text:

 :
   0:   53187c00    lsr w0, w0, #24

Similarly, the 64-bit version becomes a 64-bit LSR.  Other assemblers could
certainly trip over this.  I've attached a new patch that allows this encoding,
and bootstrap plus testing of the C/C++ testsuites looks good.  I'll defer to
you on whether it's better to do the transformation explicitly in GCC.

> ;; When the bit position and width add up to 32 we can use a W-reg LSR
> ;; instruction taking advantage of the implicit zero-extension of the X-reg.
> (define_split
>   [(set (match_operand:DI 0 "register_operand")
> (zero_extract:DI (match_operand:DI 1 "register_operand")
>  (match_operand 2
>"aarch64_simd_shift_imm_offset_di")
>  (match_operand 3
>"aarch64_simd_shift_imm_di")))]
>   "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
>  GET_MODE_BITSIZE (DImode) - 1)
>&& (INTVAL (operands[2]) + INTVAL (operands[3]))
>== GET_MODE_BITSIZE (SImode)"
>   [(set (match_dup 0)
> (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
>   {
> operands[4] = gen_lowpart (SImode, operands[1]);
>   }
> 
> However that only supports DImode, not SImode, so it needs to be changed
> to be more general using GPI.
> 
> Your new extv patterns should replace the magic patterns above it:

Given the discovery that a sum of 32/64 makes the assembler emit an LSR, I was
curious what would happen if I removed this pattern.  It turns out you end up
with UBFX x0, x0, 24, 8 instead of LSR w0, w0, 24 in the test case associated
with this change (gcc.target/aarch64/ubfx_lsr_1.c), which doesn't get
transformed into an LSR by the assembler since it's in 64-bit mode.  So this
pattern still has value, but I don't think it's necessary to extend it to DI,
since that will automatically get turned into an LSR by the assembler, as I
mentioned previously.


> ;; ---
> ;; Bitfields
> ;; ---
> 
> (define_expand ""
> 
> These are the current extv/extzv patterns, but without a mode. They are no
> longer used when we start using the new ones.
> 
> Note you can write  to combine the extzv and extz patterns.
> But please add a comment mentioning the pattern names so they are easy to
> find!

Good call here; I made this change in the new patch.  I did see the
define_insn of these patterns below it but somehow missed that they were
expanded just above.  I believe, from my understanding of GCC, that the
matching pattern below the first line is what constrains  into just
extv/extzv from the long list of iterators it belongs to.  Still, I see that
there are constrained iterators elsewhere, like:

;; Optab prefix for sign/zero-extending operations
(define_code_attr su_optab [(sign_extend "") (zero_extend "u")

I added a comment in this patch before the pattern. Thoughts on defining 
another constrained version to make it clearer (in addition or in lieu of the 
comment)?

> Besides a bootstrap it is always useful to compile a large body of code with
> your change (say SPEC2006/2017) and check for differences in at least
> codesize. If there is an increase in instruction count then there may be more
> issues that need to be resolved.

Sounds good. I'll get those setup and running and will report back on findings. 
What's the preferred way to measure codesize? I'm assuming by default the code 
pages are aligned so smaller differences would need to trip over the boundary 
to actually show up. 
 
> I find it easiest to develop on a many-core AArch64 ser

Re: [wwwdocs] Mention common attribute in gcc-10/porting_to.html

2020-02-06 Thread Sandra Loosemore

On 2/6/20 3:18 PM, Gerald Pfeifer wrote:

On Thu, 6 Feb 2020, Jakub Jelinek wrote:

+  If tentative definitions of particular variable or variables need to be


I believe that would be "a particular variable", but best to simplify
to "of particular variables".


+  placed in a common block, __attribute__((__common__)) can be
+  used to force that behavior for those even in code compiled without
+  -fcommon.


Here I'd omit "for those".

This makes sense to me and reads well; okay from my side. :)


Looks good to me with those changes.

-Sandra


Re: [PATCH 2/3] libstdc++: Implement C++20 constrained algorithms

2020-02-06 Thread Patrick Palka
On Thu, 6 Feb 2020, Jonathan Wakely wrote:

> On 03/02/20 21:07 -0500, Patrick Palka wrote:
> > +#ifndef _RANGES_ALGO_H
> > +#define _RANGES_ALGO_H 1
> > +
> > +#if __cplusplus > 201703L
> > +
> > +#include 
> > +#include 
> > +#include 
> > +// #include 
> 
> This line could be removed, or leave it as a reminder to me to
> refactor  so that the small utility pieces are in a small
> utility header (like  that can be included
> instead of the whole of .

I guess I'll leave it in then.

> 
> > +#include 
> > +#include 
> > +#include  // __is_byte
> > +#include  // concept uniform_random_bit_generator
> 
> I wonder if we want to move that concept to 
> instead, which already exists to allow  to avoid including
> the whole of . If we do that, it would make sense to rename
>  to  or something like
> that.

That makes sense -- I can try to do that in a followup patch.

> 
> > +
> > +#if __cpp_lib_concepts
> > +namespace std _GLIBCXX_VISIBILITY(default)
> > +{
> > +_GLIBCXX_BEGIN_NAMESPACE_VERSION
> > +namespace ranges
> > +{
> > +  namespace __detail
> > +  {
> > +template
> > +constexpr inline bool __is_normal_iterator = false;
> 
> All these templates in the __detail namespace should be indented by
> two spaces after the template-head i.e.
> 
> template
>   constexpr inline bool __is_normal_iterator = false;
> 
> (That indentation scheme has been in the libstdc++ style guide for
> longer than I've been contributing to the project, but it doesn't seem
> very popular with new contributors, and it wastes a level of
> indentation for templates, which means most of the library. Maybe we
> should revisit that convention.)

Fixed

> 
> 
> > +  template
> > +using unary_transform_result = copy_result<_Iter, _Out>;
> > +
> > +  template _Sent,
> > +  weakly_incrementable _Out,
> > +  copy_constructible _Fp, typename _Proj = identity>
> > +requires writable<_Out, indirect_result_t<_Fp&, projected<_Iter,
> > _Proj>>>
> 
> I have a pending patch to implement P1878R1, which renames writable
> (and a few other concepts). I'll wait until your patch is in, and
> change these places using it.

Sounds good.

> 
> > +partial_sort_copy(_Iter1 __first, _Sent1 __last,
> > + _Iter2 __result_first, _Sent2 __result_last,
> > + _Comp __comp = {},
> > + _Proj1 __proj1 = {}, _Proj2 __proj2 = {})
> > +{
> > +  if (__result_first == __result_last)
> > +   {
> > + // TODO: Eliminating the variable __lasti triggers an ICE.
> > + auto __lasti = ranges::next(std::move(__first),
> > + std::move(__last));
> > + return {std::move(__lasti), std::move(__result_first)};
> 
> Please try to reduce that and report it to bugzilla at some point,
> thanks.

Will do!  Interestingly, it was an ICE in the middle-end.  I wasn't able
to reproduce it anymore, but I'll try more carefully tomorrow.

> 
> > +++ b/libstdc++-v3/testsuite/25_algorithms/all_of/constrained.cc
> > @@ -0,0 +1,90 @@
> > +// Copyright (C) 2019 Free Software Foundation, Inc.
> 
> This should be 2020. That's the only change necessary though, please
> adjust that and commit to master. Great work, thank you!

Fixed.  Thank you for the review!  Patch committed, hopefully without
any fallout.



[PATCH 4/3] Add [range.istream]

2020-02-06 Thread Patrick Palka
This patch adds ranges::basic_istream_view and ranges::istream_view.  This seems
to be the last missing part of the ranges header.

libstdc++-v3/ChangeLog:

* include/std/ranges (ranges::__detail::__stream_extractable,
ranges::basic_istream_view, ranges::istream_view): Define.
* testsuite/std/ranges/istream_view.cc: New test.
---
 libstdc++-v3/include/std/ranges   | 94 +++
 .../testsuite/std/ranges/istream_view.cc  | 76 +++
 2 files changed, 170 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/std/ranges/istream_view.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 8a8fefb6f19..88b98310ef9 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -951,6 +951,100 @@ namespace views
   inline constexpr _Iota iota{};
 } // namespace views
 
+  namespace __detail
+  {
+template
+  concept __stream_extractable
+   = requires(basic_istream<_CharT, _Traits>& is, _Val& t) { is >> t; };
+  } // namespace __detail
+
+  template
+requires default_initializable<_Val>
+  && __detail::__stream_extractable<_Val, _CharT, _Traits>
+class basic_istream_view
+: public view_interface>
+{
+public:
+  basic_istream_view() = default;
+
+  constexpr explicit
+  basic_istream_view(basic_istream<_CharT, _Traits>& __stream)
+   : _M_stream(std::__addressof(__stream))
+  { }
+
+  constexpr auto
+  begin()
+  {
+   if (_M_stream != nullptr)
+ *_M_stream >> _M_object;
+   return _Iterator{*this};
+  }
+
+  constexpr default_sentinel_t
+  end() const noexcept
+  { return default_sentinel; }
+
+private:
+  basic_istream<_CharT, _Traits>* _M_stream = nullptr;
+  _Val _M_object = _Val();
+
+  struct _Iterator
+  {
+  public:
+   using iterator_category = input_iterator_tag;
+   using difference_type = ptrdiff_t;
+   using value_type = _Val;
+
+   _Iterator() = default;
+
+   constexpr explicit
+   _Iterator(basic_istream_view& __parent) noexcept
+ : _M_parent(std::__addressof(__parent))
+   { }
+
+   _Iterator(const _Iterator&) = delete;
+   _Iterator(_Iterator&&) = default;
+   _Iterator& operator=(const _Iterator&) = delete;
+   _Iterator& operator=(_Iterator&&) = default;
+
+   _Iterator&
+   operator++()
+   {
+ __glibcxx_assert(_M_parent->_M_stream != nullptr);
+ *_M_parent->_M_stream >> _M_parent->_M_object;
+   }
+
+   void
+   operator++(int)
+   { ++*this; }
+
+   _Val&
+   operator*() const
+   {
+ __glibcxx_assert(_M_parent->_M_stream != nullptr);
+ return _M_parent->_M_object;
+   }
+
+   friend bool
+   operator==(const _Iterator& __x, default_sentinel_t)
+   { return __x.__at_end(); }
+
+  private:
+   basic_istream_view* _M_parent = nullptr;
+
+   bool
+   __at_end() const
+   { return _M_parent == nullptr || !*_M_parent->_M_stream; }
+  };
+
+  friend _Iterator;
+};
+
+  template
+basic_istream_view<_Val, _CharT, _Traits>
+istream_view(basic_istream<_CharT, _Traits>& __s)
+{ return basic_istream_view<_Val, _CharT, _Traits>{__s}; }
+
 namespace __detail
 {
   struct _Empty { };
diff --git a/libstdc++-v3/testsuite/std/ranges/istream_view.cc b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
new file mode 100644
index 000..c573ba57ae8
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
@@ -0,0 +1,76 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do run { target c++2a } }
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+namespace ranges = std::ranges;
+namespace views = std::views;
+
+struct X : __gnu_test::rvalstruct
+{
+  char c;
+
+  friend std::istream&
+  operator>>(std::istream& is, X& m)
+  {
+is >> m.c;
+return is;
+  }
+};
+
+
+void
+test01()
+{
+  std::string s = "0123456789";
+  auto ss = std::istringstream{s};
+  auto v = ranges::istream_view(ss) | views::transform(&X::c);
+  VERIFY( ranges::equal(v, s) );
+}
+
+void
+test02()
+{
+  auto ints = std::istringstream{"0 1  2

[committed] analyzer: fix reproducer for PR 93375

2020-02-06 Thread David Malcolm
Reproducing the ICE in PR analyzer/93375 required some kind of
analyzer diagnostic occurring after a call with fewer arguments
than required by the callee.

The testcase used __builtin_memcpy with a NULL argument for this.

On x86_64-pc-linux-gnu this happened to be already optimized into:
  _4 = MEM  [(char * {ref-all})0B];
  MEM  [(char * {ref-all})rl_1] = _4;
by the time of the analyzer pass, leading to the diagnostic in question
being:
  warning: dereference of NULL ‘rl’ [CWE-690] [-Wanalyzer-null-dereference]

On other targets e.g. arm-unknown-linux-gnueabi, the builtin isn't
optimized at the time of the analyzer pass, leading to this diagnostic
instead:
  warning: use of NULL ‘rl’ where non-null expected [CWE-690] [-Wanalyzer-null-argument]
  : note: argument 1 of ‘__builtin_memcpy’ must be non-null

This patch fixes the test case by using a custom function marked as
nonnull.  I manually verified that it still reproduces the ICE if the
patch for the PR is reverted, and verified the messages on
x86_64-pc-linux-gnu and arm-unknown-linux-gnueabi.

Successfully regrtested on x86_64-pc-linux-gnu.
Pushed to master as r10-6496-g13f5b93e6453d121abc15c718dfcc588aca976c3.

gcc/testsuite/ChangeLog:
PR analyzer/93375
* gcc.dg/analyzer/pr93375.c: Rework test case to avoid per-target
differences in how __builtin_memcpy has been optimized at the time
the analyzer runs.
---
 gcc/testsuite/gcc.dg/analyzer/pr93375.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/analyzer/pr93375.c b/gcc/testsuite/gcc.dg/analyzer/pr93375.c
index 93a3e87f2cb..f6108547fb7 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr93375.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr93375.c
@@ -1,5 +1,7 @@
 /* { dg-additional-options "-Wno-implicit-int" } */
 
+extern void foo (void *) __attribute__((nonnull));
+
 void
 en (jm)
 {
@@ -11,5 +13,5 @@ p2 ()
   char *rl = 0;
 
   en ();
-  __builtin_memcpy (rl, 0, sizeof (0)); /* { dg-warning "dereference of NULL" } */
+  foo (rl); /* { dg-warning "use of NULL 'rl' where non-null expected" } */
 }
-- 
2.21.0



[PATCH] c++: Fix ICE with template codes in check_narrowing [PR91465]

2020-02-06 Thread Marek Polacek
In ed4f2c001a883b2456fc607a33f1c59f9c4ee65d I changed the call to
fold_non_dependent_expr in check_narrowing to maybe_constant_value.
That was the wrong thing to do as these tests show: check_narrowing
bails out for dependent expressions but we can still have template
codes like CAST_EXPR that don't have anything dependent in them and so are
considered non-dependent.  But cxx_eval_* don't grok template codes,
so we need to call fold_non_dependent_expr instead which knows what
to do with template codes.  (I fully accept a "told you so".)
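This sketch shows why check_narrowing folds INIT at all. In list-initialization, an int to char conversion is narrowing in general. It is allowed, however, when the source is a constant expression whose value fits in the target type. The names below are hypothetical. The template shows a non-dependent initializer, which is the shape of the new testcases. Those tests never instantiate their templates, so the check evidently runs when the template is defined.

```cpp
#include <cassert>

constexpr int fits = 65;

// OK: the source is a constant expression and 65 fits in char, so this
// int -> char list-initialization is not narrowing.
constexpr char from_constant{fits};

// Ill-formed if uncommented: the value is not a constant expression, so
// the int -> char conversion counts as narrowing.
// int runtime_value();
// char from_runtime{runtime_value()};

// The same kind of initializer inside a template: nothing in it depends on T.
template <typename T>
char in_template()
{
  return char{int(fits)};
}
```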

I'm passing tf_none to it, otherwise we'd emit a bogus error for
constexpr-ex4.C: there INIT is "A::operator int(&a)" and while
instantiating this CALL_EXPR (in a template) we call finish_call_expr
and that sees a BASELINK and so emits a new dummy object for 'this',
and then we complain about the wrong number of arguments, because now
we basically have two 'this's.  Which is exactly the problem I saw
recently in c++/92948.

Bootstrapped/regtested on x86_64-linux, ok for trunk and 9?

PR c++/91465 - ICE with template codes in check_narrowing.
* typeck2.c (check_narrowing): Call fold_non_dependent_expr
instead of maybe_constant_value.

* g++.dg/cpp0x/pr91465.C: New test.
* g++.dg/cpp1z/pr91465.C: New test.
---
 gcc/cp/typeck2.c |  4 +++-
 gcc/testsuite/g++.dg/cpp0x/pr91465.C | 16 
 gcc/testsuite/g++.dg/cpp1z/pr91465.C | 10 ++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr91465.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/pr91465.C

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 371b203c29b..8f8e9703ac8 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -981,7 +981,9 @@ check_narrowing (tree type, tree init, tsubst_flags_t complain,
   return ok;
 }
 
-  init = maybe_constant_value (init);
+  init = fold_non_dependent_expr (init, tf_none);
+  if (init == error_mark_node)
+    return ok;
 
   /* If we were asked to only check constants, return early.  */
   if (const_only && !TREE_CONSTANT (init))
diff --git a/gcc/testsuite/g++.dg/cpp0x/pr91465.C b/gcc/testsuite/g++.dg/cpp0x/pr91465.C
new file mode 100644
index 000..e2021aa13e1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/pr91465.C
@@ -0,0 +1,16 @@
+// PR c++/91465 - ICE with template codes in check_narrowing.
+// { dg-do compile { target c++11 } }
+
+enum class D { X };
+enum class S { Z };
+
+D foo(S) { return D{}; }
+D foo(double) { return D{}; }
+
+template 
+struct Bar {
+  D baz(S s)
+  {
+    return D{foo(s)};
+  }
+};
diff --git a/gcc/testsuite/g++.dg/cpp1z/pr91465.C b/gcc/testsuite/g++.dg/cpp1z/pr91465.C
new file mode 100644
index 000..5b1205349d0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/pr91465.C
@@ -0,0 +1,10 @@
+// PR c++/91465 - ICE with template codes in check_narrowing.
+// { dg-do compile { target c++17 } }
+
+enum class E { Z };
+
+template 
+void foo(F)
+{
+  E{char(0)};
+}

base-commit: cb273d81a45092ceee793f0357526e291f03c7b7
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



Re: [PATCH 3/3] libstdc++: Implement C++20 range adaptors

2020-02-06 Thread Patrick Palka
On Thu, 6 Feb 2020, Jonathan Wakely wrote:

> On 03/02/20 21:07 -0500, Patrick Palka wrote:
> > This patch implements [range.adaptors].  It also includes the changes from
> > P3280 and P3278 and P3323, without which many standard examples won't work.
> > 
> > The implementation is mostly dictated by the spec and there was not much
> > room for implementation discretion.  The most interesting part that was not
> > specified by the spec is the design of the range adaptors and range adaptor
> > closures, which I tried to design in a way that minimizes boilerplate and
> > statefulness (so that e.g. the composition of two stateless closures is
> > stateless).
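The general idea of adaptor closures composed with operator| can be sketched loosely in C++17, without <ranges>. All names here are hypothetical: closure, make_closure, is_closure, doubled and plus_one. This is not the libstdc++ implementation. Unlike the design described above, composing two closures here stores copies of both inside a lambda, so the sketch does not try to keep compositions stateless.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical, simplified closure wrapper.
template <typename F>
struct closure
{
  F fn;
};

template <typename F>
closure<F> make_closure(F f)
{
  return closure<F>{std::move(f)};
}

template <typename T> struct is_closure : std::false_type {};
template <typename F> struct is_closure<closure<F>> : std::true_type {};

// range | closure  ==>  closure(range).  Disabled when the left operand is
// itself a closure, so that closure | closure picks the overload below.
template <typename R, typename F,
          typename = std::enable_if_t<!is_closure<std::decay_t<R>>::value>>
auto operator|(R&& r, const closure<F>& c)
{
  return c.fn(std::forward<R>(r));
}

// closure | closure  ==>  a new closure that applies lhs first, then rhs.
template <typename F, typename G>
auto operator|(const closure<F>& lhs, const closure<G>& rhs)
{
  return make_closure([lhs, rhs](auto&& r) {
    return rhs.fn(lhs.fn(std::forward<decltype(r)>(r)));
  });
}

// Two stateless example adaptors over std::vector<int>.
const auto doubled = make_closure([](std::vector<int> v) {
  for (int& x : v)
    x *= 2;
  return v;
});

const auto plus_one = make_closure([](std::vector<int> v) {
  for (int& x : v)
    x += 1;
  return v;
});
```

With these definitions, v | doubled | plus_one and v | (doubled | plus_one) produce the same result.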
> > 
> > What is left unimplemented is caching of calls to begin() in filter_view,
> > drop_view and reverse_view, which is required to guarantee that begin() has
> > amortized constant time complexity.  I can implement this in a subsequent
> > patch.
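Here is a minimal sketch of the begin() caching that is deferred to a follow-up patch. It assumes a simplified filter over std::vector<int> rather than a real view. The cached_filter class is hypothetical and models only begin() and end(), not filtered iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

template <typename Pred>
class cached_filter
{
public:
  using iterator = std::vector<int>::const_iterator;

  cached_filter(const std::vector<int>& base, Pred pred)
    : base_(&base), pred_(std::move(pred))
  { }

  // The first call performs the O(n) search for the first matching element;
  // later calls return the cached iterator, so begin() is amortized O(1).
  // It is non-const because it updates the cache.
  iterator begin()
  {
    if (!cached_begin_)
      cached_begin_ = std::find_if(base_->begin(), base_->end(), pred_);
    return *cached_begin_;
  }

  iterator end() const { return base_->end(); }

private:
  const std::vector<int>* base_;
  Pred pred_;
  std::optional<iterator> cached_begin_;
};
```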
> > 
> > "Interesting" parts of the patch are marked with XXX comments.
> 
> 
> 
> > --- a/libstdc++-v3/include/std/ranges
> > +++ b/libstdc++-v3/include/std/ranges
> > @@ -39,6 +39,7 @@
> > #if __cpp_lib_concepts
> > 
> > #include 
> > +#include  // std::ref
> 
> Please use  instead.  is huge.

Fixed.

> 
> > #include 
> > #include 
> > #include 
> > +
> > +namespace __detail
> > +{
> > +  struct _Empty { };
> > +} // namespace __detail
> > +
> > +namespace views
> > +{
> > +  template
> > +struct _RangeAdaptorClosure;
> > +
> > +  template
> > +struct _RangeAdaptor
> > +{
> > +protected:
> > +  [[no_unique_address]]
> > +   conditional_t,
> > + _Callable, __detail::_Empty> _M_callable;
> > +
> > +public:
> > +  constexpr
> > +  _RangeAdaptor(const _Callable& = {})
> > +   requires is_default_constructible_v<_Callable>
> > +  { }
> > +
> > +  constexpr
> > +  _RangeAdaptor(_Callable __callable)
> 
> As mentioned on IRC, the non-explicit constructors here make me
> nervous. I'd either like them to be explicit, or for these types to be
> in their own namespace so that there is never a reason to attempt
> implicit conversions to them just because some function related to
> them is found in an associated namespace.

Fixed by quarantining these classes into the namespace __adaptor.
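The hazard raised in the review can be reproduced with a small example. All names are hypothetical. An unqualified call finds a function through ADL because the argument's type lives in the same namespace. A non-explicit constructor then lets the argument convert silently to a type the function was never meant to accept. Either marking the constructor explicit or moving helper and use_helper into a nested namespace (as with __adaptor) removes the match. ADL only searches the argument type's own namespace, not namespaces nested inside it.

```cpp
#include <cassert>
#include <type_traits>

namespace lib
{
  struct widget { };

  // Hypothetical helper type with a non-explicit converting constructor.
  struct helper
  {
    helper(widget) { }
  };

  // A function that is only meant to be called with helper objects.
  inline int use_helper(helper) { return 42; }
}

// The call is unqualified and made outside lib.  ADL still finds
// lib::use_helper (its argument is a lib::widget), and the implicit
// widget -> helper conversion makes it viable.
inline int call_through_adl()
{
  return use_helper(lib::widget{});
}

static_assert(std::is_convertible_v<lib::widget, lib::helper>,
              "non-explicit constructor allows implicit conversion");
```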

> 
> > +   requires (!is_default_constructible_v<_Callable>)
> > +   : _M_callable(std::move(__callable))
> > +  { }
> > +
> 
> 
> 
> 
> > +  template<__detail::__not_same_as _Tp>
> > +   requires convertible_to<_Tp, _Range&>
> > + && requires { _S_fun(declval<_Tp>()); }
> > +   constexpr
> > +   ref_view(_Tp&& __t)
> > + : _M_r(addressof(static_cast<_Range&>(std::forward<_Tp>(__t
> 
> This should be std-qualified to avoid ADL, and should use the internal
> std::__addressof function (just to avoid the extra call from
> std::addressof).

Fixed, for good measure I replaced every call to addressof with
std::__addressof.

> 
> > +  // XXX: the following algos are copied verbatim from ranges_algo.h to avoid a
> > +  // circular dependency with that header.
> 
> Ugh, that's unfortunate, but OK.
> 
> I guess we could put the body of the functions in new, unconstrained
> functions, and then have the ones in  and these
> call those, to reuse the implementations. But they're so small and
> simple it's probably not worth it.

Yeah, I suppose not given their simplicity.

> 
> > +  namespace __detail
> > +  {
> > +template _Sent,
> > +typename _Proj = identity,
> > +indirect_unary_predicate> _Pred>
> > +  constexpr _Iter
> > +  find_if(_Iter __first, _Sent __last, _Pred __pred, _Proj __proj = {})
> > +  {
> > +   while (__first != __last
> > +   && !(bool)std::__invoke(__pred, std::__invoke(__proj, *__first)))
> > + ++__first;
> > +   return __first;
> > +  }
> > +
> > +template > +indirect_unary_predicate, _Proj>>
> > +  _Pred>
> > +  constexpr safe_iterator_t<_Range>
> > +  find_if(_Range&& __r, _Pred __pred, _Proj __proj = {})
> > +  {
> > +   return __detail::find_if(ranges::begin(__r), ranges::end(__r),
> > +std::move(__pred), std::move(__proj));
> > +  }
> 
> It looks like maybe we don't need this overload.

Fixed by removing this overload.

> 
> > +template _Sent,
> > +typename _Proj = identity,
> > +indirect_unary_predicate> _Pred>
> > +  constexpr _Iter
> > +  find_if_not(_Iter __first, _Sent __last, _Pred __pred, _Proj __proj = {})
> > +  {
> > +   while (__first != __last
> > +   && (bool)std::__invoke(__pred, std::__invoke(__proj, *__first)))
> > + ++__first;
> > +   return __first;
> > +  }
> > +
> > +template > +indirect_unary_predicate, _Proj>>
> > +  _Pred>
> > +  constexpr safe_iterator_t<_Range>
> > +  find_if_not(_Range&& __r, _Pred __pred, _Proj __proj = {})
> > +  {
> > +   return __detail::find_if_not(ranges::begin(__r), ranges::end(__r),
> > +

Re: [PATCH] Fix PowerPC -fstack-clash-protection -mprefixed-addr ICE (PR target/93122)

2020-02-06 Thread Segher Boessenkool
Hi again,

On Thu, Feb 06, 2020 at 08:51:06PM +0100, Jakub Jelinek wrote:
> On Thu, Feb 06, 2020 at 01:15:25PM -0600, Segher Boessenkool wrote:
> > On Thu, Jan 30, 2020 at 05:14:08PM +0100, Jakub Jelinek wrote:
> > > Here is what I meant as the alternative, i.e. don't check any predicates,
> > > just gen_add3_insn, if that fails, force rs into register and retry.
> > 
> > I don't like gen_add3_insn here *at all*, as I said, but okay, you're
> > only fixing existing code.  But as long as it is there, this code will
> > be a problem child.
> 
> gen_add3_insn is used 25 times elsewhere in the rs6000 backend when not
> counting these 2 calls that were just slightly moved around by the patch.

Yes, and almost none of those cases check for errors.  If they really
cannot error, they can probably just call one of the actual patterns for
the machine instructions directly (like we already do in many more cases).

> > > And, add REG_FRAME_RELATED_EXPR note always when we haven't emitted a
> > > single insn that has rtl exactly matching what we'd add the
> > > REG_FRAME_RELATED_EXPR with (in that case, dwarf2cfi.c is able to figure
> > > it out by itself, no need to waste compile time memory).
> > 
> > I would say "just always emit that note", but that is what the patch
> > does, already :-)
> 
> No, the patch doesn't emit it always, see below.

So move the comment *before* "if (add_note)" then?  :-)

(I don't think it would be terrible to do it actually always either, fwiw,
but this is fine).

> 2020-02-06  Jakub Jelinek  
> 
>   PR target/93122
>   * config/rs6000/rs6000-logue.c
>   (rs6000_emit_probe_stack_range_stack_clash): Always use gen_add3_insn,
>   if it fails, move rs into end_addr and retry.  Add
>   REG_FRAME_RELATED_EXPR note whenever it returns more than one insn or
>   the insn pattern doesn't describe well what exactly happens to
>   dwarf2cfi.c.
> 
>   * gcc.target/powerpc/pr93122.c: New test.

Okay for trunk (and backports if you want those).  Thanks for the patch,
and thanks for bearing with me.


Segher


Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-06 Thread Martin Sebor

On 2/6/20 6:16 AM, Richard Biener wrote:

On Thu, Feb 6, 2020 at 2:00 PM Jeff Law  wrote:


On Wed, 2020-02-05 at 09:19 +0100, Richard Biener wrote:

On Tue, Feb 4, 2020 at 11:02 PM Martin Sebor  wrote:

On 2/4/20 2:31 PM, Jeff Law wrote:

On Tue, 2020-02-04 at 13:08 -0700, Martin Sebor wrote:

On 2/4/20 12:15 PM, Richard Biener wrote:

On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law  wrote:

On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:

On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  wrote:

PR 93519 reports a false positive -Wrestrict issued for an inlined call
to strcpy that carefully guards against self-copying.  This is caused
by the caller's arguments substituted into the call during inlining and
before dead code elimination.

The attached patch avoids this by removing -Wrestrict from the folder
and deferring folding perfectly overlapping (and so undefined) calls
to strcpy (and mempcpy, but not memcpy) until much later.  Perfectly
overlapping calls to memcpy are still folded early.
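The pattern at issue can be sketched in C++. The name append_suffix is hypothetical and is modelled on str_numth from the reduced testcase. The self-copy is guarded, so strcpy never runs with identical source and destination. After a call like append_suffix(buf, buf) is inlined, however, the guarded branch is dead but can still be present when the folder runs.

```cpp
#include <cassert>
#include <cstring>

// Copy num into dest (unless they are the same buffer), then append "foo".
// The guard means strcpy never sees overlapping arguments.
char* append_suffix(char* dest, const char* num)
{
  if (dest != num)
    std::strcpy(dest, num);
  std::strcat(dest, "foo");
  return dest;
}
```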


Why do we bother to warn at all for this case?  Just DWIM here.

Warnings like this can be emitted from the analyzer?

They potentially can, but the analyzer is and will almost always
certainly be considerably slower.  I would not expect it to be used
nearly as much as the core compiler.

Whether or not a particular warning makes sense in the core compiler or
analyzer would seem to me to depend on whether or not we can reasonably
issue warnings without interprocedural analysis.  double-free
realistically requires interprocedural analysis to be effective.  I'm
not sure Wrestrict really does.



That is, I suggest simply removing the bogus warning code from folding
(and _not_ failing the folding).

I haven't looked at the patch, but if we can get the warning out of the
folder that's certainly preferable.  And we could investigate deferring
self-copy removal.


I think the issue is as usual, warning for code we'll later remove as dead. 
Warning at folding is almost always premature.


In this instance the code is reachable (or isn't obviously unreachable).
GCC doesn't remove it, but provides benign (and reasonable) semantics
for it(*).  To me, that's one aspect of quality.  Letting the user know
that the code is buggy is another.  I view that as at least as important
as folding the ill-effects away because it makes it possible to fix
the problem so the code works correctly even with compilers that don't
provide these benign semantics.

If you look at the guts of what happens at the point where we issue the
warning from within gimple_fold_builtin_strcpy we have:


DCH_to_char (char * in, char * out, int collid)
{
int type;
char * D.2148;
char * dest;
char * num;
long unsigned int _4;
char * _5;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
;;succ:   4

;;   basic block 4, loop depth 0
;;pred:   2
;;succ:   5

;;   basic block 5, loop depth 0
;;pred:   4
;;succ:   6

;;   basic block 6, loop depth 0
;;pred:   5
if (0 != 0)
  goto ; [53.47%]
else
  goto ; [46.53%]
;;succ:   7
;;8

;;   basic block 7, loop depth 0
;;pred:   6
strcpy (out_1(D), out_1(D));
;;succ:   8

;;   basic block 8, loop depth 0
;;pred:   6
;;7
_4 = __builtin_strlen (out_1(D));
_5 = out_1(D) + _4;
__builtin_memcpy (_5, "foo", 4);
;;succ:   3

;;   basic block 3, loop depth 0
;;pred:   8
return;
;;succ:   EXIT

}



Which shows the code is obviously unreachable in the case we're warning
about.  You can't see this in the dumps because it's exposed by
inlining, then cleaned up before writing the dump file.


In the specific case of the bug the code is of course eliminated
because it's guarded by the if (s != d).  I was referring to
the general (unguarded) case of:

char *s = "", *p;

int main (void)
{
  p = strcpy (s, s);
  puts (p);
}

where GCC folds the assignment 'p = strcpy(s, s);' to effectively
p = s;  That's perfectly reasonable but it could equally as well
leave the call alone, as it does when s is null, for instance.

I think folding it away is not only reasonable but preferable to
making the invalid call, but it's done only rarely.  Most of
the time GCC does emit the undefined access (it does that with
calls to library functions as well as with direct stores and
reads).  (I am hoping we can change that in the future so that
these kinds of problems are handled with some consistency.)


ISTM this would be a case we could handle with the __builtin_warning
stuff.

I think the question is do we want to do anything about it this cycle?


If so, I think Martin's approach is quite reasonable.  It disables
folding away the self-copies from gimple-fold and moves the warning
into the expander.  So if there's such a call in the IL at expansion
time we get a warning (-O0).

I'd hazard a guess that the 

Re: [PATCH] PR target/93569 [version 2], Fix PowerPC vsx-builtin-15d.c test case

2020-02-06 Thread Segher Boessenkool
On Thu, Feb 06, 2020 at 01:40:03PM -0500, Michael Meissner wrote:
> This patch addresses the concern that Segher raised in the original submission
> of the patch to fix PR target/93569.  In addition to checking for D*-form
> addresses in the traditional Altivec registers, this patch also checks for
> D*-form addresses for vectors in the traditional floating point registers.
> Neither of these address forms was allowed before ISA 3.0 (power9).

This is okay for trunk.  Thanks!


Segher


Re: [PATCH] add -mvsx to pr92923-1.c test requiring vsx

2020-02-06 Thread Segher Boessenkool
Hi Will,

On Thu, Feb 06, 2020 at 11:41:47AM -0600, will schmidt wrote:
>   The existing testcase pr92923-1.c uses vector long long, and thus
>   requires vsx.
>   OK for master?

Sure!  Thanks for the patch.

>   * testsuite/gcc.target/powerpc/pr92923-1.c: Add -mvsx.

The changelog is testsuite/ChangeLog, so entries there do not have
"testsuite/" in it.


Segher


Re: [wwwdocs] Mention common attribute in gcc-10/porting_to.html

2020-02-06 Thread Gerald Pfeifer
On Thu, 6 Feb 2020, Jakub Jelinek wrote:
> +  If tentative definitions of particular variable or variables need to be

I believe that would be "a particular variable", but best to simplify
to "of particular variables".

> +  placed in a common block, __attribute__((__common__)) can be
> +  used to force that behavior for those even in code compiled without
> +  -fcommon.

Here I'd omit "for those".

This makes sense to me and reads well; okay from my side. :)

Gerald


Patch to fix PR93561

2020-02-06 Thread Vladimir Makarov

The following patch fixes

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93561

The patch was successfully bootstrapped on x86-64.

commit d26f37a16e3ed3d75a93ffb1da10c44c36a8a36d (HEAD -> master)
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1754aa76399..aec58a06529 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2020-02-06  Vladimir Makarov  
+
+	PR rtl-optimization/93561
+	* lra-assigns.c (spill_for): Check that tested hard regno is not out of
+	hard register range.
+
 2020-02-06  Richard Sandiford  
 
 	* config/aarch64/aarch64.md (aarch64_movk): Add a type
diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index 031ce402c32..40e323c2a64 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -964,6 +964,8 @@ spill_for (int regno, bitmap spilled_pseudo_bitmap, bool first_p)
   bitmap_clear (&spill_pseudos_bitmap);
   for (j = hard_regno_nregs (hard_regno, mode) - 1; j >= 0; j--)
 	{
+  if (hard_regno + j >= FIRST_PSEUDO_REGISTER)
+	break;
 	  if (try_hard_reg_pseudos_check[hard_regno + j] != curr_pseudo_check)
 	continue;
 	  lra_assert (!bitmap_empty_p (&try_hard_reg_pseudos[hard_regno + j]));
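The guard added above can be illustrated with a toy model. All names are hypothetical and are not LRA's data structures. Per-hard-register arrays have FIRST_PSEUDO_REGISTER entries. A value that needs several consecutive hard registers starting near the end of the range must therefore not index past the last one. As in the patch, the loop counts j downwards and uses break. For such a group, the very first iteration therefore stops the scan.

```cpp
#include <cassert>
#include <vector>

// Stand-in for FIRST_PSEUDO_REGISTER: hard registers are 0 .. kFirstPseudo - 1.
constexpr int kFirstPseudo = 8;

// Count how many of the nregs hard registers starting at hard_regno are
// marked in `marked` (which has kFirstPseudo entries).
int count_marked(const std::vector<bool>& marked, int hard_regno, int nregs)
{
  int count = 0;
  for (int j = nregs - 1; j >= 0; j--)
    {
      // Without this check, marked[hard_regno + j] could read past the end.
      if (hard_regno + j >= kFirstPseudo)
        break;
      if (marked[hard_regno + j])
        count++;
    }
  return count;
}
```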


Re: [PATCH] middle-end/93519 - avoid folding stmts in obviously unreachable code

2020-02-06 Thread Martin Sebor

On 2/6/20 7:52 AM, Richard Biener wrote:

The inliner folds stmts delayed, the following arranges things so
to not fold stmts that are obviously not reachable to avoid warnings
from those code regions.
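The reachability-aware walk can be sketched on a toy CFG. The types and names are hypothetical, not GCC's basic_block/edge API. Like the patch, it does a pre-order DFS with an explicit stack. Here the stack holds (block, next successor index) pairs in place of the patch's edge iterators. The walk never follows an edge out of a block when a different edge out of that block is known to be taken, which is the information find_taken_edge provides for a folded condition such as if (0 != 0).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// succs[b] lists the successors of block b; known_taken[b] is the index in
// succs[b] of the edge known to be taken, or -1 if nothing is known.
struct toy_cfg
{
  std::vector<std::vector<int>> succs;
  std::vector<int> known_taken;
};

// Return the blocks reachable from `entry` in pre-order, never following
// an edge that is known not to be taken.
std::vector<int> reachable_preorder(const toy_cfg& cfg, int entry)
{
  std::vector<bool> visited(cfg.succs.size(), false);
  std::vector<int> order;
  std::vector<std::pair<int, std::size_t>> stack;  // (block, next edge)

  visited[entry] = true;
  order.push_back(entry);
  stack.emplace_back(entry, 0);

  while (!stack.empty())
    {
      int block = stack.back().first;
      std::size_t edge = stack.back().second;
      if (edge == cfg.succs[block].size())
        {
          stack.pop_back();  // all successors of this block handled
          continue;
        }
      stack.back().second++;

      int dest = cfg.succs[block][edge];
      int taken = cfg.known_taken[block];
      bool dead_edge = taken >= 0 && static_cast<std::size_t>(taken) != edge;
      if (!dead_edge && !visited[dest])
        {
          visited[dest] = true;
          order.push_back(dest);  // pre-order: record on first visit
          stack.emplace_back(dest, 0);
        }
    }
  return order;
}
```

In the test below, block 2 plays the role of the dead strcpy block from the dump in the related thread, and the walk never reaches it.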

Bootstrapped and tested on x86_64-unknown-linux-gnu.


It fixes the reported problem so it works for me.

The tests I submitted with my patch fail a number of cases because
along with strcpy it also deferred folding overlapping mempcpy calls.
That was not strictly part of the regression so I'm okay with deferring
it until GCC 11.  I will resubmit an updated patch to defer the folding
then.

Thanks
Martin


OK?

Thanks,
Richard.

2020-02-06  Richard Biener  

PR middle-end/93519
* tree-inline.c (fold_marked_statements): Do a PRE walk,
skipping unreachable regions.
(optimize_inline_calls): Skip folding stmts when we didn't
inline.

* gcc.dg/Wrestrict-21.c: New testcase.
---
  gcc/testsuite/gcc.dg/Wrestrict-21.c |  18 +++
  gcc/tree-inline.c   | 195 
  2 files changed, 133 insertions(+), 80 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/Wrestrict-21.c

diff --git a/gcc/testsuite/gcc.dg/Wrestrict-21.c b/gcc/testsuite/gcc.dg/Wrestrict-21.c
new file mode 100644
index 000..e300663758e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wrestrict-21.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wrestrict" } */
+
+static char *
+str_numth(char *dest, char *num, int type)
+{
+  if (dest != num)
+    __builtin_strcpy(dest, num); /* { dg-bogus "is the same" } */
+  __builtin_strcat(dest, "foo");
+  return dest;
+}
+
+void
+DCH_to_char(char *in, char *out, int collid)
+{
+  char *s = out;
+  str_numth(s, s, 42);
+}
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 5b0050a53d2..19154bb843e 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -5261,86 +5261,118 @@ static void
  fold_marked_statements (int first, hash_set *statements)
  {
auto_bitmap to_purge;
-  for (; first < last_basic_block_for_fn (cfun); first++)
-if (BASIC_BLOCK_FOR_FN (cfun, first))
-  {
-gimple_stmt_iterator gsi;
  
-	for (gsi = gsi_start_bb (BASIC_BLOCK_FOR_FN (cfun, first));

-!gsi_end_p (gsi);
-gsi_next (&gsi))
- if (statements->contains (gsi_stmt (gsi)))
-   {
- gimple *old_stmt = gsi_stmt (gsi);
- tree old_decl
-   = is_gimple_call (old_stmt) ? gimple_call_fndecl (old_stmt) : 0;
+  auto_vec stack (n_basic_blocks_for_fn (cfun) + 2);
+  auto_sbitmap visited (last_basic_block_for_fn (cfun));
+  bitmap_clear (visited);
+
+  stack.quick_push (ei_start (ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs));
+  while (!stack.is_empty ())
+{
+  /* Look at the edge on the top of the stack.  */
+  edge_iterator ei = stack.last ();
+  basic_block dest = ei_edge (ei)->dest;
+  edge known_taken;
+
+  if (dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
+ && !bitmap_bit_p (visited, dest->index)
+ /* Avoid walking unreachable edges, the iteration scheme
+using edge iterators doesn't allow to not push them so
+ignore them here instead (FIXME: use an edge flag at least?).  */
+ && !((known_taken = find_taken_edge (ei_edge (ei)->src, NULL_TREE))
+  && known_taken != ei_edge (ei)))
+   {
+ bitmap_set_bit (visited, dest->index);
  
-	  if (old_decl && fndecl_built_in_p (old_decl))

-   {
- /* Folding builtins can create multiple instructions,
-we need to look at all of them.  */
- gimple_stmt_iterator i2 = gsi;
- gsi_prev (&i2);
- if (fold_stmt (&gsi))
-   {
- gimple *new_stmt;
- /* If a builtin at the end of a bb folded into nothing,
-the following loop won't work.  */
- if (gsi_end_p (gsi))
-   {
- cgraph_update_edges_for_call_stmt (old_stmt,
-old_decl, NULL);
- break;
-   }
- if (gsi_end_p (i2))
-   i2 = gsi_start_bb (BASIC_BLOCK_FOR_FN (cfun, first));
- else
-   gsi_next (&i2);
- while (1)
-   {
- new_stmt = gsi_stmt (i2);
- update_stmt (new_stmt);
- cgraph_update_edges_for_call_stmt (old_stmt, old_decl,
-new_stmt);
+ if (dest->index >= first)
+   for (gimple_stmt_iterator gsi = gsi_start_bb (dest);
+!gsi_end_p (gsi); gsi_next (&gsi))
+ {
+   if (!statements->contains (gsi_stmt (gsi)))
+ continue;
  
-			  if

[PATCH 2/2] analyzer: use ultimate alias target at calls (PR 93288)

2020-02-06 Thread David Malcolm
PR analyzer/93288 reports an ICE in a C++ testcase when calling a
constructor.

The issue is that when building the supergraph, we encounter the
cgraph edge to "__ct_comp ", the DECL_COMPLETE_CONSTRUCTOR_P, and
this node's DECL_STRUCT_FUNCTION has a NULL CFG, which the analyzer
reads through, leading to the ICE.

This patch reworks function and fndecl lookup at calls throughout the
analyzer so that it looks for the ultimate_alias_target of the callee.
In the case above, this means using the "__ct_base " for the ctor,
which has a CFG, fixing the ICE.

Getting this right allows for some simple C++ cases involving ctors to
work, so the patch also adds some test coverage for that.
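The core of the fix is following an alias chain to the node that actually has a body. In the toy sketch below, toy_node and ultimate_target are hypothetical and are not the cgraph_node API. The constructor entry without a CFG resolves to the one that has it, mirroring the __ct_comp / __ct_base case in the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct toy_node
{
  std::string name;
  int alias_of;   // index of the node this one is an alias of, or -1
  bool has_body;  // stands in for "DECL_STRUCT_FUNCTION has a CFG"
};

// Follow alias links until reaching a node that is not an alias.
// Assumes the alias chain contains no cycles.
int ultimate_target(const std::vector<toy_node>& nodes, int idx)
{
  while (nodes[idx].alias_of >= 0)
    idx = nodes[idx].alias_of;
  return idx;
}
```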

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

gcc/analyzer/ChangeLog:
PR analyzer/93288
* analysis-plan.cc (analysis_plan::use_summary_p): Look through
the ultimate_alias_target when getting the called function.
* engine.cc (exploded_node::on_stmt): Rename second "ctxt" to
"sm_ctxt".  Use the region_model's get_fndecl_for_call rather than
gimple_call_fndecl.
* region-model.cc (region_model::get_fndecl_for_call): Use
ultimate_alias_target on fndecl.
* supergraph.cc (get_ultimate_function_for_cgraph_edge): New
function.
(supergraph_call_edge): Use it when rejecting edges without
functions.
(supergraph::supergraph): Use it to get the function for the
cgraph_edge when building interprocedural superedges.
(callgraph_superedge::get_callee_function):  Use it.
* supergraph.h (supergraph::get_num_snodes): Make param const.
(supergraph::function_to_num_snodes_t): Make first type param
const.

gcc/testsuite/ChangeLog:
PR analyzer/93288
* g++.dg/analyzer/malloc.C: Add test coverage for a double-free
called in a constructor.
* g++.dg/analyzer/pr93288.C: New test.
---
 gcc/analyzer/analysis-plan.cc   |  6 +-
 gcc/analyzer/engine.cc  | 12 +--
 gcc/analyzer/region-model.cc|  5 -
 gcc/analyzer/supergraph.cc  | 28 +++--
 gcc/analyzer/supergraph.h   |  4 ++--
 gcc/testsuite/g++.dg/analyzer/malloc.C  | 16 ++
 gcc/testsuite/g++.dg/analyzer/pr93288.C |  8 +++
 7 files changed, 63 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/pr93288.C

diff --git a/gcc/analyzer/analysis-plan.cc b/gcc/analyzer/analysis-plan.cc
index 8ad2fa2ebb4..3c8b10b3314 100644
--- a/gcc/analyzer/analysis-plan.cc
+++ b/gcc/analyzer/analysis-plan.cc
@@ -120,7 +120,11 @@ analysis_plan::use_summary_p (const cgraph_edge *edge) const
 
   /* Require the callee to be sufficiently complex to be worth
  summarizing.  */
-  if ((int)m_sg.get_num_snodes (callee->get_fun ())
+  const function *fun
+    = const_cast  (callee)->ultimate_alias_target ()->get_fun ();
+  /* TODO(stage1): can ultimate_alias_target be made const?  */
+
+  if ((int)m_sg.get_num_snodes (fun)
   < param_analyzer_min_snodes_for_call_summary)
 return false;
 
diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 63579da953a..c4d7088d3e9 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -1044,19 +1044,19 @@ exploded_node::on_stmt (exploded_graph &eg,
   const sm_state_map *old_smap
= old_state.m_checker_states[sm_idx];
   sm_state_map *new_smap = state->m_checker_states[sm_idx];
-  impl_sm_context ctxt (eg, sm_idx, sm, this, &old_state, state,
-   change,
-   old_smap, new_smap);
+  impl_sm_context sm_ctxt (eg, sm_idx, sm, this, &old_state, state,
+  change,
+  old_smap, new_smap);
   /* Allow the state_machine to handle the stmt.  */
-  if (sm.on_stmt (&ctxt, snode, stmt))
+  if (sm.on_stmt (&sm_ctxt, snode, stmt))
unknown_side_effects = false;
   else
{
  /* For those stmts that were not handled by the state machine.  */
  if (const gcall *call = dyn_cast  (stmt))
{
- tree callee_fndecl = gimple_call_fndecl (call);
- // TODO: maybe we can be smarter about handling function pointers?
+ tree callee_fndecl
+   = state->m_region_model->get_fndecl_for_call (call, &ctxt);
 
  if (!fndecl_has_gimple_body_p (callee_fndecl))
new_smap->purge_for_unknown_fncall (eg, sm, call, callee_fndecl,
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 61390aa4cd1..0ae7536a032 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -6665,7 +6665,10 @@ region_model::get_fndecl_for_call (const gcall *call,
   if (code)
{
  tree fn_decl = code->get_tree_for_child_region (fn_rid);
- return fn_decl;
+ const cgraph_node *ultimat

[PATCH 1/2] analyzer: g++ testsuite support

2020-02-06 Thread David Malcolm
PR analyzer/93288 reports a C++-specific ICE with -fanalyzer.

This patch creates the beginnings of a C++ test suite for the analyzer,
so that there's a place to put test coverage for the fix.
It adds a regression test for PR analyzer/93212, an ICE fixed
in r10-5970-g32077b693df8e3ed0424031a322df23822bf2f7e.

Successfully regrtested on x86_64-pc-linux-gnu (in conjunction with
the second patch).

OK for master?

gcc/testsuite/ChangeLog:
PR analyzer/93212
* g++.dg/analyzer/analyzer.exp: New subdirectory and .exp suite.
* g++.dg/analyzer/malloc.C: New test.
* g++.dg/analyzer/pr93212.C: New test.
---
 gcc/testsuite/g++.dg/analyzer/analyzer.exp | 49 ++
 gcc/testsuite/g++.dg/analyzer/malloc.C |  9 
 gcc/testsuite/g++.dg/analyzer/pr93212.C| 17 
 3 files changed, 75 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/analyzer.exp
 create mode 100644 gcc/testsuite/g++.dg/analyzer/malloc.C
 create mode 100644 gcc/testsuite/g++.dg/analyzer/pr93212.C

diff --git a/gcc/testsuite/g++.dg/analyzer/analyzer.exp b/gcc/testsuite/g++.dg/analyzer/analyzer.exp
new file mode 100644
index 000..60262f678ee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/analyzer/analyzer.exp
@@ -0,0 +1,49 @@
+#   Copyright (C) 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# G++ testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib g++-dg.exp
+
+# If the analyzer has not been enabled, bail.
+if { ![check_effective_target_analyzer] } {
+    return
+}
+
+if [info exists DEFAULT_CXXFLAGS] then {
+  set save_default_cxxflags $DEFAULT_CXXFLAGS
+}
+
+# If a testcase doesn't have special options, use these.
+set DEFAULT_CXXFLAGS " -fanalyzer -fdiagnostics-path-format=separate-events -Wanalyzer-too-complex -fanalyzer-call-summaries"
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+set tests [lsort [glob -nocomplain $srcdir/$subdir/*.C]]
+
+g++-dg-runtest $tests "" $DEFAULT_CXXFLAGS
+
+# All done.
+dg-finish
+
+if [info exists save_default_cxxflags] {
+  set DEFAULT_CXXFLAGS $save_default_cxxflags
+} else {
+  unset DEFAULT_CXXFLAGS
+}
diff --git a/gcc/testsuite/g++.dg/analyzer/malloc.C b/gcc/testsuite/g++.dg/analyzer/malloc.C
new file mode 100644
index 000..0637295e1f2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/analyzer/malloc.C
@@ -0,0 +1,9 @@
+// { dg-do compile }
+
+#include 
+
+void test_1 (void *ptr)
+{
+  free (ptr);
+  free (ptr); /* { dg-warning "double-'free' of 'ptr'" } */
+}
diff --git a/gcc/testsuite/g++.dg/analyzer/pr93212.C b/gcc/testsuite/g++.dg/analyzer/pr93212.C
new file mode 100644
index 000..cfbb42d2275
--- /dev/null
+++ b/gcc/testsuite/g++.dg/analyzer/pr93212.C
@@ -0,0 +1,17 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+auto lol()
+{
+    int aha = 3;
+    return [&aha] {
+        return aha;
+    };
+}
+
+int main()
+{
+    auto lambda = lol();
+    std::cout << lambda() << std::endl;
+    return 0;
+}
-- 
2.21.0



[wwwdocs] Mention common attribute in gcc-10/porting_to.html

2020-02-06 Thread Jakub Jelinek
Hi!

On Tue, Jan 07, 2020 at 03:15:05PM +, Wilco Dijkstra wrote:
> --- a/htdocs/gcc-10/porting_to.html
> +++ b/htdocs/gcc-10/porting_to.html
> @@ -29,9 +29,25 @@ and provide solutions. Let us know if you have suggestions for improvements!
>  Preprocessor issues
>  -->
>  
> -
> +
> +Default to -fno-common
> +
> +
> +  A common mistake in C is omitting extern when declaring a global
> +  variable in a header file.  If the header is included by several files it
> +  results in multiple definitions of the same variable.  In previous GCC
> +  versions this error is ignored.  GCC 10 defaults to -fno-common,
> +  which means a linker error will now be reported.
> +  To fix this, use extern in header files when declaring global
> +  variables, and ensure each global is defined in exactly one C file.
> +  As a workaround, legacy C code can be compiled with -fcommon.
> +
> +  
> +  int x;  // tentative definition - avoid in header files
> +
> +  extern int y;  // correct declaration in a header file
> +  
>  
>  Fortran language issues

IMHO we should also mention the common attribute: in some cases the common
behavior is an intentional decision and there is no problem supporting it;
the code just needs to mark it explicitly.
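For illustration, a minimal single-file sketch of what the porting notes recommend (the names shared_total, legacy_counter and bump are made up): declarations go in the header with extern, each global is defined in exactly one .c file, and a variable whose common placement is intentional is marked with __attribute__((__common__)) so that it keeps that behavior even under GCC 10's default -fno-common.

```c
/* What would live in a shared header: a declaration only, which is
   safe to include from any number of files.  */
extern int shared_total;

/* Intentionally common: other files that also mark it common share
   one object, even when compiled with -fno-common.  */
int legacy_counter __attribute__((__common__));

/* The single definition, placed in exactly one .c file.  */
int shared_total = 0;

int
bump (void)
{
  legacy_counter++;
  shared_total += 2;
  return shared_total + legacy_counter;
}
```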

Ok for wwwdocs?

diff --git a/htdocs/gcc-10/porting_to.html b/htdocs/gcc-10/porting_to.html
index 980d3af1..c5d7eb82 100644
--- a/htdocs/gcc-10/porting_to.html
+++ b/htdocs/gcc-10/porting_to.html
@@ -41,7 +41,12 @@ and provide solutions. Let us know if you have suggestions for improvements!
   which means a linker error will now be reported.
   To fix this, use extern in header files when declaring global
   variables, and ensure each global is defined in exactly one C file.
-  As a workaround, legacy C code can be compiled with -fcommon.
+  If tentative definitions of a particular variable or variables need to be
+  placed in a common block, __attribute__((__common__)) can be
+  used to force that behavior for those even in code compiled without
+  -fcommon.
+  As a workaround, legacy C code where all tentative definitions should
+  be placed into a common block can be compiled with -fcommon.
 
   
   int x;  // tentative definition - avoid in header files


Jakub



[PATCH 1/2] analyzer: gfortran testsuite support

2020-02-06 Thread David Malcolm
PR analyzer/93405 reports an ICE when attempting to use -fanalyzer on
certain gfortran code.  The second patch in this kit fixes that, but
in the meantime I need somewhere to put regression tests for -fanalyzer
with gfortran.

This patch adds a gfortran.dg/analyzer subdirectory with an analyzer.exp,
setting DEFAULT_FFLAGS on the tests run within it.

It also adds a couple of simple proof-of-concept tests of e.g. detecting
double-frees from gfortran.  These work, though there are some issues
with the output:
(a) the double-free is reported as:
Warning: double-‘free’ of ‘_1’
 rather than:
Warning: double-‘free’ of ‘ptr_x’

(b) the default output format for diagnostic paths is
-fdiagnostics-path-format=inline-events
but the various events in the path all have column == 0, and
the path-printing doesn't handle that well (the event descriptions
don't show up).
With -fdiagnostics-path-format=separate-events, the output looks like:

../../src/gcc/testsuite/gfortran.dg/analyzer/malloc.f90:18:0:

   18 |   call free(ptr_x) ! { dg-warning "double-'free'" }
  |
Warning: double-‘free’ of ‘_1’ [CWE-415] [-Wanalyzer-double-free]
../../src/gcc/testsuite/gfortran.dg/analyzer/malloc.f90:16:0:

   16 |   ptr_x = malloc(20*8)
  |
note: (1) allocated here
../../src/gcc/testsuite/gfortran.dg/analyzer/malloc.f90:17:0:

   17 |   call free(ptr_x)
  |
note: (2) first ‘free’ here
../../src/gcc/testsuite/gfortran.dg/analyzer/malloc.f90:18:0:

   18 |   call free(ptr_x) ! { dg-warning "double-'free'" }
  |
note: (3) second ‘free’ here; first ‘free’ was at (2)

In any case, is this OK for master? (as a place to put such tests, and
do the tests look sane?  I'm not an expert at Fortran, sorry).

Successfully tested on x86_64-pc-linux-gnu; the combination of the
two patches adds 6 PASS results to gfortran.sum

gcc/testsuite/ChangeLog:
* gfortran.dg/analyzer/analyzer.exp: New subdirectory and .exp
suite.
* gfortran.dg/analyzer/malloc-example.f90: New test.
* gfortran.dg/analyzer/malloc.f90: New test.
---
 .../gfortran.dg/analyzer/analyzer.exp | 55 +++
 .../gfortran.dg/analyzer/malloc-example.f90   | 21 +++
 gcc/testsuite/gfortran.dg/analyzer/malloc.f90 | 19 +++
 3 files changed, 95 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/analyzer/analyzer.exp
 create mode 100644 gcc/testsuite/gfortran.dg/analyzer/malloc-example.f90
 create mode 100644 gcc/testsuite/gfortran.dg/analyzer/malloc.f90

diff --git a/gcc/testsuite/gfortran.dg/analyzer/analyzer.exp b/gcc/testsuite/gfortran.dg/analyzer/analyzer.exp
new file mode 100644
index 000..00edfa54dce
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/analyzer/analyzer.exp
@@ -0,0 +1,55 @@
+#  Copyright (C) 2020 Free Software Foundation, Inc.
+
+#  This file is part of GCC.
+#
+#  GCC is free software; you can redistribute it and/or modify it under
+#  the terms of the GNU General Public License as published by the Free
+#  Software Foundation; either version 3, or (at your option) any later
+#  version.
+#
+#  GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+#  WARRANTY; without even the implied warranty of MERCHANTABILITY or
+#  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+#  for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with GCC; see the file COPYING3.  If not see
+#  .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gfortran-dg.exp
+load_lib gfortran.exp
+
+# If the analyzer has not been enabled, bail.
+if { ![check_effective_target_analyzer] } {
+  return
+}
+
+global DEFAULT_FFLAGS
+if [info exists DEFAULT_FFLAGS] then {
+  set save_default_fflags $DEFAULT_FFLAGS
+}
+
+# If a testcase doesn't have special options, use these.
+set DEFAULT_FFLAGS "-fanalyzer -fdiagnostics-path-format=separate-events -Wanalyzer-too-complex -fanalyzer-call-summaries"
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+
+gfortran_init
+
+gfortran-dg-runtest [lsort \
+   [glob -nocomplain $srcdir/$subdir/*.\[fF\]{,90,95,03,08} ] ] "" $DEFAULT_FFLAGS
+
+# All done.
+dg-finish
+
+if [info exists save_default_fflags] {
+  set DEFAULT_FFLAGS $save_default_fflags
+} else {
+  unset DEFAULT_FFLAGS
+}
diff --git a/gcc/testsuite/gfortran.dg/analyzer/malloc-example.f90 b/gcc/testsuite/gfortran.dg/analyzer/malloc-example.f90
new file mode 100644
index 000..4c48d415e05
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/analyzer/malloc-example.f90
@@ -0,0 +1,21 @@
+! Example from GCC documentation
+! { dg-do compile }
+! { dg-additional-options "-fcray-pointer" }
+
+program test_malloc
+  implicit none
+  integer i
+  real*8 x(*), z
+  pointer(ptr_x,x)
+
+  ptr_x = malloc(20*8)
+  do i = 1, 20
+    x(i) = sqrt(1.0d0 / i)
+  end do
+  z = 0
+  do i = 1, 20
+    z = z + x(i)
+    print *, z
+  end do
+  call free(ptr_x)
+end program 

[PATCH 2/2] analyzer: fix ICE with fortran constant arguments (PR 93405)

2020-02-06 Thread David Malcolm
PR analyzer/93405 reports an ICE with -fanalyzer when passing
a constant "by reference" in gfortran.

The issue is that the constant is passed as an ADDR_EXPR
of a CONST_DECL, and region_model::get_lvalue_1 doesn't
know how to handle CONST_DECL.

This patch implements it for CONST_DECL by providing
a placeholder region, holding the CONST_DECL's value,
fixing the ICE.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

Is the Fortran testcase OK for master?  (this relies on patch 1 in
the kit, obviously)

gcc/analyzer/ChangeLog:
PR analyzer/93405
* region-model.cc (region_model::get_lvalue_1): Implement
CONST_DECL.

gcc/testsuite/ChangeLog:
PR analyzer/93405
* gfortran.dg/analyzer/pr93405.f90: New test.
---
 gcc/analyzer/region-model.cc   | 13 +
 gcc/testsuite/gfortran.dg/analyzer/pr93405.f90 | 14 ++
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/analyzer/pr93405.f90

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 8b57a623084..61390aa4cd1 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4717,6 +4717,19 @@ region_model::get_lvalue_1 (path_var pv, region_model_context *ctxt)
   }
   break;
 
+case CONST_DECL:
+  {
+   tree cst_type = TREE_TYPE (expr);
+   region_id cst_rid = add_region_for_type (m_root_rid, cst_type);
+   if (tree value = DECL_INITIAL (expr))
+ {
+   svalue_id sid = get_rvalue (value, ctxt);
+   get_region (cst_rid)->set_value (*this, cst_rid, sid, ctxt);
+ }
+   return cst_rid;
+  }
+  break;
+
 case STRING_CST:
   {
tree cst_type = TREE_TYPE (expr);
diff --git a/gcc/testsuite/gfortran.dg/analyzer/pr93405.f90 b/gcc/testsuite/gfortran.dg/analyzer/pr93405.f90
new file mode 100644
index 000..e2c23753015
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/analyzer/pr93405.f90
@@ -0,0 +1,14 @@
+! { dg-do compile }
+real a(10), b(10), c(10)
+a = 0.
+b = 1.
+call sum(a, b, c, 10)
+print *, c(5)
+end
+subroutine sum(a, b, c, n)
+integer i, n
+real a(n), b(n), c(n)
+do i = 1, n
+   c(i) = a(i) + b(i)
+enddo
+end
-- 
2.21.0



Re: [PATCH] Fix PowerPC -fstack-clash-protection -mprefixed-addr ICE (PR target/93122)

2020-02-06 Thread Jakub Jelinek
On Thu, Feb 06, 2020 at 01:15:25PM -0600, Segher Boessenkool wrote:
> On Thu, Jan 30, 2020 at 05:14:08PM +0100, Jakub Jelinek wrote:
> > Here is what I meant as the alternative, i.e. don't check any predicates,
> > just gen_add3_insn, if that fails, force rs into register and retry.
> 
> I don't like gen_add3_insn here *at all*, as I said, but okay, you're
> only fixing existing code.  But as long as it is there, this code will
> be a problem child.

gen_add3_insn is used 25 times elsewhere in the rs6000 backend when not
counting these 2 calls that were just slightly moved around by the patch.

> > And, add REG_FRAME_RELATED_EXPR note always when we haven't emitted a single
> > insn that has rtl exactly matching what we'd add the REG_FRAME_RELATED_EXPR
> > with (in that case, dwarf2cfi.c is able to figure it out by itself, no need
> > to waste compile time memory).
> 
> I would say "just always emit that note", but that is what the patch
> does, already :-)

No, the patch doesn't emit it always, see below.

> > +  rtx set;
> > +  if (!NONJUMP_INSN_P (insn)
> > + || NEXT_INSN (insn)
> > + || (set = single_set (insn)) == NULL_RTX
> > + || SET_DEST (set) != end_addr
> > + || GET_CODE (SET_SRC (set)) != PLUS
> > + || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
> > + || XEXP (SET_SRC (set), 1) != rs)
> 
> Please don't have side effects in conditions.  Two nested ifs would be
> fine here.

Ok, so like attached?  There is an add_note boolean, so that the
insn = emit_insn (insn); doesn't have to be done 3 times.

> > + insn = emit_insn (insn);
> > + /* Describe the effect of INSN to the CFI engine, unless it
> > +is a single insn that describes it itself.  */
> >   add_reg_note (insn, REG_FRAME_RELATED_EXPR,
> > gen_rtx_SET (end_addr,
> >  gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> >rs)));
> 
> So please fix the comment?

The point is not to add the REG_FRAME_RELATED_EXPR note in the most common case,
where it would be just a waste of compile-time memory.
E.g.
(insn/f 17 16 18 2 (set (reg:DI 12 12)
(plus:DI (reg/f:DI 1 1)
(const_int -40960 [0x6000]))) "pr93122-2.c":9:1 -1
 (nil))
doesn't need the note; the /f flag on the insn is all that is needed,
because the instruction is self-descriptive to dwarf2cfi.c.  The note
we'd add in that case would be
(set (reg:DI 12 12) (plus:DI (reg/f:DI 1 1) (const_int -40960)))
but that is the PATTERN of the insn already.
We need it in all other cases, when there is more than one insn doing that
or it isn't like that (end_addr = stack_pointer_rtx + rs).
So e.g. on the testcase in the patch,
(insn 17 16 18 2 (set (reg:DI 12 12)
(const_int -4294967296 [0x])) "pr93122.c":9:1 -1
 (nil))
(insn/f 18 17 19 2 (set (reg:DI 12 12)
(plus:DI (reg:DI 12 12)
(reg/f:DI 1 1))) "pr93122.c":9:1 -1
 (expr_list:REG_FRAME_RELATED_EXPR (set (reg:DI 12 12)
(plus:DI (reg/f:DI 1 1)
(const_int -4294967296 [0x])))
(nil)))
Only the last insn is /f marked and the effect of it is the one
written in the note.
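A hedged sketch of the decision described above, written as sequential ifs with no side effects in the conditions.  struct insn_summary and its fields are hypothetical stand-ins for the RTL checks (NONJUMP_INSN_P, NEXT_INSN, single_set, SET_DEST and the operands of the PLUS); registers and the constant are modelled as plain integer ids.

```c
#include <stdbool.h>

/* Hypothetical summary of what gen_add3_insn produced.  */
struct insn_summary
{
  int count;          /* number of insns in the sequence */
  bool nonjump;       /* true if the (single) insn is an ordinary INSN */
  bool single_set;    /* true if it has exactly one SET */
  bool src_is_plus;   /* true if SET_SRC is (plus A B) */
  int dest, op0, op1; /* ids of SET_DEST, A and B */
};

/* Return true if a REG_FRAME_RELATED_EXPR note describing
   END_ADDR = SP + RS has to be attached.  */
static bool
needs_frame_related_note (const struct insn_summary *s,
                          int end_addr, int sp, int rs)
{
  if (s->count != 1 || !s->nonjump)
    return true;        /* several insns, or not a plain INSN */
  if (!s->single_set || !s->src_is_plus)
    return true;        /* pattern is not a simple PLUS */
  if (s->dest != end_addr || s->op0 != sp || s->op1 != rs)
    return true;        /* e.g. end_addr = end_addr + sp */
  return false;         /* insn already reads as end_addr = sp + rs */
}
```

The first early return covers the two-insn pr93122.c sequence; the last check covers a single insn whose operands are in a different form, such as end_addr = end_addr + sp.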

2020-02-06  Jakub Jelinek  

PR target/93122
* config/rs6000/rs6000-logue.c
(rs6000_emit_probe_stack_range_stack_clash): Always use gen_add3_insn,
if it fails, move rs into end_addr and retry.  Add
REG_FRAME_RELATED_EXPR note whenever it returns more than one insn or
the insn pattern doesn't describe well what exactly happens to
dwarf2cfi.c.

* gcc.target/powerpc/pr93122.c: New test.

--- gcc/config/rs6000/rs6000-logue.c.jj 2020-01-30 17:55:38.606339203 +0100
+++ gcc/config/rs6000/rs6000-logue.c  2020-02-06 20:36:21.511409319 +0100
@@ -1604,20 +1604,34 @@ rs6000_emit_probe_stack_range_stack_clas
   rtx end_addr
= copy_reg ? gen_rtx_REG (Pmode, 0) : gen_rtx_REG (Pmode, 12);
   rtx rs = GEN_INT (-rounded_size);
-  rtx_insn *insn;
-  if (add_operand (rs, Pmode))
-   insn = emit_insn (gen_add3_insn (end_addr, stack_pointer_rtx, rs));
+  rtx_insn *insn = gen_add3_insn (end_addr, stack_pointer_rtx, rs);
+  if (insn == NULL)
+   {
+ emit_move_insn (end_addr, rs);
+ insn = gen_add3_insn (end_addr, end_addr, stack_pointer_rtx);
+ gcc_assert (insn);
+   }
+  bool add_note = false;
+  if (!NONJUMP_INSN_P (insn) || NEXT_INSN (insn))
+   add_note = true;
   else
{
- emit_move_insn (end_addr, GEN_INT (-rounded_size));
- insn = emit_insn (gen_add3_insn (end_addr, end_addr,
-  stack_pointer_rtx));
- /* Describe the effect of INSN to the CFI engine.  */
- add_reg_note (insn, REG_FRAME_RELATED_EXPR,
-   gen_rtx_SET (end_addr,
-gen_rtx_PLUS (Pmode, stack_poi

[committed] analyzer: round-trip pointer-equality through intptr_t

2020-02-06 Thread David Malcolm
When investigating how the analyzer handles malloc/free of Cray pointers
in gfortran I noticed that the analyzer was losing information on
pointers that were cast to an integer type, and then back to a pointer
type again.

The root cause is that region_model::maybe_cast_1 was only preserving
the region_svalue-ness of the result if both types were pointers,
instead returning an unknown_svalue for a pointer-to-int cast.

This patch updates the above code so that it attempts to use a
region_svalue if *either* type is a pointer.

Doing so allows the analyzer to recognize that the same underlying
region is in use through various casts through integer types.
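As a rough illustration of the source pattern involved (plain C, not analyzer code; the function names are hypothetical): converting a valid pointer to intptr_t or uintptr_t and back yields a pointer that compares equal to the original, which is why freeing through the integer copy, as in test_1 of the new intptr_t.c test, is not a leak.

```c
#include <stdint.h>
#include <stdlib.h>

/* Return nonzero if P survives a round trip through both integer types.  */
int
roundtrip_preserves (void *p)
{
  intptr_t ip = (intptr_t) p;
  uintptr_t uip = (uintptr_t) p;
  return (void *) ip == p && (void *) uip == p;
}

/* Same shape as test_1: the buffer is freed through the integer copy,
   so nothing leaks.  */
void
alloc_and_free_via_intptr (void)
{
  void *p = malloc (1024);
  intptr_t ip = (intptr_t) p;
  free ((void *) ip);
}
```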

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to master as cb273d81a45092ceee793f0357526e291f03c7b7.

gcc/analyzer/ChangeLog:
* region-model.cc (region_model::maybe_cast_1): Attempt to provide
a region_svalue if either type is a pointer, rather than if both
types are pointers.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/torture/intptr_t.c: New test.
---
 gcc/analyzer/region-model.cc  |  2 +-
 .../gcc.dg/analyzer/torture/intptr_t.c| 28 +++
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/intptr_t.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 571ae6e4d56..8b57a623084 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5004,7 +5004,7 @@ region_model::maybe_cast_1 (tree dst_type, svalue_id sid)
 return sid;
 
   if (POINTER_TYPE_P (dst_type)
-  && POINTER_TYPE_P (src_type))
+  || POINTER_TYPE_P (src_type))
 {
   /* Pointer to region.  */
   if (region_svalue *ptr_sval = sval->dyn_cast_region_svalue ())
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/intptr_t.c b/gcc/testsuite/gcc.dg/analyzer/torture/intptr_t.c
new file mode 100644
index 000..847ba626350
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/intptr_t.c
@@ -0,0 +1,28 @@
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } { "" } } */
+
+#include <stdlib.h>
+
+typedef __INTPTR_TYPE__ intptr_t;
+typedef __UINTPTR_TYPE__ uintptr_t;
+
+void test_1 (void)
+{
+  intptr_t ip;
+  void *p = malloc (1024);
+  ip = (intptr_t)p;
+  free ((void *)ip);
+} /* { dg-bogus "leak" } */
+
+void test_2 (void)
+{
+  uintptr_t uip;
+  void *p = malloc (1024);
+  uip = (uintptr_t)p;
+  free ((void *)uip);
+} /* { dg-bogus "leak" } */
+
+void test_3 (intptr_t ip)
+{
+  free ((void *)ip); /* { dg-message "first 'free'" } */
+  free ((void *)ip); /* { dg-warning "double-'free'" } */
+}
-- 
2.21.0



[PATCH/RFC] analyzer: workaround for nested pp_printf

2020-02-06 Thread David Malcolm
The dumps from the analyzer sometimes contain garbled output.

The root cause is the nesting of calls to pp_printf: I'm using
pp_printf with %qT to print types with a PP using default_tree_printer.

default_tree_printer handles 'T' (and various other codes) via
  dump_generic_node (pp, t, 0, TDF_SLIM, 0);
and dump_generic_node can call pp_printf in various ways, leading
to a pp_printf within a pp_printf, and garbled output.

I don't think it's feasible to fix pp_printf to be reentrant, in
stage 4, at least, so for the moment this patch works around it
in the analyzer.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

OK for master?  Or am I missing something here?  (in that I'm
surprised that this problem isn't more widespread)

Thanks
Dave

gcc/analyzer/ChangeLog:
* region-model.cc (print_quoted_type): New function.
(svalue::print): Use it to replace %qT.
(region::dump_to_pp): Likewise.
(region::dump_child_label): Likewise.
(region::print_fields): Likewise.
---
 gcc/analyzer/region-model.cc | 35 +++
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index c837ec6ed3b..571ae6e4d56 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -73,6 +73,25 @@ dump_tree (pretty_printer *pp, tree t)
   dump_generic_node (pp, t, 0, TDF_SLIM, 0);
 }
 
+/* Equivalent to pp_printf (pp, "%qT", t), to avoid nesting pp_printf
+   calls within other pp_printf calls.
+
+   default_tree_printer handles 'T' and some other codes by calling
+ dump_generic_node (pp, t, 0, TDF_SLIM, 0);
+   dump_generic_node calls pp_printf in various places, leading to
+   garbled output.
+
+   Ideally pp_printf could be made to be reentrant, but in the meantime
+   this function provides a workaround.  */
+
+static void
+print_quoted_type (pretty_printer *pp, tree t)
+{
+  pp_begin_quote (pp, pp_show_color (pp));
+  dump_generic_node (pp, t, 0, TDF_SLIM, 0);
+  pp_end_quote (pp, pp_show_color (pp));
+}
+
 /* Dump this path_var to PP (which must support %E for trees).
 
Express the stack depth using an "@DEPTH" suffix, so e.g. given
@@ -319,7 +338,9 @@ svalue::print (const region_model &model,
   if (m_type)
 {
   gcc_assert (TYPE_P (m_type));
-  pp_printf (pp, "type: %qT, ", m_type);
+  pp_string (pp, "type: ");
+  print_quoted_type (pp, m_type);
+  pp_string (pp, ", ");
 }
 
   /* vfunc.  */
@@ -1282,7 +1303,8 @@ region::dump_to_pp (const region_model &model,
 }
   if (m_type)
 {
-  pp_printf (pp, "%s type: %qT", field_prefix, m_type);
+  pp_printf (pp, "%s type: ", field_prefix);
+  print_quoted_type (pp, m_type);
   pp_newline (pp);
 }
 
@@ -1332,7 +1354,9 @@ region::dump_child_label (const region_model &model,
pp_string (pp, "active ");
   else
pp_string (pp, "inactive ");
-  pp_printf (pp, "view as %qT: ", child->get_type ());
+  pp_string (pp, "view as ");
+  print_quoted_type (pp, child->get_type ());
+  pp_string (pp, ": ");
 }
 }
 
@@ -1463,7 +1487,10 @@ region::print_fields (const region_model &model ATTRIBUTE_UNUSED,
   m_sval_id.print (pp);
 
   if (m_type)
-pp_printf (pp, ", type: %qT", m_type);
+{
+  pp_printf (pp, ", type: ");
+  print_quoted_type (pp, m_type);
+}
 }
 
 /* Determine if a pointer to this region must be non-NULL.
-- 
2.21.0



Re: [PATCH] Fix PowerPC -fstack-clash-protection -mprefixed-addr ICE (PR target/93122)

2020-02-06 Thread Segher Boessenkool
Hi!

Sorry for dropping this once again.

On Thu, Jan 30, 2020 at 05:14:08PM +0100, Jakub Jelinek wrote:
> Here is what I meant as the alternative, i.e. don't check any predicates,
> just gen_add3_insn, if that fails, force rs into register and retry.

I don't like gen_add3_insn here *at all*, as I said, but okay, you're
only fixing existing code.  But as long as it is there, this code will
be a problem child.

> And, add REG_FRAME_RELATED_EXPR note always when we haven't emitted a single
> insn that has rtl exactly matching what we'd add the REG_FRAME_RELATED_EXPR
> with (in that case, dwarf2cfi.c is able to figure it out by itself, no need
> to waste compile time memory).

I would say "just always emit that note", but that is what the patch
does, already :-)

> +  rtx set;
> +  if (!NONJUMP_INSN_P (insn)
> +   || NEXT_INSN (insn)
> +   || (set = single_set (insn)) == NULL_RTX
> +   || SET_DEST (set) != end_addr
> +   || GET_CODE (SET_SRC (set)) != PLUS
> +   || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
> +   || XEXP (SET_SRC (set), 1) != rs)

Please don't have side effects in conditions.  Two nested ifs would be
fine here.

> +   insn = emit_insn (insn);
> +   /* Describe the effect of INSN to the CFI engine, unless it
> +  is a single insn that describes it itself.  */
> add_reg_note (insn, REG_FRAME_RELATED_EXPR,
>   gen_rtx_SET (end_addr,
>gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>  rs)));

So please fix the comment?


Segher


[PATCH] PR target/93569 [version 2], Fix PowerPC vsx-builtin-15d.c test case

2020-02-06 Thread Michael Meissner
This patch addresses the concern that Segher raised in the original submission
of the patch to fix PR target/93569.  In addition to checking for D*-form
addresses in the traditional Altivec registers, this patch also checks for
D*-form addresses for vectors in the traditional floating point registers.
Neither of these address forms was allowed before ISA 3.0 (power9).
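A simplified model of the FPR branch of reg_to_non_prefixed after this change, assuming the mode predicates and TARGET_* macros are reduced to plain flags (fpr_form and its parameters are hypothetical; the enum names follow the patch):

```c
#include <stdbool.h>

/* Enum names follow the patch; the numeric values are arbitrary here.  */
enum non_prefixed_form
{
  NON_PREFIXED_DEFAULT,
  NON_PREFIXED_D,
  NON_PREFIXED_DS,
  NON_PREFIXED_DQ,
  NON_PREFIXED_X
};

/* SIZE models GET_MODE_SIZE (mode).  SCALAR_LIKE stands for
   mode == SFmode || FLOAT128_2REG_P (mode); VECTOR_LIKE stands for
   VECTOR_MODE_P, FLOAT128_VECTOR_P, TImode or CTImode.  */
static enum non_prefixed_form
fpr_form (unsigned size, bool scalar_like, bool vector_like,
          bool target_vsx, bool target_p9_vector)
{
  if (scalar_like || size == 8)
    return NON_PREFIXED_D;
  else if (size < 8)
    return NON_PREFIXED_X;
  else if (target_vsx && size >= 16 && vector_like)
    /* DQ-form vector addressing only exists from ISA 3.0 (power9).  */
    return target_p9_vector ? NON_PREFIXED_DQ : NON_PREFIXED_X;
  else
    return NON_PREFIXED_DEFAULT;
}
```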

I have done bootstraps on both little and big endian Linux 64-bit systems, and
there were no regressions for this change.  Can I check this patch into the
master branch?

https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00387.html

2020-02-06  Michael Meissner  

PR target/93569
* config/rs6000/rs6000.c (reg_to_non_prefixed): Before ISA 3.0
we only had X-FORM (reg+reg) addressing for vectors.  Also before
ISA 3.0, we only had X-FORM addressing for scalars in the
traditional Altivec registers.

--- /tmp/VQDg8p_rs6000.c  2020-02-06 11:55:27.509363545 -0500
+++ gcc/config/rs6000/rs6000.c  2020-02-06 11:54:28.461531334 -0500
@@ -24923,7 +24923,8 @@ reg_to_non_prefixed (rtx reg, machine_mo
   unsigned size = GET_MODE_SIZE (mode);
 
   /* FPR registers use D-mode for scalars, and DQ-mode for vectors, IEEE
- 128-bit floating point, and 128-bit integers.  */
+ 128-bit floating point, and 128-bit integers.  Before power9, only indexed
+ addressing was available for vectors.  */
   if (FP_REGNO_P (r))
 {
   if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode))
@@ -24936,16 +24937,20 @@ reg_to_non_prefixed (rtx reg, machine_mo
   && (VECTOR_MODE_P (mode)
   || FLOAT128_VECTOR_P (mode)
   || mode == TImode || mode == CTImode))
-   return NON_PREFIXED_DQ;
+   return (TARGET_P9_VECTOR) ? NON_PREFIXED_DQ : NON_PREFIXED_X;
 
   else
return NON_PREFIXED_DEFAULT;
 }
 
   /* Altivec registers use DS-mode for scalars, and DQ-mode for vectors, IEEE
- 128-bit floating point, and 128-bit integers.  */
+ 128-bit floating point, and 128-bit integers.  Before power9, only indexed
+ addressing was available.  */
   else if (ALTIVEC_REGNO_P (r))
 {
+  if (!TARGET_P9_VECTOR)
+   return NON_PREFIXED_X;
+
   if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode))
return NON_PREFIXED_DS;
 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[committed] aarch64: Add a type attribute to aarch64_movk

2020-02-06 Thread Richard Sandiford
Kyrill pointed out off-list that this new pattern was missing
a type attribute -- sorry about that.

Tested on aarch64-linux-gnu & pushed.

Richard


2020-02-06  Richard Sandiford  

gcc/
* config/aarch64/aarch64.md (aarch64_movk): Add a type
attribute.
---
 gcc/config/aarch64/aarch64.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9c1f17d0f85..fbf90d907ba 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1297,6 +1297,7 @@ (define_insn "aarch64_movk"
 operands[3] = gen_int_mode (shift, SImode);
 return "movk\\t%0, #%X2, lsl %3";
   }
+  [(set_attr "type" "mov_imm")]
 )
 
 (define_expand "movti"


Re: [PATCH], PR target/93569, Fix PowerPC vsx-builtin-15d.c test case

2020-02-06 Thread Michael Meissner
On Thu, Feb 06, 2020 at 09:49:18AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Feb 06, 2020 at 08:29:41AM -0500, Michael Meissner wrote:
> > --- /tmp/eAu61F_rs6000.c  2020-02-05 18:08:48.698992017 -0500
> > +++ gcc/config/rs6000/rs6000.c  2020-02-05 17:23:55.733650185 -0500
> > @@ -24943,9 +24943,13 @@ reg_to_non_prefixed (rtx reg, machine_mo
> >  }
> >  
> >/* Altivec registers use DS-mode for scalars, and DQ-mode for vectors, IEEE
> > - 128-bit floating point, and 128-bit integers.  */
> > + 128-bit floating point, and 128-bit integers.  Before power9, only indexed
> > + addressing was available.  */
> >else if (ALTIVEC_REGNO_P (r))
> >  {
> > +  if (!TARGET_P9_VECTOR)
> > +   return NON_PREFIXED_X;
> > +
> >if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode))
> > return NON_PREFIXED_DS;
> 
> That looks fine, but is this complete?  What about the other VSRs?  Like
> right before this:
> 
>   if (FP_REGNO_P (r))
> {
>   if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode))
> return NON_PREFIXED_D;
> 
>   else if (size < 8)
> return NON_PREFIXED_X;
> 
>   else if (TARGET_VSX && size >= 16
>&& (VECTOR_MODE_P (mode)
>|| FLOAT128_VECTOR_P (mode)
>|| mode == TImode || mode == CTImode))
> return NON_PREFIXED_DQ;
> 
>   else
> return NON_PREFIXED_DEFAULT;
> }
> 
> If we are dealing with a SF or DF (or whatever else in a "legacy" FPR),
> that is fine, but what about vectors in those regs?  It says we can use
> DQ-mode here, but that is only true from p9 onward, no?

Good point.  I'll submit a revised patch once the bootstrap and make check
finish.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH] rs6000: Use rldimi for 64-bit constants with high=low (PR93012)

2020-02-06 Thread Segher Boessenkool
We currently use an (up to) five-instruction sequence to generate such
constants.  After this change we just generate a 32-bit constant and do
a rotate-and-mask-insert instruction, making the sequence only up to
three instructions.
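The arithmetic behind this can be sketched in plain C (the helper names are hypothetical; the real code works on RTL in rs6000_emit_set_long_const).  The low 32 bits are sign-extended, as a 32-bit constant load would leave them in a register, and then the low 32 bits of that value are IORed with a copy shifted left by 32, which is the combination rldimi implements:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the high and low 32-bit halves of C are equal.  */
static bool
halves_equal (uint64_t c)
{
  return (c >> 32) == (c & 0xffffffffu);
}

/* Sign-extend the low 32 bits of V with the xor/subtract trick.  */
static uint64_t
sign_extend_low32 (uint64_t v)
{
  return ((v & 0xffffffffu) ^ 0x80000000u) - 0x80000000u;
}

/* Rebuild C from its low half: keep the low 32 bits of TEMP and insert
   a copy of them into the high 32 bits.  */
static uint64_t
build_replicated (uint64_t c)
{
  uint64_t temp = sign_extend_low32 (c);
  uint64_t one = temp & 0xffffffffu;
  uint64_t two = temp << 32;
  return one | two;
}
```

The sign extension does not matter for the final value, since the AND and the shift discard the upper half of temp; it only reflects what the cheaper 32-bit load produces.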

Tested on powerpc64-linux {-m32,-m64}; committing to trunk.


Segher


2020-02-06  Segher Boessenkool  

* config/rs6000/rs6000.c (rs6000_emit_set_long_const): Handle the case
where the low and the high 32 bits are equal to each other specially,
with an rldimi instruction.

gcc/testsuite/
* gcc.target/powerpc/pr93012.c: New.

---
 gcc/config/rs6000/rs6000.c |  9 +
 gcc/testsuite/gcc.target/powerpc/pr93012.c | 13 +
 2 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr93012.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 7457956..f2516a8 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -9257,6 +9257,15 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   gen_lowpart (SImode,
copy_rtx (temp;
 }
+  else if (ud1 == ud3 && ud2 == ud4)
+{
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  HOST_WIDE_INT num = (ud2 << 16) | ud1;
+  rs6000_emit_set_long_const (temp, (num ^ 0x8000) - 0x8000);
+  rtx one = gen_rtx_AND (DImode, temp, GEN_INT (0x));
+  rtx two = gen_rtx_ASHIFT (DImode, temp, GEN_INT (32));
+  emit_move_insn (dest, gen_rtx_IOR (DImode, one, two));
+}
   else if ((ud4 == 0x && (ud3 & 0x8000))
   || (ud4 == 0 && ! (ud3 & 0x8000)))
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr93012.c b/gcc/testsuite/gcc.target/powerpc/pr93012.c
new file mode 100644
index 000..4f764d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c
@@ -0,0 +1,13 @@
+/* PR target/93012 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -std=c99" } */
+
+unsigned long long msk66() { return 0xULL; }
+unsigned long long mskih() { return 0xabcd1234abcd1234ULL; }
+unsigned long long mskh0() { return 0x12341234ULL; }
+unsigned long long mskl0() { return 0xabcdabcdULL; }
+unsigned long long mskh1() { return 0x92349234ULL; }
+unsigned long long mskl1() { return 0x2bcd2bcdULL; }
+unsigned long long mskse() { return 0x12341234ULL; }
+
+/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */
-- 
1.8.3.1



[PATCH] add -mvsx to pr92923-1.c test requiring vsx

2020-02-06 Thread will schmidt


Hi,
  The existing testcase pr92923-1.c uses vector long long, and thus
  requires vsx.
  OK for master?

Thanks,
-Will

[testsuite]
* testsuite/gcc.target/powerpc/pr92923-1.c: Add -mvsx.


diff --git a/gcc/testsuite/gcc.target/powerpc/pr92923-1.c b/gcc/testsuite/gcc.target/powerpc/pr92923-1.c
index f901244..262f1a1 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr92923-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr92923-1.c
@@ -1,8 +1,8 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target powerpc_altivec_ok } */
-/* { dg-options "-maltivec -O2 -fdump-tree-gimple" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mvsx -O2 -fdump-tree-gimple" } */
 
 /* Verify that overloaded built-ins for "and", "andc", "nor", "or" and "xor"
do not produce VIEW_CONVERT_EXPR operations on their operands.  Like so:
 
   _1 = VIEW_CONVERT_EXPR<__vector signed int>(x);



[committed] aarch64: Add an and/ior-based movk pattern [PR87763]

2020-02-06 Thread Richard Sandiford
This patch adds a second movk pattern that models the instruction
as a "normal" and/ior operation rather than an insertion.  It fixes
the third insv_1.c failure in PR87763, which was a regression from
GCC 8.
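To make the condition in aarch64_movk_shift concrete, here is a simplified model that uses uint64_t and an explicit precision instead of wide_int_ref (movk_shift is a hypothetical name).  The AND must clear exactly one 16-bit chunk, and the IOR value must lie entirely within that chunk:

```c
#include <stdint.h>

/* Return the MOVK shift for X = (X & AND_VAL) | IOR_VAL in a mode of
   PRECISION bits (32 or 64), or -1 if MOVK cannot implement it.  */
static int
movk_shift (uint64_t and_val, uint64_t ior_val, unsigned precision)
{
  uint64_t mode_mask = precision == 64 ? ~(uint64_t) 0
                                       : ((uint64_t) 1 << precision) - 1;
  for (unsigned shift = 0; shift < precision; shift += 16)
    {
      uint64_t mask = (uint64_t) 0xffff << shift;
      /* AND keeps every bit except this chunk, and IOR only sets bits
         inside it.  */
      if ((and_val & mode_mask) == (~mask & mode_mask)
          && (ior_val & mask) == ior_val)
        return (int) shift;
    }
  return -1;
}
```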

Tested on aarch64-linux-gnu & pushed.

Richard


2020-02-04  Richard Sandiford  

gcc/
PR target/87763
* config/aarch64/aarch64-protos.h (aarch64_movk_shift): Declare.
* config/aarch64/aarch64.c (aarch64_movk_shift): New function.
* config/aarch64/aarch64.md (aarch64_movk): New pattern.

gcc/testsuite/
PR target/87763
* gcc.target/aarch64/movk_2.c: New test.
---
 gcc/config/aarch64/aarch64-protos.h   |  1 +
 gcc/config/aarch64/aarch64.c  | 24 +++
 gcc/config/aarch64/aarch64.md | 17 +
 gcc/testsuite/gcc.target/aarch64/movk_2.c | 78 +++
 4 files changed, 120 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movk_2.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 24cc65a383a..d29975a8921 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -560,6 +560,7 @@ bool aarch64_sve_float_mul_immediate_p (rtx);
 bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
+int aarch64_movk_shift (const wide_int_ref &, const wide_int_ref &);
 bool aarch64_use_return_insn_p (void);
 const char *aarch64_output_casesi (rtx *);
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6581e4cb075..6a1b4099af1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7895,6 +7895,30 @@ aarch64_movw_imm (HOST_WIDE_INT val, scalar_int_mode mode)
  || (val & (((HOST_WIDE_INT) 0x) << 16)) == val);
 }
 
+/* Test whether:
+
+ X = (X & AND_VAL) | IOR_VAL;
+
+   can be implemented using:
+
+ MOVK X, #(IOR_VAL >> shift), LSL #shift
+
+   Return the shift if so, otherwise return -1.  */
+int
+aarch64_movk_shift (const wide_int_ref &and_val,
+   const wide_int_ref &ior_val)
+{
+  unsigned int precision = and_val.get_precision ();
+  unsigned HOST_WIDE_INT mask = 0x;
+  for (unsigned int shift = 0; shift < precision; shift += 16)
+{
+  if (and_val == ~mask && (ior_val & mask) == ior_val)
+   return shift;
+  mask <<= 16;
+}
+  return -1;
+}
+
 /* VAL is a value with the inner mode of MODE.  Replicate it to fill a
64-bit (DImode) integer.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 90eebce85c0..9c1f17d0f85 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1282,6 +1282,23 @@ (define_insn "insv_imm"
   [(set_attr "type" "mov_imm")]
 )
 
+;; Match MOVK as a normal AND and IOR operation.
+(define_insn "aarch64_movk"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+   (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+ (match_operand:GPI 2 "const_int_operand"))
+(match_operand:GPI 3 "const_int_operand")))]
+  "aarch64_movk_shift (rtx_mode_t (operands[2], mode),
+  rtx_mode_t (operands[3], mode)) >= 0"
+  {
+int shift = aarch64_movk_shift (rtx_mode_t (operands[2], mode),
+   rtx_mode_t (operands[3], mode));
+operands[2] = gen_int_mode (UINTVAL (operands[3]) >> shift, SImode);
+operands[3] = gen_int_mode (shift, SImode);
+return "movk\\t%0, #%X2, lsl %3";
+  }
+)
+
 (define_expand "movti"
   [(set (match_operand:TI 0 "nonimmediate_operand")
(match_operand:TI 1 "general_operand"))]
diff --git a/gcc/testsuite/gcc.target/aarch64/movk_2.c b/gcc/testsuite/gcc.target/aarch64/movk_2.c
new file mode 100644
index 000..a0477ad5d42
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movk_2.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include 
+
+#define H3 ((uint64_t) 0x << 48)
+#define H2 ((uint64_t) 0x << 32)
+#define H1 ((uint64_t) 0x << 16)
+#define H0 ((uint64_t) 0x)
+
+/*
+** f1:
+** mov w0, w1
+** movkw0, #0x9876(?:, lsl #?0)?
+** ret
+*/
+uint32_t
+f1 (uint32_t dummy, uint32_t x)
+{
+  return (x & 0x) | 0x9876;
+}
+
+/*
+** f2:
+** movkw0, #0x1234, lsl #?16
+** ret
+*/
+uint32_t
+f2 (uint32_t x)
+{
+  return (x & 0x) | 0x1234;
+}
+
+/*
+** g1:
+** movkx0, #0x1234, lsl #?0
+** ret
+*/
+uint64_t
+g1 (uint64_t x)
+{
+  return (x & (H3 | H2 | H1)) | 0x1234;
+}
+
+/*
+** g2:
+** movkx0, #0x900e, lsl #?16
+** ret
+*/
+uint64_t
+g2 (uint64_t x)
+{
+  return (x & (H3 | H2 | H0)) | ((uint64_t) 0x900e << 16);
+}
+
+/*
+** g3:
+** movkx0, #0xee33, lsl #?32
+** ret
+*/
+uint64_t
+g3 (uint64_t x)
+{
+  return (x & (H3 | H1 | H0)) | ((uint64_t) 0xee33 << 32);
+}
+
+/*
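The test below `aarch64_movk_shift` can be mirrored on plain integers to check which AND/IOR mask pairs qualify for a single MOVK. This is a hypothetical stand-alone model, not GCC code. It uses `uint64_t` plus an explicit precision in place of `wide_int_ref`. It also assumes a 16-bit field mask of `0xffff`, because MOVK inserts a 16-bit immediate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of aarch64_movk_shift (not GCC code).
// Returns the shift S such that (X & AND_VAL) | IOR_VAL can be done as
// "MOVK X, #(IOR_VAL >> S), LSL #S", or -1 if no such S exists.
// PRECISION is the mode width in bits (32 or 64).
int movk_shift(std::uint64_t and_val, std::uint64_t ior_val, unsigned precision)
{
  // Truncate to PRECISION bits, as a PRECISION-bit wide_int would be.
  const std::uint64_t mode_mask
    = precision == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << precision) - 1;
  std::uint64_t field = 0xffff;  // MOVK replaces exactly one 16-bit field
  for (unsigned shift = 0; shift < precision; shift += 16)
    {
      // AND must clear exactly this field; IOR may only set bits inside it.
      if ((and_val & mode_mask) == (~field & mode_mask)
          && (ior_val & field) == ior_val)
        return static_cast<int>(shift);
      field <<= 16;
    }
  return -1;
}
```

Once a shift is found, the MOVK immediate is `ior_val >> shift`, which matches what the output code of the `aarch64_movk` pattern computes from `operands[3]`.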

[committed] aarch64: Add an extra sbfiz pattern [PR87763]

2020-02-06 Thread Richard Sandiford
This patch matches another form of sbfiz, in which the input
has DImode and the output has SImode.  It fixes a regression
in gcc.target/aarch64/lsl_asr_sbfiz.c from GCC 8.

Tested on aarch64-linux-gnu & pushed.

Richard


2020-02-04  Richard Sandiford  

gcc/
PR rtl-optimization/87763
* config/aarch64/aarch64.md (*ashiftsi_extvdi_bfiz): New pattern.
---
 gcc/config/aarch64/aarch64.md | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4f5898185f5..90eebce85c0 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5771,6 +5771,21 @@ (define_insn "*ashift_extv_bfiz"
   [(set_attr "type" "bfx")]
 )
 
+(define_insn "*ashiftsi_extvdi_bfiz"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (ashift:SI
+ (match_operator:SI 4 "subreg_lowpart_operator"
+   [(sign_extract:DI
+  (match_operand:DI 1 "register_operand" "r")
+  (match_operand 2 "aarch64_simd_shift_imm_offset_si")
+  (const_int 0))])
+ (match_operand 3 "aarch64_simd_shift_imm_si")))]
+  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
+1, GET_MODE_BITSIZE (SImode) - 1)"
+  "sbfiz\\t%w0, %w1, %3, %2"
+  [(set_attr "type" "bfx")]
+)
+
 ;; When the bit position and width of the equivalent extraction add up to 32
 ;; we can use a W-reg LSL instruction taking advantage of the implicit
 ;; zero-extension of the X-reg.
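The new pattern's RTL can be emitted as a single SBFIZ, and modelling both sides on integers shows why. This is an illustrative sketch with hypothetical function names. It assumes the usual SBFIZ semantics: sign-extend the low WIDTH bits of the source, then shift left by LSB. It only checks WIDTH >= 1 with WIDTH + SHIFT <= 31, mirroring the pattern's `IN_RANGE (..., 1, 31)` condition.

```cpp
#include <cassert>
#include <cstdint>

// RTL side: (ashift:SI (lowpart:SI (sign_extract:DI x WIDTH 0)) SHIFT).
std::uint32_t rtl_form(std::uint64_t x, unsigned width, unsigned shift)
{
  std::uint64_t field = x & ((std::uint64_t{1} << width) - 1);  // low WIDTH bits
  std::uint64_t sign = std::uint64_t{1} << (width - 1);
  std::uint64_t extended = (field ^ sign) - sign;  // sign-extend to 64 bits
  // Take the SImode lowpart, then shift within 32 bits.
  return static_cast<std::uint32_t>(extended) << shift;
}

// Instruction side: SBFIZ Wd, Wn, #shift, #width on 32-bit registers.
std::uint32_t sbfiz_w(std::uint32_t wn, unsigned shift, unsigned width)
{
  std::uint32_t field = wn & ((std::uint32_t{1} << width) - 1);
  std::uint32_t sign = std::uint32_t{1} << (width - 1);
  std::uint32_t extended = (field ^ sign) - sign;  // sign-extend to 32 bits
  return extended << shift;
}

// Compare both forms for every WIDTH >= 1 and SHIFT with WIDTH + SHIFT <= 31.
bool forms_agree(std::uint64_t x)
{
  for (unsigned width = 1; width <= 31; ++width)
    for (unsigned shift = 0; width + shift <= 31; ++shift)
      if (rtl_form(x, width, shift)
          != sbfiz_w(static_cast<std::uint32_t>(x), shift, width))
        return false;
  return true;
}
```

The two forms agree because the field never extends past bit 31. So the low 32 bits of a 64-bit sign extension equal a 32-bit sign extension of the same field.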


Re: [PATCH 3/3] libstdc++: Implement C++20 range adaptors

2020-02-06 Thread Jonathan Wakely

On 03/02/20 21:07 -0500, Patrick Palka wrote:

This patch implements [range.adaptors].  It also includes the changes from P3280
and P3278 and P3323, without which many standard examples won't work.

The implementation is mostly dictated by the spec and there was not much room
for implementation discretion.  The most interesting part that was not specified
by the spec is the design of the range adaptors and range adaptor closures,
which I tried to design in a way that minimizes boilerplate and statefulness (so
that e.g. the composition of two stateless closures is stateless).
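The stateless-composition property can be illustrated with a minimal C++17 sketch. The names `closure`, `composed` and `is_closure` are hypothetical and are not libstdc++'s `_RangeAdaptorClosure`. The closure inherits from its callable instead of storing it, so a closure over an empty callable is itself empty. Composing two closures then yields another empty closure.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical closure: inherits from the callable so it stays empty
// whenever the callable is empty.
template<typename F>
struct closure : F
{
  explicit closure(F f) : F(std::move(f)) { }
};

template<typename T> struct is_closure : std::false_type { };
template<typename F> struct is_closure<closure<F>> : std::true_type { };

// Composition: apply Lhs, then Rhs.  It has no data members of its own, so
// it is empty when both (distinct) base types are empty.
template<typename Lhs, typename Rhs>
struct composed : Lhs, Rhs
{
  composed(Lhs l, Rhs r) : Lhs(std::move(l)), Rhs(std::move(r)) { }

  template<typename R>
  auto operator()(R&& r) const
  {
    const Lhs& first = *this;
    const Rhs& second = *this;
    return second(first(std::forward<R>(r)));
  }
};

// range | closure: apply the closure (disabled when the left side is a closure).
template<typename R, typename F,
         std::enable_if_t<!is_closure<std::decay_t<R>>::value, int> = 0>
auto operator|(R&& r, const closure<F>& c)
{
  return c(std::forward<R>(r));
}

// closure | closure: build a new closure without applying anything.
template<typename F, typename G>
auto operator|(closure<F> lhs, closure<G> rhs)
{
  using C = composed<closure<F>, closure<G>>;
  return closure<C>(C(std::move(lhs), std::move(rhs)));
}
```

For example, with two captureless lambdas `twice` and `inc` wrapped in `closure`, `twice | inc` is an empty type, and `v | (twice | inc)` gives the same result as `v | twice | inc`.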

What is left unimplemented is caching of calls to begin() in filter_view,
drop_view and reverse_view, which is required to guarantee that begin() has
amortized constant time complexity.  I can implement this in a subsequent patch.
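One common way to meet the amortized-constant `begin()` requirement is to compute `begin()` once and cache the result. The sketch below shows that idea using a hypothetical, heavily simplified `caching_filter_view` over `std::vector<int>`. It is not the libstdc++ implementation, which the message defers to a later patch. It also ignores invalidation when the underlying vector changes.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical simplified filter view: finding the first matching element
// is linear, so begin() caches it; later calls are O(1).
template<typename Pred>
class caching_filter_view
{
  const std::vector<int>* base_;
  Pred pred_;
  std::optional<std::vector<int>::const_iterator> cached_begin_;
  int searches_ = 0;  // how many times the linear search actually ran

public:
  caching_filter_view(const std::vector<int>& base, Pred pred)
    : base_(&base), pred_(std::move(pred)) { }

  // Non-const because it may fill the cache.
  std::vector<int>::const_iterator begin()
  {
    if (!cached_begin_)
      {
        ++searches_;
        auto it = base_->begin();
        while (it != base_->end() && !pred_(*it))
          ++it;
        cached_begin_ = it;
      }
    return *cached_begin_;
  }

  int searches() const { return searches_; }
};
```

Making `begin()` non-const is the price of caching in this sketch: a cached `begin()` mutates the view.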

"Interesting" parts of the patch are marked with XXX comments.





--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -39,6 +39,7 @@
#if __cpp_lib_concepts

#include 
+#include  // std::ref


Please use  instead.  is huge.


#include 
#include 
#include 
+
+namespace __detail
+{
+  struct _Empty { };
+} // namespace __detail
+
+namespace views
+{
+  template
+struct _RangeAdaptorClosure;
+
+  template
+struct _RangeAdaptor
+{
+protected:
+  [[no_unique_address]]
+   conditional_t,
+ _Callable, __detail::_Empty> _M_callable;
+
+public:
+  constexpr
+  _RangeAdaptor(const _Callable& = {})
+   requires is_default_constructible_v<_Callable>
+  { }
+
+  constexpr
+  _RangeAdaptor(_Callable __callable)


As mentioned on IRC, the non-explicit constructors here make me
nervous. I'd either like them to be explicit, or for these types to be
in their own namespace so that there is never a reason to attempt
implicit conversions to them just because some function related to
them is found in an associated namespace.
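The concern about non-explicit constructors can be shown with two hypothetical adaptor types that differ only in `explicit`. With the implicit constructor, a function taking the adaptor becomes viable for any callable through a silent user-defined conversion. The explicit version does not participate that way.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical adaptors, differing only in whether the constructor is explicit.
template<typename F>
struct implicit_adaptor
{
  implicit_adaptor(F) { }
};

template<typename F>
struct explicit_adaptor
{
  explicit explicit_adaptor(F) { }
};

struct some_callable { };

static_assert(std::is_convertible<some_callable, implicit_adaptor<some_callable>>::value, "");
static_assert(!std::is_convertible<some_callable, explicit_adaptor<some_callable>>::value, "");

// Overload sets where the adaptor parameter competes with a catch-all.
int takes_implicit(implicit_adaptor<some_callable>) { return 1; }
int takes_implicit(...) { return 0; }
int takes_explicit(explicit_adaptor<some_callable>) { return 1; }
int takes_explicit(...) { return 0; }
```

Here `takes_implicit(some_callable{})` picks the adaptor overload through the implicit conversion. `takes_explicit(some_callable{})` falls back to the catch-all.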


+   requires (!is_default_constructible_v<_Callable>)
+   : _M_callable(std::move(__callable))
+  { }
+






+  template<__detail::__not_same_as _Tp>
+   requires convertible_to<_Tp, _Range&>
+ && requires { _S_fun(declval<_Tp>()); }
+   constexpr
+   ref_view(_Tp&& __t)
+ : _M_r(addressof(static_cast<_Range&>(std::forward<_Tp>(__t


This should be std-qualified to avoid ADL, and should use the internal
std::__addressof function (just to avoid the extra call from
std::addressof).
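Why the call should be std-qualified can be seen in a small hypothetical example. An unqualified `addressof` call inside a library template also performs argument-dependent lookup. A same-named non-template function in the argument's namespace is then an exact match, and it wins over the `std::addressof` template.

```cpp
#include <cassert>
#include <memory>

namespace user
{
  struct widget { };

  // Hypothetical user function that happens to share the name.
  widget* addressof(widget&) { return nullptr; }
}

namespace lib
{
  using std::addressof;

  // Unqualified: ADL also finds user::addressof, which is preferred as a
  // non-template exact match.
  template<typename T>
  T* unqualified(T& t) { return addressof(t); }

  // Qualified: only std::addressof is considered.
  template<typename T>
  T* qualified(T& t) { return std::addressof(t); }
}
```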


+  // XXX: the following algos are copied verbatim from ranges_algo.h to avoid a
+  // circular dependency with that header.


Ugh, that's unfortunate, but OK.

I guess we could put the body of the functions in new, unconstrained
functions, and then have the ones in  and these
call those, to reuse the implementations. But they're so small and
simple it's probably not worth it.


+  namespace __detail
+  {
+template _Sent,
+typename _Proj = identity,
+indirect_unary_predicate> _Pred>
+  constexpr _Iter
+  find_if(_Iter __first, _Sent __last, _Pred __pred, _Proj __proj = {})
+  {
+   while (__first != __last
+   && !(bool)std::__invoke(__pred, std::__invoke(__proj, *__first)))
+ ++__first;
+   return __first;
+  }
+
+template, _Proj>>
+  _Pred>
+  constexpr safe_iterator_t<_Range>
+  find_if(_Range&& __r, _Pred __pred, _Proj __proj = {})
+  {
+   return __detail::find_if(ranges::begin(__r), ranges::end(__r),
+std::move(__pred), std::move(__proj));
+  }


It looks like maybe we don't need this overload.


+template _Sent,
+typename _Proj = identity,
+indirect_unary_predicate> _Pred>
+  constexpr _Iter
+  find_if_not(_Iter __first, _Sent __last, _Pred __pred, _Proj __proj = {})
+  {
+   while (__first != __last
+   && (bool)std::__invoke(__pred, std::__invoke(__proj, *__first)))
+ ++__first;
+   return __first;
+  }
+
+template, _Proj>>
+  _Pred>
+  constexpr safe_iterator_t<_Range>
+  find_if_not(_Range&& __r, _Pred __pred, _Proj __proj = {})
+  {
+   return __detail::find_if_not(ranges::begin(__r), ranges::end(__r),
+std::move(__pred), std::move(__proj));
+  }


Nor this one.


+template>
+  _Comp = ranges::less>
+  constexpr const _Tp&
+  min(const _Tp& __a, const _Tp& __b, _Comp __comp = {}, _Proj __proj = {})
+  {
+   if (std::__invoke(std::move(__comp),
+ std::__invoke(__proj, __b),
+ std::__invoke(__proj, __a)))
+ return __b;
+   else
+ return __a;
+  }
+
+template
+  struct mismatch_result


It seems like these result types could definitely be shared (by
putting them in a new  header that both include). Duplicated
function templates isn't great, but not too bad. Duplicated class
templates means add

[committed] testsuite: Unify gcc.target/i386/memcpy scan strings.

2020-02-06 Thread Uros Bizjak
After -fno-common became the default, we can unify various
scan strings between 64bit and 32bit targets.

Tested on x86_64-linux-gnu {,-m32}.

2020-02-06  Uroš Bizjak  

* gcc.target/i386/memcpy-strategy-1.c (dg-final):
Unify scan-assembler strings for all targets.
* gcc.target/i386/memcpy-strategy-2.c (dg-final): Ditto.
* gcc.target/i386/memcpy-strategy-3.c (dg-final): Ditto.
* gcc.target/i386/memcpy-vector_loop-1.c (dg-final): Ditto.

Committed to mainline.

Uros.
diff --git a/gcc/testsuite/gcc.target/i386/memcpy-strategy-1.c b/gcc/testsuite/gcc.target/i386/memcpy-strategy-1.c
index 48d0b77da58..6ac80c91053 100644
--- a/gcc/testsuite/gcc.target/i386/memcpy-strategy-1.c
+++ b/gcc/testsuite/gcc.target/i386/memcpy-strategy-1.c
@@ -1,8 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "" { *-*-* } { "-march=*" } { "-march=atom" } } */
 /* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
-/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! ia32 } } } } */
-/* { dg-final { scan-assembler-times "movdqa" 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movdqa" 8 } } */
 
 char a[2048];
 char b[2048];
diff --git a/gcc/testsuite/gcc.target/i386/memcpy-strategy-2.c b/gcc/testsuite/gcc.target/i386/memcpy-strategy-2.c
index 9e26ea996d7..c103896a110 100644
--- a/gcc/testsuite/gcc.target/i386/memcpy-strategy-2.c
+++ b/gcc/testsuite/gcc.target/i386/memcpy-strategy-2.c
@@ -1,8 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "" { *-*-* } { "-march=*" } { "-march=atom" } } */
 /* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
-/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! ia32 } } } } */
-/* { dg-final { scan-assembler-times "movdqa" 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movdqa" 8 } } */
 
 char a[2048];
 char b[2048];
diff --git a/gcc/testsuite/gcc.target/i386/memcpy-strategy-3.c b/gcc/testsuite/gcc.target/i386/memcpy-strategy-3.c
index 11687e8c9b5..2d72155bb2e 100644
--- a/gcc/testsuite/gcc.target/i386/memcpy-strategy-3.c
+++ b/gcc/testsuite/gcc.target/i386/memcpy-strategy-3.c
@@ -1,9 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
-/* On ELF platforms, one hit comes from the .file directive.  */
-/* { dg-final { scan-assembler-times "memcpy" 2 { target { ! *-*-darwin* } } } } */
-/* But not on Darwin, which doesn't have a .file directive by default.  */
-/* { dg-final { scan-assembler-times "_memcpy" 1  { target *-*-darwin* } } } */
+/* { dg-final { scan-assembler-times "call\[\\t \]*_?memcpy" 1 } } */
 
 char a[2048];
 char b[2048];
diff --git a/gcc/testsuite/gcc.target/i386/memcpy-vector_loop-1.c b/gcc/testsuite/gcc.target/i386/memcpy-vector_loop-1.c
index 113c876c324..93f428acc85 100644
--- a/gcc/testsuite/gcc.target/i386/memcpy-vector_loop-1.c
+++ b/gcc/testsuite/gcc.target/i386/memcpy-vector_loop-1.c
@@ -1,8 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "" { *-*-* } { "-march=*" } { "-march=atom" } } */
 /* { dg-options "-O2 -march=atom -minline-all-stringops -mstringop-strategy=vector_loop" } */
-/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! ia32 } } } } */
-/* { dg-final { scan-assembler-times "movdqa" 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movdqa" 8 } } */
 
 char a[2048];
 char b[2048];
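The effect of the unified memcpy-strategy-3.c scan string can be approximated with `std::regex`. After Tcl unquoting, `"call\[\\t \]*_?memcpy"` becomes the pattern `call[\t ]*_?memcpy`. This is a rough sketch. DejaGnu actually uses Tcl regular expressions, and the assembly snippets below are assumed ELF-style and Darwin-style outputs, not captured compiler output.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Count non-overlapping matches of PATTERN in the assembly text, roughly
// what scan-assembler-times checks.
long count_matches(const std::string& asm_text, const std::string& pattern)
{
  const std::regex re(pattern);
  return std::distance(std::sregex_iterator(asm_text.begin(), asm_text.end(), re),
                       std::sregex_iterator());
}

// The unified pattern, after Tcl unquoting.
const std::string unified = R"(call[\t ]*_?memcpy)";
```

With a bare `memcpy` pattern, the `.file "memcpy-strategy-3.c"` directive also counts as a hit. That is why the old ELF expectation was 2. Anchoring on `call` and allowing an optional leading underscore lets a single count of 1 cover both cases.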


Re: [committed] x86: Emit "#" instead of calling gcc_unreachable for invalid insns.

2020-02-06 Thread Uros Bizjak
On Thu, Feb 6, 2020 at 6:07 PM Jakub Jelinek  wrote:
>
> On Thu, Feb 06, 2020 at 05:36:43PM +0100, Uros Bizjak wrote:
> > 2020-02-06  Uroš Bizjak  
> >
> > * config/i386/i386.md (*pushtf): Emit "#" instead of
> > calling gcc_unreachable in insn output.
> > (*pushxf): Ditto.
> > (*pushdf): Ditto.
> > (*pushsf_rex64): Ditto for alternatives other than 1.
> > (*pushsf): Ditto for alternatives other than 1.
> >
> > Committed to mainline.
> >
> > Uros.
>
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 496a843..34649c010b8 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -3032,7 +3032,7 @@
> >"TARGET_64BIT || TARGET_SSE"
> >  {
> >/* This insn should be already split before reg-stack.  */
> > -  gcc_unreachable ();
> > +  return ("#");
>
> No need for those ()s around, just return "#"; would do.

Eh... will "fix".

> > @@ -3156,7 +3156,8 @@
> >"TARGET_64BIT"
> >  {
> >/* Anything else should be already split before reg-stack.  */
> > -  gcc_assert (which_alternative == 1);
> > +  if (which_alternative != 1)
> > +return ("#");
> >return "push{q}\t%q1";
>
> Shouldn't this be then
>   "@
>#
>push{q}\t%q1
>#"
> instead then?

I have considered this option, but chose the above. It just looks better.

Thanks,
Uros.


Re: [committed] x86: Emit "#" instead of calling gcc_unreachable for invalid insns.

2020-02-06 Thread Jakub Jelinek
On Thu, Feb 06, 2020 at 05:36:43PM +0100, Uros Bizjak wrote:
> 2020-02-06  Uroš Bizjak  
> 
> * config/i386/i386.md (*pushtf): Emit "#" instead of
> calling gcc_unreachable in insn output.
> (*pushxf): Ditto.
> (*pushdf): Ditto.
> (*pushsf_rex64): Ditto for alternatives other than 1.
> (*pushsf): Ditto for alternatives other than 1.
> 
> Committed to mainline.
> 
> Uros.

> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 496a843..34649c010b8 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -3032,7 +3032,7 @@
>"TARGET_64BIT || TARGET_SSE"
>  {
>/* This insn should be already split before reg-stack.  */
> -  gcc_unreachable ();
> +  return ("#");

No need for those ()s around, just return "#"; would do.

> @@ -3156,7 +3156,8 @@
>"TARGET_64BIT"
>  {
>/* Anything else should be already split before reg-stack.  */
> -  gcc_assert (which_alternative == 1);
> +  if (which_alternative != 1)
> +return ("#");
>return "push{q}\t%q1";

Shouldn't this be then
  "@
   #
   push{q}\t%q1
   #"
instead then?

> @@ -3169,7 +3170,8 @@
>"!TARGET_64BIT"
>  {
>/* Anything else should be already split before reg-stack.  */
> -  gcc_assert (which_alternative == 1);
> +  if (which_alternative != 1)
> +return ("#");
>return "push{l}\t%1";
>  }
>[(set_attr "type" "multi,push,multi")

Likewise.

Jakub



Re: [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal for AArch64 AdvSIMD

2020-02-06 Thread Richard Sandiford
Delia Burduv  writes:
> Sure, here it is. I'll do that for the other patch too.

Thanks, belatedly pushed as f78335df69993a900512f92324cab6a20b1bde0c.
Sorry for the delay.

Richard

>
> Thanks,
> Delia
>
> On 1/31/20 3:37 PM, Richard Sandiford wrote:
>> Delia Burduv  writes:
>>> Thank you, Richard!
>>>
>>> Here is the updated patch. The test that checks for errors when bf16 is
>>> disabled is in the bfcvt patch.
>> 
>> Looks good.  Just a couple of very minor things...
>> 
>>>
>>> Cheers,
>>> Delia
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-11-06  Delia Burduv  
>>>
>>>   * config/aarch64/aarch64-simd-builtins.def
>>>   (bfcvtn): New built-in function.
>>>   (bfcvtn_q): New built-in function.
>>>   (bfcvtn2): New built-in function.
>>>   (bfcvt): New built-in function.
>>>   * config/aarch64/aarch64-simd.md
>>>   (aarch64_bfcvtn): New pattern.
>>>   (aarch64_bfcvtn2v8bf): New pattern.
>>>   (aarch64_bfcvtbf): New pattern.
>>>   * config/aarch64/arm_bf16.h (float32_t): New typedef.
>>>   (vcvth_bf16_f32): New intrinsic.
>>>   * config/aarch64/arm_bf16.h (vcvt_bf16_f32): New intrinsic.
>>>   (vcvtq_low_bf16_f32): New intrinsic.
>>>   (vcvtq_high_bf16_f32): New intrinsic.
>>>   * config/aarch64/iterators.md (V4SF_TO_BF): New mode iterator.
>>>   (UNSPEC_BFCVTN): New UNSPEC.
>>>   (UNSPEC_BFCVTN2): New UNSPEC.
>>>   (UNSPEC_BFCVT): New UNSPEC.
>>>   * config/arm/types.md (bf_cvt): New type.
>> 
>> The patch no longer changes types.md. :-)
>> 
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
>>> new file mode 100644
>>> index ..9feb7ee7905cb14037427a36797fc67a6fa3fbc8
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
>>> @@ -0,0 +1,67 @@
>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>>> +/* { dg-additional-options "-save-temps" } */
>>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>>> +
>>> +#include 
>>> +
>>> +/*
>>> +**test_bfmlalb:
>>> +**  bfmlalb\tv0.4s, v1.8h, v2.8h
>> 
>> This version uses \t while the previous one used literal tabs.
>> TBH I think the literal tab is nicer (and what we use for SVE FWIW).
>> 
>> OK with those changes, thanks.  Seems silly to ask when the changes
>> are so trivial, but: please could you post an updated patch so that
>> I can apply verbatim?
>> 
>> Richard
>> 


Re: [GCC][PATCH][ARM] Regenerate arm-tables.opt for Armv8.1-M patch

2020-02-06 Thread Kyrill Tkachov



On 2/3/20 5:18 PM, Mihail Ionescu wrote:

Hi all,

I've regenerated arm-tables.opt in config/arm to replace the improperly
generated arm-tables.opt file from "[PATCH, GCC/ARM, 2/10] Add command
line support for Armv8.1-M Mainline" 
(9722215a027b68651c3c7a8af9204d033197e9c0).



2020-02-03  Mihail Ionescu  

    * config/arm/arm-tables.opt: Regenerate.

Ok for trunk?



Ok. I would consider it obvious too.

Thanks,

Kyrill




Regards,
Mihail


### Attachment also inlined for ease of reply    ###



diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index f295a4cffa2bbb3f8163fb9cef784b5af59aee12..a51a131505d184f120a3cfc51273b419bb0cb103 100644

--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -353,13 +353,16 @@ EnumValue
 Enum(arm_arch) String(armv8-m.main) Value(28)

 EnumValue
-Enum(arm_arch) String(armv8.1-m.main) Value(29)
+Enum(arm_arch) String(armv8-r) Value(29)

 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(30)
+Enum(arm_arch) String(armv8.1-m.main) Value(30)

 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(31)
+Enum(arm_arch) String(iwmmxt) Value(31)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(32)

 Enum
 Name(arm_fpu) Type(enum fpu_type)



Re: [GCC][PATCH][ARM] Set profile to M for Armv8.1-M

2020-02-06 Thread Kyrill Tkachov



On 2/4/20 1:49 PM, Christophe Lyon wrote:
On Mon, 3 Feb 2020 at 18:20, Mihail Ionescu wrote:

>
> Hi,
>
> We noticed that the profile for armv8.1-m.main was not set in arm-cpus.in,
> which led to TARGET_ARM_ARCH_PROFILE and _ARM_ARCH_PROFILE not being
> defined properly.
>
>
>
> gcc/ChangeLog:
>
> 2020-02-03  Mihail Ionescu 
>
> * config/arm/arm-cpus.in: Set profile M
> for armv8.1-m.main.
>
>
> Ok for trunk?
>
> Regards,
> Mihail
>
>
> ### Attachment also inlined for ease of reply    ###

>
>
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index 1805b2b1cd8d6f65a967b4e3945257854a7e0fc1..96f584da325172bd1460251e2de0ad679589d312 100644

> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -692,6 +692,7 @@ begin arch armv8.1-m.main
>   tune for cortex-m7
>   tune flags CO_PROC
>   base 8M_MAIN
> + profile M
>   isa ARMv8_1m_main
>  # fp => FPv5-sp-d16; fp.dp => FPv5-d16
>   option dsp add armv7em
>

I'm wondering whether this is obvious?
OTOH, what's the impact of missing this (or why didn't we notice the
problem via a failing testcase?)


It's only used to set the __ARM_ARCH_PROFILE macro in arm-c.c.

I do agree that the patch is obvious, so go ahead and commit this 
please, Mihail.


Thanks,

Kyrill





Christophe


[committed] x86: Emit "#" instead of calling gcc_unreachable for invalid insns.

2020-02-06 Thread Uros Bizjak
Implement standard approach by emitting "#" for insns that have to be split.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

2020-02-06  Uroš Bizjak  

* config/i386/i386.md (*pushtf): Emit "#" instead of
calling gcc_unreachable in insn output.
(*pushxf): Ditto.
(*pushdf): Ditto.
(*pushsf_rex64): Ditto for alternatives other than 1.
(*pushsf): Ditto for alternatives other than 1.

Committed to mainline.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 496a843..34649c010b8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3032,7 +3032,7 @@
   "TARGET_64BIT || TARGET_SSE"
 {
   /* This insn should be already split before reg-stack.  */
-  gcc_unreachable ();
+  return ("#");
 }
   [(set_attr "isa" "*,x64")
(set_attr "type" "multi")
@@ -3087,7 +3087,7 @@
   ""
 {
   /* This insn should be already split before reg-stack.  */
-  gcc_unreachable ();
+  return ("#");
 }
   [(set_attr "isa" "*,*,*,nox64,x64")
(set_attr "type" "multi")
@@ -3123,7 +3123,7 @@
   ""
 {
   /* This insn should be already split before reg-stack.  */
-  gcc_unreachable ();
+  return ("#");
 }
   [(set_attr "isa" "*,nox64,nox64,nox64,x64,sse2")
(set_attr "type" "multi")
@@ -3156,7 +3156,8 @@
   "TARGET_64BIT"
 {
   /* Anything else should be already split before reg-stack.  */
-  gcc_assert (which_alternative == 1);
+  if (which_alternative != 1)
+return ("#");
   return "push{q}\t%q1";
 }
   [(set_attr "type" "multi,push,multi")
@@ -3169,7 +3170,8 @@
   "!TARGET_64BIT"
 {
   /* Anything else should be already split before reg-stack.  */
-  gcc_assert (which_alternative == 1);
+  if (which_alternative != 1)
+return ("#");
   return "push{l}\t%1";
 }
   [(set_attr "type" "multi,push,multi")


Re: [PATCH 2/3] libstdc++: Implement C++20 constrained algorithms

2020-02-06 Thread Jonathan Wakely

On 03/02/20 21:07 -0500, Patrick Palka wrote:

+#ifndef _RANGES_ALGO_H
+#define _RANGES_ALGO_H 1
+
+#if __cplusplus > 201703L
+
+#include 
+#include 
+#include 
+// #include 


This line could be removed, or left as a reminder to me to
refactor  so that the small utility pieces are in a small
utility header (like  that can be included
instead of the whole of .


+#include 
+#include 
+#include  // __is_byte
+#include  // concept uniform_random_bit_generator


I wonder if we want to move that concept to 
instead, which already exists to allow  to avoid including
the whole of . If we do that, it would make sense to rename
 to  or something like
that.


+
+#if __cpp_lib_concepts
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+namespace ranges
+{
+  namespace __detail
+  {
+template
+constexpr inline bool __is_normal_iterator = false;


All these templates in the __detail namespace should be indented by
two spaces after the template-head i.e.

template
  constexpr inline bool __is_normal_iterator = false;

(That indentation scheme has been in the libstdc++ style guide for
longer than I've been contributing to the project, but it doesn't seem
very popular with new contributors, and it wastes a level of
indentation for templates, which means most of the library. Maybe we
should revisit that convention.)



+  template
+using unary_transform_result = copy_result<_Iter, _Out>;
+
+  template _Sent,
+  weakly_incrementable _Out,
+  copy_constructible _Fp, typename _Proj = identity>
+requires writable<_Out, indirect_result_t<_Fp&, projected<_Iter, _Proj>>>


I have a pending patch to implement P1878R1, which renames writable
(and a few other concepts). I'll wait until your patch is in, and
change these places using it.


+partial_sort_copy(_Iter1 __first, _Sent1 __last,
+ _Iter2 __result_first, _Sent2 __result_last,
+ _Comp __comp = {},
+ _Proj1 __proj1 = {}, _Proj2 __proj2 = {})
+{
+  if (__result_first == __result_last)
+   {
+ // TODO: Eliminating the variable __lasti triggers an ICE.
+ auto __lasti = ranges::next(std::move(__first),
+ std::move(__last));
+ return {std::move(__lasti), std::move(__result_first)};


Please try to reduce that and report it to bugzilla at some point,
thanks.


+++ b/libstdc++-v3/testsuite/25_algorithms/all_of/constrained.cc
@@ -0,0 +1,90 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.


This should be 2020. That's the only change necessary though, please
adjust that and commit to master. Great work, thank you!



Re: [PATCH] c++: Handle CONSTRUCTORs without indexes in find_array_ctor_elt [PR93549]

2020-02-06 Thread Jakub Jelinek
On Thu, Feb 06, 2020 at 10:38:25AM -0500, Jason Merrill wrote:
> > I don't know, can try to add some instrumentation and do bootstrap/regtest
> > with it.  The handling of the CONSTRUCTORs with missing or present or mixed
> > indexes is what I found in various middle-end routines.
> > The only thing I see in our verifiers is that in GIMPLE function bodies,
> > we don't allow non-VECTOR_TYPE CONSTRUCTORs with any elements, and for
> > VECTOR_TYPE CONSTRUCTORs we require that indexes are NULL for elements with
> > VECTOR_TYPE and for others require that it is either NULL or INTEGER_CST
> > matching the position (so effectively for those direct access is still
> > possible).
> >
> 
> Where are these verifiers?  I'm not finding them.

tree-cfg.c (verify_gimple_assign_single).
Though, we don't really have verifiers for initializers of global variables,
guess it would need to be called somewhere from varpool_node::assemble_decl
or so (or other varpool method or multiple of them).

Jakub



Re: [PATCH], PR target/93569, Fix PowerPC vsx-builtin-15d.c test case

2020-02-06 Thread Segher Boessenkool
Hi!

On Thu, Feb 06, 2020 at 08:29:41AM -0500, Michael Meissner wrote:
> --- /tmp/eAu61F_rs6000.c  2020-02-05 18:08:48.698992017 -0500
> +++ gcc/config/rs6000/rs6000.c  2020-02-05 17:23:55.733650185 -0500
> @@ -24943,9 +24943,13 @@ reg_to_non_prefixed (rtx reg, machine_mo
>  }
>  
>/* Altivec registers use DS-mode for scalars, and DQ-mode for vectors, IEEE
> - 128-bit floating point, and 128-bit integers.  */
> + 128-bit floating point, and 128-bit integers.  Before power9, only indexed
> + addressing was available.  */
>else if (ALTIVEC_REGNO_P (r))
>  {
> +  if (!TARGET_P9_VECTOR)
> + return NON_PREFIXED_X;
> +
>if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode))
>   return NON_PREFIXED_DS;

That looks fine, but is this complete?  What about the other VSRs?  Like
right before this:

  if (FP_REGNO_P (r))
{
  if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode))
return NON_PREFIXED_D;

  else if (size < 8)
return NON_PREFIXED_X;

  else if (TARGET_VSX && size >= 16
   && (VECTOR_MODE_P (mode)
   || FLOAT128_VECTOR_P (mode)
   || mode == TImode || mode == CTImode))
return NON_PREFIXED_DQ;

  else
return NON_PREFIXED_DEFAULT;
}

If we are dealing with a SF or DF (or whatever else in a "legacy" FPR),
that is fine, but what about vectors in those regs?  It says we can use
DQ-mode here, but that is only true from p9 onward, no?


Segher


Re: [PATCH] c++: Handle CONSTRUCTORs without indexes in find_array_ctor_elt [PR93549]

2020-02-06 Thread Jason Merrill
On Thu, Feb 6, 2020 at 7:02 AM Jakub Jelinek  wrote:

> On Wed, Feb 05, 2020 at 01:31:30PM -0500, Jason Merrill wrote:
> > > from the constexpr new change apparently broke the following testcase.
> > > When handling COND_EXPR, we build_vector_from_val, however as the argument we
> > > pass to it is not an INTEGER_CST/REAL_CST, but that wrapped in a
> > > NON_LVALUE_EXPR location wrapper, we end up with a CONSTRUCTOR and as it is
> > > middle-end that builds it, it doesn't bother with indexes.  The
> > > cp_fully_fold_init call used to fold it into VECTOR_CST in the past, but as
> > > we intentionally don't invoke it anymore as it might fold away something
> > > that needs to be diagnosed during constexpr evaluation, we end up evaluating
> > > ARRAY_REF into the index-less CONSTRUCTOR.  The following patch fixes the
> > > ICE by teaching find_array_ctor_elt to handle CONSTRUCTORs without indexes
> > > (that itself could be still very efficient) and CONSTRUCTORs with some
> > > indexes present and others missing (the rules are that if the index on the
> > > first element is missing, then it is the array's lowest index (in C/C++ 0)
> > > and if other indexes are missing, they are the index of the previous element
> > > + 1).
> >
> > Is it currently possible to get a CONSTRUCTOR with non-init-list type that
> > has some indexes present and others missing?  Other than from the new code
> > in your patch that sets some indexes?
>
> I don't know, can try to add some instrumentation and do bootstrap/regtest
> with it.  The handling of the CONSTRUCTORs with missing or present or mixed
> indexes is what I found in various middle-end routines.
> The only thing I see in our verifiers is that in GIMPLE function bodies,
> we don't allow non-VECTOR_TYPE CONSTRUCTORs with any elements, and for
> VECTOR_TYPE CONSTRUCTORs we require that indexes are NULL for elements with
> VECTOR_TYPE and for others require that it is either NULL or INTEGER_CST
> matching the position (so effectively for those direct access is still
> possible).
>

Where are these verifiers?  I'm not finding them.


> The question might not be just what we do emit right now, but also what we'd
> like to emit in the future, because as has been noted several times, for
> large initializers those explicit indexes consume huge amounts of memory.
> In C with designated initializers, I can see us not emitting indexes from
> the start because we'd want to avoid the memory overhead for normal
> sequential initializers, but then much later we can find a designated
> initializer that wants to skip over some elements and thus add an index at
> that point (or range designator for which we want RANGE_EXPR); shall we add
> indexes to all elements at that point?
>

That sounds right to me.  Until we have the linear range start marker you
suggested above.


> In C++, I think we don't allow non-useless array designated initializers, so
> there is no way to skip elements using that or go backwards, but still,
> don't we emit RANGE_EXPRs if we see the same initializer for many elements?
>

Yes, though only for omitted initializers where {}-initialization is
different from zero-initialization; we don't currently combine explicit
initializers.


> I guess right now we emit indexes for all elements for those, but if we
> choose to optimize?
>
> > Is it unreasonable to assume that if the first element has no index, none of
> > the elements do?
>
> Not sure, see above.  Depends on what we want to guarantee.
>

If we go back and fill in indexes when we see something more complicated,
we could enforce this in the verifiers.


> > > +   else if (i == j + (middle - begin))
> > > + {
> > > +   (*elts)[middle].index = dindex;
> >
> > Why set this index?
>
> Because the caller asserts or relies that it has one.
>   constructor_elt *cep = NULL;
>   if (code == ARRAY_TYPE)
> {
>   HOST_WIDE_INT i
> = find_array_ctor_elt (*valp, index, /*insert*/true);
>   gcc_assert (i >= 0);
>   cep = CONSTRUCTOR_ELT (*valp, i);
>   gcc_assert (TREE_CODE (cep->index) != RANGE_EXPR);
>

Let's change this assert to allow null index.


>
> Now, ATM we are aware of just small CONSTRUCTORs that can appear this way
> (VECTOR_TYPE and so generally not too many elements in real-world
> testcases), so if you prefer, the function when seeing NULL index could just
> add indexes to all elements and retry and defer deciding if and how we
> optimize large constructors for later.
>
> Jakub
>
>
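The implicit-index rule described in this thread can be modelled outside GCC. A missing index on the first element means the array's lowest index, and any later missing index means the previous element's index plus one. The sketch below uses hypothetical types (`ctor_elt` here is not GCC's `constructor_elt`). It ignores RANGE_EXPR indexes and uses a plain linear scan rather than anything efficient.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical CONSTRUCTOR element: INDEX is empty when it is implicit.
struct ctor_elt
{
  std::optional<long> index;
  int value;
};

// Resolve each element's effective index using the rule described above.
std::vector<long> effective_indexes(const std::vector<ctor_elt>& elts, long lowest = 0)
{
  std::vector<long> out;
  out.reserve(elts.size());
  for (std::size_t i = 0; i < elts.size(); ++i)
    {
      if (elts[i].index)
        out.push_back(*elts[i].index);
      else if (i == 0)
        out.push_back(lowest);
      else
        out.push_back(out.back() + 1);
    }
  return out;
}

// Position in ELTS of the element for array index WANTED, or -1 if none.
long find_elt(const std::vector<ctor_elt>& elts, long wanted, long lowest = 0)
{
  const std::vector<long> idx = effective_indexes(elts, lowest);
  for (std::size_t i = 0; i < idx.size(); ++i)
    if (idx[i] == wanted)
      return static_cast<long>(i);
  return -1;
}
```

For elements with indexes `(none, none, 5, none)` the effective indexes are `0, 1, 5, 6`. Index 3 is not covered by any element.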


Re: [PATCH, rs6000]: mark clobber for registers changed by untyped_call

2020-02-06 Thread Segher Boessenkool
Hi!

On Thu, Feb 06, 2020 at 10:49:36AM +0800, Jiufu Guo wrote:
> >   emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx));
> >
> >   for (i = 0; i < XVECLEN (operands[2], 0); i++)
> > {
> >   rtx set = XVECEXP (operands[2], 0, i);
> >   emit_move_insn (SET_DEST (set), SET_SRC (set));
> > }
> >
> > ... and nothing in the rtl stream says that those return registers are
> > actually set by that call.  Maybe we should use gen_call_value?  Can we
> > ever be asked to return more than a single thing here?
> I was also thinking about using "gen_call_value" or "emit_clobber (r3)"
> which could generate rtl: "%3:DI=call [foo]" or "call [foo]; clobber
> r3".  This could tell optimizer that %3 is changed.

The problem with "call ; clobber r3" is that some set+use of a pseudo can
be moved between these, and then rnreg can rename that to r3 again.  We
really need to show the call sets r3, in the general case (or that r3 is
live after the call, at least).

> There are also
> potential issues in that untyped_call may change other registers, so
> marking all touched registers as clobbered may be safer.

Well, we can derive what registers it sets, perhaps?  What does x86 do
here?  It does something, I know that, haven't looked much deeper yet
though :-)

In general: this is not a problem for us only; some other archs may have
found a good solution already.


Segher


Re: Is machine_name fix still needed?

2020-02-06 Thread Segher Boessenkool
Hi!

On Thu, Feb 06, 2020 at 11:26:23AM -0300, Matheus Castanho wrote:
> I recently faced problems while building GCC caused by a system header
> being broken by the machine_name fix from fixincludes [1]. And
> apparently I am not the first one [2][3].
> 
> After digging into the fixincludes code, I found the following comment
> on fixincludes/fixinc.in:
> 
> > # # # # # # # # # # # # # # # # # # # # #
> > #
> > #  Check to see if the machine_name fix needs to be disabled.
> > #
> > #  On some platforms, machine_name doesn't work properly and
> > #  breaks some of the header files.  Since everything works
> > #  properly without it, just wipe the macro list to
> > #  disable the fix.
> 
> Indeed adding the target to the case that follows this comment did the
> trick for me:
> 
> diff --git a/fixincludes/fixinc.in b/fixincludes/fixinc.in
> index 15cbaa23544..c0791454b9c 100755
> --- a/fixincludes/fixinc.in
> +++ b/fixincludes/fixinc.in
> @@ -136,6 +136,9 @@ fi
>  #  disable the fix.
> 
>  case "${target_canonical}" in
> +powerpc*-*-linux-*)
> + test -f ${MACRO_LIST} &&  echo > ${MACRO_LIST}
> +;;
>  *-*-vxworks*)
>   test -f ${MACRO_LIST} &&  echo > ${MACRO_LIST}
>  ;;
> -- 
> 2.21.1
> 
> I haven't noticed any harm done by disabling this fix for my specific
> case. From what I could understand from the code and from the diffs of
> the 'fixed' headers, this fix just surrounds some identifiers with '__'
> on #ifdef/#define/#include lines.
> 
> Since even the source code states 'everything works properly without
> it', is this fix still really needed? Would it make any sense to disable
> machine_name by default for other targets upstream?
> 
> Note I'm not particularly familiar with the historical need for this, my
> question is based especially on the comment quoted above and my empirical
> tests.
> 
> [1] https://gcc.gnu.org/ml/gcc-help/2020-02/msg00023.html
> [2] https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01901.html
> [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91085

Thanks for the investigation and patch!

If no one comes up with a better suggestion soon, I'll apply your patch
so that your AT builds work again.  (I'll write a changelog etc.)


Segher


[PATCH] middle-end/93519 - avoid folding stmts in obviously unreachable code

2020-02-06 Thread Richard Biener
The inliner folds statements in a delayed fashion.  The following arranges
things so that statements in obviously unreachable code are not folded,
which avoids warnings from those code regions.
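The core idea of the new `fold_marked_statements` (a pre-order walk over the CFG driven by an explicit stack of edge iterators, ignoring edges that `find_taken_edge` shows are not taken) can be illustrated on a toy graph. `Block`, `known_taken` and `pre_order_reachable` are hypothetical names; the sketch only models reachability, not statement folding.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy CFG, not GCC's basic_block: successor block numbers plus, optionally,
// the position of the successor edge known to be taken (for instance
// because the block's condition folded to a constant).
struct Block
{
  std::vector<int> succs;
  int known_taken = -1;   // index into succs, or -1 if unknown
};

// Pre-order DFS from ENTRY with an explicit stack of (block, next successor)
// pairs.  An edge is skipped when its source has a known-taken edge that is
// a different edge, so blocks reachable only that way are never visited.
std::vector<int> pre_order_reachable (const std::vector<Block> &cfg, int entry)
{
  std::vector<bool> visited (cfg.size (), false);
  std::vector<int> order;
  std::vector<std::pair<int, std::size_t>> stack;

  visited[entry] = true;
  order.push_back (entry);
  stack.push_back ({entry, 0});
  while (!stack.empty ())
    {
      const Block &src = cfg[stack.back ().first];
      std::size_t e = stack.back ().second;
      if (e == src.succs.size ())
        {
          stack.pop_back ();   // all outgoing edges handled
          continue;
        }
      stack.back ().second = e + 1;

      int dest = src.succs[e];
      bool not_taken = src.known_taken >= 0
                       && static_cast<std::size_t> (src.known_taken) != e;
      if (!not_taken && !visited[dest])
        {
          visited[dest] = true;
          order.push_back (dest);   // statements of DEST would be folded here
          stack.push_back ({dest, 0});
        }
    }
  return order;
}
```

As in the patch, a skipped edge is still pushed through the iterator and simply ignored, so a block that is reachable only via a not-taken edge never has its statements processed.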

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2020-02-06  Richard Biener  

PR middle-end/93519
* tree-inline.c (fold_marked_statements): Do a PRE walk,
skipping unreachable regions.
(optimize_inline_calls): Skip folding stmts when we didn't
inline.

* gcc.dg/Wrestrict-21.c: New testcase.
---
 gcc/testsuite/gcc.dg/Wrestrict-21.c |  18 +++
 gcc/tree-inline.c   | 195 
 2 files changed, 133 insertions(+), 80 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/Wrestrict-21.c

diff --git a/gcc/testsuite/gcc.dg/Wrestrict-21.c b/gcc/testsuite/gcc.dg/Wrestrict-21.c
new file mode 100644
index 000..e300663758e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wrestrict-21.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wrestrict" } */
+
+static char *
+str_numth(char *dest, char *num, int type)
+{
+  if (dest != num)
+__builtin_strcpy(dest, num); /* { dg-bogus "is the same" } */
+  __builtin_strcat(dest, "foo");
+  return dest;
+}
+
+void
+DCH_to_char(char *in, char *out, int collid)
+{
+  char *s = out;
+  str_numth(s, s, 42);
+}
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 5b0050a53d2..19154bb843e 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -5261,86 +5261,118 @@ static void
 fold_marked_statements (int first, hash_set *statements)
 {
   auto_bitmap to_purge;
-  for (; first < last_basic_block_for_fn (cfun); first++)
-if (BASIC_BLOCK_FOR_FN (cfun, first))
-  {
-gimple_stmt_iterator gsi;
 
-   for (gsi = gsi_start_bb (BASIC_BLOCK_FOR_FN (cfun, first));
-!gsi_end_p (gsi);
-gsi_next (&gsi))
- if (statements->contains (gsi_stmt (gsi)))
-   {
- gimple *old_stmt = gsi_stmt (gsi);
- tree old_decl
-   = is_gimple_call (old_stmt) ? gimple_call_fndecl (old_stmt) : 0;
+  auto_vec stack (n_basic_blocks_for_fn (cfun) + 2);
+  auto_sbitmap visited (last_basic_block_for_fn (cfun));
+  bitmap_clear (visited);
+
+  stack.quick_push (ei_start (ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs));
+  while (!stack.is_empty ())
+{
+  /* Look at the edge on the top of the stack.  */
+  edge_iterator ei = stack.last ();
+  basic_block dest = ei_edge (ei)->dest;
+  edge known_taken;
+
+  if (dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
+ && !bitmap_bit_p (visited, dest->index)
+ /* Avoid walking unreachable edges, the iteration scheme
+using edge iterators doesn't allow to not push them so
+ignore them here instead (FIXME: use an edge flag at least?).  */
+ && !((known_taken = find_taken_edge (ei_edge (ei)->src, NULL_TREE))
+  && known_taken != ei_edge (ei)))
+   {
+ bitmap_set_bit (visited, dest->index);
 
- if (old_decl && fndecl_built_in_p (old_decl))
-   {
- /* Folding builtins can create multiple instructions,
-we need to look at all of them.  */
- gimple_stmt_iterator i2 = gsi;
- gsi_prev (&i2);
- if (fold_stmt (&gsi))
-   {
- gimple *new_stmt;
- /* If a builtin at the end of a bb folded into nothing,
-the following loop won't work.  */
- if (gsi_end_p (gsi))
-   {
- cgraph_update_edges_for_call_stmt (old_stmt,
-old_decl, NULL);
- break;
-   }
- if (gsi_end_p (i2))
-   i2 = gsi_start_bb (BASIC_BLOCK_FOR_FN (cfun, first));
- else
-   gsi_next (&i2);
- while (1)
-   {
- new_stmt = gsi_stmt (i2);
- update_stmt (new_stmt);
- cgraph_update_edges_for_call_stmt (old_stmt, old_decl,
-new_stmt);
+ if (dest->index >= first)
+   for (gimple_stmt_iterator gsi = gsi_start_bb (dest);
+!gsi_end_p (gsi); gsi_next (&gsi))
+ {
+   if (!statements->contains (gsi_stmt (gsi)))
+ continue;
 
- if (new_stmt == gsi_stmt (gsi))
-   {
- /* It is okay to check only for the very last
-of these statements.  If it is a throwing
-statement nothing will change.  If it isn't
-this can remove EH edges.  If that weren't
-  

Re: [PR47785] COLLECT_AS_OPTIONS

2020-02-06 Thread Prathamesh Kulkarni
On Thu, 6 Feb 2020 at 18:42, Richard Biener  wrote:
>
> On Thu, Feb 6, 2020 at 1:48 PM Prathamesh Kulkarni
>  wrote:
> >
> > On Tue, 4 Feb 2020 at 19:44, Richard Biener  
> > wrote:
> > >
> > > On Mon, Feb 3, 2020 at 12:37 PM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Thu, 30 Jan 2020 at 19:10, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Thu, Jan 30, 2020 at 5:31 AM Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, 28 Jan 2020 at 17:17, Richard Biener 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, Jan 24, 2020 at 7:04 AM Prathamesh Kulkarni
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Mon, 20 Jan 2020 at 15:44, Richard Biener 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Jan 8, 2020 at 11:20 AM Prathamesh Kulkarni
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, 5 Nov 2019 at 17:38, Richard Biener 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, 5 Nov 2019 at 03:57, H.J. Lu 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sun, Nov 3, 2019 at 6:45 PM Kugan Vivekanandarajah
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the reviews.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu 
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan 
> > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu 
> > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan 
> > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, 23 Oct 2019 at 23:07, Richard 
> > > > > > > > > > > > > > > > > > Biener  wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Mon, Oct 21, 2019 at 10:04 AM Kugan 
> > > > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for the pointers.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Fri, 11 Oct 2019 at 22:33, Richard 
> > > > > > > > > > > > > > > > > > > > Biener  
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Fri, Oct 11, 2019 at 6:15 AM Kugan 
> > > > > > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Wed, 2 Oct 2019 at 20:41, 
> > > > > > > > > > > > > > > > > > > > > > Richard Biener 
> > > > > > > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 2, 2019 at 10:39 AM 
> > > > > > > > > > > > > > > > > > > > > > > Kugan Vivekanandarajah
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > As mentioned in the PR, 
> > > > > > > > > > > > > > > > > > > > > > > > attached patch adds 
> > > > > > > > > > > > > > > > > > > > > > > > COLLECT_AS_OPTIONS for
> > > > > > > > > > > > > > > > > > > > > > > > passing assembler options 
> > > > > > > > > > > > > > > > > > > > > > > > specified with -Wa, to the 
> > > > > > > > > > > > > > > > > > > > > > > > link-time driver.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > The proposed solution only 
> > > > > > > > > > > > > > > > > > > > > > > > works for uniform -Wa options 
>

Re: [PATCH] libstdc++: Optimize C++20 comparison category types

2020-02-06 Thread Jonathan Wakely

On 06/02/20 13:53 +, Jonathan Wakely wrote:
> On 06/02/20 13:40 +, Jonathan Wakely wrote:
> > This reduces sizeof(std::partial_ordering) and optimizes conversion and
> > comparison operators to avoid conditional branches where possible.
> >
> > * libsupc++/compare (__cmp_cat::_Ncmp::unordered): Change value to 2.
> > (partial_ordering::_M_is_ordered): Remove data member.
> > (partial_ordering): Use second bit of _M_value for unordered. Adjust
> > comparison operators.
> > (weak_ordering::operator partial_ordering): Simplify to remove
> > branches.
> > (operator<=>(unspecified, weak_ordering)): Likewise.
> > (strong_ordering::operator partial_ordering): Likewise.
> > (strong_ordering::operator weak_ordering): Likewise.
> > (operator<=>(unspecified, strong_ordering)): Likewise.
> > * testsuite/18_support/comparisons/categories/partialord.cc: New test.
> > * testsuite/18_support/comparisons/categories/strongord.cc: New test.
> > * testsuite/18_support/comparisons/categories/weakord.cc: New test.
> >
> > Tested powerpc64le-linux and x86_64-linux.
> >
> > This is an ABI change for the partial_ordering type, but that is why I
> > think we should do it now, not after GCC 10 is released. The sooner
> > the better, before these types are being widely used.
> >
> > I plan to commit this in the next 12 hours or so, unless there are
> > (valid :-) objections.
> >
> > Thanks to Barry Revzin for pointing out there was room for these
> > operators to be improved.
>
> We could also change the int _M_value data member of all three
> comparison category types to be a signed char instead of int. That
> would reduce the size further.

Or maybe std::int_fast8_t is the right type here.

> It probably doesn't matter for most uses, only when one of the types
> is used as a data member and the smaller type would allow a more
> compact layout. I'm not sure how common such uses will be, but I
> suppose it's plausible somebody could have a function returning a
> std::tuple which would benefit.
>
> Anybody want to argue for or against making them 8 bits?






Is machine_name fix still needed?

2020-02-06 Thread Matheus Castanho
Hi,

I recently faced problems while building GCC caused by a system header
being broken by the machine_name fix from fixincludes [1]. And
apparently I am not the first one [2][3].

After digging into the fixincludes code, I found the following comment
on fixincludes/fixinc.in:

> # # # # # # # # # # # # # # # # # # # # #
> #
> #  Check to see if the machine_name fix needs to be disabled.
> #
> #  On some platforms, machine_name doesn't work properly and
> #  breaks some of the header files.  Since everything works
> #  properly without it, just wipe the macro list to
> #  disable the fix.

Indeed adding the target to the case that follows this comment did the
trick for me:

diff --git a/fixincludes/fixinc.in b/fixincludes/fixinc.in
index 15cbaa23544..c0791454b9c 100755
--- a/fixincludes/fixinc.in
+++ b/fixincludes/fixinc.in
@@ -136,6 +136,9 @@ fi
 #  disable the fix.

 case "${target_canonical}" in
+powerpc*-*-linux-*)
+   test -f ${MACRO_LIST} &&  echo > ${MACRO_LIST}
+;;
 *-*-vxworks*)
test -f ${MACRO_LIST} &&  echo > ${MACRO_LIST}
 ;;
-- 
2.21.1

I haven't noticed any harm done by disabling this fix for my specific
case. From what I could understand from the code and from the diffs of
the 'fixed' headers, this fix just surrounds some identifiers with '__'
on #ifdef/#define/#include lines.
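To make the described effect concrete, here is a toy approximation of that rewrite, assuming the fix turns a bare machine name such as `linux` into its reserved spelling `__linux__`. It is hypothetical and deliberately simplified: `fix_machine_names` only handles conditional directives (`#if`, `#ifdef`, `#ifndef`, `#elif`) and does not reproduce the real fixincludes logic or its macro list.

```cpp
#include <cctype>
#include <set>
#include <string>

// Toy approximation, not the fixincludes implementation: on #if, #ifdef,
// #ifndef and #elif lines, rewrite each whole-word occurrence of a name in
// MACROS (e.g. "linux") into the reserved spelling "__linux__".  Other
// lines, including #define and #include, are returned unchanged here.
std::string fix_machine_names (const std::string &line,
                               const std::set<std::string> &macros)
{
  auto is_word = [] (char c)
  { return std::isalnum (static_cast<unsigned char> (c)) || c == '_'; };

  std::string::size_type p = line.find_first_not_of (" \t");
  if (p == std::string::npos || line[p] != '#')
    return line;
  std::string::size_type d = line.find_first_not_of (" \t", p + 1);
  if (d == std::string::npos)
    return line;
  std::string::size_type dend = d;
  while (dend < line.size () && is_word (line[dend]))
    ++dend;
  std::string directive = line.substr (d, dend - d);
  if (directive != "if" && directive != "ifdef"
      && directive != "ifndef" && directive != "elif")
    return line;

  // Copy the directive, then scan the rest token by token.
  std::string out = line.substr (0, dend);
  std::string::size_type i = dend;
  while (i < line.size ())
    {
      if (!is_word (line[i]))
        {
          out += line[i++];
          continue;
        }
      std::string::size_type j = i;
      while (j < line.size () && is_word (line[j]))
        ++j;
      std::string word = line.substr (i, j - i);
      out += macros.count (word) ? "__" + word + "__" : word;
      i = j;
    }
  return out;
}
```

A header that itself uses such an identifier for something else is exactly where a blind rewrite like this can break things, which matches the kind of breakage reported in [1].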

Since even the source code states 'everything works properly without
it', is this fix still really needed? Would it make any sense to disable
machine_name by default for other targets upstream?

Note I'm not particularly familiar with the historical need for this, my
question is based especially on the comment quoted above and my empirical
tests.

[1] https://gcc.gnu.org/ml/gcc-help/2020-02/msg00023.html
[2] https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01901.html
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91085

Cheers,
Matheus Castanho


rs6000: Correct documentation for __builtin_mtfsf

2020-02-06 Thread Bill Schmidt

Hi,

PR93570 reports that the documentation shows __builtin_mtfsf as returning a double,
but that is incorrect.  The return type should be void.  Corrected herein.

Built on powerpc64le-unknown-linux-gnu and verified correct PDF output.  Committed
as obvious.

Thanks!
Bill


2020-02-06  Bill Schmidt  

PR target/93570
* doc/extend.texi (Basic PowerPC Built-in Functions): Correct
prototype for __builtin_mtfsf.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ec99c38a607..5739063b330 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -17166,7 +17166,7 @@ unsigned long __builtin_ppc_mftb ();
 double __builtin_unpack_ibm128 (__ibm128, int);
 __ibm128 __builtin_pack_ibm128 (double, double);
 double __builtin_mffs (void);
-double __builtin_mtfsf (const int, double);
+void __builtin_mtfsf (const int, double);
 void __builtin_mtfsb0 (const int);
 void __builtin_mtfsb1 (const int);
 void __builtin_set_fpscr_rn (int);



Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse

2020-02-06 Thread Segher Boessenkool
On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote:
> Yea, it's closely related.  In your case you need to effectively ignore
> the nop insn to get the optimization you want.  In mine that nop insn
> causes an ICE.
> 
> I think we could take your cse bits + adding a !CALL_P separately from
> the simplify-rtx stuff which Segher objected to.  That'd likely solve
> the ARM ICEs and take you a tiny step forward on optimizing that SVE
> case.  Thoughts?

CSE should consistently keep track of what insns are no-op moves (in its
definition, all passes have a slightly different definition of this),
and use that everywhere consistently.

(Or we should rewrite CSE).


Segher


Re: [PATCH] Revert mangling of names with -fprofile-generate=.

2020-02-06 Thread Jan Hubicka
> On 2/6/20 2:26 PM, Jan Hubicka wrote:
> > > Hi.
> > > 
> > > The patch reverts mangling of filenames due to file
> > > length limitation. Creation of a folder tree seems fine
> > > in the context of PGO.
> > > 
> > > Ready for master?
> > > Thanks,
> > > Martin
> > > 
> > > gcc/ChangeLog:
> > > 
> > > 2020-02-06  Martin Liska  
> > > 
> > >   PR gcov-profile/91971
> > >   PR gcov-profile/93466
> > >   * coverage.c (coverage_init): Revert mangling of
> > >   path into filename.  It can lead to huge filename length.
> > >   Creation of subfolders seems more natural.
> > 
> > This does make sense to me - the overly long filenames looked like a bad
> > move to me.  But what was the motivation for introducing them in the first
> > place?
> 
> The motivation is described here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91971#c0
> 
> Well, the original issue is not a fundamental problem.
> I'm going to install the patch.
In the light of our discussion about prefix stripping, I think it is
actually an intended behaviour :)
So agreed, we can drop #s. Thanks!

Honza
> 
> Martin


Re: [PATCH] libstdc++: Optimize C++20 comparison category types

2020-02-06 Thread Jonathan Wakely

On 06/02/20 13:40 +, Jonathan Wakely wrote:
> This reduces sizeof(std::partial_ordering) and optimizes conversion and
> comparison operators to avoid conditional branches where possible.
>
> * libsupc++/compare (__cmp_cat::_Ncmp::unordered): Change value to 2.
> (partial_ordering::_M_is_ordered): Remove data member.
> (partial_ordering): Use second bit of _M_value for unordered. Adjust
> comparison operators.
> (weak_ordering::operator partial_ordering): Simplify to remove
> branches.
> (operator<=>(unspecified, weak_ordering)): Likewise.
> (strong_ordering::operator partial_ordering): Likewise.
> (strong_ordering::operator weak_ordering): Likewise.
> (operator<=>(unspecified, strong_ordering)): Likewise.
> * testsuite/18_support/comparisons/categories/partialord.cc: New test.
> * testsuite/18_support/comparisons/categories/strongord.cc: New test.
> * testsuite/18_support/comparisons/categories/weakord.cc: New test.
>
> Tested powerpc64le-linux and x86_64-linux.
>
> This is an ABI change for the partial_ordering type, but that is why I
> think we should do it now, not after GCC 10 is released. The sooner
> the better, before these types are being widely used.
>
> I plan to commit this in the next 12 hours or so, unless there are
> (valid :-) objections.
>
> Thanks to Barry Revzin for pointing out there was room for these
> operators to be improved.


We could also change the int _M_value data member of all three
comparison category types to be a signed char instead of int. That
would reduce the size further.

It probably doesn't matter for most uses, only when one of the types
is used as a data member and the smaller type would allow a more
compact layout. I'm not sure how common such uses will be, but I
suppose it's plausible somebody could have a function returning a
std::tuple which would benefit.

Anybody want to argue for or against making them 8 bits?
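The layout effect described above can be checked with stand-in types. `WideOrdering`, `NarrowOrdering` and the record/tuple aliases are hypothetical and differ only in the width of the stored value; the size comparisons assume a typical ABI where `int` is 4 bytes with 4-byte alignment, so exact numbers may vary by target.

```cpp
#include <tuple>

// Hypothetical stand-ins for a comparison category type that differ only in
// the width of the stored value (int versus signed char).
struct WideOrdering   { int value; };
struct NarrowOrdering { signed char value; };

// A record holding a comparison result next to a small flag: with an int
// member, alignment padding follows the flag; with signed char it does not.
struct WideRecord   { WideOrdering ord; char flag; };
struct NarrowRecord { NarrowOrdering ord; char flag; };

// The std::tuple case mentioned above, e.g. a function returning a result
// together with a bool.
using WideTuple   = std::tuple<WideOrdering, bool>;
using NarrowTuple = std::tuple<NarrowOrdering, bool>;
```

On a typical x86_64 or powerpc64le target the narrow variants occupy 2 bytes and the wide ones 8, so the saving only appears when the type is stored alongside other small members, as noted above.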




Re: [PATCH] Revert mangling of names with -fprofile-generate=.

2020-02-06 Thread Martin Liška

On 2/6/20 2:26 PM, Jan Hubicka wrote:
> > Hi.
> >
> > The patch reverts mangling of filenames due to file
> > length limitation. Creation of a folder tree seems fine
> > in the context of PGO.
> >
> > Ready for master?
> > Thanks,
> > Martin
> >
> > gcc/ChangeLog:
> >
> > 2020-02-06  Martin Liska
> >
> > PR gcov-profile/91971
> > PR gcov-profile/93466
> > * coverage.c (coverage_init): Revert mangling of
> > path into filename.  It can lead to huge filename length.
> > Creation of subfolders seems more natural.
>
> This does make sense to me - the overly long filenames looked like a bad
> move to me.  But what was the motivation for introducing them in the first
> place?

The motivation is described here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91971#c0

Well, the original issue is not a fundamental problem.
I'm going to install the patch.

Martin

>
> Patch is OK unless that reason was very good :)
>
> Honza





[PATCH] libstdc++: Optimize C++20 comparison category types

2020-02-06 Thread Jonathan Wakely
This reduces sizeof(std::partial_ordering) and optimizes conversion and
comparison operators to avoid conditional branches where possible.

* libsupc++/compare (__cmp_cat::_Ncmp::unordered): Change value to 2.
(partial_ordering::_M_is_ordered): Remove data member.
(partial_ordering): Use second bit of _M_value for unordered. Adjust
comparison operators.
(weak_ordering::operator partial_ordering): Simplify to remove
branches.
(operator<=>(unspecified, weak_ordering)): Likewise.
(strong_ordering::operator partial_ordering): Likewise.
(strong_ordering::operator weak_ordering): Likewise.
(operator<=>(unspecified, strong_ordering)): Likewise.
* testsuite/18_support/comparisons/categories/partialord.cc: New test.
* testsuite/18_support/comparisons/categories/strongord.cc: New test.
* testsuite/18_support/comparisons/categories/weakord.cc: New test.

Tested powerpc64le-linux and x86_64-linux.

This is an ABI change for the partial_ordering type, but that is why I
think we should do it now, not after GCC 10 is released. The sooner
the better, before these types are being widely used.

I plan to commit this in the next 12 hours or so, unless there are
(valid :-) objections.

Thanks to Barry Revzin for pointing out there was room for these
operators to be improved.
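The new encoding (less = -1, equivalent = 0, greater = 1, unordered = 2 in the second bit) can be checked in isolation. The names below are hypothetical; the predicate bodies mirror the new operator bodies visible in the diff for comparisons against a literal 0, but this is not the libstdc++ source.

```cpp
// Values as in the patch: less = -1, equivalent = 0, greater = 1 and the
// new unordered value 2, which occupies the second bit.
constexpr int ord_less = -1;
constexpr int ord_equivalent = 0;
constexpr int ord_greater = 1;
constexpr int ord_unordered = 2;

// Predicates mirroring the new operator bodies visible in the diff
// (comparison of a partial_ordering value V against literal 0).
constexpr bool cmp_eq (int v)   { return v == 0; }
constexpr bool cmp_lt (int v)   { return v == -1; }
constexpr bool cmp_gt (int v)   { return v == 1; }
constexpr bool cmp_lteq (int v) { return v <= 0; }        // unordered (2) > 0
constexpr bool cmp_gteq (int v) { return (v & 1) == v; }  // only 0 and 1 pass

static_assert (cmp_gteq (ord_equivalent) && cmp_gteq (ord_greater), "");
static_assert (!cmp_gteq (ord_less) && !cmp_gteq (ord_unordered), "");
static_assert (cmp_lteq (ord_less) && !cmp_lteq (ord_unordered), "");
static_assert (!cmp_eq (ord_unordered) && !cmp_lt (ord_unordered)
               && !cmp_gt (ord_unordered), "");
```

Because unordered is positive but not 1, each test is a single comparison with no separate "is ordered" flag to check, which is what allows the `_M_is_ordered` member to be removed.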

commit 556a60b573cd599d44f7dae3dccafb9d0694f088
Author: Jonathan Wakely 
Date:   Thu Feb 6 13:31:36 2020 +

libstdc++: Optimize C++20 comparison category types

This reduces sizeof(std::partial_ordering) and optimizes conversion and
comparison operators to avoid conditional branches where possible.

* libsupc++/compare (__cmp_cat::_Ncmp::unordered): Change value to 2.
(partial_ordering::_M_is_ordered): Remove data member.
(partial_ordering): Use second bit of _M_value for unordered. Adjust
comparison operators.
(weak_ordering::operator partial_ordering): Simplify to remove
branches.
(operator<=>(unspecified, weak_ordering)): Likewise.
(strong_ordering::operator partial_ordering): Likewise.
(strong_ordering::operator weak_ordering): Likewise.
(operator<=>(unspecified, strong_ordering)): Likewise.
* testsuite/18_support/comparisons/categories/partialord.cc: New test.
* testsuite/18_support/comparisons/categories/strongord.cc: New test.
* testsuite/18_support/comparisons/categories/weakord.cc: New test.

diff --git a/libstdc++-v3/libsupc++/compare b/libstdc++-v3/libsupc++/compare
index a7a29ef0440..8ac446a9bc5 100644
--- a/libstdc++-v3/libsupc++/compare
+++ b/libstdc++-v3/libsupc++/compare
@@ -50,7 +50,7 @@ namespace std
   {
 enum class _Ord { equivalent = 0, less = -1, greater = 1 };
 
-enum class _Ncmp { _Unordered = -127 };
+enum class _Ncmp { _Unordered = 2 };
 
 struct __unspec
 {
@@ -61,18 +61,20 @@ namespace std
   class partial_ordering
   {
 int _M_value;
-bool _M_is_ordered;
 
 constexpr explicit
 partial_ordering(__cmp_cat::_Ord __v) noexcept
-: _M_value(int(__v)), _M_is_ordered(true)
+: _M_value(int(__v))
 { }
 
 constexpr explicit
 partial_ordering(__cmp_cat::_Ncmp __v) noexcept
-: _M_value(int(__v)), _M_is_ordered(false)
+: _M_value(int(__v))
 { }
 
+friend class weak_ordering;
+friend class strong_ordering;
+
   public:
 // valid values
 static const partial_ordering less;
@@ -83,42 +85,42 @@ namespace std
 // comparisons
 friend constexpr bool
 operator==(partial_ordering __v, __cmp_cat::__unspec) noexcept
-{ return __v._M_is_ordered && __v._M_value == 0; }
+{ return __v._M_value == 0; }
 
 friend constexpr bool
 operator==(partial_ordering, partial_ordering) noexcept = default;
 
 friend constexpr bool
 operator< (partial_ordering __v, __cmp_cat::__unspec) noexcept
-{ return __v._M_is_ordered && __v._M_value < 0; }
+{ return __v._M_value == -1; }
 
 friend constexpr bool
 operator> (partial_ordering __v, __cmp_cat::__unspec) noexcept
-{ return __v._M_is_ordered && __v._M_value > 0; }
+{ return __v._M_value == 1; }
 
 friend constexpr bool
 operator<=(partial_ordering __v, __cmp_cat::__unspec) noexcept
-{ return __v._M_is_ordered && __v._M_value <= 0; }
+{ return __v._M_value <= 0; }
 
 friend constexpr bool
 operator>=(partial_ordering __v, __cmp_cat::__unspec) noexcept
-{ return __v._M_is_ordered && __v._M_value >= 0; }
+{ return (__v._M_value & 1) == __v._M_value; }
 
 friend constexpr bool
 operator< (__cmp_cat::__unspec, partial_ordering __v) noexcept
-{ return __v._M_is_ordered && 0 < __v._M_value; }
+{ return __v._M_value == 1; }
 
 friend constexpr bool
 operator> (__cmp_cat::__unspec, partial_ordering __v) noexcept
-{ return __v._M_is_ordered && 0 > 

[committed] libstdc++: Fix comment to refer to correct PR

2020-02-06 Thread Jonathan Wakely
* include/bits/stl_iterator.h (__detail::__common_iter_ptr): Fix PR
number in comment. Fix indentation.

Tested powerpc64le-linux, committed to master.
commit bd630df033784c791c3ca49fc30821eaee35f7c2
Author: Jonathan Wakely 
Date:   Thu Feb 6 11:33:12 2020 +

libstdc++: Fix comment to refer to correct PR

* include/bits/stl_iterator.h (__detail::__common_iter_ptr): Fix PR
number in comment. Fix indentation.

diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index c200f7a9d14..69c6ae66cdf 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1737,12 +1737,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   namespace __detail
   {
-// FIXME: This has to be at namespace-scope because of PR 92078.
+// FIXME: This has to be at namespace-scope because of PR 92103.
 template
   struct __common_iter_ptr
-   {
- using type = void;
-   };
+  {
+   using type = void;
+  };
 
 template
   requires __detail::__common_iter_has_arrow<_Iter>


[committed] libstdc++: decay in viewable_range should be remove_cvref (LWG 3375)

2020-02-06 Thread Jonathan Wakely
* include/bits/stl_algobase.h (__iter_swap, __iter_swap): Remove
redundant _GLIBCXX20_CONSTEXPR.

Tested powerpc64le-linux, committed to master.

commit 26eae9ac2bf75a26a419dc1e47a067c66331fb74
Author: Jonathan Wakely 
Date:   Thu Feb 6 11:30:30 2020 +

libstdc++: decay in viewable_range should be remove_cvref (LWG 3375)

* include/bits/stl_algobase.h (__iter_swap, __iter_swap): Remove
redundant _GLIBCXX20_CONSTEXPR.

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index ea558c76c9d..860f7283be5 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -92,7 +92,7 @@ namespace ranges
   /// A range which can be safely converted to a view.
   template
 concept viewable_range = range<_Tp>
-  && (safe_range<_Tp> || view>);
+  && (safe_range<_Tp> || view>);
 
   namespace __detail
   {
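The difference named in the subject (decay versus remove_cvref) only shows up for some types, such as arrays. A small check with standard traits illustrates it; `remove_cvref_like` is a local alias spelling out the definition of C++20's `std::remove_cvref_t` so the example also builds in older modes. The exact template argument in the committed `viewable_range` line is not legible in this copy of the diff.

```cpp
#include <string>
#include <type_traits>

// std::remove_cvref_t is C++20; this alias spells out the same definition
// so the comparison also compiles in older language modes.
template<typename T>
using remove_cvref_like = std::remove_cv_t<std::remove_reference_t<T>>;

// For a reference to a class type the two traits agree...
static_assert(std::is_same<std::decay_t<const std::string&>, std::string>::value, "");
static_assert(std::is_same<remove_cvref_like<const std::string&>, std::string>::value, "");

// ...but for an array they differ: decay turns it into a pointer, while
// remove_cvref keeps the array type.
static_assert(std::is_same<std::decay_t<int(&)[3]>, int*>::value, "");
static_assert(std::is_same<remove_cvref_like<int(&)[3]>, int[3]>::value, "");
```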


[PATCH], PR target/93569, Fix PowerPC vsx-builtin-15d.c test case

2020-02-06 Thread Michael Meissner
When I applied my previous patches for vec_extract, I switched to using
reg_to_non_prefixed to validate the vector extract address.  That uncovered a
bug: reg_to_non_prefixed allowed D-FORM (reg+offset) addresses to load Altivec
registers on power7 and power8, but those systems only support X-FORM (reg+reg)
addressing for the Altivec registers.  Power9 added support for DS-FORM and
DQ-FORM addressing to the Altivec registers.  This patch fixes
reg_to_non_prefixed, so the vsx-builtin-15d.c test case now passes.
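A simplified model of the rule being enforced may help. `AddrForm`, `offset_ok` and `altivec_form` are hypothetical names, and `is_vector_or_128bit` collapses the mode checks in `reg_to_non_prefixed` into one flag. The displacement rules (DS-FORM offsets are multiples of 4, DQ-FORM offsets multiples of 16, both within a signed 16-bit range) are the standard Power ISA encodings.

```cpp
#include <cstdint>

// Hypothetical model of the three non-prefixed address forms.
enum class AddrForm { X, DS, DQ };

// X-FORM is reg+reg, so no constant displacement is allowed.  DS-FORM and
// DQ-FORM encode a signed 16-bit displacement whose low 2 (DS) or 4 (DQ)
// bits must be zero.
bool offset_ok (AddrForm form, std::int64_t offset)
{
  bool fits16 = offset >= -32768 && offset <= 32767;
  switch (form)
    {
    case AddrForm::X:  return offset == 0;
    case AddrForm::DS: return fits16 && offset % 4 == 0;
    case AddrForm::DQ: return fits16 && offset % 16 == 0;
    }
  return false;
}

// Simplified selection for an Altivec register, following the patch:
// before ISA 3.0 (power9) only X-FORM is available; afterwards scalars use
// DS-FORM and vectors / 128-bit types use DQ-FORM.
AddrForm altivec_form (bool isa30, bool is_vector_or_128bit)
{
  if (!isa30)
    return AddrForm::X;
  return is_vector_or_128bit ? AddrForm::DQ : AddrForm::DS;
}
```

Under this model, a reg+16 address for an Altivec vector load is valid on power9 (DQ-FORM) but has to be rejected on power7/power8, which is the case the patch fixes.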

Can I check this into the master branch?

I have done bootstrap builds and make check on both a little endian Power8
system and a big endian Power8 system.  There were no regressions.  On the big
endian system, the only change is that vsx-builtin-15d.c now passes.  On the
little endian system, vsx-builtin-15d.c now passes along with some Fortran tests.

2020-02-05  Michael Meissner  

PR target/93569
* config/rs6000/rs6000.c (reg_to_non_prefixed): Before ISA 3.0
we only had X-FORM (reg+reg) addressing in the traditional Altivec
registers.

--- /tmp/eAu61F_rs6000.c2020-02-05 18:08:48.698992017 -0500
+++ gcc/config/rs6000/rs6000.c  2020-02-05 17:23:55.733650185 -0500
@@ -24943,9 +24943,13 @@ reg_to_non_prefixed (rtx reg, machine_mo
 }
 
   /* Altivec registers use DS-mode for scalars, and DQ-mode for vectors, IEEE
- 128-bit floating point, and 128-bit integers.  */
+ 128-bit floating point, and 128-bit integers.  Before power9, only indexed
+ addressing was available.  */
   else if (ALTIVEC_REGNO_P (r))
 {
+  if (!TARGET_P9_VECTOR)
+   return NON_PREFIXED_X;
+
   if (mode == SFmode || size == 8 || FLOAT128_2REG_P (mode))
return NON_PREFIXED_DS;
 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] Revert mangling of names with -fprofile-generate=.

2020-02-06 Thread Jan Hubicka
> Hi.
> 
> The patch reverts mangling of filenames due to file
> length limitation. Creation of a folder tree seems fine
> in the context of PGO.
> 
> Ready for master?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2020-02-06  Martin Liska  
> 
>   PR gcov-profile/91971
>   PR gcov-profile/93466
>   * coverage.c (coverage_init): Revert mangling of
>   path into filename.  It can lead to huge filename length.
>   Creation of subfolders seems more natural.

This does make sense to me - the overly long filenames looked like a bad
move to me.  But what was the motivation for introducing them in the first
place?

Patch is OK unless that reason was very good :)

Honza


[PATCH] Revert mangling of names with -fprofile-generate=.

2020-02-06 Thread Martin Liška

Hi.

The patch reverts mangling of filenames due to file
length limitation. Creation of a folder tree seems fine
in the context of PGO.
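The length problem can be illustrated with a toy model. It assumes the mangling flattens a path into one file name by replacing each `/` with `#` (Honza's remark about dropping the `#`s suggests this; `mangle_flat` and `longest_component` are hypothetical helpers, not `mangle_path` from coverage.c). File systems typically limit each path component (NAME_MAX, commonly 255 on Linux), and a flattened name turns the whole path into a single component.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Toy mangling (an assumption for illustration, not coverage.c's
// mangle_path): flatten a path into one file name by replacing '/' with '#'.
std::string mangle_flat (const std::string &path)
{
  std::string out = path;
  std::replace (out.begin (), out.end (), '/', '#');
  return out;
}

// Length of the longest single component of PATH, which is what a
// per-component limit such as NAME_MAX applies to.
std::size_t longest_component (const std::string &path)
{
  std::size_t best = 0, cur = 0;
  for (char c : path)
    {
      if (c == '/')
        cur = 0;
      else
        best = std::max (best, ++cur);
    }
  return best;
}
```

For a deep source path, each directory component stays short when the profile-directory tree mirrors it, while the flattened name's single component grows with the full path length.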

Ready for master?
Thanks,
Martin

gcc/ChangeLog:

2020-02-06  Martin Liska  

PR gcov-profile/91971
PR gcov-profile/93466
* coverage.c (coverage_init): Revert mangling of
path into filename.  It can lead to huge filename length.
Creation of subfolders seems more natural.
---
 gcc/coverage.c | 8 
 1 file changed, 8 deletions(-)


diff --git a/gcc/coverage.c b/gcc/coverage.c
index f29ff640c43..30ae84df90f 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -1227,14 +1227,6 @@ coverage_init (const char *filename)
   else
 	profile_data_prefix = getpwd ();
 }
-  else if (profile_data_prefix != NULL)
-{
-  /* when filename is a absolute path, we also need to mangle the full
-  path of filename to prevent the profiling data being stored into a
-  different path than that specified by profile_data_prefix.  */
-  filename = mangle_path (filename);
-  len = strlen (filename);
-}
 
   if (profile_data_prefix)
 prefix_len = strlen (profile_data_prefix);
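The following C sketch shows why the mangled names got so long. It assumes a gcov-style mangling in which every '/' becomes '#'. The exact rules of the removed mangle_path call are not reproduced here, and mangle_path_sketch and longest_component are hypothetical helpers. Mangling turns the whole source path into one file-name component. A directory tree under the -fprofile-generate= prefix keeps each component short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical gcov-style mangling (not necessarily the exact rules of
   the removed mangle_path call): every '/' becomes '#', so the whole
   source path collapses into a single file-name component.
   The caller frees the result.  */
static char *mangle_path_sketch (const char *path)
{
  size_t len = strlen (path);
  char *out = malloc (len + 1);
  if (out == NULL)
    return NULL;
  memcpy (out, path, len + 1);
  for (char *p = out; *p != '\0'; p++)
    if (*p == '/')
      *p = '#';
  return out;
}

/* Length of the longest '/'-separated component of PATH.  File systems
   typically limit each component (255 bytes is common), not the whole
   path, so this is the number a long mangled name runs into.  */
static size_t longest_component (const char *path)
{
  size_t best = 0, cur = 0;
  for (const char *p = path;; p++)
    {
      if (*p == '/' || *p == '\0')
        {
          if (cur > best)
            best = cur;
          if (*p == '\0')
            break;
          cur = 0;
        }
      else
        cur++;
    }
  return best;
}
```

With mangling, the longest component is as long as the entire source path. With a directory tree, it is only as long as the longest directory or file name.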



Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-06 Thread Richard Biener
On Thu, Feb 6, 2020 at 2:00 PM Jeff Law  wrote:
>
> On Wed, 2020-02-05 at 09:19 +0100, Richard Biener wrote:
> > On Tue, Feb 4, 2020 at 11:02 PM Martin Sebor  wrote:
> > > On 2/4/20 2:31 PM, Jeff Law wrote:
> > > > On Tue, 2020-02-04 at 13:08 -0700, Martin Sebor wrote:
> > > > > On 2/4/20 12:15 PM, Richard Biener wrote:
> > > > > > On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law 
> > > > > >  wrote:
> > > > > > > On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:
> > > > > > > > On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  
> > > > > > > > wrote:
> > > > > > > > > PR 93519 reports a false positive -Wrestrict issued for an
> > > > > > > > > inlined call to strcpy that carefully guards against
> > > > > > > > > self-copying.  This is caused by the caller's arguments
> > > > > > > > > substituted into the call during inlining and before dead
> > > > > > > > > code elimination.
> > > > > > > > >
> > > > > > > > > The attached patch avoids this by removing -Wrestrict from
> > > > > > > > > the folder and deferring folding perfectly overlapping (and
> > > > > > > > > so undefined) calls to strcpy (and mempcpy, but not memcpy)
> > > > > > > > > until much later.  Perfectly overlapping calls to memcpy are
> > > > > > > > > still folded early.
> > > > > > > >
> > > > > > > > Why do we bother to warn at all for this case?  Just DWIM here.
> > > > > > > > Warnings like this can be emitted from the analyzer?
> > > > > > > They potentially can, but the analyzer is and will almost always
> > > > > > > certainly be considerably slower.  I would not expect it to be
> > > > > > > used nearly as much as the core compiler.
> > > > > > >
> > > > > > > Whether or not a particular warning makes sense in the core
> > > > > > > compiler or analyzer would seem to me to depend on whether or
> > > > > > > not we can reasonably issue warnings without interprocedural
> > > > > > > analysis.  double-free realistically requires interprocedural
> > > > > > > analysis to be effective.  I'm not sure Wrestrict really does.
> > > > > > >
> > > > > > >
> > > > > > > > That is, I suggest to simply remove the bogus warning code from
> > > > > > > > folding (and _not_ fail the folding).
> > > > > > > I haven't looked at the patch, but if we can get the warning out
> > > > > > > of the folder that's certainly preferable.  And we could
> > > > > > > investigate deferring self-copy removal.
> > > > > >
> > > > > > I think the issue is as usual, warning for code we'll later remove
> > > > > > as dead. Warning at folding is almost always premature.
> > > > >
> > > > > In this instance the code is reachable (or isn't obviously
> > > > > unreachable).  GCC doesn't remove it, but provides benign (and
> > > > > reasonable) semantics for it(*).  To me, that's one aspect of
> > > > > quality.  Letting the user know that the code is buggy is another.
> > > > > I view that as at least as important as folding the ill-effects
> > > > > away because it makes it possible to fix the problem so the code
> > > > > works correctly even with compilers that don't provide these
> > > > > benign semantics.
> > > > If you look at the guts of what happens at the point where we issue the
> > > > warning from within gimple_fold_builtin_strcpy we have:
> > > >
> > > > > DCH_to_char (char * in, char * out, int collid)
> > > > > {
> > > > >int type;
> > > > >char * D.2148;
> > > > >char * dest;
> > > > >char * num;
> > > > >long unsigned int _4;
> > > > >char * _5;
> > > > >
> > > > > ;;   basic block 2, loop depth 0
> > > > > ;;pred:   ENTRY
> > > > > ;;succ:   4
> > > > >
> > > > > ;;   basic block 4, loop depth 0
> > > > > ;;pred:   2
> > > > > ;;succ:   5
> > > > >
> > > > > ;;   basic block 5, loop depth 0
> > > > > ;;pred:   4
> > > > > ;;succ:   6
> > > > >
> > > > > ;;   basic block 6, loop depth 0
> > > > > ;;pred:   5
> > > > >if (0 != 0)
> > > > >  goto ; [53.47%]
> > > > >else
> > > > >  goto ; [46.53%]
> > > > > ;;succ:   7
> > > > > ;;8
> > > > >
> > > > > ;;   basic block 7, loop depth 0
> > > > > ;;pred:   6
> > > > >strcpy (out_1(D), out_1(D));
> > > > > ;;succ:   8
> > > > >
> > > > > ;;   basic block 8, loop depth 0
> > > > > ;;pred:   6
> > > > > ;;7
> > > > >_4 = __builtin_strlen (out_1(D));
> > > > >_5 = out_1(D) + _4;
> > > > >__builtin_memcpy (_5, "foo", 4);
> > > > > ;;succ:   3
> > > > >
> > > > > ;;   basic block 3, loop depth 0
> > > > > ;;pred:   8
> > > > >return;
> > > > > ;;succ:   EXIT
> > > > >
> > > > > }
> > > > >
> > > >
> > > > Which shows the code is obviously unreachable in the case we're wa
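The following C sketch illustrates the pattern under discussion in this thread. It is a hypothetical reduction, not the PR 93519 testcase. copy_unless_same guards against self-copying. When it is inlined with identical arguments, the body can contain strcpy (out, out) under a condition known to be false. That matches the shape of the quoted GIMPLE dump. Whether a given GCC release actually warns on this reduction has not been verified here.

```c
#include <string.h>

/* Hypothetical reduction of the pattern described in PR 93519 (not the
   PR's own testcase): the helper refuses to copy a string onto itself.  */
static inline void copy_unless_same (char *dst, const char *src)
{
  if (dst != src)
    strcpy (dst, src);
}

/* When the helper is inlined with DST == SRC, its body can briefly
   contain strcpy (out, out) under a condition that is known to be
   false (compare the quoted dump), until dead code elimination
   removes it.  */
static void append_suffix (char *out)
{
  copy_unless_same (out, out);
  strcat (out, "foo");
}
```

At run time the guard turns the self-copy into a no-op, so only the strcat has an effect.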

Re: [PR47785] COLLECT_AS_OPTIONS

2020-02-06 Thread Richard Biener
On Thu, Feb 6, 2020 at 1:48 PM Prathamesh Kulkarni
 wrote:
>
> On Tue, 4 Feb 2020 at 19:44, Richard Biener  
> wrote:
> >
> > On Mon, Feb 3, 2020 at 12:37 PM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Thu, 30 Jan 2020 at 19:10, Richard Biener  
> > > wrote:
> > > >
> > > > On Thu, Jan 30, 2020 at 5:31 AM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Tue, 28 Jan 2020 at 17:17, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Jan 24, 2020 at 7:04 AM Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, 20 Jan 2020 at 15:44, Richard Biener 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Wed, Jan 8, 2020 at 11:20 AM Prathamesh Kulkarni
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 5 Nov 2019 at 17:38, Richard Biener 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > > Thanks for the review.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 5 Nov 2019 at 03:57, H.J. Lu 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, Nov 3, 2019 at 6:45 PM Kugan Vivekanandarajah
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the reviews.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan 
> > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu 
> > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan 
> > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, 23 Oct 2019 at 23:07, Richard Biener 
> > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Mon, Oct 21, 2019 at 10:04 AM Kugan 
> > > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for the pointers.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Fri, 11 Oct 2019 at 22:33, Richard 
> > > > > > > > > > > > > > > > > > > Biener  wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Fri, Oct 11, 2019 at 6:15 AM Kugan 
> > > > > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, 2 Oct 2019 at 20:41, Richard 
> > > > > > > > > > > > > > > > > > > > > Biener  
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 2, 2019 at 10:39 AM 
> > > > > > > > > > > > > > > > > > > > > > Kugan Vivekanandarajah
> > > > > > > > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > As mentioned in the PR, attached patch adds COLLECT_AS_OPTIONS for
> > > > > > > > > > > > > > > > > > > > > > > passing assembler options specified with -Wa, to the link-time driver.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > The proposed solution only works for uniform -Wa options across all
> > > > > > > > > > > > > > > > > > > > > > > TUs. As mentioned by Richard Biener, supporting non-uniform -Wa flags
> > > > > > > > > > > > > > > > > > > > > > > would require either adjusting partitioning

[PATCH] [MIPS] Remove unnecessary moves around DSP multiply-accumulate instructions

2020-02-06 Thread Mihailo Stojanovic
Unnecessary moves around dpadd and dpsub are caused by different pseudos
being assigned to the input-output operands which correspond to the same
register.

As is already done for the MSA multiply-accumulate instructions, this
patch forces the same pseudo onto the input-output operands, which
removes the unnecessary moves.
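The accumulator operand must be tied because a multiply-accumulate both reads and writes it. The following portable C model of a halfword dot product with accumulate illustrates this. Its semantics are assumed from the MIPS DSP ASE description of DPA.W.PH: two signed 16-bit lane products are added to the 64-bit HI/LO accumulator. dpa_w_ph_ref and v2i16_pair are hypothetical names, not the GCC builtin.

```c
#include <stdint.h>

/* Two signed 16-bit lanes, standing in for the v2i16 vector type used
   in the test case.  */
typedef struct
{
  int16_t hi;
  int16_t lo;
} v2i16_pair;

/* Reference model (semantics assumed from the DSP ASE description of
   DPA.W.PH): both lane products are added to a 64-bit accumulator.
   ACC is read and written, i.e. an in-out operand, which is why the
   expander ties the input and output operands to one pseudo.  */
static int64_t dpa_w_ph_ref (int64_t acc, v2i16_pair a, v2i16_pair b)
{
  return acc + (int64_t) a.hi * b.hi + (int64_t) a.lo * b.lo;
}
```

Chaining calls threads the accumulator through. If the input and output of that chain end up in different pseudos, the register allocator has to insert the mflo/mfhi/mtlo/mthi moves that the new test scans for.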

Tested on mips-mti-linux-gnu.

gcc/ChangeLog:

* config/mips/mips.c (mips_expand_builtin_insn): Operands of
DSP multiply-accumulate instructions which correspond to the
same input-output register now have the same pseudo assigned to
them.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mac-zero-reload.c: New test.
---
 gcc/config/mips/mips.c  | 24 +++
 gcc/testsuite/gcc.target/mips/mac-zero-reload.c | 32 +
 2 files changed, 56 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/mips/mac-zero-reload.c

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index e337b82..3aa2c11 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -16994,6 +16994,30 @@ mips_expand_builtin_insn (enum insn_code icode, unsigned int nops,
 case CODE_FOR_msa_dpsub_u_w:
 case CODE_FOR_msa_dpsub_u_h:
 case CODE_FOR_msa_dpsub_u_d:
+
+case CODE_FOR_mips_dpau_h_qbl:
+case CODE_FOR_mips_dpau_h_qbr:
+case CODE_FOR_mips_dpsu_h_qbl:
+case CODE_FOR_mips_dpsu_h_qbr:
+case CODE_FOR_mips_dpaq_s_w_ph:
+case CODE_FOR_mips_dpsq_s_w_ph:
+case CODE_FOR_mips_mulsaq_s_w_ph:
+case CODE_FOR_mips_dpaq_sa_l_w:
+case CODE_FOR_mips_dpsq_sa_l_w:
+case CODE_FOR_mips_maq_s_w_phl:
+case CODE_FOR_mips_maq_s_w_phr:
+case CODE_FOR_mips_maq_sa_w_phl:
+case CODE_FOR_mips_maq_sa_w_phr:
+
+case CODE_FOR_mips_dpa_w_ph:
+case CODE_FOR_mips_dps_w_ph:
+case CODE_FOR_mips_mulsa_w_ph:
+case CODE_FOR_mips_dpax_w_ph:
+case CODE_FOR_mips_dpsx_w_ph:
+case CODE_FOR_mips_dpaqx_s_w_ph:
+case CODE_FOR_mips_dpaqx_sa_w_ph:
+case CODE_FOR_mips_dpsqx_s_w_ph:
+case CODE_FOR_mips_dpsqx_sa_w_ph:
   /* Force the operands which correspond to the same in-out register
  to have the same pseudo assigned to them.  If the input operand
  is not REG, create one for it.  */
diff --git a/gcc/testsuite/gcc.target/mips/mac-zero-reload.c b/gcc/testsuite/gcc.target/mips/mac-zero-reload.c
new file mode 100644
index 000..a70dfb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/mac-zero-reload.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-fno-unroll-loops -mgp32 -mdspr2" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-final { scan-assembler-not "\tmflo\t" } } */
+/* { dg-final { scan-assembler-not "\tmfhi\t" } } */
+/* { dg-final { scan-assembler-not "\tmtlo\t" } } */
+/* { dg-final { scan-assembler-not "\tmthi\t" } } */
+
+typedef short v2i16 __attribute__ ((vector_size(4)));
+
+extern v2i16 ps32Ptrl[4096];
+
+extern int sink[4096];
+
+int main(void)
+{
+  v2i16 v2i16_h0;
+  long long   s64Acc;
+
+  for (int i = 0; i < 4; ++i)
+{
+  v2i16_h0 = ps32Ptrl[i];
+
+  s64Acc = 0;
+
+  s64Acc = __builtin_mips_dpa_w_ph(s64Acc, v2i16_h0, v2i16_h0);
+
+  sink[i] = __builtin_mips_extr_rs_w(s64Acc, 0);
+}
+
+  return 0;
+}
-- 
2.7.4



Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse

2020-02-06 Thread Jeff Law
On Wed, 2020-02-05 at 13:30 +, Richard Sandiford wrote:
> Jeff Law  writes:
> > Richard & Segher, if y'all could check my analysis here, it'd be
> > appreciated.
> > 
> > pr90275 is a P2 regression that is only triggering on ARM.  David's
> > testcase in c#1 is the best for this problem as it doesn't require
> > magic flags like -fno-dce to trigger.
> > 
> > The block in question:
> > 
> > > (code_label 89 88 90 24 15 (nil) [0 uses])
> > > (note 90 89 97 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
> > > (insn 97 90 98 24 (parallel [
> > >             (set (reg:CC 100 cc)
> > >                 (compare:CC (reg:SI 131 [ d_lsm.21 ])
> > >                     (const_int 0 [0])))
> > >             (set (reg:SI 135 [ d_lsm.21 ])
> > >                 (reg:SI 131 [ d_lsm.21 ]))
> > >         ]) "pr90275.c":21:45 248 {*movsi_compare0}
> > >      (expr_list:REG_DEAD (reg:SI 131 [ d_lsm.21 ])
> > >         (nil)))
> > > (insn 98 97 151 24 (set (reg:SI 136 [+4 ])
> > >         (reg:SI 132 [ d_lsm.21+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
> > >      (expr_list:REG_DEAD (reg:SI 132 [ d_lsm.21+4 ])
> > >         (expr_list:REG_DEAD (reg:CC 100 cc)
> > >             (nil))))
> > > (insn 151 98 152 24 (set (reg:SI 131 [ d_lsm.21 ])
> > >         (reg:SI 131 [ d_lsm.21 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
> > >      (expr_list:REG_DEAD (reg:SI 135 [ d_lsm.21 ])
> > >         (nil)))
> > > (insn 152 151 103 24 (set (reg:SI 132 [ d_lsm.21+4 ])
> > >         (reg:SI 136 [+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
> > >      (expr_list:REG_DEAD (reg:SI 136 [+4 ])
> > >         (nil)))
> > > 
> > insns 97 and 151 are the most important.
> > 
> > We process insn 97 which creates an equivalency between r135 and r131. 
> > This is expressed by putting both on the "same_value" chain
> > (table_elt->{next,prev}_same_value).
> > 
> > When we put the REGs on the chain we'll set REG_QTY to a positive
> > number which indicates their values are valid.
> > 
> > We continue processing insns forward and run into insn 151 which is a
> > self-copy.
> > 
> > First CSE will invalidate r131 (because it is set).  Invalidation is
> > accomplished by setting REG_QTY for r131 to a negative value.  It does
> > not remove r131 from the same value chains.
> > 
> > Then CSE will call insert_regs for r131.  The qty is not valid, so we
> > get into this code:
> > 
> > >   if (modified || ! qty_valid)
> > >     {
> > >       if (classp)
> > >         for (classp = classp->first_same_value;
> > >              classp != 0;
> > >              classp = classp->next_same_value)
> > >           if (REG_P (classp->exp)
> > >               && GET_MODE (classp->exp) == GET_MODE (x))
> > >             {
> > >               unsigned c_regno = REGNO (classp->exp);
> > >
> > >               gcc_assert (REGNO_QTY_VALID_P (c_regno));
> > >               [ ... ]
> > 
> > So we walk the chain of same values for r131.  When walking we run into
> > r131 itself.  Since r131 has been invalidated, we trip the assert.
> > 
> > 
> > The fix is pretty simple.  We just arrange to stop processing insns
> > that are nop reg->reg copies much like we already do for mem->mem
> > copies and (set (pc) (pc)).
> > 
> > This has bootstrapped and regression tested on x86_64.  I've also
> > verified it fixes the testcase in c#1 of pr90275, the test in pr93125
> > and pr92388.  Interestingly enough I couldn't trigger the original
> > testcase in 90275, but I'm confident this is ultimately all the same
> > problem.
> 
> This looks similar to the infamous (to me):
> 
>https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01581.html
> 
> which had to be reverted because it broke powerpc64 bootstrap.
> The problem was that n_sets is misleading for calls:
> 
>https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01858.html
> 
> That's easy to fix (and I have a fix).  But given the damage this caused
> last time, I think it's probably best left to GCC 11.
Yea, it's closely related.  In your case you need to effectively ignore
the nop insn to get the optimization you want.  In mine that nop insn
causes an ICE.

I think we could take your cse bits + adding a !CALL_P separately from
the simplify-rtx stuff which Segher objected to.  That'd likely solve
the ARM ICEs and take you a tiny step forward on optimizing that SVE
case.  Thoughts?

Jeff
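The failure mode in the quoted analysis can be modelled outside GCC. The following C sketch is a toy, not cse.c. reg_qty, next_same, record_equivalence and process_copy are hypothetical stand-ins for REG_QTY, the same_value chain and the insert_regs walk. Invalidating the destination of a self-copy while leaving it on its own equivalence chain makes the validity check fail. Skipping nop reg->reg copies up front, as the proposed fix does, avoids that.

```c
#include <stdbool.h>

#define NREGS 256

/* Toy model, not cse.c itself.  reg_qty[r] > 0 means r's recorded
   value is valid; a negative value means r has been invalidated.  */
static int reg_qty[NREGS];
/* A single "same value" chain: next_same[r] is the next register known
   to hold the same value, or -1 at the end of the chain.  */
static int next_same[NREGS];
static int chain_head = -1;

/* Model of insn 97: r135 and r131 become equivalent.  */
static void record_equivalence (int a, int b)
{
  reg_qty[a] = reg_qty[b] = 1;
  chain_head = a;
  next_same[a] = b;
  next_same[b] = -1;
}

typedef struct
{
  int dest;
  int src;
} reg_copy;

static bool nop_reg_copy_p (reg_copy s)
{
  return s.dest == s.src;
}

/* Process (set (reg DEST) (reg SRC)), assuming SRC's equivalence class
   is the single chain above.  Returns false where the real code would
   trip gcc_assert (REGNO_QTY_VALID_P (c_regno)).  */
static bool process_copy (reg_copy s, bool skip_nop_copies)
{
  if (skip_nop_copies && nop_reg_copy_p (s))
    return true;                /* the proposed fix: ignore self-copies */

  /* Invalidate DEST but, like cse.c, leave it on the chain.  */
  reg_qty[s.dest] = -1;

  /* insert_regs-style walk: every register on the chain must still be
     valid.  */
  for (int r = chain_head; r != -1; r = next_same[r])
    if (reg_qty[r] <= 0)
      return false;
  return true;
}
```

With skipping disabled, the self-copy of r131 fails the check. An ordinary copy into a register that is not on the chain still succeeds.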



[committed] Fix minor hppa testsuite failure due to recent IRA changes

2020-02-06 Thread Jeff Law

The recent IRA changes twiddled register allocation.  Not surprisingly
there's a bit of fallout.

On the PA we've started failing one of the shadd tests which was
triggered by the IRA changes.  In simplest terms the register
allocations changed, which obviously changes the hard registers live at
any given point.  That in turn changes some of the decisions in the
delay slot filling code.

One of the delay slot filling strategies is to try to fill the delay
slot of a branch with the insn at the target of the branch.   If the
target of the branch is reached from multiple points, then we actually
make a copy of that candidate insn (for the delay slot) and change the
branch to target the next insn, i.e.:

   [ ... ]
L:
   candidate_insn
   [ ... ]
   b L
     delay slot


L can be reached either via the fallthru path or via the branch
(and possibly other branches).  To fill the slot we transform that
into:

   [ ... ]
L:
   candidate_insn
L2:
   [ ... ]
   b L2
     copy of candidate_insn



The test in question counts the number of shadd insns as a proxy for
other behavior prior to delay slot filling.  Obviously this kind of
slot filling can change the number of shadd insns in the resulting
assembly code, which brings an undesirable degree of instability to the
test.
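The following toy C model shows why this transformation makes a count of shadd insns unstable. The insn texts, the new label name L2 and the helper names are illustrative assumptions, not what reorg.c or the PA backend actually emit. The model only shows that copying a shadd candidate into a delay slot adds a second shadd to the output.

```c
#include <string.h>

#define MAX_INSNS 32

/* Toy instruction stream: each insn is just its assembly text.  */
typedef struct
{
  const char *text[MAX_INSNS];
  int n;
} insn_seq;

/* Rough stand-in for the test's "sh.add" pattern: "sh", any character,
   then "add" at the start of the insn (sh1add, sh2add, sh3add).  */
static int is_shadd (const char *t)
{
  return strlen (t) >= 6 && t[0] == 's' && t[1] == 'h'
         && strncmp (t + 3, "add", 3) == 0;
}

static int count_shadd (const insn_seq *s)
{
  int count = 0;
  for (int i = 0; i < s->n; i++)
    count += is_shadd (s->text[i]);
  return count;
}

/* Fill the delay slot of the branch at BRANCH_IDX (the slot is the
   following insn) from the insn after the label at LABEL_IDX: copy the
   candidate into the slot, add a new label after the original candidate
   and retarget the branch to it.  Assumes IN->n + 1 <= MAX_INSNS.  */
static insn_seq fill_from_target (const insn_seq *in, int label_idx,
                                  int branch_idx)
{
  insn_seq out = { .n = 0 };
  const char *candidate = in->text[label_idx + 1];
  for (int i = 0; i < in->n; i++)
    {
      if (i == branch_idx)
        out.text[out.n++] = "b L2";      /* branch now skips the candidate */
      else if (i == branch_idx + 1)
        out.text[out.n++] = candidate;   /* copy of candidate in the slot */
      else
        out.text[out.n++] = in->text[i];
      if (i == label_idx + 1)
        out.text[out.n++] = "L2:";       /* new label after the candidate */
    }
  return out;
}
```

Running this on a five-insn stream with one shadd at the label yields two shadds. A scan-assembler-times count therefore depends on whether reorg chose this strategy.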

This change does two things.

First, it disables delay slot filling for the test to improve the
test's stability.

Second, it adjusts the expected count of shadd insns.

Committed to the trunk.
Jeff
commit f976fe0937c2b46880628c2e2749ca3a788c5db0
Author: Jeff Law 
Date:   Wed Feb 5 10:00:48 2020 -0700

Fix testsuite "regression" on hppa after recent IRA changes.

* gcc.target/hppa/shadd-3.c: Disable delay slot filling and
adjust expected shadd insn count appropriately.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index f6291df9795..0a6513e666b 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2020-02-05  Jeff Law  
+
+   * gcc.target/hppa/shadd-3.c: Disable delay slot filling and
+   adjust expected shadd insn count appropriately.
+
 2020-02-05  David Malcolm  
 
* gcc.dg/analyzer/data-model-1.c: Update for changed output to
diff --git a/gcc/testsuite/gcc.target/hppa/shadd-3.c b/gcc/testsuite/gcc.target/hppa/shadd-3.c
index f0443ea9977..2d0b648f384 100644
--- a/gcc/testsuite/gcc.target/hppa/shadd-3.c
+++ b/gcc/testsuite/gcc.target/hppa/shadd-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile }  */
-/* { dg-options "-O2" }  */
+/* { dg-options "-O2 -fno-delayed-branch" }  */
 /* In this test we want to verify that combine canonicalizes the
MULT into an ASHIFT which in turn allows postreload-gcse to
find the common subexpression.
@@ -8,8 +8,9 @@
for parsing here, so we count the shadd insns.  More is not
necessarily better in this test.  If this test is too fragile
over time we'll have to revisit the combine and/or postreload
-   dumps.  */
-/* { dg-final { scan-assembler-times "sh.add" 5 } }  */
+   dumps.  Note we have disabled delay slot filling to improve
+   test stability.  */
+/* { dg-final { scan-assembler-times "sh.add" 4 } }  */
 
 extern void oof (void);
 typedef struct simple_bitmap_def *sbitmap;


Re: [PATCH] adjust object size computation for union accesses and PHIs (PR 92765)

2020-02-06 Thread Jeff Law
On Wed, 2020-02-05 at 16:57 -0700, Martin Sebor wrote:
> 
> It passes thanks to the TREE_CODE (arg) == PARM_DECL test added
> in the patch to get_range_strlen (the test was missing before,
> and so while it handled ordinary objects (local or global), it
> unnecessarily excluded function arguments).
Oh yea, duh.  I recall noting you added the PARM_DECL handling and
thinking it might allow us to salvage some of the tests.  Then promptly
forgot.

jeff
> 



Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-06 Thread Jeff Law
On Wed, 2020-02-05 at 09:19 +0100, Richard Biener wrote:
> On Tue, Feb 4, 2020 at 11:02 PM Martin Sebor  wrote:
> > On 2/4/20 2:31 PM, Jeff Law wrote:
> > > On Tue, 2020-02-04 at 13:08 -0700, Martin Sebor wrote:
> > > > On 2/4/20 12:15 PM, Richard Biener wrote:
> > > > > On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law  
> > > > > wrote:
> > > > > > On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:
> > > > > > > On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  
> > > > > > > wrote:
> > > > > > > > PR 93519 reports a false positive -Wrestrict issued for an
> > > > > > > > inlined call to strcpy that carefully guards against
> > > > > > > > self-copying.  This is caused by the caller's arguments
> > > > > > > > substituted into the call during inlining and before dead
> > > > > > > > code elimination.
> > > > > > > > 
> > > > > > > > The attached patch avoids this by removing -Wrestrict from
> > > > > > > > the folder and deferring folding perfectly overlapping (and
> > > > > > > > so undefined) calls to strcpy (and mempcpy, but not memcpy)
> > > > > > > > until much later.  Perfectly overlapping calls to memcpy are
> > > > > > > > still folded early.
> > > > > > > 
> > > > > > > Why do we bother to warn at all for this case?  Just DWIM here.
> > > > > > > Warnings like this can be emitted from the analyzer?
> > > > > > They potentially can, but the analyzer is and will almost always
> > > > > > certainly be considerably slower.  I would not expect it to be
> > > > > > used nearly as much as the core compiler.
> > > > > > 
> > > > > > Whether or not a particular warning makes sense in the core
> > > > > > compiler or analyzer would seem to me to depend on whether or
> > > > > > not we can reasonably issue warnings without interprocedural
> > > > > > analysis.  double-free realistically requires interprocedural
> > > > > > analysis to be effective.  I'm not sure Wrestrict really does.
> > > > > > 
> > > > > > 
> > > > > > > That is, I suggest to simply remove the bogus warning code from
> > > > > > > folding (and _not_ fail the folding).
> > > > > > I haven't looked at the patch, but if we can get the warning out
> > > > > > of the folder that's certainly preferable.  And we could
> > > > > > investigate deferring self-copy removal.
> > > > > 
> > > > > I think the issue is as usual, warning for code we'll later remove
> > > > > as dead. Warning at folding is almost always premature.
> > > > 
> > > > In this instance the code is reachable (or isn't obviously unreachable).
> > > > GCC doesn't remove it, but provides benign (and reasonable) semantics
> > > > for it(*).  To me, that's one aspect of quality.  Letting the user know
> > > > that the code is buggy is another.  I view that as at least as important
> > > > as folding the ill-effects away because it makes it possible to fix
> > > > the problem so the code works correctly even with compilers that don't
> > > > provide these benign semantics.
> > > If you look at the guts of what happens at the point where we issue the
> > > warning from within gimple_fold_builtin_strcpy we have:
> > > 
> > > > DCH_to_char (char * in, char * out, int collid)
> > > > {
> > > >int type;
> > > >char * D.2148;
> > > >char * dest;
> > > >char * num;
> > > >long unsigned int _4;
> > > >char * _5;
> > > > 
> > > > ;;   basic block 2, loop depth 0
> > > > ;;pred:   ENTRY
> > > > ;;succ:   4
> > > > 
> > > > ;;   basic block 4, loop depth 0
> > > > ;;pred:   2
> > > > ;;succ:   5
> > > > 
> > > > ;;   basic block 5, loop depth 0
> > > > ;;pred:   4
> > > > ;;succ:   6
> > > > 
> > > > ;;   basic block 6, loop depth 0
> > > > ;;pred:   5
> > > >if (0 != 0)
> > > >  goto ; [53.47%]
> > > >else
> > > >  goto ; [46.53%]
> > > > ;;succ:   7
> > > > ;;8
> > > > 
> > > > ;;   basic block 7, loop depth 0
> > > > ;;pred:   6
> > > >strcpy (out_1(D), out_1(D));
> > > > ;;succ:   8
> > > > 
> > > > ;;   basic block 8, loop depth 0
> > > > ;;pred:   6
> > > > ;;7
> > > >_4 = __builtin_strlen (out_1(D));
> > > >_5 = out_1(D) + _4;
> > > >__builtin_memcpy (_5, "foo", 4);
> > > > ;;succ:   3
> > > > 
> > > > ;;   basic block 3, loop depth 0
> > > > ;;pred:   8
> > > >return;
> > > > ;;succ:   EXIT
> > > > 
> > > > }
> > > > 
> > > 
> > > Which shows the code is obviously unreachable in the case we're warning
> > > about.  You can't see this in the dumps because it's exposed by
> > > inlining, then cleaned up before writing the dump file.
> > 
> > In the specific case of the bug the code is of course eliminated
> > because it's guarded by the if (s != d).  I was referring to
> > the general (unguarded) case of:
> > 

Re: [PR47785] COLLECT_AS_OPTIONS

2020-02-06 Thread Prathamesh Kulkarni
On Tue, 4 Feb 2020 at 19:44, Richard Biener  wrote:
>
> On Mon, Feb 3, 2020 at 12:37 PM Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 30 Jan 2020 at 19:10, Richard Biener  
> > wrote:
> > >
> > > On Thu, Jan 30, 2020 at 5:31 AM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Tue, 28 Jan 2020 at 17:17, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Fri, Jan 24, 2020 at 7:04 AM Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, 20 Jan 2020 at 15:44, Richard Biener 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, Jan 8, 2020 at 11:20 AM Prathamesh Kulkarni
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, 5 Nov 2019 at 17:38, Richard Biener 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > > Thanks for the review.
> > > > > > > > > >
> > > > > > > > > > On Tue, 5 Nov 2019 at 03:57, H.J. Lu  
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Nov 3, 2019 at 6:45 PM Kugan Vivekanandarajah
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the reviews.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu 
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan 
> > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, 23 Oct 2019 at 23:07, Richard Biener 
> > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, Oct 21, 2019 at 10:04 AM Kugan 
> > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for the pointers.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, 11 Oct 2019 at 22:33, Richard 
> > > > > > > > > > > > > > > > > > Biener  wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Fri, Oct 11, 2019 at 6:15 AM Kugan 
> > > > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, 2 Oct 2019 at 20:41, Richard 
> > > > > > > > > > > > > > > > > > > > Biener  
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Oct 2, 2019 at 10:39 AM Kugan 
> > > > > > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > As mentioned in the PR, attached patch adds COLLECT_AS_OPTIONS for
> > > > > > > > > > > > > > > > > > > > > > passing assembler options specified with -Wa, to the link-time driver.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > The proposed solution only works for uniform -Wa options across all
> > > > > > > > > > > > > > > > > > > > > > TUs. As mentioned by Richard Biener, supporting non-uniform -Wa flags
> > > > > > > > > > > > > > > > > > > > > > would require either adjusting partitioning according to flags or
> > > > > > > > > > > > > > > > > > > > > > emitting multiple object files from a single LTRANS CU. We could
> > > > > > > > > > > > > > > > > > > > > > consider this as a follow up.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Bootstrapped and regression tests
> > > > > > > >
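The following C sketch illustrates the uniformity restriction described in the quoted proposal. It splits a -Wa option into assembler options; GCC documents that -Wa,option is split at commas. It then checks whether every translation unit recorded the same -Wa string. The helper names and the -mfoo/-mbar options are hypothetical, and this is not the COLLECT_AS_OPTIONS implementation.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_OPT_LEN 64

/* Split the payload of a "-Wa,..." driver option at commas, as GCC
   documents for -Wa.  Returns the number of assembler options stored
   in OUT, or -1 if ARG is not a -Wa option or does not fit.  */
static int split_wa_option (const char *arg, char out[][MAX_OPT_LEN], int max)
{
  if (strncmp (arg, "-Wa,", 4) != 0)
    return -1;

  int n = 0;
  const char *p = arg + 4;
  while (*p != '\0')
    {
      const char *comma = strchr (p, ',');
      size_t len = comma ? (size_t) (comma - p) : strlen (p);
      if (n == max || len >= MAX_OPT_LEN)
        return -1;
      memcpy (out[n], p, len);
      out[n][len] = '\0';
      n++;
      if (comma == NULL)
        break;
      p = comma + 1;
    }
  return n;
}

/* Hypothetical check for the limitation described above: one recorded
   set of assembler options is only correct at link time if every
   translation unit was compiled with the same -Wa options.  */
static bool as_options_uniform (const char *const per_tu[], int n_tus)
{
  for (int i = 1; i < n_tus; i++)
    if (strcmp (per_tu[i], per_tu[0]) != 0)
      return false;
  return true;
}
```

When the check fails, a single set of options cannot describe every unit. Per the quoted message, that case would need flag-aware partitioning or multiple object files per LTRANS unit.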

Re: [PATCH 2/3] libstdc++: Implement C++20 constrained algorithms

2020-02-06 Thread Jonathan Wakely

On 05/02/20 19:39 +0100, François Dumont wrote:

Hi

    Is it just me, or is the patch not an attachment? It is far more
convenient to provide something that is easy to extract and apply locally.


On 2/4/20 3:07 AM, Patrick Palka wrote:

This patch implements the C++20 ranges overloads for the algorithms in
[algorithms].  Most of the algorithms were reimplemented, with each of their
implementations very closely following the existing implementation in
bits/stl_algo.h and bits/stl_algobase.h.  The reason for reimplementing most of
the algorithms instead of forwarding to their STL-style overloads is that
forwarding cannot be conformantly and efficiently performed for algorithms that
operate on non-random-access iterators.  But algorithms that operate on random
access iterators can safely and efficiently be forwarded to the STL-style
implementation, and this patch does so for push_heap, pop_heap, make_heap,
sort_heap, sort, stable_sort, nth_element, inplace_merge and stable_partition.
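The point about random access iterators can be shown with a small sketch. This is plain C++17, not libstdc++ code, and `end_sentinel` and `sort_by_forwarding` are hypothetical names. When the sentinel type differs from the iterator type but the distance between them is available in constant time, the real end iterator can be computed cheaply and the work forwarded to the STL-style `std::sort`. With a forward or bidirectional iterator, finding that end iterator would need an extra linear walk.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical sentinel type: it marks the end of a range but is not itself
// an iterator, so it cannot be passed to a classic two-iterator algorithm.
template <typename It>
struct end_sentinel {
  It pos;
};

// "Sized sentinel": the distance from an iterator to the sentinel in O(1).
template <typename It>
typename std::iterator_traits<It>::difference_type
operator-(const end_sentinel<It>& s, const It& first) {
  return s.pos - first;
}

// For a random access iterator with a sized sentinel, compute the end
// iterator in constant time and forward to the STL-style std::sort.
template <typename RandomIt, typename Sentinel>
RandomIt sort_by_forwarding(RandomIt first, Sentinel last) {
  const auto n = last - first;
  assert(n >= 0);
  RandomIt real_last = first + n;
  std::sort(first, real_last);
  return real_last;  // like ranges::sort, return the end iterator
}
```

For example, `sort_by_forwarding(v.begin(), end_sentinel<std::vector<int>::iterator>{v.end()})` sorts `v` and returns `v.end()`.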

What's missing from this patch is debug-iterator


Always the fifth wheel on the car, as we say in French :-)

I'll be looking at this point once I manage to apply the patch.


 and container specializations
that are present for some of the STL-style algorithms that need to be ported
over to the ranges algos.  I marked them with TODO comments.  There are
also some other minor outstanding TODOs.

The code that could use the most thorough review is ranges::__copy_or_move,
ranges::__copy_or_move_backward, ranges::__equal and
ranges::__lexicographical_compare.  In the tests, I tried to test the interface
of each new overload, as well as the correctness of the new implementation.

diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h
new file mode 100644
index 000..2e177ce7f7a
--- /dev/null
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -0,0 +1,3640 @@
+// Core algorithmic facilities -*- C++ -*-
+
+// Copyright (C) 2019-2020 Free Software Foundation, Inc.


The copyright date for new files is wrong; it should be only 2020. I know it
is painful to maintain that when you work on a patch over several years.


I assume Patrick kept the 2019 date because his patch started from a
file I sent him with a few of the algos, and that was dated 2019.

I can't remember what the actual rule is for new files that contain
old code. The copyright on some of the new file *is* from 2019, even
if it wasn't added to the GCC repo yet.

New files containing new code should definitely only have the new date
(usually when I point out wrong dates in patch review it's because
somebody's just copied the comment header from an old testcase and so
the old dates are wrong).

I only wrote that code in December 2019 though, so it doesn't make a
lot of difference either way, 2020 is fine.



Re: [Patch][Testsuite] – More fixes for remote execution: check_gc_sections_available, …

2020-02-06 Thread Tobias Burnus

Hi Richard,

On 2/5/20 5:36 PM, Richard Sandiford wrote:


In each case it might be more obvious to use:
   [list "additional_flags=..."]
with no explicit { ... } quoting.

The .exp files are supposed to follow the 80 char limit where possible,
so there should probably be a line break before [list ...].

OK with those changes, thanks.


I concur that the "[list …" looks nicer. However, the line break does not work:
I tried:

# Check if the ld used by gcc supports --gc-sections.
-   set gcc_ld [lindex [${tool}_target_compile "-print-prog-name=ld" "" "none" ""] 0]
+   set gcc_ld [lindex [${tool}_target_compile "" "" "none"
+   [list "additional_flags=-print-prog-name=ld"]] 0]

And that failed with:
…/gcsec-1.c: wrong # args: should be "gcc_target_compile source dest type options" for " 
dg-require-gc-sections 4 "" "

(Trying a hello-world example in tclsh, I can reproduce it there as well.)

Hence, I have installed the following, which uses the variable $options
for the value. In check_multi_dir, one line is still 87 characters, but that
cannot be avoided.
(At least it is 5 characters shorter than before.)

r10-6475-g101baaee42afe05c3d271925e4d40f0f8f642bd5

Tobias

commit 101baaee42afe05c3d271925e4d40f0f8f642bd5
Author: Tobias Burnus 
Date:   Thu Feb 6 13:27:45 2020 +0100

[Testsuite] – More fixes for remote execution: check_gc_sections_available, …

* gcc.target/arm/multilib.exp (multilib_config): Pass flags to
…_target_compile as (additional_flags=) option and not as source
filename to make it work with remote execution.
* lib/target-supports.exp (check_runtime, check_gc_sections_available,
check_effective_target_gas, check_effective_target_gld): Likewise.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 7b0b9c2c242..d0955e039b5 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,11 @@
+2020-02-06  Tobias Burnus  
+
+	* gcc.target/arm/multilib.exp (multilib_config): Pass flags to
+	…_target_compile as (additional_flags=) option and not as source
+	filename to make it work with remote execution.
+	* lib/target-supports.exp (check_runtime, check_gc_sections_available,
+	check_effective_target_gas, check_effective_target_gld): Likewise.
+
 2020-02-06  Jakub Jelinek  
 
 	PR target/93594
diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp b/gcc/testsuite/gcc.target/arm/multilib.exp
index 67d00266f6b..17111ee5257 100644
--- a/gcc/testsuite/gcc.target/arm/multilib.exp
+++ b/gcc/testsuite/gcc.target/arm/multilib.exp
@@ -40,7 +40,8 @@ proc multilib_config {profile} {
 proc check_multi_dir { gcc_opts multi_dir } {
 global tool
 
-set gcc_output [${tool}_target_compile "--print-multi-directory $gcc_opts" "" "none" ""]
+set options [list "additional_flags=[concat "--print-multi-directory" [gcc_opts]]"]
+set gcc_output [${tool}_target_compile "" "" "none" $options]
 if { [string match "$multi_dir\n" $gcc_output] } {
 	pass "multilibdir $gcc_opts $multi_dir"
 } else {
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5377d7b11cb..d3b2798df3e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -260,7 +260,8 @@ proc check_runtime {prop args} {
 proc check_configured_with { pattern } {
 global tool
 
-set gcc_output [${tool}_target_compile "-v" "" "none" ""]
+set options [list "additional_flags=-v"]
+set gcc_output [${tool}_target_compile "" "" "none" $options]
 if { [ regexp "Configured with: \[^\n\]*$pattern" $gcc_output ] } {
 verbose "Matched: $pattern" 2
 return 1
@@ -504,7 +505,8 @@ proc check_gc_sections_available { } {
 	}
 
 	# Check if the ld used by gcc supports --gc-sections.
-	set gcc_ld [lindex [${tool}_target_compile "-print-prog-name=ld" "" "none" ""] 0]
+	set options [list "additional_flags=-print-prog-name=ld"]
+	set gcc_ld [lindex [${tool}_target_compile "" "" "none" $options] 0]
 	set ld_output [remote_exec host "$gcc_ld" "--help"]
 	if { [ string first "--gc-sections" $ld_output ] >= 0 } {
 	return 1
@@ -8566,7 +8568,8 @@ proc check_effective_target_gas { } {
 
 if {![info exists use_gas_saved]} {
 	# Check if the as used by gcc is GNU as.
-	set gcc_as [lindex [${tool}_target_compile "-print-prog-name=as" "" "none" ""] 0]
+	set options [list "additional_flags=-print-prog-name=as"]
+	set gcc_as [lindex [${tool}_target_compile "" "" "none" $options] 0]
 	# Provide /dev/null as input, otherwise gas times out reading from
 	# stdin.
 	set status [remote_exec host "$gcc_as" "-v /dev/null"]
@@ -8588,7 +8591,8 @@ proc check_effective_target_gld { } {
 
 if {![info exists use_gld_saved]} {
 	# Check if the ld used by gcc is GNU ld.
-	set gcc_ld [lindex [${tool}_target_compile "-print-prog-name=ld" "" "none" ""] 0]
+	set options [list "additional_flags=-print-prog-name=ld"]
+	set gcc_ld [lindex 

Re: [PATCH] c++: Handle CONSTRUCTORs without indexes in find_array_ctor_elt [PR93549]

2020-02-06 Thread Richard Biener
On Thu, 6 Feb 2020, Jakub Jelinek wrote:

> On Wed, Feb 05, 2020 at 01:31:30PM -0500, Jason Merrill wrote:
> > > from the constexpr new change apparently broke the following testcase.
> > > When handling COND_EXPR, we build_vector_from_val, however as the argument we
> > > pass to it is not an INTEGER_CST/REAL_CST, but that wrapped in a
> > > NON_LVALUE_EXPR location wrapper, we end up with a CONSTRUCTOR and as it is
> > > middle-end that builds it, it doesn't bother with indexes.  The
> > > cp_fully_fold_init call used to fold it into VECTOR_CST in the past, but as
> > > we intentionally don't invoke it anymore as it might fold away something
> > > that needs to be diagnosed during constexpr evaluation, we end up evaluating
> > > ARRAY_REF into the index-less CONSTRUCTOR.  The following patch fixes the
> > > ICE by teaching find_array_ctor_elt to handle CONSTRUCTORs without indexes
> > > (that itself could be still very efficient) and CONSTRUCTORs with some
> > > indexes present and others missing (the rules are that if the index on the
> > > first element is missing, then it is the array's lowest index (in C/C++ 0)
> > > and if other indexes are missing, they are the index of the previous element
> > > + 1).
> > 
> > Is it currently possible to get a CONSTRUCTOR with non-init-list type that
> > has some indexes present and others missing?  Other than from the new code
> > in your patch that sets some indexes?
> 
> I don't know, can try to add some instrumentation and do bootstrap/regtest
> with it.  The handling of the CONSTRUCTORs with missing or present or mixed
> indexes is what I found in various middle-end routines.
> The only thing I see in our verifiers is that in GIMPLE function bodies,
> we don't allow non-VECTOR_TYPE CONSTRUCTORs with any elements, and for
> VECTOR_TYPE CONSTRUCTORs we require that indexes are NULL for elements with
> VECTOR_TYPE and for others require that it is either NULL or INTEGER_CST
> matching the position (so effectively for those direct access is still
> possible).
> The question might not be just what we do emit right now, but also what we'd
> like to emit in the future, because as has been noted several times, for
> large initializers those explicit indexes consume huge amounts of memory.
> In C with designated initializers, I can see us not emitting indexes from
> the start because we'd want to avoid the memory overhead for normal
> sequential initializers, but then much later we can find a designated
> initializer that wants to skip over some elements and thus add an index at
> that point (or range designator for which we want RANGE_EXPR); shall we add
> indexes to all elements at that point?
> In C++, I think we don't allow non-useless array designated initializers, so
> there is no way to skip elements using that or go backwards, but still,
> don't we emit RANGE_EXPRs if we see the same initializer for many elements?
> I guess right now we emit indexes for all elements for those, but if we
> choose to optimize?

I played with eliding them (in the C frontend) some time ago, and
the issue with designated initializers is not the designators themselves
but that we need to "sort" the CTOR, and at that point we need indexes for
all elements (or some other way of dealing with it).

Also, for sparse CTORs the middle-end needs indices for binary search.

I wonder if we could replace INTEGER_CSTs with (index<<1)|1 in
constructor_elt.index or so ... (or something less hackish)

I bet there's a way around the sorting issue of course.  Like
the suggested late add.
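The `(index<<1)|1` idea relies on pointer tagging. Genuine tree pointers are aligned, so their low bit is 0, which leaves odd values free to encode small integer indexes without allocating an INTEGER_CST. The following is a minimal sketch that assumes nodes are at least 2-byte aligned; `tree_node` and `tagged_index` are hypothetical stand-ins, not GCC's `constructor_elt`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a tree node.  Its alignment is at least 2,
// so bit 0 of a genuine node pointer is always 0.
struct tree_node {
  long value;
};

// A pointer-sized slot holding either a node pointer or a small index
// encoded as (index << 1) | 1.  The index must fit in the remaining bits.
class tagged_index {
 public:
  static tagged_index from_node(tree_node *n) {
    auto b = reinterpret_cast<std::uintptr_t>(n);
    assert((b & 1) == 0 && "node pointers must be at least 2-byte aligned");
    return tagged_index(b);
  }
  static tagged_index from_index(std::uintptr_t idx) {
    return tagged_index((idx << 1) | 1);
  }
  bool is_index() const { return (bits_ & 1) != 0; }
  std::uintptr_t index() const {
    assert(is_index());
    return bits_ >> 1;
  }
  tree_node *node() const {
    assert(!is_index());
    return reinterpret_cast<tree_node *>(bits_);
  }

 private:
  explicit tagged_index(std::uintptr_t b) : bits_(b) {}
  std::uintptr_t bits_;
};
```

The cost is that every reader of the slot must check the tag before dereferencing, which is presumably part of why it was called hackish.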

> > Is it unreasonable to assume that if the first element has no index, none of
> > the elements do?
> 
> Not sure, see above.  Depends on what we want to guarantee.

In the middle-end we don't want to rely on this unless we have 
checking that we honor it.  For pure frontend code feel free to
set your own constraints ;)

> > > + else if (i == j + (middle - begin))
> > > +   {
> > > + (*elts)[middle].index = dindex;
> > 
> > Why set this index?
> 
> Because the caller asserts or relies that it has one.
>   constructor_elt *cep = NULL;
>   if (code == ARRAY_TYPE)
> {
>   HOST_WIDE_INT i
> = find_array_ctor_elt (*valp, index, /*insert*/true);
>   gcc_assert (i >= 0);
>   cep = CONSTRUCTOR_ELT (*valp, i);
>   gcc_assert (TREE_CODE (cep->index) != RANGE_EXPR);
> 
> Now, ATM we are aware of just small CONSTRUCTORs that can appear this way
> (VECTOR_TYPE and so generally not too many elements in real-world
> testcases), so if you prefer, the function when seeing NULL index could just
> add indexes to all elements and retry and defer deciding if and how we
> optimize large constructors for later.
>
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH] c++: Handle CONSTRUCTORs without indexes in find_array_ctor_elt [PR93549]

2020-02-06 Thread Jakub Jelinek
On Wed, Feb 05, 2020 at 01:31:30PM -0500, Jason Merrill wrote:
> > from the constexpr new change apparently broke the following testcase.
> > When handling COND_EXPR, we build_vector_from_val, however as the argument we
> > pass to it is not an INTEGER_CST/REAL_CST, but that wrapped in a
> > NON_LVALUE_EXPR location wrapper, we end up with a CONSTRUCTOR and as it is
> > middle-end that builds it, it doesn't bother with indexes.  The
> > cp_fully_fold_init call used to fold it into VECTOR_CST in the past, but as
> > we intentionally don't invoke it anymore as it might fold away something
> > that needs to be diagnosed during constexpr evaluation, we end up evaluating
> > ARRAY_REF into the index-less CONSTRUCTOR.  The following patch fixes the
> > ICE by teaching find_array_ctor_elt to handle CONSTRUCTORs without indexes
> > (that itself could be still very efficient) and CONSTRUCTORs with some
> > indexes present and others missing (the rules are that if the index on the
> > first element is missing, then it is the array's lowest index (in C/C++ 0)
> > and if other indexes are missing, they are the index of the previous element
> > + 1).
> 
> Is it currently possible to get a CONSTRUCTOR with non-init-list type that
> has some indexes present and others missing?  Other than from the new code
> in your patch that sets some indexes?

I don't know, can try to add some instrumentation and do bootstrap/regtest
with it.  The handling of the CONSTRUCTORs with missing or present or mixed
indexes is what I found in various middle-end routines.
The only thing I see in our verifiers is that in GIMPLE function bodies,
we don't allow non-VECTOR_TYPE CONSTRUCTORs with any elements, and for
VECTOR_TYPE CONSTRUCTORs we require that indexes are NULL for elements with
VECTOR_TYPE and for others require that it is either NULL or INTEGER_CST
matching the position (so effectively for those direct access is still
possible).
The question might not be just what we do emit right now, but also what we'd
like to emit in the future, because as has been noted several times, for
large initializers those explicit indexes consume huge amounts of memory.
In C with designated initializers, I can see us not emitting indexes from
the start because we'd want to avoid the memory overhead for normal
sequential initializers, but then much later we can find a designated
initializer that wants to skip over some elements and thus add an index at
that point (or range designator for which we want RANGE_EXPR); shall we add
indexes to all elements at that point?
In C++, I think we don't allow non-useless array designated initializers, so
there is no way to skip elements using that or go backwards, but still,
don't we emit RANGE_EXPRs if we see the same initializer for many elements?
I guess right now we emit indexes for all elements for those, but if we
choose to optimize?

> Is it unreasonable to assume that if the first element has no index, none of
> the elements do?

Not sure, see above.  Depends on what we want to guarantee.

> > +   else if (i == j + (middle - begin))
> > + {
> > +   (*elts)[middle].index = dindex;
> 
> Why set this index?

Because the caller asserts or relies that it has one.
  constructor_elt *cep = NULL;
  if (code == ARRAY_TYPE)
{
  HOST_WIDE_INT i
= find_array_ctor_elt (*valp, index, /*insert*/true);
  gcc_assert (i >= 0);
  cep = CONSTRUCTOR_ELT (*valp, i);
  gcc_assert (TREE_CODE (cep->index) != RANGE_EXPR);

Now, ATM we are aware of just small CONSTRUCTORs that can appear this way
(VECTOR_TYPE and so generally not too many elements in real-world
testcases), so if you prefer, the function when seeing NULL index could just
add indexes to all elements and retry and defer deciding if and how we
optimize large constructors for later.

Jakub
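The implied-index rule described in this thread is that a missing first index is the array's lowest index (0 in C/C++) and any other missing index is the previous element's index + 1. It can be sketched as a linear scan. This is a hypothetical simplification: `ctor_elt` and `find_elt` are illustrative names, the scan assumes strictly increasing indexes, and unlike the real find_array_ctor_elt it neither handles RANGE_EXPR nor uses binary search.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical simplified element: std::nullopt models a NULL index.
struct ctor_elt {
  std::optional<long> index;
  int value;
};

// Returns the position in ELTS of the element whose explicit or implied
// index equals WANTED, or -1 if there is none.
long find_elt(const std::vector<ctor_elt> &elts, long wanted) {
  long implied = 0;  // a missing first index is the lowest index, 0
  for (std::size_t i = 0; i < elts.size(); ++i) {
    long idx = elts[i].index ? *elts[i].index : implied;
    if (idx == wanted)
      return static_cast<long>(i);
    if (idx > wanted)
      return -1;  // indexes increase, so WANTED is not present
    implied = idx + 1;  // a missing next index is the previous one + 1
  }
  return -1;
}
```

For elements with indexes `{none, none, 5, none}`, the implied indexes are 0, 1, 5 and 6. When no element has an index, position and index coincide, which is why that case can still be handled efficiently.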



Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-06 Thread Richard Biener
On Thu, Feb 6, 2020 at 12:14 PM Richard Biener
 wrote:
>
> On Thu, Feb 6, 2020 at 11:33 AM Richard Biener
>  wrote:
> >
> > On Thu, Feb 6, 2020 at 11:06 AM Richard Biener
> >  wrote:
> > >
> > > On Wed, Feb 5, 2020 at 4:55 PM Martin Sebor  wrote:
> > > >
> > > > On 2/5/20 1:19 AM, Richard Biener wrote:
> > > > > On Tue, Feb 4, 2020 at 11:02 PM Martin Sebor  wrote:
> > > > >>
> > > > >> On 2/4/20 2:31 PM, Jeff Law wrote:
> > > > >>> On Tue, 2020-02-04 at 13:08 -0700, Martin Sebor wrote:
> > > >  On 2/4/20 12:15 PM, Richard Biener wrote:
> > > > > On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law 
> > > > >  wrote:
> > > > >> On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:
> > > > >>> On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  
> > > > >>> wrote:
> > > >  PR 93519 reports a false positive -Wrestrict issued for an inlined call
> > > >  to strcpy that carefully guards against self-copying.  This is caused
> > > >  by the caller's arguments substituted into the call during inlining and
> > > >  before dead code elimination.
> > > > 
> > > >  The attached patch avoids this by removing -Wrestrict from the folder
> > > >  and deferring folding perfectly overlapping (and so undefined) calls
> > > >  to strcpy (and mempcpy, but not memcpy) until much later.  Perfectly
> > > >  overlapping calls to memcpy are still folded early.
> > > > >>>
> > > > >>> Why do we bother to warn at all for this case?  Just DWIM here.
> > > > >>> Warnings like this can be emitted from the analyzer?
> > > > >> They potentially can, but the analyzer is and will almost always
> > > > >> certainly be considerably slower.  I would not expect it to be used
> > > > >> nearly as much as the core compiler.
> > > > >>
> > > > >> Whether or not a particular warning makes sense in the core compiler or
> > > > >> analyzer would seem to me to depend on whether or not we can reasonably
> > > > >> issue warnings without interprocedural analysis.  Double-free
> > > > >> realistically requires interprocedural analysis to be effective.  I'm
> > > > >> not sure -Wrestrict really does.
> > > > >>
> > > > >>> That is, I suggest to simply remove the bogus warning code from folding
> > > > >>> (and _not_ fail the folding).
> > > > >> I haven't looked at the patch, but if we can get the warning out of the
> > > > >> folder that's certainly preferable.  And we could investigate deferring
> > > > >> self-copy removal.
> > > > >
> > > > > I think the issue is as usual, warning for code we'll later 
> > > > > remove as dead. Warning at folding is almost always premature.
> > > > 
> > > >  In this instance the code is reachable (or isn't obviously unreachable).
> > > >  GCC doesn't remove it, but provides benign (and reasonable) semantics
> > > >  for it(*).  To me, that's one aspect of quality.  Letting the user know
> > > >  that the code is buggy is another.  I view that as at least as important
> > > >  as folding the ill-effects away because it makes it possible to fix
> > > >  the problem so the code works correctly even with compilers that don't
> > > >  provide these benign semantics.
> > > > >>> If you look at the guts of what happens at the point where we issue the
> > > > >>> warning from within gimple_fold_builtin_strcpy we have:
> > > > >>>
> > > >  DCH_to_char (char * in, char * out, int collid)
> > > >  {
> > > >  int type;
> > > >  char * D.2148;
> > > >  char * dest;
> > > >  char * num;
> > > >  long unsigned int _4;
> > > >  char * _5;
> > > > 
> > > >  ;;   basic block 2, loop depth 0
> > > >  ;;pred:   ENTRY
> > > >  ;;succ:   4
> > > > 
> > > >  ;;   basic block 4, loop depth 0
> > > >  ;;pred:   2
> > > >  ;;succ:   5
> > > > 
> > > >  ;;   basic block 5, loop depth 0
> > > >  ;;pred:   4
> > > >  ;;succ:   6
> > > > 
> > > >  ;;   basic block 6, loop depth 0
> > > >  ;;pred:   5
> > > >  if (0 != 0)
> > > >    goto ; [53.47%]
> > > >  else
> > > >    goto ; [46.53%]
> > > >  ;;succ:   7
> > > >  ;;8
> > > > 
> > > >  ;;   basic block 7, loop depth 0
> > > >  ;;pred:   6
> > > >  strcpy (out_1(D), out_1(D));
> > > >  ;;succ:   8
> > > > 
> > > >  ;;   basic block 8, loop depth 0
> > > > 

[committed] libstdc++: Remove redundant macro that is always empty

2020-02-06 Thread Jonathan Wakely
The __iter_swap class template and explicit specialization are only
declared (and used) for C++03 so _GLIBCXX20_CONSTEXPR does nothing here.

* include/bits/stl_algobase.h (__iter_swap, __iter_swap): Remove
redundant _GLIBCXX20_CONSTEXPR.

Tested x86_64-linux, committed to master.

commit d1aa7705d59e56191c2ccc5594983d8fa0832718
Author: Jonathan Wakely 
Date:   Thu Feb 6 10:45:38 2020 +

libstdc++: Remove redundant macro that is always empty

The __iter_swap class template and explicit specialization are only
declared (and used) for C++03 so _GLIBCXX20_CONSTEXPR does nothing here.

* include/bits/stl_algobase.h (__iter_swap, __iter_swap): Remove
redundant _GLIBCXX20_CONSTEXPR.

diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index dc922a0f3d2..efda15f816e 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -139,7 +139,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __iter_swap
 {
   template
-   _GLIBCXX20_CONSTEXPR
static void
iter_swap(_ForwardIterator1 __a, _ForwardIterator2 __b)
{
@@ -155,14 +154,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __iter_swap
 {
   template
-   _GLIBCXX20_CONSTEXPR
static void
iter_swap(_ForwardIterator1 __a, _ForwardIterator2 __b)
{
  swap(*__a, *__b);
}
 };
-#endif
+#endif // C++03
 
   /**
*  @brief Swaps the contents of two iterators.
@@ -205,6 +203,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
&& __are_same<_ValueType2&, _ReferenceType2>::__value>::
iter_swap(__a, __b);
 #else
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 187. iter_swap underspecified
   swap(*__a, *__b);
 #endif
 }


Re: [PATCH 0/4] Fix various minor issues seen with cppcheck

2020-02-06 Thread Richard Biener
On Thu, Feb 6, 2020 at 12:07 PM Martin Liska  wrote:
>
> Hi.
>
> The series is about small issues that were spotted with cppcheck
> and where David Binderman suggested a patch.
>
> It's probably stage 1 material?

Yes.

> Martin
>
> Martin Liska (4):
>   Remove 2 dead variables in bid_internal.h.
>   Use const for some function arguments.
>   Put index check before use.
>   Use const for template argument.
>
>  gcc/alloc-pool.h   | 2 +-
>  gcc/bitmap.h   | 2 +-
>  gcc/mem-stats.h| 4 ++--
>  gcc/sese.h | 4 ++--
>  libgcc/config/libbid/bid_internal.h| 4 
>  liboffloadmic/runtime/offload_target.cpp   | 2 +-
>  libstdc++-v3/include/parallel/multiway_merge.h | 4 ++--
>  7 files changed, 9 insertions(+), 13 deletions(-)
>
> --
> 2.25.0
>


Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-06 Thread Richard Biener
On Thu, Feb 6, 2020 at 11:33 AM Richard Biener
 wrote:
>
> On Thu, Feb 6, 2020 at 11:06 AM Richard Biener
>  wrote:
> >
> > On Wed, Feb 5, 2020 at 4:55 PM Martin Sebor  wrote:
> > >
> > > On 2/5/20 1:19 AM, Richard Biener wrote:
> > > > On Tue, Feb 4, 2020 at 11:02 PM Martin Sebor  wrote:
> > > >>
> > > >> On 2/4/20 2:31 PM, Jeff Law wrote:
> > > >>> On Tue, 2020-02-04 at 13:08 -0700, Martin Sebor wrote:
> > >  On 2/4/20 12:15 PM, Richard Biener wrote:
> > > > On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law 
> > > >  wrote:
> > > >> On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:
> > > >>> On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  
> > > >>> wrote:
> > >  PR 93519 reports a false positive -Wrestrict issued for an inlined call
> > >  to strcpy that carefully guards against self-copying.  This is caused
> > >  by the caller's arguments substituted into the call during inlining and
> > >  before dead code elimination.
> > > 
> > >  The attached patch avoids this by removing -Wrestrict from the folder
> > >  and deferring folding perfectly overlapping (and so undefined) calls
> > >  to strcpy (and mempcpy, but not memcpy) until much later.  Perfectly
> > >  overlapping calls to memcpy are still folded early.
> > > >>>
> > > >>> Why do we bother to warn at all for this case?  Just DWIM here.
> > > >>> Warnings like this can be emitted from the analyzer?
> > > >> They potentially can, but the analyzer is and will almost always
> > > >> certainly be considerably slower.  I would not expect it to be used
> > > >> nearly as much as the core compiler.
> > > >>
> > > >> Whether or not a particular warning makes sense in the core compiler or
> > > >> analyzer would seem to me to depend on whether or not we can reasonably
> > > >> issue warnings without interprocedural analysis.  Double-free
> > > >> realistically requires interprocedural analysis to be effective.  I'm
> > > >> not sure -Wrestrict really does.
> > > >>
> > > >>> That is, I suggest to simply remove the bogus warning code from folding
> > > >>> (and _not_ fail the folding).
> > > >> I haven't looked at the patch, but if we can get the warning out of the
> > > >> folder that's certainly preferable.  And we could investigate deferring
> > > >> self-copy removal.
> > > >
> > > > I think the issue is as usual, warning for code we'll later remove 
> > > > as dead. Warning at folding is almost always premature.
> > > 
> > >  In this instance the code is reachable (or isn't obviously unreachable).
> > >  GCC doesn't remove it, but provides benign (and reasonable) semantics
> > >  for it(*).  To me, that's one aspect of quality.  Letting the user know
> > >  that the code is buggy is another.  I view that as at least as important
> > >  as folding the ill-effects away because it makes it possible to fix
> > >  the problem so the code works correctly even with compilers that don't
> > >  provide these benign semantics.
> > > >>> If you look at the guts of what happens at the point where we issue the
> > > >>> warning from within gimple_fold_builtin_strcpy we have:
> > > >>>
> > >  DCH_to_char (char * in, char * out, int collid)
> > >  {
> > >  int type;
> > >  char * D.2148;
> > >  char * dest;
> > >  char * num;
> > >  long unsigned int _4;
> > >  char * _5;
> > > 
> > >  ;;   basic block 2, loop depth 0
> > >  ;;pred:   ENTRY
> > >  ;;succ:   4
> > > 
> > >  ;;   basic block 4, loop depth 0
> > >  ;;pred:   2
> > >  ;;succ:   5
> > > 
> > >  ;;   basic block 5, loop depth 0
> > >  ;;pred:   4
> > >  ;;succ:   6
> > > 
> > >  ;;   basic block 6, loop depth 0
> > >  ;;pred:   5
> > >  if (0 != 0)
> > >    goto ; [53.47%]
> > >  else
> > >    goto ; [46.53%]
> > >  ;;succ:   7
> > >  ;;8
> > > 
> > >  ;;   basic block 7, loop depth 0
> > >  ;;pred:   6
> > >  strcpy (out_1(D), out_1(D));
> > >  ;;succ:   8
> > > 
> > >  ;;   basic block 8, loop depth 0
> > >  ;;pred:   6
> > >  ;;7
> > >  _4 = __builtin_strlen (out_1(D));
> > >  _5 = out_1(D) + _4;
> > >  __builtin_memcpy (_5, "foo", 4);
> > >  ;;succ:   3
> > > 
> > >  ;;   basic block 3, loop depth 0
> > >  ;;pred:   8
> > >  return;

[PATCH] Improve splitX passes management

2020-02-06 Thread Uros Bizjak
The names of split_before_sched2 ("split4") and split_before_regstack
("split3") do not reflect their insertion point in the sequence of passes,
where split_before_regstack follows split_before_sched2. Reorder the code
and rename the passes to reflect the reality.

split_before_regstack pass does not need to run if split_before_sched2 pass
was already performed. Introduce enable_split_before_sched2 function to
simplify gating functions of these two passes.

There is no need for a separate rest_of_handle_split_before_sched2.
split_all_insns can be called unconditionally from
pass_split_before_sched2::execute, since the corresponding gating function
determines if the pass is executed or not.
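The claim that split_before_regstack only needs to run when split_before_sched2 did not can be checked exhaustively with a small model. This is a sketch under assumptions. The configuration macros and flags are modeled as fields of a hypothetical `config` struct. `old_regstack_gate` transcribes the gate removed below. `new_regstack_gate` is only the assumed shape of the rewritten gate, since the hunk that rewrites it is not shown here.

```cpp
#include <cassert>

// Hypothetical model of the inputs that the two gates depend on.
struct config {
  bool insn_scheduling;     // INSN_SCHEDULING is defined
  bool length_and_stack;    // HAVE_ATTR_length && defined (STACK_REGS)
  int optimize;             // -O level
  bool sched_after_reload;  // flag_schedule_insns_after_reload
};

// Mirrors the new enable_split_before_sched2 () added by the patch.
bool enable_split_before_sched2(const config &c) {
  if (!c.insn_scheduling)
    return false;
  return c.optimize > 0 && c.sched_after_reload;
}

// Transcription of the removed pass_split_before_regstack::gate.
bool old_regstack_gate(const config &c) {
  if (!c.length_and_stack)
    return false;
  if (c.insn_scheduling)
    return !c.optimize || !c.sched_after_reload;
  return true;
}

// Assumed new gate: run only when splitting before sched2 did not happen.
bool new_regstack_gate(const config &c) {
  return c.length_and_stack && !enable_split_before_sched2(c);
}
```

Iterating over all combinations of the four inputs shows that the old and assumed new gates agree. It also shows that the two passes are never both enabled, because `!optimize || !flag` is exactly the negation of `optimize > 0 && flag` for a non-negative `optimize`.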

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

2020-02-06  Uroš Bizjak  

* recog.c: Move pass_split_before_sched2 code in front of
pass_split_before_regstack.
(pass_data_split_before_sched2): Rename pass to split3 from split4.
(pass_data_split_before_regstack): Rename pass to split4 from split3.
(rest_of_handle_split_before_sched2): Remove.
(pass_split_before_sched2::execute): Unconditionally call
split_all_insns.
(enable_split_before_sched2): New function.
(pass_split_before_sched2::gate): Use enable_split_before_sched2.
(pass_split_before_regstack::gate): Ditto.
* config/nds32/nds32.c (nds32_split_double_word_load_store_p):
Update name check for renamed split4 pass.
* config/sh/sh.c (register_sh_passes): Update pass insertion
point for renamed split4 pass.

Uros.
diff --git a/gcc/config/nds32/nds32.c b/gcc/config/nds32/nds32.c
index 625fa8ce7db8..acf13715d830 100644
--- a/gcc/config/nds32/nds32.c
+++ b/gcc/config/nds32/nds32.c
@@ -5496,7 +5496,7 @@ nds32_split_double_word_load_store_p(rtx *operands, bool load_p)
 return false;
 
   const char *pass_name = current_pass->name;
-  if (pass_name && ((strcmp (pass_name, "split4") == 0)
+  if (pass_name && ((strcmp (pass_name, "split3") == 0)
 || (strcmp (pass_name, "split5") == 0)))
 return !satisfies_constraint_Da (mem) || MEM_VOLATILE_P (mem);
 
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 3439f1663830..a178cfd3b9c9 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -800,7 +800,7 @@ register_sh_passes (void)
   /* Run sh_treg_combine pass after register allocation and basic block
  reordering as this sometimes creates new opportunities.  */
   register_pass (make_pass_sh_treg_combine (g, true, "sh_treg_combine3"),
-PASS_POS_INSERT_AFTER, "split4", 1);
+PASS_POS_INSERT_AFTER, "split3", 1);
 
   /* Optimize sett and clrt insns, by e.g. removing them if the T bit value
  is known after a conditional branch.
diff --git a/gcc/recog.c b/gcc/recog.c
index 5790a58a9114..8c098cf5b0fe 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -3943,9 +3943,19 @@ make_pass_split_after_reload (gcc::context *ctxt)
   return new pass_split_after_reload (ctxt);
 }
 
+static bool
+enable_split_before_sched2 (void)
+{
+#ifdef INSN_SCHEDULING
+  return optimize > 0 && flag_schedule_insns_after_reload;
+#else
+  return false;
+#endif
+}
+
 namespace {
 
-const pass_data pass_data_split_before_regstack =
+const pass_data pass_data_split_before_sched2 =
 {
   RTL_PASS, /* type */
   "split3", /* name */
@@ -3958,61 +3968,38 @@ const pass_data pass_data_split_before_regstack =
   0, /* todo_flags_finish */
 };
 
-class pass_split_before_regstack : public rtl_opt_pass
+class pass_split_before_sched2 : public rtl_opt_pass
 {
 public:
-  pass_split_before_regstack (gcc::context *ctxt)
-: rtl_opt_pass (pass_data_split_before_regstack, ctxt)
+  pass_split_before_sched2 (gcc::context *ctxt)
+: rtl_opt_pass (pass_data_split_before_sched2, ctxt)
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *);
+  virtual bool gate (function *)
+{
+  return enable_split_before_sched2 ();
+}
+
   virtual unsigned int execute (function *)
 {
   split_all_insns ();
   return 0;
 }
 
-}; // class pass_split_before_regstack
-
-bool
-pass_split_before_regstack::gate (function *)
-{
-#if HAVE_ATTR_length && defined (STACK_REGS)
-  /* If flow2 creates new instructions which need splitting
- and scheduling after reload is not done, they might not be
- split until final which doesn't allow splitting
- if HAVE_ATTR_length.  */
-# ifdef INSN_SCHEDULING
-  return !optimize || !flag_schedule_insns_after_reload;
-# else
-  return true;
-# endif
-#else
-  return false;
-#endif
-}
+}; // class pass_split_before_sched2
 
 } // anon namespace
 
 rtl_opt_pass *
-make_pass_split_before_regstack (gcc::context *ctxt)
-{
-  return new pass_split_before_regstack (ctxt);
-}
-
-static unsigned int
-rest_of_handle_split_before_sched2 (void)
+make_pass_split_before_sched2 (gcc::context *ctxt)
 {
-#ifdef INSN_SCHEDULING
-  split_all_insns ();
-#endif
-  return 0;
+  return new pass_split_before_sched2 (ctxt);
 }
 
 namespace {
 

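The renamed `split3` pass flips its condition. The old `pass_split_before_regstack` gate (on targets with `HAVE_ATTR_length` and `STACK_REGS`, with `INSN_SCHEDULING` defined) ran only when scheduling after reload was *not* enabled. The new `enable_split_before_sched2` returns true only when it *is* enabled. The sketch below models both gates as plain functions of `optimize` and `flag_schedule_insns_after_reload` so they can be compared. The function names are hypothetical, and the target macros are assumed to be defined as stated above.

```cpp
// Hypothetical model of the two gates in the recog.c diff, assuming
// INSN_SCHEDULING is defined and HAVE_ATTR_length && STACK_REGS hold.

// Old pass_split_before_regstack::gate: split only when sched2 will NOT run.
constexpr bool old_regstack_gate(int optimize, bool sched_after_reload)
{
  return !optimize || !sched_after_reload;
}

// New enable_split_before_sched2: split only when sched2 WILL run.
constexpr bool new_sched2_gate(int optimize, bool sched_after_reload)
{
  return optimize > 0 && sched_after_reload;
}
```

For any non-negative `optimize`, the two predicates are exact complements.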
Re: [PATCH coroutines] Change lowering behavior of alias variable from copy to substitute

2020-02-06 Thread JunMa

On 2020/2/6 5:12 PM, Iain Sandoe wrote:

Hi JunMa,

JunMa  wrote:


On 2020/2/4 8:17 PM, JunMa wrote:

Hi
When testing coroutines with lambda functions, I found that we copy each
captured variable to the frame. However, according to the gimplify pass,
each declaration that is an alias for another expression (DECL_VALUE_EXPR)
can be substituted directly.

Since lambda captured variables are of this kind, it is better to replace
them than to copy them. This reduces the frame size (all of the captured
variables are fields of the closure class) and avoids the extra copies as
well.

This patch removes all of the code related to copying captured variables.
Instead, we first rewrite DECL_VALUE_EXPR using the frame field, then check
whether each variable has a DECL_VALUE_EXPR and, if so, substitute it;
this patch does not create frame fields for such variables.

Bootstrapped and tested on x86_64; is it OK?



minor update: only handle VAR_DECLs when iterating over BIND_EXPR_VARS
in register_local_var_uses.


Do you have any other local patches applied along with this?

Testing locally (on Darwin), I see regressions at optimisation levels
O2/O3/Os, e.g.:


class-05-lambda-capture-copy-local.C   -O2  (internal compiler error)
class-06-lambda-capture-ref.C   -O2  (internal compiler error)
lambda-05-capture-copy-local.C   -O2  (internal compiler error)
lambda-06-multi-capture.C   -O2  (internal compiler error)
lambda-07-multi-capture.C   -O2  (internal compiler error)
lambda-08-co-ret-parm-ref.C   -O3 -g  (internal compiler error)

I have applied this to master, and on top of the patches posted by you and
Bin, but the results are the same.


+Bin
This is a known issue which has been fixed by Bin; he will send the patch.

Regards
JunMa

thanks
Iain


gcc/cp
2020-02-04  Jun Ma 

    * coroutines.cc (morph_fn_to_coro): Remove code related to
    copy captured variable.
    (register_local_var_uses):  Ditto.
    (register_param_uses):  Collect use of parameters inside
    DECL_VALUE_EXPR.
    (transform_local_var_uses): Substitute the alias variable
    with DECL_VALUE_EXPR if it has one.


gcc/testsuite
2020-02-04  Jun Ma 

    * g++.dg/coroutines/lambda-07-multi-capture.C: New test.



<0001-fix-alias-variable.patch>






Re: [PATCH coroutines] Change lowering behavior of alias variable from copy to substitute

2020-02-06 Thread Bin.Cheng
On Thu, Feb 6, 2020 at 5:12 PM Iain Sandoe  wrote:
>
> Hi JunMa,
>
> JunMa  wrote:
>
> > On 2020/2/4 8:17 PM, JunMa wrote:
> >> Hi
> >> When testing coroutines with lambda functions, I found that we copy each
> >> captured variable to the frame. However, according to the gimplify pass,
> >> each declaration that is an alias for another expression
> >> (DECL_VALUE_EXPR) can be substituted directly.
> >>
> >> Since lambda captured variables are of this kind, it is better to
> >> replace them than to copy them. This reduces the frame size (all of the
> >> captured variables are fields of the closure class) and avoids the extra
> >> copies as well.
> >>
> >> This patch removes all of the code related to copying captured
> >> variables. Instead, we first rewrite DECL_VALUE_EXPR using the frame
> >> field, then check whether each variable has a DECL_VALUE_EXPR and, if
> >> so, substitute it; this patch does not create frame fields for such
> >> variables.
> >>
> >> Bootstrapped and tested on x86_64; is it OK?
>
> > minor update: only handle VAR_DECLs when iterating over BIND_EXPR_VARS
> > in register_local_var_uses.
>
> Do you have any other local patches applied along with this?
>
> Testing locally (on Darwin), I see regressions at optimisation levels
> O2/O3/Os, e.g.:
>
> class-05-lambda-capture-copy-local.C   -O2  (internal compiler error)
> class-06-lambda-capture-ref.C   -O2  (internal compiler error)
> lambda-05-capture-copy-local.C   -O2  (internal compiler error)
> lambda-06-multi-capture.C   -O2  (internal compiler error)
> lambda-07-multi-capture.C   -O2  (internal compiler error)
> lambda-08-co-ret-parm-ref.C   -O3 -g  (internal compiler error)
>
> I have applied this to master, and on top of the patches posted by you and
> Bin, but the results are the same.
Hi Iains,

Thanks for helping.
Yes, there will be another patch fixing the O2/O3 issues.  Will send
it out for review soon.

Thanks,
bin
>
> thanks
> Iain
>
> >> gcc/cp
> >> 2020-02-04  Jun Ma 
> >>
> >> * coroutines.cc (morph_fn_to_coro): Remove code related to
> >> copy captured variable.
> >> (register_local_var_uses):  Ditto.
> >> (register_param_uses):  Collect use of parameters inside
> >> DECL_VALUE_EXPR.
> >> (transform_local_var_uses): Substitute the alias variable
> >> with DECL_VALUE_EXPR if it has one.
> >>
> >>
> >> gcc/testsuite
> >> 2020-02-04  Jun Ma 
> >>
> >> * g++.dg/coroutines/lambda-07-multi-capture.C: New test.
> >
> >
> > <0001-fix-alias-variable.patch>
>
>


[PATCH 3/4] Put index check before use.

2020-02-06 Thread Martin Liska

liboffloadmic/ChangeLog:

2020-02-04  Martin Liska  

PR other/89860.
* runtime/offload_target.cpp: Put index check
before its use.
---
 liboffloadmic/runtime/offload_target.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/liboffloadmic/runtime/offload_target.cpp b/liboffloadmic/runtime/offload_target.cpp
index 8273faac13b..16ba4a32991 100644
--- a/liboffloadmic/runtime/offload_target.cpp
+++ b/liboffloadmic/runtime/offload_target.cpp
@@ -329,7 +329,7 @@ void OffloadDescriptor::merge_var_descs(
 }
 }
 // instead of m_vars[i].type.dst we will use m_vars_extra[i].type_dst
-if (m_vars[i].type.dst == c_extended_type && i < vars_total) {
+if (i < vars_total && m_vars[i].type.dst == c_extended_type) {
 VarDescExtendedType *etype =
reinterpret_cast<VarDescExtendedType*>(vars[i].into);
 m_vars_extra[i].type_dst = etype->extended_type;

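The reordering matters because `&&` evaluates left to right and stops at the first false operand. With the bounds test first, `m_vars[i]` is never read when `i` is out of range. The sketch below uses a plain array and a hypothetical type code in place of the liboffloadmic types.

```cpp
#include <cstddef>

// Hypothetical stand-in for the c_extended_type type code.
constexpr int kExtendedType = 7;

// Returns true if element I exists and carries the extended type.
// The bounds test comes first: when i >= n, && short-circuits and
// types[i] is never evaluated.
constexpr bool is_extended(const int *types, std::size_t n, std::size_t i)
{
  return i < n && types[i] == kExtendedType;
}

constexpr int kTypes[] = {1, 7, 3};
```

Evaluating `is_extended(kTypes, 3, 5)` in a constant expression also acts as a check. With the operands in the old order, it would read past the array, and the compiler would reject it.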

[PATCH 1/4] Remove 2 dead variables in bid_internal.h.

2020-02-06 Thread Martin Liska

libgcc/config/libbid/ChangeLog:

2020-02-04  Martin Liska  

PR libgcc/92565
* bid_internal.h (handle_UF_128_rem): Remove unused variable.
(handle_UF_128): Likewise.
---
 libgcc/config/libbid/bid_internal.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/libgcc/config/libbid/bid_internal.h b/libgcc/config/libbid/bid_internal.h
index cef36a9bb80..9baa098caac 100644
--- a/libgcc/config/libbid/bid_internal.h
+++ b/libgcc/config/libbid/bid_internal.h
@@ -1540,8 +1540,6 @@ handle_UF_128_rem (UINT128 * pres, UINT64 sgn, int expon, UINT128 CQ,
 __shr_128 (CQ, Qh, amount);
   }
 
-  expon = 0;
-
 #ifndef IEEE_ROUND_NEAREST_TIES_AWAY
 #ifndef IEEE_ROUND_NEAREST
   if (!(*prounding_mode))
@@ -1676,8 +1674,6 @@ handle_UF_128 (UINT128 * pres, UINT64 sgn, int expon, UINT128 CQ,
 __shr_128 (CQ, Qh, amount);
   }
 
-  expon = 0;
-
 #ifndef IEEE_ROUND_NEAREST_TIES_AWAY
 #ifndef IEEE_ROUND_NEAREST
   if (!(*prounding_mode))

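The removed `expon = 0;` lines are dead stores. `expon` is a by-value parameter, so the caller's variable never sees the assignment, and according to the patch nothing in the function reads it afterwards. The sketch below shows the same pattern using a hypothetical function, not the libbid code.

```cpp
// Hypothetical illustration of a dead store to a by-value parameter.
constexpr int scale(int value, int expon)
{
  int result = value;
  for (int i = 0; i < expon; ++i)
    result *= 10;
  expon = 0;  // dead store: expon is a local copy and is not read below
  return result;
}

constexpr int caller_expon()
{
  int e = 3;
  (void) scale(2, e);
  return e;  // still 3: the callee only modified its own copy
}
```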

[PATCH 4/4] Use const for template argument.

2020-02-06 Thread Martin Liska

libstdc++-v3/ChangeLog:

2020-02-04  Martin Liska  

PR c/92472.
* include/parallel/multiway_merge.h:
Use const for _Compare template argument.
---
 libstdc++-v3/include/parallel/multiway_merge.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/parallel/multiway_merge.h b/libstdc++-v3/include/parallel/multiway_merge.h
index 983c7b2bd9a..97a9ce0feb7 100644
--- a/libstdc++-v3/include/parallel/multiway_merge.h
+++ b/libstdc++-v3/include/parallel/multiway_merge.h
@@ -118,7 +118,7 @@ namespace __gnu_parallel
*  @return @c true if less. */
   friend bool
   operator<(_GuardedIterator<_RAIter, _Compare>& __bi1,
-		_GuardedIterator<_RAIter, _Compare>& __bi2)
+		_GuardedIterator<_RAIter, const _Compare>& __bi2)
   {
 	if (__bi1._M_current == __bi1._M_end)   // __bi1 is sup
 	  return __bi2._M_current == __bi2._M_end;  // __bi2 is not sup
@@ -188,7 +188,7 @@ namespace __gnu_parallel
*  @return @c true if less. */
   friend bool
   operator<(_UnguardedIterator<_RAIter, _Compare>& __bi1,
-		_UnguardedIterator<_RAIter, _Compare>& __bi2)
+		_UnguardedIterator<_RAIter, const _Compare>& __bi2)
   {
 	// Normal compare.
 	return (__bi1.__comp)(*__bi1, *__bi2);


[PATCH 2/4] Use const for some function arguments.

2020-02-06 Thread Martin Liska

gcc/ChangeLog:

2020-02-04  Martin Liska  

PR c/92472.
* alloc-pool.h: Use const for some arguments.
* bitmap.h: Likewise.
* mem-stats.h: Likewise.
* sese.h (get_entry_bb): Likewise.
(get_exit_bb): Likewise.
---
 gcc/alloc-pool.h | 2 +-
 gcc/bitmap.h | 2 +-
 gcc/mem-stats.h  | 4 ++--
 gcc/sese.h   | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
index 1686a8b5f91..fd7194bfea4 100644
--- a/gcc/alloc-pool.h
+++ b/gcc/alloc-pool.h
@@ -60,7 +60,7 @@ public:
 
   /* Dump usage coupled to LOC location, where TOTAL is sum of all rows.  */
   inline void
-  dump (mem_location *loc, mem_usage &total) const
+  dump (mem_location *loc, const mem_usage &total) const
   {
 char *location_string = loc->to_string ();
 
diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index d52fd5bb905..b481f4b2606 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -237,7 +237,7 @@ public:
 
   /* Dump usage coupled to LOC location, where TOTAL is sum of all rows.  */
   inline void
-  dump (mem_location *loc, mem_usage &total) const
+  dump (mem_location *loc, const mem_usage &total) const
   {
 char *location_string = loc->to_string ();
 
diff --git a/gcc/mem-stats.h b/gcc/mem-stats.h
index 21d038bb370..4a3177dd4fc 100644
--- a/gcc/mem-stats.h
+++ b/gcc/mem-stats.h
@@ -70,7 +70,7 @@ public:
 
   /* Return true if the memory location is equal to OTHER.  */
   int
-  equal (mem_location &other)
+  equal (const mem_location &other)
   {
 return m_filename == other.m_filename && m_function == other.m_function
   && m_line == other.m_line;
@@ -203,7 +203,7 @@ public:
 
   /* Dump usage coupled to LOC location, where TOTAL is sum of all rows.  */
   inline void
-  dump (mem_location *loc, mem_usage &total) const
+  dump (mem_location *loc, const mem_usage &total) const
   {
 char *location_string = loc->to_string ();
 
diff --git a/gcc/sese.h b/gcc/sese.h
index 8afea28b07e..74d3fe3cd8a 100644
--- a/gcc/sese.h
+++ b/gcc/sese.h
@@ -45,7 +45,7 @@ void dump_sese (const sese_l &);
 /* Get the entry of an sese S.  */
 
 static inline basic_block
-get_entry_bb (sese_l &s)
+get_entry_bb (const sese_l &s)
 {
   return s.entry->dest;
 }
@@ -53,7 +53,7 @@ get_entry_bb (sese_l &s)
 /* Get the exit of an sese S.  */
 
 static inline basic_block
-get_exit_bb (sese_l &s)
+get_exit_bb (const sese_l &s)
 {
   return s.exit->src;
 }

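The `const &` parameters let callers pass const objects and temporaries, which a non-const lvalue reference cannot bind to. The sketch below uses a hypothetical location type, not GCC's `mem_location`. Its `equal` member is also marked `const`; that part is an addition for the illustration only.

```cpp
// Hypothetical stand-in for a location record such as mem_location.
struct location
{
  int file_id;
  int line;

  // Taking OTHER by const reference lets callers compare against
  // const objects and temporaries.
  constexpr bool equal(const location &other) const
  {
    return file_id == other.file_id && line == other.line;
  }
};

constexpr location kBitmapDump{1, 237};

// With a non-const 'location &other' parameter this would not compile,
// because a temporary cannot bind to a non-const lvalue reference.
constexpr bool kSame = kBitmapDump.equal(location{1, 237});
```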

[PATCH 0/4] Fix various minor issues seen with cppcheck

2020-02-06 Thread Martin Liska
Hi.

The series is about small issues that were spotted with cppcheck
and where David Binderman suggested a patch.

It's probably stage 1 material?

Martin

Martin Liska (4):
  Remove 2 dead variables in bid_internal.h.
  Use const for some function arguments.
  Put index check before use.
  Use const for template argument.

 gcc/alloc-pool.h   | 2 +-
 gcc/bitmap.h   | 2 +-
 gcc/mem-stats.h| 4 ++--
 gcc/sese.h | 4 ++--
 libgcc/config/libbid/bid_internal.h            | 4 ----
 liboffloadmic/runtime/offload_target.cpp   | 2 +-
 libstdc++-v3/include/parallel/multiway_merge.h | 4 ++--
 7 files changed, 9 insertions(+), 13 deletions(-)

-- 
2.25.0



Re: [PATCH 2/3] libstdc++: Implement C++20 constrained algorithms

2020-02-06 Thread Jonathan Wakely

On 05/02/20 14:24 -0500, Patrick Palka wrote:

Also IIRC, the way __miter_base() is currently defined assumes that the
underlying iterator is copyable which is not necessarily true anymore
for non-forward iterators.  So I would have to also fix __miter_base()
which might be risky to do at this stage.


Agreed. The current patch only affects C++20, which makes it much less
risky.

I was thinking about how range algos interact with debug mode, and I
think we might want to take the opportunity to do things a bit
differently.

Just like if-constexpr allows algos to use different implementations
without tag-dispatching, we might be able to simplify how we deal with
debug iterators.

For example, instead of splitting every algo into foo and __foo parts
and making foo do the debug checks, then unwrap the debug iterators
and call __foo, we could just unwrap them and recursively call the
same function again:

template<typename _It>
  constexpr _It
  foo(_It __first, _It __last)
  {
    if constexpr (__is_debug_iter<_It>)
      {
	// do debug checks ...
	// and then work on unwrapped iterators:
	return std::__niter_wrap(__first,
				 foo(std::__niter_base(__first),
				     std::__niter_base(__last)));
      }

    // ...
  }

It's OK to use the functions that assume the iterators are copyable
here, because we know that our debug iterators are copyable.

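Here is a self-contained C++17 sketch of the unwrap-and-recurse idea. It uses a hypothetical `debug_iter` wrapper and an `is_debug_iter_v` trait in place of the real debug-mode machinery (`__niter_base`/`__niter_wrap`). It only shows the control flow, and the debug checks themselves are left as a comment.

```cpp
#include <type_traits>

// Hypothetical "debug" iterator wrapper; a real one would also track
// its container so the checks can validate the range.
template<typename It>
struct debug_iter
{
  It base;
};

template<typename T>
inline constexpr bool is_debug_iter_v = false;

template<typename It>
inline constexpr bool is_debug_iter_v<debug_iter<It>> = true;

// One function instead of a foo/__foo pair: for debug iterators, check,
// unwrap, recurse on the underlying iterators, and rewrap the result.
template<typename It, typename T>
constexpr It
my_find(It first, It last, const T &value)
{
  if constexpr (is_debug_iter_v<It>)
    {
      // ... debug checks on first/last would go here ...
      auto res = my_find(first.base, last.base, value);
      return It{res};
    }
  else
    {
      for (; first != last; ++first)
	if (*first == value)
	  break;
      return first;
    }
}

constexpr int kDigits[] = {3, 1, 4, 1, 5};
```

The `if constexpr` branch is discarded for plain iterators, so no separate helper or tag dispatch is needed.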
We should also consider when we even need debug checks for the algos
taking a range. In many cases, calling foo(vec) doesn't need to check
if the iterators are valid, because we know that ranges::begin(vec)
and ranges::end(vec) will call vec.begin() and vec.end() which are
valid. That won't always be true, because somebody could create an
invalid range by trying hard enough, but I think in many cases we can
assume that a range doesn't contain invalid iterators. However, since
they just forward to the overload taking a pair of iterators, we will
get the debug checks there anyway. But I don't think the overloads
taking a range should do any debug checks explicitly.

We can add debug assertions to subrange, and to range adaptors like
take_view and drop_view to prevent the creation of invalid ranges in
the first place, so that we can assume they're valid after that.

I'll talk to the Microsoft library team about this topic when I see
them next week. I assume they've already been thinking about it and
will probably have some useful input.




Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-06 Thread Richard Biener
On Thu, Feb 6, 2020 at 11:06 AM Richard Biener
 wrote:
>
> On Wed, Feb 5, 2020 at 4:55 PM Martin Sebor  wrote:
> >
> > On 2/5/20 1:19 AM, Richard Biener wrote:
> > > On Tue, Feb 4, 2020 at 11:02 PM Martin Sebor  wrote:
> > >>
> > >> On 2/4/20 2:31 PM, Jeff Law wrote:
> > >>> On Tue, 2020-02-04 at 13:08 -0700, Martin Sebor wrote:
> >  On 2/4/20 12:15 PM, Richard Biener wrote:
> > > On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law  
> > > wrote:
> > >> On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:
> > >>> On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  
> > >>> wrote:
> >  PR 93519 reports a false positive -Wrestrict issued for an inlined
> > >> call
> >  to strcpy that carefully guards against self-copying.  This is
> > >> caused
> >  by the caller's arguments substituted into the call during inlining
> > >> and
> >  before dead code elimination.
> > 
> >  The attached patch avoids this by removing -Wrestrict from the
> > >> folder
> >  and deferring folding perfectly overlapping (and so undefined)
> > >> calls
> >  to strcpy (and mempcpy, but not memcpy) until much later.  Calls to
> >  perfectly overlapping calls to memcpy are still folded early.
> > >>>
> > >>> Why do we bother to warn at all for this case?  Just DWIM here.
> > >> Warnings like
> > >>> this can be emitted from the analyzer?
> > >> They potentially can, but the analyzer is and will almost always
> > >> certainly be considerably slower.  I would not expect it to be used
> > >> nearly as much as the core compiler.
> > >>
> > >> Whether or not a particular warning makes sense in the core compiler 
> > >> or
> > >> analyzer would seem to me to depend on whether or not we can 
> > >> reasonably
> > >> issue warnings without interprocedural analysis.  double-free
> > >> realistically requires interprocedural analysis to be effective.  I'm
> > >> not sure Wrestrict really does.
> > >>
> > >>
> > >>> That is, I suggest to simply remove the bogus warning code from
> > >> folding
> > >>> (and _not_ fail the folding).
> > >> I haven't looked at the patch, but if we can get the warning out of 
> > >> the
> > >> folder that's certainly preferable.  And we could investigate 
> > >> deferring
> > >> self-copy removal.
> > >
> > > I think the issue is as usual, warning for code we'll later remove as 
> > > dead. Warning at folding is almost always premature.
> > 
> >  In this instance the code is reachable (or isn't obviously 
> >  unreachable).
> >  GCC doesn't remove it, but provides benign (and reasonable) semantics
> >  for it(*).  To me, that's one aspect of quality.  Letting the user know
> >  that the code is buggy is another.  I view that as at least as 
> >  important
> >  as folding the ill-effects away because it makes it possible to fix
> >  the problem so the code works correctly even with compilers that don't
> >  provide these benign semantics.
> > >>> If you look at the guts of what happens at the point where we issue the
> > >>> warning from within gimple_fold_builtin_strcpy we have:
> > >>>
> >  DCH_to_char (char * in, char * out, int collid)
> >  {
> >  int type;
> >  char * D.2148;
> >  char * dest;
> >  char * num;
> >  long unsigned int _4;
> >  char * _5;
> > 
> >  ;;   basic block 2, loop depth 0
> >  ;;pred:   ENTRY
> >  ;;succ:   4
> > 
> >  ;;   basic block 4, loop depth 0
> >  ;;pred:   2
> >  ;;succ:   5
> > 
> >  ;;   basic block 5, loop depth 0
> >  ;;pred:   4
> >  ;;succ:   6
> > 
> >  ;;   basic block 6, loop depth 0
> >  ;;pred:   5
> >  if (0 != 0)
> >    goto <bb 7>; [53.47%]
> >  else
> >    goto <bb 8>; [46.53%]
> >  ;;succ:   7
> >  ;;8
> > 
> >  ;;   basic block 7, loop depth 0
> >  ;;pred:   6
> >  strcpy (out_1(D), out_1(D));
> >  ;;succ:   8
> > 
> >  ;;   basic block 8, loop depth 0
> >  ;;pred:   6
> >  ;;7
> >  _4 = __builtin_strlen (out_1(D));
> >  _5 = out_1(D) + _4;
> >  __builtin_memcpy (_5, "foo", 4);
> >  ;;succ:   3
> > 
> >  ;;   basic block 3, loop depth 0
> >  ;;pred:   8
> >  return;
> >  ;;succ:   EXIT
> > 
> >  }
> > 
> > >>>
> > >>> Which shows the code is obviously unreachable in the case we're warning
> > >>> about.  You can't see this in the dumps because it's exposed by
> > >>> inlining, then cleaned up before writing the dump file.
> > >>
> > >> In the specific case of the bug the code is of course eliminated
> > >> because it's gua

Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-06 Thread Richard Biener
On Wed, Feb 5, 2020 at 4:55 PM Martin Sebor  wrote:
>
> On 2/5/20 1:19 AM, Richard Biener wrote:
> > On Tue, Feb 4, 2020 at 11:02 PM Martin Sebor  wrote:
> >>
> >> On 2/4/20 2:31 PM, Jeff Law wrote:
> >>> On Tue, 2020-02-04 at 13:08 -0700, Martin Sebor wrote:
>  On 2/4/20 12:15 PM, Richard Biener wrote:
> > On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law  
> > wrote:
> >> On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:
> >>> On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  wrote:
>  PR 93519 reports a false positive -Wrestrict issued for an inlined
> >> call
>  to strcpy that carefully guards against self-copying.  This is
> >> caused
>  by the caller's arguments substituted into the call during inlining
> >> and
>  before dead code elimination.
> 
>  The attached patch avoids this by removing -Wrestrict from the
> >> folder
>  and deferring folding perfectly overlapping (and so undefined)
> >> calls
>  to strcpy (and mempcpy, but not memcpy) until much later.  Calls to
>  perfectly overlapping calls to memcpy are still folded early.
> >>>
> >>> Why do we bother to warn at all for this case?  Just DWIM here.
> >> Warnings like
> >>> this can be emitted from the analyzer?
> >> They potentially can, but the analyzer is and will almost always
> >> certainly be considerably slower.  I would not expect it to be used
> >> nearly as much as the core compiler.
> >>
> >> Whether or not a particular warning makes sense in the core compiler or
> >> analyzer would seem to me to depend on whether or not we can reasonably
> >> issue warnings without interprocedural analysis.  double-free
> >> realistically requires interprocedural analysis to be effective.  I'm
> >> not sure Wrestrict really does.
> >>
> >>
> >>> That is, I suggest to simply remove the bogus warning code from
> >> folding
> >>> (and _not_ fail the folding).
> >> I haven't looked at the patch, but if we can get the warning out of the
> >> folder that's certainly preferable.  And we could investigate deferring
> >> self-copy removal.
> >
> > I think the issue is as usual, warning for code we'll later remove as 
> > dead. Warning at folding is almost always premature.
> 
>  In this instance the code is reachable (or isn't obviously unreachable).
>  GCC doesn't remove it, but provides benign (and reasonable) semantics
>  for it(*).  To me, that's one aspect of quality.  Letting the user know
>  that the code is buggy is another.  I view that as at least as important
>  as folding the ill-effects away because it makes it possible to fix
>  the problem so the code works correctly even with compilers that don't
>  provide these benign semantics.
> >>> If you look at the guts of what happens at the point where we issue the
> >>> warning from within gimple_fold_builtin_strcpy we have:
> >>>
>  DCH_to_char (char * in, char * out, int collid)
>  {
>  int type;
>  char * D.2148;
>  char * dest;
>  char * num;
>  long unsigned int _4;
>  char * _5;
> 
>  ;;   basic block 2, loop depth 0
>  ;;pred:   ENTRY
>  ;;succ:   4
> 
>  ;;   basic block 4, loop depth 0
>  ;;pred:   2
>  ;;succ:   5
> 
>  ;;   basic block 5, loop depth 0
>  ;;pred:   4
>  ;;succ:   6
> 
>  ;;   basic block 6, loop depth 0
>  ;;pred:   5
>  if (0 != 0)
>    goto <bb 7>; [53.47%]
>  else
>    goto <bb 8>; [46.53%]
>  ;;succ:   7
>  ;;8
> 
>  ;;   basic block 7, loop depth 0
>  ;;pred:   6
>  strcpy (out_1(D), out_1(D));
>  ;;succ:   8
> 
>  ;;   basic block 8, loop depth 0
>  ;;pred:   6
>  ;;7
>  _4 = __builtin_strlen (out_1(D));
>  _5 = out_1(D) + _4;
>  __builtin_memcpy (_5, "foo", 4);
>  ;;succ:   3
> 
>  ;;   basic block 3, loop depth 0
>  ;;pred:   8
>  return;
>  ;;succ:   EXIT
> 
>  }
> 
> >>>
> >>> Which shows the code is obviously unreachable in the case we're warning
> >>> about.  You can't see this in the dumps because it's exposed by
> >>> inlining, then cleaned up before writing the dump file.
> >>
> >> In the specific case of the bug the code is of course eliminated
> >> because it's guarded by the if (s != d).  I was referring to
> >> the general (unguarded) case of:
> >>
> >> char *s = "", *p;
> >>
> >> int main (void)
> >> {
> >>   p = strcpy (s, s);
> >>   puts (p);
> >> }
> >>
> >> where GCC folds the assignment 'p = strcpy(s, s);' to effectively
> >> p = s;  That's perfectly reasonable but it could equally as well
> >> leave

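For context, the PR 93519 pattern is a copy that is guarded against self-copying. `strcpy` with overlapping source and destination is undefined, so the call is made only when the pointers differ. After inlining with identical arguments, the guarded call becomes unreachable, which is why warning about it from the folder is premature. The sketch below shows such a guard; the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copy SRC into DST unless they are the same buffer.
// strcpy requires non-overlapping objects; this guard only handles the
// exact self-copy case (dst == src), not partial overlap.
char *copy_unless_same(char *dst, const char *src)
{
  assert(dst != nullptr && src != nullptr);
  if (dst != src)
    std::strcpy(dst, src);
  return dst;
}
```

When this is inlined with `dst == src`, the condition folds to false and the `strcpy` call is dead code.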
Re: [PATCH] Use a non-empty test program to test ability to link

2020-02-06 Thread Richard Sandiford
Sandra Loosemore  writes:
> This patch is for PR 79193 and 88999, problems where libstdc++ is 
> mis-configuring itself when building for a bare-metal target because it 
> thinks it can link programs without pulling in the BSP that provides 
> low-level I/O support.  (Specifically, this was observed on nios2-elf 
> with Newlib and GDB semihosting.)  It'll build just fine if it 
> recognizes that it can only compile programs and not link them, but it's 
> confused because using an empty program to test for ability to link 
> succeeds.
>
> Is this configure change OK, and suitable for stage 4?
>
> BTW, I did run autoconf in every subdirectory that contains a 
> configure.ac, but it appears only libstdc++-v3 actually uses this test; 
> all the other regenerated configure scripts were unchanged.

LGTM FWIW.  AIUI the effect will be to use the hard-coded list of
supported newlib features (as intended) instead of trying to detect them
at configure time.  But I guess there's a risk that this could unmask
latent problems on other targets by enabling features that some BSPs don't
support, either due to a limited libgloss or due to some other restriction.
Jeff's autotesters will probably pick that up though.

Thanks,
Richard

>
> -Sandra
>
> From 44b769a9b5e01a58c9b89b24ca5a00fc1ff53012 Mon Sep 17 00:00:00 2001
> From: Sandra Loosemore 
> Date: Wed, 5 Feb 2020 10:03:58 -0800
> Subject: [PATCH] Use a non-empty test program to test ability to link.
>
> On bare-metal targets, I/O support is typically provided by a BSP and
> requires a linker script and/or hosting library to be specified on the
> linker command line.  Linking an empty program with the default linker
> script may succeed, however, which confuses libstdc++ configuration
> when programs that probe for the presence of various I/O features fail
> with link errors.
>
> 2020-02-05  Sandra Loosemore  
>
>   PR libstdc++/79193
>   PR libstdc++/88999
>
>   config/
>   * no-executables.m4: Use a non-empty program to test for linker
>   support.
>
>   libstdc++-v3/
>   * configure: Regenerated.
> ---
>  config/ChangeLog | 8 
>  config/no-executables.m4 | 4 +++-
>  libstdc++-v3/ChangeLog   | 7 +++
>  libstdc++-v3/configure   | 4 ++--
>  4 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/config/ChangeLog b/config/ChangeLog
> index f1fec81..d2a12bd 100644
> --- a/config/ChangeLog
> +++ b/config/ChangeLog
> @@ -1,3 +1,11 @@
> +2020-02-05  Sandra Loosemore  
> +
> + PR libstdc++/79193
> + PR libstdc++/88999
> +
> + * no-executables.m4: Use a non-empty program to test for linker
> + support.
> +
>  2020-02-01  Andrew Burgess  
>  
>   * lib-link.m4 (AC_LIB_LINKFLAGS_BODY): Update shell syntax.
> diff --git a/config/no-executables.m4 b/config/no-executables.m4
> index 9061624..6842f84 100644
> --- a/config/no-executables.m4
> +++ b/config/no-executables.m4
> @@ -25,7 +25,9 @@ AC_BEFORE([$0], [_AC_COMPILER_EXEEXT])
>  AC_BEFORE([$0], [AC_LINK_IFELSE])
>  
>  m4_define([_AC_COMPILER_EXEEXT],
> -[AC_LANG_CONFTEST([AC_LANG_PROGRAM()])
> +[AC_LANG_CONFTEST([AC_LANG_PROGRAM(
> +  [#include <stdio.h>],
> +  [printf ("hello world\n");])])
>  # FIXME: Cleanup?
>  AS_IF([AC_TRY_EVAL(ac_link)], [gcc_no_link=no], [gcc_no_link=yes])
>  if test x$gcc_no_link = xyes; then
> diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
> index 76a6e2b..46ab7c0 100644
> --- a/libstdc++-v3/ChangeLog
> +++ b/libstdc++-v3/ChangeLog
> @@ -1,3 +1,10 @@
> +2020-02-05  Sandra Loosemore  
> +
> + PR libstdc++/79193
> + PR libstdc++/88999
> +
> + * configure: Regenerated.
> +
>  2020-02-05  Jonathan Wakely  
>  
>   * include/bits/iterator_concepts.h (iter_reference_t)
> diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
> index a39c33b..9f9c5a2 100755
> --- a/libstdc++-v3/configure
> +++ b/libstdc++-v3/configure
> @@ -4130,11 +4130,11 @@ done
>  
>  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
>  /* end confdefs.h.  */
> -
> +#include <stdio.h>
>  int
>  main ()
>  {
> -
> +printf ("hello world\n");
>;
>return 0;
>  }


Re: [PATCH coroutines] Change lowering behavior of alias variable from copy to substitute

2020-02-06 Thread Iain Sandoe

Hi JunMa,

JunMa  wrote:


On 2020/2/4 8:17 PM, JunMa wrote:

Hi
When testing coroutines with lambda functions, I found that we copy each
captured variable to the frame. However, according to the gimplify pass,
each declaration that is an alias for another expression (DECL_VALUE_EXPR)
can be substituted directly.

Since lambda captured variables are of this kind, it is better to replace
them than to copy them. This reduces the frame size (all of the captured
variables are fields of the closure class) and avoids the extra copies as
well.

This patch removes all of the code related to copying captured variables.
Instead, we first rewrite DECL_VALUE_EXPR using the frame field, then check
whether each variable has a DECL_VALUE_EXPR and, if so, substitute it;
this patch does not create frame fields for such variables.

Bootstrapped and tested on x86_64; is it OK?



minor update: only handle VAR_DECLs when iterating over BIND_EXPR_VARS
in register_local_var_uses.


Do you have any other local patches applied along with this?

Testing locally (on Darwin), I see regressions at optimisation levels
O2/O3/Os, e.g.:


class-05-lambda-capture-copy-local.C   -O2  (internal compiler error)
class-06-lambda-capture-ref.C   -O2  (internal compiler error)
lambda-05-capture-copy-local.C   -O2  (internal compiler error)
lambda-06-multi-capture.C   -O2  (internal compiler error)
lambda-07-multi-capture.C   -O2  (internal compiler error)
lambda-08-co-ret-parm-ref.C   -O3 -g  (internal compiler error)

I have applied this to master, and on top of the patches posted by you and
Bin, but the results are the same.

thanks
Iain


gcc/cp
2020-02-04  Jun Ma 

* coroutines.cc (morph_fn_to_coro): Remove code related to
copy captured variable.
(register_local_var_uses):  Ditto.
(register_param_uses):  Collect use of parameters inside
DECL_VALUE_EXPR.
(transform_local_var_uses): Substitute the alias variable
with DECL_VALUE_EXPR if it has one.


gcc/testsuite
2020-02-04  Jun Ma 

* g++.dg/coroutines/lambda-07-multi-capture.C: New test.



<0001-fix-alias-variable.patch>




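The frame-size argument can be illustrated with a hypothetical model: plain structs, not GCC's actual coroutine frame layout. The model is a lambda coroutine whose closure captures `a` and `b`, and it assumes the frame already holds the closure pointer. Under the old lowering, each capture gets its own frame field initialized by copy. Under the proposed one, a use of a capture is replaced by its DECL_VALUE_EXPR (an access through the closure pointer), so those fields disappear.

```cpp
// Hypothetical model only; the real GCC frame layout differs.
struct closure { int a; int b; };  // the captured variables live here

// Old lowering: the frame keeps the closure pointer AND a copy of
// each captured variable.
struct frame_with_copies { const closure *self; int a; int b; };

// New lowering: uses of 'a' and 'b' are rewritten to self->a and
// self->b, so no per-capture frame field is needed.
struct frame_with_substitution { const closure *self; };

constexpr int body_old(const frame_with_copies &f) { return f.a + f.b; }

constexpr int body_new(const frame_with_substitution &f)
{
  return f.self->a + f.self->b;
}

constexpr closure kCaptures{1, 2};
```

Both bodies compute the same value, but the substituted frame needs no per-capture fields and no copies at coroutine start.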

Re: [PATCH] Use a non-empty test program to test ability to link

2020-02-06 Thread Jonathan Wakely

On 05/02/20 11:52 -0700, Sandra Loosemore wrote:
This patch is for PR 79193 and 88999, problems where libstdc++ is 
mis-configuring itself when building for a bare-metal target because 
it thinks it can link programs without pulling in the BSP that 
provides low-level I/O support.  (Specifically, this was observed on 
nios2-elf with Newlib and GDB semihosting.)  It'll build just fine if 
it recognizes that it can only compile programs and not link them, but 
it's confused because using an empty program to test for ability to 
link succeeds.


Is this configure change OK, and suitable for stage 4?

BTW, I did run autoconf in every subdirectory that contains a 
configure.ac, but it appears only libstdc++-v3 actually uses this test; 
all the other regenerated configure scripts were unchanged.


Thanks for making this work properly.

No objection from me, but I can't approve the top-level configury
change, and the whole "no executables" thing is a bit of a mystery to
me.  Joseph is probably the right person to approve that, if no other
relevant maintainer responds.




Re: [PATCH] i386: Improve avx* vector concatenation [PR93594]

2020-02-06 Thread Uros Bizjak
On Thu, Feb 6, 2020 at 9:34 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase shows that for _mm256_set*_m128i and similar
> intrinsics, we sometimes generate bad code.  All 4 routines are expressing
> the same thing, a 128-bit vector zero padded to 256-bit vector, but only the
> 3rd one actually emits the desired vmovdqa %xmm0, %xmm0 insn, the
> others vpxor %xmm1, %xmm1, %xmm1; vinserti128 $0x1, %xmm1, %ymm0, %ymm0
> The problem is that the cast builtins use UNSPEC_CAST which is after reload
> simplified using a splitter, but during combine it prevents optimizations.
> We do have avx_vec_concat* patterns that generate efficient code, both for
> this low part + zero concatenation special case and for other cases too, so
> the following define_insn_and_split just recognizes avx_vec_concat made of a
> low half of a cast and some other reg.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2020-02-06  Jakub Jelinek  
>
> PR target/93594
> * config/i386/predicates.md (avx_identity_operand): New predicate.
> * config/i386/sse.md (*avx_vec_concat_1): New
> define_insn_and_split.
>
> * gcc.target/i386/avx2-pr93594.c: New test.

LGTM.

Thanks,
Uros.

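The new `avx_identity_operand` predicate in the quoted patch accepts a selector PARALLEL only if element i equals i. In other words, the `vec_select` must take the low half of the vector unchanged. The sketch below applies the same loop to a plain array. The function name is hypothetical, and the RTL accessors (`XVECLEN`, `XVECEXP`, `INTVAL`) are replaced by ordinary indexing.

```cpp
#include <cstddef>

// Mirrors the loop in avx_identity_operand: every selector element
// must equal its own position.
constexpr bool is_identity_selector(const int *sel, std::size_t nelt)
{
  for (std::size_t i = 0; i < nelt; ++i)
    if (sel[i] != static_cast<int>(i))
      return false;
  return true;
}

// Selecting the low 4 elements of an 8-element vector is an identity
// permute of the low half; selecting the high half is not.
constexpr int kLowHalf[] = {0, 1, 2, 3};
constexpr int kHighHalf[] = {4, 5, 6, 7};
```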
> --- gcc/config/i386/predicates.md.jj	2020-01-12 11:54:36.331414646 +0100
> +++ gcc/config/i386/predicates.md   2020-02-05 17:44:44.663517106 +0100
> @@ -1584,6 +1584,19 @@ (define_predicate "palignr_operand"
>return true;
>  })
>
> +;; Return true if OP is a parallel for identity permute.
> +(define_predicate "avx_identity_operand"
> +  (and (match_code "parallel")
> +   (match_code "const_int" "a"))
> +{
> +  int i, nelt = XVECLEN (op, 0);
> +
> +  for (i = 0; i < nelt; ++i)
> +if (INTVAL (XVECEXP (op, 0, i)) != i)
> +  return false;
> +  return true;
> +})
> +
>  ;; Return true if OP is a proper third operand to vpblendw256.
>  (define_predicate "avx2_pblendw_operand"
>(match_code "const_int")
> --- gcc/config/i386/sse.md.jj   2020-02-05 15:38:06.636292475 +0100
> +++ gcc/config/i386/sse.md  2020-02-05 17:55:06.696352286 +0100
> @@ -21358,6 +21358,24 @@ (define_insn "avx_vec_concat"
> (set_attr "prefix" "maybe_evex")
> (set_attr "mode" "")])
>
> +(define_insn_and_split "*avx_vec_concat_1"
> +  [(set (match_operand:V_256_512 0 "register_operand")
> +   (vec_concat:V_256_512
> + (vec_select:
> +   (unspec:V_256_512
> + [(match_operand: 1 "nonimmediate_operand")]
> + UNSPEC_CAST)
> +   (match_parallel 3 "avx_identity_operand"
> + [(match_operand 4 "const_int_operand")]))
> + (match_operand: 2 "nonimm_or_0_operand")))]
> +  "TARGET_AVX
> +   && (operands[2] == CONST0_RTX (mode)
> +   || !MEM_P (operands[1]))
> +   && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0) (vec_concat:V_256_512 (match_dup 1) (match_dup 2)))])
> +
>  (define_insn "vcvtph2ps"
>[(set (match_operand:V4SF 0 "register_operand" "=v")
> (vec_select:V4SF
> --- gcc/testsuite/gcc.target/i386/avx2-pr93594.c.jj 2020-02-05 
> 17:59:33.470416968 +0100
> +++ gcc/testsuite/gcc.target/i386/avx2-pr93594.c2020-02-05 
> 18:06:20.703403613 +0100
> @@ -0,0 +1,32 @@
> +/* PR target/93594 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2 -masm=att" } */
> +/* { dg-final { scan-assembler-times "vmovdqa\t%xmm0, %xmm0" 4 } } */
> +/* { dg-final { scan-assembler-not "vpxor\t%" } } */
> +/* { dg-final { scan-assembler-not "vinserti128\t\\\$" } } */
> +
> +#include 
> +
> +__m256i
> +foo (__m128i x)
> +{
> +  return _mm256_setr_m128i (x, _mm_setzero_si128 ());
> +}
> +
> +__m256i
> +bar (__m128i x)
> +{
> +  return _mm256_set_m128i (_mm_setzero_si128 (), x);
> +}
> +
> +__m256i
> +baz (__m128i x)
> +{
> +  return _mm256_insertf128_si256 (_mm256_setzero_si256 (), x, 0);
> +}
> +
> +__m256i
> +qux (__m128i x)
> +{
> +  return _mm256_insertf128_si256 (_mm256_castsi128_si256 (x), 
> _mm_setzero_si128 (), 1);
> +}
>
> Jakub
>


[PATCH] i386: Improve avx* vector concatenation [PR93594]

2020-02-06 Thread Jakub Jelinek
Hi!

The following testcase shows that for _mm256_set*_m128i and similar
intrinsics, we sometimes generate bad code.  All 4 routines express
the same thing, a 128-bit vector zero-padded to a 256-bit vector, but only
the 3rd one actually emits the desired vmovdqa %xmm0, %xmm0 insn; the
others emit vpxor %xmm1, %xmm1, %xmm1; vinserti128 $0x1, %xmm1, %ymm0, %ymm0.
The problem is that the cast builtins use UNSPEC_CAST, which is simplified
by a splitter after reload but prevents optimizations during combine.
We do have avx_vec_concat* patterns that generate efficient code, both for
this low part + zero concatenation special case and for other cases too, so
the following define_insn_and_split just recognizes an avx_vec_concat made
of the low half of a cast and some other reg.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-02-06  Jakub Jelinek  

PR target/93594
* config/i386/predicates.md (avx_identity_operand): New predicate.
* config/i386/sse.md (*avx_vec_concat_1): New
define_insn_and_split.

* gcc.target/i386/avx2-pr93594.c: New test.

--- gcc/config/i386/predicates.md.jj   2020-01-12 11:54:36.331414646 +0100
+++ gcc/config/i386/predicates.md   2020-02-05 17:44:44.663517106 +0100
@@ -1584,6 +1584,19 @@ (define_predicate "palignr_operand"
   return true;
 })
 
+;; Return true if OP is a parallel for identity permute.
+(define_predicate "avx_identity_operand"
+  (and (match_code "parallel")
+       (match_code "const_int" "a"))
+{
+  int i, nelt = XVECLEN (op, 0);
+
+  for (i = 0; i < nelt; ++i)
+    if (INTVAL (XVECEXP (op, 0, i)) != i)
+      return false;
+  return true;
+})
+
 ;; Return true if OP is a proper third operand to vpblendw256.
 (define_predicate "avx2_pblendw_operand"
   (match_code "const_int")
--- gcc/config/i386/sse.md.jj   2020-02-05 15:38:06.636292475 +0100
+++ gcc/config/i386/sse.md  2020-02-05 17:55:06.696352286 +0100
@@ -21358,6 +21358,24 @@ (define_insn "avx_vec_concat"
(set_attr "prefix" "maybe_evex")
(set_attr "mode" "")])
 
+(define_insn_and_split "*avx_vec_concat_1"
+  [(set (match_operand:V_256_512 0 "register_operand")
+        (vec_concat:V_256_512
+          (vec_select:
+            (unspec:V_256_512
+              [(match_operand: 1 "nonimmediate_operand")]
+              UNSPEC_CAST)
+            (match_parallel 3 "avx_identity_operand"
+              [(match_operand 4 "const_int_operand")]))
+          (match_operand: 2 "nonimm_or_0_operand")))]
+  "TARGET_AVX
+   && (operands[2] == CONST0_RTX (mode)
+       || !MEM_P (operands[1]))
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (vec_concat:V_256_512 (match_dup 1) (match_dup 2)))])
+
 (define_insn "vcvtph2ps"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
(vec_select:V4SF
--- gcc/testsuite/gcc.target/i386/avx2-pr93594.c.jj 2020-02-05 17:59:33.470416968 +0100
+++ gcc/testsuite/gcc.target/i386/avx2-pr93594.c    2020-02-05 18:06:20.703403613 +0100
@@ -0,0 +1,32 @@
+/* PR target/93594 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2 -masm=att" } */
+/* { dg-final { scan-assembler-times "vmovdqa\t%xmm0, %xmm0" 4 } } */
+/* { dg-final { scan-assembler-not "vpxor\t%" } } */
+/* { dg-final { scan-assembler-not "vinserti128\t\\\$" } } */
+
+#include 
+
+__m256i
+foo (__m128i x)
+{
+  return _mm256_setr_m128i (x, _mm_setzero_si128 ());
+}
+
+__m256i
+bar (__m128i x)
+{
+  return _mm256_set_m128i (_mm_setzero_si128 (), x);
+}
+
+__m256i
+baz (__m128i x)
+{
+  return _mm256_insertf128_si256 (_mm256_setzero_si256 (), x, 0);
+}
+
+__m256i
+qux (__m128i x)
+{
+  return _mm256_insertf128_si256 (_mm256_castsi128_si256 (x), _mm_setzero_si128 (), 1);
+}

Jakub



openmp: Fix handling of non-addressable shared scalars in parallel nested inside of target [PR93515]

2020-02-06 Thread Jakub Jelinek
Hi!

As the following testcase shows, we need to consider even target to be a
construct that forces shared variables on a parallel nested inside it not to
use copy in/out.  E.g. for a parallel nested inside another parallel or inside
host teams, we already avoid copy in/out, and we need to treat target the same.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-02-06  Jakub Jelinek  

PR libgomp/93515
* omp-low.c (use_pointer_for_field): For nested constructs, also
look for map clauses on target construct.
(scan_omp_1_stmt) : Bump temporarily
taskreg_nesting_level.

* testsuite/libgomp.c-c++-common/pr93515.c: New test.

--- gcc/omp-low.c.jj   2020-01-12 11:54:36.688409260 +0100
+++ gcc/omp-low.c   2020-01-31 15:00:46.852168424 +0100
@@ -477,18 +477,30 @@ use_pointer_for_field (tree decl, omp_co
  omp_context *up;
 
  for (up = shared_ctx->outer; up; up = up->outer)
-   if (is_taskreg_ctx (up) && maybe_lookup_decl (decl, up))
+   if ((is_taskreg_ctx (up)
+        || (gimple_code (up->stmt) == GIMPLE_OMP_TARGET
+            && is_gimple_omp_offloaded (up->stmt)))
+       && maybe_lookup_decl (decl, up))
  break;
 
  if (up)
{
  tree c;
 
- for (c = gimple_omp_taskreg_clauses (up->stmt);
-      c; c = OMP_CLAUSE_CHAIN (c))
-   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SHARED
-       && OMP_CLAUSE_DECL (c) == decl)
-     break;
+ if (gimple_code (up->stmt) == GIMPLE_OMP_TARGET)
+   {
+     for (c = gimple_omp_target_clauses (up->stmt);
+          c; c = OMP_CLAUSE_CHAIN (c))
+       if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+           && OMP_CLAUSE_DECL (c) == decl)
+         break;
+   }
+ else
+   for (c = gimple_omp_taskreg_clauses (up->stmt);
+        c; c = OMP_CLAUSE_CHAIN (c))
+     if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SHARED
+         && OMP_CLAUSE_DECL (c) == decl)
+       break;
 
  if (c)
goto maybe_mark_addressable_and_ret;
@@ -3781,7 +3793,14 @@ scan_omp_1_stmt (gimple_stmt_iterator *g
   break;
 
 case GIMPLE_OMP_TARGET:
-  scan_omp_target (as_a  (stmt), ctx);
+  if (is_gimple_omp_offloaded (stmt))
+    {
+      taskreg_nesting_level++;
+      scan_omp_target (as_a  (stmt), ctx);
+      taskreg_nesting_level--;
+    }
+  else
+    scan_omp_target (as_a  (stmt), ctx);
   break;
 
 case GIMPLE_OMP_TEAMS:
--- libgomp/testsuite/libgomp.c-c++-common/pr93515.c.jj 2020-01-31 14:53:01.163112148 +0100
+++ libgomp/testsuite/libgomp.c-c++-common/pr93515.c    2020-01-31 14:52:38.627448474 +0100
@@ -0,0 +1,36 @@
+/* PR libgomp/93515 */
+
+#include 
+#include 
+
+int
+main ()
+{
+  int i;
+  int a = 42;
+#pragma omp target teams distribute parallel for defaultmap(tofrom: scalar)
+  for (i = 0; i < 64; ++i)
+    if (omp_get_team_num () == 0)
+      if (omp_get_thread_num () == 0)
+        a = 142;
+  if (a != 142)
+    __builtin_abort ();
+  a = 42;
+#pragma omp target parallel for defaultmap(tofrom: scalar)
+  for (i = 0; i < 64; ++i)
+    if (omp_get_thread_num () == 0)
+      a = 143;
+  if (a != 143)
+    __builtin_abort ();
+  a = 42;
+#pragma omp target firstprivate(a)
+  {
+#pragma omp parallel for
+    for (i = 0; i < 64; ++i)
+      if (omp_get_thread_num () == 0)
+        a = 144;
+    if (a != 144)
+      abort ();
+  }
+  return 0;
+}


Jakub



[PATCH] openmp: Notice reduction decl in outer contexts after adding it to shared [PR93515]

2020-02-06 Thread Jakub Jelinek
Hi!

If we call omp_add_variable, a following omp_notice_variable call will already
find the variable on that construct and will not go through the outer
constructs; the following patch fixes that.
Note, this still doesn't follow OpenMP 5.0 semantics for target combined with
other constructs with reduction/lastprivate/linear clauses; I will handle that
for GCC 11.

Without this patch, the patch I'll post next breaks the
c-c++-common/gomp/loop-5.c testcase.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-02-06  Jakub Jelinek  

PR libgomp/93515
* gimplify.c (gimplify_scan_omp_clauses) : If adding
shared clause, call omp_notice_variable on outer context if any.

--- gcc/gimplify.c.jj   2020-01-17 12:42:46.0 +0100
+++ gcc/gimplify.c  2020-02-05 15:25:25.316658638 +0100
@@ -9464,9 +9464,13 @@ gimplify_scan_omp_clauses (tree *list_p,
  == POINTER_TYPE
omp_firstprivatize_variable (outer_ctx, decl);
  else
-    omp_add_variable (outer_ctx, decl,
-                      GOVD_SEEN | GOVD_SHARED);
-  omp_notice_variable (outer_ctx, decl, true);
+    {
+      omp_add_variable (outer_ctx, decl,
+                        GOVD_SEEN | GOVD_SHARED);
+      if (outer_ctx->outer_context)
+        omp_notice_variable (outer_ctx->outer_context, decl,
+                             true);
+    }
}
}
  if (outer_ctx)


Jakub



Re: [PATCH] Add patch_area_size and patch_area_entry to crtl

2020-02-06 Thread Richard Sandiford
"H.J. Lu"  writes:
> On Wed, Feb 5, 2020 at 2:51 PM H.J. Lu  wrote:
>>
>> On Wed, Feb 5, 2020 at 2:37 PM Marek Polacek  wrote:
>> >
>> > On Wed, Feb 05, 2020 at 02:24:48PM -0800, H.J. Lu wrote:
>> > > On Wed, Feb 5, 2020 at 12:20 PM H.J. Lu  wrote:
>> > > >
>> > > > On Wed, Feb 5, 2020 at 9:00 AM Richard Sandiford
>> > > >  wrote:
>> > > > >
>> > > > > "H.J. Lu"  writes:
>> > > > > > Currently patchable area is at the wrong place.
>> > > > >
>> > > > > Agreed :-)
>> > > > >
>> > > > > > It is placed immediately
>> > > > > > after the function label and before .cfi_startproc.  A backend should be able
>> > > > > > to add a pseudo patchable area instruction directly into RTL.  This patch
>> > > > > > adds patch_area_size and patch_area_entry to cfun so that the patchable
>> > > > > > area info is available in RTL passes.
>> > > > >
>> > > > > It might be better to add it to crtl, since it should only be needed
>> > > > > during rtl generation.
>> > > > >
>> > > > > > It also limits patch_area_size and patch_area_entry to 65535, which is
>> > > > > > a reasonable maximum size for patchable area.
>> > > > > >
>> > > > > > gcc/
>> > > > > >
>> > > > > >   PR target/93492
>> > > > > >   * function.c (expand_function_start): Set cfun->patch_area_size
>> > > > > >   and cfun->patch_area_entry.
>> > > > > >   * function.h (function): Add patch_area_size and patch_area_entry.
>> > > > > >   * opts.c (common_handle_option): Limit
>> > > > > >   function_entry_patch_area_size and function_entry_patch_area_start
>> > > > > >   to USHRT_MAX.  Fix a typo in error message.
>> > > > > >   * varasm.c (assemble_start_function): Use cfun->patch_area_size
>> > > > > >   and cfun->patch_area_entry.
>> > > > > >   * doc/invoke.texi: Document the maximum value for
>> > > > > >   -fpatchable-function-entry.
>> > > > > >
>> > > > > > gcc/testsuite/
>> > > > > >
>> > > > > >   PR target/93492
>> > > > > >   * c-c++-common/patchable_function_entry-error-1.c: New test.
>> > > > > >   * c-c++-common/patchable_function_entry-error-2.c: Likewise.
>> > > > > >   * c-c++-common/patchable_function_entry-error-3.c: Likewise.
>> > > > > > ---
>> > > > > >  gcc/doc/invoke.texi                    |  1 +
>> > > > > >  gcc/function.c                         | 35 +++
>> > > > > >  gcc/function.h                         |  6
>> > > > > >  gcc/opts.c                             |  4 ++-
>> > > > > >  .../patchable_function_entry-error-1.c |  9 +
>> > > > > >  .../patchable_function_entry-error-2.c |  9 +
>> > > > > >  .../patchable_function_entry-error-3.c | 20 +++
>> > > > > >  gcc/varasm.c                           | 30 ++--
>> > > > > >  8 files changed, 85 insertions(+), 29 deletions(-)
>> > > > > >  create mode 100644 gcc/testsuite/c-c++-common/patchable_function_entry-error-1.c
>> > > > > >  create mode 100644 gcc/testsuite/c-c++-common/patchable_function_entry-error-2.c
>> > > > > >  create mode 100644 gcc/testsuite/c-c++-common/patchable_function_entry-error-3.c
>> > > > > >
>> > > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> > > > > > index 35b341e759f..dd4835199b0 100644
>> > > > > > --- a/gcc/doc/invoke.texi
>> > > > > > +++ b/gcc/doc/invoke.texi
>> > > > > > @@ -13966,6 +13966,7 @@ If @code{N=0}, no pad location is recorded.
>> > > > > >  The NOP instructions are inserted at---and maybe before, depending on
>> > > > > >  @var{M}---the function entry address, even before the prologue.
>> > > > > >
>> > > > > > +The maximum value of @var{N} and @var{M} is 65535.
>> > > > > >  @end table
>> > > > > >
>> > > > > >
>> > > > > > diff --git a/gcc/function.c b/gcc/function.c
>> > > > > > index d8008f60422..badbf538eec 100644
>> > > > > > --- a/gcc/function.c
>> > > > > > +++ b/gcc/function.c
>> > > > > > @@ -5202,6 +5202,41 @@ expand_function_start (tree subr)
>> > > > > >/* If we are doing generic stack checking, the probe should go here.  */
>> > > > > >if (flag_stack_check == GENERIC_STACK_CHECK)
>> > > > > >  stack_check_probe_note = emit_note (NOTE_INSN_DELETED);
>> > > > > > +
>> > > > > > +  unsigned HOST_WIDE_INT patch_area_size = function_entry_patch_area_size;
>> > > > > > +  unsigned HOST_WIDE_INT patch_area_entry = function_entry_patch_area_start;
>> > > > > > +
>> > > > > > +  tree patchable_function_entry_attr
>> > > > > > +    = lookup_attribute ("patchable_function_entry",
>> > > > > > +                        DECL_ATTRIBUTES (cfun->decl));
>> > > > > > +  if (patchable_function_entry_attr)
>> > > > > > +    {
>> > > > > > +      tree pp_val = TREE_VALUE (patchable_function_entry_attr);
>> > > > > > +      tree