[Bug c++/68689] flexible array members in unions accepted in C++

2015-12-03 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68689

Martin Sebor  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2015-12-04
   Assignee|unassigned at gcc dot gnu.org  |msebor at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Martin Sebor  ---
Working on a patch.

[Bug c++/68689] New: flexible array members in unions accepted in C++

2015-12-03 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68689

Bug ID: 68689
   Summary: flexible array members in unions accepted in C++
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msebor at gcc dot gnu.org
  Target Milestone: ---

GCC (in C mode) rejects flexible array members in unions.  G++ (i.e., in C++
mode) accepts them.  Since flexible array members are a G++ extension provided
for compatibility with C, and specifically GCC, G++ should accept and reject
the same constructs as GCC does.

$ cat z.cpp && /build/gcc-trunk-svn/gcc/xg++ -B /build/gcc-trunk-svn/gcc -S
-Wall -Wextra -Wpedantic -o/dev/null z.cpp
union U {
int n;
int a[];
} u;
z.cpp:3:11: warning: ISO C++ forbids zero-size array ‘a’ [-Wpedantic]
 int a[];
   ^
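
For comparison, the form that both the C standard (6.7.2.1) and GCC in C mode
accept is a flexible array member placed last in a struct that has at least
one other named member.  The sketch below is only an illustration; struct vec
and vec_new are hypothetical names, not taken from the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A struct (not a union) with a leading named member and a trailing
   flexible array member.  */
struct vec {
    int n;      /* number of elements stored in a[] */
    int a[];    /* flexible array member: must come last */
};

/* Allocate a vec with room for n ints, all zeroed.
   Returns NULL if the allocation fails.  */
struct vec *vec_new(int n)
{
    assert(n >= 0);
    struct vec *v = malloc(sizeof *v + (size_t)n * sizeof v->a[0]);
    if (v != NULL) {
        v->n = n;
        for (int i = 0; i < n; i++)
            v->a[i] = 0;
    }
    return v;
}
```

The caller owns the returned pointer and releases it with free().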

[Bug target/68690] New: PowerPC64: TOC save in PHP core loop results in load hit store

2015-12-03 Thread anton at samba dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68690

Bug ID: 68690
   Summary: PowerPC64: TOC save in PHP core loop results in load
hit store
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: anton at samba dot org
  Target Milestone: ---

We see a load hit store issue in the core loop of the PHP interpreter. A simple
test case:

$ cat test.c

void (*fn)(void);

void do_nothing(void) { }

int main(void)
{
unsigned long i;

fn = do_nothing;
for (i = 0; i < 10; i++)
fn();

return 0;
}

$ gcc -O2 -o test test.c

We continually save the TOC on each call:

13f0:   00 00 3e e9 ld  r9,0(r30)
13f4:   a6 03 29 7d mtctr   r9
13f8:   78 4b 2c 7d mr  r12,r9
13fc:   18 00 41 f8 std r2,24(r1) <-- save it again!
1400:   21 04 80 4e bctrl
1404:   18 00 41 e8 ld  r2,24(r1)
1408:   01 00 3f 2c cmpdi   r31,1
140c:   ff ff ff 3b addi    r31,r31,-1
1410:   e0 ff 82 40 bne 13f0 

This should be moved out of the loop. One way to force that is via the
-msave-toc-indirect option:

gcc -msave-toc-indirect -O2 -o test test.c

Which is over 2x faster on this (admittedly worst case) test. For something
more real world, we also see a 20% speedup on PHP7 on various microbenchmarks.

[Bug tree-optimization/68692] New: [graphite] ice: Segmentation fault

2015-12-03 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68692

Bug ID: 68692
   Summary: [graphite] ice: Segmentation fault
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: Joost.VandeVondele at mat dot ethz.ch
  Target Milestone: ---

> cat bug.f90
MODULE spme
  INTEGER, PARAMETER :: dp=8
  PRIVATE
  PUBLIC :: get_patch
CONTAINS
  SUBROUTINE get_patch ( part, box, green, npts, p, rhos, is_core, is_shell,&
 unit_charge, charges, coeff, n )
INTEGER, POINTER :: box
REAL(KIND=dp), &
  DIMENSION(-(n-1):n-1, 0:n-1), &
  INTENT(IN) :: coeff
INTEGER, DIMENSION(3), INTENT(IN):: npts
REAL(KIND=dp), DIMENSION(:, :, :), &
  INTENT(OUT):: rhos
REAL(KIND=dp):: q
REAL(KIND=dp), DIMENSION(3)  :: delta, r
CALL get_delta ( box, r, npts, delta, nbox )
CALL spme_get_patch ( rhos, nbox, delta, q, coeff )
  END SUBROUTINE get_patch
  SUBROUTINE spme_get_patch ( rhos, n, delta, q, coeff )
REAL(KIND=dp), DIMENSION(:, :, :), &
  INTENT(OUT):: rhos
REAL(KIND=dp), DIMENSION(3), INTENT(IN)  :: delta
REAL(KIND=dp), INTENT(IN):: q
REAL(KIND=dp), &
  DIMENSION(-(n-1):n-1, 0:n-1), &
  INTENT(IN) :: coeff
INTEGER, PARAMETER   :: nmax = 12
REAL(KIND=dp), DIMENSION(3, -nmax:nmax)  :: w_assign
REAL(KIND=dp), DIMENSION(3, 0:nmax-1):: deltal
REAL(KIND=dp), DIMENSION(3, 1:nmax)  :: f_assign
DO l = 1, n-1
   deltal ( 3, l ) = deltal ( 3, l-1 ) * delta ( 3 )
END DO
DO j = -(n-1), n-1, 2
   DO l = 0, n-1
  w_assign ( 1, j ) =  w_assign ( 1, j ) + &
 coeff ( j, l ) * deltal ( 1, l )
   END DO
   f_assign (3, i ) = w_assign ( 3, j )
   DO i2 = 1, n
  DO i1 = 1, n
 rhos ( i1, i2, i3 ) = r2 * f_assign ( 1, i1 )
  END DO
   END DO
END DO
  END SUBROUTINE spme_get_patch
  SUBROUTINE get_delta ( box, r, npts, delta, n )
INTEGER, POINTER :: box
REAL(KIND=dp), DIMENSION(3), INTENT(IN)  :: r
INTEGER, DIMENSION(3), INTENT(IN):: npts
REAL(KIND=dp), DIMENSION(3), INTENT(OUT) :: delta
INTEGER, DIMENSION(3):: center
REAL(KIND=dp), DIMENSION(3)  :: ca, grid_i, s
CALL real_to_scaled(s,r,box)
s = s - REAL ( NINT ( s ),KIND=dp)
IF ( MOD ( n, 2 ) == 0 ) THEN
   ca ( : ) = REAL ( center ( : ) )
END IF
delta ( : ) = grid_i ( : ) - ca ( : )
  END SUBROUTINE get_delta
END MODULE spme


> gfortran  -c -O3  -floop-nest-optimize  bug.f90
bug.f90:6:0:

   SUBROUTINE get_patch ( part, box, green, npts, p, rhos, is_core, is_shell,&


internal compiler error: Segmentation fault
0xb676cf crash_signal
../../gcc/gcc/toplev.c:334
0xbba647 ssa_default_def(function*, tree_node*)
../../gcc/gcc/tree-dfa.c:305
0xbbd088 get_or_create_ssa_default_def(function*, tree_node*)
../../gcc/gcc/tree-dfa.c:357
0xbf3e83 get_reaching_def
../../gcc/gcc/tree-into-ssa.c:1168
0xbf3e83 get_reaching_def
../../gcc/gcc/tree-into-ssa.c:1155
0xbf5dbe maybe_replace_use
../../gcc/gcc/tree-into-ssa.c:1753
0xbf5dbe rewrite_update_stmt
../../gcc/gcc/tree-into-ssa.c:1948
0xbf5dbe rewrite_update_dom_walker::before_dom_children(basic_block_def*)
../../gcc/gcc/tree-into-ssa.c:2128
0xbf5dbe rewrite_update_dom_walker::before_dom_children(basic_block_def*)
../../gcc/gcc/tree-into-ssa.c:2068
0x125a71a dom_walker::walk(basic_block_def*)
../../gcc/gcc/domwalk.c:176
0xbf28b5 rewrite_blocks
../../gcc/gcc/tree-into-ssa.c:2190
0xbf9a68 update_ssa(unsigned int)
../../gcc/gcc/tree-into-ssa.c:3351
0x128530a graphite_regenerate_ast_isl(scop*)
../../gcc/gcc/graphite-isl-ast-to-gimple.c:3271
0x127cea3 graphite_transform_loops()
../../gcc/gcc/graphite.c:336
0x127d370 graphite_transforms
../../gcc/gcc/graphite.c:363
0x127d370 execute
../../gcc/gcc/graphite.c:440
Please submit a full bug report,

> gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/data/vjoost/gnu/gcc_trunk/install/libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc/configure --prefix=/data/vjoost/gnu/gcc_trunk/install
--enable-languages=c,c++,fortran --disable-multilib --enable-plugins
--enable-lto --disable-bootstrap
Thread model: posix
gcc version 6.0.0 20151204 (experimental) [trunk revision 231243] (GCC)

[Bug tree-optimization/68529] scev failed for while(i--)

2015-12-03 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68529

amker at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from amker at gcc dot gnu.org ---
Fixed.

Re: [PATCH] Improve constant vec_perm expansion on i?86 (PR target/68655)

2015-12-03 Thread Uros Bizjak
On Thu, Dec 3, 2015 at 9:52 PM, Jakub Jelinek  wrote:
> Hi!
>
> As discussed in the PR, for some permutation we can get better code
> if we try to expand it as if it was a permutation in a mode with the
> same vector size, but wider vector element.  The first attempt to do this
> always had mixed results, lots of improvements, lots of pessimizations,
> this one at least on gcc.dg/vshuf*
> {-msse2,-msse4,-mavx,-mavx2,-mavx512f,-mavx512bw} shows only
> improvements - it tries the original permutation for single insn,
> if that doesn't work tries the wider one single insn, and then
> as complete fallback, if we don't have any expansion whatsoever, tries
> the wider one too.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2015-12-03  Jakub Jelinek  
>
> PR target/68655
> * config/i386/i386.c (canonicalize_vector_int_perm): New function.
> (expand_vec_perm_1): Use it and recurse if everything else
> failed.  Use nd.perm instead of perm2.
> (expand_vec_perm_even_odd_1): If testing_p, use gen_raw_REG
> instead of gen_lowpart for the target.
> (ix86_expand_vec_perm_const_1): Use canonicalize_vector_int_perm
> and recurse if everything else failed.
>
> * gcc.dg/torture/vshuf-4.inc (TESTS): Add one extra test.
> * gcc.dg/torture/vshuf-4.inc (TESTS): Add two extra tests.

OK for mainline.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2015-12-02 20:27:00.0 +0100
> +++ gcc/config/i386/i386.c  2015-12-03 15:03:13.415764986 +0100
> @@ -49365,6 +49365,57 @@ expand_vec_perm_pshufb (struct expand_ve
>return true;
>  }
>
> +/* For V*[QHS]Imode permutations, check if the same permutation
> +   can't be performed in a 2x, 4x or 8x wider inner mode.  */
> +
> +static bool
> +canonicalize_vector_int_perm (const struct expand_vec_perm_d *d,
> + struct expand_vec_perm_d *nd)
> +{
> +  int i;
> +  enum machine_mode mode = VOIDmode;
> +
> +  switch (d->vmode)
> +{
> +case V16QImode: mode = V8HImode; break;
> +case V32QImode: mode = V16HImode; break;
> +case V64QImode: mode = V32HImode; break;
> +case V8HImode: mode = V4SImode; break;
> +case V16HImode: mode = V8SImode; break;
> +case V32HImode: mode = V16SImode; break;
> +case V4SImode: mode = V2DImode; break;
> +case V8SImode: mode = V4DImode; break;
> +case V16SImode: mode = V8DImode; break;
> +default: return false;
> +}
> +  for (i = 0; i < d->nelt; i += 2)
> +if ((d->perm[i] & 1) || d->perm[i + 1] != d->perm[i] + 1)
> +  return false;
> +  nd->vmode = mode;
> +  nd->nelt = d->nelt / 2;
> +  for (i = 0; i < nd->nelt; i++)
> +nd->perm[i] = d->perm[2 * i] / 2;
> +  if (GET_MODE_INNER (mode) != DImode)
> +canonicalize_vector_int_perm (nd, nd);
> +  if (nd != d)
> +{
> +  nd->one_operand_p = d->one_operand_p;
> +  nd->testing_p = d->testing_p;
> +  if (d->op0 == d->op1)
> +   nd->op0 = nd->op1 = gen_lowpart (nd->vmode, d->op0);
> +  else
> +   {
> + nd->op0 = gen_lowpart (nd->vmode, d->op0);
> + nd->op1 = gen_lowpart (nd->vmode, d->op1);
> +   }
> +  if (d->testing_p)
> +   nd->target = gen_raw_REG (nd->vmode, LAST_VIRTUAL_REGISTER + 1);
> +  else
> +   nd->target = gen_reg_rtx (nd->vmode);
> +}
> +  return true;
> +}
> +
>  /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
> in a single instruction.  */
>
> @@ -49372,7 +49423,7 @@ static bool
>  expand_vec_perm_1 (struct expand_vec_perm_d *d)
>  {
>unsigned i, nelt = d->nelt;
> -  unsigned char perm2[MAX_VECT_LEN];
> +  struct expand_vec_perm_d nd;
>
>/* Check plain VEC_SELECT first, because AVX has instructions that could
>   match both SEL and SEL+CONCAT, but the plain SEL will allow a memory
> @@ -49385,10 +49436,10 @@ expand_vec_perm_1 (struct expand_vec_per
>
>for (i = 0; i < nelt; i++)
> {
> - perm2[i] = d->perm[i] & mask;
> - if (perm2[i] != i)
> + nd.perm[i] = d->perm[i] & mask;
> + if (nd.perm[i] != i)
> identity_perm = false;
> - if (perm2[i])
> + if (nd.perm[i])
> broadcast_perm = false;
> }
>
> @@ -49457,7 +49508,7 @@ expand_vec_perm_1 (struct expand_vec_per
> }
> }
>
> -  if (expand_vselect (d->target, d->op0, perm2, nelt, d->testing_p))
> +  if (expand_vselect (d->target, d->op0, nd.perm, nelt, d->testing_p))
> return true;
>
>/* There are plenty of patterns in sse.md that are written for
> @@ -49468,10 +49519,10 @@ expand_vec_perm_1 (struct expand_vec_per
>  every other permutation operand.  */
>for (i = 0; i < nelt; i += 2)
> {
> - perm2[i] = d->perm[i] & mask;
> - perm2[i + 1] = (d->perm[i + 1] & mask) + nelt;
> + nd.perm[i] = d->perm[i] & mask;
> + nd.perm[i + 1] 
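
The pair check in canonicalize_vector_int_perm can be tried out on plain
index arrays.  The sketch below is only an illustration in standalone C:
widen_perm_once, widen_perm and max_steps are hypothetical names, max_steps
stands in for the patch's stop at DImode elements, and none of the rtx
operand handling is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One widening step.  If every adjacent pair of selectors picks an aligned,
   adjacent pair of source elements, write the equivalent permutation over
   elements twice as wide to OUT (nelt / 2 entries) and return true.
   OUT may alias PERM.  */
bool widen_perm_once(const unsigned char *perm, size_t nelt,
                     unsigned char *out)
{
    if (nelt < 2 || nelt % 2 != 0)
        return false;
    for (size_t i = 0; i < nelt; i += 2)
        if ((perm[i] & 1) || perm[i + 1] != perm[i] + 1)
            return false;
    for (size_t i = 0; i < nelt / 2; i++)
        out[i] = perm[2 * i] / 2;
    return true;
}

/* Widen in place for as long as possible, at most max_steps times.
   Returns the number of selectors left in PERM.  */
size_t widen_perm(unsigned char *perm, size_t nelt, unsigned max_steps)
{
    assert(perm != NULL);
    while (max_steps-- > 0 && widen_perm_once(perm, nelt, perm))
        nelt /= 2;
    return nelt;
}
```

For example, the 8-element selector {4,5,6,7,0,1,2,3} widens to {2,3,0,1}
and then to {1,0}, while {0,2,1,3} cannot be widened at all.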

[Bug rtl-optimization/68691] New: ICE at -O3 with -g enabled on x86_64-linux-gnu in alter_subregs, at lra-spills.c:610 (in 32-bit mode)

2015-12-03 Thread su at cs dot ucdavis.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68691

Bug ID: 68691
   Summary: ICE at -O3 with -g enabled on x86_64-linux-gnu in
alter_subregs, at lra-spills.c:610 (in 32-bit mode)
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: su at cs dot ucdavis.edu
  Target Milestone: ---

The following code causes an ICE when compiled with the current gcc trunk at
-O3 with -g enabled on x86_64-linux-gnu in the 32-bit mode (but not in 64-bit
mode). 

It is a regression from 5.2.x.


$ gcc-trunk -v
Using built-in specs.
COLLECT_GCC=gcc-trunk
COLLECT_LTO_WRAPPER=/usr/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --prefix=/usr/local/gcc-trunk
--enable-languages=c,c++ --disable-werror --enable-multilib
Thread model: posix
gcc version 6.0.0 20151203 (experimental) [trunk revision 231219] (GCC) 
$ 
$ gcc-trunk -m32 -O2 -g -c small.c
$ gcc-trunk -m64 -O3 -g -c small.c
$ gcc-5.2 -m32 -O3 -g -c small.c
$ 
$ gcc-trunk -m32 -O3 -g -c small.c
small.c: In function ‘fn1’:
small.c:39:1: internal compiler error: in alter_subregs, at lra-spills.c:610
 }
 ^

0x9a4ccd alter_subregs
../../gcc-trunk/gcc/lra-spills.c:610
0x9a5f32 lra_final_code_change()
../../gcc-trunk/gcc/lra-spills.c:745
0x9868dc lra(_IO_FILE*)
../../gcc-trunk/gcc/lra.c:2383
0x93c449 do_reload
../../gcc-trunk/gcc/ira.c:5383
0x93c449 execute
../../gcc-trunk/gcc/ira.c:5554
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
$ 


--


char a, b, i, j;
int c, d, e, f, g, h, n;

char
fn1 ()
{
  char k, l, m;
  int p;
  e = g > f;
  for (b = 0; b < 2; b++)
{
  for (p = 0; p < 3; p++)
{
  for (; h < 1; h++)
{
  for (; m;)
goto lbl;
  e = g;
}
  l = a < 0 || a < d;
}
  d++;
  for (;;)
{
  k = g;
  n = -k;
  j = n;
  c = j;
  e = 2;
  if (l)
break;
  return 2;
}
}
  for (;;)
;
lbl:
  return i;
}

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-12-03 Thread Bin.Cheng
On Thu, Dec 3, 2015 at 6:26 PM, Richard Earnshaw
 wrote:
> On 03/12/15 05:26, Bin.Cheng wrote:
>> On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw
>>  wrote:
>>> On 01/12/15 03:19, Bin.Cheng wrote:
 On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
  wrote:
> On 24/11/15 09:56, Richard Earnshaw wrote:
>> On 24/11/15 02:51, Bin.Cheng wrote:
> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>> have direct insn pattern describing the "x + y << z".  According to
>>> gcc internal:
>>>
>>> ‘addptrm3’
>>> Like addm3 but is guaranteed to only be used for address 
>>> calculations.
>>> The expanded code is not allowed to clobber the condition code. It
>>> only needs to be defined if addm3 sets the condition code.
>
> addm3 on aarch64 does not set the condition codes, so by this rule we
> shouldn't need to define this pattern.
>>> Hi Richard,
>>> I think that rule has a prerequisite that backend needs to support
>>> register shifted addition in addm3 pattern.
>>
>> addm3 is a named pattern and its format is well defined.  It does not
>> take a shifted operand and never has.
>>
>>> Apparently for AArch64,
>>> addm3 only supports "reg+reg" or "reg+imm".  Also, it isn't really true
>>> that we "do not set the condition codes", because both
>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>
>> You appear to be confusing named patterns (used by expand) with
>> recognizers.  Anyway, we have
>>
>> (define_insn "*add__"
>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>> (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" 
>> "r")
>>   (match_operand:QI 2
>> "aarch64_shift_imm_" "n"))
>>   (match_operand:GPI 3 "register_operand" "r")))]
>>
>> Which is a non-flag setting add with shifted operand.
>>
>>> Either way I think it is another backend issue, so do you approve that
>>> I commit this patch now?
>>
>> Not yet.  I think there's something fundamental amiss here.
>>
>> BTW, it looks to me as though addptr3 should have exactly the same
>> operand rules as add3 (documentation reads "like add3"), so a
>> shifted operand shouldn't be supported there either.  If that isn't the
>> case then that should be clearly called out in the documentation.
>>
>> R.
>>
>
> PS.
>
> I presume you are aware of the canonicalization rules for add?  That is,
> for a shift-and-add operation, the shift operand must appear first.  Ie.
>
> (plus (shift (op, op)), op)
>
> not
>
> (plus (op, (shift (op, op))

 Hi Richard,
 Thanks for the comments.  I realized that the not-recognized insn
 issue is because the original patch builds non-canonical expressions.
 When reloading address expression, LRA generates non-canonical
 register scaled insn, which can't be recognized by aarch64 backend.

 Here is the updated patch using the canonical form pattern; it passes
 bootstrap and regression tests.  Well, the ivo failure still exists,
 but it was analyzed in the original message.

 Is this patch OK?

 As for Jiong's concern about the additional extension instruction, I
 think this only stands for atomic load store instructions.  For
 general load store, AArch64 supports zext/sext in register scaling
 addressing mode, the additional instruction can be forward propagated
 into memory reference.  The problem for atomic load store is AArch64
 only supports direct register addressing mode.  After LRA reloads
 address expression out of memory reference, there is no combine/fwprop
 optimizer to merge instructions.  The problem is atomic_store's
 predicate doesn't match its constraint.   The predicate used for
 atomic_store is memory_operand, while all other atomic patterns
 use aarch64_sync_memory_operand.  I think this might be a typo.  With
 this change, expand will not generate addressing mode requiring reload
 anymore.  I will test another patch fixing this.

 Thanks,
 bin
>>>
>>> Some comments inline.
>>>
>
> R.
>
> aarch64_legitimize_addr-20151128.txt
>
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3fe2f0f..5b3e3c4 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
> */, machine_mode mode)
>   We try to pick as large a range for the offset as possible to
>   maximize the chance of a CSE.  However, for aligned addresses
>   we limit the 

[Bug c++/68478] flexible array members have complete type

2015-12-03 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68478

Martin Sebor  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2015-12-04
   Assignee|unassigned at gcc dot gnu.org  |msebor at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Martin Sebor  ---
Patch posted for review:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00511.html

[Bug c++/68613] initializer-string for array of chars is too long error on flexible array member

2015-12-03 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68613

--- Comment #2 from Martin Sebor  ---
Patch posted for review:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00511.html

[Bug lto/68662] [6 regression] FAIL: gcc.dg/lto/20090210 c_lto_20090210_0.o-c_lto_20090210_1.o link, -O2 -flto -flto-partition=none -fuse-linker-plugin -fno-fat-lto-objects

2015-12-03 Thread amodra at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68662

Alan Modra  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-04
 CC||amodra at gmail dot com
 Ever confirmed|0   |1

--- Comment #2 from Alan Modra  ---
Confirmed.  Also the -O0 variant of this test.  I had a look at assembly output
and see lines like
 addis 30,30,-.L7@ha

This comes from rs6000_emit_load_toc_table use of toc_label_name which appears
to be all-zero.  toc_label_name is supposed to be set in
rs6000_option_override_internal, but apparently isn't because when
rs6000_option_override_internal runs, TARGET_TOC is not true.  Making the
initialisation unconditional isn't enough to cure the problem as that leads to
linker errors about an undefined reference to ".LCTOC1".  That is because code
to set .LCTOC1 isn't being emitted from rs6000_elf_output_toc_section_asm_op.
I haven't verified this, but I'd guess that is because rs6000_file_start
doesn't see flag_pic == 2, and therefore doesn't call switch_to_section
(toc_section).

Notice that c_lto_20090210_0.o does *not* have -fPIC, while c_lto_20090210_1.o
does.

[Bug libstdc++/68688] segmentation fault on regex matching long strings

2015-12-03 Thread timshen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68688

Tim Shen  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||timshen at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #1 from Tim Shen  ---
This is a known issue: the regex engine stores its data in stack frames
(through deep recursion) rather than in heap storage. I plan to fix it after a
series of refactorings, which are waiting for review.

In the meantime, the simplest workaround is to make your stack larger. Sorry
for the inconvenience!

*** This bug has been marked as a duplicate of bug 61582 ***
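
One way to apply the "larger stack" workaround from inside a POSIX program is
to raise the soft stack limit towards the hard limit.  This is a minimal
sketch under assumptions: raise_stack_limit is a hypothetical helper, and
whether an already running main thread benefits from the new limit is
platform-dependent, so setting the limit in the shell (ulimit -s) before
starting the program is the more common route.

```c
#include <assert.h>
#include <sys/resource.h>

/* Try to raise the soft stack limit to at least `wanted` bytes, clamped to
   the hard limit.  Returns 0 on success and -1 on failure.  */
int raise_stack_limit(rlim_t wanted)
{
    struct rlimit rl;

    assert(wanted > 0);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= wanted)
        return 0;                       /* already large enough */
    if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
        wanted = rl.rlim_max;           /* cannot exceed the hard limit */
    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &rl);
}
```

A program would call this early in main, before any deeply recursive regex
match, for example with a request of 64 MiB.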

[Bug c/53548] allow flexible array members in unions like zero-length arrays

2015-12-03 Thread vapier at gentoo dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53548

--- Comment #4 from Mike Frysinger  ---
(In reply to Martin Sebor from comment #3)

that's fine.  thanks !

[Bug tree-optimization/68550] [6 Regression] ICE: verify_gimple failed Error: missing PHI def

2015-12-03 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68550

Sebastian Pop  changed:

   What|Removed |Added

 CC||sch...@linux-m68k.org

--- Comment #6 from Sebastian Pop  ---
*** Bug 68659 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/68659] [6 regression] FAIL: gcc.dg/graphite/id-pr45230-1.c (internal compiler error)

2015-12-03 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68659

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #8 from Sebastian Pop  ---
Most likely fixed in r231206.

*** This bug has been marked as a duplicate of bug 68550 ***

[PATCH AArch64]Use aarch64_sync_memory_operand in atomic_store pattern

2015-12-03 Thread Bin Cheng
Hi,
I noticed that the atomic_store pattern is the only one in atomics.md that uses
memory_operand as its predicate.  This looks like a typo to me.  It also causes
a problem: the general address expression accepted by memory_operand is kept
until LRA finds out that it doesn't match the "Q" constraint.  As a result, LRA
needs to reload the address expression out of the memory reference.  Since there
is no combine optimizer after LRA, the inefficient code below is generated for
atomic stores:
  add     x1, x29, 64
  add     x0, x1, x0, sxtw 3
  sub     x0, x0, #16
  stlr    x19, [x0]
Or:
  sxtw    x0, w0
  add     x1, x29, 48
  add     x1, x1, x0, sxtw 3
  stlr    x19, [x1]

With this patch, we force atomic_store to use the direct register addressing
mode at an earlier compilation phase, and better code is generated:
  add     x1, x29, 48
  add     x1, x1, x0, sxtw 3
  stlr    x19, [x1]

Bootstrapped and tested on aarch64.  Is it OK?

Thanks,
bin

2015-12-01  Bin Cheng  

* config/aarch64/atomics.md (atomic_store): Use predicate
aarch64_sync_memory_operand.

diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 3c034fb..68dc27a 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -481,7 +481,7 @@
 )
 
 (define_insn "atomic_store"
-  [(set (match_operand:ALLI 0 "memory_operand" "=Q")
+  [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "=Q")
 (unspec_volatile:ALLI
   [(match_operand:ALLI 1 "general_operand" "rZ")
(match_operand:SI 2 "const_int_operand")]   ;; model
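
For context, a C11 store of the following shape is the kind of source that
typically reaches the atomic_store pattern on AArch64, where a release store
is usually emitted as stlr and the address has to be a plain base register.
This is a hedged sketch, not the test case behind the listings in this
message; NSLOTS and publish are made-up names.

```c
#include <assert.h>
#include <stdatomic.h>

#define NSLOTS 8   /* hypothetical table size */

/* Store `value` into slot `idx` with release ordering.  The address
   &slots[idx] is base + sign-extended index * 8, which the store-release
   instruction cannot encode directly, so it must be computed into a
   register first.  */
void publish(_Atomic long *slots, int idx, long value)
{
    assert(idx >= 0 && idx < NSLOTS);
    atomic_store_explicit(&slots[idx], value, memory_order_release);
}
```

Compiling such a function with -O2 -S for an AArch64 target and inspecting
the address computation before the stlr is one way to compare the code
generated with and without the patch.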


[Bug c/53548] allow flexible array members in unions like zero-length arrays

2015-12-03 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53548

Martin Sebor  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2015-12-04
 Ever confirmed|0   |1

--- Comment #3 from Martin Sebor  ---
Since the C standard requires the test case to be diagnosed and there is an
existing extension that makes the requested functionality possible I'd like to
close this as WONTFIX.  Mike, please let me know if you disagree.

Ping [PATCH] c++/42121 - diagnose invalid flexible array members

2015-12-03 Thread Martin Sebor

[CC Jason for the C++ changes and Joseph for the one C change.]

Attached is a reworked and expanded patch for the bug plus three
others in the same area that I uncovered while developing and
testing the former patch:

c++/68689 - flexible array members in unions accepted in C++
c++/68478 - flexible array members have complete type
c++/68613 - initializer-string for array of chars is too long error
on flexible array member

The patch should bring C++ support for flexible array members closer
to C (most of the same constructs should be accepted and rejected).
The only C change in this patch is to include the size of excessively
large types in diagnostics (I found knowing the size helpful when
adding tests and I think it might be helpful to others as well).

Unlike in my first attempt, this patch distinguishes flexible array
members from zero-length arrays by setting the upper bound of the
former to null.  This seems to be in line with what the C front end
does but has required bigger changes than I had hoped.  Hopefully,
the result is a more consistent treatment of the extension between
the two front ends (for example, both C and C++ now emit the same
Ada specification for flexible array members).

Tested by bootstrapping and running C and C++ tests (including
libstdc++) on x86_64.

I'm not sure if this is appropriate for this stage or if it needs
to wait until after the release.  Either is fine with me.

Martin

On 11/21/2015 03:17 PM, Martin Sebor wrote:

Bug 42121 - g++ should warn or error on internal 0 size array in
struct, is a request to diagnose declarations of flexible array
members that aren't last in the enclosing struct, such as in the
following:

 struct S
 {
 int a;
 char b[];   // invalid
 int c;
 };

The C front end diagnoses such cases because they are invalid in
standard C.  Comment 8 on the bug points out that flexible array
members should not be treated identically to zero-size arrays
(they're not in C).

The attached patch implements the requested diagnostic, keeping
comment 8 in mind.  It also issues a diagnostic for flexible array
members in unions (which are also diagnosed as invalid in C mode).
The patch found a number of instances of invalid flexible array
members in the C++ test suites.  I corrected them.

Since the C++ front end doesn't distinguish between flexible array
members and zero-size arrays (both are considered to have an upper
bound of SIZE_MAX), and since determining whether or not
a declaration of such a member is valid cannot be done until
the whole containing struct has been processed, the patch makes
use of one of the DECL_LANG_FLAGs to temporarily remember which is
which (I somewhat arbitrarily picked DECL_LANG_FLAG_1), before
clearing it. There might be a better flag to use, and it might
be appropriate to define a descriptive macro for this purpose
in cp-tree.h, along the same lines as the macros already defined
for other such purposes.

Martin


gcc/testsuite/ChangeLog:
2015-12-02  Martin Sebor  

	c++/42121
	c++/68478
	c++/68613
	c++/68689
	* g++.dg/ext/flexary2.C: Expect a sole flexible array member
	to be rejected.  Add a test case exercising zero-length array.
	* g++.dg/ext/flexary3.C: Expect a sole flexible array member
	to be rejected.
	* g++.dg/ext/flexary4.C: New file.
	* g++.dg/ext/flexary5.C: New file.
	* g++.dg/ext/flexary6.C: New file.
	* g++.dg/ext/flexary7.C: New file.
	* g++.dg/other/dump-ada-spec-2.C: Adjust to reflect flexible
	array members.
	* g++.dg/parse/pr43765.C: Add a member to make a struct with
	a flexible array member valid.  Adjust expected error message.
	* g++.dg/torture/pr64280.C: Expect a sole flexible array member
	to be rejected.
	* g++.dg/torture/pr64312.C: Add a member to make a struct with
	a flexible array member valid.
	* g++.dg/ubsan/object-size-1.C: Adjust expected diagnostic.
	* g++.dg/other/dump-ada-spec-2.C: Adjust expected type.

gcc/cp/ChangeLog:
2015-12-02  Martin Sebor  

	c++/42121
	c++/68478
	c++/68613
	c++/68689
	* class.c (walk_subobject_offsets): Avoid assuming type domain
	is non-null or has an upper bound.
	(layout_class_type): Include type size in error message.
	(all_bases_empty_p, field_nonempty_p): New helper functions.
	(check_flexarrays): New function.
	(finish_struct_1): Call check_flexarrays.
	* decl.c (compute_array_index_type): Distinguish flexible array
	members from zero-length arrays.
	(grokdeclarator): Reject flexible array members in unions.  Avoid
	rejecting members of incomplete types that are flexible array members.
	* error.c (dump_type_suffix): Handle flexible array members with null
	upper bound.
	* init.c (perform_member_init): Same.
	* pt.c (instantiate_class_template_1): Allow flexible array members.
	(tsubst): Handle flexible array members with null upper bound.
	* typeck2.c (digest_init_r): Warn for initialization of flexible
	array members.
	(process_init_constructor_record): Handle flexible array 

[Bug tree-optimization/68693] New: [6 Regression] ice: in harmful_stmt_in_region, at graphite-scop-detection.c:1052

2015-12-03 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68693

Bug ID: 68693
   Summary: [6 Regression] ice: in harmful_stmt_in_region, at
graphite-scop-detection.c:1052
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: Joost.VandeVondele at mat dot ethz.ch
  Target Milestone: ---

> cat bug.f90 
MODULE dbcsr_index_operations
  INTERFACE dbcsr_build_row_index
  END INTERFACE
CONTAINS
  SUBROUTINE merge_index_arrays (new_row_i, new_col_i, new_blk_p, new_size,&
   old_row_i, old_col_i, old_blk_p, old_size,&
   add_ip, add_size, new_blk_d, old_blk_d,&
   added_size_offset, added_sizes, added_size, added_nblks, error)
INTEGER, DIMENSION(new_size), &
  INTENT(OUT):: new_blk_p, new_col_i, &
new_row_i
INTEGER, INTENT(IN)  :: old_size
INTEGER, DIMENSION(old_size), INTENT(IN) :: old_blk_p, old_col_i, &
old_row_i
INTEGER, DIMENSION(new_size), &
  INTENT(OUT), OPTIONAL  :: new_blk_d
INTEGER, DIMENSION(old_size), &
  INTENT(IN), OPTIONAL   :: old_blk_d
INTEGER, DIMENSION(:), INTENT(IN), &
  OPTIONAL   :: added_sizes
INTEGER, INTENT(OUT), OPTIONAL   :: added_size, added_nblks
LOGICAL  :: multidata
IF (add_size .GT. 0) THEN
   IF (old_size .EQ. 0) THEN
  IF (PRESENT (added_size)) added_size = SUM (added_sizes)
   ENDIF
ELSE
   new_row_i(1:old_size) = old_row_i(1:old_size)
   new_col_i(1:old_size) = old_col_i(1:old_size)
   new_blk_p(1:old_size) = old_blk_p(1:old_size)
   IF (multidata) new_blk_d(1:old_size) = old_blk_d(1:old_size)
ENDIF
  END SUBROUTINE merge_index_arrays
END MODULE dbcsr_index_operations


> gfortran -c -floop-nest-optimize -O2 bug.f90 
bug.f90:5:0:

   SUBROUTINE merge_index_arrays (new_row_i, new_col_i, new_blk_p, new_size,&


internal compiler error: in harmful_stmt_in_region, at
graphite-scop-detection.c:1052
0x128b761 harmful_stmt_in_region
../../gcc/gcc/graphite-scop-detection.c:1052
0x128b761 merge_sese
../../gcc/gcc/graphite-scop-detection.c:857
0x128bd5c build_scop_breadth
../../gcc/gcc/graphite-scop-detection.c:910
0x128bd5c build_scop_depth
../../gcc/gcc/graphite-scop-detection.c:888
0x128bd21 build_scop_breadth
../../gcc/gcc/graphite-scop-detection.c:902
0x128bd21 build_scop_depth
../../gcc/gcc/graphite-scop-detection.c:888
0x128bb1f build_scop_depth
../../gcc/gcc/graphite-scop-detection.c:886
0x128ba75 build_scop_depth
../../gcc/gcc/graphite-scop-detection.c:874
0x128e3da build_scops(vec*)
../../gcc/gcc/graphite-scop-detection.c:1922
0x127cd31 graphite_transform_loops()
../../gcc/gcc/graphite.c:314
0x127d370 graphite_transforms
../../gcc/gcc/graphite.c:363
0x127d370 execute
../../gcc/gcc/graphite.c:440
Please submit a full bug report,


> gfortran  -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/data/vjoost/gnu/gcc_trunk/install/libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc/configure --prefix=/data/vjoost/gnu/gcc_trunk/install
--enable-languages=c,c++,fortran --disable-multilib --enable-plugins
--enable-lto --disable-bootstrap
Thread model: posix
gcc version 6.0.0 20151204 (experimental) [trunk revision 231243] (GCC)

[Bug tree-optimization/68693] [6 Regression] ice: in harmful_stmt_in_region, at graphite-scop-detection.c:1052

2015-12-03 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68693

Joost VandeVondele  changed:

   What|Removed |Added

   Last reconfirmed||2015-12-4
 CC||Joost.VandeVondele at mat dot ethz.ch,
   ||spop at gcc dot gnu.org
   Target Milestone|--- |6.0
  Known to fail||6.0

--- Comment #1 from Joost VandeVondele ---
another graphite ice

-fstrict-aliasing fixes 6/6: permit inlining of comdats

2015-12-03 Thread Jan Hubicka
Hi,
this is the last patch of the series.  It makes operand_equal_p compare
alias sets even with !flag_strict_aliasing before inlining, so inlining
!flag_strict_aliasing into flag_strict_aliasing is possible when the callee is
a merged comdat.  I tried to explain it in greater detail in the comment
in ipa-inline-transform.c.
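
For readers less familiar with the flag, here is a minimal sketch (not taken
from the patch) of the kind of assumption -fstrict-aliasing lets the compiler
make, and of a byte-copy access that stays valid either way.  The function
names store_both and float_bits are made up for illustration.

```c
#include <string.h>

_Static_assert (sizeof (unsigned int) == sizeof (float),
                "sketch assumes unsigned int and float have the same size");

/* Under -fstrict-aliasing the compiler may assume that *ip and *fp do not
   overlap, because int and float live in different alias sets.  It is then
   free to return the constant 1 without reloading *ip.  */
int
store_both (int *ip, float *fp)
{
  *ip = 1;
  *fp = 2.0f;
  return *ip;
}

/* Reading the bytes of a float through memcpy does not rely on alias sets,
   so it behaves the same with and without -fstrict-aliasing.  */
unsigned int
float_bits (float f)
{
  unsigned int u;
  memcpy (&u, &f, sizeof u);
  return u;
}
```

Code that compares two memory accesses for equality while ignoring their alias
sets can only be safe if nothing later relies on the distinction, which is why
the timing relative to inlining matters here.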

While working on the code I noticed that I managed to overload merged with
two meanings.  One is that the function had bodies defined in multiple units
(and thus its inlining should not be considered cross-module) and the other is
that it used to be comdat.  This is usually the same, but not always: one
can manually define weak functions, where the bypass for OPTIMIZATION_NODE
checks cannot apply.

Since the first only affects heuristics and I do not think I need to care
about weaks much, I dropped it and renamed the flag to merged_comdat to make
it more obvious what it means.

Bootstrapped/regtested x86_64-linux, OK?

I will work on some testcases for ICF and fold-const that would lead
to wrong code if alias sets were ignored early.

Honza
* fold-const.c (operand_equal_p): Before inlining do not permit
transformations that would break with strict aliasing.
* ipa-inline.c (can_inline_edge_p): Use merged_comdat.
* ipa-inline-transform.c (inline_call): When inlining merged comdat do
not drop strict_aliasing flag of caller.
* cgraphclones.c (cgraph_node::create_clone): Use merged_comdat.
* cgraph.c (cgraph_node::dump): Dump merged_comdat.
* ipa-icf.c (sem_function::merge): Drop merged_comdat when merging
comdat and non-comdat.
* cgraph.h (cgraph_node): Rename merged to merged_comdat.
* ipa-inline-analysis.c (simple_edge_hints): Check both merged_comdat
and icf_merged.

* lto-symtab.c (lto_cgraph_replace_node): Update code computing
merged_comdat.
Index: fold-const.c
===
--- fold-const.c(revision 231239)
+++ fold-const.c(working copy)
@@ -2987,7 +2987,7 @@ operand_equal_p (const_tree arg0, const_
   flags)))
return 0;
  /* Verify that accesses are TBAA compatible.  */
- if (flag_strict_aliasing
+ if ((flag_strict_aliasing || !cfun->after_inlining)
  && (!alias_ptr_types_compatible_p
(TREE_TYPE (TREE_OPERAND (arg0, 1)),
 TREE_TYPE (TREE_OPERAND (arg1, 1)))
Index: ipa-inline.c
===
--- ipa-inline.c(revision 231239)
+++ ipa-inline.c(working copy)
@@ -466,7 +466,7 @@ can_inline_edge_p (struct cgraph_edge *e
  optimized with the optimization flags of module they are used in.
 Also do not care about mixing up size/speed optimization when
 DECL_DISREGARD_INLINE_LIMITS is set.  */
-  else if ((callee->merged
+  else if ((callee->merged_comdat
&& !lookup_attribute ("optimize",
  DECL_ATTRIBUTES (caller->decl)))
   || DECL_DISREGARD_INLINE_LIMITS (callee->decl))
Index: ipa-inline-transform.c
===
--- ipa-inline-transform.c  (revision 231239)
+++ ipa-inline-transform.c  (working copy)
@@ -322,11 +322,26 @@ inline_call (struct cgraph_edge *e, bool
   if (DECL_FUNCTION_PERSONALITY (callee->decl))
 DECL_FUNCTION_PERSONALITY (to->decl)
   = DECL_FUNCTION_PERSONALITY (callee->decl);
+
+  /* merged_comdat indicates that the function was originally COMDAT and merged
+ from multiple units.  Because every unit using COMDAT must also define it,
+ we know that the function is safe to build with each of the optimization
+ flags used to compile them.
+
+ If one unit is compiled with -fstrict-aliasing and
+ other with -fno-strict-aliasing we may bypass dropping the
+ flag_strict_aliasing because we know it would be valid to inline
+ -fstrict-aliasing variant of the callee, too.  Unless optimization
+ attribute was used, the caller and COMDAT callee must have been
+ compiled with the same flags.  */
   if (!opt_for_fn (callee->decl, flag_strict_aliasing)
-  && opt_for_fn (to->decl, flag_strict_aliasing))
+  && opt_for_fn (to->decl, flag_strict_aliasing)
+  && (!callee->merged_comdat
+ || lookup_attribute ("optimization",
+  DECL_ATTRIBUTES (e->caller->decl))
+ || lookup_attribute ("optimization", DECL_ATTRIBUTES (callee->decl))))
 {
   struct gcc_options opts = global_options;
-
   cl_optimization_restore (,
 TREE_OPTIMIZATION (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (to->decl)));
   opts.x_flag_strict_aliasing = false;
Index: cgraphclones.c
===
--- 

[Bug c++/68689] flexible array members in unions accepted in C++

2015-12-03 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68689

--- Comment #2 from Martin Sebor  ---
Patch posted for review:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00511.html

[Bug tree-optimization/68692] [6 Regression] ice: Segmentation fault

2015-12-03 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68692

Joost VandeVondele  changed:

   What|Removed |Added

   Last reconfirmed||2015-12-4
 CC||Joost.VandeVondele at mat dot ethz.ch,
   ||spop at gcc dot gnu.org
   Target Milestone|--- |6.0
Summary|[graphite] ice: |[6 Regression] ice:
   |Segmentation fault  |Segmentation fault
  Known to fail||6.0

--- Comment #1 from Joost VandeVondele ---
trying to get the nightly tester to run, another graphite ice

[Bug tree-optimization/63586] x+x+x+x -> 4*x in gimple

2015-12-03 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63586

--- Comment #4 from Marc Glisse  ---
(In reply to kugan from comment #2)
> ;; Function f4 (f4, funcdef_no=3, decl_uid=4162, cgraph_uid=3,
> symbol_order=3)
> 
> ;; 1 loops found
> ;;
> ;; Loop 0
> ;;  header 0, latch 1
> ;;  depth 0, outer -1
> ;;  nodes: 0 1 2
> ;; 2 succs { 1 }
> f4 (unsigned int x, unsigned int z, unsigned int k)
> {
>   unsigned int y;
>   unsigned int reassocmul_12;
>   unsigned int reassocmul_13;
>   unsigned int _14;
>   unsigned int _15;
> 
>   :
>   reassocmul_12 = x_2(D) * 3;
>   reassocmul_13 = z_6(D) * 3;
>   _14 = x_2(D) + reassocmul_13;
>   _15 = _14 + reassocmul_12;
>   y_10 = _15 + k_1(D);
>   return y_10;
> 
> }

So the patch fails in this case? It misses the 4th x.


(In reply to kugan from comment #3)
> I think the intention is to have multiplication by power-of-2?

At gimple level, multiplication by any constant would be a good
canonicalization (it can be expanded back to sums later if that's what the
target prefers).
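
A small sketch of the canonicalization being discussed, in plain C rather than
GIMPLE.  The function names are made up, and f4_as_dumped only mirrors the
arithmetic of the quoted dump; the original source of f4 is not shown in this
thread, so its exact form is an assumption.

```c
/* Sum form, as a user might write it.  */
unsigned int
sum4 (unsigned int x)
{
  return x + x + x + x;
}

/* Canonical form: a single multiplication by a constant.  */
unsigned int
mul4 (unsigned int x)
{
  return 4u * x;
}

/* Arithmetic of the quoted dump: x*3 and z*3 are formed, but one x is
   left over as a separate addend.  */
unsigned int
f4_as_dumped (unsigned int x, unsigned int z, unsigned int k)
{
  unsigned int reassocmul_12 = x * 3;
  unsigned int reassocmul_13 = z * 3;
  unsigned int t14 = x + reassocmul_13;
  unsigned int t15 = t14 + reassocmul_12;
  return t15 + k;
}

/* What a complete canonicalization would compute (hypothetical).  */
unsigned int
f4_canonical (unsigned int x, unsigned int z, unsigned int k)
{
  return 4u * x + 3u * z + k;
}
```

Because unsigned arithmetic wraps, both pairs of functions agree for every
input, so the rewrite is value-preserving regardless of overflow.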

[Bug libstdc++/61582] C++11 regex memory corruption

2015-12-03 Thread timshen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

Tim Shen  changed:

   What|Removed |Added

 CC||kerukuro at gmail dot com

--- Comment #17 from Tim Shen  ---
*** Bug 68688 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/68550] [6 Regression] ICE: verify_gimple failed Error: missing PHI def

2015-12-03 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68550

--- Comment #7 from Joost VandeVondele ---
(In reply to Sebastian Pop from comment #5)
> fixed

BTW, with this fixed, I can compile our CP2K code with -floop-nest-optimize at
various -Ox and all seems correct. Thanks!

I'll try to integrate '-floop-nest-optimize' in our nightly testers.

[Bug middle-end/68291] [6 regression] ICE in emit_move_insn, at expr.c:3540

2015-12-03 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68291

--- Comment #7 from Alexandre Oliva  ---
Eric, apologies for the slow response, I'm in the middle of an all-week trip
with little Internet access.

I think the best course of action is to adjust gimple_can_coalesce_p so that it
returns false for RESULT_DECLs for which promote_ssa_mode returns BLKmode, and
then adjust the block you quoted to assign a group rtx to the result decl, like
the original code (still present a few lines below) used to do when
hard_function_value returned a non-REG.  I don't think we can allow coalescing
in this case, because IIRC expanders can't deal with these parallels in
general.

I can look into this next week, when I'll be back home, if you prefer.  Just
let me know.

[Bug c/57180] Structures with a flexible arrray member have wrong size

2015-12-03 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57180

Martin Sebor  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||msebor at gcc dot gnu.org
  Known to work||4.9.3, 5.1.0, 6.0
 Resolution|--- |FIXED

--- Comment #5 from Martin Sebor  ---
The test case passes with GCC 4.9.3 and is rejected with GCC 5.1.0 and 6.0 so
it looks like it's resolved as Marek says.  Closing as FIXED.

[Bug sanitizer/68650] Firefox compilation fails with Address Sanitizer (error: undefined reference to 'dlerror')

2015-12-03 Thread gk at torproject dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68650

--- Comment #4 from Georg Koppen  ---
It is using -lasan it seems:

Executing: c++ -o firefox -Wall -Wempty-body -Woverloaded-virtual
-Wsign-compare -Wwrite-strings -Wno-invalid-offsetof -Wcast-align -v
-fsanitize=address -Dxmalloc=myxmalloc -fno-exceptions -fno-strict-aliasing
-fno-rtti -fno-exceptions -fno-math-errno -std=gnu++0x -pthread -pipe -DNDEBUG
-DTRIMMED -g -freorder-blocks -Os -fno-omit-frame-pointer
/home/thomas/Arbeit/Tor/mozilla-central/obj-x86_64-unknown-linux-gnu/browser/app/tmpjOe32q.list
-lpthread -fsanitize=address -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -B
../../build/unix/gold -Wl,-Bsymbolic -rdynamic
-Wl,-rpath-link,/home/thomas/Arbeit/Tor/mozilla-central/obj-x86_64-unknown-linux-gnu/dist/bin
-Wl,-rpath-link,NONE/lib ../../xpcom/glue/standalone/libxpcomglue.a
/home/thomas/Arbeit/Tor/mozilla-central/obj-x86_64-unknown-linux-gnu/browser/app/tmpjOe32q.list:
INPUT("nsBrowserApp.o")
INPUT("../../mozglue/build/AsanOptions.o")
INPUT("../../mozglue/build/SSE.o")
INPUT("../../mozglue/build/dummy.o")
INPUT("../../memory/mozalloc/Unified_cpp_memory_mozalloc0.o")
INPUT("../../mozglue/misc/StackWalk.o")
INPUT("../../mozglue/misc/TimeStamp.o")
INPUT("../../mozglue/misc/TimeStamp_posix.o")
INPUT("../../mfbt/Compression.o")
INPUT("../../mfbt/Decimal.o")
INPUT("../../mfbt/Unified_cpp_mfbt0.o")
INPUT("../../memory/fallible/fallible.o")

Using built-in specs.
COLLECT_GCC=c++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 5.2.1-23'
--with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-5 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 5.2.1 20151028 (Debian 5.2.1-23) 
COMPILER_PATH=../../build/unix/gold/:/usr/lib/gcc/x86_64-linux-gnu/5/:/usr/lib/gcc/x86_64-linux-gnu/5/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/5/:/usr/lib/gcc/x86_64-linux-gnu/
LIBRARY_PATH=../../build/unix/gold/:/usr/lib/gcc/x86_64-linux-gnu/5/:/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/5/../../../../lib/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/5/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-o' 'firefox' '-Wall' '-Wempty-body'
'-Woverloaded-virtual' '-Wsign-compare' '-Wwrite-strings'
'-Wno-invalid-offsetof' '-Wcast-align' '-v' '-fsanitize=address' '-D'
'xmalloc=myxmalloc' '-fno-strict-aliasing' '-fno-rtti' '-fno-exceptions'
'-fno-math-errno' '-std=gnu++11' '-pthread' '-pipe' '-D' 'NDEBUG' '-D'
'TRIMMED' '-g' '-freorder-blocks' '-Os' '-fno-omit-frame-pointer'
'-fsanitize=address' '-B' '../../build/unix/gold' '-rdynamic' '-shared-libgcc'
'-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/5/collect2 -plugin
/usr/lib/gcc/x86_64-linux-gnu/5/liblto_plugin.so
-plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
-plugin-opt=-fresolution=/tmp/ccNcaQre.res -plugin-opt=-pass-through=-lgcc_s
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lpthread
-plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc_s
-plugin-opt=-pass-through=-lgcc --sysroot=/ --build-id --eh-frame-hdr -m
elf_x86_64 --hash-style=gnu -export-dynamic -dynamic-linker
/lib64/ld-linux-x86-64.so.2 -o firefox
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o -L../../build/unix/gold
-L/usr/lib/gcc/x86_64-linux-gnu/5
-L/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu
-L/usr/lib/gcc/x86_64-linux-gnu/5/../../../../lib -L/lib/x86_64-linux-gnu
-L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib
-L/usr/lib/gcc/x86_64-linux-gnu/5/../../..
/usr/lib/gcc/x86_64-linux-gnu/5/libasan_preinit.o -lasan

[Bug c/68513] [5/6 Regression] ICE in gimplify_expr, at gimplify.c:8832, c_maybe_const_expr in IL

2015-12-03 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68513

--- Comment #12 from Marek Polacek  ---
No, this isn't something we'd want to backport, I think.  For GCC 5, we'll need
another (but trivial) fix.

Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-12-03 Thread Chung-Lin Tang
On 2015/12/3 6:11 PM, Jakub Jelinek wrote:
> On Thu, Dec 03, 2015 at 06:05:36PM +0800, Chung-Lin Tang wrote:
>>> Oh wait, it looks like the C++ front end is not actually using the
>>> functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
>>> own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
>>> addition to finish_expr_stmt calls, I see it's also using
>>> finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
>>> So I guess we'll want to model this the same way for OpenACC support
>>> functions, and then (later) we should clean this up, to move the
>>> C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)
>>
>> I see most OpenACC/OpenMP constructs are represented by special statement
>> codes, so they should be a different case. I so far only see the OpenACC
>> wait directive being represented as a CALL_EXPR (maybe there are others,
>> haven't exhaustively searched).
> 
> No, Thomas is right, just look at
> finish_omp_{barrier,flush,taskwait,taskyield,cancel,cancellation_point},
> all those are represented as CALL_EXPRs.
> 
>   Jakub
> 

Okay, I guess my impression was only for some OpenACC constructs.

Overall, OpenACC wait seems to be one of the few cases of using c_finish_* in
cp/parser.c.  Whether other cases should move towards or away from that style
is a larger question; I was only trying to fix a
libgomp.oacc-c++/template-reduction.C regression (the testcase is currently
still in the gomp4 branch).

Chung-Lin



Re: Add an rsqrt_optab and IFN_RSQRT internal function

2015-12-03 Thread Richard Biener
On Thu, Dec 3, 2015 at 10:39 AM, Jakub Jelinek  wrote:
> On Thu, Dec 03, 2015 at 09:21:03AM +, Richard Sandiford wrote:
>>   * internal-fn.def (RSQRT): New function.
>>   * optabs.def (rsqrt_optab): New optab.
>>   * doc/tm.texi (rsqrtM2): Document
>
> Missing full stop.
>
> Otherwise looks to me like a nice cleanup and hopefully fixes the aarch64
> regression.

Looks good to me as well.

Richard.

> Jakub


[Bug sanitizer/68650] Firefox compilation fails with Address Sanitizer (error: undefined reference to 'dlerror')

2015-12-03 Thread gk at torproject dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68650

Georg Koppen  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #7 from Georg Koppen  ---
Resolving this as invalid then.

[Bug ipa/68672] [4.9/5/6 Regression] g++.dg/torture/pr68470.C: ICE: cannot update SSA form: statement uses released SSA name

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68672

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-03
 CC||hubicka at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
  Component|middle-end  |ipa
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Confirmed.  Looks similar to PR68470, IPA split messes up.

[Bug fortran/68676] New: ICE in gfc_match_formal_arglist when compiling gfortran.dg/submodule_10.f08

2015-12-03 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68676

Bug ID: 68676
   Summary: ICE in gfc_match_formal_arglist when compiling
gfortran.dg/submodule_10.f08
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ubizjak at gmail dot com
  Target Milestone: ---
Target: alphaev68-linux-gnu

Recent regression on alphaev68-linux-gnu, fails several submodule_??.f08 tests
[1]. The ICE can be triggered with a crosscompiler to alpha-linux-gnu:

~/gcc-build-alpha/gcc/f951 -O submodule_10.f08

f951: internal compiler error: Segmentation fault
0xb19d7f crash_signal
../../gcc-svn/trunk/gcc/toplev.c:334
0x5498e5 gfc_match_formal_arglist(gfc_symbol*, int, int)
../../gcc-svn/trunk/gcc/fortran/decl.c:4829
0x54c88b gfc_match_subroutine()
../../gcc-svn/trunk/gcc/fortran/decl.c:6016
0x5a52f5 decode_statement
../../gcc-svn/trunk/gcc/fortran/parse.c:378

gdb session:

Starting program: /home/uros/gcc-build-alpha/gcc/f951 -O submodule_10.f08

Program received signal SIGSEGV, Segmentation fault.
0x005498e5 in gfc_match_formal_arglist (progname=0x175b330, st_flag=0,
null_flag=1) at ../../gcc-svn/trunk/gcc/fortran/decl.c:4829
4829  if (!sym->abr_modproc_decl && formal && !head)
(gdb) p sym
$1 = (gfc_symbol *) 0x0
(gdb) list
4824  if (!formal && head)
4825arg_count_mismatch = true;
4826
4827  /* Abbreviated module procedure declaration is not meant to have any
4828 formal arguments!  */
4829  if (!sym->abr_modproc_decl && formal && !head)
4830arg_count_mismatch = true;
4831
4832  for (p = formal, q = head; p && q; p = p->next, q = q->next)
4833{
(gdb) bt
#0  0x005498e5 in gfc_match_formal_arglist (progname=0x175b330,
st_flag=0, null_flag=1) at ../../gcc-svn/trunk/gcc/fortran/decl.c:4829
#1  0x0054c88c in gfc_match_subroutine () at
../../gcc-svn/trunk/gcc/fortran/decl.c:6016
#2  0x005a52f6 in decode_statement () at
../../gcc-svn/trunk/gcc/fortran/parse.c:378
#3  0x005a6a58 in next_free () at
../../gcc-svn/trunk/gcc/fortran/parse.c:1076
#4  0x005a6de9 in next_statement () at
../../gcc-svn/trunk/gcc/fortran/parse.c:1310
#5  0x005a95af in parse_contained (module=1) at
../../gcc-svn/trunk/gcc/fortran/parse.c:5038
#6  0x005a99ad in parse_module () at
../../gcc-svn/trunk/gcc/fortran/parse.c:5431

sym is dereferenced when NULL.

[1] https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg00267.html
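
To make the failure mode concrete, here is a reduced sketch of the check at
decl.c:4829 with a guard added.  It is not the actual fix: struct symbol is a
hypothetical stand-in for gfc_symbol, and whether a NULL sym should count as a
mismatch is a question for the Fortran maintainers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for gfc_symbol; only the field used here.  */
struct symbol
{
  bool abr_modproc_decl;
};

/* Mirrors the two tests from the gdb listing.  The second test in the
   original reads sym->abr_modproc_decl unconditionally, which faults when
   sym is NULL; this sketch simply skips that test in that case.  */
bool
arg_count_mismatch_p (const struct symbol *sym,
                      const void *formal, const void *head)
{
  if (!formal && head)
    return true;

  if (sym != NULL && !sym->abr_modproc_decl && formal && !head)
    return true;

  return false;
}
```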

[Bug lto/68556] [6 Regression] -r -flto test failures

2015-12-03 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68556

H.J. Lu  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from H.J. Lu  ---
Fixed as of r231223.

[Bug testsuite/68545] gcc.dg/guality/guality.exp hides compiler error

2015-12-03 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68545

H.J. Lu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-03
 Ever confirmed|0   |1

Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-03 Thread Jakub Jelinek
On Thu, Dec 03, 2015 at 12:09:04PM +0100, Tom de Vries wrote:
> The flag is set here in expand_omp_target:
> ...
> 12682 /* Prevent IPA from removing child_fn as unreachable,
>  since there are no
> 12683refs from the parent function to child_fn in offload
>  LTO mode.  */
> 12684 if (ENABLE_OFFLOADING)
> 12685   cgraph_node::get (child_fn)->mark_force_output ();
> ...
> 
> I guess setting forced_by_abi instead would also mean child_fn is not
> removed as unreachable, while still allowing optimizations:
> ...
>   /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
>  to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
>  symbols promoted to static and it does not inhibit
>  optimization.  */
>   unsigned forced_by_abi : 1;
> ...
> 
> But I suspect that other optimizations (than ipa-pta) might break things.
> 
> Essentially we have two situations:
> - in the host compiler, there is no need for the forced_output flag,
>   and it inhibits optimization
> - in the accelerator compiler, it (or some equivalent) is needed
> 
> I wonder if setting the force_output flag only when streaming the bytecode
> for offloading would work. That way, it wouldn't be set in the host
> compiler, while being set in the accelerator compiler.

I believe that the host and offload func (and var) tables need to be in
sync, so there needs to be something both in the host and accel compilers
that prevents the functions and variables that have their accel or host
counterpart in the tables from being optimized away, or say replaced by
a clone with different arguments etc.

Jakub


[Bug tree-optimization/68671] New: [5/6 Regression] gcc.dg/torture/pr66952.c FAILs with -fno-tree-dce

2015-12-03 Thread zsojka at seznam dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68671

Bug ID: 68671
   Summary: [5/6 Regression] gcc.dg/torture/pr66952.c FAILs with
-fno-tree-dce
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---

Created attachment 36895
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36895=edit
reduced testcase without UB

For armv7a, aarch64, powerpc, powerpc64 targets (tested in QEMU) (x86_64 is
OK), the testcase fails:
$ $CC -O2 -fno-tree-dce
$ ./a.out
Aborted

$ armv7a-hardfloat-linux-gnueabi-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-armv7a-hardfloat/bin/armv7a-hardfloat-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-231194-checking-yes-rtl-df-nographite-armv7a-hardfloat/bin/../libexec/gcc/armv7a-hardfloat-linux-gnueabi/6.0.0/lto-wrapper
Target: armv7a-hardfloat-linux-gnueabi
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-checking=yes,rtl,df --without-cloog --without-ppl --without-isl
--with-float=hard --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=armv7a-hardfloat-linux-gnueabi
--with-ld=/usr/bin/armv7a-hardfloat-linux-gnueabi-ld
--with-as=/usr/bin/armv7a-hardfloat-linux-gnueabi-as
--with-sysroot=/usr/armv7a-hardfloat-linux-gnueabi --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-231194-checking-yes-rtl-df-nographite-armv7a-hardfloat
Thread model: posix
gcc version 6.0.0 20151202 (experimental) (GCC)

Re: [PATCH] Fix shrink-wrap bug with anticipating into loops (PR67778, PR68634)

2015-12-03 Thread Bernd Schmidt

On 12/02/2015 07:21 PM, Segher Boessenkool wrote:

After shrink-wrapping has found the "tightest fit" for where to place
the prologue, it tries to move it earlier (so that frame saves are run
earlier) -- but without copying any more basic blocks.

Unfortunately a candidate block we select can be inside a loop, and we
will still allow it (because the loop always exits via our previously
chosen block).



So we need to detect this situation.  We can place the prologue at a
previous block PRE only if PRE dominates every block reachable from
it.  This is a bit hard / expensive to compute, so instead this patch
allows a block PRE only if PRE does not post-dominate any of its
successors (other than itself).


Are the two conditions equivalent though? I'm not fully convinced. Let's 
say the loop has multiple exits, then none of these exit blocks 
postdominate the loop entry block, right?


I think I agree with Jakub that we don't want to do unnecessary work in 
this piece of code.



/* If we can move PRO back without having to duplicate more blocks, do so.
   We can move back to a block PRE if every path from PRE will eventually
- need a prologue, that is, PRO is a post-dominator of PRE.  */
+ need a prologue, that is, PRO is a post-dominator of PRE.  We might
+ need to duplicate PRE if there is any path from a successor of PRE back
+ to PRE, so don't allow that either (but self-loops are fine, as are any
+ other loops entirely dominated by PRE; this in general seems too
+ expensive to check for, for such an uncommon case).  */


The last comment is unclear and I don't know what it wants to tell me.
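
One possible reading of that comment, as a standalone sketch: a block PRE is
disallowed if some path leaves PRE through a successor other than PRE itself
and comes back to PRE.  This is the plain reachability formulation, not the
post-dominator test the patch uses, and struct cfg, MAX_BLOCKS and the function
names are invented for illustration.

```c
#include <stdbool.h>

#define MAX_BLOCKS 16

/* Tiny CFG as an adjacency matrix: edge[a][b] means a -> b.  */
struct cfg
{
  int n;
  bool edge[MAX_BLOCKS][MAX_BLOCKS];
};

/* Depth-first search: is TARGET reachable from FROM?  */
static bool
reaches (const struct cfg *g, int from, int target, bool *seen)
{
  if (from == target)
    return true;
  if (seen[from])
    return false;
  seen[from] = true;
  for (int s = 0; s < g->n; s++)
    if (g->edge[from][s] && reaches (g, s, target, seen))
      return true;
  return false;
}

/* True if PRE lies on a cycle that is not just its own self-loop edge.  */
bool
on_non_self_cycle (const struct cfg *g, int pre)
{
  for (int s = 0; s < g->n; s++)
    {
      if (s == pre || !g->edge[pre][s])
        continue;               /* Ignore the self-loop edge.  */
      bool seen[MAX_BLOCKS] = { false };
      if (reaches (g, s, pre, seen))
        return true;
    }
  return false;
}
```

Doing one search per successor is quadratic in the worst case, which gives a
feel for why a cheaper dominance-based approximation is attractive.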


Bernd


[Bug tree-optimization/68673] New: Handle __builtin_GOMP_task optimally in ipa-pta

2015-12-03 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68673

Bug ID: 68673
   Summary: Handle __builtin_GOMP_task optimally in ipa-pta
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03477.html :
...
__builtin_GOMP_task supposedly can be treated similarly
if the third argument is NULL (if 3rd arg is non-NULL, then
the caller passes a different structure from what the callee receives,
but perhaps it could be emulated as pretending that cpyfn is called first
with address of a temporary var and the data argument and then fn
is called with the address of the temporary var).
...
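
The emulation described in the quote can be written down as a small model.
This is only a sketch for reasoning about data flow, not libgomp's
implementation: model_task, record and copy_plus_one are made-up names, the
temporary is heap-allocated for simplicity, and alignment, flags and
dependences are ignored.

```c
#include <stdlib.h>

typedef void (*task_fn) (void *);
typedef void (*copy_fn) (void *dst, void *src);

/* Model of the call: with no cpyfn, fn sees the caller's data directly;
   with a cpyfn, fn sees a temporary that cpyfn filled in from data.  */
int
model_task (task_fn fn, void *data, copy_fn cpyfn, size_t arg_size)
{
  if (cpyfn == NULL)
    {
      fn (data);
      return 0;
    }

  void *tmp = malloc (arg_size ? arg_size : 1);
  if (tmp == NULL)
    return -1;
  cpyfn (tmp, data);
  fn (tmp);
  free (tmp);
  return 0;
}

/* Example callbacks, only here to exercise the model.  */
int observed;

void
record (void *p)
{
  observed = *(int *) p;
}

void
copy_plus_one (void *dst, void *src)
{
  *(int *) dst = *(int *) src + 1;
}
```

In points-to terms, the NULL case makes fn's parameter alias data, while the
non-NULL case routes everything through the temporary.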

[Bug target/68674] ARM attribute target neon warning: incompatible implicit declaration of built-in function

2015-12-03 Thread chrbr at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68674

chrbr at gcc dot gnu.org changed:

   What|Removed |Added

 Blocks||65837
   Assignee|unassigned at gcc dot gnu.org  |chrbr at gcc dot gnu.org

--- Comment #1 from chrbr at gcc dot gnu.org ---
Found this when fixing target/65837:

Builtins should have a global scope, but doing so unmasks a few other failures,
mostly with the TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook, which returns wrong
values for global variables.

The problem is that TARGET_NEON is false for global-scope decls, although the
types were known.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65837
[Bug 65837] [arm-linux-gnueabihf] lto1 target specific builtin not available

[Bug target/68655] SSE2 cannot vec_perm of low and high part

2015-12-03 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655

--- Comment #6 from rguenther at suse dot de  ---
On Thu, 3 Dec 2015, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655
> 
> --- Comment #5 from Jakub Jelinek  ---
> Created attachment 36897
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36897=edit
> gcc6-pr68655.patch
> 
> Initial untested patch.  Unfortunately, it doesn't seem to be always a win,
> when looking at the differences between old and new compiler.
> I'm looking at
> cd /usr/src/gcc/gcc/testsuite/gcc.dg/torture; for i in vshuf-v*[hqs]i.c; do
> for j in -msse2 -msse4 -mavx -mavx2 -mavx512f -mavx512bw; do
> /usr/src/gcc/obj/gcc/cc1.v246 -quiet -O2 $j $i -DEXPENSIVE -o /tmp/1.s;
> /usr/src/gcc/obj/gcc/cc1 -quiet -O2 $j $i -DEXPENSIVE -o /tmp/2.s; echo ===$i
> $j===; diff -up /tmp/1.s /tmp/2.s; done; done
> output now (where cc1.v246 is vanilla cc1, cc1 is one with this patch applied).
> In some cases the patch helps, but I've seen so far some cases where for
> AVX512* it resulted in more instructions.

So maybe that handling is too generic (doing it up-front?) and maybe
it should be done in the shufp[ds] handler only, to catch my case?

Otherwise the x86 ISA makes this quite awkward without building some
pattern recognition scheme with a generator.

[Bug target/68655] SSE2 cannot vec_perm of low and high part

2015-12-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655

--- Comment #7 from Jakub Jelinek  ---
I guess it needs analysis.
Some examples of changes:
vshuf-v16qi.c -msse2 test_2, scalar code vs. punpcklqdq, clear win
vshuf-v16qi.c -msse4 test_2, pshufb -> punpcklqdq (is this a win or not?)
(similarly for -mavx, -mavx2, -mavx512f, -mavx512bw)
vshuf-v16si.c -mavx512{f,bw} test_2:
-   vpermi2d    %zmm1, %zmm1, %zmm0
+   vmovdqa64   .LC2(%rip), %zmm0
+   vpermi2q    %zmm1, %zmm1, %zmm0
looks like pessimization.
vshuf-v32hi.c -mavx512bw test_2, similar pessimization.
vshuf-v32hi.c -mavx512bw test_2, similarly:
-   vpermi2w    %zmm1, %zmm1, %zmm0
+   vmovdqa64   .LC2(%rip), %zmm0
+   vpermi2q    %zmm1, %zmm1, %zmm0
vshuf-v4si.c -msse2 test_183, another pessimization:
-   pshufd  $78, %xmm0, %xmm1
+   movdqa  %xmm0, %xmm1
    movd    b(%rip), %xmm4
    pshufd  $255, %xmm0, %xmm2
+   shufpd  $1, %xmm0, %xmm1
vshuf-v4si.c -msse4 test_183, another pessimization:
-   pshufd  $78, %xmm1, %xmm0
+   movdqa  %xmm1, %xmm0
+   palignr $8, %xmm0, %xmm0
vshuf-v4si.c -mavx test_183:
-   vpshufd $78, %xmm1, %xmm0
+   vpalignr    $8, %xmm1, %xmm1, %xmm0
vshuf-v64qi.c -mavx512bw, desirable change:
-   vpermi2w    %zmm1, %zmm1, %zmm0
-   vpshufb .LC3(%rip), %zmm0, %zmm1
-   vpshufb .LC4(%rip), %zmm0, %zmm0
-   vporq   %zmm0, %zmm1, %zmm0
+   vpermi2q    %zmm1, %zmm1, %zmm0
vshuf-v8hi.c -msse2 test_1 another scalar to punpcklqdq, win
vshuf-v8hi.c -msse4 test_2 (supposedly a win):
-   pshufb  .LC3(%rip), %xmm0
+   punpcklqdq  %xmm0, %xmm0
vshuf-v8hi.c -mavx test_2, similarly:
-   vpshufb .LC3(%rip), %xmm0, %xmm0
+   vpunpcklqdq %xmm0, %xmm0, %xmm0
vshuf-v8si.c -mavx2 test_2, another win:
-   vmovdqa a(%rip), %ymm0
-   vperm2i128  $0, %ymm0, %ymm0, %ymm0
+   vpermq  $68, a(%rip), %ymm0
vshuf-v8si.c -mavx2 test_5, another win:
-   vmovdqa .LC6(%rip), %ymm0
-   vmovdqa .LC7(%rip), %ymm1
-   vmovdqa %ymm0, -48(%rbp)
vmovdqa a(%rip), %ymm0
-   vpermd  %ymm0, %ymm1, %ymm1
-   vpshufb .LC8(%rip), %ymm0, %ymm3
-   vpshufb .LC10(%rip), %ymm0, %ymm0
-   vmovdqa %ymm1, c(%rip)
-   vmovdqa b(%rip), %ymm1
-   vpermq  $78, %ymm3, %ymm3
-   vpshufb .LC9(%rip), %ymm1, %ymm2
-   vpshufb .LC11(%rip), %ymm1, %ymm1
-   vpor    %ymm3, %ymm0, %ymm0
-   vpermq  $78, %ymm2, %ymm2
-   vpor    %ymm2, %ymm1, %ymm1
-   vpor    %ymm1, %ymm0, %ymm0
+   vmovdqa .LC7(%rip), %ymm2
+   vmovdqa .LC6(%rip), %ymm1
+   vpermd  %ymm0, %ymm2, %ymm2
+   vpermd  b(%rip), %ymm1, %ymm3
+   vmovdqa %ymm1, -48(%rbp)
+   vmovdqa %ymm2, c(%rip)
+   vpermd  %ymm0, %ymm1, %ymm0
+   vmovdqa .LC8(%rip), %ymm2
+   vpand   %ymm2, %ymm1, %ymm1
+   vpcmpeqd %ymm2, %ymm1, %ymm1
+   vpblendvb   %ymm1, %ymm3, %ymm0, %ymm0
vshuf-v8si.c -mavx512f test_2, another win?
-   vmovdqa a(%rip), %ymm0
-   vperm2i128  $0, %ymm0, %ymm0, %ymm0
+   vpermq  $68, a(%rip), %ymm0

The above does not list all changes; I've often been ignoring further changes
in a file if, say, one change adds or removes a .LC* and everything else is then
renumbered (and it sometimes doesn't list cases where the same or similar change
appears with multiple ISAs). So the results are clearly mixed.

Perhaps I should just try doing this at the end of expand_vec_perm_1 (i.e. if
we (most likely) couldn't get a single insn normally, see if we would get it
otherwise), and at the end of ix86_expand_vec_perm_const_1 (as the fallback
after all sequences).  It won't catch some beneficial one insn to one insn
changes (e.g. where in the original case the insn needs a constant operand in
memory) though.
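
For readers following along, here is a minimal sketch of the kind of check
this PR is about: deciding whether a permutation of narrow elements can be
re-expressed as a permutation of elements twice as wide, which is what lets a
V16QI shuffle of the low half end up as a V2DI punpcklqdq. This is not the
i386 backend code; the names widen_perm and widen_perm_max and the use of
std::vector are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only: try to rewrite a permutation
// of N narrow elements as a permutation of N/2 elements twice as wide.
// Succeeds only if every adjacent pair of selectors picks an aligned,
// consecutive pair of source elements.
static bool
widen_perm (const std::vector<unsigned> &perm, std::vector<unsigned> &wide)
{
  if (perm.size () % 2 != 0)
    return false;
  std::vector<unsigned> result;
  for (std::size_t i = 0; i < perm.size (); i += 2)
    {
      if (perm[i] % 2 != 0 || perm[i + 1] != perm[i] + 1)
        return false;
      result.push_back (perm[i] / 2);
    }
  wide = result;
  return true;
}

// Apply widen_perm repeatedly.  Returns how many times the element width
// could be doubled and leaves the widest equivalent permutation in PERM.
static unsigned
widen_perm_max (std::vector<unsigned> &perm)
{
  unsigned steps = 0;
  std::vector<unsigned> wide;
  while (perm.size () > 1 && widen_perm (perm, wide))
    {
      perm = wide;
      ++steps;
    }
  return steps;
}
```

With the single-operand V16QI selector { 0..7, 0..7 } this widens three times
and ends at the two-element selector { 0, 0 }, i.e. "duplicate the low
quadword".  A selector such as { 1, 0, 3, 2 } cannot be widened at all.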

[Bug rtl-optimization/68670] [4.9/5/6 Regression] gcc.c-torture/execute/pr68376-2.c FAILs with -ftracer

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68670

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |4.9.4

Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-03 Thread Richard Biener
On Thu, 3 Dec 2015, Alan Lawrence wrote:

> On 02/12/15 14:13, Jeff Law wrote:
> > On 12/02/2015 01:33 AM, Richard Biener wrote:
> > > > Right.  So the question I have is how/why did DOM leave anything in the
> > > > map.
> > > > And if DOM is fixed to not leave stuff lying around, can we then assert
> > > > that
> > > > nothing is ever left in those maps between passes?  There's certainly no
> > > > good
> > > > reason I'm aware of why DOM would leave things in this state.
> > > 
> > > It happens not only with DOM but with all passes doing edge redirection.
> > > This is because the map is populated by GIMPLE cfg hooks just in case
> > > it might be used.  But there is no such thing as a "start CFG manip"
> > > and "end CFG manip" to cleanup such dead state.
> > Sigh.
> > 
> > > 
> > > IMHO the redirect-edge-var-map stuff is just the very most possible
> > > unclean implementation possible. :(  (see how remove_edge "clears"
> > > stale info from the map to avoid even more "interesting" stale
> > > data)
> > > 
> > > Ideally we could assert the map is empty whenever we leave a pass,
> > > but as said it triggers all over the place.  Even cfg-cleanup causes
> > > such stale data.
> > > 
> > > I agree that the patch is only a half-way "solution", but a full
> > > solution would require sth more explicit, like we do with
> > > initialize_original_copy_tables/free_original_copy_tables.  Thus
> > > require passes to explicitly request the edge data to be preserved
> > > with an initialize_edge_var_map/free_edge_var_map call pair.
> > > 
> > > Not appropriate at this stage IMHO (well, unless it turns out to be
> > > a very localized patch).
> > So maybe as a follow-up to aid folks in the future, how about a debugging
> > verify_whatever function that we can call manually if debugging a problem in
> > this space.  With a comment indicating why we can't call it unconditionally
> > (yet).
> > 
> > 
> > jeff
> 
> I did a (fwiw disable bootstrap) build with the map-emptying code in passes.c
> (not functions.c), printing out passes after which the map was non-empty
> (before emptying it, to make sure passes weren't just carrying through stale
> data from earlier). My (non-exhaustive!) list of passes after which the
> edge_var_redirect_map can be non-empty stands at...
> 
> aprefetch ccp cddce ch ch_vect copyprop crited crited cselim cunroll cunrolli
> dce dom ehcleanup einline esra fab fnsplit forwprop fre graphite ifcvt
> isolate-paths ldist lim local-pure-const mergephi oaccdevlow ompexpssa
> optimized parloops pcom phicprop phiopt phiprop pre profile profile_estimate
> sccp sink slsr split-paths sra switchconv tailc tailr tracer unswitch
> veclower2 vect vrm vrp whole-program

Yeah, exactly my findings...  note that most of the above are likely
due to cfgcleanup even though it already does sth like

  e = redirect_edge_and_branch (e, dest);
  redirect_edge_var_map_clear (e);

so eventually placing a redirect_edge_var_map_empty () at the end
of the cleanup_tree_cfg function should prune down the above list
considerably (well, then assert the map is empty on entry to that
function of course)
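
A rough sketch of that idea, using a toy side table rather than GCC's real
data structures.  Everything below (edge_var_map_sketch, cleanup_cfg_sketch,
integer edge ids) is hypothetical and only meant to show the shape of
"assert empty on entry, prune on exit" together with a manual verify helper
of the sort Jeff suggested:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy stand-ins, for illustration only; these are not GCC's types.
struct edge_var_map_entry { int result; int def; };
typedef int edge_id;

class edge_var_map_sketch
{
public:
  // Record a pending PHI argument for edge E (conceptually what
  // redirect_edge_var_map_add does).
  void add (edge_id e, int result, int def)
  {
    map_[e].push_back (edge_var_map_entry { result, def });
  }

  // Drop the entries for a single edge.
  void clear (edge_id e) { map_.erase (e); }

  // Drop everything, e.g. at the end of CFG cleanup or after a pass.
  // (Named after redirect_edge_var_map_empty; it empties the table.)
  void empty () { map_.clear (); }

  // Debugging aid: true if no stale entries are left behind.
  bool verify_empty () const { return map_.empty (); }

  // Number of entries currently recorded for edge E.
  std::size_t pending (edge_id e) const
  {
    auto it = map_.find (e);
    return it == map_.end () ? 0 : it->second.size ();
  }

private:
  std::unordered_map<edge_id, std::vector<edge_var_map_entry> > map_;
};

// Sketch of a cleanup routine that asserts the map is empty on entry and
// prunes whatever its own edge redirections left behind before returning.
static void
cleanup_cfg_sketch (edge_var_map_sketch &map,
		    const std::vector<edge_id> &redirected)
{
  assert (map.verify_empty ());
  for (edge_id e : redirected)
    map.add (e, /*result=*/0, /*def=*/0);   // side effect of redirection
  map.empty ();
}
```

The entry assertion is exactly the part that, per the list of passes above,
cannot be enabled unconditionally yet.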

> FWIW, the route by which dom added the edge to the redirect map was:
> #0  redirect_edge_var_map_add (e=e@entry=0x7fb7a5f508, result=0x7fb725a000,
> def=0x7fb78eaea0, locus=2147483884) at ../../gcc/gcc/tree-ssa.c:54
> #1  0x00cccf58 in ssa_redirect_edge (e=e@entry=0x7fb7a5f508,
> dest=dest@entry=0x7fb79cc680) at ../../gcc/gcc/tree-ssa.c:158
> #2  0x00b00738 in gimple_redirect_edge_and_branch (e=0x7fb7a5f508,
> dest=0x7fb79cc680) at ../../gcc/gcc/tree-cfg.c:5662
> #3  0x006ec678 in redirect_edge_and_branch (e=e@entry=0x7fb7a5f508,
> dest=) at ../../gcc/gcc/cfghooks.c:356
> #4  0x00cb4530 in ssa_fix_duplicate_block_edges (rd=0x1a29f10,
> local_info=local_info@entry=0x7fed40)
> at ../../gcc/gcc/tree-ssa-threadupdate.c:1184
> #5  0x00cb5520 in ssa_fixup_template_block (slot=,
> local_info=0x7fed40) at ../../gcc/gcc/tree-ssa-threadupdate.c:1369
> #6  traverse_noresize (
> argument=0x7fed40, this=0x1a21a00) at ../../gcc/gcc/hash-table.h:911
> #7  traverse (
> argument=0x7fed40, this=0x1a21a00) at ../../gcc/gcc/hash-table.h:933
> #8  thread_block_1 (bb=bb@entry=0x7fb7485bc8,
> noloop_only=noloop_only@entry=true, joiners=joiners@entry=true)
> at ../../gcc/gcc/tree-ssa-threadupdate.c:1592
> #9  0x00cb5a40 in thread_block (bb=0x7fb7485bc8,
> noloop_only=noloop_only@entry=true)
> at ../../gcc/gcc/tree-ssa-threadupdate.c:1629
> #10 0x00cb6bf8 in thread_through_all_blocks (
> may_peel_loop_headers=true) at ../../gcc/gcc/tree-ssa-threadupdate.c:2736
> #11 0x00becf6c in (anonymous namespace)::pass_dominator::execute (
> this=, fun=0x7fb77d1b28)
> 

Re: [UPC 02/22] tree-related changes

2015-12-03 Thread Richard Biener
On Wed, 2 Dec 2015, Gary Funck wrote:

> On 12/01/15 12:26:32, Richard Biener wrote:
> > On Mon, 30 Nov 2015, Gary Funck wrote:
> > > -struct GTY(()) tree_type_common {
> > > +struct GTY((user)) tree_type_common {
> > >struct tree_common common;
> > >tree size;
> > >tree size_unit;
> > > @@ -1441,10 +1458,10 @@ struct GTY(()) tree_type_common {
> > >tree pointer_to;
> > >tree reference_to;
> > >union tree_type_symtab {
> > > -int GTY ((tag ("TYPE_SYMTAB_IS_ADDRESS"))) address;
> > > -const char * GTY ((tag ("TYPE_SYMTAB_IS_POINTER"))) pointer;
> > > -struct die_struct * GTY ((tag ("TYPE_SYMTAB_IS_DIE"))) die;
> > > -  } GTY ((desc ("debug_hooks->tree_type_symtab_field"))) symtab;
> > > +int address;
> > > +const char *pointer;
> > > +struct die_struct *die;
> > > +  } symtab;
> >
> > Err, you don't have debug info for this?  What is address?
> 
> Not sure what you mean.  The 'die' field is retained.
> Is there something in the semantics of "GTY(( ((tag "
> that relates to debug information?

Ah, sorry.  I misread the diff.

> > I do not like the explicit GC of tree_type_common.
> 
> I'm not a fan either.
> 
> The gist is that we needed a map from tree nodes to tree nodes
> to record the "layout qualifier" for layout qualifiers with
> a value greater than one.  But when the garbage collector ran
> over the hash table that maps integer constants to tree nodes,
> it didn't know that the constant was being referenced by the
> layout qualifier tree map.
> 
> We described the issue here:
> https://gcc.gnu.org/ml/gcc-patches/2011-10/msg00800.html
> 
> The conclusion that we reached is that when tree nodes
> were walked, we needed to check if there was a
> tree node -> integer constant mapping, the integer constant map
> (used to make tree nodes used to hold CST's unique)
> needed to be marked to keep the CST mapping from going away.
> 
> This led to the conclusion that a custom GC routine was
> needed for tree nodes.  Maybe that conclusion is wrong or
> there is a better way to do things?

It should simply work as long as the hash-map is properly marked
as GC root.  It might _not_ work (reliably) if the hash-map is
also a "cache" by itself.  But it eventually works now given some
fixes went into the area of collecting/marking caches.

> > > ===
> > > --- gcc/tree-pretty-print.c   (.../trunk) (revision 231059)
> > > +++ gcc/tree-pretty-print.c   (.../branches/gupc) (revision 231080)
> > > @@ -1105,6 +1105,25 @@ dump_block_node (pretty_printer *pp, tre
> > >  }
> > >  
> > >  
> > > +static void
> > > +dump_upc_type_quals (pretty_printer *buffer, tree type, int quals)
> >
> > Functions need comments.
> 
> OK.  Missed that one.  Will check on others.
> 
> > > Index: gcc/tree-sra.c
> > > ===
> > > --- gcc/tree-sra.c(.../trunk) (revision 231059)
> > > +++ gcc/tree-sra.c(.../branches/gupc) (revision 231080)
> > > @@ -3882,6 +3882,7 @@ find_param_candidates (void)
> > >  
> > > if (TREE_CODE (type) == FUNCTION_TYPE
> > > || TYPE_VOLATILE (type)
> > > +   || SHARED_TYPE_P (type)
> > 
> > UPC_SHARED_TYPE_P ()
> 
> OK. As I mentioned in a previous reply, originally we prefixed
> all "UPC" specific tree node fields and functions with UPC_ or upc_,
> but as we transitioned away from UPC as a separate language
> (ala ObjC) and made compilation conditional upon -fupc, an
> observation was made off list that since the base tree nodes
> are generic that naming UPC-related fields with "UPC" prefixes
> didn't make sense, so we removed those prefixes.  There might
> be a middle ground, however, where UPC_SHARED_TYPE_P() is preferred
> to SHARED_TYPE_P() because as you/others have mentioned,
> the term "shared" gets used in a lot of contexts.

Yes, specifically for predicates/functions used in the middle-end.

> > > @@ -4381,6 +4422,7 @@ build1_stat (enum tree_code code, tree t
> > >/* Whether a dereference is readonly has nothing to do with whether
> > >its operand is readonly.  */
> > >TREE_READONLY (t) = 0;
> > > +  TREE_SHARED (t) = SHARED_TYPE_P (type);
> > 
> > This is frontend logic and should reside in FEs.
> 
> [... several other similar actions taken contingent
> upon SHARED_TYPE_P() elided ...]
> 
> OK, will take a look.
> 
> > > +  outer_is_pts_p = (POINTER_TYPE_P (outer_type)
> > > +&& SHARED_TYPE_P (TREE_TYPE (outer_type)));
> > > +  inner_is_pts_p = (POINTER_TYPE_P (inner_type)
> > > +&& SHARED_TYPE_P (TREE_TYPE (inner_type)));
> > > +
> > > +  /* Pointer-to-shared types have special
> > > + equivalence rules that must be checked.  */
> > > +  if (outer_is_pts_p && inner_is_pts_p
> > > +  && lang_hooks.types_compatible_p)
> > > +return lang_hooks.types_compatible_p (outer_type, inner_type);
> > 
> > Sorry, but 

[Bug rtl-optimization/68670] New: [4.9/5/6 Regression] gcc.c-torture/execute/pr68376-2.c FAILs with -ftracer

2015-12-03 Thread zsojka at seznam dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68670

Bug ID: 68670
   Summary: [4.9/5/6 Regression] gcc.c-torture/execute/pr68376-2.c
FAILs with -ftracer
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---

Created attachment 36894
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36894&action=edit
reduced testcase

As mentioned in PR68376#c8, the testcase is failing with -ftracer.

On the 5-branch, -O2 -ftracer is needed; -O -ftracer is enough on other
branches.

armv7, aarch64, powerpc, x86_64 are affected; powerpc64 doesn't seem to fail.

Dumps up to .optimized look fine, it still seems to be an RTL optimizer bug.

Output:
$ $CC -O2 -ftracer testcase.c
$ ./a.out
Aborted


$ gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-231194-checking-yes-rtl-df-nographite/bin/../libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-checking=yes,rtl,df --without-cloog --without-ppl --without-isl
--disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-231194-checking-yes-rtl-df-nographite
Thread model: posix
gcc version 6.0.0 20151202 (experimental) (GCC)

[Bug middle-end/68672] New: [4.9/5/6 Regression] g++.dg/torture/pr68470.C: ICE: cannot update SSA form: statement uses released SSA name

2015-12-03 Thread zsojka at seznam dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68672

Bug ID: 68672
   Summary: [4.9/5/6 Regression] g++.dg/torture/pr68470.C: ICE:
cannot update SSA form: statement uses released SSA
name
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---

Created attachment 36896
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36896&action=edit
preprocessed g++.dg/torture/pr68470.C

Compiler output:
$ x86_64-pc-linux-gnu-gcc -v -O -finline-small-functions -fpartial-inlining
--param=partial-inlining-entry-probability=100 pr68470.ii
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-231194-checking-yes-rtl-df-nographite/bin/../libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-checking=yes,rtl,df --without-cloog --without-ppl --without-isl
--disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-231194-checking-yes-rtl-df-nographite
Thread model: posix
gcc version 6.0.0 20151202 (experimental) (GCC) 
/repo/gcc-trunk/gcc/testsuite/g++.dg/torture/pr68470.C: In member function
'virtual void I::m_fn3()':
/repo/gcc-trunk/gcc/testsuite/g++.dg/torture/pr68470.C:35:1: error: statement
uses released SSA name:
 }
 ^

# .MEM_20 = VDEF <.MEM>
MEM[(struct  &)_12] ={v} {CLOBBER};
The use of _12 should have been replaced
/repo/gcc-trunk/gcc/testsuite/g++.dg/torture/pr68470.C:35:1: internal compiler
error: cannot update SSA form
0xdfa08c update_ssa(unsigned int)
/repo/gcc-trunk/gcc/tree-into-ssa.c:3190
0xc5f257 execute_function_todo
/repo/gcc-trunk/gcc/passes.c:1926
0xc5fc9b execute_todo
/repo/gcc-trunk/gcc/passes.c:2010
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

Tested revisions:
trunk r231194 - ICE
5-branch r231055 - ICE
4_9-branch r231054 - ICE
4_8-branch r224828 - OK

[Bug target/68674] New: ARM attribute target neon warning: incompatible implicit declaration of built-in function

2015-12-03 Thread chrbr at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68674

Bug ID: 68674
   Summary: ARM attribute target neon warning: incompatible
implicit declaration of built-in function
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chrbr at gcc dot gnu.org
  Target Milestone: ---

this test compiled with

cc1   -O2 test.c  -mcpu=cortex-a9 -mfloat-abi=hard -quiet -mfpu=neon

#include "arm_neon.h"

int8x8_t a, b;
int16x8_t e;

void
__attribute__ ((target("fpu=neon")))
foo(void)
{
  e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
}

ICEs with 

test.c:10:18: warning: incompatible implicit declaration of built-in function
'__builtin_neon_vaddlsv8qi'
   e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
  ^

test.c:10:7: internal compiler error: in copy_to_mode_reg, at explow.c:595
   e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
   ^~~

Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-12-03 Thread Tom de Vries

On 30/11/15 14:32, Jakub Jelinek wrote:

On Mon, Nov 30, 2015 at 02:24:18PM +0100, Richard Biener wrote:

OK for stage3 trunk if bootstrap and reg-test succeeds?


-|| node->address_taken);
+|| (node->address_taken
+&& !node->parallelized_function));

please add a comment here on why this is safe.

Ok with this change.


BTW, __builtin_GOMP_task supposedly can be treated similarly
if the third argument is NULL (if 3rd arg is non-NULL, then
the caller passes a different structure from what the callee receives,
but perhaps it could be emulated as pretending that cpyfn is called first
with address of a temporary var and the data argument and then fn
is called with the address of the temporary var).


Filed as PR68673 - Handle __builtin_GOMP_task optimally in ipa-pta.

Can you provide testcases for both (3rd arg NULL/non-NULL) cases? I'm 
not fluent in openmp.


Thanks,
- Tom


Re: [PATCH] Fix shrink-wrap bug with anticipating into loops (PR67778, PR68634)

2015-12-03 Thread Bernd Schmidt

On 12/02/2015 07:21 PM, Segher Boessenkool wrote:

After shrink-wrapping has found the "tightest fit" for where to place
the prologue, it tries to move it earlier (so that frame saves are run
earlier) -- but without copying any more basic blocks.


Another question would be - is there really a good reason to do this at all?


Bernd


Re: Add fuzzing coverage support

2015-12-03 Thread Bernd Schmidt

On 12/02/2015 06:38 PM, Dmitry Vyukov wrote:

One thing to consider would
be whether you really need this split between O0/optimize versions, or
whether you can find a place in the queue where to insert it
unconditionally. Have you considered this at all or did you just follow
asan/tsan?


I inserted the pass just before asan/tsan because it looks like the
right place for it. If we do it after asan, it will insert coverage
for all asan-emitted BBs which is highly undesirable. I also think it
is a good idea to run a bunch of optimizations before coverage pass to
not emit too many coverage callbacks (but I can't say that I am very
knowledgeable in this area). FWIW clang does the same: coverage passes
run just before asan/tsan.


There's one other thing I want to put out there. Is this kind of thing 
maybe what plugins were invented for? I don't really like the concept of 
plugins, but it seems to me that this sort of thing might be an 
application for them.



+public:
+  static pass_data pd ()
+  {
+static const pass_data data =



I think a static data member would be better than the unnecessary pd ()
function. This is also unlike existing practice, and I wonder how others
think about it. IMO a fairly strong case could be made that if we're using
C++, then this sort of thing ought to be part of the class definition.


I vary name of the pass depending on the O0 template argument (again
following asan):

 O0 ? "sancov_O0" : "sancov", /* name */

If we call it "sancov" always, then I can make it just a global var
(as all other passes in gcc).
Or I can make it a static variable of the template class and move
definition of the class (as you proposed).
What would you prefer?


I think I prefer the static var of the template class. I just wonder why 
we don't have the pass_data for all the existing passes as static data 
members? I'm sure there's some reason.


asan also distinguishes the name between asan/asan0. I'd either follow 
that naming convention, or remove the _O0 variant for all three of them. 
I lean towards the latter.



Bernd


Re: [ARM] Fix PR middle-end/65958

2015-12-03 Thread Eric Botcazou
> I can understand this restriction, but...
> 
> > +  /* See the same assertion on PROBE_INTERVAL above.  */
> > +  gcc_assert ((first % 4096) == 0);
> 
> ... why isn't this a test that FIRST is aligned to PROBE_INTERVAL?

Because that isn't guaranteed, FIRST is related to the size of the protection 
area while PROBE_INTERVAL is related to the page size.

> blank line between declarations and code. Also, can we come up with a
> suitable define for 4096 here that expresses the context and then use
> that consistently through the remainder of this function?

OK, let's use ARITH_BASE.
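
For reference, the address arithmetic of the small-size case can be sketched
on its own as follows.  The formulas for base, the scratch register offset
and the displacement are taken from the patch; the helper names round_up,
small_probe and plan_small_probe are made up for this illustration, and
round_up assumes a power-of-two alignment:

```cpp
#include <cassert>
#include <cstdint>

static const int64_t ARITH_BASE = 4096;   // as in the patch

// Round X up to a multiple of ALIGN (assumed to be a power of two).
static int64_t
round_up (int64_t x, int64_t align)
{
  return (x + align - 1) & ~(align - 1);
}

struct small_probe
{
  int64_t reg1_offset;   // offset from SP loaded into the scratch register
  int64_t displacement;  // displacement added to the scratch register
};

// Model of the "size <= PROBE_INTERVAL" case: one probe at SP - (first + size).
static small_probe
plan_small_probe (int64_t first, int64_t size)
{
  // FIRST only needs to be a multiple of ARITH_BASE, not of PROBE_INTERVAL.
  assert (first % ARITH_BASE == 0);

  const int64_t base = round_up (size, ARITH_BASE);

  small_probe p;
  p.reg1_offset = -(first + base);   // multiple of ARITH_BASE
  p.displacement = base - size;      // always smaller than ARITH_BASE
  return p;
}
```

With the hypothetical values first = 12288 and size = 5000, base is 8192, the
scratch register is set to SP - 20480 and the probe uses displacement 3192,
which together address SP - 17288 = SP - (first + size).  Since FIRST and
base are both multiples of ARITH_BASE, so is their sum, which is the property
the assertion is protecting.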

> > +(define_insn "probe_stack_range"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +   (unspec_volatile:DI [(match_operand:DI 1 "register_operand" "0")
> > +(match_operand:DI 2 "register_operand" "r")]
> > +UNSPEC_PROBE_STACK_RANGE))]
> 
> I think this should really use PTRmode, so that it's ILP32 ready (I'm
> not going to ask you to make sure that works though, since I suspect
> there are still other issues to resolve with ILP32 at this time).

Done.  Manually tested for now, I'll fully test it if approved.


PR middle-end/65958
* config/aarch64/aarch64-protos.h (aarch64_output_probe_stack_range):
Declare.
* config/aarch64/aarch64.md: Declare UNSPECV_BLOCKAGE and
UNSPEC_PROBE_STACK_RANGE.
(blockage): New instruction.
(probe_stack_range_): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_probe_stack_range): New
function.
(aarch64_output_probe_stack_range): Likewise.
(aarch64_expand_prologue): Invoke aarch64_emit_probe_stack_range if
static builtin stack checking is enabled.
* config/aarch64/aarch64-linux.h (STACK_CHECK_STATIC_BUILTIN):
Define.

-- 
Eric Botcazou

Index: config/aarch64/aarch64-linux.h
===
--- config/aarch64/aarch64-linux.h	(revision 231206)
+++ config/aarch64/aarch64-linux.h	(working copy)
@@ -88,4 +88,7 @@
 #undef TARGET_BINDS_LOCAL_P
 #define TARGET_BINDS_LOCAL_P default_binds_local_p_2
 
+/* Define this to be nonzero if static stack checking is supported.  */
+#define STACK_CHECK_STATIC_BUILTIN 1
+
 #endif  /* GCC_AARCH64_LINUX_H */
Index: config/aarch64/aarch64-protos.h
===
--- config/aarch64/aarch64-protos.h	(revision 231206)
+++ config/aarch64/aarch64-protos.h	(working copy)
@@ -340,6 +340,7 @@ void aarch64_asm_output_labelref (FILE *
 void aarch64_cpu_cpp_builtins (cpp_reader *);
 void aarch64_elf_asm_named_section (const char *, unsigned, tree);
 const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
+const char * aarch64_output_probe_stack_range (rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode, const char *);
 void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx);
Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c	(revision 231206)
+++ config/aarch64/aarch64.c	(working copy)
@@ -62,6 +62,7 @@
 #include "sched-int.h"
 #include "cortex-a57-fma-steering.h"
 #include "target-globals.h"
+#include "common/common-target.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2183,6 +2184,179 @@ aarch64_libgcc_cmp_return_mode (void)
   return SImode;
 }
 
+#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
+
+/* We use the 12-bit shifted immediate arithmetic instructions so values
+   must be multiple of (1 << 12), i.e. 4096.  */
+#define ARITH_BASE 4096
+
+#if (PROBE_INTERVAL % ARITH_BASE) != 0
+#error Cannot use simple address calculation for stack probing
+#endif
+
+/* The pair of scratch registers used for stack probing.  */
+#define PROBE_STACK_FIRST_REG  9
+#define PROBE_STACK_SECOND_REG 10
+
+/* Emit code to probe a range of stack addresses from FIRST to FIRST+SIZE,
+   inclusive.  These are offsets from the current stack pointer.  */
+
+static void
+aarch64_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
+{
+  rtx reg1 = gen_rtx_REG (ptr_mode, PROBE_STACK_FIRST_REG);
+
+  /* See the same assertion on PROBE_INTERVAL above.  */
+  gcc_assert ((first % ARITH_BASE) == 0);
+
+  /* See if we have a constant small number of probes to generate.  If so,
+ that's the easy case.  */
+  if (size <= PROBE_INTERVAL)
+{
+  const HOST_WIDE_INT base = ROUND_UP (size, ARITH_BASE);
+
+  emit_set_insn (reg1,
+		 plus_constant (ptr_mode,
+stack_pointer_rtx, -(first + base)));
+  emit_stack_probe (plus_constant (ptr_mode, reg1, base - size));
+}
+
+  /* The run-time loop is made up of 8 insns in the generic case while the
+ compile-time loop is made up of 4+2*(n-2) insns for n # of intervals.  */
+  else if (size <= 4 * PROBE_INTERVAL)
+{
+  HOST_WIDE_INT i, rem;
+
+  emit_set_insn (reg1,
+		

[Bug sanitizer/68650] Firefox compilation fails with Address Sanitizer (error: undefined reference to 'dlerror')

2015-12-03 Thread gk at torproject dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68650

--- Comment #6 from Georg Koppen  ---
Alright, thanks. So, what happens with r215527 is that checking for dlopen()
working properly in the configure script is not enough anymore to decide
whether -ldl needs to get added explicitly if address sanitizer is
enabled. If it is not enabled the dlopen() check is still sufficient.

If one uses dlsym() for instance instead the check for needing -ldl is working
as before.

I still don't understand exactly why this is needed but I guess this is a thing
which needs to get fixed on the Mozilla side. Sorry for the noise.

Re: [ARM] Fix PR middle-end/65958

2015-12-03 Thread Richard Earnshaw
Sorry for the delay, very busy on other things these days...

On 16/11/15 20:00, Eric Botcazou wrote:
>> More comments inline.
>
> Revised version attached, which addresses all your comments and in particular
> removes the
>
> +#if PROBE_INTERVAL > 4096
> +#error Cannot use indexed addressing mode for stack probing
> +#endif
>
> compile-time assertion.  It generates the same code for PROBE_INTERVAL == 4096
> as before and it generates code that can be assembled for 8192.
>
> Tested on Aarch64/Linux, OK for the mainline?
>

> +#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
> +
> +/* We use the 12-bit shifted immediate arithmetic instructions so values
> +   must be multiple of (1 << 12), i.e. 4096.  */
> +#if (PROBE_INTERVAL % 4096) != 0

I can understand this restriction, but...

> +  /* See the same assertion on PROBE_INTERVAL above.  */
> +  gcc_assert ((first % 4096) == 0);

... why isn't this a test that FIRST is aligned to PROBE_INTERVAL?

> +  /* See if we have a constant small number of probes to generate.  If so,
> + that's the easy case.  */
> +  if (size <= PROBE_INTERVAL)
> +{
> +  const HOST_WIDE_INT base = ROUND_UP (size, 4096);
> +  emit_set_insn (reg1,

blank line between declarations and code. Also, can we come up with a
suitable define for 4096 here that expresses the context and then use
that consistently through the remainder of this function?

> +(define_insn "probe_stack_range"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (unspec_volatile:DI [(match_operand:DI 1 "register_operand" "0")
> +  (match_operand:DI 2 "register_operand" "r")]
> +  UNSPEC_PROBE_STACK_RANGE))]

I think this should really use PTRmode, so that it's ILP32 ready (I'm
not going to ask you to make sure that works though, since I suspect
there are still other issues to resolve with ILP32 at this time).

R.




Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-03 Thread Tom de Vries

On 03/12/15 09:59, Richard Biener wrote:

On Thu, 3 Dec 2015, Tom de Vries wrote:


On 03/12/15 01:10, Tom de Vries wrote:


I've managed to reproduce it. The difference between pass and fail is
whether the compiler is configured with or without accelerator.

I'll look into it.


In the configuration with accelerator, the flag node->force_output is on for
foo._omp.fn.

This causes nonlocal_p to be true in ipa_pta_execute, which causes the
optimization to fail.

The flag is described as:
...
   /* The symbol will be assumed to be used in an invisible way (like
  by an toplevel asm statement).  */
  ...

Looks like I have to ignore the force_output flag as well in ipa_pta_execute
for this sort of node.


It rather looks like the flag shouldn't be set.  The fn after all has
its address taken!(?)



The flag is set here in expand_omp_target:
...
  /* Prevent IPA from removing child_fn as unreachable, since there are no
     refs from the parent function to child_fn in offload LTO mode.  */
  if (ENABLE_OFFLOADING)
    cgraph_node::get (child_fn)->mark_force_output ();
...

I guess setting forced_by_abi instead would also mean child_fn is not 
removed as unreachable, while still allowing optimizations:

...
  /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
 to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
 symbols promoted to static and it does not inhibit
 optimization.  */
  unsigned forced_by_abi : 1;
...

But I suspect that other optimizations (than ipa-pta) might break things.

Essentially we have two situations:
- in the host compiler, there is no need for the force_output flag,
  and it inhibits optimization
- in the accelerator compiler, it (or some equivalent) is needed

I wonder if setting the force_output flag only when streaming the 
bytecode for offloading would work. That way, it wouldn't be set in the 
host compiler, while being set in the accelerator compiler.
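
To make the trade-off concrete, a toy model of the decision.  It only looks
at the flags mentioned in these threads (the real nonlocal_p computation in
ipa_pta_execute considers more than this), and node_flags_sketch,
nonlocal_p_sketch and make_child_fn_flags are hypothetical names:

```cpp
// Toy subset of the cgraph_node flags mentioned in these threads; this is
// not GCC's struct.
struct node_flags_sketch
{
  bool address_taken;
  bool parallelized_function;
  bool force_output;
};

// Simplified model, restricted to the flags discussed here: a node is
// treated as nonlocal if it is forced to be output, or if its address is
// taken and it is not a parallelized (outlined) function.
static bool
nonlocal_p_sketch (const node_flags_sketch &n)
{
  return n.force_output
	 || (n.address_taken && !n.parallelized_function);
}

// The idea floated above: only set force_output when streaming for the
// offload target, so the host compiler keeps the optimization.
static node_flags_sketch
make_child_fn_flags (bool streaming_for_offload)
{
  node_flags_sketch n;
  n.address_taken = true;
  n.parallelized_function = true;
  n.force_output = streaming_for_offload;
  return n;
}
```

In this model the child function stays local on the host
(make_child_fn_flags (false)) and becomes nonlocal only when the flag is set
for the accelerator side.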


Thanks,
- Tom


[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 66051, which changed state.

Bug 66051 Summary: can't vectorize reductions inside an SLP group
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66051

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/66051] can't vectorize reductions inside an SLP group

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66051

--- Comment #3 from Richard Biener  ---
Author: rguenth
Date: Thu Dec  3 11:26:56 2015
New Revision: 231225

URL: https://gcc.gnu.org/viewcvs?rev=231225&root=gcc&view=rev
Log:
2015-12-03  Richard Biener  

PR tree-optimization/66051
* tree-vect-slp.c (vect_build_slp_tree_1): Remove restriction
on load group size.  Do not pass in vectorization_factor.
(vect_transform_slp_perm_load): Do not require any permute support.
(vect_build_slp_tree): Do not pass in vectorization factor.
(vect_analyze_slp_instance): Do not compute vectorization
factor estimate.  Use vector size instead of vectorization factor
estimate to split store groups for BB vectorization.

* gcc.dg/vect/slp-42.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/vect/slp-42.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-slp.c

[Bug tree-optimization/66051] can't vectorize reductions inside an SLP group

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66051

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Richard Biener  ---
Fixed.

Re: [Patch, fortran] PR68534 - No error on mismatch in number of arguments between submodule and module interface

2015-12-03 Thread Paul Richard Thomas
Dear Steve,

I'll take a look at this this afternoon. Thanks for bringing it to my attention.

Cheers

Paul

On 3 December 2015 at 07:43, Steve Kargl
 wrote:
> On Wed, Dec 02, 2015 at 10:26:30PM -0800, Steve Kargl wrote:
>> On Wed, Dec 02, 2015 at 10:02:33PM -0800, Steve Kargl wrote:
>> > Paul,
>> >
>> > I'm stumped.  Something is broken on i386-*-freebsd. :-(
>> >
>> > Running /mnt/kargl/gcc/gcc/testsuite/gfortran.dg/dg.exp ...
>> > FAIL: gfortran.dg/submodule_10.f08   -O  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_10.f08   -O  (test for excess errors)
>> > FAIL: gfortran.dg/submodule_11.f08   -O0  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_11.f08   -O0  (test for excess errors)
>> > FAIL: gfortran.dg/submodule_11.f08   -O1  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_11.f08   -O1  (test for excess errors)
>> > FAIL: gfortran.dg/submodule_11.f08   -O2  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_11.f08   -O2  (test for excess errors)
>> > FAIL: gfortran.dg/submodule_11.f08   -O3 -fomit-frame-pointer 
>> > -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal 
>> > compiler error)
>> > FAIL: gfortran.dg/submodule_11.f08   -O3 -fomit-frame-pointer 
>> > -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
>> > errors)
>> > FAIL: gfortran.dg/submodule_11.f08   -O3 -g  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_11.f08   -O3 -g  (test for excess errors)
>> > FAIL: gfortran.dg/submodule_11.f08   -Os  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_11.f08   -Os  (test for excess errors)
>>
>> Well, if I change the order of the conditionals at decl.c:4831, I
>> can get rid of the above FAILs.
>>
>> Index: decl.c
>> ===
>> --- decl.c  (revision 231219)
>> +++ decl.c  (working copy)
>> @@ -4826,7 +4826,7 @@ ok:
>>
>>/* Abbreviated module procedure declaration is not meant to have any
>>  formal arguments!  */
>> -  if (!sym->abr_modproc_decl && formal && !head)
>> +  if (formal && !head && sym && !sym->abr_modproc_decl)
>> arg_count_mismatch = true;
>>
>>for (p = formal, q = head; p && q; p = p->next, q = q->next)
>>
>> --
>> steve
>>
>> > FAIL: gfortran.dg/submodule_13.f08   -O  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_13.f08   -O   (test for errors, line 29)
>> > FAIL: gfortran.dg/submodule_13.f08   -O  (test for excess errors)
>
> These ICEs persist at line 4831.  In looking at the code, I'm
> now somewhat unsure what it should be doing.  In particular,
> there are 2 gfc_error_now() calls in the below:
>
>
>   for (p = formal, q = head; p && q; p = p->next, q = q->next)
> {
>   if ((p->next != NULL && q->next == NULL)
>   || (p->next == NULL && q->next != NULL))
> arg_count_mismatch = true;
>   else if ((p->sym == NULL && q->sym == NULL)
> || strcmp (p->sym->name, q->sym->name) == 0)
> continue;
>   else
> gfc_error_now ("Mismatch in MODULE PROCEDURE formal "
>"argument names (%s/%s) at %C",
>p->sym->name, q->sym->name);
> }
>
>   if (arg_count_mismatch)
>   gfc_error_now ("Mismatch in number of MODULE PROCEDURE "
>  "formal arguments at %C");
> }
>
>   return MATCH_YES;
>
> cleanup:
>   gfc_free_formal_arglist (head);
>   return m;
>
> But, we return MATCH_YES?  I would expect setting m = MATCH_ERROR
> and jumping to cleanup.  That's ugly.
>
> --
> Steve



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx
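
The reordering Steve proposes in the quoted decl.c hunk appears to rely on C's short-circuit evaluation: the cheap pointer tests and the NULL check on sym run first, so sym->abr_modproc_decl is only read when sym is known to be non-NULL. A minimal sketch of that idea follows. The struct definitions and the function name arg_count_mismatch_p are hypothetical stand-ins, not the real gfortran types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for gfc_symbol and gfc_formal_arglist; only the
   fields needed for the illustration are present.  */
struct symbol { bool abr_modproc_decl; };
struct arglist { struct arglist *next; };

/* Mirrors the reordered test from the quoted patch: SYM is dereferenced
   only after FORMAL, HEAD and SYM itself have been checked, because &&
   stops evaluating at the first false operand.  */
static bool
arg_count_mismatch_p (const struct symbol *sym,
                      const struct arglist *formal,
                      const struct arglist *head)
{
  return formal && !head && sym && !sym->abr_modproc_decl;
}
```

With the original operand order, a NULL sym would be dereferenced before any other test ran, which would be consistent with the ICEs reported above, although the thread does not confirm that this is the root cause.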


Re: [PATCH, 4/16] Implement -foffload-alias

2015-12-03 Thread Tom de Vries

On 11/11/15 12:00, Jakub Jelinek wrote:

On Wed, Nov 11, 2015 at 11:51:02AM +0100, Richard Biener wrote:

The option -foffload-alias=pointer instructs the compiler to assume that
objects referenced in an offload region do not alias.

The option -foffload-alias=all instructs the compiler to make no
assumptions about aliasing in offload regions.

The default value is -foffload-alias=none.


I think a global option for this is nonsense.  Please follow what
we do for #pragma GCC ivdep for example, thus allow the alias
behavior to be specified per "region" (whatever makes sense here
in the context of offloading).


Yeah, completely agreed.  I don't see why the offloaded region would be in
any way special, they are C/C++/Fortran code as any other.
What we can and should improve is teach IPA aliasing/points to analysis
about the way we lower the host vs. offloading region boundary, so that
if alias analysis on the caller of GOMP_target_ext/GOACC_parallel_keyed
determines something it can be used on the offloaded function side and vice
versa, but a switch like the above is just wrong.


Filed the GOMP_target_ext bit as PR 68675 - Handle GOMP_target_ext 
optimally in ipa-pta.


Thanks,
- Tom


[Bug target/68655] SSE2 cannot vec_perm of low and high part

2015-12-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655

--- Comment #5 from Jakub Jelinek  ---
Created attachment 36897
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36897&action=edit
gcc6-pr68655.patch

Initial untested patch.  Unfortunately, it doesn't seem to be always a win,
when looking at the differences between old and new compiler.
I'm looking at
cd /usr/src/gcc/gcc/testsuite/gcc.dg/torture
for i in vshuf-v*[hqs]i.c; do
  for j in -msse2 -msse4 -mavx -mavx2 -mavx512f -mavx512bw; do
    /usr/src/gcc/obj/gcc/cc1.v246 -quiet -O2 $j $i -DEXPENSIVE -o /tmp/1.s
    /usr/src/gcc/obj/gcc/cc1 -quiet -O2 $j $i -DEXPENSIVE -o /tmp/2.s
    echo ===$i $j===
    diff -up /tmp/1.s /tmp/2.s
  done
done
output now (where cc1.v246 is vanilla cc1, cc1 is one with this patch applied).
In some cases the patch helps, but I've seen so far some cases where for
AVX512* it resulted in more instructions.

[Bug tree-optimization/68675] New: Handle GOMP_target_ext optimally in ipa-pta

2015-12-03 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68675

Bug ID: 68675
   Summary: Handle GOMP_target_ext optimally in ipa-pta
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01336.html :
...
What we can and should improve is teach IPA aliasing/points to analysis
about the way we lower the host vs. offloading region boundary, so that
if alias analysis on the caller of GOMP_target_ext/GOACC_parallel_keyed
determines something it can be used on the offloaded function side and vice
versa.
...

GOACC_parallel_keyed has been implemented, todo: GOMP_target_ext

[Bug c++/68669] [5 regression] -Wunused-variable is not correctly supressed by #pragmas

2015-12-03 Thread trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68669

Markus Trippelsdorf  changed:

   What|Removed |Added

 Status|WAITING |NEW
Version|5.1.0   |5.2.1
Summary|-Wunused-variable is not|[5 regression]
   |correctly supressed by  |-Wunused-variable is not
   |#pragmas|correctly supressed by
   ||#pragmas

--- Comment #3 from Markus Trippelsdorf  ---
markus@x4 tmp % cat run_tests.ii
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-variable"
class A {
public:
  static A &m_fn1();
};
namespace {
A &a = A::m_fn1();
}
#pragma GCC diagnostic pop

markus@x4 tmp % g++ -Wall -std=c++14 -O2 -c run_tests.ii
run_tests.ii:8:4: warning: ‘{anonymous}::a’ defined but not used
[-Wunused-variable]
 A &a = A::m_fn1();
^

gcc-6 doesn't warn at all (even without the "diagnostic ignored" pragma).

[Bug middle-end/68672] [4.9/5/6 Regression] g++.dg/torture/pr68470.C: ICE: cannot update SSA form: statement uses released SSA name

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68672

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |4.9.4

[Bug tree-optimization/68671] [5/6 Regression] gcc.dg/torture/pr66952.c FAILs with -fno-tree-dce

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68671

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |5.4

[Bug rtl-optimization/68670] [4.9/5/6 Regression] gcc.c-torture/execute/pr68376-2.c FAILs with -ftracer

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68670

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-03
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Confirmed.

[Bug target/68655] SSE2 cannot vec_perm of low and high part

2015-12-03 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655

--- Comment #8 from rguenther at suse dot de  ---
On Thu, 3 Dec 2015, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655
> 
> --- Comment #7 from Jakub Jelinek  ---
> I guess it needs analysis.
> Some examples of changes:
> vshuf-v16qi.c -msse2 test_2, scalar code vs. punpcklqdq, clear win
> vshuf-v16qi.c -msse4 test_2, pshufb -> punpcklqdq (is this a win or not?)
> (similarly for -mavx, -mavx2, -mavx512f, -mavx512bw)
> vshuf-v16si.c -mavx512{f,bw} test_2:
> -   vpermi2d    %zmm1, %zmm1, %zmm0
> +   vmovdqa64   .LC2(%rip), %zmm0
> +   vpermi2q    %zmm1, %zmm1, %zmm0
> looks like pessimization.
> vshuf-v32hi.c -mavx512bw test_2, similar pessimization.
> vshuf-v32hi.c -mavx512bw test_2, similarly:
> -   vpermi2w    %zmm1, %zmm1, %zmm0
> +   vmovdqa64   .LC2(%rip), %zmm0
> +   vpermi2q    %zmm1, %zmm1, %zmm0
> vshuf-v4si.c -msse2 test_183, another pessimization:
> -   pshufd  $78, %xmm0, %xmm1
> +   movdqa  %xmm0, %xmm1
>     movd    b(%rip), %xmm4
>     pshufd  $255, %xmm0, %xmm2
> +   shufpd  $1, %xmm0, %xmm1
> vshuf-v4si.c -msse4 test_183, another pessimization:
> -   pshufd  $78, %xmm1, %xmm0
> +   movdqa  %xmm1, %xmm0
> +   palignr $8, %xmm0, %xmm0
> vshuf-v4si.c -mavx test_183:
> -   vpshufd $78, %xmm1, %xmm0
> +   vpalignr    $8, %xmm1, %xmm1, %xmm0
> vshuf-v64qi.c -mavx512bw, desirable change:
> -   vpermi2w    %zmm1, %zmm1, %zmm0
> -   vpshufb .LC3(%rip), %zmm0, %zmm1
> -   vpshufb .LC4(%rip), %zmm0, %zmm0
> -   vporq   %zmm0, %zmm1, %zmm0
> +   vpermi2q    %zmm1, %zmm1, %zmm0
> vshuf-v8hi.c -msse2 test_1 another scalar to punpcklqdq, win
> vshuf-v8hi.c -msse4 test_2 (supposedly a win):
> -   pshufb  .LC3(%rip), %xmm0
> +   punpcklqdq  %xmm0, %xmm0
> vshuf-v8hi.c -mavx test_2, similarly:
> -   vpshufb .LC3(%rip), %xmm0, %xmm0
> +   vpunpcklqdq %xmm0, %xmm0, %xmm0
> vshuf-v8si.c -mavx2 test_2, another win:
> -   vmovdqa a(%rip), %ymm0
> -   vperm2i128  $0, %ymm0, %ymm0, %ymm0
> +   vpermq  $68, a(%rip), %ymm0
> vshuf-v8si.c -mavx2 test_5, another win:
> -   vmovdqa .LC6(%rip), %ymm0
> -   vmovdqa .LC7(%rip), %ymm1
> -   vmovdqa %ymm0, -48(%rbp)
>     vmovdqa a(%rip), %ymm0
> -   vpermd  %ymm0, %ymm1, %ymm1
> -   vpshufb .LC8(%rip), %ymm0, %ymm3
> -   vpshufb .LC10(%rip), %ymm0, %ymm0
> -   vmovdqa %ymm1, c(%rip)
> -   vmovdqa b(%rip), %ymm1
> -   vpermq  $78, %ymm3, %ymm3
> -   vpshufb .LC9(%rip), %ymm1, %ymm2
> -   vpshufb .LC11(%rip), %ymm1, %ymm1
> -   vpor    %ymm3, %ymm0, %ymm0
> -   vpermq  $78, %ymm2, %ymm2
> -   vpor    %ymm2, %ymm1, %ymm1
> -   vpor    %ymm1, %ymm0, %ymm0
> +   vmovdqa .LC7(%rip), %ymm2
> +   vmovdqa .LC6(%rip), %ymm1
> +   vpermd  %ymm0, %ymm2, %ymm2
> +   vpermd  b(%rip), %ymm1, %ymm3
> +   vmovdqa %ymm1, -48(%rbp)
> +   vmovdqa %ymm2, c(%rip)
> +   vpermd  %ymm0, %ymm1, %ymm0
> +   vmovdqa .LC8(%rip), %ymm2
> +   vpand   %ymm2, %ymm1, %ymm1
> +   vpcmpeqd    %ymm2, %ymm1, %ymm1
> +   vpblendvb   %ymm1, %ymm3, %ymm0, %ymm0
> vshuf-v8si.c -mavx512f test_2, another win?
> -   vmovdqa a(%rip), %ymm0
> -   vperm2i128  $0, %ymm0, %ymm0, %ymm0
> +   vpermq  $68, a(%rip), %ymm0
> 
> The above does not list all changes, I've been often ignoring further changes
> in the file if say one change adds or removes a .LC*, then everything else is
> renumbered (and doesn't sometimes list cases where the same or similar change
> appears with multiple ISAs). So the results are clearly mixed.
> 
> Perhaps I should just try doing this at the end of expand_vec_perm_1 (i.e. if
> we (most likely) couldn't get a single insn normally, see if we would get it
> otherwise), and at the end of ix86_expand_vec_perm_const_1 (as the fallback
> after all sequences).

Yeah, I would have done it only if we fail to permute, not generally.
I think you need to stop at 16 byte boundaries (TImode) only for AVX256
and 32byte (OImode) for AVX512.  Not sure if there are cases where
an "effective" DImode permute works with SImode but not DImode,
say { 4, 5, 6, 7, 0, 1, 2, 3 } HImode can be done with both an
SImode { 2, 3, 0, 1 } or a DImode { 1, 0 } permute.

> It won't catch some beneficial one insn to one insn
> changes (e.g. where in the original case the insn needs a constant operand in
> memory) though.

True.  I fear that at some point we want a generator covering all
possible permutes using permute patterns (input would be the .md
file and a list of insns to consider - or maybe even autodetect those).

The code handling permutation is already quite unwieldy (and it tries
generating RTL ...) :/
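
The HImode example in the message above ({ 4, 5, 6, 7, 0, 1, 2, 3 } being expressible as SImode { 2, 3, 0, 1 } or DImode { 1, 0 }) can be checked mechanically. The sketch below is only an illustration of that observation under simple assumptions: a single-operand permutation given as a plain index array, with no attention paid to the 16-byte or 32-byte lane boundaries mentioned above. The name widen_perm is hypothetical and is not a function in the i386 back end.

```c
#include <stdbool.h>
#include <stddef.h>

/* Try to express the N-element permutation SEL as an N/2-element
   permutation of elements twice as wide.  This works when every
   (even, odd) pair of selector entries picks an aligned, adjacent pair
   of source elements.  On success the widened selector is stored in OUT
   (which must have room for N/2 entries) and true is returned.  */
static bool
widen_perm (const unsigned *sel, size_t n, unsigned *out)
{
  if (n == 0 || n % 2 != 0)
    return false;

  for (size_t i = 0; i < n; i += 2)
    {
      /* The pair must start on an even source index and be consecutive.  */
      if (sel[i] % 2 != 0 || sel[i + 1] != sel[i] + 1)
        return false;
      out[i / 2] = sel[i] / 2;
    }
  return true;
}
```

Applying widen_perm repeatedly to the selector from the message yields { 2, 3, 0, 1 } and then { 1, 0 }; a further call fails because { 1, 0 } no longer selects an aligned pair.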

[Bug c++/68669] -Wunused-variable is not correctly supressed by #pragmas

2015-12-03 Thread pavel.celba at ricardo dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68669

--- Comment #2 from Pavel Celba  ---
Created attachment 36893
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36893&action=edit
Preprocessed run_tests.cpp

Added the pre-processed run_tests.cpp

Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-03 Thread Richard Biener
On Thu, 3 Dec 2015, Tom de Vries wrote:

> On 03/12/15 09:59, Richard Biener wrote:
> > On Thu, 3 Dec 2015, Tom de Vries wrote:
> > 
> > > On 03/12/15 01:10, Tom de Vries wrote:
> > > > 
> > > > I've managed to reproduce it. The difference between pass and fail is
> > > > whether the compiler is configured with or without accelerator.
> > > > 
> > > > I'll look into it.
> > > 
> > > In the configuration with accelerator, the flag node->force_output is on
> > > for
> > > foo._omp.fn.
> > > 
> > > This causes nonlocal_p to be true in ipa_pta_execute, which causes the
> > > optimization to fail.
> > > 
> > > > The flag is described as:
> > > ...
> > >/* The symbol will be assumed to be used in an invisible way (like
> > >   by an toplevel asm statement).  */
> > >   ...
> > > 
> > > Looks like I have to ignore the force_output flag as well in
> > > ipa_pta_execute
> > > for this sort of node.
> > 
> > It rather looks like the flag shouldn't be set.  The fn after all has
> > its address taken!(?)
> > 
> 
> The flag is set here in expand_omp_target:
> ...
> 12682 /* Prevent IPA from removing child_fn as unreachable,
>  since there are no
> 12683refs from the parent function to child_fn in offload
>  LTO mode.  */
> 12684 if (ENABLE_OFFLOADING)
> 12685   cgraph_node::get (child_fn)->mark_force_output ();
> ...
> 

How are there no refs from the "parent"?  Are there not refs from
some kind of descriptor that maps fallback CPU and offloaded variants?

I think the above needs sorting out in some way, making the refs
explicit rather than implicit via force_output.

> I guess setting forced_by_abi instead would also mean child_fn is not removed
> as unreachable, while still allowing optimizations:
> ...
>   /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
>  to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
>  symbols promoted to static and it does not inhibit
>  optimization.  */
>   unsigned forced_by_abi : 1;
> ...
> 
> But I suspect that other optimizations (than ipa-pta) might break things.

How so?

> Essentially we have two situations:
> - in the host compiler, there is no need for the forced_output flag,
>   and it inhibits optimization
> - in the accelerator compiler, it (or some equivalent) is needed
> 
> I wonder if setting the force_output flag only when streaming the bytecode for
> offloading would work. That way, it wouldn't be set in the host compiler,
> while being set in the accelerator compiler.

Yeah, that was my original thinking btw.

Richard.


Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-03 Thread Alan Lawrence

On 02/12/15 14:13, Jeff Law wrote:

On 12/02/2015 01:33 AM, Richard Biener wrote:

Right.  So the question I have is how/why did DOM leave anything in the map.
And if DOM is fixed to not leave stuff lying around, can we then assert that
nothing is ever left in those maps between passes?  There's certainly no good
reason I'm aware of why DOM would leave things in this state.


It happens not only with DOM but with all passes doing edge redirection.
This is because the map is populated by GIMPLE cfg hooks just in case
it might be used.  But there is no such thing as a "start CFG manip"
and "end CFG manip" to cleanup such dead state.

Sigh.



IMHO the redirect-edge-var-map stuff is just about the most unclean
implementation possible. :(  (see how remove_edge "clears"
stale info from the map to avoid even more "interesting" stale
data)

Ideally we could assert the map is empty whenever we leave a pass,
but as said it triggers all over the place.  Even cfg-cleanup causes
such stale data.

I agree that the patch is only a half-way "solution", but a full
solution would require sth more explicit, like we do with
initialize_original_copy_tables/free_original_copy_tables.  Thus
require passes to explicitly request the edge data to be preserved
with a initialize_edge_var_map/free_edge_var_map call pair.

Not appropriate at this stage IMHO (well, unless it turns out to be
a very localized patch).

So maybe as a follow-up to aid folks in the future, how about a debugging
verify_whatever function that we can call manually if debugging a problem in
this space.  With a comment indicating why we can't call it unconditionally 
(yet).


jeff
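
The two ideas discussed above (an explicit initialize/free pair so that only passes that ask for it get edge data recorded, and a debugging verifier that can be called by hand) can be pictured with a toy model. This is only a sketch: every name below is hypothetical, the "map" is reduced to an entry counter, and nothing here corresponds to actual code in tree-ssa.c.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the redirect edge var map: only the number of live
   entries is tracked.  All names are hypothetical.  */
static unsigned toy_map_entries;
static bool toy_map_active;

/* A pass that wants edge data preserved would call this first.  */
static void
toy_initialize_edge_var_map (void)
{
  toy_map_active = true;
}

/* Called from the (toy) edge redirection hook.  Data is recorded only
   while some pass has explicitly requested it.  */
static void
toy_map_add (void)
{
  if (toy_map_active)
    toy_map_entries++;
}

/* The matching cleanup call: drop all entries and stop recording.  */
static void
toy_free_edge_var_map (void)
{
  toy_map_entries = 0;
  toy_map_active = false;
}

/* Debugging aid: report stale entries left behind by PASS_NAME.
   Returns true when the map is empty.  */
static bool
toy_verify_edge_var_map_empty (const char *pass_name)
{
  if (toy_map_entries == 0)
    return true;
  fprintf (stderr, "%s left %u stale edge var map entries\n",
           pass_name, toy_map_entries);
  return false;
}
```

In this model a redirection outside an initialize/free pair records nothing, so the verifier could in principle be asserted after every pass, which is the property the thread says cannot be asserted today.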


I did a (fwiw disable bootstrap) build with the map-emptying code in passes.c 
(not functions.c), printing out passes after which the map was non-empty (before 
emptying it, to make sure passes weren't just carrying through stale data from 
earlier). My (non-exhaustive!) list of passes after which the 
edge_var_redirect_map can be non-empty stands at...


aprefetch ccp cddce ch ch_vect copyprop crited crited cselim cunroll cunrolli 
dce dom ehcleanup einline esra fab fnsplit forwprop fre graphite ifcvt 
isolate-paths ldist lim local-pure-const mergephi oaccdevlow ompexpssa optimized 
parloops pcom phicprop phiopt phiprop pre profile profile_estimate sccp sink 
slsr split-paths sra switchconv tailc tailr tracer unswitch veclower2 vect vrm 
vrp whole-program


FWIW, the route by which dom added the edge to the redirect map was:
#0  redirect_edge_var_map_add (e=e@entry=0x7fb7a5f508, result=0x7fb725a000,
def=0x7fb78eaea0, locus=2147483884) at ../../gcc/gcc/tree-ssa.c:54
#1  0x00cccf58 in ssa_redirect_edge (e=e@entry=0x7fb7a5f508,
dest=dest@entry=0x7fb79cc680) at ../../gcc/gcc/tree-ssa.c:158
#2  0x00b00738 in gimple_redirect_edge_and_branch (e=0x7fb7a5f508,
dest=0x7fb79cc680) at ../../gcc/gcc/tree-cfg.c:5662
#3  0x006ec678 in redirect_edge_and_branch (e=e@entry=0x7fb7a5f508,
dest=<optimized out>) at ../../gcc/gcc/cfghooks.c:356
#4  0x00cb4530 in ssa_fix_duplicate_block_edges (rd=0x1a29f10,
local_info=local_info@entry=0x7fed40)
at ../../gcc/gcc/tree-ssa-threadupdate.c:1184
#5  0x00cb5520 in ssa_fixup_template_block (slot=<optimized out>,
local_info=0x7fed40) at ../../gcc/gcc/tree-ssa-threadupdate.c:1369
#6  traverse_noresize (
argument=0x7fed40, this=0x1a21a00) at ../../gcc/gcc/hash-table.h:911
#7  traverse (
argument=0x7fed40, this=0x1a21a00) at ../../gcc/gcc/hash-table.h:933
#8  thread_block_1 (bb=bb@entry=0x7fb7485bc8,
noloop_only=noloop_only@entry=true, joiners=joiners@entry=true)
at ../../gcc/gcc/tree-ssa-threadupdate.c:1592
#9  0x00cb5a40 in thread_block (bb=0x7fb7485bc8,
noloop_only=noloop_only@entry=true)
at ../../gcc/gcc/tree-ssa-threadupdate.c:1629
#10 0x00cb6bf8 in thread_through_all_blocks (
may_peel_loop_headers=true) at ../../gcc/gcc/tree-ssa-threadupdate.c:2736
#11 0x00becf6c in (anonymous namespace)::pass_dominator::execute (
this=<optimized out>, fun=0x7fb77d1b28)
at ../../gcc/gcc/tree-ssa-dom.c:622
#12 0x009feef4 in execute_one_pass (pass=pass@entry=0x16d1a80)
at ../../gcc/gcc/passes.c:2311

The edge is then deleted much later:
#3  0x00f858e4 in free_edge (fn=<optimized out>, e=<optimized out>)
at ../../gcc/gcc/cfg.c:91
#4  remove_edge_raw (e=<optimized out>) at ../../gcc/gcc/cfg.c:350
#5  0x006ec814 in remove_edge (e=<optimized out>)
at ../../gcc/gcc/cfghooks.c:418
#6  0x006ecaec in delete_basic_block (bb=bb@entry=0x7fb74b3618)
at ../../gcc/gcc/cfghooks.c:597
#7  0x00f8d1d4 in try_optimize_cfg (mode=32)
at ../../gcc/gcc/cfgcleanup.c:2701
#8  cleanup_cfg (mode=mode@entry=32) at ../../gcc/gcc/cfgcleanup.c:3028
#9  0x0070180c in cfg_layout_initialize (flags=flags@entry=0)
at ../../gcc/gcc/cfgrtl.c:4264
#10 0x00f7cdc8 in 

Re: building gcc with macro support for gdb?

2015-12-03 Thread Andreas Schwab
Ryan Burn  writes:

> Is there any way to easily build a stage1 gcc with macro support for 
> debugging?

Set STAGE1_CFLAGS.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


[Bug c/68668] New: [6 Regression] bogus error: invalid use of array with unspecified bounds

2015-12-03 Thread trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68668

Bug ID: 68668
   Summary: [6 Regression] bogus error: invalid use of array with
unspecified bounds
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: trippels at gcc dot gnu.org
  Target Milestone: ---

% cat vp8_dx_iface.i
typedef const int vp8_tree[];
int fn1(vp8_tree p1) { return p1[0]; }

 % gcc -c vp8_dx_iface.i
vp8_dx_iface.i: In function ‘fn1’:
vp8_dx_iface.i:2:1: error: invalid use of array with unspecified bounds
 int fn1(vp8_tree p1) { return p1[0]; }
 ^~~
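
The error is reported as bogus presumably because a parameter declared with an array type is adjusted to a pointer to the element type, so the unspecified bound of vp8_tree should not matter in a parameter list. A minimal sketch of the expected equivalence, reusing the typedef from the test case (the function names first_entry and first_entry_ptr are made up for the illustration):

```c
/* Typedef taken from the test case: array of const int, bound unspecified.  */
typedef const int vp8_tree[];

/* As a parameter, vp8_tree is adjusted to `const int *`, so indexing the
   parameter is an ordinary pointer access.  */
static int
first_entry (vp8_tree p1)
{
  return p1[0];
}

/* The same function written with the adjusted type spelled out.  */
static int
first_entry_ptr (const int *p1)
{
  return p1[0];
}
```

Both functions are expected to compile and behave identically on a compiler without the regression.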

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 67800, which changed state.

Bug 67800 Summary: [6 Regression] Missed vectorization opportunity on x86 
(DOT_PROD_EXPR in non-reduction)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67800

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/67800] [6 Regression] Missed vectorization opportunity on x86 (DOT_PROD_EXPR in non-reduction)

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67800

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Richard Biener  ---
Fixed.

[Bug tree-optimization/68659] [6 regression] FAIL: gcc.dg/graphite/id-pr45230-1.c (internal compiler error)

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68659

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |6.0

--- Comment #3 from Richard Biener  ---
Sounds similar to the fixed PR68550.  But confirmed, I also saw the ICE (with
ISL 0.12 and 0.14).

Re: Add an rsqrt_optab and IFN_RSQRT internal function

2015-12-03 Thread Jakub Jelinek
On Thu, Dec 03, 2015 at 09:21:03AM +, Richard Sandiford wrote:
>   * internal-fn.def (RSQRT): New function.
>   * optabs.def (rsqrt_optab): New optab.
>   * doc/tm.texi (rsqrtM2): Document

Missing full stop.

Otherwise looks to me like a nice cleanup and hopefully fixes the aarch64
regression.

Jakub


Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-12-03 Thread Richard Earnshaw
>>> into memory reference.  The problem for atomic load store is AArch64
>>> only supports direct register addressing mode.  After LRA reloads
>>> address expression out of memory reference, there is no combine/fwprop
>>> optimizer to merge instructions.  The problem is atomic_store's
>>> predicate doesn't match its constraint.   The predicate used for
>>> atomic_store is memory_operand, while all other atomic patterns
>>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>>> this change, expand will not generate addressing mode requiring reload
>>> anymore.  I will test another patch fixing this.
>>>
>>> Thanks,
>>> bin
>>
>> Some comments inline.
>>
>>>>
>>>> R.
>>>>
>>>> aarch64_legitimize_addr-20151128.txt
>>>>
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>> index 3fe2f0f..5b3e3c4 100644
>>>> --- a/gcc/config/aarch64/aarch64.c
>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
>>>> */, machine_mode mode)
>>>>   We try to pick as large a range for the offset as possible to
>>>>   maximize the chance of a CSE.  However, for aligned addresses
>>>>   we limit the range to 4k so that structures with different sized
>>>> - elements are likely to use the same base.  */
>>>> + elements are likely to use the same base.  We need to be careful
>>>> + not split CONST for some forms address expressions, otherwise it
>>
>> not to split a CONST for some forms of address expression,
>>
>>>> + will generate sub-optimal code.  */
>>>>
>>>>if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>>  {
>>>>HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>HOST_WIDE_INT base_offset;
>>>>
>>>> +  if (GET_CODE (XEXP (x, 0)) == PLUS)
>>>> +{
>>>> +  rtx op0 = XEXP (XEXP (x, 0), 0);
>>>> +  rtx op1 = XEXP (XEXP (x, 0), 1);
>>>> +
>>>> +  /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>>> + Since the offset is within range supported by addressing
>>>> + mode "reg+offset", we don't split the const and legalize
>>>> + it into below insn and expr sequence:
>>>> +   r3 = r1 + r2;
>>>> +   "r3 + 0x3ffc".  */
>>
>> I think this comment would read better as
>>
>> /* Address expressions of the form Ra + Rb + CONST.
>>
>>If CONST is within the range supported by the addressing
>>mode "reg+offset", do not split CONST and use the
>>sequence
>> Rt = Ra + Rb
>> addr = Rt + CONST.  */
>>
>>>> +  if (REG_P (op0) && REG_P (op1))
>>>> +{
>>>> +  machine_mode addr_mode = GET_MODE (x);
>>>> +  rtx base = gen_reg_rtx (addr_mode);
>>>> +  rtx addr = plus_constant (addr_mode, base, offset);
>>>> +
>>>> +  if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>>> +{
>>>> +  emit_insn (gen_adddi3 (base, op0, op1));
>>>> +  return addr;
>>>> +}
>>>> +}
>>>> +  /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>>> + Like above, we don't split the const and legalize it into
>>>> + below insn and expr sequence:
>>
>> Similarly.
>>>> +   r3 = 0x3ffc;
>>>> +   r4 = r1 + r3;
>>>> +   "r4 + r2<<2".  */
>>
>> Why don't we generate
>>
>>   r3 = r1 + r2 << 2
>>   r4 = r3 + 0x3ffc
>>
>> utilizing the shift-and-add instructions?
> 
> All other comments are addressed in the attached new patch.
> As for this question, Wilco also asked it on internal channel before.
> The main idea is to depend on GIMPLE IVO/SLSR to find CSE
> opportunities of the scaled plus sub expr.  The scaled index is most
> likely loop iv, so I would like to split const plus out of memory
> reference so that it can be identified/hoisted as loop invariant.
> This is more important whe

[Bug preprocessor/57580] Repeated _Pragma message directives in macro causes problems

2015-12-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57580

--- Comment #8 from Jakub Jelinek  ---
Fixed for 6+.

[Bug sanitizer/68650] Firefox compilation fails with Address Sanitizer (error: undefined reference to 'dlerror')

2015-12-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68650

--- Comment #5 from Jakub Jelinek  ---
/home/thomas/Arbeit/Tor/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:167:
 
error: undefined reference to 'dlerror' 

That does look like it is actually one of the firefox' objects that needs
dlerror, not *asan*.  And -fsanitize=address never emits calls to these
functions itself, when they aren't present in the source code already.
So perhaps firefox has some code guarded with the sanitizer preprocessor macros
or something similar?
Look for dlerror in nsXPCOMGlue.cpp ?

[Bug tree-optimization/68639] [6 Regression] ICE: Floating point exception

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68639

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Richard Biener  ---
Fixed.

[Bug tree-optimization/68639] [6 Regression] ICE: Floating point exception

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68639

--- Comment #4 from Richard Biener  ---
Author: rguenth
Date: Thu Dec  3 08:38:10 2015
New Revision: 231220

URL: https://gcc.gnu.org/viewcvs?rev=231220&root=gcc&view=rev
Log:
2015-12-03  Richard Biener  

PR tree-optimization/68639
* tree-vect-data-refs.c (dr_group_sort_cmp): Split groups
belonging to different loops.
(vect_analyze_data_ref_accesses): Likewise.

* gfortran.fortran-torture/compile/pr68639.f90: New testcase.

Added:
trunk/gcc/testsuite/gfortran.fortran-torture/compile/pr68639.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-data-refs.c

Fix buildbreaker with isl 0.14

2015-12-03 Thread Tom de Vries

[ was: Re: [PATCH] [graphite] handle missing isl_ast_expr ]

On 03/12/15 00:56, Tom de Vries wrote:

Hi,

This breaks the build for me, with isl 0.14.

...
src/gcc/graphite-isl-ast-to-gimple.c: In member function ‘tree_node*
translate_isl_ast_to_gimple::binary_op_to_tree(tree, isl_ast_expr*,
ivs_params&)’:
src/gcc/graphite-isl-ast-to-gimple.c:591:10: error: ‘isl_ast_op_zdiv_r’
was not declared in this scope
  case isl_ast_op_zdiv_r:
   ^
...

Thanks,
- Tom

On 02/12/15 23:17, Sebastian Pop wrote:

 From ISL's documentation, isl_ast_op_zdiv_r is equal to zero iff the remainder
on integer division is zero.  Generate a modulo operation for that.

* graphite-isl-ast-to-gimple.c (binary_op_to_tree): Handle
isl_ast_op_zdiv_r.
 (gcc_expression_from_isl_expr_op): Same.

* gcc.dg/graphite/id-28.c: New.


This patch fixes the build breaker with isl 0.14 for me. I'm using the 
HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS macro (which is set for isl 
0.15, and not before) to guard the code handling isl_ast_op_zdiv_r 
(which I suppose is new in isl 0.15).


OK for trunk?

Thanks,
- Tom
Guard isl_ast_op_zdiv_r usage with HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS

2015-12-03  Tom de Vries  

	* graphite-isl-ast-to-gimple.c (binary_op_to_tree)
	(gcc_expression_from_isl_expr_op): Guard isl_ast_op_zdiv_r usage with
	HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS.

---
 gcc/graphite-isl-ast-to-gimple.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 06a2062..20eb80f 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -588,7 +588,9 @@ binary_op_to_tree (tree type, __isl_take isl_ast_expr *expr, ivs_params &ip)
 	}
   return fold_build2 (TRUNC_DIV_EXPR, type, tree_lhs_expr, tree_rhs_expr);
 
+#if HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
 case isl_ast_op_zdiv_r:
+#endif
 case isl_ast_op_pdiv_r:
   /* As ISL operates on arbitrary precision numbers, we may end up with
 	 division by 2^64 that is folded to 0.  */
@@ -759,7 +761,9 @@ gcc_expression_from_isl_expr_op (tree type, __isl_take isl_ast_expr *expr,
 case isl_ast_op_pdiv_q:
 case isl_ast_op_pdiv_r:
 case isl_ast_op_fdiv_q:
+#if HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
 case isl_ast_op_zdiv_r:
+#endif
 case isl_ast_op_and:
 case isl_ast_op_or:
 case isl_ast_op_eq:
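
The guard used in the patch above can be illustrated in isolation. The sketch below assumes a made-up feature macro HAVE_NEW_OP and a made-up enum; it is not isl's API. It shows why a plain #if is enough: an identifier that is not defined as a macro evaluates to 0 in a preprocessor condition, so with an older library the guarded case label simply disappears.

```c
/* Hypothetical feature macro: a configure script would define this to 1
   when the library is new enough to provide OP_ZDIV_R.  Left undefined
   here, so the #if blocks below are skipped.  */

enum op
{
  OP_PDIV_R,
  OP_FDIV_Q
#if HAVE_NEW_OP
  , OP_ZDIV_R		/* Only exists in newer library versions.  */
#endif
};

/* Returns 1 for the division-like operations this build knows about.  */
static int
is_division_op (enum op o)
{
  switch (o)
    {
#if HAVE_NEW_OP
    case OP_ZDIV_R:
#endif
    case OP_PDIV_R:
    case OP_FDIV_Q:
      return 1;
    }
  return 0;
}
```

Because the guarded label falls through to the shared handling, no second copy of the case body is needed; only the label itself is conditional, as in the two hunks above.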


[Bug testsuite/68232] gcc.dg/ifcvt-4.c fails on some arm configurations

2015-12-03 Thread andre.simoesdiasvieira at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68232

Andre Vieira  changed:

   What|Removed |Added

 CC||andre.simoesdiasvieira@arm.
   ||com

--- Comment #3 from Andre Vieira  ---
Fails also on any ARM M-profile arch/cpu combination I've tried (all with
-mthumb):
-march={armv6-m,armv7-m}
or -mcpu=cortex-m{0,0plus,3,4,7}

It does pass for armv7-r and cortex-r4 with and without -mthumb.

This all with target 'arm-none-eabi'.

[Bug target/68620] ICE on gcc.target/arm/attr-neon-fp16.c

2015-12-03 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68620

--- Comment #4 from Christophe Lyon  ---
Maybe, that's what I'm trying to figure out.

Given the comment in arm.h before the definition of CANNOT_CHANGE_MODE_CLASS,
maybe we need to define more patterns, for all the sizes where
CANNOT_CHANGE_MODE_CLASS is true on big-endian?

[Bug ipa/64812] [4.9 regression] x86 LibreOffice Build failure: undefined reference to acquire

2015-12-03 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64812

--- Comment #13 from rguenther at suse dot de  ---
On Thu, 3 Dec 2015, hubicka at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64812
> 
> Jan Hubicka  changed:
> 
>What|Removed |Added
> 
>  CC||rguenther at suse dot de
> 
> --- Comment #12 from Jan Hubicka  ---
> Markus,
> the devirtualization seems valid to me.  You are supposed to link with the
> implementation of the class.  It is possible to override this by manually
> setting the visibility.
> 
> We however end up with funny code in optimized dump:
> 
>   :
>   _3 = this_2(D)->D.2399.D.2325._vptr.B;
>   _4 = *_3;
>   PROF_6 = [obj_type_ref] OBJ_TYPE_REF(_4;(struct
> WindowListenerMultiplexer)this_2(D)->0);
>   if (PROF_6 == acquire)
> goto ;
>   else
> goto ;
> 
>   :
>   PROF_10 = [obj_type_ref] OBJ_TYPE_REF(_4;(struct
> WindowListenerMultiplexer)this_2(D)->0);
>   if (PROF_10 == acquire)
> goto ;
>   else
> goto ;
> 
>   :
>   PROF_14 = [obj_type_ref] OBJ_TYPE_REF(_4;(struct
> WindowListenerMultiplexer)this_2(D)->0);
>   if (PROF_14 == acquire)
> goto ;
>   else
> goto ;
> 
>   :
>   PROF_18 = [obj_type_ref] OBJ_TYPE_REF(_4;(struct
> WindowListenerMultiplexer)this_2(D)->0);
>   if (PROF_18 == acquire)
> goto ;
>   else
> goto ;
> 
>   :
>   PROF_22 = [obj_type_ref] OBJ_TYPE_REF(_4;(struct
> WindowListenerMultiplexer)this_2(D)->0);
>   if (PROF_22 == acquire)
> goto ;
>   else
> goto ;
> 
>   :
>   PROF_26 = [obj_type_ref] OBJ_TYPE_REF(_4;(struct
> WindowListenerMultiplexer)this_2(D)->0);
>   if (PROF_26 == acquire)
> goto ;
>   else
> goto ;
>   :
>   PROF_30 = [obj_type_ref] OBJ_TYPE_REF(_4;(struct
> WindowListenerMultiplexer)this_2(D)->0);
>   if (PROF_30 == acquire)
> goto ;
>   else
> goto ;
> 
>   :
>   PROF_34 = [obj_type_ref] OBJ_TYPE_REF(_4;(struct
> WindowListenerMultiplexer)this_2(D)->0);
>   if (PROF_34 == acquire)
> goto ;
>   else
> goto ;
> 
>   :
>   OBJ_TYPE_REF(_4;(struct WindowListenerMultiplexer)this_2(D)->0) (this_2(D));
> [tail call]
>   goto ;
> 
>   :
>   OBJ_TYPE_REF(_4;(struct WindowListenerMultiplexer)this_2(D)->0) (this_2(D));
> [tail call]
>   goto ;
> 
>   :
>   OBJ_TYPE_REF(_4;(struct WindowListenerMultiplexer)this_2(D)->0) (this_2(D));
> [tail call]
>   goto ;
> 
>   :
>   OBJ_TYPE_REF(_4;(struct WindowListenerMultiplexer)this_2(D)->0) (this_2(D));
> [tail call]
>   goto ;
> 
>   :
>   OBJ_TYPE_REF(_4;(struct WindowListenerMultiplexer)this_2(D)->0) (this_2(D));
> [tail call]
>   goto ;
> 
>   :
>   OBJ_TYPE_REF(_4;(struct WindowListenerMultiplexer)this_2(D)->0) (this_2(D));
> [tail call]
>   goto ;
> 
>   :
>   OBJ_TYPE_REF(_4;(struct WindowListenerMultiplexer)this_2(D)->0) (this_2(D));
> [tail call]
>   goto ;
> 
>   :
>   OBJ_TYPE_REF(_4;(struct WindowListenerMultiplexer)this_2(D)->0) (this_2(D));
> [tail call]
>   goto ;
> 
>   :
>   OBJ_TYPE_REF(_4;(struct WindowListenerMultiplexer)this_2(D)->0) (this_2(D));
> [tail call]
> 
>   :
>   return;
> 
> }
> 
> This is the result of recursive inlining over the devirtualized call, which
> seems legitimate even though it is a bit of overkill.
> 
> RTL optimizers simplify this to:
> 
> _ZN25WindowListenerMultiplexer7acquireEv:
> .LFB1:
> .cfi_startproc
> movq(%rdi), %rax
> jmp *(%rax)
> .cfi_endproc
> .LFE1:
> 
> Richi, I think PRE is supposed to optimize this?

Ah, for some reason OBJ_TYPE_REF is a GENERIC tree (single RHS) and
SCCVN doesn't handle it at all.  I have a patch which will produce

  :
  _3 = this_2(D)->D.2399.D.2325._vptr.B;
  _4 = *_3;
  PROF_6 = [obj_type_ref] OBJ_TYPE_REF(_4;(struct 
WindowListenerMultiplexer)this_2(D)->0);
  if (PROF_6 == acquire)
goto ;
  else
goto ;

  :
  OBJ_TYPE_REF(_4;(struct WindowListenerMultiplexer)this_2(D)->0) 
(this_2(D)); [tail call]
  goto ;

  :
  OBJ_TYPE_REF(_4;(struct WindowListenerMultiplexer)this_2(D)->0) 
(this_2(D)); [tail call]

so it still fails to "propagate" acquire into the tail call.  That
could be fixed as well, but only in two steps: first replace
the OBJ_TYPE_REF in the call with PROF_6 ("CSE" it), after which
DOM might propagate the equivalence.

Re: [gomp-nvptx 4/9] nvptx backend: add -mgomp option and multilib

2015-12-03 Thread Alexander Monakov
On Wed, 2 Dec 2015, Jakub Jelinek wrote:
> Can you post sample code with assembly for -msoft-stack and -muniform-simt
> showing how are short interesting cases expanded?

Here are short examples; please let me know if I'm misunderstanding and you
wanted something else.

First, -muniform-simt effect on this input:

int f (int *p, int v)
{
  return __atomic_exchange_n (p, v, __ATOMIC_SEQ_CST);
}

leads to this assembly (showing diff -without/+with option):

 .visible .func (.param.u32 %out_retval)f(.param.u64 %in_ar1, .param.u32 %in_ar2)
 {
.reg.u64 %ar1;
.reg.u32 %ar2;
.reg.u32 %retval;
.reg.u64 %hr10;
.reg.u32 %r23;
.reg.u64 %r25;
.reg.u32 %r26;
+   .reg.u32 %r28;
+   .reg.pred %r29;
ld.param.u64 %ar1, [%in_ar1];
ld.param.u32 %ar2, [%in_ar2];
+   {
+   .reg.u32 %ustmp0;
+   .reg.u64 %ustmp1;
+   .reg.u64 %ustmp2;
+   mov.u32 %ustmp0, %tid.y;
+   mul.wide.u32 %ustmp1, %ustmp0, 4;
+   mov.u64 %ustmp2, __nvptx_uni;
+   add.u64 %ustmp2, %ustmp2, %ustmp1;
+   ld.shared.u32 %r28, [%ustmp2];
+   mov.u32 %ustmp0, %tid.x;
+   and.b32 %r28, %r28, %ustmp0;
+   setp.eq.u32 %r29, %r28, %ustmp0;
+   }
mov.u64 %r25, %ar1;
mov.u32 %r26, %ar2;
-   atom.exch.b32   %r23, [%r25], %r26;
+   @%r29   atom.exch.b32   %r23, [%r25], %r26;
+   shfl.idx.b32%r23, %r23, %r28, 31;
mov.u32 %retval, %r23;
st.param.u32[%out_retval], %retval;
ret;
}
+// BEGIN GLOBAL VAR DECL: __nvptx_uni
+.extern .shared .u32 __nvptx_uni[32];
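
Reading the diff above, the added instructions compute a source lane as
(__nvptx_uni word & %tid.x), run the atomic only on lanes where that value
equals their own %tid.x, and then broadcast the result with shfl.idx.  The
following is only a sequential C simulation of that guard, as a sketch: the
function name, the 32-lane loop and the plain int "memory" are assumptions
made for illustration, and real lanes run concurrently.

```c
#include <stdint.h>

#define WARP_SIZE 32

/* Sequentially simulate one warp running the guarded exchange.
   UNI_MASK plays the role of the word loaded from __nvptx_uni.
   Returns how many lanes actually performed the exchange.  */
int
simulate_uniform_exchange (uint32_t uni_mask, int *mem, int v,
                           int result[WARP_SIZE])
{
  int lane_value[WARP_SIZE] = { 0 };
  uint32_t source_lane[WARP_SIZE];
  int executed = 0;

  for (uint32_t lane = 0; lane < WARP_SIZE; lane++)
    {
      /* and.b32 %r28, %r28, %tid.x  */
      source_lane[lane] = uni_mask & lane;
      /* setp.eq.u32 %r29, %r28, %tid.x  */
      if (source_lane[lane] == lane)
        {
          /* @%r29 atom.exch.b32 (modelled as a plain swap here)  */
          lane_value[lane] = *mem;
          *mem = v;
          executed++;
        }
    }

  /* shfl.idx.b32: every lane reads the value held by its source lane.  */
  for (uint32_t lane = 0; lane < WARP_SIZE; lane++)
    result[lane] = lane_value[source_lane[lane]];

  return executed;
}
```

With a mask of 0 only lane 0 performs the exchange and all lanes see its
result; with an all-ones mask every lane performs its own exchange.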

And, -msoft-stack for this input:

void g(void *);
void f()
{
  char a[42] __attribute__((aligned(64)));
  g(a);
}

leads to:

 .visible .func f
 {
.reg.u64 %hr10;
.reg.u64 %r22;
.reg.u64 %frame;
-   .local.align 64 .b8 %farray[48];
-   cvta.local.u64 %frame, %farray;
+   .reg.u32 %fstmp0;
+   .reg.u64 %fstmp1;
+   .reg.u64 %fstmp2;
+   mov.u32 %fstmp0, %tid.y;
+   mul.wide.u32 %fstmp1, %fstmp0, 8;
+   mov.u64 %fstmp2, __nvptx_stacks;
+   add.u64 %fstmp2, %fstmp2, %fstmp1;
+   ld.shared.u64 %fstmp1, [%fstmp2];
+   sub.u64 %frame, %fstmp1, 48;
+   and.b64 %frame, %frame, -64;
+   st.shared.u64 [%fstmp2], %frame;
mov.u64 %r22, %frame;
{
.param.u64 %out_arg0;
st.param.u64 [%out_arg0], %r22;
call g, (%out_arg0);
}
+   st.shared.u64 [%fstmp2], %fstmp1;
ret;
}
 // BEGIN GLOBAL FUNCTION DECL: g
 .extern .func g(.param.u64 %in_ar1);
+// BEGIN GLOBAL VAR DECL: __nvptx_stacks
+.extern .shared .u64 __nvptx_stacks[32];
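
The frame setup in the diff above amounts to "load the soft stack pointer,
subtract the frame size, align down, store it back, and restore the old
value before returning".  A minimal C sketch of that arithmetic follows; the
function names are hypothetical and a plain variable stands in for the
__nvptx_stacks slot.

```c
#include <stdint.h>

/* Carve a frame of SIZE bytes, aligned to ALIGN (a power of two), out of
   the downward-growing soft stack whose current top is stored in *SLOT.
   The old top is written to *SAVED so the caller can restore it.  */
uint64_t
soft_stack_enter (uint64_t *slot, uint64_t size, uint64_t align,
                  uint64_t *saved)
{
  *saved = *slot;                  /* ld.shared.u64 %fstmp1, [%fstmp2]  */
  uint64_t frame = *saved - size;  /* sub.u64 %frame, %fstmp1, 48  */
  frame &= ~(align - 1);           /* and.b64 %frame, %frame, -64  */
  *slot = frame;                   /* st.shared.u64 [%fstmp2], %frame  */
  return frame;
}

/* Undo soft_stack_enter before returning from the function.  */
void
soft_stack_leave (uint64_t *slot, uint64_t saved)
{
  *slot = saved;                   /* st.shared.u64 [%fstmp2], %fstmp1  */
}
```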


Alexander


[Ping^2][AArch64][TLSGD][2/2] Implement TLS GD traditional for tiny code model

2015-12-03 Thread Jiong Wang

On 13/11/15 15:21, Jiong Wang wrote:


On 05/11/15 14:57, Jiong Wang wrote:

Marcus Shawcroft writes:


+#ifdef HAVE_AS_TINY_TLSGD_RELOCS
+  return SYMBOL_TINY_TLSGD;
+#else
+  return SYMBOL_SMALL_TLSGD;
+#endif

Rather than introduce blocks of conditional compilation it is better
to gate different behaviours with a test on a constant expression. In
this case add something like this:

#if defined(HAVE_AS_TINY_TLSGD_RELOCS)
#define USE_TINY_TLSGD 1
#else
#define USE_TINY_TLSGD 0
#endif

up near the definition of TARGET_HAVE_TLS then write the above
fragment without using the preprocessor:

return USE_TINY_TLSGD ? SYMBOL_TINY_TLSGD : SYMBOL_SMALL_TLSGD;
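
As a standalone illustration of the suggestion above (not the aarch64 code
itself), the pattern can be sketched as follows.  The toy_ names are
stand-ins invented for this example, and HAVE_AS_TINY_TLSGD_RELOCS is
deliberately left undefined here; in GCC it would come from configure.

```c
/* Turn the "is it defined" question into a 0/1 constant once.  */
#if defined (HAVE_AS_TINY_TLSGD_RELOCS)
#define USE_TINY_TLSGD 1
#else
#define USE_TINY_TLSGD 0
#endif

/* Stand-in for the real aarch64_symbol_type enumeration.  */
enum toy_symbol_type { TOY_SYMBOL_SMALL_TLSGD, TOY_SYMBOL_TINY_TLSGD };

enum toy_symbol_type
toy_classify_tlsgd (void)
{
  /* Both arms are parsed and type-checked in every configuration; the
     compiler can fold the constant condition away.  */
  return USE_TINY_TLSGD ? TOY_SYMBOL_TINY_TLSGD : TOY_SYMBOL_SMALL_TLSGD;
}
```

The benefit over an #ifdef inside the function body is that code for the
configuration not being built still gets compiled, so it is less likely to
bit-rot unnoticed.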


Done.


- aarch64_emit_call_insn (gen_tlsgd_small (result, imm, resolver));
+ if (type == SYMBOL_SMALL_TLSGD)
+  aarch64_emit_call_insn (gen_tlsgd_small (result, imm, resolver));
+ else
+  aarch64_emit_call_insn (gen_tlsgd_tiny (result, imm, resolver));
  insns = get_insns ();
  end_sequence ();

Add a separate case statement for SYMBOL_TINY_TLSGD rather than reusing
the case statement for SYMBOL_SMALL_TLSGD and then needing to add
another test against symbol type within the body of the case
statement.


Done.



+(define_insn "tlsgd_tiny"
+  [(set (match_operand 0 "register_operand" "")
+ (call (mem:DI (match_operand:DI 2 "" "")) (const_int 1)))
+   (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "S")]
UNSPEC_GOTTINYTLS)
+   (clobber (reg:DI LR_REGNUM))
+  ]
+  ""
+  "adr\tx0, %A1;bl\t%2;nop";
+  [(set_attr "type" "multiple")
+   (set_attr "length" "12")])

I don't think the explicit clobber LR_REGNUM is required since your
change last September:
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02654.html


The explicit clobber of LR_REGNUM is unnecessary only when operand 0 happens
to be allocated to LR_REGNUM, which is possible since LR_REGNUM became
allocatable with my patch above.

In all other cases we still need the explicit clobber, because GCC's data
flow analysis cannot deduce that LR_REGNUM is clobbered implicitly by the
call instruction.

Without the clobber, df_regs_ever_live is calculated incorrectly for x30.
Consider the following simple testcase:

__thread int t0 = 0x10;
__thread int t1 = 0x10;

int
main (int argc, char **argv)
{
  if (t0 != t1)
return 1;
  return  0;
}


if you compile with

 "-O2 -ftls-model=global-dynamic -fpic -mtls-dialect=trad t.c -mcmodel=tiny -fomit-frame-pointer",

wrong code will be generated:

 main:
str x19, [sp, -16]!  <--- x30 is not saved.
adr x0, :tlsgd:t0
bl __tls_get_addr
nop

Patch updated. tls regression OK

OK for trunk?

2015-11-05  Jiong Wang  

gcc/
  * configure.ac: Add check for binutils global dynamic tiny code model
  relocation support.
  * configure: Regenerate.
  * config.in: Regenerate.
  * config/aarch64/aarch64.md (tlsgd_tiny): New define_insn.
  * config/aarch64/aarch64-protos.h (aarch64_symbol_type): New
  enumeration SYMBOL_TINY_TLSGD.
  (aarch64_symbol_context): New comment on SYMBOL_TINY_TLSGD.
  * config/aarch64/aarch64.c (aarch64_classify_tls_symbol): Support
  SYMBOL_TINY_TLSGD.
  (aarch64_print_operand): Likewise.
  (aarch64_expand_mov_immediate): Likewise.
  (aarch64_load_symref_appropriately): Likewise.

gcc/testsuite/
  * lib/target-supports.exp (check_effective_target_aarch64_tlsgdtiny):
  New effective check.
  * gcc.target/aarch64/tlsgd_small_1.c: New testcase.
  * gcc.target/aarch64/tlsgd_small_ilp32_1.c: Likewise.
  * gcc.target/aarch64/tlsgd_tiny_1.c: Likewise.
  * gcc.target/aarch64/tlsgd_tiny_ilp32_1.c: Likewise.

Ping ~


Ping^2


[PATCH, CHKP] Fix bounds returned for structures

2015-12-03 Thread Ilya Enkovich
Hi,

Currently, multiple return-struct-* tests from the MPX testsuite fail.  This
patch fixes them.  Bootstrapped and tested on x86_64-unknown-linux-gnu.
Applied to trunk.  I'm going to port it to GCC 5 after the 5.3 release.

Thanks,
Ilya
--
gcc/

2015-12-03  Ilya Enkovich  

* cfgexpand.c (expand_gimple_stmt_1): Return statement with
DECL as return value is allowed to have NULL bounds.


diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 1990e10..2c3b23d 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3534,6 +3534,12 @@ expand_gimple_stmt_1 (gimple *stmt)
  {
tree result = DECL_RESULT (current_function_decl);
 
+   /* Mark we have return statement with missing bounds.  */
+   if (!bnd
+   && chkp_function_instrumented_p (cfun->decl)
+   && !DECL_P (op0))
+ bnd = error_mark_node;
+
/* If we are not returning the current function's RESULT_DECL,
   build an assignment to it.  */
if (op0 != result)
@@ -3550,9 +3556,6 @@ expand_gimple_stmt_1 (gimple *stmt)
op0 = build2 (MODIFY_EXPR, TREE_TYPE (result),
  result, op0);
  }
-   /* Mark we have return statement with missing bounds.  */
-   if (!bnd && chkp_function_instrumented_p (cfun->decl))
- bnd = error_mark_node;
  }
 
if (!op0)


[Bug c/68513] [5/6 Regression] ICE in gimplify_expr, at gimplify.c:8832, c_maybe_const_expr in IL

2015-12-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68513

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek  ---
Supposedly r231178 and r231110.

Add an rsqrt_optab and IFN_RSQRT internal function

2015-12-03 Thread Richard Sandiford
All current uses of builtin_reciprocal convert 1.0/sqrt into rsqrt.
This patch adds an rsqrt optab and associated internal function for
that instead.  We can then pick up the vector forms of rsqrt automatically,
fixing an AArch64 regression from my internal_fn patches.
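
For readers unfamiliar with the transformation, the sketch below shows in
portable C what replacing 1.0/sqrt(x) by a reciprocal-square-root estimate
plus Newton-Raphson refinement can look like.  It is an illustration only:
the function name is made up, the integer bit trick merely stands in for a
hardware estimate instruction, and this is not the code GCC generates.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, finite, normal X.  */
float
toy_rsqrt (float x)
{
  uint32_t bits;
  float y;

  /* Seed: the well-known integer bit trick, standing in for a hardware
     estimate instruction.  */
  memcpy (&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  memcpy (&y, &bits, sizeof y);

  /* Two Newton-Raphson steps for f(y) = 1/(y*y) - x.  */
  y = y * (1.5f - 0.5f * x * y * y);
  y = y * (1.5f - 0.5f * x * y * y);
  return y;
}
```

Because the result is only an approximation of 1.0f / sqrtf (x), this kind
of substitution is normally tied to relaxed floating-point options.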

With that change, builtin_reciprocal only needs to handle target-specific
built-in functions.  I've restricted the hook to those since, if we need
a reciprocal of another standard function later, I think there should be
a strong preference for adding a new optab and internal function for it,
rather than hiding the code in a backend.

Three targets implement builtin_reciprocal: aarch64, i386 and rs6000.
i386 and rs6000 already used the obvious rsqrt2 pattern names
for the instructions, so they pick up the new code automatically.
aarch64 needs a slight rename.

mn10300 is unusual in that its native operation is rsqrt, and
sqrt is approximated as 1.0/rsqrt.  The port also uses rsqrt2
for the rsqrt pattern, so after the patch we now pick it up as a native
operation.

Two other ports define rsqrt patterns: sh and v850.  AFAICT these
patterns aren't currently used, but I think the patch does what the
authors of the patterns would have expected.  There's obviously some
risk of fallout though.

Tested on x86_64-linux-gnu, aarch64-linux-gnu, arm-linux-gnueabihf
(as a target without the hooks) and powerpc64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* internal-fn.def (RSQRT): New function.
* optabs.def (rsqrt_optab): New optab.
* doc/md.texi (rsqrtM2): Document.
* target.def (builtin_reciprocal): Replace gcall argument with
a function decl.  Restrict hook to machine functions.
* doc/tm.texi: Regenerate.
* targhooks.h (default_builtin_reciprocal): Update prototype.
* targhooks.c (default_builtin_reciprocal): Likewise.
* tree-ssa-math-opts.c: Include internal-fn.h.
(internal_fn_reciprocal): New function.
(pass_cse_reciprocals::execute): Call it, and build a call to an
internal function on success.  Only call targetm.builtin_reciprocal
for machine functions.
* config/aarch64/aarch64-protos.h (aarch64_builtin_rsqrt): Remove
second argument.
* config/aarch64/aarch64-builtins.c (aarch64_expand_builtin_rsqrt):
Rename aarch64_rsqrt_2 to rsqrt2.
(aarch64_builtin_rsqrt): Remove md_fn argument and only handle
machine functions.
* config/aarch64/aarch64.c (use_rsqrt_p): New function.
(aarch64_builtin_reciprocal): Replace gcall argument with a
function decl.  Use use_rsqrt_p.  Remove optimize_size check.
Only handle machine functions.  Update call to aarch64_builtin_rsqrt.
(aarch64_optab_supported_p): New function.
(TARGET_OPTAB_SUPPORTED_P): Define.
* config/aarch64/aarch64-simd.md (aarch64_rsqrt_2): Rename to...
(rsqrt2): ...this.
* config/i386/i386.c (use_rsqrt_p): New function.
(ix86_builtin_reciprocal): Replace gcall argument with a
function decl.  Use use_rsqrt_p.  Remove optimize_insn_for_size_p
check.  Only handle machine functions.
(ix86_optab_supported_p): Handle rsqrt_optab.
* config/rs6000/rs6000.c (TARGET_OPTAB_SUPPORTED_P): Define.
(rs6000_builtin_reciprocal): Replace gcall argument with a
function decl.  Remove optimize_insn_for_size_p check.
Only handle machine functions.
(rs6000_optab_supported_p): New function.

Index: gcc/internal-fn.def
===
--- gcc/internal-fn.def 2015-12-03 09:16:57.0 +
+++ gcc/internal-fn.def 2015-12-03 09:17:00.811513362 +
@@ -91,6 +91,8 @@ DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_C
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
 
+DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary)
+
 /* Unary math functions.  */
 DEF_INTERNAL_FLT_FN (ACOS, ECF_CONST, acos, unary)
 DEF_INTERNAL_FLT_FN (ASIN, ECF_CONST, asin, unary)
Index: gcc/optabs.def
===
--- gcc/optabs.def  2015-12-03 09:16:57.0 +
+++ gcc/optabs.def  2015-12-03 09:17:00.811513362 +
@@ -267,6 +267,7 @@ OPTAB_D (log_optab, "log$a2")
 OPTAB_D (logb_optab, "logb$a2")
 OPTAB_D (pow_optab, "pow$a3")
 OPTAB_D (remainder_optab, "remainder$a3")
+OPTAB_D (rsqrt_optab, "rsqrt$a2")
 OPTAB_D (scalb_optab, "scalb$a3")
 OPTAB_D (signbit_optab, "signbit$F$a2")
 OPTAB_D (significand_optab, "significand$a2")
Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi 2015-12-03 09:16:57.0 +
+++ gcc/doc/md.texi 2015-12-03 09:17:00.811513362 +
@@ -5331,6 +5331,18 @@ corresponds to the C data type @code{dou
 built-in function uses the mode which corresponds to 

Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-03 Thread Janne Blomqvist
On Tue, Dec 1, 2015 at 7:51 PM, Bernhard Reutner-Fischer
 wrote:
> As said, we could as well use a list of candidates with NULL as record marker.
> Implementation cosmetics. Steve seems to not be thrilled by the
> overall idea in the first place, so unless there is clear support by
> somebody else i won't pursue this any further, it's not that i'm bored
> or ran out of stuff i should do.. ;)

FWIW, I think the idea of this patch is quite nice, and I'd like to
see it in the compiler.

I'm personally Ok with "C++-isms", but nowadays my contributions are
so minor that my opinion shouldn't carry that much weight on this
matter.


-- 
Janne Blomqvist


[Bug rtl-optimization/68651] [5/6 Regression][AArch64] Missing combination of shift-by-one with other arithmetic patterns with -mcpu=cortex-a53

2015-12-03 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68651

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2015-12-03
  Component|target  |rtl-optimization
   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #1 from ktkachov at gcc dot gnu.org ---
I think this can/should be fixed in a target-independent way.
I have an idea I'm trying out

[PATCH] Handle OBJ_TYPE_REF in FRE

2015-12-03 Thread Richard Biener

The following patch handles CSEing OBJ_TYPE_REF which was omitted
because it is a GENERIC expression even on GIMPLE (for whatever
reason...).  Rather than changing this now the following patch
simply treats it properly as such.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Note that this does not (yet) substitute OBJ_TYPE_REFs in calls
with SSA names that have the same value - not sure if that would
be desired generally (does the devirt machinery cope with that?).
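
To make "CSEing" concrete: treating OBJ_TYPE_REF as an n-ary operation means
that two occurrences with the same three operands receive the same value
number.  The toy table below sketches that idea under simplifying
assumptions (integer operand ids, linear search, fixed capacity); none of
the names are from tree-ssa-sccvn.c.

```c
#include <stddef.h>

enum toy_code { TOY_BIT_FIELD_REF, TOY_OBJ_TYPE_REF };

struct toy_nary
{
  enum toy_code code;
  int op[3];
};

#define TOY_TABLE_SIZE 16
static struct toy_nary toy_table[TOY_TABLE_SIZE];
static size_t toy_table_len;

/* Return the value number for (CODE, OP0, OP1, OP2): the index of an
   existing identical entry if there is one, else the index of a newly
   added entry.  Returns -1 when the table is full.  */
int
toy_value_number (enum toy_code code, int op0, int op1, int op2)
{
  for (size_t i = 0; i < toy_table_len; i++)
    {
      const struct toy_nary *e = &toy_table[i];
      if (e->code == code && e->op[0] == op0
          && e->op[1] == op1 && e->op[2] == op2)
        return (int) i;
    }

  if (toy_table_len == TOY_TABLE_SIZE)
    return -1;

  toy_table[toy_table_len] = (struct toy_nary) { code, { op0, op1, op2 } };
  return (int) toy_table_len++;
}
```

Two statements whose right-hand sides map to the same number compute the
same value, so the later one can reuse the earlier result.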

Thanks,
Richard.

2015-12-03  Richard Biener  

PR tree-optimization/64812
* tree-ssa-sccvn.c (vn_get_stmt_kind): Handle OBJ_TYPE_REF.
(vn_nary_length_from_stmt): Likewise.
(init_vn_nary_op_from_stmt): Likewise.
* gimple-match-head.c (maybe_build_generic_op): Likewise.
* gimple-pretty-print.c (dump_unary_rhs): Likewise.

* g++.dg/tree-ssa/ssa-fre-1.C: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 231221)
--- gcc/tree-ssa-sccvn.c(working copy)
*** vn_get_stmt_kind (gimple *stmt)
*** 460,465 
--- 460,467 
  ? VN_CONSTANT : VN_REFERENCE);
else if (code == CONSTRUCTOR)
  return VN_NARY;
+   else if (code == OBJ_TYPE_REF)
+ return VN_NARY;
return VN_NONE;
  }
  default:
*** vn_nary_length_from_stmt (gimple *stmt)
*** 2479,2484 
--- 2481,2487 
return 1;
  
  case BIT_FIELD_REF:
+ case OBJ_TYPE_REF:
return 3;
  
  case CONSTRUCTOR:
*** init_vn_nary_op_from_stmt (vn_nary_op_t
*** 2508,2513 
--- 2511,2517 
break;
  
  case BIT_FIELD_REF:
+ case OBJ_TYPE_REF:
vno->length = 3;
vno->op[0] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
vno->op[1] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 1);
Index: gcc/gimple-match-head.c
===
*** gcc/gimple-match-head.c (revision 231221)
--- gcc/gimple-match-head.c (working copy)
*** maybe_build_generic_op (enum tree_code c
*** 243,248 
--- 243,249 
*op0 = build1 (code, type, *op0);
break;
  case BIT_FIELD_REF:
+ case OBJ_TYPE_REF:
*op0 = build3 (code, type, *op0, op1, op2);
break;
  default:;
Index: gcc/gimple-pretty-print.c
===
*** gcc/gimple-pretty-print.c   (revision 231221)
--- gcc/gimple-pretty-print.c   (working copy)
*** dump_unary_rhs (pretty_printer *buffer,
*** 302,308 
  || TREE_CODE_CLASS (rhs_code) == tcc_reference
  || rhs_code == SSA_NAME
  || rhs_code == ADDR_EXPR
! || rhs_code == CONSTRUCTOR)
{
  dump_generic_node (buffer, rhs, spc, flags, false);
  break;
--- 302,309 
  || TREE_CODE_CLASS (rhs_code) == tcc_reference
  || rhs_code == SSA_NAME
  || rhs_code == ADDR_EXPR
! || rhs_code == CONSTRUCTOR
! || rhs_code == OBJ_TYPE_REF)
{
  dump_generic_node (buffer, rhs, spc, flags, false);
  break;
Index: gcc/testsuite/g++.dg/tree-ssa/ssa-fre-1.C
===
*** gcc/testsuite/g++.dg/tree-ssa/ssa-fre-1.C   (revision 0)
--- gcc/testsuite/g++.dg/tree-ssa/ssa-fre-1.C   (working copy)
***
*** 0 
--- 1,44 
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-fre2" } */
+ 
+ template  class A
+ {
+   T *p;
+ 
+ public:
+   A (T *p1) : p (p1) { p->acquire (); }
+ };
+ 
+ class B
+ {
+ public:
+ virtual void acquire ();
+ };
+ class D : public B
+ {
+ };
+ class F : B
+ {
+   int mrContext;
+ };
+ class WindowListenerMultiplexer : F, public D
+ {
+   void acquire () { acquire (); }
+ };
+ class C
+ {
+   void createPeer () throw ();
+   WindowListenerMultiplexer maWindowListeners;
+ };
+ class FmXGridPeer
+ {
+ public:
+ void addWindowListener (A);
+ } a;
+ void
+ C::createPeer () throw ()
+ {
+   a.addWindowListener ();
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "= OBJ_TYPE_REF" 1 "fre2" } } */


Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-12-03 Thread Thomas Schwinge
Hi Chung-Lin!

On Mon, 23 Nov 2015 21:15:00 +0800, Chung-Lin Tang  
wrote:
> The OpenACC wait directive is represented as a call to the runtime
> function "GOACC_wait" instead of a tree code.  I am seeing that when
> '#pragma acc wait' is used inside a template function, the CALL_EXPR
> to GOACC_wait is being silently ignored/removed during tsubst_expr().

Uh.

> I think the correct way to organize this is that the call should be inside
> an EXPR_STMT, so here's a patch to do that; basically remove the
> add_stmt() call from the shared c_finish_oacc_wait() code, and add
> add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.
> 
> Tested with no regressions on trunk, okay to commit?

> --- c-family/c-omp.c  (revision 230703)
> +++ c-family/c-omp.c  (working copy)
> @@ -63,7 +63,6 @@ c_finish_oacc_wait (location_t loc, tree parms, tr
>  }
>  
>stmt = build_call_expr_loc_vec (loc, stmt, args);
> -  add_stmt (stmt);
>  
>vec_free (args);
|  
|return stmt;
|  }

I see in gcc/c-family/c-omp.c that several other c_finish_omp_* functions that
build builtin calls instead of tree nodes do much the same as
c_finish_oacc_wait; I'd like to understand why this is -- presumably -- not
a problem for them: c_finish_omp_barrier, c_finish_omp_taskwait,
c_finish_omp_taskyield, c_finish_omp_flush?  (Jakub?)

> --- c/c-parser.c  (revision 230703)
> +++ c/c-parser.c  (working copy)
> @@ -13886,6 +13886,7 @@ c_parser_oacc_wait (location_t loc, c_parser *pars
>strcpy (p_name, " wait");
>clauses = c_parser_oacc_all_clauses (parser, OACC_WAIT_CLAUSE_MASK, 
> p_name);
>stmt = c_finish_oacc_wait (loc, list, clauses);
> +  add_stmt (stmt);
>  
>return stmt;
>  }
> --- cp/parser.c   (revision 230703)
> +++ cp/parser.c   (working copy)
> @@ -34930,6 +34930,7 @@ cp_parser_oacc_wait (cp_parser *parser, cp_token *
>   "#pragma acc wait", pragma_tok);
>  
>stmt = c_finish_oacc_wait (loc, list, clauses);
> +  stmt = finish_expr_stmt (stmt);
>  
>return stmt;
>  }


Grüße
 Thomas



[Bug c/68668] [6 Regression] bogus error: invalid use of array with unspecified bounds

2015-12-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68668

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |6.0

[PATCH][RTL-ifcvt] PR rtl-optimization/68624: Clean up logic that checks for clobbering conflicts across basic blocks

2015-12-03 Thread Kyrill Tkachov

Hi all,

In this fix I want to simplify the control flow of the code that chooses the
order in which to emit the then and else basic blocks (and their associated
emit_a and emit_b instructions).
Currently we check the then block and only if there is a modification there
we check the else block and make a decision there.  IMO it's much simpler if
we check both blocks and write the logic that chooses the order as a simple
IF-ELSEIF-ELSE block that only emits the blocks and doesn't try to do any
other checks.  The bug in the logic that was preventing the clobber check
from being performed in this PR was in the code:
  if (emit_a || modified_in_a)
{
  modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);
  if (tmp_b && else_bb)
{
  FOR_BB_INSNS (else_bb, tmp_insn)

where the second if condition should have been:
  if (tmp_a && else_bb)

Just changing the tmp_b to tmp_a in that condition would have fixed the
wrong-code part of this PR, as we would have ended up rejecting
if-conversion.  However, there is a valid if-conversion opportunity here: we
just have to emit emit_a followed by else_bb, which the current control flow
made awkward, which is why I'm suggesting this small rewrite.
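
Stripped of the RTL details, the ordering decision in the patch below reduces
to a three-way choice on the two clobber flags.  The sketch here uses
made-up names (toy_emit_order and friends) and only models the decision, not
the emission:

```c
#include <stdbool.h>

enum toy_emit_order { TOY_A_THEN_B, TOY_B_THEN_A, TOY_GIVE_UP };

/* MODIFIED_IN_A: setting up A clobbers something B depends on.
   MODIFIED_IN_B: setting up B clobbers something A depends on.  */
enum toy_emit_order
toy_choose_emit_order (bool modified_in_a, bool modified_in_b)
{
  if (modified_in_a && !modified_in_b)
    return TOY_B_THEN_A;	/* Swap: emit B first, then A.  */
  else if (!modified_in_a)
    return TOY_A_THEN_B;	/* Default order is safe.  */
  else
    return TOY_GIVE_UP;		/* Each side clobbers the other: punt.  */
}
```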

Bootstrapped and tested on x86_64, aarch64, arm.

Ok for trunk?
Thanks,
Kyrill

2015-12-03  Kyrylo Tkachov  

PR rtl-optimization/68624
* ifcvt.c (noce_try_cmove_arith): Check clobbers of temp regs in both
blocks if they exist and simplify the logic choosing the order to emit
them in.

2015-12-03  Kyrylo Tkachov  

PR rtl-optimization/68624
* gcc.c-torture/execute/pr68624.c: New test.
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 86b6ef7246ceddd223e93922737496af3d93f148..ef23c4cda66e6a659eee9b30089a6cc056cea30f 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2202,10 +2202,6 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	}
 }
 
-/* If insn to set up A clobbers any registers B depends on, try to
-   swap insn that sets up A with the one that sets up B.  If even
-   that doesn't help, punt.  */
-
   modified_in_a = emit_a != NULL_RTX && modified_in_p (orig_b, emit_a);
   if (tmp_b && then_bb)
 {
@@ -2220,31 +2216,33 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  }
 
 }
-  if (emit_a || modified_in_a)
+
+  modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);
+  if (tmp_a && else_bb)
 {
-  modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);
-  if (tmp_b && else_bb)
+  FOR_BB_INSNS (else_bb, tmp_insn)
+  /* Don't check inside insn_b.  We will have changed it to emit_b
+	 with a destination that doesn't conflict.  */
+  if (!(insn_b && tmp_insn == insn_b)
+	  && modified_in_p (orig_a, tmp_insn))
 	{
-	  FOR_BB_INSNS (else_bb, tmp_insn)
-	  /* Don't check inside insn_b.  We will have changed it to emit_b
-	 with a destination that doesn't conflict.  */
-	  if (!(insn_b && tmp_insn == insn_b)
-	  && modified_in_p (orig_a, tmp_insn))
-	{
-	  modified_in_b = true;
-	  break;
-	}
+	  modified_in_b = true;
+	  break;
 	}
-  if (modified_in_b)
-	goto end_seq_and_fail;
+}
 
+  /* If insn to set up A clobbers any registers B depends on, try to
+ swap insn that sets up A with the one that sets up B.  If even
+ that doesn't help, punt.  */
+  if (modified_in_a && !modified_in_b)
+{
   if (!noce_emit_bb (emit_b, else_bb, b_simple))
 	goto end_seq_and_fail;
 
   if (!noce_emit_bb (emit_a, then_bb, a_simple))
 	goto end_seq_and_fail;
 }
-  else
+  else if (!modified_in_a)
 {
   if (!noce_emit_bb (emit_a, then_bb, a_simple))
 	goto end_seq_and_fail;
@@ -2252,6 +2250,8 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   if (!noce_emit_bb (emit_b, else_bb, b_simple))
 	goto end_seq_and_fail;
 }
+  else
+goto end_seq_and_fail;
 
   target = noce_emit_cmove (if_info, x, code, XEXP (if_info->cond, 0),
 			XEXP (if_info->cond, 1), a, b);
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68624.c b/gcc/testsuite/gcc.c-torture/execute/pr68624.c
new file mode 100644
index ..abb716b1550038cb3d0e96e8917b7ed0ba8bfa83
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68624.c
@@ -0,0 +1,30 @@
+int b, c, d, e = 1, f, g, h, j;
+
+static int
+fn1 ()
+{
+  int a = c;
+  if (h)
+return 9;
+  g = (c || b) % e;
+  if ((g || f) && b)
+return 9;
+  e = d;
+  for (c = 0; c > -4; c--)
+;
+  if (d)
+c--;
+  j = c;
+  return d;
+}
+
+int
+main ()
+{
+  fn1 ();
+
+  if (c != -4)
+__builtin_abort ();
+
+  return 0;
+}


[Bug target/68655] SSE2 cannot vec_perm of low and high part

2015-12-03 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655

--- Comment #4 from rguenther at suse dot de  ---
On Thu, 3 Dec 2015, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655
> 
> Jakub Jelinek  changed:
> 
>What|Removed |Added
> 
>  Status|UNCONFIRMED |NEW
>Last reconfirmed||2015-12-03
>  Ever confirmed|0   |1
> 
> --- Comment #3 from Jakub Jelinek  ---
> Well, doing something like that at the optabs.c level wouldn't be really
> helpful, as i?86 has tons of different permutation instructions and for many
> permutations different sequence lengths.
> 
> So, the question is, does any supported CPU have some extra reinterpretation
> costs if we use a different integral vector mode (I believe there is some cost
> for some CPU when reinterpreting an integral vector as float vector and back,
> vice versa, or perhaps even float vector as double vector and vice versa)?
> If not, then the easiest fix is IMHO to change either
> ix86_expand_vec_perm_const_1
> or both
> ix86_expand_vec_perm_const and ix86_vectorize_vec_perm_const_ok
> to detect the case when a V*{QI,HI,SI} permutation is doable in a wider unit
> mode of the same whole-vector size, and just transform it to that case
> unconditionally.
> If there is some cost, then we should perhaps do that at the end of
> expand_vec_perm_1 (if everything else failed for single instruction), but then
> the question is what to do with the 2-5 long sequences, we'd need to repeat
> that at all the other spots.

Older AMD CPUs had "reformatting" costs but only when you apply operations
to vectors that may destroy properties such as whether the value is
a NaN - and the formatting penalty applied only when you then perform
an operation in FP representation on that vector that would care about
this.

So generally I think changing from vector integer modes to
vector integer or float modes of different size and then back
for the purpose of permutation is fine.

Doing this for vector float modes might have an issue depending
on the HW, e.g. using vshufpd on a float vector.  Practically
the FP state doesn't change unless you shuffle sub-parts of the
float, but of course the HW might not be clever enough to detect this.

So I think using larger modes or even smaller modes (we already
try chars in optabs.c unconditionally (even for float modes?))
for integer vector mode shuffles is ok.  For float vector modes
I would avoid this unless we do more research.
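
The "doable in a wider unit mode" test discussed in the quoted comment can
be sketched as a check on the selector: a permutation widens to elements
twice as large when every even/odd pair of indices selects an aligned,
consecutive pair.  The helper below is a hypothetical illustration on plain
index arrays, not code from i386.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* If the N-element selector SEL can be expressed as a permutation of
   elements twice as wide, store the N/2-element wide selector in WIDE and
   return true.  Otherwise return false; WIDE may be partially written.  */
bool
toy_widen_perm (const unsigned *sel, size_t n, unsigned *wide)
{
  if (n == 0 || n % 2 != 0)
    return false;

  for (size_t i = 0; i < n; i += 2)
    {
      /* The pair must start on an even index and be consecutive.  */
      if (sel[i] % 2 != 0 || sel[i + 1] != sel[i] + 1)
	return false;
      wide[i / 2] = sel[i] / 2;
    }
  return true;
}
```

For example, the selector { 2, 3, 0, 1, 6, 7, 4, 5 } on eight narrow
elements widens to { 1, 0, 3, 2 } on four wide ones.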

[Bug target/68620] ICE on gcc.target/arm/attr-neon-fp16.c

2015-12-03 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68620

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Does that mean we need to define a movv4hf pattern?
