Re: [PATCH] Don't necessarily emit object size checks for ARRAY_REFs

2014-11-11 Thread Jakub Jelinek
On Thu, Nov 06, 2014 at 11:19:08PM +0100, Marek Polacek wrote:
 First part of this patch is about removing the useless check that
 we talked about earlier today.
 
 The rest is about not emitting UBSAN_OBJECT_SIZE checks (those often
 come with multiple statements to compute a pointer difference) for
 ARRAY_REFs that are already instrumented by UBSAN_BOUNDS.
 
 I do this by moving the UBSAN_OBJECT_SIZE instrumentation so that
 it is done first in the ubsan pass - then I can just check whether
 the statement before that ARRAY_REF is a UBSAN_BOUND check.  If it
 is, it should be clear that it is checking the ARRAY_REF, and I can
 drop the UBSAN_OBJECT_SIZE check.  (Moving the UBSAN_OBJECT_SIZE
 instrumentation means that there won't be e.g. UBSAN_NULL check in
 between the ARRAY_REF and UBSAN_BOUND.)
 
 Earlier, I thought I should check that both UBSAN_OBJECT_SIZE and
 UBSAN_BOUND checks are checking the same array index, but that
 wouldn't work for multidimensional arrays, and just should not be
 needed.

IMHO it is needed and is highly desirable, otherwise you risk missed
diagnostics from -fsanitize=object-size when it is needed.

Consider e.g.:

extern int a[][10][10];

int
foo (int x, int y, int z)
{
  return a[x][y][z];
}

int a[10][10][10] = {};

testcase; here only the y and z indexes are bounds checked, but
the x index is not (UBSAN_BOUNDS is added early, before the definition
of a is parsed, while the ubsan pass runs afterwards, so it can know the
object size).

If you have a multi-dimensional array, you can just walk backwards
within the same bb, looking for UBSAN_BOUNDS calls that verify
the indexes where needed.

Say on:
struct S { int a:3; };
extern struct S a[][10][10];

int
foo (int x, int y, int z)
{
  return a[5][11][z].a;
}

struct S a[10][10][10] = {};

you have:
  UBSAN_BOUNDS (0B, 11, 9);
  z.0_4 = z_3(D);
  UBSAN_BOUNDS (0B, z.0_4, 9);
  _6 = a[5][11][z.0_4].a;

and you walk the handled components:
1) COMPONENT_REF - ok
2) ARRAY_REF with index z.0_4 and array index maximum is 9, there is
   UBSAN_BOUNDS right above it checking that
3) ARRAY_REF with index 11; 11 is bigger than index maximum 9,
   there is UBSAN_BOUNDS call for it in the same bb
4) ARRAY_REF with index 5; 5 is smaller than or equal to the index maximum 9,
   no UBSAN_BOUNDS is needed
5) decl inside of the innermost handled component, we can avoid
   the object-size instrumentation; if the base is not a decl,
   never omit object-size instrumentation.
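
The walk over the handled components can be sketched as follows. This is a minimal model, not GCC code: ArrayRefLevel and can_omit_object_size_check are hypothetical names, and the real pass would inspect trees and gimple statements rather than a vector of structs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of one ARRAY_REF level in a reference such
// as a[5][11][z]; these are not GCC's real data structures.
struct ArrayRefLevel {
  bool index_is_constant;  // true for 5 and 11, false for z
  long constant_index;     // only meaningful when index_is_constant is true
  long max_index;          // largest valid index, e.g. 9 for a dimension of 10
  bool has_bounds_check;   // a UBSAN_BOUNDS call for this index is in the same bb
};

// Returns true when the object-size check may be dropped: the base must be a
// decl, and every index must be either a constant within range or covered by
// a bounds check.
bool can_omit_object_size_check(const std::vector<ArrayRefLevel>& levels,
                                bool base_is_decl) {
  if (!base_is_decl)
    return false;
  for (std::size_t i = 0; i < levels.size(); ++i) {
    const ArrayRefLevel& level = levels[i];
    assert(level.max_index >= 0);
    const bool known_in_range = level.index_is_constant
                                && level.constant_index >= 0
                                && level.constant_index <= level.max_index;
    if (!known_in_range && !level.has_bounds_check)
      return false;
  }
  return true;
}
```

For a[5][11][z].a the three levels are (z, checked), (11, checked) and (5, constant in range), so the sketch allows the omission; for a[x][y][z] the unchecked x index keeps the object-size check.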

Jakub


Re: [PATCH, i386]: Use std::swap

2014-11-11 Thread Uros Bizjak
On Mon, Nov 10, 2014 at 10:51 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Mon, 10 Nov 2014, Richard Biener wrote:

 No extra includes required?


 <utility> is already included in wide-int.h and rtl.h, should probably move
 those.

Bah, we hit a problem. std::swap has been moved from <algorithm> to
<utility> in C++11, and the patch breaks the build on CentOS 5.11
(gcc-4.1.2).

Short of reverting the i386.c patch, is there a quick solution by
including some additional headers?

Uros.
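
A minimal sketch of the header issue, assuming nothing beyond the standard library: including both headers makes std::swap visible whether the compiler follows C++98 (where it lives in <algorithm>) or C++11 (where it lives in <utility>). The helper order_pair is hypothetical and only exists to exercise the call.

```cpp
#include <algorithm>  // declares std::swap in C++98/C++03
#include <cassert>
#include <utility>    // declares std::swap since C++11

// Hypothetical helper: make sure lo <= hi, swapping when needed.
void order_pair(int& lo, int& hi) {
  if (hi < lo)
    std::swap(lo, hi);
}

// Small self-check that exercises the helper.
void order_pair_self_check() {
  int a = 7, b = 3;
  order_pair(a, b);
  assert(a == 3 && b == 7);
}
```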


Re: [PATCH] c++ify sreal

2014-11-11 Thread Jakub Jelinek
On Tue, Nov 11, 2014 at 08:51:41AM +0100, Uros Bizjak wrote:
 Hello!
 
  do $subject, and cleanup for always 64 bit hwi.
 
 
  bootstrapped + regtested x86_64-unknown-linux-gnu, ok?
 
  Ok.  Can you please replace remaining HOST_WIDE_INT
  vestiges in there with [u]int64_t please?
 
 
  This patch breaks the build on debian 6.0:
 
  ../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
  ../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope
 
 Index: system.h
 ===
 --- system.h(revision 217338)
 +++ system.h(working copy)
 @@ -27,6 +27,7 @@
 event inttypes.h gets pulled in by another header it is already
 defined.  */
  #define __STDC_FORMAT_MACROS
 +#define __STDC_LIMIT_MACROS
 
  /* We must include stdarg.h before stdio.h.  */
  #include <stdarg.h>

Still, I don't believe it will be portable everywhere.
Can't you use
INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX?  We already use that
in GCC...

Jakub


Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-11 Thread Eric Botcazou
 I just don't like all the as_a/is_a stuff enforced everywhere,
 it means more typing, more temporaries, more indentation.
 So, as I view it, instead of the checks being done cheaply (yes, I think
 the gimple checking as we have right now is very cheap) under the
 hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes
 put the burden on the developers, who have to check that manually through
 the as_a/is_a stuff everywhere, more typing and uglier syntax.
 I just don't see that as a step forward, instead a huge step backwards.
 But perhaps I'm alone with this.

IMO that's the sort of thing some of us were afraid of when the C++ switch 
was being discussed, and IIRC we were told this would not happen...

-- 
Eric Botcazou


Re: [Patch,ARM/Thumb1]Fix 'mov' instruction for Thumb-1 UAL

2014-11-11 Thread Ramana Radhakrishnan



On 11/11/14 08:40, Terry Guo wrote:

Hi there,

Attached patch intends to fix below trunk failure caused by recent thumb-1
UAL patch:

/tmp/cc9EfnXy.s: Assembler messages:
/tmp/cc9EfnXy.s:69: Error: MOV Rd, Rs with two low registers is not
permitted on this architecture -- `mov r6,r7'

Now for pre-v6 Thumb-1, the 'movs' will be used rather than the 'mov'.

The multilib for ARM/Thumb1/hard-float all can be built. Tested with
regression test on armv4t thumb and v6m thumb. No regression. Is it ok to
trunk?


This is OK.

Ramana



BR,
Terry

2014-11-11  Terry Guo  terry@arm.com

  * doc/invoke.texi (-masm-syntax-unified): Reword and fix typo.
  * config/arm/thumb1.md (*thumb_mulsi3): Use movs to move low registers.
  (*thumb1_movhf): Likewise.



[Patch,ARM/Thumb1]Fix 'mov' instruction for Thumb-1 UAL

2014-11-11 Thread Terry Guo
Hi there,

Attached patch intends to fix below trunk failure caused by recent thumb-1
UAL patch:

/tmp/cc9EfnXy.s: Assembler messages:
/tmp/cc9EfnXy.s:69: Error: MOV Rd, Rs with two low registers is not
permitted on this architecture -- `mov r6,r7'

Now for pre-v6 Thumb-1, the 'movs' will be used rather than the 'mov'.

The multilib for ARM/Thumb1/hard-float all can be built. Tested with
regression test on armv4t thumb and v6m thumb. No regression. Is it ok to
trunk?

BR,
Terry

2014-11-11  Terry Guo  terry@arm.com

 * doc/invoke.texi (-masm-syntax-unified): Reword and fix typo.
 * config/arm/thumb1.md (*thumb_mulsi3): Use movs to move low registers.
 (*thumb1_movhf): Likewise.

diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 8a2abe9..3d6f80b 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -131,12 +131,10 @@
        (mult:SI (match_operand:SI 1 "register_operand" "%l,*h,0")
                 (match_operand:SI 2 "register_operand" "l,l,l")))]
   "TARGET_THUMB1 && !arm_arch6"
-  "*
-  if (which_alternative < 2)
-    return \"mov\\t%0, %1\;muls\\t%0, %2\";
-  else
-    return \"muls\\t%0, %2\";
-  "
+  "@
+   movs\\t%0, %1\;muls\\t%0, %2
+   mov\\t%0, %1\;muls\\t%0, %2
+   muls\\t%0, %2"
   [(set_attr "length" "4,4,2")
    (set_attr "type" "muls")]
 )
@@ -787,6 +785,8 @@
   *
   switch (which_alternative)
 {
+case 0:
+  return \"movs\\t%0, %1\";
 case 1:
   {
rtx addr;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index cd20b6e..13270bc 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13040,13 +13040,11 @@ off by default.
 
 @item -masm-syntax-unified
 @opindex masm-syntax-unified
-Assume the Thumb1 inline assembly code are using unified syntax.
-The default is currently off, which means divided syntax is assumed.
+Assume inline assembler is using unified asm syntax.  The default is
+currently off which implies divided syntax.  Currently this option is
+available only for Thumb1 and has no effect on ARM state and Thumb2.
 However, this may change in future releases of GCC.  Divided syntax
-should be considered deprecated.  This option has no effect when
-generating Thumb2 code.  Thumb2 assembly code always uses unified syntax.
-This option has no effect for ARM state assembly code which will still
-uses divided syntax.
+should be considered deprecated.
 
 @item -mrestrict-it
 @opindex mrestrict-it


Re: [AARCH64, NEON] Any regression testcase for AARCH64 NEON intrinsics in GCC testsuite?

2014-11-11 Thread Yangfei (Felix)
  Hello,
 
  I have written a testsuite for AArch32 Neon intrinsics, available at
  https://gitorious.org/arm-neon-tests
 
  I am in the process of converting it into DejaGnu form for integration into 
  GCC.
 
  My most recent submission was
  https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00022.html
  but I plan to submit another version soon.
 
  As you'll notice, this first submission only covers a small subset of
  the original testsuite, but I do plan to convert it all.
 
  That being said, the current testsuite only covers AArch32 Neon
  intrinsics, and needs to be expanded to cover the AArch64. It is still
  useful to test the AArch32 subset on AArch64.
 
  Christophe.


Hello Christophe, 

  Is the testsuite https://gitorious.org/arm-neon-tests written for 
little-endian? 
  I noticed that some testcases treat result_int8x8 as an array and access it 
by array indexing when checking the test result. 
  And this will not work for big-endian. 

An example:
exec_vzip
{
 int8_t buffer_int8x8 [] = { (int8_t)-16, (int8_t)-15, (int8_t)-14, 
(int8_t)-13, (int8_t)-12, (int8_t)-11, (int8_t)-10, (int8_t)-9, };
 int8x8_t vector1_int8x8;
 int8x8_t vector2_int8x8;
 vector1_int8x8 = vld1_s8(buffer_int8x8);
 vector2_int8x8 = vdup_n_s8(0x11);
 result_vec_int8x8x2 = vzip_s8(vector1_int8x8, vector2_int8x8);
 vst2_s8(result_bis_int8x8, result_vec_int8x8x2); 
 memcpy(result_int8x8, result_bis_int8x8, sizeof(result_int8x8));
   
   
 { { int i; for (i = 0; i < 8; i++) {
     if (result_int8x8[i] != expected0_int8x8[i]) {
       fprintf(stderr,
               "ERROR in %s (%s line %d in buffer '%s') at type %s index %d: "
               "got 0x%x != 0x%x %s\n",
               "VZIP/VZIPQ", "./gcc.target/aarch64/advsimd-intrinsics/vzip.c",
               232, "expected0", "int8x8", i, result_int8x8[i],
               expected0_int8x8[i],
               strlen("(chunk 0)") > 0 ? "(chunk 0)" : "");
       abort();
     } } }; };
}


Re: [PATCH] c++ify sreal

2014-11-11 Thread Uros Bizjak
On Tue, Nov 11, 2014 at 9:11 AM, Jakub Jelinek ja...@redhat.com wrote:

  do $subject, and cleanup for always 64 bit hwi.
 
 
  bootstrapped + regtested x86_64-unknown-linux-gnu, ok?
 
  Ok.  Can you please replace remaining HOST_WIDE_INT
  vestiges in there with [u]int64_t please?
 
 
  This patch breaks the build on debian 6.0:
 
  ../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
  ../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope


 Still, I don't believe it will be portable everywhere.
 Can't you use
 INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX?  We already use that
 in GCC...

Yes, following patch also bootstraps:

--cut here--
Index: sreal.c
===
--- sreal.c (revision 217338)
+++ sreal.c (working copy)
@@ -156,7 +156,7 @@ sreal::to_int () const
   if (m_exp <= -SREAL_BITS)
     return 0;
   if (m_exp >= SREAL_PART_BITS)
-    return INT64_MAX;
+    return INTTYPE_MAXIMUM (int64_t);
   if (m_exp > 0)
     return m_sig << m_exp;
   if (m_exp < 0)
--cut here--

Uros.


Re: [PATCH] c++ify sreal

2014-11-11 Thread Jakub Jelinek
On Tue, Nov 11, 2014 at 09:45:38AM +0100, Uros Bizjak wrote:
 On Tue, Nov 11, 2014 at 9:11 AM, Jakub Jelinek ja...@redhat.com wrote:
 
   do $subject, and cleanup for always 64 bit hwi.
  
  
   bootstrapped + regtested x86_64-unknown-linux-gnu, ok?
  
   Ok.  Can you please replace remaining HOST_WIDE_INT
   vestiges in there with [u]int64_t please?
  
  
   This patch breaks the build on debian 6.0:
  
   ../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
   ../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope
 
 
  Still, I don't believe it will be portable everywhere.
  Can't you use
  INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX?  We already use that
  in GCC...
 
 Yes, following patch also bootstraps:

This is ok for trunk with appropriate ChangeLog entry.  Thanks.

 --- sreal.c (revision 217338)
 +++ sreal.c (working copy)
 @@ -156,7 +156,7 @@ sreal::to_int () const
    if (m_exp <= -SREAL_BITS)
      return 0;
    if (m_exp >= SREAL_PART_BITS)
 -    return INT64_MAX;
 +    return INTTYPE_MAXIMUM (int64_t);
    if (m_exp > 0)
      return m_sig << m_exp;
    if (m_exp < 0)
 --cut here--

Jakub


Re: [match-and-simplify] operator-lists in expression

2014-11-11 Thread Richard Biener
On Mon, Nov 10, 2014 at 2:39 PM, Prathamesh Kulkarni
bilbotheelffri...@gmail.com wrote:
 Hi,
   This patch adds support for operator-lists to be used in expression.

 I reuse operator-list as the iterator. This is not really valid since
 user-defined operator-lists cannot be iterator in 'for', but it was
 convenient to reuse operator-list as a 'for' iterator
 and lower_for doesn't care about that.
 eg:
 (define_operator_list  list1 plus minus)

 (simplify
   (list1 @x integer_zerop)
   (non_lvalue @x))

 is wrapped into 'for' as: (lower_operator_list):
 (for list1 (plus minus)
   (simplify
 (list1 @x integer_zerop)
 (non_lvalue @x)))

 this is not really valid since we reject list1 to be used as iterator if
 it were written by user.

 Is this okay or should I introduce an explicit temporary iterator ?

No, it's ok to re-use it.

I think you should get rid of the extra lowering step and instead
in parse_simplify create the extra for directly when building
a simplify (the multiple simplify buildings really ask for factoring
it out to a method in the parser class which has access to
active_fors, active_ifs and friends).

Also you use a vector to store operator_lists - this will gobble
up duplicates.  It's probably better to use a pointer_hash user_id *
for this.
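
The duplicate concern can be illustrated with a small sketch. It uses std::set keyed on the pointer as a stand-in for a pointer-keyed hash container; user_id_sketch and unique_operator_lists are hypothetical names, not genmatch's real types or functions.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an operator-list identifier.
struct user_id_sketch {
  std::string name;
};

// Collapse repeated uses of the same operator list into one entry each,
// keyed on the pointer identity of the list.
std::set<const user_id_sketch*> unique_operator_lists(
    const std::vector<const user_id_sketch*>& uses) {
  return std::set<const user_id_sketch*>(uses.begin(), uses.end());
}

// Small self-check: a list used twice is recorded twice in the vector but
// only once in the set.
void unique_operator_lists_self_check() {
  user_id_sketch list1 = {"list1"};
  user_id_sketch list2 = {"list2"};
  std::vector<const user_id_sketch*> uses;
  uses.push_back(&list1);
  uses.push_back(&list2);
  uses.push_back(&list1);
  assert(uses.size() == 3);
  assert(unique_operator_lists(uses).size() == 2);
}
```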

Thanks for continuing to work on this!

Richard.

 so it gets lowered to something like:
 (for tmp1 (list1)
   (simplify
 (tmp1 @x integer_zerop)
 (non_lvalue @x)))

 * genmatch.c
   (fatal_at): New overloaded function.
   (simplify::oper_lists): New member.
   (simplify::simplify): Add default argument.
   (lower_commutative): Adjust call to simplify::simplify.
   (lower_opt_convert): Likewise.
   (lower_operator_list): New function.
   (lower): Call lower_operator_list.
   (parser::parsing_for_p): New member function.
   (parser::oper_lists): New member.
   (parser::parse_operation): Check for operator-list.
   (parser::parse_c_expr): Likewise.
   (parser::parse_simplify): Reset parser::oper_lists.
 Adjust call to simplify::simplify.
   (parser::parser): Initialize parser::oper_lists.

 * match-builtin.pd:
   Adjust pattern to use SQRTs and POWs.

 Thanks,
 Prathamesh


Fix libtool.m4 for Darwin >= 10.10

2014-11-11 Thread FX
libtool.m4 has a globbing pattern that assumes Mac OS version numbers 10.x are 
one digit for x. That’s unfortunate, especially now that Mac OS 10.10 was 
released :)
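
The effect of a one-digit assumption can be sketched as follows. This is only an illustration of the failure mode, not the libtool.m4 pattern itself; both function names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Position of the first character after the first '.', or npos when there
// is no digit there.
std::string::size_type minor_start(const std::string& version) {
  const std::string::size_type dot = version.find('.');
  if (dot == std::string::npos || dot + 1 >= version.size())
    return std::string::npos;
  if (!std::isdigit(static_cast<unsigned char>(version[dot + 1])))
    return std::string::npos;
  return dot + 1;
}

// Mimics a check that looks at one digit only: "10.10" is read as minor 1.
int minor_from_single_digit(const std::string& version) {
  const std::string::size_type pos = minor_start(version);
  return pos == std::string::npos ? -1 : version[pos] - '0';
}

// Reads the whole minor number: "10.10" is read as minor 10.
int minor_from_full_number(const std::string& version) {
  const std::string::size_type pos = minor_start(version);
  return pos == std::string::npos ? -1 : std::atoi(version.c_str() + pos);
}

// Small self-check showing where the two readings diverge.
void minor_version_self_check() {
  assert(minor_from_single_digit("10.9") == minor_from_full_number("10.9"));
  assert(minor_from_single_digit("10.10") == 1);
  assert(minor_from_full_number("10.10") == 10);
}
```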

libtool has released a new version to fix this bug. The attached patch, 
bootstrapped and regtested on x86_64-apple-darwin14 (Mac OS 10.10), 
incorporates this fix into our libtool.m4 and regenerates the configures under 
our control.

OK to commit? This touches so many areas it probably needs a build maintainer or 
global maintainer to approve it.

FX

PS: Let me know what the procedure is for the toplevel files (libtool.m4 and 
configure).




libtool.ChangeLog
Description: Binary data


libtool.diff
Description: Binary data


Re: Fix libtool.m4 for Darwin >= 10.10

2014-11-11 Thread Jakub Jelinek
On Tue, Nov 11, 2014 at 09:58:45AM +0100, FX wrote:
 libtool.m4 has a globbing pattern that assumes Mac OS version numbers 10.x 
 are one digit for x. That’s unfortunate, especially now that Mac OS 10.10 was 
 released :)
 
 libtool has released a new version to fix this bug. The attached patch, 
 bootstrapped and regtested on x86_64-apple-darwin14 (Mac OS 10.10), 
 incorporates this fix into our libtool.m4 and regenerates the configures 
 under our control.
 
 OK to commit? This touches so many areas it probably needs a build maintainer 
 or global maintainer to approve it.
 
 FX
 
 PS: Let me know what the procedure is for the toplevel files (libtool.m4 and 
 configure).

Your patch contains lots of other changes, not just the libtool.m4
change.  Please filter those out.

2014-11-11  Francois-Xavier Coudert  fxcoud...@gcc.gnu.org

PR target/63610
* boehm-gc/configure: Regenerate.

boehm-gc/ etc. have their own ChangeLog files, so the
entries should say just * configure: Regenerate.

2014-11-11  Francois-Xavier Coudert  fxcoud...@gcc.gnu.org

PR target/63610
* libcc1/plugin.cc: 

???

2014-11-11  Francois-Xavier Coudert  fxcoud...@gcc.gnu.org

PR target/63610
* libvtv/configure (else): 

??

Jakub


Re: [PATCH][Revisedx2] Fix PR63750

2014-11-11 Thread Richard Biener
On Mon, Nov 10, 2014 at 3:58 PM, FX fxcoud...@gmail.com wrote:
 My knowledge of C++ is limited, but I think this additional patch to 
 wide-int.h is the proper fix to the issue reported by Jack, no?
 I’m bootstrapping it right now, it already passed stage 2.

 Bootstrap succeeded on x86_64-apple-darwin14.
 OK to commit to trunk?

Ok.

Thanks,
Richard.


Re: Fix libtool.m4 for Darwin >= 10.10

2014-11-11 Thread FX
 Your patch contains lots of other changes, not just the libtool.m4
 change.  Please filter those out.

Sorry about that. The patch attached should be clean, and the ChangeLog entries 
formatted as they should.

OK to commit? This touches so many areas it probably needs a build maintainer or 
global maintainer to approve it.

FX





libtool.diff
Description: Binary data


libtool.ChangeLog
Description: Binary data


Re: [PATCH][Revisedx2] Fix PR63750

2014-11-11 Thread FX
 Ok.

Committed as rev. 217342.
Thanks for the review!

FX


Re: [PATCH] c++ify sreal

2014-11-11 Thread Marc Glisse

On Tue, 11 Nov 2014, Jakub Jelinek wrote:


On Tue, Nov 11, 2014 at 08:51:41AM +0100, Uros Bizjak wrote:

Hello!


do $subject, and cleanup for always 64 bit hwi.


bootstrapped + regtested x86_64-unknown-linux-gnu, ok?


Ok.  Can you please replace remaining HOST_WIDE_INT
vestiges in there with [u]int64_t please?



This patch breaks the build on debian 6.0:

../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope


Index: system.h
===
--- system.h(revision 217338)
+++ system.h(working copy)
@@ -27,6 +27,7 @@
event inttypes.h gets pulled in by another header it is already
defined.  */
 #define __STDC_FORMAT_MACROS
+#define __STDC_LIMIT_MACROS

 /* We must include stdarg.h before stdio.h.  */
 #include <stdarg.h>


Still, I don't believe it will be portable everywhere.
Can't you use
INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX?  We already use that
in GCC...


We could also start using the standard C++ mechanism (numeric_limits).

(nothing wrong with INTTYPE_MAXIMUM, just an alternative)

--
Marc Glisse
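
A minimal sketch of the numeric_limits alternative, assuming only the standard library. The helper shift_left_saturating is hypothetical and is not the sreal code; it just shows the limit being obtained without relying on the INT64_MAX macro being visible.

```cpp
#include <cassert>
#include <limits>
#include <stdint.h>

// Hypothetical helper, not GCC code: shift a non-negative value left and
// saturate at the largest int64_t instead of overflowing.
int64_t shift_left_saturating(int64_t sig, int exp) {
  assert(sig >= 0 && exp >= 0);
  const int64_t max = std::numeric_limits<int64_t>::max();
  if (sig == 0)
    return 0;
  // Any non-zero value shifted by 63 or more bits cannot fit.
  if (exp >= 63 || sig > (max >> exp))
    return max;
  return sig << exp;
}
```

Because std::numeric_limits is a template in <limits>, it does not depend on __STDC_LIMIT_MACROS being defined before the C headers are first included.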


RE: [2/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target

2014-11-11 Thread Terry Guo


 -Original Message-
 From: Terry Guo [mailto:terry@arm.com]
 Sent: Friday, November 07, 2014 6:01 PM
 To: 'Christian Bruel'
 Cc: gcc-patches@gcc.gnu.org
 Subject: RE: [2/2][PATCH,ARM]Generate UAL assembly code for Thumb-1
 target
 
 
 
  -Original Message-
  From: Christian Bruel [mailto:christian.br...@st.com]
  Sent: Friday, November 07, 2014 5:27 PM
  To: Terry Guo
  Cc: gcc-patches@gcc.gnu.org
  Subject: Re: [2/2][PATCH,ARM]Generate UAL assembly code for Thumb-1
  target
 
  hi,
 
  the ARM bootstrap seems to fail for libgcc2.c on the thumb multilib
  for
  libgcc2: muldi3 -mthumb -O2  -g
 
  /tmp/ccYrycUw.s: Assembler messages:
  /tmp/ccYrycUw.s:69: Error: MOV Rd, Rs with two low registers is not
  permitted on this architecture -- `mov r6,r7'
 
  preprocessed attached.
 
  Thanks
 
  Christian
 
 Many thanks. I am looking into it now.
 
 BR,
 Terry

Fix is committed to trunk at 
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=217341.

BR,
Terry





Re: [x86, 6/n] Replace builtins with vector extensions

2014-11-11 Thread Kirill Yukhin
Hello Marc, Uroš,
On 10 Nov 21:33, Uros Bizjak wrote:
 On Sun, Nov 9, 2014 at 5:26 PM, Marc Glisse marc.gli...@inria.fr wrote:
  Hello,
 
and == for integer vectors of size 128. I was surprised not to find
  _mm_cmplt_epi64 anywhere. Note that I can do the same for size 256, but not
  512, there is no corresponding intrinsic, there are only _mask versions that
  return a mask.
 
 Let's ask Kirill (CC'd) about missing intrinsics.
We have no `_mm_cmplt_epi64' intrinsic because there's no such instruction in
Intel ISA. All we have is [V]PCMP[EQ|GT] on pre-AVX-512* and VPCMP starting from
AVX-512*.
VPCMP is able to model VPCMPLT by specifying the corresponding immediate, and we
have intrinsics for that (config/i386/avx512fintrin.h):
extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_cmplt_epu32_mask (__m512i __X, __m512i __Y)

--
Thanks, K
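
Since only equality and greater-than comparisons are available as instructions, a less-than can still be expressed by swapping the operands of greater-than. The sketch below models this lane by lane in plain C++; Vec2x64, cmpgt and cmplt are hypothetical names, not the intrinsics themselves.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical two-lane model of a 128-bit vector of 64-bit integers.
struct Vec2x64 {
  int64_t lane[2];
};

// Lane-wise signed greater-than: all-ones (-1) where a > b, zero elsewhere.
Vec2x64 cmpgt(const Vec2x64& a, const Vec2x64& b) {
  Vec2x64 r;
  for (int i = 0; i < 2; ++i)
    r.lane[i] = a.lane[i] > b.lane[i] ? -1 : 0;
  return r;
}

// Less-than needs no operation of its own: a < b is the same as b > a.
Vec2x64 cmplt(const Vec2x64& a, const Vec2x64& b) {
  return cmpgt(b, a);
}

// Small self-check covering a true lane and an equal (false) lane.
void cmplt_self_check() {
  const Vec2x64 a = {{1, 5}};
  const Vec2x64 b = {{3, 5}};
  const Vec2x64 r = cmplt(a, b);
  assert(r.lane[0] == -1);  // 1 < 3
  assert(r.lane[1] == 0);   // 5 < 5 is false
}
```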




Re: [x86, 6/n] Replace builtins with vector extensions

2014-11-11 Thread Marc Glisse

On Tue, 11 Nov 2014, Kirill Yukhin wrote:


Hello Marc, Uroš,
On 10 Nov 21:33, Uros Bizjak wrote:

On Sun, Nov 9, 2014 at 5:26 PM, Marc Glisse marc.gli...@inria.fr wrote:

Hello,

  and == for integer vectors of size 128. I was surprised not to find
_mm_cmplt_epi64 anywhere. Note that I can do the same for size 256, but not
512, there is no corresponding intrinsic, there are only _mask versions that
return a mask.


Let's ask Kirill (CC'd) about missing intrinsics.
We have no `_mm_cmplt_epi64' intrinsic because there's no such 
instruction in Intel ISA.


We have _mm_cmplt_epi32 without a corresponding instruction though ;-)
(yes, it is useless)


All we have is [V]PCMP[EQ|GT] on pre-AVX-512* and VPCMP starting from
AVX-512*.
VPCMP is able to model VPCMPLT by specifying the corresponding immediate, and we
have intrinsics for that (config/i386/avx512fintrin.h):
extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_cmplt_epu32_mask (__m512i __X, __m512i __Y)


--
Marc Glisse


Re: [PATCH, i386]: Use std::swap

2014-11-11 Thread Uros Bizjak
On Tue, Nov 11, 2014 at 9:09 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Nov 10, 2014 at 10:51 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Mon, 10 Nov 2014, Richard Biener wrote:

 No extra includes required?


 utility is already included in wide-int.h and rtl.h, should probably move
 those.

 Bah, we hit a problem. std::swap has been moved from algorithm to
 utility in C++11, and the patch breaks build on CentOS 5.11
 (gcc-4.1.2).

 Short of reverting the i386.c patch, is there a quick solution by
 including some additional headers?

Attached patch that implements both suggestions from Richi and Marc
fixes the bootstrap.

2014-11-11  Uros Bizjak  ubiz...@gmail.com

* system.h: Include <algorithm> and <utility>.
* rtl.h: Do not include <utility> here.
* wide-int.h: Ditto.
* tree-vect-data-refs.c (swap): Remove template.
(vect_prune_runtime_alias_test_list): Use std::swap instead of swap.

Bootstrapped on x86_64-linux-gnu (CentOS 5.11).

OK for mainline?

BTW: There are lots of places where std::swap can be used, a nice
search-and-replace task for someone to start with gcc development. ;)

Uros.
Index: tree-vect-data-refs.c
===
--- tree-vect-data-refs.c   (revision 217340)
+++ tree-vect-data-refs.c   (working copy)
@@ -2718,14 +2718,6 @@
   return 0;
 }
 
-template <class T> static void
-swap (T& a, T& b)
-{
-  T c (a);
-  a = b;
-  b = c;
-}
-
 /* Function vect_vfa_segment_size.
 
Create an expression that computes the size of segment
@@ -2858,7 +2850,7 @@ vect_prune_runtime_alias_test_list (loop_vec_info
   dr_with_seg_len (dr_b, segment_length_b));
 
   if (compare_tree (DR_BASE_ADDRESS (dr_a), DR_BASE_ADDRESS (dr_b)) > 0)
-   swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
+   std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
 
   comp_alias_ddrs.safe_push (dr_with_seg_len_pair);
 }
@@ -2908,8 +2900,8 @@ vect_prune_runtime_alias_test_list (loop_vec_info
 and DR_A1 and DR_A2 are two consecutive memrefs.  */
  if (*dr_a1 == *dr_a2)
{
- swap (dr_a1, dr_b1);
- swap (dr_a2, dr_b2);
+ std::swap (dr_a1, dr_b1);
+ std::swap (dr_a2, dr_b2);
}
 
  if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
Index: wide-int.h
===
--- wide-int.h  (revision 217340)
+++ wide-int.h  (working copy)
@@ -216,8 +216,6 @@
the same result as X + X; the precision of the shift amount Y
can be arbitrarily different from X.  */
 
-
-#include <utility>
 #include "system.h"
 #include "hwint.h"
 #include "signop.h"
Index: rtl.h
===
--- rtl.h   (revision 217340)
+++ rtl.h   (working copy)
@@ -20,7 +20,6 @@
 #ifndef GCC_RTL_H
 #define GCC_RTL_H
 
-#include <utility>
 #include "statistics.h"
 #include "machmode.h"
 #include "input.h"
Index: system.h
===
--- system.h(revision 217340)
+++ system.h(working copy)
@@ -208,7 +208,9 @@
 #endif
 
 #ifdef __cplusplus
+# include <algorithm>
 # include <cstring>
+# include <utility>
 #endif
 
 /* Some of glibc's string inlines cause warnings.  Plus we'd rather


Re: libstdc++ new deque failures

2014-11-11 Thread FX
The patch below breaks bootstrap on darwin 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63811):

 Fix std::deque move construction with non-equal allocators.
 
   * include/bits/stl_deque.h (_Deque_base::_Deque_base(_Deque_base&&)):
   Dispatch according to whether allocators are always equal.
   (_Deque_base::_M_move_impl()): Implement move-from state.


In file included from 
/Users/fx/devel/gcc/ibin2/x86_64-apple-darwin14.0.0/libstdc++-v3/include/deque:64:0,
 from 
/Users/fx/devel/gcc/trunk2/libstdc++-v3/include/precompiled/stdc++.h:67:
/Users/fx/devel/gcc/ibin2/x86_64-apple-darwin14.0.0/libstdc++-v3/include/bits/stl_deque.h:
 In member function ‘std::_Deque_base<_Tp, _Alloc>::_Deque_impl 
std::_Deque_base<_Tp, _Alloc>::_M_move_impl()’:
/Users/fx/devel/gcc/ibin2/x86_64-apple-darwin14.0.0/libstdc++-v3/include/bits/stl_deque.h:645:17:
 error: expected primary-expression before ‘__attribute’
  _Tp_alloc_type __attribute((__unused__)) {std::move(__alloc)};
 ^
make[2]: *** [x86_64-apple-darwin14.0.0/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 
1



Re: [PATCH 2/2] Simplify and extend VRP edge-assertion code

2014-11-11 Thread Patrick Palka
This patch failed regtesting -- and on second thought I'm not too
confident that the refactoring is strictly an improvement so I will
try to fix the main issue (that is to make the test vrp-1.c fail to
compile) in a more direct way.


Re: [PATCH 1/2] VRP: Simplify logic for checking if any asserts need to be inserted

2014-11-11 Thread Richard Biener
On Tue, Nov 11, 2014 at 4:51 AM, Patrick Palka patr...@parcs.ath.cx wrote:
 Hi,

 This patch tweaks the VRP code to simply inspect the need_assert_for
 bitmap when determining whether any asserts need to be inserted.
 Consequently we no longer have to manually keep track of whether a call
 to register_new_assert_for() was made.

 This patch is an updated version of a patch that was approved a few
 months ago but was never committed.  Bootstrapped and regtested on
 x86_64-unknown-linux-gnu with no new regressions.  Is it OK to commit?
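
The idea of the change can be sketched in a few lines: the registration helpers return void and record their work in a bitmap, and the caller asks the bitmap whether anything was registered. The types below are a hypothetical model with a fixed-size std::bitset, not the tree-vrp.c data structures.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

const std::size_t kMaxNames = 64;  // hypothetical limit for this sketch

// Hypothetical model of the state the pass keeps while looking for asserts.
struct AssertTracker {
  std::bitset<kMaxNames> need_assert_for;

  // Record that an assertion is needed for the given SSA name version.
  void register_new_assert_for(std::size_t name_version) {
    assert(name_version < kMaxNames);
    need_assert_for.set(name_version);
  }

  // Replaces the manually propagated "did we register anything" flags.
  bool any_asserts_needed() const { return need_assert_for.any(); }
};

// Returns void: the caller no longer needs a success flag from each helper.
void maybe_register_edge_assert(AssertTracker& tracker,
                                std::size_t name_version,
                                bool live_on_edge, bool has_single_use) {
  if (live_on_edge && !has_single_use)
    tracker.register_new_assert_for(name_version);
}
```

With this shape, the final decision to insert assertions is a single query on the tracker rather than an OR of return values collected along every call path.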

Ok.

Thanks,
Richard.

 2014-08-13  Patrick Palka  ppa...@gcc.gnu.org

 * tree-vrp.c (register_edge_assert_for_2): Change return type to
 void and adjust accordingly.
 (register_edge_assert_for_1): Likewise.
 (register_edge_assert_for): Likewise.
 (find_conditional_asserts): Likewise.
 (find_switch_asserts): Likewise.
 (find_assert_locations_1): Likewise.
 (find_assert_locations): Likewise.
 (insert_range_insertions): Inspect the need_assert_for bitmap.
 ---
  gcc/tree-vrp.c | 157 
 ++---
  1 file changed, 49 insertions(+), 108 deletions(-)

 diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
 index 4e4ebe0..f0a4382 100644
 --- a/gcc/tree-vrp.c
 +++ b/gcc/tree-vrp.c
 @@ -4977,32 +4977,27 @@ masked_increment (const wide_int &val_in, const 
 wide_int &mask,

  /* Try to register an edge assertion for SSA name NAME on edge E for
 the condition COND contributing to the conditional jump pointed to by BSI.
 -   Invert the condition COND if INVERT is true.
 -   Return true if an assertion for NAME could be registered.  */
 +   Invert the condition COND if INVERT is true.  */

 -static bool
 +static void
  register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi,
 enum tree_code cond_code,
 tree cond_op0, tree cond_op1, bool invert)
  {
tree val;
enum tree_code comp_code;
 -  bool retval = false;

if (!extract_code_and_val_from_cond_with_ops (name, cond_code,
 cond_op0,
 cond_op1,
 invert, &comp_code, &val))
 -return false;
 +return;

/* Only register an ASSERT_EXPR if NAME was found in the sub-graph
   reachable from E.  */
if (live_on_edge (e, name)
 && !has_single_use (name))
 -{
 -  register_new_assert_for (name, name, comp_code, val, NULL, e, bsi);
 -  retval = true;
 -}
 +register_new_assert_for (name, name, comp_code, val, NULL, e, bsi);

/* In the case of NAME = CST and NAME being defined as
   NAME = (unsigned) NAME2 + CST2 we can assert NAME2 = -CST2
 @@ -5063,8 +5058,6 @@ register_edge_assert_for_2 (tree name, edge e, 
 gimple_stmt_iterator bsi,
 }

   register_new_assert_for (name3, tmp, comp_code, val, NULL, e, bsi);
 -
 - retval = true;
 }

/* If name2 is used later, create an ASSERT_EXPR for it.  */
 @@ -5094,8 +5087,6 @@ register_edge_assert_for_2 (tree name, edge e, 
 gimple_stmt_iterator bsi,
 }

   register_new_assert_for (name2, tmp, comp_code, val, NULL, e, bsi);
 -
 - retval = true;
 }
  }

 @@ -5133,7 +5124,6 @@ register_edge_assert_for_2 (tree name, edge e, 
 gimple_stmt_iterator bsi,
   cst = int_const_binop (code, val, cst);
   register_new_assert_for (name2, name2, comp_code, cst,
NULL, e, bsi);
 - retval = true;
 }
 }
  }
 @@ -5197,8 +5187,6 @@ register_edge_assert_for_2 (tree name, edge e, 
 gimple_stmt_iterator bsi,

   register_new_assert_for (name2, tmp, new_comp_code, cst, NULL,
e, bsi);
 -
 - retval = true;
 }
 }

 @@ -5276,7 +5264,6 @@ register_edge_assert_for_2 (tree name, edge e, 
 gimple_stmt_iterator bsi,

   register_new_assert_for (name2, tmp, new_comp_code, new_val,
NULL, e, bsi);
 - retval = true;
 }
 }

 @@ -5297,8 +5284,7 @@ register_edge_assert_for_2 (tree name, edge e, 
 gimple_stmt_iterator bsi,
       && TREE_CODE (TREE_TYPE (val)) == INTEGER_TYPE
       && TYPE_UNSIGNED (TREE_TYPE (val))
       && TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (def_stmt)))
 -        > prec
 -     && !retval))
 +        > prec))
 {
   name2 = gimple_assign_rhs1 (def_stmt);
   if (rhs_code == BIT_AND_EXPR)
 @@ -5522,13 +5508,10 @@ register_edge_assert_for_2 (tree name, edge e, 
 gimple_stmt_iterator bsi,

 register_new_assert_for (names[i], tmp, LE_EXPR,
  new_val, NULL, e, bsi);
 - 

[PATCH] [AArch64, RTL] Bics instruction generation for aarch64

2014-11-11 Thread Alex Velenko

From 98bb6d7323ce79e28be8ef892b919391ed857e1f Mon Sep 17 00:00:00 2001
From: Alex Velenko alex.vele...@arm.com
Date: Fri, 31 Oct 2014 18:43:32 +
Subject: [PATCH] [AArch64, RTL] Bics instruction generation for aarch64

Hi,

This patch adds rtl patterns for aarch64 to generate bics instructions in
cases when the computed value gets discarded and only the status register
change of the instruction gets reused.

Previously, bics would only be generated, if the value computed by bics
would later be reused, which is not necessarily the case when computing
this value for if statements.
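
What a flag-setting bit clear computes can be modelled in plain C++. This is only a sketch of the semantics described above (AND with the complement, keep the flags, drop the value); NzFlags, bics_flags_only and branch_taken are hypothetical names and the model covers the 32-bit case only.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical model of the two condition flags the pattern cares about.
struct NzFlags {
  bool n;  // result is negative (top bit set)
  bool z;  // result is zero
};

// Compute a & ~b, keep only the flags, and discard the value itself.
NzFlags bics_flags_only(uint32_t a, uint32_t b) {
  const uint32_t result = a & ~b;
  NzFlags flags;
  flags.n = (result >> 31) != 0;
  flags.z = result == 0;
  return flags;
}

// Models "if (a & ~b)": the branch is taken when the Z flag is clear.
bool branch_taken(uint32_t a, uint32_t b) {
  return !bics_flags_only(a, b).z;
}

// Small self-check for the zero and non-zero cases.
void bics_self_check() {
  assert(!branch_taken(5, 5));  // 5 & ~5 == 0
  assert(branch_taken(5, 4));   // 5 & ~4 == 1
}
```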

Is this patch ok?

Thanks,
Alex

gcc/

2014-11-10  Alex Velenko  alex.vele...@arm.com

* gcc/config/aarch64/aarch64.md
(and_one_cmpl<mode>3_compare0_no_reuse): New define_insn.
(and_one_cmpl_<SHIFT:optab><mode>3_compare0_no_reuse): Likewise.

gcc/testsuite/

2014-11-10  Alex Velenko  alex.vele...@arm.com

* gcc.target/aarch64/bics1.c : New testcase.
---
 gcc/config/aarch64/aarch64.md | 26 
 gcc/testsuite/gcc.target/aarch64/bics_3.c | 69 
+++

 2 files changed, 95 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bics_3.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 341c26f..6158d82 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2845,6 +2845,18 @@
   [(set_attr "type" "logics_reg")]
 )

+(define_insn "*and_one_cmpl<mode>3_compare0_no_reuse"
+  [(set (reg:CC_NZ CC_REGNUM)
+(compare:CC_NZ
+ (and:GPI (not:GPI
+   (match_operand:GPI 0 "register_operand" "r"))
+  (match_operand:GPI 1 "register_operand" "r"))
+ (const_int 0)))]
+  ""
+  "bics\\t<w>zr, %<w>1, %<w>0"
+  [(set_attr "type" "logics_reg")]
+)
+
 (define_insn "*<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (LOGICAL:GPI (not:GPI
@@ -2894,6 +2906,20 @@
   [(set_attr "type" "logics_shift_imm")]
 )

+(define_insn "*and_one_cmpl_<SHIFT:optab><mode>3_compare0_no_reuse"
+  [(set (reg:CC_NZ CC_REGNUM)
+(compare:CC_NZ
+ (and:GPI (not:GPI
+   (SHIFT:GPI
+(match_operand:GPI 0 "register_operand" "r")
+(match_operand:QI 1 "aarch64_shift_imm_<mode>" "n")))
+  (match_operand:GPI 2 "register_operand" "r"))
+ (const_int 0)))]
+  ""
+  "bics\\t<w>zr, %<w>2, %<w>0, <SHIFT:shift> %1"
+  [(set_attr "type" "logics_shift_imm")]
+)
+
 (define_insn "clz<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (clz:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/bics_3.c 
b/gcc/testsuite/gcc.target/aarch64/bics_3.c

new file mode 100644
index 000..ecb53e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/bics_3.c
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort (void);
+
+int __attribute__ ((noinline))
+bics_si_test (int a, int b)
+{
+  if (a & ~b)
+return 1;
+  else
+return 0;
+}
+
+int __attribute__ ((noinline))
+bics_si_test2 (int a, int b)
+{
+  if (a & ~ (b << 2))
+return 1;
+  else
+return 0;
+}
+
+typedef long long s64;
+
+int __attribute__ ((noinline))
+bics_di_test (s64 a, s64 b)
+{
+  if (a & ~b)
+return 1;
+  else
+return 0;
+}
+
+int __attribute__ ((noinline))
+bics_di_test2 (s64 a, s64 b)
+{
+  if (a & ~(b << 2))
+return 1;
+  else
+return 0;
+}
+
+int
+main (void)
+{
+  int a = 5;
+  int b = 5;
+  int c = 20;
+  s64 d = 5;
+  s64 e = 5;
+  s64 f = 20;
+  if (bics_si_test (a, b))
+abort ();
+  if (bics_si_test2 (c, b))
+abort ();
+  if (bics_di_test (d, e))
+abort ();
+  if (bics_di_test2 (f, e))
+abort ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "bics\twzr, w\[0-9\]+, w\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-times "bics\twzr, w\[0-9\]+, w\[0-9\]+, lsl 2" 1 } } */
+/* { dg-final { scan-assembler-times "bics\txzr, x\[0-9\]+, x\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-times "bics\txzr, x\[0-9\]+, x\[0-9\]+, lsl 2" 1 } } */
+
+/* { dg-final { cleanup-saved-temps } } */
--
1.8.1.2





Re: libstdc++ new deque failures

2014-11-11 Thread Jonathan Wakely

On 11/11/14 10:49 +0100, FX wrote:

The patch below breaks bootstrap on darwin 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63811):


Fix std::deque move construction with non-equal allocators.

* include/bits/stl_deque.h (_Deque_base::_Deque_base(_Deque_base)):
Dispatch according to whether allocators are always equal.
(_Deque_base::_M_move_impl()): Implement move-from state.



In file included from 
/Users/fx/devel/gcc/ibin2/x86_64-apple-darwin14.0.0/libstdc++-v3/include/deque:64:0,
from 
/Users/fx/devel/gcc/trunk2/libstdc++-v3/include/precompiled/stdc++.h:67:
/Users/fx/devel/gcc/ibin2/x86_64-apple-darwin14.0.0/libstdc++-v3/include/bits/stl_deque.h: 
In member function ‘std::_Deque_base<_Tp, _Alloc>::_Deque_impl 
std::_Deque_base<_Tp, _Alloc>::_M_move_impl()’:
/Users/fx/devel/gcc/ibin2/x86_64-apple-darwin14.0.0/libstdc++-v3/include/bits/stl_deque.h:645:17:
 error: expected primary-expression before ‘__attribute’
 _Tp_alloc_type __attribute((__unused__)) {std::move(__alloc)};
^
make[2]: *** [x86_64-apple-darwin14.0.0/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 
1


Should be fixed with this renaming.

Tested x86_64-linux, committed to trunk.

commit 3a81c243672bd721f15bc6320fc7a82e850fc3d8
Author: Jonathan Wakely jwak...@redhat.com
Date:   Tue Nov 11 10:11:09 2014 +

	PR libstdc++/63811
	* include/bits/stl_deque.h (_Deque_base::_M_move_impl()): Avoid using
	badname.

diff --git a/libstdc++-v3/include/bits/stl_deque.h b/libstdc++-v3/include/bits/stl_deque.h
index c0052b3..3a1c85d 100644
--- a/libstdc++-v3/include/bits/stl_deque.h
+++ b/libstdc++-v3/include/bits/stl_deque.h
@@ -642,7 +642,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	// Create a copy of the current allocator.
 	_Tp_alloc_type __alloc{_M_get_Tp_allocator()};
 	// Put that copy in a moved-from state.
-	_Tp_alloc_type __unused __attribute((__unused__)) {std::move(__alloc)};
+	_Tp_alloc_type __sink __attribute((__unused__)) {std::move(__alloc)};
 	// Create an empty map that allocates using the moved-from allocator.
 	_Deque_base __empty{__alloc};
 	// Now safe to modify current allocator and perform non-throwing swaps.


Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-11 Thread Richard Biener
On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote:
 On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote:
  On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote:
   To be constructive here - the above case is from within a
   GIMPLE_ASSIGN case label
   and thus I'd have expected
  
   case GIMPLE_ASSIGN:
     {
       gassign *a1 = as_a <gassign *> (s1);
       gassign *a2 = as_a <gassign *> (s2);
       lhs1 = gimple_assign_lhs (a1);
       lhs2 = gimple_assign_lhs (a2);
       if (TREE_CODE (lhs1) != SSA_NAME
           && TREE_CODE (lhs2) != SSA_NAME)
         return (operand_equal_p (lhs1, lhs2, 0)
                 && gimple_operand_equal_value_p (gimple_assign_rhs1 (a1),
                                                  gimple_assign_rhs1 (a2)));
       else if (TREE_CODE (lhs1) == SSA_NAME
                && TREE_CODE (lhs2) == SSA_NAME)
         return vn_valueize (lhs1) == vn_valueize (lhs2);
       return false;
     }
  
   instead.  That's the kind of changes I have expected and have approved 
   of.
 
  But even that looks like just adding extra work for all developers, with no
  gain.  You only have to add extra code and extra temporaries, in switches
  typically also have to add {} because of the temporaries and thus extra
  indentation level, and it doesn't simplify anything in the code.

 The branch attempts to use the C++ typesystem to capture information
 about the kinds of gimple statement we expect, both:
   (A) so that the compiler can detect type errors, and
   (B) as a comprehension aid to the human reader of the code

 The ideal here is when function params and struct fields can be
 strengthened from gimple to a subclass ptr.  This captures the
 knowledge that every use of a function or within a struct has a given
 gimple code.

 I just don't like all the as_a/is_a stuff enforced everywhere,
 it means more typing, more temporaries, more indentation.
 So, as I view it, instead of the checks being done cheaply (yes, I think
 the gimple checking as we have right now is very cheap) under the
 hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes
 put the burden on the developers, who have to check that manually through
 the as_a/is_a stuff everywhere, more typing and uglier syntax.
 I just don't see that as a step forward, instead a huge step backwards.
 But perhaps I'm alone with this.
 Can you e.g. compare the size of - lines in your patchset combined, and
 size of + lines in your patchset?  As in, if your changes lead to less
 typing or more.

I see two ways out here.  One is to add overloads to all the functions
taking the special types like

tree
gimple_assign_rhs1 (gimple *);

or simply add

gassign *operator ()(gimple *g) { return as_a <gassign *> (g); }

into a gimple-compat.h header which you include in places that
are not converted nicely.

Both avoid manually making the compiler happy (which the
explicit as_a stuff is!  It doesn't add any checking - it's
just placing the as_a at the callers and thus makes the
runtime ICE fire there).
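
The point about where the check fires can be sketched in plain C. This is only an analogue under stated assumptions: the types and function names below are hypothetical stand-ins, not GCC's gimple classes, and a C cast plus assert takes the place of the C++ as_a template. Both styles perform the same runtime check; they differ only in whether the accessor or the caller carries it.

```c
#include <assert.h>

/* Hypothetical statement kinds, standing in for gimple codes. */
enum stmt_code { STMT_ASSIGN, STMT_CALL };

struct stmt { enum stmt_code code; int lhs; };
struct assign_stmt { struct stmt base; };   /* "subclass" by embedding */

/* Style 1: the accessor takes the generic type and checks the kind itself. */
int stmt_assign_lhs(const struct stmt *s)
{
    assert(s->code == STMT_ASSIGN);
    return s->lhs;
}

/* Style 2: the caller converts first; the same check now fires here. */
const struct assign_stmt *as_assign(const struct stmt *s)
{
    assert(s->code == STMT_ASSIGN);
    return (const struct assign_stmt *) s;
}

/* With the narrowed type, the accessor needs no check of its own. */
int assign_lhs(const struct assign_stmt *a)
{
    return a->base.lhs;
}
```

A caller would write either stmt_assign_lhs (s) or assign_lhs (as_assign (s)); the second form is the extra typing being objected to in this thread.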

As much as I don't like global conversion operators I don't
like adding overloads to all of the accessor functions even more.

Whether you enable them generally or just for selected files
via a gimple-compat.h will be up to you (but I'd rather get
rid of them at some point).

Note this allows seamless transform of random functions
taking a gimple now but really only expecting a single kind.

Note that we don't absolutely have to rush this all in for GCC 5.
Being the very first for GCC 6 stage1 is another possibility.
We just should get it right.

Thanks,
Richard.


 Jakub


Re: [PATCH, i386]: Use std::swap

2014-11-11 Thread Richard Biener
On Tue, Nov 11, 2014 at 10:41 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Nov 11, 2014 at 9:09 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Nov 10, 2014 at 10:51 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Mon, 10 Nov 2014, Richard Biener wrote:

 No extra includes required?


 <utility> is already included in wide-int.h and rtl.h, should probably move
 those.

 Bah, we hit a problem. std::swap has been moved from <algorithm> to
 <utility> in C++11, and the patch breaks build on CentOS 5.11
 (gcc-4.1.2).

 Short of reverting the i386.c patch, is there a quick solution by
 including some additional headers?

 Attached patch that implements both suggestions from Richi and Marc
 fixes the bootstrap.

 2014-11-11  Uros Bizjak  ubiz...@gmail.com

 * system.h: Include <algorithm> and <utility>.
 * rtl.h: Do not include <utility> here.
 * wide-int.h: Ditto.
 * tree-vect-data-refs.c (swap): Remove template.
 (vect_prune_runtime_alias_test_list): Use std::swap instead of swap.

 Bootstrapped on x86_64-linux-gnu (CentOS 5.11).

 OK for mainline?

 BTW: There are lots of places where std::swap can be used, a nice
 search-and-replace task for someone to start with gcc development. ;)

Agreed ;)  Note that we have to be careful to avoid pulling all of libstdc++
into all files via system.h (system.h is such a bad thing... :/).

Ok.

Thanks,
Richard.

 Uros.


[PATCH][fortran] PR 63701 Make sure variable is always used initialised

2014-11-11 Thread Kyrill Tkachov

Hi all,

As this trivial PR says, found is not initialised, later conditionally
set to true in the for loop that follows and gcc_asserted at the end.
It is expected that the found = true; statement will always be hit, but
in case something elsewhere goes wrong and it is not, we want the
gcc_assert to use a properly initialised found = false value.
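
The pattern is the usual search-loop flag. A small sketch with a hypothetical helper (contains is not a GCC function) shows why the initialiser matters: if the loop body never matches, or never runs at all, the flag still has a defined value when it is read.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the search loop: reports whether key occurs. */
bool contains(const int *items, size_t n, int key)
{
    bool found = false;   /* defined even when the loop never matches */

    for (size_t i = 0; i < n; i++)
        if (items[i] == key) {
            found = true;
            break;
        }

    return found;
}
```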

Ok for trunk?

Thanks,
Kyrill

2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

PR fortran/63701
* trans-expr.c (gfc_get_tree_for_caf_expr): Initialise found to false.

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 18bc502..b36acbe 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -1406,7 +1406,7 @@ tree
 gfc_get_tree_for_caf_expr (gfc_expr *expr)
 {
   tree caf_decl;
-  bool found;
+  bool found = false;
   gfc_ref *ref;
 
   gcc_assert (expr && expr->expr_type == EXPR_VARIABLE);

Re: [x86, 6/n] Replace builtins with vector extensions

2014-11-11 Thread Kirill Yukhin
On 11 Nov 10:28, Marc Glisse wrote:
 On Tue, 11 Nov 2014, Kirill Yukhin wrote:
 
 Hello Marc, Uroš,
 On 10 Nov 21:33, Uros Bizjak wrote:
 On Sun, Nov 9, 2014 at 5:26 PM, Marc Glisse marc.gli...@inria.fr wrote:
 Hello,
 
 <, > and == for integer vectors of size 128. I was surprised not to find
 _mm_cmplt_epi64 anywhere. Note that I can do the same for size 256, but not
 512, there is no corresponding intrinsic, there are only _mask versions 
 that
 return a mask.
 
 Let's ask Kirill (CC'd) about missing intrinsics.
 We have no `_mm_cmplt_epi64' intrinsic because there's no such
 instruction in Intel ISA.
 
 We have _mm_cmplt_epi32 without a corresponding instruction though ;-)
 (yes, it is useless)
Right, but not in official SDM [1]. I believe these extra intrinsics were added
for compatibility w/ ICC which also features them.
 
 -- 
 Marc Glisse

[1] - 
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

--
Thanks, K


Re: [C PATCH] warn for empty struct -Wc++-compat

2014-11-11 Thread Marek Polacek
On Tue, Nov 11, 2014 at 04:45:46AM +0530, Prathamesh Kulkarni wrote:
 Index: gcc/c/c-decl.c
 ===
 --- gcc/c/c-decl.c(revision 217287)
 +++ gcc/c/c-decl.c(working copy)
 @@ -606,6 +606,8 @@
/* If warn_cxx_compat, a list of typedef names used when defining
   fields in this struct.  */
 vec<tree> typedefs_seen;
 +  /* code to distinguish between struct/union */
 +  enum tree_code code;
  
I don't think this is desirable, you might just pass T down from
finish_struct to warn_cxx_compat_finish_struct.

 @@ -7506,12 +7509,19 @@
  /* Finish up struct info used by -Wc++-compat.  */
  
  static void
 -warn_cxx_compat_finish_struct (tree fieldlist)
 +warn_cxx_compat_finish_struct (tree fieldlist, location_t record_loc)
  {
unsigned int ix;
tree x;
struct c_binding *b;
  
 +  if (fieldlist == NULL_TREE)
 +{
 +  warning_at (record_loc, OPT_Wc___compat,
 +   "empty %s has size 0 in C, 1 in C++",
 +   (struct_parse_info->code == RECORD_TYPE) ? "struct" : "union");
 +}
 +

I think this won't work well wrt translations, so you need to have
an if here.  See the pedwarns at the beginning of finish_struct.

 Index: gcc/testsuite/gcc.dg/Wcxx-compat-22.c
 ===
 --- gcc/testsuite/gcc.dg/Wcxx-compat-22.c (revision 0)
 +++ gcc/testsuite/gcc.dg/Wcxx-compat-22.c (working copy)
 @@ -0,0 +1,4 @@
 +/* { dg-do compile } */
 +/* { dg-options "-Wc++-compat" } */
 +struct A {}; /* { dg-warning "empty struct has size 0 in C" } */
 +union B {}; /* { dg-warning "empty union has size 0 in C" } */

Please also test an empty struct in a struct.
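
For reference, a sketch of what such a nested case looks like and why the sizes differ between the languages. It relies on the GNU C extension that allows structures with no members and gives them size zero; the type and function names are made up for illustration. Compiled as C++ the same empty type would have size 1.

```c
#include <stddef.h>

/* GNU C extension: a structure with no members has size zero. */
struct empty {};

/* An empty struct nested in another struct, as suggested for the test. */
struct wrapper { struct empty e; int x; };

size_t empty_size(void)   { return sizeof(struct empty); }
size_t wrapper_size(void) { return sizeof(struct wrapper); }
```

With GCC in C mode, empty_size () is 0 and wrapper_size () equals sizeof (int), since the empty member contributes nothing.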

Thanks,

Marek


Re: [C PATCH] warn for empty struct -Wc++-compat

2014-11-11 Thread Marc Glisse

On Tue, 11 Nov 2014, Marek Polacek wrote:


@@ -7506,12 +7509,19 @@
 /* Finish up struct info used by -Wc++-compat.  */

 static void
-warn_cxx_compat_finish_struct (tree fieldlist)
+warn_cxx_compat_finish_struct (tree fieldlist, location_t record_loc)
 {
   unsigned int ix;
   tree x;
   struct c_binding *b;

+  if (fieldlist == NULL_TREE)
+{
+  warning_at (record_loc, OPT_Wc___compat,
+ "empty %s has size 0 in C, 1 in C++",
+ (struct_parse_info->code == RECORD_TYPE) ? "struct" : "union");
+}
+


I think this won't work well wrt translations, so you need to have
an if here.  See the pedwarns at the beginning of finish_struct.


Do keywords like struct/union really require translation?

--
Marc Glisse


Re: [PATCH][fortran] PR 63701 Make sure variable is always used initialised

2014-11-11 Thread FX
 2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
PR fortran/63701
* trans-expr.c (gfc_get_tree_for_caf_expr): Initialise found to false.

OK, thanks for the patch.

FX

[Patch AArch64] Fix up BSL expander for floating point types

2014-11-11 Thread James Greenhalgh

Hi,

As Ramana hinted here:
  https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00607.html

There are two issues with the way we've defined our BSL pattern. We pun
types around in a way that is scary and quite likely unsafe, and we
haven't canonicalized the pattern so combine is unlikely to pick it up.

This patch fixes both of these issues and adds testcases to ensure we are
picking up the combine opportunity.
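
For readers unfamiliar with the operation, here is a scalar sketch of what BSL computes, written as the kind of C idiom that combine would need to recognise. The function names are illustrative only. The second function is an equivalent XOR formulation; which form a compiler canonicalizes to is not something this sketch asserts.

```c
#include <stdint.h>

/* Bitwise select: bits of a where mask is 1, bits of b where mask is 0. */
uint64_t bit_select(uint64_t mask, uint64_t a, uint64_t b)
{
    return (mask & a) | (~mask & b);
}

/* Equivalent form using XOR; yields the same value for all inputs. */
uint64_t bit_select_xor(uint64_t mask, uint64_t a, uint64_t b)
{
    return ((a ^ b) & mask) ^ b;
}
```

With an all-ones mask the result is simply a, which is the folding the first new testcase checks for; with a constant-zero a, half of the expression disappears and only an AND-NOT of b remains.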

I've bootstrapped and tested this on aarch64-none-linux-gnu and
cross-tested it for aarch64-none-elf.

OK?

Cheers,
James

---
gcc/

2014-11-11  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64-simd.md
(aarch64_simd_bsl<mode>_internal): Remove float cases, canonicalize.
(aarch64_simd_bsl<mode>): Add gen_lowpart expressions where we
are punning between float vectors and integer vectors.

gcc/testsuite/

2014-11-11  James Greenhalgh  james.greenha...@arm.com

* gcc.target/aarch64/vbslq_f64_1.c: New.
* gcc.target/aarch64/vbslq_f64_2.c: Likewise.
* gcc.target/aarch64/vbslq_u64_1.c: Likewise.
* gcc.target/aarch64/vbslq_u64_2.c: Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ef196e4b6fb39c0d2fd9ebfee76abab8369b1e92..f7012ecab07c1b38836e949c2f4e5bd0c7939b5c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1924,15 +1924,15 @@ (define_insn aarch64_reduc_maxmin_uns
 ;; bif op0, op1, mask
 
 (define_insn "aarch64_simd_bsl<mode>_internal"
-  [(set (match_operand:VALLDIF 0 "register_operand"		"=w,w,w")
-	(ior:VALLDIF
-	   (and:VALLDIF
-	 (match_operand:<V_cmp_result> 1 "register_operand"	" 0,w,w")
-	 (match_operand:VALLDIF 2 "register_operand"	" w,w,0"))
-	   (and:VALLDIF
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand"		"=w,w,w")
+	(ior:VSDQ_I_DI
+	   (and:VSDQ_I_DI
 	 (not:<V_cmp_result>
-		(match_dup:<V_cmp_result> 1))
-	 (match_operand:VALLDIF 3 "register_operand"	" w,0,w"))
+	   (match_operand:<V_cmp_result> 1 "register_operand"	" 0,w,w"))
+	 (match_operand:VSDQ_I_DI 3 "register_operand"	" w,0,w"))
+	   (and:VSDQ_I_DI
+	 (match_dup:<V_cmp_result> 1)
+	 (match_operand:VSDQ_I_DI 2 "register_operand"	" w,w,0"))
 	))]
   "TARGET_SIMD"
   "@
@@ -1950,9 +1950,21 @@ (define_expand aarch64_simd_bslmode
  "TARGET_SIMD"
 {
   /* We can't alias operands together if they have different modes.  */
+  rtx tmp = operands[0];
+  if (FLOAT_MODE_P (<MODE>mode))
+{
+  operands[2] = gen_lowpart (<V_cmp_result>mode, operands[2]);
+  operands[3] = gen_lowpart (<V_cmp_result>mode, operands[3]);
+  tmp = gen_reg_rtx (<V_cmp_result>mode);
+}
   operands[1] = gen_lowpart (<V_cmp_result>mode, operands[1]);
-  emit_insn (gen_aarch64_simd_bsl<mode>_internal (operands[0], operands[1],
-		  operands[2], operands[3]));
+  emit_insn (gen_aarch64_simd_bsl<v_cmp_result>_internal (tmp,
+			  operands[1],
+			  operands[2],
+			  operands[3]));
+  if (tmp != operands[0])
+emit_move_insn (operands[0], gen_lowpart (<MODE>mode, tmp));
+
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vbslq_f64_1.c b/gcc/testsuite/gcc.target/aarch64/vbslq_f64_1.c
new file mode 100644
index 000..7b0e8f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vbslq_f64_1.c
@@ -0,0 +1,20 @@
+/* Test vbslq_f64 can be folded.  */
+/* { dg-do assemble } */
+/* { dg-options "--save-temps -O3" } */
+
+#include <arm_neon.h>
+
+/* Folds to ret.  */
+
+float32x4_t
+fold_me (float32x4_t a, float32x4_t b)
+{
+  uint32x4_t mask = {-1, -1, -1, -1};
+  return vbslq_f32 (mask, a, b);
+}
+
+/* { dg-final { scan-assembler-not "bsl\\tv" } } */
+/* { dg-final { scan-assembler-not "bit\\tv" } } */
+/* { dg-final { scan-assembler-not "bif\\tv" } } */
+
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vbslq_f64_2.c b/gcc/testsuite/gcc.target/aarch64/vbslq_f64_2.c
new file mode 100644
index 000..1dca90d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vbslq_f64_2.c
@@ -0,0 +1,23 @@
+/* Test vbslq_f64 can be folded.  */
+/* { dg-do assemble } */
+/* { dg-options "--save-temps -O3" } */
+
+#include <arm_neon.h>
+
+/* Should fold out one half of the BSL, leaving just a BIC.  */
+
+float32x4_t
+half_fold_me (uint32x4_t mask)
+{
+  float32x4_t a = {0.0, 0.0, 0.0, 0.0};
+  float32x4_t b = {2.0, 4.0, 8.0, 16.0};
+  return vbslq_f32 (mask, a, b);
+
+}
+
+/* { dg-final { scan-assembler-not "bsl\\tv" } } */
+/* { dg-final { scan-assembler-not "bit\\tv" } } */
+/* { dg-final { scan-assembler-not "bif\\tv" } } */
+/* { dg-final { scan-assembler "bic\\tv" } } */
+
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vbslq_u64_1.c b/gcc/testsuite/gcc.target/aarch64/vbslq_u64_1.c
new file mode 100644
index 000..9c61d1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vbslq_u64_1.c
@@ -0,0 +1,16 @@
+/* Test if a BSL-like instruction can be generated from a C idiom.  */
+/* { dg-do assemble } */
+/* { dg-options --save-temps -O3 } 

[x86, 7/n] Replace builtins with vector extensions

2014-11-11 Thread Marc Glisse

Hello,

last patch, extending == and > to size 256. Regtested as usual.

Is the branch ready to be merged into trunk?
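
As a quick illustration of the vector extension semantics this series relies on, here is a sketch using a 128-bit vector type so that it does not need AVX2; the typedef and function names are local to the example and are not the intrinsic header's own. An element-wise comparison yields a vector whose lanes are -1 (all bits set) where the comparison holds and 0 elsewhere, which matches what the pcmpeq/pcmpgt builtins return.

```c
/* GNU C vector extension: four 32-bit ints in a 16-byte vector. */
typedef int v4si __attribute__((vector_size(16)));

/* Element-wise equality: each lane is -1 if equal, 0 otherwise. */
v4si lanes_equal(v4si a, v4si b)
{
    return a == b;
}

/* Element-wise signed greater-than: each lane is -1 if a > b, else 0. */
v4si lanes_greater(v4si a, v4si b)
{
    return a > b;
}
```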

--
Marc Glisse

Index: ChangeLog.x86-intrinsics-ext
===
--- ChangeLog.x86-intrinsics-ext(revision 217319)
+++ ChangeLog.x86-intrinsics-ext(working copy)
@@ -1,10 +1,17 @@
+2014-11-11  Marc Glisse  marc.gli...@inria.fr
+
+   * config/i386/avx2intrin.h (_mm256_cmpeq_epi8, _mm256_cmpeq_epi16,
+   _mm256_cmpeq_epi32, _mm256_cmpeq_epi64, _mm256_cmpgt_epi8,
+   _mm256_cmpgt_epi16, _mm256_cmpgt_epi32, _mm256_cmpgt_epi64):
+   Use vector extensions instead of builtins.
+
 2014-11-10  Marc Glisse  marc.gli...@inria.fr
 
* config/i386/emmintrin.h (_mm_cmpeq_epi8, _mm_cmpeq_epi16,
_mm_cmpeq_epi32, _mm_cmplt_epi8, _mm_cmplt_epi16, _mm_cmplt_epi32,
_mm_cmpgt_epi8, _mm_cmpgt_epi16, _mm_cmpgt_epi32): Use vector
extensions instead of builtins.
* config/i386/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64):
Likewise.
 
 2014-11-10  Marc Glisse  marc.gli...@inria.fr
Index: config/i386/avx2intrin.h
===
--- config/i386/avx2intrin.h(revision 217318)
+++ config/i386/avx2intrin.h(working copy)
@@ -223,73 +223,70 @@ _mm256_blend_epi16 (__m256i __X, __m256i
 #else
 #define _mm256_blend_epi16(X, Y, M)\
   ((__m256i) __builtin_ia32_pblendw256 ((__v16hi)(__m256i)(X), \
(__v16hi)(__m256i)(Y), (int)(M)))
 #endif
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi8 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_pcmpeqb256 ((__v32qi)__A, (__v32qi)__B);
+  return (__m256i) ((__v32qi)__A == (__v32qi)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi16 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_pcmpeqw256 ((__v16hi)__A, (__v16hi)__B);
+  return (__m256i) ((__v16hi)__A == (__v16hi)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi32 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_pcmpeqd256 ((__v8si)__A, (__v8si)__B);
+  return (__m256i) ((__v8si)__A == (__v8si)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi64 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_pcmpeqq256 ((__v4di)__A, (__v4di)__B);
+  return (__m256i) ((__v4di)__A == (__v4di)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpgt_epi8 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_pcmpgtb256 ((__v32qi)__A,
-(__v32qi)__B);
+  return (__m256i) ((__v32qi)__A > (__v32qi)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpgt_epi16 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_pcmpgtw256 ((__v16hi)__A,
-(__v16hi)__B);
+  return (__m256i) ((__v16hi)__A > (__v16hi)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpgt_epi32 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_pcmpgtd256 ((__v8si)__A,
-(__v8si)__B);
+  return (__m256i) ((__v8si)__A > (__v8si)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpgt_epi64 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_pcmpgtq256 ((__v4di)__A, (__v4di)__B);
+  return (__m256i) ((__v4di)__A > (__v4di)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_hadd_epi16 (__m256i __X, __m256i __Y)
 {
   return (__m256i) __builtin_ia32_phaddw256 ((__v16hi)__X,
 (__v16hi)__Y);
 }
 


Re: [C PATCH] warn for empty struct -Wc++-compat

2014-11-11 Thread Marek Polacek
On Tue, Nov 11, 2014 at 12:13:32PM +0100, Marc Glisse wrote:
 On Tue, 11 Nov 2014, Marek Polacek wrote:
 
 @@ -7506,12 +7509,19 @@
  /* Finish up struct info used by -Wc++-compat.  */
 
  static void
 -warn_cxx_compat_finish_struct (tree fieldlist)
 +warn_cxx_compat_finish_struct (tree fieldlist, location_t record_loc)
  {
unsigned int ix;
tree x;
struct c_binding *b;
 
 +  if (fieldlist == NULL_TREE)
 +{
 +  warning_at (record_loc, OPT_Wc___compat,
 + "empty %s has size 0 in C, 1 in C++",
 + (struct_parse_info->code == RECORD_TYPE) ? "struct" : "union");
 +}
 +
 
 I think this won't work well wrt translations, so you need to have
 an if here.  See the pedwarns at the beginning of finish_struct.
 
 Do keywords like struct/union really require translation?

C keywords don't require translation, but you always need to have
complete sentences in diagnostics so I better pointed it out.  Joseph
would know better than me though.
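
A sketch of the "use an if" shape being suggested, with hypothetical names rather than the actual c-decl.c code. Each branch carries a complete sentence, so a translator sees two whole messages instead of one template with a word spliced in at run time.

```c
/* Hypothetical kinds, standing in for RECORD_TYPE and UNION_TYPE. */
enum record_kind { KIND_STRUCT, KIND_UNION };

/* Pick a whole diagnostic sentence per kind instead of formatting
   a keyword into a shared template.  */
const char *empty_record_message(enum record_kind kind)
{
    if (kind == KIND_STRUCT)
        return "empty struct has size 0 in C, 1 in C++";
    else
        return "empty union has size 0 in C, 1 in C++";
}
```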

Marek


Re: [PATCH] libstdc++ - Add xmethods for associative containers (ordered and unordered)

2014-11-11 Thread Jonathan Wakely

On 10/11/14 21:49 +, Jonathan Wakely wrote:

On 09/11/14 16:00 -0800, Siva Chandra wrote:

Hello,

Attached is a patch which adds xmethods for the associative containers
(set, map, multiset and multimap) and their unordered versions. I
think the GDB Python API is not rich enough to implement xmethods for
the more interesting methods like find, count etc. The attached patch
only implements xmethods for size and empty. That way, it is a fairly
straightforward patch.


This looks fine, I'll commit it soon. Thanks.


Committed to trunk.



Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-11 Thread Bernd Schmidt

On 11/11/2014 09:30 AM, Eric Botcazou wrote:

I just don't like all the as_a/is_a stuff enforced everywhere,
it means more typing, more temporaries, more indentation.
So, as I view it, instead of the checks being done cheaply (yes, I think
the gimple checking as we have right now is very cheap) under the
hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes
put the burden on the developers, who have to check that manually through
the as_a/is_a stuff everywhere, more typing and uglier syntax.
I just don't see that as a step forward, instead a huge step backwards.
But perhaps I'm alone with this.


IMO that's the sort of things some of us were afraid of when the C++ switch
was being discussed and IIRC we were told this would not happen...


I'm with both of you on this.


Bernd




[PATCH][AArch64] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-11-11 Thread Kyrill Tkachov

Hi all,

This is the aarch64 implementation of the macro fusion hook, used to 
fuse mov+movk instructions together.


A new field is declared in the tuning struct and as we add more fuseable 
ops in the future we will fill in more bits in the fuseable_ops field.
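
The bitmask scheme can be sketched as follows. The names are modelled on the patch's AARCH64_FUSE_* macros but are hypothetical, and FUSE_EXAMPLE_B is a placeholder for some future pair, not anything proposed here.

```c
#include <stdbool.h>

/* Hypothetical flag names modelled on the patch's AARCH64_FUSE_* macros. */
#define FUSE_NOTHING   (0u)
#define FUSE_MOV_MOVK  (1u << 0)
#define FUSE_EXAMPLE_B (1u << 1)   /* placeholder for a future pair */

/* Cut-down stand-in for the per-core tuning structure. */
struct tuning { unsigned int fuseable_ops; };

/* True when the core fuses at least one kind of instruction pair. */
bool fusion_enabled(const struct tuning *t)
{
    return t->fuseable_ops != FUSE_NOTHING;
}

/* True when the core fuses the specific pair kind given by the flag. */
bool can_fuse(const struct tuning *t, unsigned int pair)
{
    return (t->fuseable_ops & pair) != 0;
}
```

Adding a new fuseable pair then only needs a new bit and an extra test in the pair predicate; existing tuning tables stay valid.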


Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/aarch64/aarch64-protos.h (struct tune_params): Add
fuseable_ops field.
* config/aarch64/aarch64.c (generic_tunings): Specify fuseable_ops.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(thunderx_tunings): Likewise.
(aarch64_macro_fusion_p): New function.
(aarch_macro_fusion_pair_p): Likewise.
(TARGET_SCHED_MACRO_FUSION_P): Define.
(TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
(AARCH64_FUSE_MOV_MOVK): Likewise.
(AARCH64_FUSE_NOTHING): Likewise.

commit 3181b0988eed091c8b1ead7a6381c6f9aee7774e
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Tue Oct 21 10:36:48 2014 +0100

[AArch64] Implement TARGET_MACRO_FUSION

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 810644c..d3d295d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -170,6 +170,7 @@ struct tune_params
   const struct cpu_vector_cost *const vec_costs;
   const int memmov_cost;
   const int issue_rate;
+  const unsigned int fuseable_ops;
 };
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9aeac7c..96f6c47 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -299,6 +299,9 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
   NAMED_PARAM (cond_not_taken_branch_cost, 1)
 };
 
+#define AARCH64_FUSE_NOTHING	(0)
+#define AARCH64_FUSE_MOV_MOVK	(1 << 0)
+
 #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
 __extension__
 #endif
@@ -309,7 +312,8 @@ static const struct tune_params generic_tunings =
   generic_regmove_cost,
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
 };
 
 static const struct tune_params cortexa53_tunings =
@@ -319,7 +323,8 @@ static const struct tune_params cortexa53_tunings =
   cortexa53_regmove_cost,
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK)
 };
 
 static const struct tune_params cortexa57_tunings =
@@ -329,7 +334,8 @@ static const struct tune_params cortexa57_tunings =
   cortexa57_regmove_cost,
   cortexa57_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 3)
+  NAMED_PARAM (issue_rate, 3),
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK)
 };
 
 static const struct tune_params thunderx_tunings =
@@ -339,7 +345,8 @@ static const struct tune_params thunderx_tunings =
   thunderx_regmove_cost,
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 6),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
 };
 
 /* A processor implementing AArch64.  */
@@ -10017,6 +10024,48 @@ aarch64_use_by_pieces_infrastructure_p (unsigned int size,
   return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
 }
 
+static bool
+aarch64_macro_fusion_p (void)
+{
+  return aarch64_tune_params->fuseable_ops != AARCH64_FUSE_NOTHING;
+}
+
+static bool
+aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
+{
+  rtx set_dest;
+  rtx prev_set = single_set (prev);
+  rtx curr_set = single_set (curr);
+
+  if (!prev_set
+  || !curr_set)
+return false;
+
+  if (any_condjump_p (curr))
+return false;
+
+  if (!aarch64_macro_fusion_p ())
+return false;
+
+  if (aarch64_tune_params->fuseable_ops & AARCH64_FUSE_MOV_MOVK)
+{
+  /* We are trying to fuse
+ mov imm / movk imm
+ instructions as a group that gets scheduled together.  */
+
+  set_dest = SET_DEST (curr_set);
+
+  return GET_CODE (set_dest) == ZERO_EXTRACT
+  && CONST_INT_P (SET_SRC (curr_set))
+  && CONST_INT_P (SET_SRC (prev_set))
+  && REG_P (XEXP (set_dest, 0))
+  && REG_P (SET_DEST (prev_set))
+  && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set));
+}
+
+  return false;
+}
+
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost
 
@@ -10273,6 +10322,12 @@ aarch64_use_by_pieces_infrastructure_p (unsigned int size,
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   aarch64_use_by_pieces_infrastructure_p
 
+#undef TARGET_SCHED_MACRO_FUSION_P
+#define TARGET_SCHED_MACRO_FUSION_P aarch64_macro_fusion_p
+
+#undef TARGET_SCHED_MACRO_FUSION_PAIR_P
+#define TARGET_SCHED_MACRO_FUSION_PAIR_P aarch_macro_fusion_pair_p
+
 struct gcc_target targetm = 

[PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-11-11 Thread Kyrill Tkachov

Hi all,

This is the arm implementation of the macro fusion hook.
It tries to fuse movw+movt operations together. It also tries to take 
lo_sum RTXs into account since those generate movt instructions as well.
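
To show why the two instructions form a natural pair, here is a small C model of their effect on a 32-bit register. The helper names are illustrative, not compiler code. movw writes a 16-bit immediate and clears the upper half; movt then replaces only the upper half, so it depends directly on the movw result.

```c
#include <stdint.h>

/* Models movw: write a 16-bit immediate, clearing the upper half. */
uint32_t movw(uint16_t imm)
{
    return imm;
}

/* Models movt: replace the upper half, keep the lower half. */
uint32_t movt(uint32_t reg, uint16_t imm)
{
    return (reg & 0xFFFFu) | ((uint32_t) imm << 16);
}

/* Builds an arbitrary 32-bit constant with the movw/movt pair. */
uint32_t load_constant(uint32_t value)
{
    uint32_t reg = movw((uint16_t) (value & 0xFFFFu));
    return movt(reg, (uint16_t) (value >> 16));
}
```

Because the second instruction always consumes the first one's destination, keeping them adjacent in the schedule is what lets a core that supports it treat the pair as a single operation.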


Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
* config/arm/arm.c (arm_macro_fusion_p): New function.
(arm_macro_fusion_pair_p): Likewise.
(TARGET_SCHED_MACRO_FUSION_P): Define.
(TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
(ARM_FUSE_NOTHING): Likewise.
(ARM_FUSE_MOVW_MOVT): Likewise.
(arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune
arm_cortex_a5_tune): Specify fuseable_ops value.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index a37aa80..98e3cf0 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -281,6 +281,8 @@ struct tune_params
   bool string_ops_prefer_neon;
   /* Maximum number of instructions to inline calls to memset.  */
   int max_insns_inline_memset;
+  /* Bitfield encoding the fuseable pairs of instructions.  */
+  unsigned int fuseable_ops;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3f2ddd4..40df4c0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -258,6 +258,7 @@ static tree arm_build_builtin_va_list (void);
 static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
+static bool arm_macro_fusion_p (void);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
 static int arm_issue_rate (void);
@@ -296,6 +297,7 @@ static int arm_default_branch_cost (bool, bool);
 static int arm_cortex_a5_branch_cost (bool, bool);
 static int arm_cortex_m_branch_cost (bool, bool);
 
+static bool aarch_macro_fusion_pair_p (rtx_insn*, rtx_insn*);
 static bool arm_vectorize_vec_perm_const_ok (machine_mode vmode,
 	 const unsigned char *sel);
 
@@ -404,6 +406,12 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef  TARGET_COMP_TYPE_ATTRIBUTES
 #define TARGET_COMP_TYPE_ATTRIBUTES arm_comp_type_attributes
 
+#undef TARGET_SCHED_MACRO_FUSION_P
+#define TARGET_SCHED_MACRO_FUSION_P arm_macro_fusion_p
+
+#undef TARGET_SCHED_MACRO_FUSION_PAIR_P
+#define TARGET_SCHED_MACRO_FUSION_PAIR_P aarch_macro_fusion_pair_p
+
 #undef  TARGET_SET_DEFAULT_TYPE_ATTRIBUTES
 #define TARGET_SET_DEFAULT_TYPE_ATTRIBUTES arm_set_default_type_attributes
 
@@ -1710,6 +1718,9 @@ const struct cpu_cost_table v7m_extra_costs =
   }
 };
 
+#define ARM_FUSE_NOTHING	(0)
+#define ARM_FUSE_MOVW_MOVT	(1 << 0)
+
 const struct tune_params arm_slowmul_tune =
 {
   arm_slowmul_rtx_costs,
@@ -1726,7 +1737,8 @@ const struct tune_params arm_slowmul_tune =
   false,/* Prefer Neon for 64-bits bitops.  */
   false, false, /* Prefer 32-bit encodings.  */
   false,	/* Prefer Neon for stringops.  */
-  8		/* Maximum insns to inline memset.  */
+  8,		/* Maximum insns to inline memset.  */
+  ARM_FUSE_NOTHING/* Fuseable pairs of instructions.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -1745,7 +1757,8 @@ const struct tune_params arm_fastmul_tune =
   false,/* Prefer Neon for 64-bits bitops.  */
   false, false, /* Prefer 32-bit encodings.  */
   false,	/* Prefer Neon for stringops.  */
-  8		/* Maximum insns to inline memset.  */
+  8,		/* Maximum insns to inline memset.  */
+  ARM_FUSE_NOTHING/* Fuseable pairs of instructions.  */
 };
 
 /* StrongARM has early execution of branches, so a sequence that is worth
@@ -1767,7 +1780,8 @@ const struct tune_params arm_strongarm_tune =
   false,/* Prefer Neon for 64-bits bitops.  */
   false, false, /* Prefer 32-bit encodings.  */
   false,	/* Prefer Neon for stringops.  */
-  8		/* Maximum insns to inline memset.  */
+  8,		/* Maximum insns to inline memset.  */
+  ARM_FUSE_NOTHING/* Fuseable pairs of instructions.  */
 };
 
 const struct tune_params arm_xscale_tune =
@@ -1786,7 +1800,8 @@ const struct tune_params arm_xscale_tune =
   false,/* Prefer Neon for 64-bits bitops.  */
   false, false, /* Prefer 32-bit encodings.  */
   false,	/* Prefer Neon for stringops.  */

Re: [7/7] nvptx testsuite patches: Return addresses

2014-11-11 Thread Bernd Schmidt

On 11/10/2014 09:19 PM, H.J. Lu wrote:

I checked in this patch to revert the accidental checkin.


Sorry about that, and thanks for fixing it.


Bernd



Fix PR ada/42978

2014-11-11 Thread Eric Botcazou
This makes it so that gnatmake echoes the full command line passed to ranlib.

Tested on x86_64-suse-linux, applied on the mainline.


2014-11-11  Simon Wright  si...@pushface.org

PR ada/42978
* mlib-utl.adb (ar): Output the options passed to ranlib.


-- 
Eric Botcazou

Index: mlib-utl.adb
===
--- mlib-utl.adb	(revision 217259)
+++ mlib-utl.adb	(working copy)
@@ -282,6 +282,10 @@ package body MLib.Utl is
  if not Opt.Quiet_Output then
 Write_Str  (Ranlib_Name.all);
 Write_Char (' ');
+for J in Ranlib_Options'Range loop
+   Write_Str  (Ranlib_Options (J).all);
+   Write_Char (' ');
+end loop;
 Write_Line (Arguments (Ar_Options'Length + 1).all);
  end if;
 

[PATCH][ARM/AArch64] Improve modeled latency between FP operations and FP-GP register moves

2014-11-11 Thread Kyrill Tkachov

Hi all,

This patch models the latency of moves between FP and GP registers on 
the A15 and A57 a bit more accurately by splitting the reservations for 
FP-GP and GP-FP moves and adding an appropriate bypass.


Bootstrapped and tested on arm-none-linux-gnueabihf and 
aarch64-none-linux-gnu.


Ok for trunk?

Thanks,
Kyrill

2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/arm/cortex-a15-neon.md (cortex_a15_vfp_to_from_gp):
Split into...
(cortex_a15_gp_to_vfp): ...This.
(cortex_a15_fp_to_gp): ...And this.
Define and comment bypass from vfp operations to fp-gp moves.

commit c176d3e691f470598a02507fa75a8294da954c3f
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Fri Jun 13 11:11:16 2014 +0100

[ARM/AArch64] Model FP-GP move stalls

diff --git a/gcc/config/arm/cortex-a15-neon.md b/gcc/config/arm/cortex-a15-neon.md
index 02d4a53..bc09cd6 100644
--- a/gcc/config/arm/cortex-a15-neon.md
+++ b/gcc/config/arm/cortex-a15-neon.md
@@ -655,10 +655,20 @@ (define_insn_reservation "cortex_a15_vfp_cpys" 4
        (eq_attr "type" "fmov"))
   "ca15_issue1,ca15_cx_perm")
 
-(define_insn_reservation "cortex_a15_vfp_to_from_gp" 5
+(define_insn_reservation "cortex_a15_gp_to_vfp" 5
   (and (eq_attr "tune" "cortexa15")
-       (eq_attr "type" "f_mcr, f_mcrr, f_mrc, f_mrrc"))
-  "ca15_issue1,ca15_ls1+ca15_ls2")
+       (eq_attr "type" "f_mcr, f_mcrr"))
+  "ca15_issue1,ca15_ls")
+
+(define_insn_reservation "cortex_a15_mov_vfp_to_gp" 5
+  (and (eq_attr "tune" "cortexa15")
+       (eq_attr "type" "f_mrc, f_mrrc"))
+  "ca15_issue1,ca15_ls")
+
+;; Moves from floating point registers to general purpose registers
+;; induce additional latency.
+(define_bypass 10 "cortex_a15_vfp*, cortex_a15_neon*, cortex_a15_gp_to_vfp" "cortex_a15_mov_vfp_to_gp")
+
 
 (define_insn_reservation "cortex_a15_vfp_ariths" 7
   (and (eq_attr "tune" "cortexa15")

Re: [PATCH] c++ify sreal

2014-11-11 Thread Andrew Pinski
On Tue, Nov 11, 2014 at 1:23 AM, Marc Glisse marc.gli...@inria.fr wrote:
 On Tue, 11 Nov 2014, Jakub Jelinek wrote:

 On Tue, Nov 11, 2014 at 08:51:41AM +0100, Uros Bizjak wrote:

 Hello!

 do $subject, and cleanup for always 64 bit hwi.


 bootstrapped + regtested x86_64-unknown-linux-gnu, ok?


 Ok.  Can you please replace remaining HOST_WIDE_INT
 vestiges in there with [u]int64_t please?



 This patch breaks the build on debian 6.0:

 ../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
 ../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope


 Index: system.h
 ===
 --- system.h(revision 217338)
 +++ system.h(working copy)
 @@ -27,6 +27,7 @@
 event inttypes.h gets pulled in by another header it is already
 defined.  */
  #define __STDC_FORMAT_MACROS
 +#define __STDC_LIMIT_MACROS

  /* We must include stdarg.h before stdio.h.  */
  #include <stdarg.h>


 Still, I don't believe it will be portable everywhere.
 Can't you use
 INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX?  We already use that
 in GCC...


 We could also start using the standard C++ mechanism (numeric_limits).

Except int64_t does not have to be defined for a C++ implementation.

Thanks,
Andrew


 (nothing wrong with INTTYPE_MAXIMUM, just an alternative)

 --
 Marc Glisse
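
For readers unfamiliar with the INTTYPE_MAXIMUM style of macro suggested above, the following is a minimal sketch of how a type's maximum can be computed without relying on INT64_MAX being visible. It is written in C, `TOY_TYPE_SIGNED`, `TOY_TYPE_MAXIMUM` and `toy_int64_max` are hypothetical names for this example rather than GCC's actual definitions, and it assumes two's complement integer types without padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helpers for this sketch; not GCC's INTTYPE_MAXIMUM.
   Assumes two's complement integer types without padding bits.  */
#define TOY_TYPE_SIGNED(t) (!((t) 0 < (t) -1))

/* For signed types, build the maximum as (2^(bits-2) - 1) * 2 + 1 so that
   no intermediate value overflows; for unsigned types, (t) -1 is all ones.  */
#define TOY_TYPE_MAXIMUM(t)                                        \
  ((t) (TOY_TYPE_SIGNED (t)                                        \
        ? ((((t) 1 << (sizeof (t) * CHAR_BIT - 2)) - 1) * 2 + 1)   \
        : (t) -1))

/* Compile-time sanity check against a limit that <limits.h> always has.  */
static_assert (TOY_TYPE_MAXIMUM (int) == INT_MAX,
               "sketch disagrees with <limits.h>");

/* Maximum of int64_t without needing the INT64_MAX macro.  */
static int64_t
toy_int64_max (void)
{
  return TOY_TYPE_MAXIMUM (int64_t);
}
```

The appeal of this approach in the discussion is that it depends only on the type being available, not on whether the limit macros are exposed by the headers in a given language mode.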


[PATCH] Look through widening type conversions for possible edge assertions

2014-11-11 Thread Patrick Palka
This patch is a replacement for the 2nd VRP refactoring patch.  It
simply teaches VRP to look through widening type conversions when
finding suitable edge assertions, e.g.

bool p = x != y;
int q = (int) p;
if (q == 0) // new edge assert: p == 0 and therefore x == y

The new testcase requires that such an edge assertion be inserted.
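
To make the reasoning concrete, here is a small standalone C sketch of the implication being asserted. It only checks the logical equivalence at run time and says nothing about how VRP represents assertions internally; `edge_implies_equal` is a hypothetical name used for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Return 1 if the q == 0 branch is taken, 0 otherwise.  On the q == 0
   branch the widening conversion preserved p's value, so p == 0 and
   therefore x == y must hold.  */
static int
edge_implies_equal (int x, int y)
{
  bool p = x != y;   /* Narrow result of the comparison.  */
  int q = (int) p;   /* Widening conversion: the value is unchanged.  */
  if (q == 0)
    {
      assert (x == y);
      return 1;
    }
  return 0;
}
```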

Full bootstrap + regtest on x86_64-unknown-linux-gnu in progress.  Does
the patch look OK for trunk if no new regressions?

2014-11-11  Patrick Palka  ppa...@gcc.gnu.org

gcc/
* tree-vrp.c (register_edge_assert_for): Look through
widening type conversions for possible edge assertions.

gcc/testsuite/
* gcc.dg/vrp-1.c: New testcase.
---
 gcc/testsuite/gcc.dg/vrp-1.c | 31 +++
 gcc/tree-vrp.c   | 22 ++
 2 files changed, 53 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vrp-1.c

diff --git a/gcc/testsuite/gcc.dg/vrp-1.c b/gcc/testsuite/gcc.dg/vrp-1.c
new file mode 100644
index 000..df5334e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vrp-1.c
@@ -0,0 +1,31 @@
+/* { dg-options "-O2" } */
+
+void runtime_error (void) __attribute__ ((noreturn));
+void compiletime_error (void) __attribute__ ((noreturn, error ("")));
+
+static void
+compiletime_check_equals_1 (int *x, int y)
+{
+  int __p = *x != y;
+  if (__builtin_constant_p (__p) && __p)
+    compiletime_error ();
+  if (__p)
+    runtime_error ();
+}
+
+static void
+compiletime_check_equals_2 (int *x, int y)
+{
+  int __p = *x != y;
+  if (__builtin_constant_p (__p) && __p)
+    compiletime_error (); /* { dg-error "call to" } */
+  if (__p)
+    runtime_error ();
+}
+
+void
+foo (int *x)
+{
+  compiletime_check_equals_1 (x, 5);
+  compiletime_check_equals_2 (x, 10);
+}
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index f0a4382..979ab44 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5634,6 +5634,7 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
  the value zero or one, then we may be able to assert values
  for SSA_NAMEs which flow into COND.  */
 
+
   /* In the case of NAME == 1 or NAME != 0, for BIT_AND_EXPR defining
  statement of NAME we can assert both operands of the BIT_AND_EXPR
  have nonzero value.  */
@@ -5673,6 +5674,27 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
  register_edge_assert_for_1 (op1, EQ_EXPR, e, si);
}
 }
+
+  /* In the case of NAME != 0 or NAME == 0, if NAME's defining statement
+ is a widening type conversion then we can assert that NAME's
+ RHS is accordingly nonzero or zero.  */
+  if ((comp_code == EQ_EXPR || comp_code == NE_EXPR)
+      && integer_zerop (val))
+    {
+      gimple def_stmt = SSA_NAME_DEF_STMT (name);
+      if (is_gimple_assign (def_stmt))
+        {
+          enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
+          if (CONVERT_EXPR_CODE_P (def_code))
+            {
+              tree lhs = gimple_assign_lhs (def_stmt);
+              tree rhs = gimple_assign_rhs1 (def_stmt);
+              if (TYPE_PRECISION (TREE_TYPE (lhs))
+                  >= TYPE_PRECISION (TREE_TYPE (rhs)))
+                register_edge_assert_for_1 (rhs, comp_code, e, si);
+            }
+        }
+    }
 }
 
 
-- 
2.2.0.rc1.16.g6066a7e



Re: [PATCH 2/2] Simplify and extend VRP edge-assertion code

2014-11-11 Thread Richard Biener
On Tue, Nov 11, 2014 at 4:52 AM, Patrick Palka patr...@parcs.ath.cx wrote:
 This patch refactors the VRP edge-assertion code to make it always
 traverse SSA-name definitions in order to find suitable edge assertions
 to insert.  Currently SSA-name definitions get traversed only when the
 LHS of the original conditional is a bitwise AND or OR operation which
 seems like a strange restriction.  We should always try to traverse
 the SSA-name definitions inside the conditional, in particular for
 conditionals with the form:

   int p = x COMP y;
   if (p != 0) -- edge assertion: x COMP y

Of course this specific case should have been simplified to

 if (x COMP y)

if that comparison cannot trap and -fnon-call-exceptions is in effect.

 To achieve this the patch merges the mutually recursive functions
 register_edge_assert_for_1() and register_edge_assert_for_2() into a
 single recursive function, register_edge_assert_for_1().  In doing so,
 code duplication can be reduced and at the same time the more general
 logic allows VRP to detect more useful edge assertions.

 The recursion of the function register_edge_assert_for_1() is bounded by
 a new 'limit' argument which is arbitrarily set to 4 so that at most 4
 levels of SSA-name definitions will be traversed per conditional.
 (Incidentally this hard recursion limit makes the related fix for PR
 57685 unnecessary.)
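
The depth-limited traversal described in the quoted paragraph above can be pictured with a toy model. This is only a sketch under stated assumptions: `toy_name` and `toy_walk_defs` are hypothetical stand-ins, each value has at most one operand to follow, and nothing here reflects the actual data structures in tree-vrp.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for an SSA name: each value optionally points at
   the value its defining statement was computed from.  */
struct toy_name
{
  const struct toy_name *def_operand;
};

/* Count how many names are visited starting at NAME when at most LIMIT
   levels of definitions are followed.  The limit bounds the recursion
   depth regardless of how long the definition chain is.  */
static unsigned
toy_walk_defs (const struct toy_name *name, unsigned limit)
{
  if (name == NULL || limit == 0)
    return 0;
  return 1 + toy_walk_defs (name->def_operand, limit - 1);
}
```

With a limit of 4, a chain of six definitions is cut off after four visits, while a shorter chain is walked to its end.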

 A test in uninit-pred-9_b.c now has to be marked xfail because in it VRP
 (correctly) transforms the statement

   # prephitmp_35 = PHI <pretmp_9(8), _28(10)>
   into
   # prephitmp_35 = PHI <pretmp_9(8), 1(10)>

 and the uninit pass doesn't properly handle such PHIs containing a
 constant value as one of its arguments -- so a bogus uninit warning is
 now emitted.

Did you try fixing that?  It seems to me a constant should be easy
to handle?

 Full bootstrap + regtesting on x86_64-unknown-linux-gnu is in progress.
 Is it OK to commit if testing finishes with no new regressions?

Ok.

Thanks,
Richard.

 2014-11-11  Patrick Palka  patr...@parcs.ath.cx

 gcc/
 * tree-vrp.c (extract_code_and_val_from_cond_with_ops): Ensure
 that NAME always equals COND_OP0 or COND_OP1.
 (register_edge_assert_for, register_edge_assert_for_1,
 register_edge_assert_for_2): Refactor and consolidate
 edge-assertion logic into ...
 (register_edge_assert_for_2): ... here.  Add LIMIT parameter.
 Rename to ...
 (register_edge_assert_for_1): ... this.

 gcc/testsuite/
 * gcc.dg/vrp-1.c: New testcase.
 * gcc.dg/vrp-2.c: New testcase.
 * gcc.dg/uninit-pred-9_b.c: xfail test on line 24.
 ---
  gcc/testsuite/gcc.dg/uninit-pred-9_b.c |   2 +-
  gcc/testsuite/gcc.dg/vrp-1.c   |  31 
  gcc/testsuite/gcc.dg/vrp-2.c   |  78 ++
  gcc/tree-vrp.c | 261 
 +++--
  4 files changed, 231 insertions(+), 141 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/vrp-1.c
  create mode 100644 gcc/testsuite/gcc.dg/vrp-2.c

 diff --git a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c 
 b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
 index d9ae75e..555ec20 100644
 --- a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
 +++ b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
 @@ -21,7 +21,7 @@ int foo (int n, int l, int m, int r)
blah(v); /* { dg-bogus uninitialized bogus warning } */

if ( (n = 8)   (m  99)   (r  19) )
 -  blah(v); /* { dg-bogus uninitialized bogus warning } */
 +  blah(v); /* { dg-bogus uninitialized bogus warning { xfail *-*-* } 
 } */

return 0;
  }
 diff --git a/gcc/testsuite/gcc.dg/vrp-1.c b/gcc/testsuite/gcc.dg/vrp-1.c
 new file mode 100644
 index 000..df5334e
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/vrp-1.c
 @@ -0,0 +1,31 @@
 +/* { dg-options "-O2" } */
 +
 +void runtime_error (void) __attribute__ ((noreturn));
 +void compiletime_error (void) __attribute__ ((noreturn, error ("")));
 +
 +static void
 +compiletime_check_equals_1 (int *x, int y)
 +{
 +  int __p = *x != y;
 +  if (__builtin_constant_p (__p) && __p)
 +    compiletime_error ();
 +  if (__p)
 +    runtime_error ();
 +}
 +
 +static void
 +compiletime_check_equals_2 (int *x, int y)
 +{
 +  int __p = *x != y;
 +  if (__builtin_constant_p (__p) && __p)
 +    compiletime_error (); /* { dg-error "call to" } */
 +  if (__p)
 +    runtime_error ();
 +}
 +
 +void
 +foo (int *x)
 +{
 +  compiletime_check_equals_1 (x, 5);
 +  compiletime_check_equals_2 (x, 10);
 +}
 diff --git a/gcc/testsuite/gcc.dg/vrp-2.c b/gcc/testsuite/gcc.dg/vrp-2.c
 new file mode 100644
 index 000..5757c2f
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/vrp-2.c
 @@ -0,0 +1,78 @@
 +/* { dg-options -O2 } */
 +
 +void runtime_error (void) __attribute__ ((noreturn));
 +void compiletime_error (void) __attribute__ ((noreturn, error ()));
 +
 +void dummy (int x);
 +
 +void
 +bar (int x, int y, int z)
 +{
 +  int p = ~(x  y  z) == 37;
 +  if (p)
 +{
 +  if (!x || !y || !z)
 +   compiletime_error (); /* { 

Re: [PATCH] c++ify sreal

2014-11-11 Thread Richard Biener
On Tue, Nov 11, 2014 at 1:08 PM, Andrew Pinski pins...@gmail.com wrote:
 On Tue, Nov 11, 2014 at 1:23 AM, Marc Glisse marc.gli...@inria.fr wrote:
 On Tue, 11 Nov 2014, Jakub Jelinek wrote:

 On Tue, Nov 11, 2014 at 08:51:41AM +0100, Uros Bizjak wrote:

 Hello!

 do $subject, and cleanup for always 64 bit hwi.


 bootstrapped + regtested x86_64-unknown-linux-gnu, ok?


 Ok.  Can you please replace remaining HOST_WIDE_INT
 vestiges in there with [u]int64_t please?



 This patch breaks the build on debian 6.0:

 ../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
 ../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope


 Index: system.h
 ===
 --- system.h(revision 217338)
 +++ system.h(working copy)
 @@ -27,6 +27,7 @@
 event inttypes.h gets pulled in by another header it is already
 defined.  */
  #define __STDC_FORMAT_MACROS
 +#define __STDC_LIMIT_MACROS

  /* We must include stdarg.h before stdio.h.  */
  #include <stdarg.h>


 Still, I don't believe it will be portable everywhere.
 Can't you use
 INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX?  We already use that
 in GCC...


 We could also start using the standard C++ mechanism (numeric_limits).

 Except int64_t does not have to be defined for a C++ implementation.

Also not through stdint.h / cstdint?  Note that we should only care
for what happens in practice here.  I hope that at least for more recent
standards than C++04 (which is what we require IIRC) they are on
parity with C99.

Richard.

 Thanks,
 Andrew


 (nothing wrong with INTTYPE_MAXIMUM, just an alternative)

 --
 Marc Glisse


Re: [PATCH 2/2] Simplify and extend VRP edge-assertion code

2014-11-11 Thread Andrew Pinski
On Tue, Nov 11, 2014 at 4:52 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Tue, Nov 11, 2014 at 4:52 AM, Patrick Palka patr...@parcs.ath.cx wrote:
 This patch refactors the VRP edge-assertion code to make it always
 traverse SSA-name definitions in order to find suitable edge assertions
 to insert.  Currently SSA-name definitions get traversed only when the
 LHS of the original conditional is a bitwise AND or OR operation which
 seems like a strange restriction.  We should always try to traverse
 the SSA-name definitions inside the conditional, in particular for
 conditionals with the form:

   int p = x COMP y;
   if (p != 0) -- edge assertion: x COMP y

 Of course this specific case should have been simplified to

  if (x COMP y)

 if that comparison cannot trap and -fnon-call-exceptions is in effect.

Except I have found that if p was used below also. We still have if(p
!= 0).  I just saw that recently when I was working on enhancing
PHI-opt.

Thanks,
Andrew Pinski


 To achieve this the patch merges the mutually recursive functions
 register_edge_assert_for_1() and register_edge_assert_for_2() into a
 single recursive function, register_edge_assert_for_1().  In doing so,
 code duplication can be reduced and at the same time the more general
 logic allows VRP to detect more useful edge assertions.

 The recursion of the function register_edge_assert_for_1() is bounded by
 a new 'limit' argument which is arbitrarily set to 4 so that at most 4
 levels of SSA-name definitions will be traversed per conditional.
 (Incidentally this hard recursion limit makes the related fix for PR
 57685 unnecessary.)

 A test in uninit-pred-9_b.c now has to be marked xfail because in it VRP
 (correctly) transforms the statement

   # prephitmp_35 = PHI <pretmp_9(8), _28(10)>
   into
   # prephitmp_35 = PHI <pretmp_9(8), 1(10)>

 and the uninit pass doesn't properly handle such PHIs containing a
 constant value as one of its arguments -- so a bogus uninit warning is
 now emitted.

 Did you try fixing that?  It seems to me a constant should be easy
 to handle?

 Full bootstrap + regtesting on x86_64-unknown-linux-gnu is in progress.
 Is it OK to commit if testing finishes with no new regressions?

 Ok.

 Thanks,
 Richard.

 2014-11-11  Patrick Palka  patr...@parcs.ath.cx

 gcc/
 * tree-vrp.c (extract_code_and_val_from_cond_with_ops): Ensure
 that NAME always equals COND_OP0 or COND_OP1.
 (register_edge_assert_for, register_edge_assert_for_1,
 register_edge_assert_for_2): Refactor and consolidate
 edge-assertion logic into ...
 (register_edge_assert_for_2): ... here.  Add LIMIT parameter.
 Rename to ...
 (register_edge_assert_for_1): ... this.

 gcc/testsuite/
 * gcc.dg/vrp-1.c: New testcase.
 * gcc.dg/vrp-2.c: New testcase.
 * gcc.dg/uninit-pred-9_b.c: xfail test on line 24.
 ---
  gcc/testsuite/gcc.dg/uninit-pred-9_b.c |   2 +-
  gcc/testsuite/gcc.dg/vrp-1.c   |  31 
  gcc/testsuite/gcc.dg/vrp-2.c   |  78 ++
  gcc/tree-vrp.c | 261 
 +++--
  4 files changed, 231 insertions(+), 141 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/vrp-1.c
  create mode 100644 gcc/testsuite/gcc.dg/vrp-2.c

 diff --git a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c 
 b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
 index d9ae75e..555ec20 100644
 --- a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
 +++ b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
 @@ -21,7 +21,7 @@ int foo (int n, int l, int m, int r)
blah(v); /* { dg-bogus uninitialized bogus warning } */

if ( (n = 8)   (m  99)   (r  19) )
 -  blah(v); /* { dg-bogus uninitialized bogus warning } */
 +  blah(v); /* { dg-bogus uninitialized bogus warning { xfail *-*-* 
 } } */

return 0;
  }
 diff --git a/gcc/testsuite/gcc.dg/vrp-1.c b/gcc/testsuite/gcc.dg/vrp-1.c
 new file mode 100644
 index 000..df5334e
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/vrp-1.c
 @@ -0,0 +1,31 @@
 +/* { dg-options "-O2" } */
 +
 +void runtime_error (void) __attribute__ ((noreturn));
 +void compiletime_error (void) __attribute__ ((noreturn, error ("")));
 +
 +static void
 +compiletime_check_equals_1 (int *x, int y)
 +{
 +  int __p = *x != y;
 +  if (__builtin_constant_p (__p) && __p)
 +    compiletime_error ();
 +  if (__p)
 +    runtime_error ();
 +}
 +
 +static void
 +compiletime_check_equals_2 (int *x, int y)
 +{
 +  int __p = *x != y;
 +  if (__builtin_constant_p (__p) && __p)
 +    compiletime_error (); /* { dg-error "call to" } */
 +  if (__p)
 +    runtime_error ();
 +}
 +
 +void
 +foo (int *x)
 +{
 +  compiletime_check_equals_1 (x, 5);
 +  compiletime_check_equals_2 (x, 10);
 +}
 diff --git a/gcc/testsuite/gcc.dg/vrp-2.c b/gcc/testsuite/gcc.dg/vrp-2.c
 new file mode 100644
 index 000..5757c2f
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/vrp-2.c
 @@ -0,0 +1,78 @@
 +/* { dg-options -O2 } */
 +
 +void runtime_error (void) __attribute__ 

Re: [PATCH] c++ify sreal

2014-11-11 Thread Andrew Pinski
On Tue, Nov 11, 2014 at 4:54 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Tue, Nov 11, 2014 at 1:08 PM, Andrew Pinski pins...@gmail.com wrote:
 On Tue, Nov 11, 2014 at 1:23 AM, Marc Glisse marc.gli...@inria.fr wrote:
 On Tue, 11 Nov 2014, Jakub Jelinek wrote:

 On Tue, Nov 11, 2014 at 08:51:41AM +0100, Uros Bizjak wrote:

 Hello!

 do $subject, and cleanup for always 64 bit hwi.


 bootstrapped + regtested x86_64-unknown-linux-gnu, ok?


 Ok.  Can you please replace remaining HOST_WIDE_INT
 vestiges in there with [u]int64_t please?



 This patch breaks the build on debian 6.0:

 ../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
 ../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope


 Index: system.h
 ===
 --- system.h(revision 217338)
 +++ system.h(working copy)
 @@ -27,6 +27,7 @@
 event inttypes.h gets pulled in by another header it is already
 defined.  */
  #define __STDC_FORMAT_MACROS
 +#define __STDC_LIMIT_MACROS

  /* We must include stdarg.h before stdio.h.  */
  #include <stdarg.h>


 Still, I don't believe it will be portable everywhere.
 Can't you use
 INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX?  We already use that
 in GCC...


 We could also start using the standard C++ mechanism (numeric_limits).

 Except int64_t does not have to be defined for a C++ implementation.

 Also not through stdint.h / cstdint?  Note that we should only care
 for what happens in practice here.  I hope that at least for more recent
 standards than C++04 (which is what we require IIRC) they are on
 parity with C99.


C++03 did not add long long, only C++11 did.

Thanks,
Andrew Pinski


 Richard.

 Thanks,
 Andrew


 (nothing wrong with INTTYPE_MAXIMUM, just an alternative)

 --
 Marc Glisse


Re: [C PATCH] warn for empty struct -Wc++-compat

2014-11-11 Thread Joseph Myers
On Tue, 11 Nov 2014, Marek Polacek wrote:

  +  if (fieldlist == NULL_TREE)
  +{
  +  warning_at (record_loc, OPT_Wc___compat,
  +   "empty %s has size 0 in C, 1 in C++",
  +   (struct_parse_info->code == RECORD_TYPE) ? "struct" : "union");
  +}
  +
  
  I think this won't work well wrt translations, so you need to have
  an if here.  See the pedwarns at the beginning of finish_struct.
  
  Do keywords like struct/union really require translation?
 
 C keywords don't require translation, but you always need to have
 complete sentences in diagnostics so I better pointed it out.  Joseph
 would know better than me though.

The situation where the above code would cause problems for translation is 
a language where "struct" and "union" have different grammatical gender 
and the translation of some other bit of the sentence ("empty" or "has") 
needs to agree with that gender.
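
A minimal sketch of the alternative, using one complete sentence per case so that each can be translated independently. The names `toy_aggregate_kind` and `toy_empty_aggregate_msgid` are hypothetical and this is not the code that was committed; it only illustrates the "have an if" suggestion quoted above.

```c
#include <assert.h>

/* Hypothetical stand-in for the RECORD_TYPE / UNION_TYPE distinction.  */
enum toy_aggregate_kind { TOY_STRUCT, TOY_UNION };

/* Return a complete, separately translatable sentence for each kind
   instead of splicing a keyword into a shared "%s" template.  */
static const char *
toy_empty_aggregate_msgid (enum toy_aggregate_kind kind)
{
  switch (kind)
    {
    case TOY_STRUCT:
      return "empty struct has size 0 in C, 1 in C++";
    case TOY_UNION:
      return "empty union has size 0 in C, 1 in C++";
    }
  assert (!"unknown aggregate kind");
  return "";
}
```

Because each string is a full sentence, a translator can adjust every word to agree with the keyword's gender in the target language.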

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH][SPARC] default with_cpu to ultrasparc in sparc64-*-linux* targets

2014-11-11 Thread Jose E. Marchesi

Hi.

If no --with-cpu is specified at configure time gcc/config.gcc sets the
cpu option in configure_default_options to `v9' in sparc64 targets.
This leads to the usage of the following spec by the driver:

%{!m32:%{!mcpu=*:-mcpu=v9}}

Which in turn triggers the usage of -Av9 by default when invoking the
assembler.  This leads to failures when VIS instructions are used in
inline assembly or .s files:

[jemarch@install2 gcc]$ echo 'int main () { asm ("fzero %f0"); return 0; }' | gcc -xc -
/tmp/cc1F9iJm.s: Assembler messages:
/tmp/cc1F9iJm.s:11: Error: Architecture mismatch on fzero.
/tmp/cc1F9iJm.s:11:  (Requires v9a|v9b; requested architecture is v9.)

This prevents building upstream glibc with a gcc configured with no
--with-cpu option, for example.

I think it would be reasonable to have gcc targeting ultrasparc
extensions by default in sparc64-*-linux*.  WDYT?

Thanks.

2014-11-11  Jose E. Marchesi  jose.march...@oracle.com

* config.gcc: Use ultrasparc as the default with_cpu option in
sparc64-*-linux* targets.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 217346)
+++ gcc/config.gcc  (working copy)
@@ -2709,6 +2709,7 @@
 	tm_file="sparc/biarch64.h ${tm_file} dbxelf.h elfos.h sparc/sysv4.h gnu-user.h linux.h glibc-stdint.h sparc/default-64.h sparc/linux64.h sparc/tso.h"
 	extra_options="${extra_options} sparc/long-double-switch.opt"
 	tmake_file="${tmake_file} sparc/t-sparc sparc/t-linux64"
+	test x$with_cpu != x || with_cpu=ultrasparc
 	;;
 sparc64-*-freebsd*|ultrasparc-*-freebsd*)
 	tm_file="${tm_file} ${fbsd_tm_file} dbxelf.h elfos.h sparc/sysv4.h sparc/freebsd.h"


Re: [PATCH 2/2] Simplify and extend VRP edge-assertion code

2014-11-11 Thread Richard Biener
On Tue, Nov 11, 2014 at 1:56 PM, Andrew Pinski pins...@gmail.com wrote:
 On Tue, Nov 11, 2014 at 4:52 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Tue, Nov 11, 2014 at 4:52 AM, Patrick Palka patr...@parcs.ath.cx wrote:
 This patch refactors the VRP edge-assertion code to make it always
 traverse SSA-name definitions in order to find suitable edge assertions
 to insert.  Currently SSA-name definitions get traversed only when the
 LHS of the original conditional is a bitwise AND or OR operation which
 seems like a strange restriction.  We should always try to traverse
 the SSA-name definitions inside the conditional, in particular for
 conditionals with the form:

   int p = x COMP y;
   if (p != 0) -- edge assertion: x COMP y

 Of course this specific case should have been simplified to

  if (x COMP y)

 if that comparison cannot trap and -fnon-call-exceptions is in effect.

 Except I have found that if p was used below also. We still have if(p
 != 0).  I just saw that recently when I was working on enhancing
 PHI-opt.

Yeah - one of forwprop's single-use restrictions.  Definitely one
we don't want to preserve though.

Richard.

 Thanks,
 Andrew Pinski


 To achieve this the patch merges the mutually recursive functions
 register_edge_assert_for_1() and register_edge_assert_for_2() into a
 single recursive function, register_edge_assert_for_1().  In doing so,
 code duplication can be reduced and at the same time the more general
 logic allows VRP to detect more useful edge assertions.

 The recursion of the function register_edge_assert_for_1() is bounded by
 a new 'limit' argument which is arbitrarily set to 4 so that at most 4
 levels of SSA-name definitions will be traversed per conditional.
 (Incidentally this hard recursion limit makes the related fix for PR
 57685 unnecessary.)

 A test in uninit-pred-9_b.c now has to be marked xfail because in it VRP
 (correctly) transforms the statement

   # prephitmp_35 = PHI <pretmp_9(8), _28(10)>
   into
   # prephitmp_35 = PHI <pretmp_9(8), 1(10)>

 and the uninit pass doesn't properly handle such PHIs containing a
 constant value as one of its arguments -- so a bogus uninit warning is
 now emitted.

 Did you try fixing that?  It seems to me a constant should be easy
 to handle?

 Full bootstrap + regtesting on x86_64-unknown-linux-gnu is in progress.
 Is it OK to commit if testing finishes with no new regressions?

 Ok.

 Thanks,
 Richard.

 2014-11-11  Patrick Palka  patr...@parcs.ath.cx

 gcc/
 * tree-vrp.c (extract_code_and_val_from_cond_with_ops): Ensure
 that NAME always equals COND_OP0 or COND_OP1.
 (register_edge_assert_for, register_edge_assert_for_1,
 register_edge_assert_for_2): Refactor and consolidate
 edge-assertion logic into ...
 (register_edge_assert_for_2): ... here.  Add LIMIT parameter.
 Rename to ...
 (register_edge_assert_for_1): ... this.

 gcc/testsuite/
 * gcc.dg/vrp-1.c: New testcase.
 * gcc.dg/vrp-2.c: New testcase.
 * gcc.dg/uninit-pred-9_b.c: xfail test on line 24.
 ---
  gcc/testsuite/gcc.dg/uninit-pred-9_b.c |   2 +-
  gcc/testsuite/gcc.dg/vrp-1.c   |  31 
  gcc/testsuite/gcc.dg/vrp-2.c   |  78 ++
  gcc/tree-vrp.c | 261 
 +++--
  4 files changed, 231 insertions(+), 141 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/vrp-1.c
  create mode 100644 gcc/testsuite/gcc.dg/vrp-2.c

 diff --git a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c 
 b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
 index d9ae75e..555ec20 100644
 --- a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
 +++ b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
 @@ -21,7 +21,7 @@ int foo (int n, int l, int m, int r)
blah(v); /* { dg-bogus uninitialized bogus warning } */

if ( (n = 8)   (m  99)   (r  19) )
 -  blah(v); /* { dg-bogus uninitialized bogus warning } */
 +  blah(v); /* { dg-bogus uninitialized bogus warning { xfail *-*-* 
 } } */

return 0;
  }
 diff --git a/gcc/testsuite/gcc.dg/vrp-1.c b/gcc/testsuite/gcc.dg/vrp-1.c
 new file mode 100644
 index 000..df5334e
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/vrp-1.c
 @@ -0,0 +1,31 @@
 +/* { dg-options "-O2" } */
 +
 +void runtime_error (void) __attribute__ ((noreturn));
 +void compiletime_error (void) __attribute__ ((noreturn, error ("")));
 +
 +static void
 +compiletime_check_equals_1 (int *x, int y)
 +{
 +  int __p = *x != y;
 +  if (__builtin_constant_p (__p) && __p)
 +    compiletime_error ();
 +  if (__p)
 +    runtime_error ();
 +}
 +
 +static void
 +compiletime_check_equals_2 (int *x, int y)
 +{
 +  int __p = *x != y;
 +  if (__builtin_constant_p (__p) && __p)
 +    compiletime_error (); /* { dg-error "call to" } */
 +  if (__p)
 +    runtime_error ();
 +}
 +
 +void
 +foo (int *x)
 +{
 +  compiletime_check_equals_1 (x, 5);
 +  compiletime_check_equals_2 (x, 10);
 +}
 diff --git a/gcc/testsuite/gcc.dg/vrp-2.c b/gcc/testsuite/gcc.dg/vrp-2.c
 

[Build, patch] Remove CLooG from the main configure.ac

2014-11-11 Thread Tobias Burnus
Now that CLooG is no longer used by GCC, it makes sense to also remove it from
the main configure file, especially as the in-tree build currently only works
if CLooG is also available.

Built on x86-64-gnu-linux, and tested that Graphite still works.*
OK for the trunk?


[* I did see a failure for gcc.dg/graphite/vect-pr43423.c, but that seems to be
independent, as I also see it with yesterday's GCC; for Sparc/arm/aarch64,
it's PR62630.]


Tobias

2014-11-11  Tobias Burnus  bur...@net-b.de

* config/cloog.m4: Remove.
* Makefile.def: Remove CLooG.
* Makefile.tpl: Ditto.
* configure.ac: Ditto.
* configure: Regenerate.
* Makefile.in: Ditto.
 Makefile.def|8 ---
 Makefile.tpl|6 --
 config/cloog.m4 |  152 ---
 configure.ac|   47 ++---
 4 files changed, 6 insertions(+), 207 deletions(-)

diff --git a/Makefile.def b/Makefile.def
index dcbcd08..24dfb0b 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -66,11 +66,6 @@ host_modules= { module= isl; lib_path=.libs; bootstrap=true;
 		extra_configure_flags='--disable-shared @extra_isl_gmp_configure_flags@';
 		extra_make_flags='V=1';
 		no_install= true; };
-host_modules= { module= cloog; lib_path=.libs; bootstrap=true;
-		extra_configure_flags='--disable-shared --with-gmp=system --with-bits=gmp --with-isl=system';
-		extra_exports='CPPFLAGS=$(HOST_GMPINC) $(HOST_ISLINC) $$CPPFLAGS; export CPPFLAGS; LDFLAGS=-L$$r/$(HOST_SUBDIR)/gmp/.libs -L$$r/$(HOST_SUBDIR)/isl/.libs $$LDFLAGS; export LDFLAGS; ';
-		extra_make_flags='CPPFLAGS=$$CPPFLAGS LDFLAGS=$$LDFLAGS V=1';
-		no_install= true; };
 host_modules= { module= libelf; lib_path=.libs; bootstrap=true;
 		extra_configure_flags='--disable-shared';
 		no_install= true; };
@@ -319,7 +314,6 @@ dependencies = { module=all-gcc; on=all-libiberty; hard=true; };
 dependencies = { module=all-gcc; on=all-intl; };
 dependencies = { module=all-gcc; on=all-mpfr; };
 dependencies = { module=all-gcc; on=all-mpc; };
-dependencies = { module=all-gcc; on=all-cloog; };
 dependencies = { module=all-gcc; on=all-build-texinfo; };
 dependencies = { module=all-gcc; on=all-build-bison; };
 dependencies = { module=all-gcc; on=all-build-flex; };
@@ -365,8 +359,6 @@ dependencies = { module=all-utils; on=all-libiberty; };
 dependencies = { module=configure-mpfr; on=all-gmp; };
 dependencies = { module=configure-mpc; on=all-mpfr; };
 dependencies = { module=configure-isl; on=all-gmp; };
-dependencies = { module=configure-cloog; on=all-isl; };
-dependencies = { module=configure-cloog; on=all-gmp; };
 
 // Host modules specific to gdb.
 dependencies = { module=configure-gdb; on=all-intl; };
diff --git a/Makefile.tpl b/Makefile.tpl
index f7c7e38..884e02d 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -224,8 +224,6 @@ HOST_EXPORTS = \
 	GMPINC=$(HOST_GMPINC); export GMPINC; \
 	ISLLIBS=$(HOST_ISLLIBS); export ISLLIBS; \
 	ISLINC=$(HOST_ISLINC); export ISLINC; \
-	CLOOGLIBS=$(HOST_CLOOGLIBS); export CLOOGLIBS; \
-	CLOOGINC=$(HOST_CLOOGINC); export CLOOGINC; \
 	LIBELFLIBS=$(HOST_LIBELFLIBS) ; export LIBELFLIBS; \
 	LIBELFINC=$(HOST_LIBELFINC) ; export LIBELFINC; \
 @if gcc-bootstrap
@@ -318,10 +316,6 @@ HOST_GMPINC = @gmpinc@
 HOST_ISLLIBS = @isllibs@
 HOST_ISLINC = @islinc@
 
-# Where to find CLOOG
-HOST_CLOOGLIBS = @clooglibs@
-HOST_CLOOGINC = @clooginc@
-
 # Where to find libelf
 HOST_LIBELFLIBS = @libelflibs@
 HOST_LIBELFINC = @libelfinc@
diff --git a/config/cloog.m4 b/config/cloog.m4
deleted file mode 100644
index b80ac27..000
--- a/config/cloog.m4
+++ /dev/null
@@ -1,152 +0,0 @@
-# This file is part of GCC.
-#
-# GCC is free software; you can redistribute it and/or modify it under
-# the terms of the GNU General Public License as published by the Free
-# Software Foundation; either version 3, or (at your option) any later
-# version.
-#
-# GCC is distributed in the hope that it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
-# for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with GCC; see the file COPYING3.  If not see
-# http://www.gnu.org/licenses/.
-#
-# Contributed by Andreas Simbuerger simbu...@fim.uni-passau.de
-
-# CLOOG_INIT_FLAGS ()
-# -
-# Provide configure switches for CLooG support.
-# Initialize clooglibs/clooginc according to the user input.
-AC_DEFUN([CLOOG_INIT_FLAGS],
-[
-  AC_ARG_WITH([cloog-include],
-[AS_HELP_STRING(
-  [--with-cloog-include=PATH],
-  [Specify directory for installed CLooG include files])])
-  AC_ARG_WITH([cloog-lib],
-[AS_HELP_STRING(
-  [--with-cloog-lib=PATH],
-  [Specify the directory for the installed CLooG library])])
-
-  AC_ARG_ENABLE(cloog-version-check,
-[AS_HELP_STRING(
-  [--disable-cloog-version-check],
-  [disable check for CLooG 

Re: [PATCH] libstdc++ - Add xmethods for associative containers (ordered and unordered)

2014-11-11 Thread Siva Chandra
On Tue, Nov 11, 2014 at 5:03 AM, Siva Chandra sivachan...@google.com wrote:
 On Tue, Nov 11, 2014 at 3:38 AM, Jonathan Wakely jwak...@redhat.com wrote:
 On 10/11/14 21:49 +, Jonathan Wakely wrote:

 On 09/11/14 16:00 -0800, Siva Chandra wrote:

 Hello,

 Attached is a patch which adds xmethods for the associative containers
 (set, map, multiset and multimap) and their unordered versions. I
 think the GDB Python API is not rich enough to implement xmethods for
 the more interesting methods like find, count etc. The attached patch
 only implements xmethods for size and empty. That way, it is a fairly
 straightforward patch.


 This looks fine, I'll commit it soon. Thanks.


 Committed to trunk.

Thanks for the quick review and commit.

(Sorry for the premature Send earlier.)


Re: [Build, patch] Remove CLooG from the main configure.ac

2014-11-11 Thread Tobias Grosser

On 11.11.2014 14:01, Tobias Burnus wrote:

Now that CLooG is no longer used by GCC, it makes sense to also remove it from
the main configure file, especially as the in-tree build currently only works
if CLooG is also available.

Built on x86-64-gnu-linux, and tested that Graphite still works.*
OK for the trunk?


[* I did see a failure for gcc.dg/graphite/vect-pr43423.c, but that seems to be
independent, as I also see it with yesterday's GCC; for Sparc/arm/aarch64,
it's PR62630.]


Conceptually that is the right way to go. However, this requires the OK
from an autoconf maintainer.


Tobias


Re: [C++ Patch] PR 63265

2014-11-11 Thread Paolo Carlini

Hi,

On 11/10/2014 06:16 PM, Jason Merrill wrote:
I don't think we want to suppress this warning in general.  The 
problem in this PR is that the warning code is failing to recognize 
that the first operand is constant false.

Thanks. Then, shall we do something like the below? Passes testing.

Thanks,
Paolo.

//
/cp
2014-11-11  Paolo Carlini  paolo.carl...@oracle.com

PR c++/63265
* pt.c (tsubst_copy_and_build, case COND_EXPR): Maybe fold to
const the condition.

/testsuite
2014-11-11  Paolo Carlini  paolo.carl...@oracle.com

PR c++/63265
* g++.dg/cpp0x/constexpr-63265.C: New.
Index: cp/pt.c
===
--- cp/pt.c (revision 217342)
+++ cp/pt.c (working copy)
@@ -15137,7 +15137,9 @@ tsubst_copy_and_build (tree t,
 
 case COND_EXPR:
   {
-   tree cond = RECUR (TREE_OPERAND (t, 0));
+   tree cond
+ = maybe_constant_value (fold_non_dependent_expr_sfinae
+ (RECUR (TREE_OPERAND (t, 0)), tf_none));
tree exp1, exp2;
 
if (TREE_CODE (cond) == INTEGER_CST)
Index: testsuite/g++.dg/cpp0x/constexpr-63265.C
===
--- testsuite/g++.dg/cpp0x/constexpr-63265.C(revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-63265.C(working copy)
@@ -0,0 +1,19 @@
+// PR c++/63265
+// { dg-do compile { target c++11 } }
+
+#define LSHIFT (sizeof(unsigned int) * __CHAR_BIT__)
+
+template <int lshift>
+struct SpuriouslyWarns1 {
+    static constexpr unsigned int v = lshift < LSHIFT ? 1U << lshift : 0;
+};
+
+static_assert(SpuriouslyWarns1<LSHIFT>::v == 0, "Impossible occurred");
+
+template <int lshift>
+struct SpuriouslyWarns2 {
+    static constexpr bool okay = lshift < LSHIFT;
+    static constexpr unsigned int v = okay ? 1U << lshift : 0;
+};
+
+static_assert(SpuriouslyWarns2<LSHIFT>::v == 0, "Impossible occurred");


Re: [patch] OpenACC fortran front end

2014-11-11 Thread Ilya Verbin
Hi,

On 11 Nov 08:10, Jakub Jelinek wrote:
 For the middle-end and libgomp changes, can you talk to the Intel folks to
 update their git branch to latest trunk (so that you have the nvptx bits in
 there) and send middle-end and libgomp diffs against that?
 As far as I remember, most of the changes from the branch are now approved,
 they are just waiting for review of the LTO related changes in the
 middle-end (please, correct me if I've missed something).

The updated branch is here:
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/kyukhin/gomp4-offload

It contains 7 common patches.
Patches 2-4 are waiting for LTO review, the others are approved.

  -- Ilya


Re: [x86, 7/n] Replace builtins with vector extensions

2014-11-11 Thread Uros Bizjak
On Tue, Nov 11, 2014 at 12:35 PM, Marc Glisse marc.gli...@inria.fr wrote:

 last patch, extending == and < to size 256. Regtested as usual.

 Is the branch ready to be merged into trunk?

  2014-11-10  Marc Glisse  marc.gli...@inria.fr

 * config/i386/emmintrin.h (_mm_cmpeq_epi8, _mm_cmpeq_epi16,
 _mm_cmpeq_epi32, _mm_cmplt_epi8, _mm_cmplt_epi16, _mm_cmplt_epi32,
 _mm_cmpgt_epi8, _mm_cmpgt_epi16, _mm_cmpgt_epi32): Use vector
 extensions instead of builtins.
 * config/i386/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64):
 Likewise.

OK.

Please post the complete ChangeLog and complete patch for a review and merge.

Thanks,
Uros.


Re: [C++ Patch] PR 63265

2014-11-11 Thread Jason Merrill

On 11/11/2014 08:04 AM, Paolo Carlini wrote:

-   tree cond = RECUR (TREE_OPERAND (t, 0));
+   tree cond
+ = maybe_constant_value (fold_non_dependent_expr_sfinae
+ (RECUR (TREE_OPERAND (t, 0)), tf_none));


I like this approach, but if the result of maybe_constant_value doesn't 
turn out to be an INTEGER_CST, we want to end up with the result of 
RECUR rather than the result of fold_non_dependent_expr, as the latter 
might not be suitable for subsequent tsubsting.


Jason



[PATCH][17/n] Merge from match-and-simplify, plus/minus association patterns

2014-11-11 Thread Richard Biener

This merges patterns from associate_plusminus and adjusts them with
details from their fold-const.c counterparts.  It also fixes missing
flag_sanitize checks on negate contraction along the way.

This shows places where fold's STRIP_NOPS was important (but also
shows where it may create wrong code, something the patch doesn't fix
yet).  Without the conditional convert handling on the negate
contraction we regress quite a few GENERIC folding testcases.

Note that the other explicit reassociation patterns are handled
by fold's associate: piece, which I am sure we don't implement
fully with these few patterns (OTOH on GIMPLE we have a reassoc
pass for that anyway).  So not too many patterns were removed
from fold.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2014-11-11  Richard Biener  rguent...@suse.de

* match.pd: Implement patterns from associate_plusminus
and factor in differences from the fold-const.c implementation.
* fold-const.c (fold_binary_loc): Remove patterns here.
* tree-ssa-forwprop.c (associate_plusminus): Remove.
(pass_forwprop::execute): Don't call it.
* tree.c (tree_nop_conversion_p): New function, factored
from tree_nop_conversion.
* tree.h (tree_nop_conversion_p): Declare.
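
For readers who want to sanity-check the algebra behind the patterns being moved, here is a minimal sketch in plain C. It is not code from the patch: `identities_hold` is a hypothetical helper, and unsigned operands are assumed so that wraparound is well defined (the real patterns carry extra conditions for signed overflow, trapping types and sanitizers that this sketch ignores).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 if the rewritten forms agree with the
   original expressions for the given operands.  Unsigned arithmetic is
   used on purpose, so every operation wraps instead of overflowing.  */
int
identities_hold (unsigned int a, unsigned int b)
{
  return a + (-b) == a - b      /* A + (-B) -> A - B */
         && (-a) + b == b - a   /* (-A) + B -> B - A */
         && a - (-b) == a + b   /* A - (-B) -> A + B */
         && ~a + 1u == -a       /* ~A + 1 -> -A */
         && ~a + a == UINT_MAX; /* ~X + X -> -1, i.e. all ones */
}
```

Calling `identities_hold` with a few edge values such as 0 and `UINT_MAX` is a quick way to convince oneself that the rewrites are value-preserving in the wrapping case.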

Index: trunk/gcc/fold-const.c
===
*** trunk.orig/gcc/fold-const.c 2014-11-11 09:54:58.840824189 +0100
--- trunk/gcc/fold-const.c  2014-11-11 10:06:29.274793976 +0100
*** fold_binary_loc (location_t loc,
*** 9939,9997 
return NULL_TREE;
  
  case PLUS_EXPR:
-   /* A + (-B) -> A - B */
-   if (TREE_CODE (arg1) == NEGATE_EXPR
-  && (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0)
-   return fold_build2_loc (loc, MINUS_EXPR, type,
-   fold_convert_loc (loc, type, arg0),
-   fold_convert_loc (loc, type,
- TREE_OPERAND (arg1, 0)));
-   /* (-A) + B -> B - A */
-   if (TREE_CODE (arg0) == NEGATE_EXPR
-  && reorder_operands_p (TREE_OPERAND (arg0, 0), arg1)
-  && (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0)
-   return fold_build2_loc (loc, MINUS_EXPR, type,
-   fold_convert_loc (loc, type, arg1),
-   fold_convert_loc (loc, type,
- TREE_OPERAND (arg0, 0)));
- 
if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
{
- /* Convert ~A + 1 to -A.  */
- if (TREE_CODE (arg0) == BIT_NOT_EXPR
-  && integer_each_onep (arg1))
-   return fold_build1_loc (loc, NEGATE_EXPR, type,
-   fold_convert_loc (loc, type,
- TREE_OPERAND (arg0, 0)));
- 
- /* ~X + X is -1.  */
- if (TREE_CODE (arg0) == BIT_NOT_EXPR
-  && !TYPE_OVERFLOW_TRAPS (type))
-   {
- tree tem = TREE_OPERAND (arg0, 0);
- 
- STRIP_NOPS (tem);
- if (operand_equal_p (tem, arg1, 0))
-   {
- t1 = build_all_ones_cst (type);
- return omit_one_operand_loc (loc, type, t1, arg1);
-   }
-   }
- 
- /* X + ~X is -1.  */
- if (TREE_CODE (arg1) == BIT_NOT_EXPR
-  && !TYPE_OVERFLOW_TRAPS (type))
-   {
- tree tem = TREE_OPERAND (arg1, 0);
- 
- STRIP_NOPS (tem);
- if (operand_equal_p (arg0, tem, 0))
-   {
- t1 = build_all_ones_cst (type);
- return omit_one_operand_loc (loc, type, t1, arg0);
-   }
-   }
- 
  /* X + (X / CST) * -CST is X % CST.  */
  if (TREE_CODE (arg1) == MULT_EXPR
   TREE_CODE (TREE_OPERAND (arg1, 0)) == TRUNC_DIV_EXPR
--- 9939,9946 
*** fold_binary_loc (location_t loc,
*** 10469,10479 
return fold_build2_loc (loc, MINUS_EXPR, type, tmp, arg11);
}
}
-   /* A - (-B) -> A + B */
-   if (TREE_CODE (arg1) == NEGATE_EXPR)
-   return fold_build2_loc (loc, PLUS_EXPR, type, op0,
-   fold_convert_loc (loc, type,
- TREE_OPERAND (arg1, 0)));
/* (-A) - B -> (-B) - A  where B is easily negated and we can swap.  */
if (TREE_CODE (arg0) == NEGATE_EXPR
   && negate_expr_p (arg1)
--- 10418,10423 
Index: trunk/gcc/match.pd
===
*** trunk.orig/gcc/match.pd 2014-11-11 09:54:58.840824189 +0100
--- trunk/gcc/match.pd  2014-11-11 11:55:27.283507870 +0100
*** along with GCC; see the file COPYING3.
*** 25,32 
  
  /* Generic tree predicates we inherit.  */
  (define_predicates
!integer_onep integer_zerop integer_all_onesp
!real_zerop real_onep
 

Re: Add the latest C++ SD-6 additions.

2014-11-11 Thread Jason Merrill

On 11/11/2014 12:52 AM, Ed Smith-Rowland wrote:

I might put this to the SD-6 list because it would be nice to have
clarity - even if it's implementation defined.


Sounds good.  I was thinking that defining to 0 tells the user "this
isn't supported", which seems more useful than "this may or may not be
supported".


Jason



Re: [Build, patch] Remove CLooG from the main configure.ac

2014-11-11 Thread Richard Biener
On Tue, Nov 11, 2014 at 2:01 PM, Tobias Burnus
tobias.bur...@physik.fu-berlin.de wrote:
 Now that CLooG is no longer used by GCC, it makes sense to also remove it from
 the main configure file, especially as the in-tree build currently only works
 if CLooG is also available.

 Built on x86-64-gnu-linux, and tested that Graphite still works.*
 OK for the trunk?

Ok.

Thanks,
Richard.


 [* I did see a failure for gcc.dg/graphite/vect-pr43423.c, but that seems to be
 independent, as I also see it with yesterday's GCC; for Sparc/arm/aarch64,
 it's PR62630.]

Yeah, happens since quite some time for me as well.


 Tobias

 2014-11-11  Tobias Burnus  bur...@net-b.de

 * config/cloog.m4: Remove.
 * Makefile.def: Remove CLooG.
 * Makefile.tpl: Ditto.
 * configure.ac: Ditto.
 * configure: Regenerate.
 * Makefile.in: Ditto.


Re: [PATCH 2/2] Simplify and extend VRP edge-assertion code

2014-11-11 Thread Patrick Palka
On Tue, Nov 11, 2014 at 7:52 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Tue, Nov 11, 2014 at 4:52 AM, Patrick Palka patr...@parcs.ath.cx wrote:
 This patch refactors the VRP edge-assertion code to make it always
 traverse SSA-name definitions in order to find suitable edge assertions
 to insert.  Currently SSA-name definitions get traversed only when the
 LHS of the original conditional is a bitwise AND or OR operation which
 seems like a strange restriction.  We should always try to traverse
 the SSA-name definitions inside the conditional, in particular for
 conditionals with the form:

   int p = x COMP y;
   if (p != 0) -- edge assertion: x COMP y

 Of course this specific case should have been simplified to

  if (x COMP y)

 if that comparison cannot trap and -fnon-call-exceptions is in effect.

Like Andrew said, I noticed that if p is shared then such comparisons
don't get simplified.  And like in the case of uninit-pred-9_b.c it
seems that the compiler sometimes implicitly CSEs duplicate
conditionals.


 To achieve this the patch merges the mutually recursive functions
 register_edge_assert_for_1() and register_edge_assert_for_2() into a
 single recursive function, register_edge_assert_for_1().  In doing so,
 code duplication can be reduced and at the same time the more general
 logic allows VRP to detect more useful edge assertions.

 The recursion of the function register_edge_assert_for_1() is bounded by
 a new 'limit' argument which is arbitrarily set to 4 so that at most 4
 levels of SSA-name definitions will be traversed per conditional.
 (Incidentally this hard recursion limit makes the related fix for PR
 57685 unnecessary.)

 A test in uninit-pred-9_b.c now has to be marked xfail because in it VRP
 (correctly) transforms the statement

   # prephitmp_35 = PHI <pretmp_9(8), _28(10)>
   into
   # prephitmp_35 = PHI <pretmp_9(8), 1(10)>

 and the uninit pass doesn't properly handle such PHIs containing a
 constant value as one of its arguments -- so a bogus uninit warning is
 now emitted.

 Did you try fixing that?  It seems to me a constant should be easy
 to handle?

I tried a couple months ago and I failed.  I might try again.


 Full bootstrap + regtesting on x86_64-unknown-linux-gnu is in progress.
 Is it OK to commit if testing finishes with no new regressions?

 Ok.

I decided to replace this refactoring patch with a simpler patch (sent
to the ML) that just adds about 20 LOC to tree-vrp.c.  The
patch is not as extensive (a few of the tests in vrp-2.c still fail)
but I am more comfortable with the patch's correctness and its impact
on compile time.  Sorry, I should've been clearer about that.
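
The source-level shape discussed in this message can be sketched in plain C as below. This is only an illustration under stated assumptions, not code from either patch: `classify` and its return values are hypothetical. The point is that the comparison result `p` has two uses, so the comparison is "shared" rather than feeding a single condition, and a range pass has to look through the definition of `p` to learn `x < y` on the true edge.

```c
#include <assert.h>

/* Hypothetical helper: P is used twice, so the comparison cannot simply
   be folded into the one condition that tests it.  */
int
classify (int x, int y)
{
  int p = x < y;        /* p = x COMP y */
  int result = p * 10;  /* first use of p */

  if (p != 0)           /* second use; on this edge x < y holds */
    {
      /* With an edge assertion for x < y, the inner test is known to be
         true; this sketch only shows the shape, not what GCC emits.  */
      if (x < y)
        result += 1;
      else
        result += 2;
    }
  return result;
}
```

For any inputs the `result += 2` arm is dead, which is the kind of fact the edge assertion is meant to expose.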


Re: [PATCH] Look through widening type conversions for possible edge assertions

2014-11-11 Thread Richard Biener
On Tue, Nov 11, 2014 at 1:10 PM, Patrick Palka patr...@parcs.ath.cx wrote:
 This patch is a replacement for the 2nd VRP refactoring patch.  It
 simply teaches VRP to look through widening type conversions when
 finding suitable edge assertions, e.g.

 bool p = x != y;
 int q = (int) p;
 if (q == 0) // new edge assert: p == 0 and therefore x == y

I think the proper fix is to forward x != y to q == 0 instead of this one.
That said, the tree-ssa-forwprop.c restriction on only forwarding
single-uses into conditions is clearly bogus here.  I suggest relaxing
it for conversions and compares, like with

Index: tree-ssa-forwprop.c
===
--- tree-ssa-forwprop.c (revision 217349)
+++ tree-ssa-forwprop.c (working copy)
@@ -476,7 +476,7 @@ forward_propagate_into_comparison_1 (gim
{
  rhs0 = rhs_to_tree (TREE_TYPE (op1), def_stmt);
  tmp = combine_cond_expr_cond (stmt, code, type,
-   rhs0, op1, !single_use0_p);
+   rhs0, op1, false);
  if (tmp)
return tmp;
}


Thanks,
Richard.

 The new testcase requires that such an edge assertion be inserted.

 Full bootstrap + regtest on x86_64-unknown-linux-gnu in progress.  Does
 the patch look OK for trunk if no new regressions?

 2014-11-11  Patrick Palka  ppa...@gcc.gnu.org

 gcc/
 * tree-vrp.c (register_edge_assert_for): Look through
 widening type conversions for possible edge assertions.

 gcc/testsuite/
 * gcc.dg/vrp-1.c: New testcase.
 ---
  gcc/testsuite/gcc.dg/vrp-1.c | 31 +++
  gcc/tree-vrp.c   | 22 ++
  2 files changed, 53 insertions(+)
  create mode 100644 gcc/testsuite/gcc.dg/vrp-1.c

 diff --git a/gcc/testsuite/gcc.dg/vrp-1.c b/gcc/testsuite/gcc.dg/vrp-1.c
 new file mode 100644
 index 000..df5334e
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/vrp-1.c
 @@ -0,0 +1,31 @@
 +/* { dg-options "-O2" } */
 +
 +void runtime_error (void) __attribute__ ((noreturn));
 +void compiletime_error (void) __attribute__ ((noreturn, error ("")));
 +
 +static void
 +compiletime_check_equals_1 (int *x, int y)
 +{
 +  int __p = *x != y;
 +  if (__builtin_constant_p (__p) && __p)
 +    compiletime_error ();
 +  if (__p)
 +    runtime_error ();
 +}
 +
 +static void
 +compiletime_check_equals_2 (int *x, int y)
 +{
 +  int __p = *x != y;
 +  if (__builtin_constant_p (__p) && __p)
 +    compiletime_error (); /* { dg-error "call to" } */
 +  if (__p)
 +    runtime_error ();
 +}
 +
 +void
 +foo (int *x)
 +{
 +  compiletime_check_equals_1 (x, 5);
 +  compiletime_check_equals_2 (x, 10);
 +}
 diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
 index f0a4382..979ab44 100644
 --- a/gcc/tree-vrp.c
 +++ b/gcc/tree-vrp.c
 @@ -5634,6 +5634,7 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
   the value zero or one, then we may be able to assert values
   for SSA_NAMEs which flow into COND.  */

 +
/* In the case of NAME == 1 or NAME != 0, for BIT_AND_EXPR defining
   statement of NAME we can assert both operands of the BIT_AND_EXPR
   have nonzero value.  */
 @@ -5673,6 +5674,27 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
   register_edge_assert_for_1 (op1, EQ_EXPR, e, si);
 }
  }
 +
 +  /* In the case of NAME != 0 or NAME == 0, if NAME's defining statement
 + is a widening type conversion then we can assert that NAME's
 + RHS is accordingly nonzero or zero.  */
 +  if ((comp_code == EQ_EXPR || comp_code == NE_EXPR)
 +      && integer_zerop (val))
 +{
 +  gimple def_stmt = SSA_NAME_DEF_STMT (name);
 +  if (is_gimple_assign (def_stmt))
 +   {
 + enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
 + if (CONVERT_EXPR_CODE_P (def_code))
 +   {
 + tree lhs = gimple_assign_lhs (def_stmt);
 + tree rhs = gimple_assign_rhs1 (def_stmt);
 + if (TYPE_PRECISION (TREE_TYPE (lhs))
 + >= TYPE_PRECISION (TREE_TYPE (rhs)))
 +   register_edge_assert_for_1 (rhs, comp_code, e, si);
 +   }
 +   }
 +}
  }


 --
 2.2.0.rc1.16.g6066a7e
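
As a rough source-level illustration of the widening-conversion case quoted above, consider the sketch below. It is not taken from the patch or its testcase; `same_after_widening` is a hypothetical name. Because converting a `bool` to `int` preserves the value, `q == 0` implies `p == 0`, which in turn implies `x == y`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the quoted three-line snippet.  */
int
same_after_widening (int x, int y)
{
  bool p = x != y;   /* narrow comparison result */
  int q = (int) p;   /* widening conversion: the value is preserved */

  if (q == 0)
    /* q == 0 implies p == 0, hence x == y on this path.  */
    return x == y;   /* always 1 here */
  return 0;
}
```

Whether the fact is derived by an edge assertion in VRP or by forwarding the comparison into the condition, the observable behavior of such code is the same; only the optimizer's ability to fold the inner test differs.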



Pending LTO review for OpenACC trunk-merge patches (was: Re: [patch] OpenACC fortran front end)

2014-11-11 Thread Tobias Burnus
Ilya Verbin wrote:
  On 11 Nov 08:10, Jakub Jelinek wrote:
   For the middle-end and libgomp changes, can you talk to the Intel folks to
   update their git branch to latest trunk (so that you have the nvptx bits in
   there) and send middle-end and libgomp diffs against that?
   As far as I remember, most of the changes from the branch are now approved,
   they are just waiting for review of the LTO related changes in the
   middle-end (please, correct me if I've missed something).
  
  The updated branch is here:
  
 https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/kyukhin/gomp4-offload
  
  It contains 7 common patches.
  Patches 2-4 are waiting for LTO review, the others are approved.

Those are:

* [PATCH 2] OpenMP 4.0 offloading infrastructure: LTO streaming
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=5d1dfefd7cd529998968751a46f4daf87d8300a1

Which is identical except for re-diffing to:
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00299.html


* [PATCH 3] OpenMP 4.0 offloading infrastructure: Offload tables
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=06ffd7482ef4bf2b038a3a0d203b7bec586c6d17

Which has been posted at 
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00308.html


* [PATCH 4] OpenMP 4.0 offloading infrastructure: lto-wrapper
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=41d2ad0d52fb7c6cc78f6ee4fbec7781fa226c70

See https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01535.html
or rather: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01531.html


Tobias


[PATCH 2/5] OpenACC 2.0 support for libgomp - temporarily work around missing __builtin_acc_on_device (repost)

2014-11-11 Thread Julian Brown
On Tue, 23 Sep 2014 19:19:55 +0100
Julian Brown jul...@codesourcery.com wrote:

 The patches implementing __builtin_acc_on_device are still in
 processing. For the time being this patch removes the dependency on
 that builtin in the OpenACC runtime.
 
 Julian
 
 -xx-xx  Julian Brown  jul...@codesourcery.com
 
   libgomp/
   * oacc-init.c (acc_on_device): Temporarily hard-code for host
   instead of using __builtin_acc_on_device.

This patch remains unchanged from the last posting.

OK to apply?

Julian

From 99e76023ff0759925403b43e19612fb859c3759e Mon Sep 17 00:00:00 2001
From: Julian Brown jul...@codesourcery.com
Date: Fri, 19 Sep 2014 11:28:11 -0700
Subject: [PATCH 2/5] Work around lack of __builtin_acc_on_device for now

-xx-xx  Julian Brown  jul...@codesourcery.com

libgomp/
* oacc-init.c (acc_on_device): Temporarily hard-code for host
instead of using __builtin_acc_on_device.
---
 libgomp/oacc-init.c |   12 
 1 file changed, 12 insertions(+)

diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 8c91ea7..1cbb4d7 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -545,8 +545,20 @@ acc_on_device (acc_device_t dev)
       && acc_device_type (thr->dev->type) == acc_device_host_nonshm)
 return dev == acc_device_host_nonshm || dev == acc_device_not_host;
 
+#if 1
+  /* Support for __builtin_acc_on_device comes in later patches.  */
+  switch (dev)
+{
+case acc_device_none:
+case acc_device_host:
+  return 1;
+default:
+  return 0;
+}
+#else
   /* Just rely on the compiler builtin.  */
   return __builtin_acc_on_device (dev);
+#endif
 }
 ialias (acc_on_device)
 
-- 
1.7.10.4
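
The host-only fallback in the patch above can be tried in isolation with a sketch like the following. The `sketch_` names are hypothetical stand-ins: the real `acc_device_t` enumerators and their values come from libgomp's `openacc.h`, and the real function first handles the non-shared-memory host case shown in the hunk context.

```c
#include <assert.h>

/* Hypothetical stand-in for acc_device_t; names and values here are
   assumptions made for this sketch only.  */
typedef enum
{
  sketch_device_none,
  sketch_device_host,
  sketch_device_not_host
} sketch_device_t;

/* Host-only fallback: answer "yes" only for the none and host kinds,
   as the temporary switch in the patch does.  */
int
sketch_on_device (sketch_device_t dev)
{
  switch (dev)
    {
    case sketch_device_none:
    case sketch_device_host:
      return 1;
    default:
      return 0;
    }
}
```

This makes the consequence of the workaround visible: until the builtin is available, code compiled for an accelerator would also get the host answer from this path.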



[PATCH 3/5] OpenACC 2.0 support for libgomp - outline documentation (repost)

2014-11-11 Thread Julian Brown
On Tue, 23 Sep 2014 19:20:14 +0100
Julian Brown jul...@codesourcery.com wrote:

 This patch provides some documentation for the new OpenACC bits in
 libgomp.
 
 Julian
 
 -xx-xx  Thomas Schwinge  tho...@codesourcery.com
   James Norris  jnor...@codesourcery.com
 
   libgomp/
   * libgomp.texi: Outline documentation for OpenACC.

This patch also remains unchanged from the last posting.

OK to apply?

Julian

From 1f17beb70b5607d1884fad1cb4734857f0e7846f Mon Sep 17 00:00:00 2001
From: Julian Brown jul...@codesourcery.com
Date: Mon, 22 Sep 2014 02:45:29 -0700
Subject: [PATCH 3/5] OpenACC documentation.

-xx-xx  Thomas Schwinge  tho...@codesourcery.com
	James Norris  jnor...@codesourcery.com

libgomp/
* libgomp.texi: Outline documentation for OpenACC.
---
 libgomp/libgomp.texi |  661 --
 1 file changed, 636 insertions(+), 25 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 254be57..9530a2b 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -31,10 +31,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 @ifinfo
 @dircategory GNU Libraries
 @direntry
-* libgomp: (libgomp).GNU OpenMP runtime library
+* libgomp: (libgomp).GNU OpenACC and OpenMP runtime library
 @end direntry
 
-This manual documents the GNU implementation of the OpenMP API for 
+This manual documents the GNU implementation of the OpenACC API for 
+offloading of code to accelerator devices in C/C++ and Fortran and
+the GNU implementation of the OpenMP API for 
 multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
 Published by the Free Software Foundation
@@ -48,7 +50,7 @@ Boston, MA 02110-1301 USA
 @setchapternewpage odd
 
 @titlepage
-@title The GNU OpenMP Implementation
+@title The GNU OpenACC and OpenMP Implementation
 @page
 @vskip 0pt plus 1filll
 @comment For the @value{version-GCC} Version*
@@ -69,7 +71,10 @@ Boston, MA 02110-1301, USA@*
 @top Introduction
 @cindex Introduction
 
-This manual documents the usage of libgomp, the GNU implementation of the 
+This manual documents the usage of libgomp, the GNU implementation of the
+@uref{http://www.openacc.org/, OpenACC} Application Programming Interface (API)
+for offloading of code to accelerator devices in C/C++ and Fortran, and
+the GNU implementation of the 
 @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
 for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
@@ -81,23 +86,619 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 @comment  better formatting.
 @comment
 @menu
-* Enabling OpenMP::How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming 
-   interface.
-* Environment Variables::  Influencing runtime behavior with environment 
-   variables.
-* The libgomp ABI::Notes on the external ABI presented by libgomp.
-* Reporting Bugs:: How to report bugs in GNU OpenMP.
-* Copying::GNU general public license says
-   how you can copy and share libgomp.
-* GNU Free Documentation License::
-   How you can copy and share this manual.
-* Funding::How to help assure continued work for free 
-   software.
-* Library Index::  Index of this documentation.
+* Enabling OpenACC:: How to enable OpenACC for your
+ applications.
+* OpenACC Runtime Library Routines:: The OpenACC runtime application
+  programming interface.
+* OpenACC Environment Variables::Influencing OpenACC runtime behavior with
+ environment variables.
+* OpenACC Library Interoperability:: OpenACC library interoperability with the
+ NVIDIA CUBLAS library.
+* Enabling OpenMP::  How to enable OpenMP for your
+ applications.
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+ The OpenMP runtime application programming
+ interface.
+* OpenMP Environment Variables: Environment Variables.
+ Influencing OpenMP runtime behavior with
+ environment variables.
+* The libgomp ABI::  Notes on the external libgomp ABI.
+* Reporting Bugs::   How to report bugs.
+* Copying::  GNU general public license says how you
+ can copy and share libgomp.
+* GNU Free Documentation License::   How you can copy and share this 

[PATCH 5/5] OpenACC 2.0 support for libgomp - temporary test harness tweaks

2014-11-11 Thread Julian Brown
Hi,

As mentioned in the previous mail in this series, testing the OpenACC
runtime support in libgomp is going to be awkward until the associated
middle-end pieces are ready. This stop-gap patch helps to allow tests
(that don't use any of the pragmas, only calling the run-time library
directly) to run successfully.

OK to apply?

Thanks,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-c++/c++.exp (ALWAYS_CFLAGS): Temporarily
replace -fopenacc with -lgomp -lpthread, until -fopenacc support
lands upstream.
* testsuite/libgomp.oacc-c/c.exp (ALWAYS_CFLAGS): Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS):
Similar, but without -lpthread.
From c70f2aca94bc306e4600282aa81bc1a758ad81fa Mon Sep 17 00:00:00 2001
From: Julian Brown jul...@codesourcery.com
Date: Tue, 11 Nov 2014 02:54:09 -0800
Subject: [PATCH 5/5] Temporary testing tweaks

libgomp/
* testsuite/libgomp.oacc-c++/c++.exp (ALWAYS_CFLAGS): Temporarily replace
-fopenacc with -lgomp -lpthread, until -fopenacc support lands upstream.
* testsuite/libgomp.oacc-c/c.exp (ALWAYS_CFLAGS): Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS): Similar, but
without -lpthread.
---
 libgomp/testsuite/libgomp.oacc-c++/c++.exp |4 +++-
 libgomp/testsuite/libgomp.oacc-c/c.exp |4 +++-
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |4 +++-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c++/c++.exp b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
index b8b3e85..1060344 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -23,7 +23,9 @@ dg-init
 
 # Turn on OpenACC.
 # XXX (TEMPORARY): Remove the -flto once that's properly integrated.
-lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
+#lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
+# TODO: Revert this temporary hack when OpenACC middle-end pieces are submitted.
+lappend ALWAYS_CFLAGS "additional_flags=-lgomp -flto -lpthread"
 
 set blddir [lookfor_file [get_multilibs] libgomp]
 
diff --git a/libgomp/testsuite/libgomp.oacc-c/c.exp b/libgomp/testsuite/libgomp.oacc-c/c.exp
index 5558ec8..85528aa 100644
--- a/libgomp/testsuite/libgomp.oacc-c/c.exp
+++ b/libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -28,7 +28,9 @@ dg-init
 
 # Turn on OpenACC.
 # XXX (TEMPORARY): Remove the -flto once that's properly integrated.
-lappend ALWAYS_CFLAGS additional_flags=-fopenacc -flto
+#lappend ALWAYS_CFLAGS additional_flags=-fopenacc -flto
+# TODO: Revert temporary hack when OpenACC middle-end pieces are submitted.
+lappend ALWAYS_CFLAGS additional_flags=-lgomp -flto -lpthread
 
 lappend libgomp_compile_options compiler=$GCC_UNDER_TEST
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index 0ada038..27cf4d5 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -23,7 +23,9 @@ dg-init
 
 # Turn on OpenACC.
 # XXX (TEMPORARY): Remove the -flto once that's properly integrated.
-lappend ALWAYS_CFLAGS additional_flags=-fopenacc -flto
+#lappend ALWAYS_CFLAGS additional_flags=-fopenacc -flto
+# TODO: Revert this temporary hack when OpenACC middle-end pieces are submitted.
+lappend ALWAYS_CFLAGS additional_flags=-lgomp -flto
 
 if { $blddir != "" } {
 set lang_source_re {^.*\.[fF](|90|95|03|08)$}
-- 
1.7.10.4



[PATCH] Extend shift permutations on power of 2 cases

2014-11-11 Thread Evgeny Stupachenko
Hi,

The patch extends the shift permutation technique to power-of-2 cases
(previously even/odd transformations were used unconditionally).
Basically the patch just adds a loop around the handling of a load group
of length 2, as is done in the vect_permute_load_chain function.
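
For readers who want to see the loop shape in isolation, here is a scalar
sketch of the stage structure (log2 (length) stages, each pairing up
neighbouring chain entries).  It models the even/odd result that the
permutations are meant to produce, not the actual perm2/shift/select
masks from the patch.  NELT, MAX_LEN and the demo_* names are made up for
illustration and do not exist in GCC.

```c
#include <assert.h>
#include <string.h>

enum { NELT = 4, MAX_LEN = 8 };   /* illustrative sizes, not from the patch */

/* Return log2 (X) if X is a power of two, -1 otherwise.  Local stand-in
   for the exact_log2 call used in the patch.  */
static int
demo_exact_log2 (unsigned x)
{
  int log = 0;
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while (x > 1)
    {
      x >>= 1;
      log++;
    }
  return log;
}

/* De-interleave LENGTH vectors of NELT ints in place.  Returns 0 on
   success, -1 if LENGTH is not a supported power of two.  */
static int
demo_permute_load_chain (int chain[][NELT], unsigned length)
{
  int result[MAX_LEN][NELT];
  int log_length = demo_exact_log2 (length);

  assert (chain != NULL);
  if (log_length < 0 || length > MAX_LEN)
    return -1;

  for (int i = 0; i < log_length; i++)
    {
      for (unsigned j = 0; j < length; j += 2)
        {
          const int *first = chain[j];
          const int *second = chain[j + 1];
          for (int k = 0; k < NELT; k++)
            {
              /* Positions 2k and 2k+1 in the concatenation first|second.  */
              int even = 2 * k, odd = 2 * k + 1;
              result[j / 2][k]
                = even < NELT ? first[even] : second[even - NELT];
              result[j / 2 + length / 2][k]
                = odd < NELT ? first[odd] : second[odd - NELT];
            }
        }
      /* The output of this stage is the input of the next one.  */
      memcpy (chain, result, length * sizeof result[0]);
    }
  return 0;
}
```

With four vectors holding 0..15, chain[k] ends up holding the elements
k, k+4, k+8, k+12, which is the layout a load group of length 4 needs.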

For Silvermont it reduces insn sequence for load group of length 4
from 31 to 20 insns.
Performance for the test in the patch improved by ~20%.

Bootstrap passed.
Make check in progress.

Is it ok?

2014-11-11  Evgeny Stupachenko  evstu...@gmail.com

gcc/testsuite
* gcc.target/i386/pr52252-atom-1.c: New.

gcc/
* tree-vect-data-refs.c (vect_shift_permute_load_chain): Extend shift
permutations on power of 2 cases.

diff --git a/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
b/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
new file mode 100644
index 000..1fbd258
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ssse3 } */
+/* { dg-options "-O2 -ftree-vectorize -mssse3 -mtune=slm" } */
+#define byte unsigned char
+
+void
+pair_mul_sum(byte *in, byte *out, int size)
+{
+  int j;
+  for(j = 0; j < size; j++)
+{
+  byte a = in[0];
+  byte b = in[1];
+  byte c = in[2];
+  byte d = in[3];
+  out[0] = (byte)(a * b) + (byte)(b * c) + (byte)(c * d) + (byte)(d * a);
+  in += 4;
+  out += 1;
+}
+}
+
+/* { dg-final { scan-assembler "palignr" } } */
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 0bc0356..d2e0e93 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -5379,8 +5379,9 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
   memcpy (result_chain->address (), dr_chain.address (),
  length * sizeof (tree));

-  if (length == 2 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4)
+  if (exact_log2 (length) != -1 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4)
 {
+  unsigned int j, log_length = exact_log2 (length);
   for (i = 0; i < nelt / 2; ++i)
sel[i] = i * 2;
   for (i = 0; i < nelt / 2; ++i)
@@ -5441,37 +5442,44 @@ vect_shift_permute_load_chain (vectree dr_chain,
   select_mask = vect_gen_perm_mask (vectype, sel);
   gcc_assert (select_mask != NULL);

-  first_vect = dr_chain[0];
-  second_vect = dr_chain[1];
-
-  data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2);
-  perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
-   first_vect, first_vect,
-   perm2_mask1);
-  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-  vect[0] = data_ref;
+  for (i = 0; i < log_length; i++)
+   {
+ for (j = 0; j < length; j += 2)
+   {
+ first_vect = dr_chain[j];
+ second_vect = dr_chain[j + 1];

-  data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2);
-  perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
-   second_vect, second_vect,
-   perm2_mask2);
-  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-  vect[1] = data_ref;
+ data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2);
+ perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
+   first_vect, first_vect,
+   perm2_mask1);
+ vect_finish_stmt_generation (stmt, perm_stmt, gsi);
+ vect[0] = data_ref;

-  data_ref = make_temp_ssa_name (vectype, NULL, vect_shift);
-  perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
-   vect[0], vect[1],
-   shift1_mask);
-  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-  (*result_chain)[1] = data_ref;
+ data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2);
+ perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
+   second_vect,
second_vect,
+   perm2_mask2);
+ vect_finish_stmt_generation (stmt, perm_stmt, gsi);
+ vect[1] = data_ref;

-  data_ref = make_temp_ssa_name (vectype, NULL, vect_select);
-  perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
-   vect[0], vect[1],
-   select_mask);
-  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-  (*result_chain)[0] = data_ref;
+ data_ref = make_temp_ssa_name (vectype, NULL, vect_shift);
+ perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
+   

[PATCH] Fix for ipa/63795, ipa/63622

2014-11-11 Thread Martin Liška

Hello.

The following patch adds a check for aliasing support. The patch bootstraps on
x86_64-apple-darwin1 and is part of the patches needed to restore bootstrap on
the target. I plan to introduce an additional patch that will cover testsuite
failures for the target.
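
As a rough illustration of the kind of guard this adds, here is a
standalone sketch.  The DEMO_* macros are hypothetical stand-ins for the
target macros the patch tests (ASM_OUTPUT_DEF, ASM_OUTPUT_WEAK_ALIAS,
ASM_WEAKEN_DECL), and the demo_* names and the enum are invented for the
example; the real logic lives in sem_function::merge and is more
involved than this.

```c
#include <stdbool.h>

/* Compile-time capability check.  With none of the DEMO_* macros
   defined, this reports that aliases are not available.  */
static bool
demo_target_supports_aliasing_p (void)
{
#if !defined (DEMO_ASM_OUTPUT_DEF) \
    || (!defined (DEMO_ASM_OUTPUT_WEAK_ALIAS) && !defined (DEMO_ASM_WEAKEN_DECL))
  return false;
#else
  return true;
#endif
}

enum demo_merge_kind { DEMO_MERGE_NONE, DEMO_MERGE_ALIAS, DEMO_MERGE_THUNK };

/* Simplified model of the function-merge decision: an alias is turned
   into a thunk when the symbol is in a comdat group or when the target
   cannot emit aliases.  */
static enum demo_merge_kind
demo_function_merge_kind (bool create_alias, bool in_comdat_group)
{
  if (!create_alias)
    return DEMO_MERGE_NONE;
  if (in_comdat_group || !demo_target_supports_aliasing_p ())
    return DEMO_MERGE_THUNK;
  return DEMO_MERGE_ALIAS;
}
```

For variables there is no thunk fallback, so the patch simply declines
the merge when the check fails.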

Ready for trunk?
Thanks,
Martin
gcc/ChangeLog:

2014-11-11  Martin Liska  mli...@suse.cz

* ipa-icf.c (sem_function::merge): Add new target aliasing
support guard.
(sem_variable::merge): Likewise.
* ipa-icf.h (target_supports_aliasing_p): New function.

gcc/testsuite/ChangeLog:

2014-11-11  Martin Liska  mli...@suse.cz

* g++.dg/ipa/ipa-icf-4.C: Add more precise dump scan.
* g++.dg/ipa/ipa-icf-5.C: Add condition for targets with aliasing 
support.
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 84cc0ca..f19c3c1 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -191,6 +191,18 @@ sem_item::dump (void)
 }
 }
 
+/* Return true if target supports aliasing.  */
+
+bool
+sem_item::target_supports_aliasing_p (void)
+{
+#if !defined (ASM_OUTPUT_DEF) || (!defined(ASM_OUTPUT_WEAK_ALIAS) && !defined (ASM_WEAKEN_DECL))
+  return false;
+#else
+  return true;
+#endif
+}
+
 /* Semantic function constructor that uses STACK as bitmap memory stack.  */
 
 sem_function::sem_function (bitmap_obstack *stack): sem_item (FUNC, stack),
@@ -589,7 +601,8 @@ sem_function::merge (sem_item *alias_item)
   redirect_callers = false;
 }
 
-  if (create_alias && DECL_COMDAT_GROUP (alias->decl))
+  if (create_alias && (DECL_COMDAT_GROUP (alias->decl)
+		   || !sem_item::target_supports_aliasing_p ()))
 {
   create_alias = false;
   create_thunk = true;
@@ -605,6 +618,14 @@ sem_function::merge (sem_item *alias_item)
 local_original
   = dyn_cast <cgraph_node *> (original->noninterposable_alias ());
 
+if (!local_original)
+  {
+	if (dump_file)
+	  fprintf (dump_file, "Noninterposable alias cannot be created.\n\n");
+
+	return false;
+  }
+
   if (redirect_callers)
 {
   /* If alias is non-overwritable then
@@ -649,7 +670,7 @@ sem_function::merge (sem_item *alias_item)
   alias->resolve_alias (original);
 
   /* Workaround for PR63566 that forces equal calling convention
-	 to be used.  */
+   to be used.  */
   alias->local.local = false;
   original->local.local = false;
 
@@ -1155,6 +1176,13 @@ sem_variable::merge (sem_item *alias_item)
 {
   gcc_assert (alias_item->type == VAR);
 
+  if (!sem_item::target_supports_aliasing_p ())
+{
+  if (dump_file)
+	fprintf (dump_file, "Aliasing is not supported by target\n\n");
+  return false;
+}
+
   sem_variable *alias_var = static_cast<sem_variable *> (alias_item);
 
   varpool_node *original = get_node ();
diff --git a/gcc/ipa-icf.h b/gcc/ipa-icf.h
index d8e7b16..6e15166 100644
--- a/gcc/ipa-icf.h
+++ b/gcc/ipa-icf.h
@@ -138,9 +138,11 @@ public:
 
   /* Return base tree that can be used for compatible_types_p and
  contains_polymorphic_type_p comparison.  */
-
   static bool get_base_types (tree *t1, tree *t2);
 
+  /* Return true if target supports aliasing.  */
+  static bool target_supports_aliasing_p (void);
+
   /* Item type.  */
   sem_item_type type;
 
diff --git a/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C b/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C
index 9434289..67f2744 100644
--- a/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C
+++ b/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C
@@ -43,6 +43,6 @@ int main()
   return 123;
 }
 
-/* { dg-final { scan-ipa-dump "Varpool alias has been created" "icf"  } } */
+/* { dg-final { scan-ipa-dump "\(Varpool alias has been created\)|\(Aliasing is not supported by target\)" "icf"  } } */
 /* { dg-final { scan-ipa-dump "Equal symbols: 6" "icf"  } } */
 /* { dg-final { cleanup-ipa-dump "icf" } } */
diff --git a/gcc/testsuite/g++.dg/ipa/ipa-icf-5.C b/gcc/testsuite/g++.dg/ipa/ipa-icf-5.C
index f835814..57dcb78 100644
--- a/gcc/testsuite/g++.dg/ipa/ipa-icf-5.C
+++ b/gcc/testsuite/g++.dg/ipa/ipa-icf-5.C
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-visibility "" } */
+/* { dg-require-alias "" } */
 /* { dg-options "-O2 -fdump-ipa-icf" } */
 
 struct test


Re: [PATCH] Extend shift permutations on power of 2 cases

2014-11-11 Thread Richard Biener
On Tue, Nov 11, 2014 at 3:21 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 Hi,

 The patch extends the shift permutation technique to power-of-2 cases
 (previously even/odd transformations were used unconditionally).
 Basically the patch just adds a loop around the handling of a load group
 of length 2, as is done in the vect_permute_load_chain function.

 For Silvermont it reduces insn sequence for load group of length 4
 from 31 to 20 insns.
 Performance for the test in the patch improved by ~20%.

 Bootstrap passed.
 Make check in progress.

 Is it ok?

Ok.

Thanks,
Richard.

 2014-11-11  Evgeny Stupachenko  evstu...@gmail.com

 gcc/testsuite
 * gcc.target/i386/pr52252-atom-1.c: New.

 gcc/
 * tree-vect-data-refs.c (vect_shift_permute_load_chain): Extend shift
 permutations on power of 2 cases.

 diff --git a/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
 b/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
 new file mode 100644
 index 000..1fbd258
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
 @@ -0,0 +1,22 @@
 +/* { dg-do compile } */
 +/* { dg-require-effective-target ssse3 } */
 +/* { dg-options -O2 -ftree-vectorize -mssse3 -mtune=slm } */
 +#define byte unsigned char
 +
 +void
 +pair_mul_sum(byte *in, byte *out, int size)
 +{
 +  int j;
 +  for(j = 0; j  size; j++)
 +{
 +  byte a = in[0];
 +  byte b = in[1];
 +  byte c = in[2];
 +  byte d = in[3];
 +  out[0] = (byte)(a * b) + (byte)(b * c) + (byte)(c * d) + (byte)(d * a);
 +  in += 4;
 +  out += 1;
 +}
 +}
 +
 +/* { dg-final { scan-assembler palignr } } */
 diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
 index 0bc0356..d2e0e93 100644
 --- a/gcc/tree-vect-data-refs.c
 +++ b/gcc/tree-vect-data-refs.c
 @@ -5379,8 +5379,9 @@ vect_shift_permute_load_chain (vectree dr_chain,
memcpy (result_chain-address (), dr_chain.address (),
   length * sizeof (tree));

 -  if (length == 2  LOOP_VINFO_VECT_FACTOR (loop_vinfo)  4)
 +  if (exact_log2 (length) != -1  LOOP_VINFO_VECT_FACTOR (loop_vinfo)  4)
  {
 +  unsigned int j, log_length = exact_log2 (length);
for (i = 0; i  nelt / 2; ++i)
 sel[i] = i * 2;
for (i = 0; i  nelt / 2; ++i)
 @@ -5441,37 +5442,44 @@ vect_shift_permute_load_chain (vectree dr_chain,
select_mask = vect_gen_perm_mask (vectype, sel);
gcc_assert (select_mask != NULL);

 -  first_vect = dr_chain[0];
 -  second_vect = dr_chain[1];
 -
 -  data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2);
 -  perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
 -   first_vect, first_vect,
 -   perm2_mask1);
 -  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
 -  vect[0] = data_ref;
 +  for (i = 0; i  log_length; i++)
 +   {
 + for (j = 0; j  length; j += 2)
 +   {
 + first_vect = dr_chain[j];
 + second_vect = dr_chain[j + 1];

 -  data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2);
 -  perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
 -   second_vect, second_vect,
 -   perm2_mask2);
 -  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
 -  vect[1] = data_ref;
 + data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2);
 + perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, 
 data_ref,
 +   first_vect, 
 first_vect,
 +   perm2_mask1);
 + vect_finish_stmt_generation (stmt, perm_stmt, gsi);
 + vect[0] = data_ref;

 -  data_ref = make_temp_ssa_name (vectype, NULL, vect_shift);
 -  perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
 -   vect[0], vect[1],
 -   shift1_mask);
 -  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
 -  (*result_chain)[1] = data_ref;
 + data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2);
 + perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, 
 data_ref,
 +   second_vect,
 second_vect,
 +   perm2_mask2);
 + vect_finish_stmt_generation (stmt, perm_stmt, gsi);
 + vect[1] = data_ref;

 -  data_ref = make_temp_ssa_name (vectype, NULL, vect_select);
 -  perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref,
 -   vect[0], vect[1],
 -   select_mask);
 -  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
 -  (*result_chain)[0] = data_ref;
 

Re: [PATCH] Fix for ipa/63795, ipa/63622

2014-11-11 Thread Richard Biener
On Tue, Nov 11, 2014 at 3:22 PM, Martin Liška mli...@suse.cz wrote:
 Hello.

 The following patch adds a check for aliasing support. The patch bootstraps on
 x86_64-apple-darwin1 and is part of the patches needed to restore bootstrap on
 the target. I plan to introduce an additional patch that will cover testsuite
 failures for the target.

 Ready for trunk?

"Aliasing" sounds odd here.  I'd expand it to "Symbol aliases"; likewise,
rename target_supports_aliasing_p to target_supports_symbol_aliases_p.

Ok with that change.

Thanks,
Richard.

 Thanks,
 Martin


Re: Fix libtool.m4 for Darwin = 10.10

2014-11-11 Thread Jack Howarth
FX,
It looks like you missed patching a few configure files...

libjava/classpath/configure
libjava/configure

are definitely needed while

libgo/configure
zlib/configure

should be added for completeness.
  Jack

On Tue, Nov 11, 2014 at 4:15 AM, FX fxcoud...@gmail.com wrote:
 Your patch contains lots of other changes, not just the libtool.m4
 change.  Please filter those out.

 Sorry about that. The patch attached should be clean, and the ChangeLog 
 entries formatted as they should.

 OK to commit? This touches so many areas that it probably needs a build
 maintainer or global maintainer to approve it.

 FX





Re: [PATCH] Fix some ICF gimple_call handling issues

2014-11-11 Thread Martin Liška

On 11/11/2014 12:11 AM, Jakub Jelinek wrote:

On Mon, Nov 10, 2014 at 10:08:54PM +0100, Richard Biener wrote:

@@ -662,9 +662,49 @@ func_checker::compare_gimple_call (gimpl
   t1 = gimple_call_fndecl (s1);
   t2 = gimple_call_fndecl (s2);


Just drop these and compare gimple_call_fn only.


+  tree chain1 = gimple_call_chain (s1);
+  tree chain2 = gimple_call_chain (s2);
+
+  if ((chain1 && !chain2) || (!chain1 && chain2))
+return return_false_with_msg ("Tree call chains are different");


I miss a compare_operands for the call chain.

Otherwise OK.


Here is what I've committed after another bootstrap/regtest.
Note, I've tried:
__attribute__ ((noinline, noclone))
int f1 (int x)
{
   int y = 3, z = 4;
   __attribute__ ((noinline, noclone)) int
   f2 (int a) { return a + x + y + z; }
   return f2 (5);
}

__attribute__ ((noinline, noclone))
int f3 (int x)
{
   int y = 3, z = 4;
   __attribute__ ((noinline, noclone)) int
   f4 (int a) { return a + x + y + z; }
   return f4 (5);
}

int
main ()
{
   if (f1 (9) != 21 || f3 (9) != 21)
 __builtin_abort ();
   return 0;
}
but ICF doesn't optimize this with or without the patch,
as the structs aren't the same type (supposedly different alias set?),
even when they have the same members laid out the same.


Hello Jakub.

You are right; more precisely, types_compatible_p returns false for these
two structures. I'll add this case to my TODO list.

Thank you for sending the patch.
Martin



2014-11-11  Jakub Jelinek  ja...@redhat.com
Martin Liska  mli...@suse.cz

* ipa-icf-gimple.c (func_checker::compare_bb): Fix comment typo.
(func_checker::compare_gimple_call): Compare gimple_call_fn,
gimple_call_chain, gimple_call_fntype and call flags.
testsuite/
* gcc.dg/ubsan/ipa-icf-1.c: New test.
* gcc.dg/ipa/ipa-icf-31.c: New test.

--- gcc/ipa-icf-gimple.c.jj 2014-10-30 14:42:20.0 +0100
+++ gcc/ipa-icf-gimple.c2014-11-10 19:08:38.339986360 +0100
@@ -554,7 +554,7 @@ func_checker::parse_labels (sem_bb *bb)

 In general, a collection of equivalence dictionaries is built for types
 like SSA names, declarations (VAR_DECL, PARM_DECL, ..). This infrastructure
-   is utilized by every statement-by-stament comparison function.  */
+   is utilized by every statement-by-statement comparison function.  */

  bool
  func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2)
@@ -659,12 +659,39 @@ func_checker::compare_gimple_call (gimpl
if (gimple_call_num_args (s1) != gimple_call_num_args (s2))
  return false;

-  t1 = gimple_call_fndecl (s1);
-  t2 = gimple_call_fndecl (s2);
-
-  /* Function pointer variables are not supported yet.  */
+  t1 = gimple_call_fn (s1);
+  t2 = gimple_call_fn (s2);
if (!compare_operand (t1, t2))
-return return_false();
+return return_false ();
+
+  /* Compare flags.  */
+  if (gimple_call_internal_p (s1) != gimple_call_internal_p (s2)
+  || gimple_call_ctrl_altering_p (s1) != gimple_call_ctrl_altering_p (s2)
+  || gimple_call_tail_p (s1) != gimple_call_tail_p (s2)
+  || gimple_call_return_slot_opt_p (s1) != gimple_call_return_slot_opt_p 
(s2)
+  || gimple_call_from_thunk_p (s1) != gimple_call_from_thunk_p (s2)
+  || gimple_call_va_arg_pack_p (s1) != gimple_call_va_arg_pack_p (s2)
+  || gimple_call_alloca_for_var_p (s1) != gimple_call_alloca_for_var_p (s2)
+  || gimple_call_with_bounds_p (s1) != gimple_call_with_bounds_p (s2))
+return false;
+
+  if (gimple_call_internal_p (s1)
+      && gimple_call_internal_fn (s1) != gimple_call_internal_fn (s2))
+return false;
+
+  tree fntype1 = gimple_call_fntype (s1);
+  tree fntype2 = gimple_call_fntype (s2);
+  if ((fntype1 && !fntype2)
+  || (!fntype1 && fntype2)
+  || (fntype1 && !types_compatible_p (fntype1, fntype2)))
+return return_false_with_msg ("call function types are not compatible");
+
+  tree chain1 = gimple_call_chain (s1);
+  tree chain2 = gimple_call_chain (s2);
+  if ((chain1 && !chain2)
+  || (!chain1 && chain2)
+  || !compare_operand (chain1, chain2))
+return return_false_with_msg ("static call chains are different");

/* Checking of argument.  */
for (i = 0; i < gimple_call_num_args (s1); ++i)
--- gcc/testsuite/gcc.dg/ubsan/ipa-icf-1.c.jj   2014-11-10 19:00:53.509525071 
+0100
+++ gcc/testsuite/gcc.dg/ubsan/ipa-icf-1.c  2014-11-10 19:02:21.836925806 
+0100
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
+/* { dg-options "-fsanitize=undefined -fipa-icf" } */
+
+__attribute__ ((noinline, noclone))
+int f1 (int x, int y)
+{
+  return x + y;
+}
+
+__attribute__ ((noinline, noclone))
+int f2 (int x, int y)
+{
+  return x - y;
+}
+
+int
+main ()
+{
+  if (f1 (5, 6) != 11 || f2 (5, 6) != -1)
+__builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.dg/ipa/ipa-icf-31.c.jj2014-11-10 18:59:16.604294652 
+0100
+++ gcc/testsuite/gcc.dg/ipa/ipa-icf-31.c   2014-11-10 18:59:59.690519616 
+0100
@@ -0,0 +1,41 @@
+/* { 

Re: Fix libtool.m4 for Darwin = 10.10

2014-11-11 Thread FX
It looks like you missed patching a few configure files...
 
 libjava/classpath/configure
 libjava/configure

Aren’t those under external control? i.e. maintained out of GCC tree?


 libgo/configure
 zlib/configure

Those are maintained upstream, and we import them directly. I’ve filed a bug 
for libgo (https://code.google.com/p/go/issues/detail?id=9089).

FX

[PATCH] Fix for mklog

2014-11-11 Thread Marat Zakirov

Hi all!

I found another issue with mklog.

Example:

--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -103,4 +103,14 @@ asan_intercepted_p (enum built_in_function fcode)
 || fcode == BUILT_IN_STRNCMP
 || fcode == BUILT_IN_STRNCPY;
 }
+
+/* Convert LEN to HOST_WIDE_INT if possible.
+   Returns -1 otherwise.  */
+
+static inline HOST_WIDE_INT
+maybe_tree_to_shwi (tree len)
+{
+  return tree_fits_shwi_p (len) ? tree_to_shwi (len) : -1;
+}
+
 #endif /* TREE_ASAN */

mklog output:

gcc/ChangeLog:

DATE

* asan.h (asan_intercepted_p):
(maybe_tree_to_shwi):

Currently mklog reports changes for asan_intercepted_p that do not
exist.


Patched mklog output:

gcc/ChangeLog:

DATE

* asan.h (maybe_tree_to_shwi):

The attached patch makes mklog stop searching for changes inside a
function once a '}' occurs.


OK to commit?

--Marat
contrib/ChangeLog:

2014-11-06  Marat Zakirov  m.zaki...@samsung.com

	* mklog: Symbol '}' stops search for changes.  

diff --git a/contrib/mklog b/contrib/mklog
index 6ed4c6e..7de485d 100755
--- a/contrib/mklog
+++ b/contrib/mklog
@@ -117,7 +117,7 @@ sub is_top_level {
 	} else {
 		$function =~ s/^.//;
 	}
-	return $function && $function !~ /^[\s{}]/;
+	return $function && $function !~ /^[\s{]/;
 }
 
 # For every file in the .diff print all the function names in ChangeLog


Re: [PATCH, aarch64] Add prefetch support

2014-11-11 Thread Marcus Shawcroft
On 30 October 2014 08:54, Gopalasubramanian, Ganesh
ganesh.gopalasubraman...@amd.com wrote:

 2014-10-30  Ganesh Gopalasubramanian ganesh.gopalasubraman...@amd.com

Check the whitespace in your ChangeLog line.

 * config/arm/types.md (define_attr type): Add prefetch.

The existing schedulers use 'load1'.  We can of course split that into
two by introducing 'prefetch' and updating all of the existing schedulers
to reflect the change.  However, I suggest we do that as a separate
activity when someone actually needs the distinction; note that this change
will require updating the schedulers for both the ARM and AArch64 backends,
not just those relevant to AArch64.  For this prefetch patch I suggest
we go with the existing 'load1'.

The inline patch has been munged by your mailer, I tried applying the
patch to my tree but it is full of  escape sequences.  Can you either
fix your mailer or submit patches as attachments?

 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
 index 74b554e..12a3f170 100644
 --- a/gcc/config/aarch64/aarch64.md
 +++ b/gcc/config/aarch64/aarch64.md
 @@ -320,6 +320,38 @@
[(set_attr type no_insn)]
 )

 +
 +(define_insn prefetch
 +  [(prefetch (match_operand:DI 0 address_operand r)
 +(match_operand:QI 1 const_int_operand )
 +(match_operand:QI 2 const_int_operand ))]
 +  

 +  *
 +{

Use {} instead of *{, then all of the extra quoting in the C below goes away.

 +  const char * pftype[2][10]
 += { {\"PLDL1STRM\", \"PLDL3KEEP\", \"PLDL2KEEP\", \"PLDL1KEEP\"},
 +   {\"PSTL1STRM\", \"PSTL3KEEP\", \"PSTL2KEEP\", \"PSTL1KEEP\"},
 +  };
 +
 +  int locality = INTVAL (operands[2]);
 +  char pattern[100];
 +
 +  gcc_assert (IN_RANGE (locality, 0, 3));
 +
 +  strcpy (pattern, \"prfm\\t\");
 +  strcat (pattern, (const char*)pftype[INTVAL(operands[1])][locality]);
 +  strcat (pattern, \", %a0\");

Use sprintf() rather than multiple calls to cpy and cat.  I suspect
the cast in front of pftype is superfluous?
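
Roughly what I have in mind, as a standalone sketch outside the .md
file.  The demo_* names are made up, and I have used snprintf here
instead of plain sprintf so the buffer size is checked; the hint table
is the one from the quoted patch.

```c
#include <assert.h>
#include <stdio.h>

/* Table copied from the quoted patch: row 0 = load hints, row 1 = store
   hints, indexed by locality 0..3.  */
static const char *const demo_pftype[2][4] = {
  { "PLDL1STRM", "PLDL3KEEP", "PLDL2KEEP", "PLDL1KEEP" },
  { "PSTL1STRM", "PSTL3KEEP", "PSTL2KEEP", "PSTL1KEEP" },
};

/* Build the output template in one call.  "%%a0" emits a literal "%a0"
   for the later operand substitution.  Returns snprintf's result, i.e.
   the number of characters the full string needs.  */
static int
demo_format_prefetch (char *buf, size_t size, int is_write, int locality)
{
  assert (buf != NULL && size > 0);
  assert (is_write == 0 || is_write == 1);
  assert (locality >= 0 && locality <= 3);
  return snprintf (buf, size, "prfm\t%s, %%a0",
                   demo_pftype[is_write][locality]);
}
```

No cast is needed on the table lookup, since the elements are already
const char *.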

 +
 +  output_asm_insn (pattern,
 +   operands);

Unnecessary line break.

Cheers
/Marcus


Re: [PATCH][AArch64] LR register not used in leaf functions

2014-11-11 Thread Marcus Shawcroft
On 30 September 2014 16:00, Jiong Wang jiong.w...@arm.com wrote:

 gcc/
   * config/aarch64/aarch64.h (CALL_USED_REGISTERS): Mark LR as caller-save.
   (EPILOGUE_USES): Guard the check by epilogue_completed.
   * config/aarch64/aarch64.c (aarch64_layout_frame): Explicitly check for LR.
   (aarch64_can_eliminate): Check LR_REGNUM liveness.

 gcc/testsuite/
   * gcc.target/aarch64/lr_free_1.c: New testcase for -fomit-frame-pointer.
   * gcc.target/aarch64/lr_free_2.c: New testcase for leaf
 -fno-omit-frame-pointer.

OK /Marcus


Re: [PATCH][AArch64] Properly guard CUMULATIVE_ARGS definition and remove 'enum' from machine_mode in aarch64.h

2014-11-11 Thread Marcus Shawcroft
On 31 October 2014 11:21, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 Following up from https://gcc.gnu.org/ml/gcc-patches/2014-10/msg03153.html
 this fixes up the aarch64 port
 accordingly to guard CUMULATIVE_ARGS properly so that we can remove the enum
 keyword from machine_mode.

OK /Marcus


Re: [PATCH] Fix for mklog

2014-11-11 Thread Diego Novillo

On 11/11/14 09:46, Marat Zakirov wrote:



The attached patch makes mklog stop searching for changes inside a
function once a '}' occurs.

Ok, to commit?



OK. Thanks.


Diego.


Re: [patch] OpenACC fortran front end

2014-11-11 Thread Julian Brown
On Tue, 11 Nov 2014 08:10:29 +0100
Jakub Jelinek ja...@redhat.com wrote:

 On Mon, Nov 10, 2014 at 02:43:38PM -0800, Cesar Philippidis wrote:
   I'll post a separate patch with the fortran tests later. If
   anyone wants to test this patch, please use gomp-4_0-branch
   instead. You don't need a CUDA accelerator to use
   OpenACC, and some of the runtime tests will fail because that
   branch doesn't include the nvptx backend.
   Now that the first series of PTX target patches have been
   committed: I assume it is still true that nvptx doesn't work
   because the libgomp bits aren't in yet, isn't it?
  
  That's correct. The nvptx backend also depends on the offloading
  changes that a team from Intel is working on for the MIC target.
  But Julian should be posting the libgomp patches tomorrow, I think,
  since his changes are somewhat self-contained.
 
 For the middle-end and libgomp changes, can you talk to the Intel
 folks to update their git branch to latest trunk (so that you have
 the nvptx bits in there) and send middle-end and libgomp diffs
 against that? As far as I remember, most of the changes from the
 branch are now approved, they are just waiting for review of the LTO
 related changes in the middle-end (please, correct me if I've missed
 something).

We've been preparing new patches against trunk for the libgomp and
middle-end bits: I've now posted the former, and the latter are on
their way soon, I believe. The middle-end bits are also present on the
gomp-4_0-branch SVN branch (likewise, the libgomp pieces), and I
believe we're planning to merge the PTX bits there also now they've
been committed to trunk.

Is it really worthwhile merging our patches to yet another branch at
this stage?

Thanks,

Julian


[gomp4] Re: FWD: Re: OpenACC subarray specifications in the GCC Fortran front end

2014-11-11 Thread Thomas Schwinge
Hi!

On Thu, 24 Jul 2014 15:11:08 +0200, I wrote:
 On Wed, 23 Jul 2014 17:42:32 -0700, Cesar Philippidis 
 ce...@codesourcery.com wrote:
  On 07/11/2014 03:29 AM, Jakub Jelinek wrote:
   On Fri, Jul 11, 2014 at 12:11:10PM +0200, Thomas Schwinge wrote:
   To avoid duplication of work: with Jakub's Fortran OpenMP 4 target
   changes recently committed to trunk, and now merged into gomp-4_0-branch,
   I have trimmed down Ilmir's patch to just the OpenACC bits, OpenMP 4
   target changes removed, and TODO markers added to integrate into that.
   
   Resolving the TODO markers would be nice, indeed.
  
  This patch has the openacc data clauses use the new openmp maps. In the
  process of doing so, I removed a lot of the old OMP_LIST_ enums and
  added a few OMP_MAP enums to match what the c frontend currently supports.
 
 Thanks!

 OMP_LIST_DEVICEPTR remains to be converted, which can be done as a later
 follow-up patch.

I have now committed the following to gomp-4_0-branch in r217352:

commit 779291a1fe21b3c0b0c0c615a0557f070f495d14
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Tue Nov 11 14:52:04 2014 +

OpenACC deviceptr clause: Fix handling in Fortran.

With two gcc_asserts restored, and not handling OpenACC deviceptr clauses in
the same data paths as other OpenACC data clauses, we'd run into an internal
compiler error, when the deviceptr clause is used with (non-offloaded) OpenACC
data regions:

FAIL: gfortran.dg/goacc/data-tree.f95   -O  (internal compiler error)
FAIL: gfortran.dg/goacc/data-tree.f95   -O  (test for excess errors)

gcc/fortran/
* gfortran.h (OMP_LIST_DEVICEPTR): Remove, and instead...
(enum gfc_omp_map_op): ... add OMP_MAP_FORCE_DEVICEPTR here.
* dump-parse-tree.c (show_omp_clauses): Update.
* openmp.c (gfc_match_omp_clauses, resolve_omp_clauses)
(gfc_resolve_oacc_declare): Likewise.
* trans-openmp.c (gfc_trans_omp_clauses): Likewise.
gcc/
* omp-low.c (lower_omp_target): Restore two gcc_asserts.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217352 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp|  4 
 gcc/fortran/ChangeLog.gomp|  9 +
 gcc/fortran/dump-parse-tree.c |  1 -
 gcc/fortran/gfortran.h|  6 +++---
 gcc/fortran/openmp.c  | 38 ++
 gcc/fortran/trans-openmp.c|  6 +++---
 gcc/omp-low.c |  2 ++
 7 files changed, 43 insertions(+), 23 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 9c997ce..dacfad8 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,7 @@
+2014-11-11  Thomas Schwinge  tho...@codesourcery.com
+
+   * omp-low.c (lower_omp_target): Restore two gcc_asserts.
+
 2014-11-06  Thomas Schwinge  tho...@codesourcery.com
 
* gimple.h (is_gimple_omp_oacc_specifically): Return true for
diff --git gcc/fortran/ChangeLog.gomp gcc/fortran/ChangeLog.gomp
index d10560e..1ae1d31 100644
--- gcc/fortran/ChangeLog.gomp
+++ gcc/fortran/ChangeLog.gomp
@@ -1,3 +1,12 @@
+2014-11-11  Thomas Schwinge  tho...@codesourcery.com
+
+   * gfortran.h (OMP_LIST_DEVICEPTR): Remove, and instead...
+   (enum gfc_omp_map_op): ... add OMP_MAP_FORCE_DEVICEPTR here.
+   * dump-parse-tree.c (show_omp_clauses): Update.
+   * openmp.c (gfc_match_omp_clauses, resolve_omp_clauses)
+   (gfc_resolve_oacc_declare): Likewise.
+   * trans-openmp.c (gfc_trans_omp_clauses): Likewise.
+
 2014-11-05  Thomas Schwinge  tho...@codesourcery.com
 
* openmp.c (OMP_CLAUSE_HOST, OMP_CLAUSE_SELF): Merge into the new
diff --git gcc/fortran/dump-parse-tree.c gcc/fortran/dump-parse-tree.c
index 57af730..e7aff22 100644
--- gcc/fortran/dump-parse-tree.c
+++ gcc/fortran/dump-parse-tree.c
@@ -1252,7 +1252,6 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
switch (list_type)
  {
  case OMP_LIST_COPY: type = "COPY"; break;
- case OMP_LIST_DEVICEPTR: type = "DEVICEPTR"; break;
  case OMP_LIST_USE_DEVICE: type = "USE_DEVICE"; break;
  case OMP_LIST_DEVICE_RESIDENT: type = "USE_DEVICE"; break;
  case OMP_LIST_CACHE: type = ""; break;
diff --git gcc/fortran/gfortran.h gcc/fortran/gfortran.h
index 6bd131c..18adbee 100644
--- gcc/fortran/gfortran.h
+++ gcc/fortran/gfortran.h
@@ -1141,7 +1141,8 @@ typedef enum
   OMP_MAP_FORCE_TO,
   OMP_MAP_FORCE_FROM,
   OMP_MAP_FORCE_TOFROM,
-  OMP_MAP_FORCE_PRESENT
+  OMP_MAP_FORCE_PRESENT,
+  OMP_MAP_FORCE_DEVICEPTR
 }
 gfc_omp_map_op;
 
@@ -1184,8 +1185,7 @@ enum
   OMP_LIST_REDUCTION,
   OMP_LIST_COPY,
   OMP_LIST_DATA_CLAUSE_FIRST = OMP_LIST_COPY,
-  OMP_LIST_DEVICEPTR,
-  OMP_LIST_DATA_CLAUSE_LAST = OMP_LIST_DEVICEPTR,
+  OMP_LIST_DATA_CLAUSE_LAST = OMP_LIST_DATA_CLAUSE_FIRST,
   OMP_LIST_DEVICE_RESIDENT,
   OMP_LIST_USE_DEVICE,
   OMP_LIST_CACHE,
diff --git gcc/fortran/openmp.c gcc/fortran/openmp.c

Re: Fix libtool.m4 for Darwin = 10.10

2014-11-11 Thread Jack Howarth
On Tue, Nov 11, 2014 at 9:45 AM, FX fxcoud...@gmail.com wrote:
It looks like you missed patching a few configure files...

 libjava/classpath/configure
 libjava/configure

 Aren’t those under external control? i.e. maintained out of GCC tree?

However these are maintained, the libjava configure files still need
to be patched to prevent their associated shared libraries from being
inappropriately linked with -flat_namespace on darwin14 and later.
Since you are simply patching all the configure files, the question
seems academic unless you switch to properly regenerating all of the
configure files using a fixed libtool.m4.
 Jack



 libgo/configure
 zlib/configure

 Those are maintained upstream, and we import them directly. I’ve filed a bug 
 for libgo (https://code.google.com/p/go/issues/detail?id=9089).

 FX


Re: [PATCH] __builtin_*_overflow builtins (PR c/59708)

2014-11-11 Thread Uros Bizjak
Hello!

 This patch implements what I understood from Joseph's
 https://gcc.gnu.org/ml/gcc/2013-10/msg00280.html
 and also adds clang compatible builtins (which implement
 small subset of the typegeneric ones).

 Besides the clang compatibility builtins, there are 3 new
 type-generic builtins, __builtin_{add,sub,mul}_overflow, which
 have 3 arguments, two arbitrary integral arguments, and pointer
 to some integer type.  These builtins extend both arguments
 to infinite precision signed arguments, perform {+,-,*} operations
 in the infinite precision and finally cast the result to the type
 pointed by the third argument and store the result there (modulo
 2^precision of the type).  If the infinite precision result is equal
 to the stored value, the built-ins return false (no overflow), otherwise
 they return true.
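
A minimal sketch of the semantics described above, assuming a compiler that provides the type-generic builtins (GCC 5 or later, or a recent clang). The wrapper names are made up for illustration; only the `__builtin_{add,sub,mul}_overflow` calls come from the patch description.

```c
#include <stdbool.h>
#include <limits.h>

/* Hypothetical wrapper: operands and result all have the same type.
   Returns true if the exact sum does not fit in *out; *out still
   receives the sum wrapped modulo 2^precision.  */
static bool
add_int_checked (int a, int b, int *out)
{
  return __builtin_add_overflow (a, b, out);
}

/* Hypothetical wrapper: the two operands and the result may all have
   different integer types.  */
static bool
mul_to_schar (long long a, unsigned int b, signed char *out)
{
  return __builtin_mul_overflow (a, b, out);
}

/* Hypothetical wrapper: signed operands, unsigned result.  A negative
   exact result is not representable, so it is reported as overflow.  */
static bool
sub_to_unsigned (int a, int b, unsigned int *out)
{
  return __builtin_sub_overflow (a, b, out);
}
```

With these, add_int_checked (INT_MAX, 1, &r) reports overflow and leaves the wrapped value INT_MIN in r, while sub_to_unsigned (3, 5, &u) reports overflow because the exact result -2 has no unsigned representation.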

 The built-ins are folded immediately into internal functions that return
 both results (integer result and boolean overflow flag) as _Complex integer
 result, so that the integer result doesn't have to be addressable.
 It partly reuses code to emit -fsanitize=signed-integer-overflow internal
 functions, for signed overflows on e.g. i?86 will use jo/jno/seto/setno
 instructions after the arithmetic instructions; for unsigned arithmetic
 overflow, combiner manages to transform what is emitted into
 jc/jnc/setc/setnc where possible.
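
For comparison, here is a sketch of the hand-written checks that such builtins are meant to replace. The function names are illustrative only. The unsigned comparison corresponds to testing the carry out of the addition; whether a particular compiler actually reduces it to jc/jnc/setc/setnc is not claimed here.

```c
#include <stdbool.h>
#include <limits.h>

/* Unsigned addition wraps modulo 2^N, so a carry out occurred exactly
   when the wrapped sum is smaller than one of the operands.  */
static bool
uadd_wraps (unsigned int a, unsigned int b, unsigned int *sum)
{
  *sum = a + b;
  return *sum < a;
}

/* Signed overflow is undefined behaviour in C, so the range has to be
   tested before the addition is performed.  */
static bool
sadd_would_overflow (int a, int b)
{
  if (b > 0)
    return a > INT_MAX - b;
  return a < INT_MIN - b;
}
```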

 After discussions with Richard on IRC, the internal functions have
 arbitrary integral arguments, which can have different or same signs,
 different or same precisions, and the result type is _Complex integer
 derived from the call's third argument.  gimple-fold.c and tree-vrp.c
 are taught to perform some optimizations on these, and most of the smarts
 are performed during expansion (many of the 16 different +/-
 signarg1/signarg2/signresult cases require different code, and for *
 there are also a couple of different cases).
 If somebody can come up with some shorter sequence how to test for the
 less common cases, I'd appreciate hints (internal-fn.c has big comments
 which explain how it now computes the integral result and especially
 the overflow flag).

 Bootstrapped/regtested on x86_64-linux and i686-linux (on top of the ICF
 gimple_call fix I've mailed a few minutes ago), ok for trunk?

 2014-11-10  Jakub Jelinek  ja...@redhat.com

 PR c/59708
 * builtin-attrs.def (ATTR_NOTHROW_TYPEGENERIC_LEAF): New attribute.
 * builtins.c (fold_builtin_arith_overflow): New function.
 (fold_builtin_3): Use it.
 * builtins.def (BUILT_IN_ADD_OVERFLOW, BUILT_IN_SUB_OVERFLOW,
 BUILT_IN_MUL_OVERFLOW, BUILT_IN_SADD_OVERFLOW,
 BUILT_IN_SADDL_OVERFLOW, BUILT_IN_SADDLL_OVERFLOW,
 BUILT_IN_SSUB_OVERFLOW, BUILT_IN_SSUBL_OVERFLOW,
 BUILT_IN_SSUBLL_OVERFLOW, BUILT_IN_SMUL_OVERFLOW,
 BUILT_IN_SMULL_OVERFLOW, BUILT_IN_SMULLL_OVERFLOW,
 BUILT_IN_UADDL_OVERFLOW, BUILT_IN_UADDLL_OVERFLOW,
 BUILT_IN_USUB_OVERFLOW, BUILT_IN_USUBL_OVERFLOW,
 BUILT_IN_USUBLL_OVERFLOW, BUILT_IN_UMUL_OVERFLOW,
 BUILT_IN_UMULL_OVERFLOW, BUILT_IN_UMULLL_OVERFLOW): New built-in functions.
 * builtin-types.def (BT_PTR_UINT, BT_PTR_ULONG, BT_PTR_LONGLONG,
 BT_FN_BOOL_INT_INT_INTPTR, BT_FN_BOOL_LONG_LONG_LONGPTR,
 BT_FN_BOOL_LONGLONG_LONGLONG_LONGLONGPTR,
 BT_FN_BOOL_UINT_UINT_UINTPTR, BT_FN_BOOL_ULONG_ULONG_ULONGPTR,
 BT_FN_BOOL_ULONGLONG_ULONGLONG_ULONGLONGPTR, BT_FN_BOOL_VAR): New.
 * expr.c (write_complex_part): Remove prototype, no longer static.
 * expr.h (write_complex_part): New prototype.
 * function.c (aggregate_value_p): For internal functions return 0.
 * gimple-fold.c (arith_overflowed_p, find_non_realpart_uses): New functions.
 (gimple_fold_call): Fold {ADD,SUB,MUL}_OVERFLOW internal calls.
 * gimple-fold.h (arith_overflowed_p): New prototype.
 * gimplify.c (gimplify_call_expr): Handle gimplification of internal calls 
 with lhs.
 * internal-fn.c (get_range_pos_neg, get_min_precision,
 expand_arith_overflow_result_store): New functions.
 (ubsan_expand_si_overflow_addsub_check): Renamed to ...
 (expand_addsub_overflow): ... this.  Add LOC, LHS, ARG0, ARG1,
 UNSR_P, UNS0_P, UNS1_P, IS_UBSAN arguments, remove STMT argument.
 Handle ADD_OVERFLOW and SUB_OVERFLOW expansion.
 (ubsan_expand_si_overflow_neg_check): Renamed to ...
 (expand_neg_overflow): ... this.  Add LOC, LHS, ARG1, IS_UBSAN
 arguments, remove STMT argument.  Handle SUB_OVERFLOW with
 0 as first argument expansion.
 (ubsan_expand_si_overflow_mul_check): Renamed to ...
 (expand_mul_overflow): ... this.  Add LOC, LHS, ARG0, ARG1,
 UNSR_P, UNS0_P, UNS1_P, IS_UBSAN arguments, remove STMT argument.
 Handle MUL_OVERFLOW expansion.
 (expand_UBSAN_CHECK_ADD): Use expand_addsub_overflow, prepare
 arguments for it.
 (expand_UBSAN_CHECK_SUB): Use expand_addsub_overflow or
 expand_neg_overflow, prepare arguments for it.
 (expand_UBSAN_CHECK_MUL): Use expand_mul_overflow, prepare arguments
 for it.
 (expand_arith_overflow, expand_ADD_OVERFLOW, expand_SUB_OVERFLOW,
 expand_MUL_OVERFLOW): New functions.
 * internal-fn.def (ADD_OVERFLOW, SUB_OVERFLOW, 

Re: [C++ Patch] PR 63265

2014-11-11 Thread Paolo Carlini

Hi,

On 11/11/2014 02:19 PM, Jason Merrill wrote:

On 11/11/2014 08:04 AM, Paolo Carlini wrote:

-tree cond = RECUR (TREE_OPERAND (t, 0));
+tree cond
+  = maybe_constant_value (fold_non_dependent_expr_sfinae
+  (RECUR (TREE_OPERAND (t, 0)), tf_none));


I like this approach, but if the result of maybe_constant_value 
doesn't turn out to be an INTEGER_CST, we want to end up with the 
result of RECUR rather than the result of fold_non_dependent_expr, as 
the latter might not be suitable for subsequent tsubsting.

I see. Something like the below, then?

Thanks,
Paolo.

/
Index: cp/pt.c
===
--- cp/pt.c (revision 217342)
+++ cp/pt.c (working copy)
@@ -15138,11 +15138,13 @@ tsubst_copy_and_build (tree t,
 case COND_EXPR:
   {
tree cond = RECUR (TREE_OPERAND (t, 0));
+   tree folded_cond = (maybe_constant_value
+   (fold_non_dependent_expr_sfinae (cond, tf_none)));
tree exp1, exp2;
 
-   if (TREE_CODE (cond) == INTEGER_CST)
+   if (TREE_CODE (folded_cond) == INTEGER_CST)
  {
-   if (integer_zerop (cond))
+   if (integer_zerop (folded_cond))
  {
++c_inhibit_evaluation_warnings;
exp1 = RECUR (TREE_OPERAND (t, 1));
@@ -15156,6 +15158,7 @@ tsubst_copy_and_build (tree t,
exp2 = RECUR (TREE_OPERAND (t, 2));
--c_inhibit_evaluation_warnings;
  }
+   cond = folded_cond;
  }
else
  {
Index: testsuite/g++.dg/cpp0x/constexpr-63265.C
===
--- testsuite/g++.dg/cpp0x/constexpr-63265.C(revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-63265.C(working copy)
@@ -0,0 +1,19 @@
+// PR c++/63265
+// { dg-do compile { target c++11 } }
+
+#define LSHIFT (sizeof(unsigned int) * __CHAR_BIT__)
+
+template <int lshift>
+struct SpuriouslyWarns1 {
+    static constexpr unsigned int v = lshift < LSHIFT ? 1U << lshift : 0;
+};
+
+static_assert(SpuriouslyWarns1<LSHIFT>::v == 0, "Impossible occurred");
+
+template <int lshift>
+struct SpuriouslyWarns2 {
+    static constexpr bool okay = lshift < LSHIFT;
+    static constexpr unsigned int v = okay ? 1U << lshift : 0;
+};
+
+static_assert(SpuriouslyWarns2<LSHIFT>::v == 0, "Impossible occurred");


Re: Fix libtool.m4 for Darwin >= 10.10

2014-11-11 Thread FX

 Since you are simply patching all the configure files, the question
 seems academic unless you switch to properly regenerating all of the
 configure files using a fixed libtool.m4.

I am actually proposing to fix libtool.m4 and regenerate the configure scripts 
(which gives the same result as patching, as expected).


 However these are maintained, the libjava configure files still need
 to be patched to prevent their associated shared libraries from being
 inappropriately linked with -flat_namespace on darwin14 and later.

Yes, but I don’t know whether libjava and classpath should be patched in GCC, 
or whether I should report them to be patched somewhere else (like libgo and 
zlib, for example). It’s important to do it properly, otherwise codebases 
diverge and maintenance becomes difficult.

FX

[gomp4] Re: FWD: Re: OpenACC subarray specifications in the GCC Fortran front end

2014-11-11 Thread Thomas Schwinge
Hi!

On Mon, 28 Jul 2014 10:00:46 -0700, Cesar Philippidis ce...@codesourcery.com 
wrote:
 On 07/25/2014 09:01 AM, Thomas Schwinge wrote:
  [...] you may directly fold in the following patch to nuke the
  unused OMP_LIST_COPY (or do that later).

  --- gcc/fortran/dump-parse-tree.c
  +++ gcc/fortran/dump-parse-tree.c
  @@ -1257,7 +1257,6 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
  const char *type = NULL;
  switch (list_type)
{
  - case OMP_LIST_COPY: type = "COPY"; break;
case OMP_LIST_DEVICEPTR: type = "DEVICEPTR"; break;
case OMP_LIST_USE_DEVICE: type = "USE_DEVICE"; break;
case OMP_LIST_DEVICE_RESIDENT: type = "USE_DEVICE"; break;
  --- gcc/fortran/gfortran.h
  +++ gcc/fortran/gfortran.h
  @@ -1157,9 +1157,8 @@ enum
 OMP_LIST_TO,
 OMP_LIST_FROM,
 OMP_LIST_REDUCTION,
  -  OMP_LIST_COPY,
  -  OMP_LIST_DATA_CLAUSE_FIRST = OMP_LIST_COPY,
 OMP_LIST_DEVICEPTR,
  +  OMP_LIST_DATA_CLAUSE_FIRST = OMP_LIST_DEVICEPTR,
 OMP_LIST_DATA_CLAUSE_LAST = OMP_LIST_DEVICEPTR,
 OMP_LIST_DEVICE_RESIDENT,
 OMP_LIST_USE_DEVICE,
 
 I'll take care of this separately.

I have now committed the following to gomp-4_0-branch in r217353:

commit 782a3dab5694d561f80bda7a29000250a681781a
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Tue Nov 11 14:52:16 2014 +

Fortran OMP_LIST_* maintenance.

gcc/fortran/
* gfortran.h (OMP_LIST_COPY, OMP_LIST_DATA_CLAUSE_FIRST)
(OMP_LIST_DATA_CLAUSE_LAST, OMP_LIST_LAST): Remove.
* dump-parse-tree.c (show_omp_clauses): Update.
* openmp.c (resolve_omp_clauses, gfc_resolve_oacc_declare):
Likewise.
* trans-openmp.c (gfc_trans_omp_clauses): Likewise.
(gfc_trans_omp_map_clause_list): Remove.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217353 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/fortran/ChangeLog.gomp|  8 
 gcc/fortran/dump-parse-tree.c |  1 -
 gcc/fortran/gfortran.h|  6 +-
 gcc/fortran/openmp.c  | 42 ++
 gcc/fortran/trans-openmp.c| 31 ---
 5 files changed, 19 insertions(+), 69 deletions(-)

diff --git gcc/fortran/ChangeLog.gomp gcc/fortran/ChangeLog.gomp
index 1ae1d31..f846890 100644
--- gcc/fortran/ChangeLog.gomp
+++ gcc/fortran/ChangeLog.gomp
@@ -1,5 +1,13 @@
 2014-11-11  Thomas Schwinge  tho...@codesourcery.com
 
+   * gfortran.h (OMP_LIST_COPY, OMP_LIST_DATA_CLAUSE_FIRST)
+   (OMP_LIST_DATA_CLAUSE_LAST, OMP_LIST_LAST): Remove.
+   * dump-parse-tree.c (show_omp_clauses): Update.
+   * openmp.c (resolve_omp_clauses, gfc_resolve_oacc_declare):
+   Likewise.
+   * trans-openmp.c (gfc_trans_omp_clauses): Likewise.
+   (gfc_trans_omp_map_clause_list): Remove.
+
* gfortran.h (OMP_LIST_DEVICEPTR): Remove, and instead...
(enum gfc_omp_map_op): ... add OMP_MAP_FORCE_DEVICEPTR here.
* dump-parse-tree.c (show_omp_clauses): Update.
diff --git gcc/fortran/dump-parse-tree.c gcc/fortran/dump-parse-tree.c
index e7aff22..e9d04e7 100644
--- gcc/fortran/dump-parse-tree.c
+++ gcc/fortran/dump-parse-tree.c
@@ -1251,7 +1251,6 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
const char *type = NULL;
switch (list_type)
  {
- case OMP_LIST_COPY: type = "COPY"; break;
  case OMP_LIST_USE_DEVICE: type = "USE_DEVICE"; break;
  case OMP_LIST_DEVICE_RESIDENT: type = "USE_DEVICE"; break;
  case OMP_LIST_CACHE: type = ""; break;
diff --git gcc/fortran/gfortran.h gcc/fortran/gfortran.h
index 18adbee..aed37d3 100644
--- gcc/fortran/gfortran.h
+++ gcc/fortran/gfortran.h
@@ -1183,14 +1183,10 @@ enum
   OMP_LIST_TO,
   OMP_LIST_FROM,
   OMP_LIST_REDUCTION,
-  OMP_LIST_COPY,
-  OMP_LIST_DATA_CLAUSE_FIRST = OMP_LIST_COPY,
-  OMP_LIST_DATA_CLAUSE_LAST = OMP_LIST_DATA_CLAUSE_FIRST,
   OMP_LIST_DEVICE_RESIDENT,
   OMP_LIST_USE_DEVICE,
   OMP_LIST_CACHE,
-  OMP_LIST_NUM,
-  OMP_LIST_LAST = OMP_LIST_NUM
+  OMP_LIST_NUM
 };
 
 /* Because a symbol can belong to multiple namelists, they must be
diff --git gcc/fortran/openmp.c gcc/fortran/openmp.c
index 82726b8..47c146e 100644
--- gcc/fortran/openmp.c
+++ gcc/fortran/openmp.c
@@ -2870,11 +2870,8 @@ resolve_omp_clauses (gfc_code *code, locus *where,
   static const char *clause_names[]
 = { "PRIVATE", "FIRSTPRIVATE", "LASTPRIVATE", "COPYPRIVATE", "SHARED",
"COPYIN", "UNIFORM", "ALIGNED", "LINEAR", "DEPEND", "MAP",
-   "TO", "FROM", "REDUCTION",
-   "COPY", "COPYIN", "COPYOUT", "CREATE", "DELETE", "PRESENT",
-   "PRESENT_OR_COPY", "PRESENT_OR_COPYIN", "PRESENT_OR_COPYOUT",
-   "PRESENT_OR_CREATE", "DEVICE_RESIDENT", "USE_DEVICE",
-   "HOST", "DEVICE", "CACHE" };
+   "TO", "FROM", "REDUCTION", "DEVICE_RESIDENT", "USE_DEVICE",
+   "CACHE" };
 
   if (omp_clauses == NULL)
 return;
@@ -3231,15 +3228,6 @@ resolve_omp_clauses (gfc_code *code, locus *where,
  break;
}
 
-  

Re: [PATCH] AIX: Filename-based shared library versioning for libgcc_s

2014-11-11 Thread David Edelsohn
Michael,

Why does the configure change match with p*-*-aix... instead of power*
or powerpc*?  Yes, it's unique and will match, but why make it as
short as possible, which doesn't match other uses?

In your documentation, how are you distinguishing between Dynamic
Linking and Runtime Linking?

Thanks, David


On Mon, Nov 10, 2014 at 12:41 PM, Michael Haubenwallner
michael.haubenwall...@ssi-schaefer.com wrote:


 Am 2014-11-10 17:06, schrieb David Edelsohn:
 On Mon, Nov 10, 2014 at 4:59 AM, Michael Haubenwallner
 michael.haubenwall...@ssi-schaefer.com wrote:

 Am 2014-11-07 20:52, schrieb David Edelsohn:
 First, please explicitly copy me on AIX or PowerPC patches sent to 
 gcc-patches.

 I don't have a fundamental objection to including this option, but
 note that Richi, Honza and I have discovered that using AIX runtime
 linking option interacts badly with some GCC optimizations and can
 result in applications that hang in a loop.

 Feels like adding the aix-soname linking procedure becomes more important:

 All code on AIX is position independent (PIC) by default.  Executables
 and shared libraries essentially are PIE.  Because of this, AIX does
 not provide separate static libraries and one can link statically
 with a shared library.

 Creating a library enabled for runtime linking with -G (-brtl), causes
 a lot of problems, including a newly recognized failure mode.  Without
 careful control over AIX symbol export, all global calls will use
 glink code (equivalent to ELF PLTs). This also creates a TOC entry for
 every global call, possibly overflowing the TOC.

 About to define careful control over AIX symbol export:
 The symbols listed in the import file are those found in the object files
 only, not the ones created at linktime (like __GLOBAL*) or from the static
 objects found in libc.a. While I do this in libtool from the beginning here,
 I have had a helper script wrapping ld to support '--soname=' for 
 non-libtool
 packages, where creating the import file from the final shared object also
 included static libc-provided symbols, which turned out as dependency 
 pitfall.

 AIX added ELF-like visibility support to XCOFF, which would be
 preferred.  Except it was not added in a formal release, like AIX 8.1
 and apparently was back-ported to at least AIX 6.1, so it's difficult
 to phase in the support. One would need to add a configure test for
 the feature and not all users are upgrading the system. So one cannot
 build and distribute GCC for AIX 7.1 and know the feature is
 available in the system tools.  GCC builds would be incompatible and
 object files, libraries, executables created by GCC would be
 incompatible.  Basically, a mess.

 As I've seen the weak information on an older AIX 5.3 TL8 already:
 Is this visibility support something different than what nm -l or nm -P 
 shows?

 While I haven't focussed on nor explicitly tested, I do believe that this
 also solves problems with global C++ constructor/destructor call orders.

 Why? There still is the problem of the AIX kernel runtime loader
 ordering dependent shared objects.

 Feels like I indeed haven't dug deep enough into that topic yet:
 To be ignored here then.

 But the main problem is GCC uses aliases and functions declared as
 weak to support some C++ features.

 This is another reason why I do force runtime linking for our application,
 which uses these C++ features while its main target platform is Linux.

 You have not explained how this has any fix / benefit affecting the
 problem, other than separate shared and static libraries.  Forcing
 runtime linking seems irrelevant.  It was linking shared before and
 linking shared after your patch (with runtime linking) so the net
 effect is zero.

 My main reason here is to allow for *filename*-based sharedlib versioning,
 which I haven't been able to achieve without import files.
 In-archive versioning is a pita from a package manager's point of view.

 For a second reason:
 Due to its Linux-centric history (well, HP-UX and Solaris before), our
 application architecture does rely on runtime linking in some corner cases.
 This is why I force that for AIX in our development- and runtime-platform,
 which is similar to /opt/freeware/, but based on Gentoo Prefix.

 For a third reason (maybe I don't have deep enough insight as well):
 If I understand correctly, you switched to build libstdc++ without runtime
 linking, because of problems when linking statically against the rtl-enabled
 libstdc++, no? For this case, aix-soname incidentally does prevent shared
 objects built with runtime linking from being statically linked.

 For another reason: I can imagine providing an rtl_enable'd libc.so as
 well, to allow for easier use of memory debuggers that intercept the
 malloc/free & co. libc calls... But AFAICT these rely on every sharedlib
 to be built with runtime linking enabled.

 Again, runtime linking of all global symbols affects performance and
 bloats the TOC, making TOC overflow more 

Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1

2014-11-11 Thread Thomas Schwinge
Hi!

On Tue, 8 Jul 2014 07:28:24 -0700, Cesar Philippidis 
cesar_philippi...@mentor.com wrote:
 On 07/07/2014 02:55 AM, Thomas Schwinge wrote:
 
  On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis 
  cesar_philippi...@mentor.com wrote:
  This patch is the first step to enabling parallel reductions in openacc.

 I've committed this updated version
 of the patch.

In r217354, I just applied the following cleanup to gomp-4_0-branch:

commit 4fe8b3620b258ac904d9eade5f76dede69a80c98
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Tue Nov 11 14:52:26 2014 +

OpenACC reductions maintenance.

gcc/
* omp-low.c (maybe_lookup_reduction): Don't require an OpenACC
context.
(lower_oacc_offload): Simplify use of maybe_lookup_reduction.

gcc/
* omp-low.c (delete_omp_context): Dispose of reduction_map.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217354 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  6 ++
 gcc/omp-low.c  | 56 +-
 2 files changed, 36 insertions(+), 26 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index dacfad8..94a7f8c 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,11 @@
 2014-11-11  Thomas Schwinge  tho...@codesourcery.com
 
+   * omp-low.c (delete_omp_context): Dispose of reduction_map.
+
+   * omp-low.c (maybe_lookup_reduction): Don't require an OpenACC
+   context.
+   (lower_oacc_offload): Simplify use of maybe_lookup_reduction.
+
* omp-low.c (lower_omp_target): Restore two gcc_asserts.
 
 2014-11-06  Thomas Schwinge  tho...@codesourcery.com
diff --git gcc/omp-low.c gcc/omp-low.c
index c63ec4e..5695ec3 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -938,7 +938,7 @@ get_base_type (tree decl)
   return type;
 }
 
-/* Lookup variables in the decl or field splay trees.  The maybe form
+/* Lookup variables.  The maybe form
allows for the variable form to not have been entered, otherwise we
assert that the variable must have been entered.  */
 
@@ -975,17 +975,6 @@ lookup_sfield (tree var, omp_context *ctx)
 }
 
 static inline tree
-lookup_reduction (const char *id, omp_context *ctx)
-{
-  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
-
-  splay_tree_node n;
-  n = splay_tree_lookup (ctx->reduction_map,
-                         (splay_tree_key) id);
-  return (tree) n->value;
-}
-
-static inline tree
 maybe_lookup_field (tree var, omp_context *ctx)
 {
   splay_tree_node n;
@@ -994,14 +983,22 @@ maybe_lookup_field (tree var, omp_context *ctx)
 }
 
 static inline tree
+lookup_reduction (const char *id, omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  splay_tree_node n;
+  n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) id);
+  return (tree) n->value;
+}
+
+static inline tree
 maybe_lookup_reduction (tree var, omp_context *ctx)
 {
-  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
-
-  splay_tree_node n;
-  n = splay_tree_lookup (ctx->reduction_map,
-                         (splay_tree_key) var);
-  return n ?(tree) n->value : NULL_TREE;
+  splay_tree_node n = NULL;
+  if (ctx->reduction_map)
+    n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) var);
+  return n ? (tree) n->value : NULL_TREE;
 }
 
 /* Return true if DECL should be copied by pointer.  SHARED_CTX is
@@ -1574,6 +1571,11 @@ delete_omp_context (splay_tree_value value)
     splay_tree_delete (ctx->field_map);
   if (ctx->sfield_map)
     splay_tree_delete (ctx->sfield_map);
+  if (ctx->reduction_map
+      /* Shared over several omp_contexts.  */
+      && (ctx->outer == NULL
+          || ctx->reduction_map != ctx->outer->reduction_map))
+    splay_tree_delete (ctx->reduction_map);
 
   /* We hijacked DECL_ABSTRACT_ORIGIN earlier.  We need to clear it before
  it produces corrupt debug information.  */
@@ -10481,10 +10483,14 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
|| (OMP_CLAUSE_MAP_KIND (c)
!= OMP_CLAUSE_MAP_FORCE_DEVICEPTR)
|| TREE_CODE (TREE_TYPE (ovar)) != ARRAY_TYPE);
-   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
-       && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER
-       && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
-       && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE)
+   if (maybe_lookup_reduction (var, ctx))
+     {
+       gimplify_assign (x, var, &ilist);
+     }
+   else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+            && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER
+            && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
+            && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE)
  {
tree avar
   

Re: Fix libtool.m4 for Darwin >= 10.10

2014-11-11 Thread Jakub Jelinek
On Tue, Nov 11, 2014 at 03:59:03PM +0100, FX wrote:
 
  Since you are simply patching all the configure files, the question
  seems academic unless you switch to properly regenerating all of the
  configure files using a fixed libtool.m4.
 
 I am actually proposing to fix libtool.m4 and regenerate the configure 
 scripts (which gives the same result as patching, as expected).
 
 
  However these are maintained, the libjava configure files still need
  to be patched to prevent their associated shared libraries from being
  inappropriately linked with -flat_namespace on darwin14 and later.
 
 Yes, but I don’t know whether libjava and classpath should be patched in
 GCC, or whether I should report them to be patched somewhere else (like
 libgo and zlib, for example).  It’s important to do it properly, otherwise
 codebases diverge and maintenance becomes difficult.

libjava is maintained in GCC, libjava/classpath, while imported
occasionally from upstream, would upon merge result in regenerating the
generated files and thus should be patched too.  For the latter,
you should put it into libjava/classpath/ChangeLog.gcj.

Jakub


Re: RFC: Update ISL under gcc/infrastructure/ ? // Remove CLooG?

2014-11-11 Thread Jack Howarth
Tobias,
The only new regression seen in gcc trunk when using isl 0.14
with my mockup isl_0.14.diff patch is the failure...

UNRESOLVED: gcc.dg/graphite/isl-codegen-loop-dumping.c
scan-tree-dump-times graphite ISL AST generated by ISL: \\nfor
(int c1 = 0; c1  n - 1; c1 += 1)\\n  for (int c3 = 0;
c3  n; c3 += 1)\\nS_4(c1, c3); 1

at both -m32/-m64. Is this really a regression or simply detection of
a change in the tree-dump generated by isl 0.14?
Jack
ps I was under the impression that these later versions of isl were
supposed to have improved performance that potentially would result in
changes in such tree-dumps.

On Mon, Nov 10, 2014 at 8:40 PM, Jack Howarth howarth.at@gmail.com wrote:
 On x86_64-apple-darwin14, the attached patch allows gcc trunk to
 build against isl 0.14. I assume if we want to retain the...

 #if defined(__cplusplus)
 extern "C" {
 #endif

 #if defined(__cplusplus)
 }
 #endif

 wrappers around the include of <isl/val_gmp.h>, to continue to support
 isl 0.12.2, isl.m4 will need to test for isl <= 0.12.2 and set a
 define in autohost.h that can be added to the conditional on
 __cplusplus. The same define would have to be used in a conditional for
 selecting code changes required for using...

 if (isl_band_member_is_zero_distance (Band, i))

 in gcc/graphite-optimize-isl.c for isl <= 0.12.2 rather than...

 if (isl_band_member_is_coincident (Band, i))

 and the other associated changes for isl > 0.12.2.
 Jack
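
A rough sketch of the conditional described above. GRAPHITE_ISL_PRE_0_13 is a hypothetical macro name standing in for whatever define a configure test would set when isl <= 0.12.2 is found, and GRAPHITE_BAND_MEMBER_PARALLEL is likewise made up for illustration; only the two isl function names are taken from the message.

```c
/* GRAPHITE_ISL_PRE_0_13 is hypothetical: assume configure defines it
   when the detected isl is 0.12.2 or older.  */

#if defined(__cplusplus) && defined(GRAPHITE_ISL_PRE_0_13)
extern "C" {
#endif

/* The include of <isl/val_gmp.h> would go here.  */

#if defined(__cplusplus) && defined(GRAPHITE_ISL_PRE_0_13)
}
#endif

/* Pick the band query that matches the isl version in use.  */
#ifdef GRAPHITE_ISL_PRE_0_13
# define GRAPHITE_BAND_MEMBER_PARALLEL(band, i) \
    isl_band_member_is_zero_distance ((band), (i))
#else
# define GRAPHITE_BAND_MEMBER_PARALLEL(band, i) \
    isl_band_member_is_coincident ((band), (i))
#endif
```

The call site in gcc/graphite-optimize-isl.c could then use the single macro instead of repeating the conditional at each use.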
 ps The changes in gcc/graphite-optimize-isl.c are modelled on those in
 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191650#c6.

 pps The test suite results for make -k check
 RUNTESTFLAGS=graphite.exp --target_board=unix'{-m32,-m64}' are...

 LAST_UPDATED: Obtained from SVN: trunk revision 217269

 Native configuration is x86_64-apple-darwin13.4.0

 === g++ tests ===

 Running target unix/-m32

 === g++ Summary for unix/-m32 ===

 # of expected passes 27

 Running target unix/-m64

 === g++ Summary for unix/-m64 ===

 # of expected passes 27

 === g++ Summary ===

 # of expected passes 54
 /sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/gcc/testsuite/g++/../../xg++
  version 5.0.0 20141109 (experimental) (GCC)

 === gcc tests ===

 Running target unix/-m32
 FAIL: gcc.dg/graphite/vect-pr43423.c scan-tree-dump-times vect
 vectorized 2 loops 1
 UNRESOLVED: gcc.dg/graphite/isl-codegen-loop-dumping.c
 scan-tree-dump-times graphite ISL AST generated by ISL: \\nfor
 (int c1 = 0; c1  n - 1; c1 += 1)\\n  for (int c3 = 0;
 c3  n; c3 += 1)\\nS_4(c1, c3); 1

 === gcc Summary for unix/-m32 ===

 # of expected passes 299
 # of unexpected failures 1
 # of expected failures 4
 # of unresolved testcases 1
 # of unsupported tests 5

 Running target unix/-m64
 FAIL: gcc.dg/graphite/vect-pr43423.c scan-tree-dump-times vect
 vectorized 2 loops 1
 UNRESOLVED: gcc.dg/graphite/isl-codegen-loop-dumping.c
 scan-tree-dump-times graphite ISL AST generated by ISL: \\nfor
 (int c1 = 0; c1  n - 1; c1 += 1)\\n  for (int c3 = 0;
 c3  n; c3 += 1)\\nS_4(c1, c3); 1

 === gcc Summary for unix/-m64 ===

 # of expected passes 299
 # of unexpected failures 1
 # of expected failures 4
 # of unresolved testcases 1
 # of unsupported tests 5

 === gcc Summary ===

 # of expected passes 598
 # of unexpected failures 2
 # of expected failures 8
 # of unresolved testcases 2
 # of unsupported tests 10
 /sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/gcc/xgcc  version
 5.0.0 20141109 (experimental) (GCC)

 === gfortran tests ===

 Running target unix/-m32

 === gfortran Summary for unix/-m32 ===

 # of expected passes 112
 # of expected failures 14

 Running target unix/-m64

 === gfortran Summary for unix/-m64 ===

 # of expected passes 110
 # of expected failures 14
 # of unsupported tests 2

 === gfortran Summary ===

 # of expected passes 222
 # of expected failures 28
 # of unsupported tests 2
 /sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/gcc/testsuite/gfortran/../../gfortran
  version 5.0.0 20141109 (experimental) (GCC)

 === libgomp tests ===

 Running target unix/-m32

 === libgomp Summary for unix/-m32 ===

 # of expected passes 49

 Running target unix/-m64

 === libgomp Summary for unix/-m64 ===

 # of expected passes 49

 === libgomp Summary ===

 # of expected passes 98

 Compiler version: 5.0.0 20141109 (experimental) (GCC)
 Platform: x86_64-apple-darwin13.4.0
 configure flags: --prefix=/sw --prefix=/sw/lib/gcc5.0
 --mandir=/sw/share/man --infodir=/sw/lib/gcc5.0/info
 --enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw
 --with-libiconv-prefix=/sw --with-isl=/sw --without-cloog
 --with-mpc=/sw --with-system-zlib --x-includes=/usr/X11R6/include
 --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-5.0


 On Mon, Nov 10, 2014 at 2:27 PM, Jack Howarth howarth.at@gmail.com 
 wrote:
  Is the current isl 0.12.2 in infrastructure 

Re: [C++ Patch] PR 63265

2014-11-11 Thread Jason Merrill

OK.

Jason


Re: [GRAPHITE, PATCH] Loop unroll and jam optimization

2014-11-11 Thread Mircea Namolaru

Many thanks. Here is the new patch that fixes the main problem of the previous
one (i.e. separation of the loop after unroll and jam) as well as the problems
raised by you (see comments below).

Now the code with the separation class option looks: 

ISL AST generated by ISL: 
{
  for (int c0 = 0; c0 < HEIGHT - 4; c0 += 4)
    for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
      for (int c2 = c0; c2 <= c0 + 3; c2 += 1)
        S_4(c2, c1);
  for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
    for (int c2 = -((HEIGHT - 1) % 4) + HEIGHT - 1; c2 < HEIGHT; c2 += 1)
      S_4(c2, c1);
}
 
I tried the unroll option for AST, two loops are unrolled 
and the code looks like:

ISL AST generated by ISL: 
{
  for (int c0 = 0; c0 < HEIGHT - 4; c0 += 4)
    for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) {
      S_4(c0, c1);
      S_4(c0 + 1, c1);
      S_4(c0 + 2, c1);
      S_4(c0 + 3, c1);
    }
  for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) {
    S_4(-((HEIGHT - 1) % 4) + HEIGHT - 1, c1);
    if ((HEIGHT - 1) % 4 >= 1) {
      S_4(-((HEIGHT - 1) % 4) + HEIGHT, c1);
      if ((HEIGHT - 1) % 4 >= 2) {
        S_4(-((HEIGHT - 1) % 4) + HEIGHT + 1, c1);
        if (HEIGHT % 4 == 0)
          S_4(HEIGHT - 1, c1);
      }
    }
  }
}

As I don't quite like the unrolling of the second loop, and the standard GCC
unrolling is able to unroll the first one, I decided not to perform the
unrolling within graphite. But if desirable, it could be done.
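
To make the transformation concrete, here is a hand-written C sketch of unroll and jam by a factor of 4 with a separate remainder nest. It is only an illustration: the array sizes, the statement body and all function names are invented, and the split between the two nests is the textbook one (the main nest takes every full group of four rows), which differs slightly from the bounds in the ISL output above.

```c
#define HEIGHT 10
#define LENGTH 7

/* Stand-in for the statement S_4; any independent per-element update
   would do.  */
static void
stmt (int a[HEIGHT][LENGTH], int row, int col)
{
  a[row][col] += 3 * row + col;
}

/* Original nest: outer loop over rows, inner loop over columns.  */
static void
plain_nest (int a[HEIGHT][LENGTH])
{
  for (int row = 0; row < HEIGHT; row++)
    for (int col = 0; col < LENGTH; col++)
      stmt (a, row, col);
}

/* Outer loop unrolled by 4, with the four copies of the inner loop
   fused ("jammed") into a single inner loop.  */
static void
unroll_and_jam_4 (int a[HEIGHT][LENGTH])
{
  int row;

  /* Full groups of four rows.  */
  for (row = 0; row + 3 < HEIGHT; row += 4)
    for (int col = 0; col < LENGTH; col++)
      {
        stmt (a, row, col);
        stmt (a, row + 1, col);
        stmt (a, row + 2, col);
        stmt (a, row + 3, col);
      }

  /* Remainder rows, left as an ordinary nest.  */
  for (; row < HEIGHT; row++)
    for (int col = 0; col < LENGTH; col++)
      stmt (a, row, col);
}

/* Returns 1 if both versions produce identical arrays.  */
static int
versions_agree (void)
{
  int x[HEIGHT][LENGTH] = { { 0 } };
  int y[HEIGHT][LENGTH] = { { 0 } };

  plain_nest (x);
  unroll_and_jam_4 (y);
  for (int row = 0; row < HEIGHT; row++)
    for (int col = 0; col < LENGTH; col++)
      if (x[row][col] != y[row][col])
        return 0;
  return 1;
}
```

Reordering the iterations this way is only valid because each call to stmt touches a distinct element; a statement with dependences across rows would need a legality check first.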

The patch remains basically the same: two maps are built, one for the regular
unroll and jam (i.e. stride mining) and the other for computing the separating
class (i.e. its image is the image of the full tiles on the strided dimension).
In graphite-isl-ast-to-gimple.c, these two maps are used to build the
separation class option and fix the scheduling.

The main differences from the previous patch are that the separating class
option is set on a different dimension and a constraint was added to the map
used to build the separating_class.

Now some comments to your message:
 
 
 I'm not sure if Tobi or Albert have told you, but the separation_class option
 is going to be phased out since its design is fundamentally flawed.
 If you can't wait until isl-0.15, then I guess you have no choice but
 to use this option, but you should realize that it will remain frozen
 in its current broken state (until it is removed at some point).
 

No, I didn't know about the phase-out of the separation_class option. Anyway,
for the time being it is the best solution available. My understanding is that
this option should always generate correct code, as long as the scheduling is
correct, but I think I had some cases where setting the separating_class led
to incorrect code.

For isl 0.15, do you intend to provide some option with similar functionality?
 
  +  /* Extract the original and auxiliary maps from pbb->transformed.
  + Set pbb->transformed to the original map. */
  +  psmap = smap;
  +  psmap->n = 0;
  +  res = isl_map_foreach_basic_map (pbb->transformed, separate_map,
  (void *)psmap);
  +  gcc_assert (res == 0);
  +
  +  isl_map_free(pbb->transformed);
  +  pbb->transformed = isl_map_copy(psmap->map_arr[0]);
  +
 
 I have no idea what this pbb-transformed is supposed to represent,
 but you appear to be assuming that it has exactly two disjuncts and that
 they appear in some order.  Now, perhaps you have explicitly
 checked that this map has two disjuncts, but then you should
 probably bring the check closer since any operation on sets that
 you perform could change the internal representation (even of
 other sets).  However, in no way can you assume that
 isl_map_foreach_basic_map will iterate over these disjuncts
 in any specific order.


At this point pbb->transformed has two basic maps: one is the mapping for
unroll and jam, and one is for the full tile for the strided dimension. I will
introduce a check that differentiates between them, as the image of one map
should be included in the image of the other.

In fact, to prevent any isl side effects, I thought of introducing a new field
pbb->transformed_full in the pbb structure, to be on the safe side.
 
 
Index: gcc/toplev.c
===
--- gcc/toplev.c	(revision 217013)
+++ gcc/toplev.c	(working copy)
@@ -1302,11 +1302,12 @@
   || flag_loop_block
   || flag_loop_interchange
   || flag_loop_strip_mine
-  || flag_loop_parallelize_all)
+  || flag_loop_parallelize_all
+  || flag_loop_unroll_jam)
 sorry (Graphite loop optimizations cannot be used (ISL is not available) 
 	   (-fgraphite, -fgraphite-identity, -floop-block, 
 	   -floop-interchange, -floop-strip-mine, -floop-parallelize-all, 
-	   and -ftree-loop-linear));
+	   -floop-unroll-and-jam, and -ftree-loop-linear));
 #endif
 
   /* One region RA really helps to decrease the code size.  */
Index: gcc/graphite-optimize-isl.c
===
--- 

Re: [GRAPHITE, PATCH] Loop unroll and jam optimization

2014-11-11 Thread Mircea Namolaru
Changed the option to -floop-unroll-and-jam as you suggested.

  The patch takes advantage of the new isl based code generator introduced
  recently
  in GCC (in fact of the possible options for building the AST).
 
  The code generated for this optimization in the case of non-constant loop
  bounds
  initially looks as below. This is not very useful because the standard GCC
  unrolling does not succeed in unrolling the innermost loop.
 
  ISL AST generated by ISL:
  for (int c0 = 0; c0 < HEIGHT; c0 += 4)
    for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
      for (int c2 = c0; c2 <= min(HEIGHT - 1, c0 + 3); c2 += 1)
 
 Hmm, so this iterates at most 4 times, right?  Eventually the body is
 considered
 too large by GCC or it fails to compute an upper bound for the number
 of iterations.
 Is that (an upper bound for the number of iterations) available readily from
 ISL
 at code-generation time?  If so you can transfer this knowledge to the GCC
 loop
 information.
 

The problem was not explained well. It is not only the unrolling, it is also
the loop separation (which the latest version of the patch does). Even if the
GCC unrolling succeeds in unrolling the inner loop, you will get code similar
to that obtained by the previous version of this patch, which is not what is
wanted.

Last time I checked, GCC unrolling was not able to unroll the inner loop.
In my opinion it is the min and max that prevent it (graphite emits such code
for blocking, strip-mine, and unroll and jam). The bounds of the iteration
domain are expressed in min, max terms.
  
 I'm curious to see a testcase (and a way to generate the above form) to see
 what
 is actually the problem.
 

Of course. Take the code from the unroll-and-jam patch and the attached test
case (but, as said, other graphite options will generate similar code). Somehow
it seems that the new isl based code generator can handle such transformations
more easily.

Mircea



 Thanks,
 Richard.
 
S_4(c2, c1);
 
  Now, the separating class option (set for unroll and jam) produces this
  nice loop
  structure:
  ISL AST generated by ISL:
  for (int c0 = 0; c0 < HEIGHT; c0 += 4)
    for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
      if (HEIGHT >= c0 + 4) {
        for (int c2 = c0; c2 <= c0 + 3; c2 += 1)
          S_4(c2, c1);
      } else
        for (int c2 = c0; c2 < HEIGHT; c2 += 1)
          S_4(c2, c1);
 
  The unroll option (set for unroll and jam) produces:
  ISL AST generated by ISL:
  for (int c0 = 0; c0 < HEIGHT; c0 += 4)
    for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
      if (HEIGHT >= c0 + 4) {
        S_4(c0, c1);
        S_4(c0 + 1, c1);
        S_4(c0 + 2, c1);
        S_4(c0 + 3, c1);
      } else {
        S_4(c0, c1);
        if (HEIGHT >= c0 + 2) {
          S_4(c0 + 1, c1);
          if (4 * floord(HEIGHT - 3, 4) + 3 == HEIGHT && c0 + 3 == HEIGHT)
            S_4(HEIGHT - 1, c1);
        }
      }
 
  The separate option (set by default for all dimensions for the new isl
  based code generator)
  does not succeed in removing the ifs from the loops and generating two loop
  structures (this would
  have been highly desirable).
 
  As the stage 1 is going to close soon, quick feedback to this patch is
  greatly appreciated.
  Many thanks, Mircea Namolaru
 
int
f1(int v[1024][1024], int HEIGHT, int LENGTH)
{
  int i, j;

  for (i=0; i<HEIGHT; i++) {
    for (j=3; j<LENGTH; j++) {
      v[i][j] = v[i][j-3] + v[i][j-2] + v[i][j];
    }
  }

}
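
For reference, the loop structure that the separating class is aiming at can be
written by hand in plain C. The sketch below is only an illustration, not
output from graphite: the names N, kernel_ref, kernel_unroll_jam and
unroll_jam_matches_reference are made up here, the array is unsigned so that
wraparound stays well defined, and the unroll factor of 4 is taken from the
ASTs quoted above.

```c
#include <assert.h>
#include <string.h>

#define N 64  /* hypothetical array extent, chosen only for this sketch */

/* Reference kernel: same loop shape as the f1 test case.  */
static void
kernel_ref (unsigned v[N][N], int height, int length)
{
  for (int i = 0; i < height; i++)
    for (int j = 3; j < length; j++)
      v[i][j] = v[i][j - 3] + v[i][j - 2] + v[i][j];
}

/* Hand-written unroll and jam by 4 on the outer loop, with the full
   tiles separated from the remainder so the hot loop contains no ifs.  */
static void
kernel_unroll_jam (unsigned v[N][N], int height, int length)
{
  int i = 0;

  /* Full tiles: four rows advance together inside one inner loop.  */
  for (; i + 4 <= height; i += 4)
    for (int j = 3; j < length; j++)
      {
        v[i][j] = v[i][j - 3] + v[i][j - 2] + v[i][j];
        v[i + 1][j] = v[i + 1][j - 3] + v[i + 1][j - 2] + v[i + 1][j];
        v[i + 2][j] = v[i + 2][j - 3] + v[i + 2][j - 2] + v[i + 2][j];
        v[i + 3][j] = v[i + 3][j - 3] + v[i + 3][j - 2] + v[i + 3][j];
      }

  /* Remainder: at most three leftover rows, handled one at a time.  */
  for (; i < height; i++)
    for (int j = 3; j < length; j++)
      v[i][j] = v[i][j - 3] + v[i][j - 2] + v[i][j];
}

/* Run both kernels on identical input; return 1 if the results agree.  */
static int
unroll_jam_matches_reference (int height, int length)
{
  static unsigned a[N][N], b[N][N];

  assert (height >= 0 && height <= N && length >= 0 && length <= N);
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      a[i][j] = b[i][j] = (unsigned) (i * 31 + j * 7 + 1);

  kernel_ref (a, height, length);
  kernel_unroll_jam (b, height, length);
  return memcmp (a, b, sizeof a) == 0;
}
```

The jam is legal here because rows are independent of each other and the order
of the j iterations within each row is preserved.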


[x86, merge] Replace builtins with vector extensions

2014-11-11 Thread Marc Glisse

Hello,

here is the combined patch+ChangeLog. I'll run a last regtest just before 
committing. Ok for trunk?


2014-11-12  Marc Glisse  marc.gli...@inria.fr

gcc/
* config/i386/xmmintrin.h (_mm_add_ps, _mm_sub_ps, _mm_mul_ps,
_mm_div_ps, _mm_store_ss, _mm_cvtss_f32): Use vector extensions
instead of builtins.
* config/i386/emmintrin.h (__v2du, __v4su, __v8hu, __v16qu): New
typedefs.
(_mm_sqrt_sd): Fix comment.
(_mm_add_epi8, _mm_add_epi16, _mm_add_epi32, _mm_add_epi64,
_mm_sub_epi8, _mm_sub_epi16, _mm_sub_epi32, _mm_sub_epi64,
_mm_mullo_epi16, _mm_cmpeq_epi8, _mm_cmpeq_epi16, _mm_cmpeq_epi32,
_mm_cmplt_epi8, _mm_cmplt_epi16, _mm_cmplt_epi32, _mm_cmpgt_epi8,
_mm_cmpgt_epi16, _mm_cmpgt_epi32, _mm_and_si128, _mm_or_si128,
_mm_xor_si128, _mm_store_sd, _mm_cvtsd_f64, _mm_storeh_pd,
_mm_cvtsi128_si64, _mm_cvtsi128_si64x, _mm_add_pd, _mm_sub_pd,
_mm_mul_pd, _mm_div_pd, _mm_storel_epi64, _mm_movepi64_pi64):
Use vector extensions instead of builtins.
* config/i386/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64,
_mm_mullo_epi32): Likewise.
* config/i386/avxintrin.h (__v4du, __v8su, __v16hu, __v32qu):
New typedefs.
(_mm256_add_pd, _mm256_add_ps, _mm256_div_pd, _mm256_div_ps,
_mm256_mul_pd, _mm256_mul_ps, _mm256_sub_pd, _mm256_sub_ps):
Use vector extensions instead of builtins.
* config/i386/avx2intrin.h (_mm256_cmpeq_epi8, _mm256_cmpeq_epi16,
_mm256_cmpeq_epi32, _mm256_cmpeq_epi64, _mm256_cmpgt_epi8,
_mm256_cmpgt_epi16, _mm256_cmpgt_epi32, _mm256_cmpgt_epi64,
_mm256_and_si256, _mm256_or_si256, _mm256_xor_si256, _mm256_add_epi8,
_mm256_add_epi16, _mm256_add_epi32, _mm256_add_epi64,
_mm256_mullo_epi16, _mm256_mullo_epi32, _mm256_sub_epi8,
_mm256_sub_epi16, _mm256_sub_epi32, _mm256_sub_epi64): Likewise.
* config/i386/avx512fintrin.h (__v8du, __v16su, __v32hu, __v64qu):
New typedefs.
(_mm512_or_si512, _mm512_or_epi32, _mm512_or_epi64, _mm512_xor_si512,
_mm512_xor_epi32, _mm512_xor_epi64, _mm512_and_si512,
_mm512_and_epi32, _mm512_and_epi64, _mm512_mullo_epi32,
_mm512_add_epi64, _mm512_sub_epi64, _mm512_add_epi32,
_mm512_sub_epi32, _mm512_add_pd, _mm512_add_ps, _mm512_sub_pd,
_mm512_sub_ps, _mm512_mul_pd, _mm512_mul_ps, _mm512_div_pd,
_mm512_div_ps): Use vector extensions instead of builtins.
* config/i386/avx512bwintrin.h (_mm512_mullo_epi16, _mm512_add_epi8,
_mm512_sub_epi8, _mm512_sub_epi16, _mm512_add_epi16): Likewise.
* config/i386/avx512dqintrin.h (_mm512_mullo_epi64): Likewise.
* config/i386/avx512vldqintrin.h (_mm256_mullo_epi64, _mm_mullo_epi64):
Likewise.

gcc/testsuite/
* gcc.target/i386/intrinsics_opt-1.c: New testcase.
* gcc.target/i386/intrinsics_opt-2.c: Likewise.
* gcc.target/i386/intrinsics_opt-3.c: Likewise.
* gcc.target/i386/intrinsics_opt-4.c: Likewise.

--
Marc Glisse
diff -ru -N -x .svn trunk/gcc/config/i386/avx2intrin.h intrin/gcc/config/i386/avx2intrin.h
--- trunk/gcc/config/i386/avx2intrin.h  2014-04-01 07:34:06.335878860 +0200
+++ intrin/gcc/config/i386/avx2intrin.h 2014-11-10 21:56:37.040719810 +0100
@@ -104,28 +104,28 @@
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi8 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_paddb256 ((__v32qi)__A, (__v32qi)__B);
+  return (__m256i) ((__v32qu)__A + (__v32qu)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi16 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_paddw256 ((__v16hi)__A, (__v16hi)__B);
+  return (__m256i) ((__v16hu)__A + (__v16hu)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi32 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_paddd256 ((__v8si)__A, (__v8si)__B);
+  return (__m256i) ((__v8su)__A + (__v8su)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi64 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_paddq256 ((__v4di)__A, (__v4di)__B);
+  return (__m256i) ((__v4du)__A + (__v4du)__B);
 }
 
 extern __inline __m256i
@@ -178,7 +178,7 @@
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_and_si256 (__m256i __A, __m256i __B)
 {
-  return (__m256i) __builtin_ia32_andsi256 ((__v4di)__A, (__v4di)__B);
+  return (__m256i) ((__v4du)__A & (__v4du)__B);
 }
 
 extern __inline __m256i
@@ -230,59 +230,56 @@
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi8 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_pcmpeqb256 ((__v32qi)__A, (__v32qi)__B);
+  return (__m256i) ((__v32qi)__A == (__v32qi)__B);
 }
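
Why the additions go through the unsigned element types (__v32qu and friends)
while the comparisons keep the signed ones can be seen in a small standalone
example. This is a sketch using GCC's generic vector extensions; v4si and v4su
are local typedefs that mirror, but are not, the header's types, and the
add_lane and cmpeq_lane helpers are made up for the illustration.

```c
#include <limits.h>

typedef int v4si __attribute__ ((vector_size (16)));
typedef unsigned int v4su __attribute__ ((vector_size (16)));

/* Element-wise add done on unsigned lanes: it wraps modulo 2^32, as the
   hardware instruction does, without signed-overflow undefined behaviour.  */
static v4si
add_epi32 (v4si a, v4si b)
{
  return (v4si) ((v4su) a + (v4su) b);
}

/* Vector == yields -1 in lanes that compare equal and 0 elsewhere.  */
static v4si
cmpeq_epi32 (v4si a, v4si b)
{
  return (v4si) (a == b);
}

/* Broadcast X and Y, add them, and return lane 0 of the result.  */
static int
add_lane (int x, int y)
{
  v4si a = { x, x, x, x };
  v4si b = { y, y, y, y };
  v4si r = add_epi32 (a, b);
  return r[0];
}

/* Broadcast X and Y, compare them, and return lane 0 of the mask.  */
static int
cmpeq_lane (int x, int y)
{
  v4si a = { x, x, x, x };
  v4si b = { y, y, y, y };
  v4si r = cmpeq_epi32 (a, b);
  return r[0];
}
```

With this, add_lane (INT_MAX, 1) wraps to INT_MIN, which a signed vector
addition would not be allowed to rely on.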
 
 

Re: [PATCH] Fix PR56480 aka DR374. Allow explicit specialization in enclosing namespace.

2014-11-11 Thread Jason Merrill

On 11/08/2014 06:57 AM, Markus Trippelsdorf wrote:

+++ b/gcc/testsuite/g++.old-deja/g++.pt/explicit73.C
@@ -7,9 +7,9 @@
 // the template

 namespace N {
-  template <class T> class foo;  // { dg-error "" } referenced below
+  template <class T> class foo;  // { dg-error "" { target { ! c++11 } } } referenced below
 }

 using namespace N;

-template <> class foo<void>; // { dg-error "" } invalid specialization
+template <> class foo<void>; // { dg-error "" { target { ! c++11 } } } invalid specialization


This should still get an error in C++11 mode.

I think we also need to test this:

namespace A {
  namespace B {
    template <class T> void f();
  }
  using namespace B;
}

template void A::f(); // { dg-error  }

I think your code won't catch this, because we need to know what the 
explicit namespace was, not just whether there was one.


Can we handle this in check_explicit_specialization rather than all the 
way down in register_specialization?


Jason



Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1

2014-11-11 Thread Thomas Schwinge
Hi!

On Tue, 11 Nov 2014 16:03:05 +0100, I wrote:
 On Tue, 8 Jul 2014 07:28:24 -0700, Cesar Philippidis 
 cesar_philippi...@mentor.com wrote:
  On 07/07/2014 02:55 AM, Thomas Schwinge wrote:
  
   On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis 
   cesar_philippi...@mentor.com wrote:
   This patch is the first step to enabling parallel reductions in openacc.
 
  I've committed this updated version
  of the patch.
 
 In r217354, I just applied the following cleanup to gomp-4_0-branch:
 
 commit 4fe8b3620b258ac904d9eade5f76dede69a80c98
 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
 Date:   Tue Nov 11 14:52:26 2014 +
 
 OpenACC reductions maintenance.
 
   gcc/
   * omp-low.c (maybe_lookup_reduction): Don't require an OpenACC
   context.
   (lower_oacc_offload): Simplify use of maybe_lookup_reduction.
 
   gcc/
   * omp-low.c (delete_omp_context): Dispose of reduction_map.
 
 git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217354 
 138bc75d-0d04-0410-961f-82ee72b054a4

I further tried to tidy this up as follows -- but that is causing the
reduction execution tests to fail; indeed -fdump-tree-all already shows
unexpected changes during gimplification.  (I first suspected that
variables are added to a GIMPLE_OMP_FOR reduction_map, and then not
found when reading from GIMPLE_OACC_PARALLEL one, but now I'm not at
all sure about this theory.)  Cesar, is cleanup like that useful at all,
and if yes, could you look into that, later on?  (Definitely not urgent.)

commit 3ef04b65c1b5d3db5aa4b903a1ec0f693bb75ca8
Author: Thomas Schwinge tho...@codesourcery.com
Date:   Tue Nov 11 13:04:00 2014 +0100

[WIP] Make reduction_map per context.
---
 gcc/omp-low.c | 41 +
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index 5695ec3..44ed9a0 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -987,8 +987,19 @@ lookup_reduction (const char *id, omp_context *ctx)
 {
   gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
 
-  splay_tree_node n;
-  n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) id);
+  splay_tree_node n = NULL;
+  do
+{
+  if (ctx->reduction_map != NULL)
+   n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) id);
+  if (n != NULL)
+   break;
+  /* If not found, recurse into outer context.  */
+  ctx = ctx->outer;
+}
+  while (ctx != NULL
+/* && ctx->reduction_map != NULL */);
+  gcc_assert (n != NULL);
   return (tree) n->value;
 }
 
@@ -996,8 +1007,17 @@ static inline tree
 maybe_lookup_reduction (tree var, omp_context *ctx)
 {
   splay_tree_node n = NULL;
-  if (ctx->reduction_map)
-n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) var);
+  do
+{
+  if (ctx->reduction_map != NULL)
+   n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) var);
+  if (n != NULL)
+   break;
+  /* If not found, recurse into outer context.  */
+  ctx = ctx->outer;
+}
+  while (ctx != NULL
+/* && ctx->reduction_map != NULL */);
   return n ? (tree) n->value : NULL_TREE;
 }
 
@@ -1498,8 +1518,6 @@ new_omp_context (gimple stmt, omp_context *outer_ctx)
   ctx->cb = outer_ctx->cb;
   ctx->cb.block = NULL;
   ctx->depth = outer_ctx->depth + 1;
-  /* FIXME: handle reductions recursively.  */
-  ctx->reduction_map = outer_ctx->reduction_map;
 }
   else
 {
@@ -1513,7 +1531,6 @@ new_omp_context (gimple stmt, omp_context *outer_ctx)
   ctx->cb.eh_lp_nr = 0;
   ctx->cb.transform_call_graph_edges = CB_CGE_MOVE;
   ctx->depth = 1;
-  //TODO ctx->reduction_map = TODO;
 }
 
   ctx->cb.decl_map = new hash_map<tree, tree>;
@@ -1571,10 +1588,7 @@ delete_omp_context (splay_tree_value value)
 splay_tree_delete (ctx->field_map);
   if (ctx->sfield_map)
 splay_tree_delete (ctx->sfield_map);
-  if (ctx->reduction_map
-  /* Shared over several omp_contexts.  */
-  && (ctx->outer == NULL
- || ctx->reduction_map != ctx->outer->reduction_map))
+  if (ctx->reduction_map)
 splay_tree_delete (ctx->reduction_map);
 
   /* We hijacked DECL_ABSTRACT_ORIGIN earlier.  We need to clear it before
@@ -1765,6 +1779,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  install_var_local (array, c);
 
  /* Insert it into the current context.  */
+ //TODO
  splay_tree_insert (ctx->reduction_map,
 (splay_tree_key) omp_get_id(var),
 (splay_tree_value) array);
@@ -2394,8 +2409,8 @@ scan_oacc_offload (gimple stmt, omp_context *outer_ctx)
   DECL_ARTIFICIAL (name) = 1;
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx->record_type) = name;
-  create_omp_child_function (ctx, false);
   ctx->reduction_map = splay_tree_new (splay_tree_compare_pointers, 0, 0);
+  create_omp_child_function (ctx, false);
 
   gimple_omp_set_child_fn (stmt, 
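
The lookup scheme in the WIP patch above (consult the context's own map, then
walk to the outer context) can be modelled in isolation. The following is a
toy sketch only: struct toy_ctx, the array-based map and all function names
are hypothetical stand-ins for omp_context and the splay tree, not code from
omp-low.c.

```c
#include <stddef.h>

/* Toy stand-in for omp_context: each context may own a small map and
   points to its enclosing context.  */
struct toy_ctx
{
  struct toy_ctx *outer;
  const int *keys;    /* NULL when this context owns no map */
  const int *values;
  size_t n;
};

/* Search only CTX's own map; return a pointer to the value or NULL.  */
static const int *
toy_map_lookup (const struct toy_ctx *ctx, int key)
{
  if (ctx->keys == NULL)
    return NULL;
  for (size_t i = 0; i < ctx->n; i++)
    if (ctx->keys[i] == key)
      return &ctx->values[i];
  return NULL;
}

/* Walk from CTX outwards and return the first match, or NULL.  */
static const int *
toy_maybe_lookup (const struct toy_ctx *ctx, int key)
{
  for (; ctx != NULL; ctx = ctx->outer)
    {
      const int *v = toy_map_lookup (ctx, key);
      if (v != NULL)
        return v;
    }
  return NULL;
}

/* Build a three-level nest in which the innermost context owns no map,
   look KEY up from the innermost context, and return the value or -1.  */
static int
toy_demo (int key)
{
  static const int outer_keys[] = { 1, 2 };
  static const int outer_vals[] = { 10, 20 };
  static const int mid_keys[] = { 2 };
  static const int mid_vals[] = { 200 };
  struct toy_ctx outer = { NULL, outer_keys, outer_vals, 2 };
  struct toy_ctx mid = { &outer, mid_keys, mid_vals, 1 };
  struct toy_ctx inner = { &mid, NULL, NULL, 0 };
  const int *v = toy_maybe_lookup (&inner, key);

  return v != NULL ? *v : -1;
}
```

One property the model makes visible is that an entry in an inner map shadows
an entry with the same key further out; whether that matches what the
reduction code expects is a separate question.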

Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness

2014-11-11 Thread Charles Baylis
Resending as text/plain

On 11 November 2014 15:14, Charles Baylis charles.bay...@linaro.org wrote:


 On 6 November 2014 10:19, Alan Lawrence alan.lawre...@arm.com wrote:

 This generates out-of-range errors at compile- (rather than assemble-)time
 for the vqdm*_lane intrinsics, and also provides a single place to do
 bigendian lane-swapping for all those intrinsics (and others to follow in
 later patches). This allows us to remove many define_expands that just do a
 range-check and endian-swap before outputting the RTL for a corresponding
 _internal insn.

 Changes to aarch64-simd.md are not as big as they look, they are highly
 repetitive, like the code they are removing! Testcases are also repetitive,
 as unfortunately dg-error doesn't care *how many* errors there were matching
 its pattern, as long as at least 1, hence having to separate each into own
 file - the last 0 in the dg-error disables the line-number checking, as
 the line numbers in our error messages refer to lines within arm_neon.h
 rather than within the test case. (They do at least mention the user
 function containing the call to the intrinsic.)
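
The two jobs described above, range-checking a lane number and swapping it on
big-endian targets, can be sketched in a few lines of C. This is an
illustration of the idea only, not the patch's code: the function names are
made up, and returning -1 stands in for the compile-time error.

```c
#include <stdbool.h>

/* Return true if LANE is a valid index into a vector of NUNITS elements.  */
static bool
lane_in_range (int lane, int nunits)
{
  return lane >= 0 && lane < nunits;
}

/* Map an architectural lane number to an element number: on a big-endian
   target the elements are numbered from the other end of the vector.  */
static int
endian_lane (int lane, int nunits, bool big_endian)
{
  return big_endian ? nunits - 1 - lane : lane;
}

/* Do both steps in one place: reject out-of-range lanes (-1 here stands
   in for the diagnostic), otherwise return the possibly flipped lane.  */
static int
check_and_flip_lane (int lane, int nunits, bool big_endian)
{
  if (!lane_in_range (lane, nunits))
    return -1;
  return endian_lane (lane, nunits, big_endian);
}
```

Doing the check and the flip in a single helper is what lets the individual
expanders drop their own copies of the same logic.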

 Ok for trunk?


 It looks like there are a few places where you have 8 spaces where a tab
 ought to be. Other than that, it looks good to me (but I can't approve)

 I am looking at making errors found in arm_neon.h a bit more user friendly,
 which depends on checking bounds on constant int parameters as you've done
 here.

 Do you plan to do similar changes for loads/stores/shifts, and also for the
 ARM back-end? I can help out if you don't already have patches in
 development.

 Charles


[PATCH] Remove pedantic_lvalues

2014-11-11 Thread Richard Biener

As pre-approved by Joseph the following removes pedantic_lvalues
which the C FE now handles itself without help from fold-const.c.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

The C++ FE is still not happy without NON_LVALUE_EXPRs though.

Richard.

2014-11-11  Richard Biener  rguent...@suse.de

* tree-core.h (pedantic_lvalues): Remove.
* fold-const.c (pedantic_lvalues): Likewise.
(pedantic_non_lvalue_loc): Remove conditional non_lvalue_loc call.

c/
* c-decl.c (c_init_decl_processing): Do not set pedantic_lvalues
to true.

Index: trunk/gcc/c/c-decl.c
===
*** trunk.orig/gcc/c/c-decl.c   2014-10-29 13:34:20.979438627 +0100
--- trunk/gcc/c/c-decl.c2014-11-11 14:16:04.605138651 +0100
*** c_init_decl_processing (void)
*** 3947,3954 
  
input_location = save_loc;
  
-   pedantic_lvalues = true;
- 
make_fname_decl = c_make_fname_decl;
start_fname_decls ();
  }
--- 3947,3952 
Index: trunk/gcc/fold-const.c
===
*** trunk.orig/gcc/fold-const.c 2014-11-11 10:54:01.424669165 +0100
--- trunk/gcc/fold-const.c  2014-11-11 14:15:36.954139861 +0100
*** non_lvalue_loc (location_t loc, tree x)
*** 2160,2179 
return build1_loc (loc, NON_LVALUE_EXPR, TREE_TYPE (x), x);
  }
  
- /* Nonzero means lvalues are limited to those valid in pedantic ANSI C.
-Zero means allow extended lvalues.  */
- 
- int pedantic_lvalues;
- 
  /* When pedantic, return an expr equal to X but certainly not valid as a
 pedantic lvalue.  Otherwise, return X.  */
  
  static tree
  pedantic_non_lvalue_loc (location_t loc, tree x)
  {
-   if (pedantic_lvalues)
- return non_lvalue_loc (loc, x);
- 
return protected_set_expr_location_unshare (x, loc);
  }
  
--- 2160,2171 
Index: trunk/gcc/tree-core.h
===
*** trunk.orig/gcc/tree-core.h  2014-11-11 09:39:42.722864279 +0100
--- trunk/gcc/tree-core.h   2014-11-11 14:17:49.193134074 +0100
*** extern GTY(()) builtin_info_type builtin
*** 1877,1886 
  /* If nonzero, an upper limit on alignment of structure fields, in bits,  */
  extern unsigned int maximum_field_alignment;
  
- /* Nonzero means lvalues are limited to those valid in pedantic ANSI C.
-Zero means allow extended lvalues.  */
- extern int pedantic_lvalues;
- 
  /* Points to the FUNCTION_DECL of the function whose body we are reading.  */
  extern GTY(()) tree current_function_decl;
  
--- 1877,1882 


Re: [RFC PATCH, AARCH64] Add support for -mlong-calls option

2014-11-11 Thread Richard Earnshaw
On 27/10/14 09:21, Yangfei (Felix) wrote:
 +/* Handle pragmas for compatibility with Intel's compilers.  */
 +#define REGISTER_TARGET_PRAGMAS() do {                                  \
 +  c_register_pragma (0, "long_calls", aarch64_pr_long_calls);           \
 +  c_register_pragma (0, "no_long_calls", aarch64_pr_no_long_calls);     \
 +  c_register_pragma (0, "long_calls_off", aarch64_pr_long_calls_off);   \
 +} while (0)
 +
  #define FUNCTION_ARG_PADDING(MODE, TYPE) \
(aarch64_pad_arg_upward (MODE, TYPE) ? upward : downward)
 
 
 Hi,
 
   I updated the patch with the following two changes:
 1. Add one entry in ChangeLog for this patch;
 2. Enable this feature for sibling calls too.
 
   Assuming no issues pop up, OK for trunk?
 

Hi Felix,

Sorry for the delay responding, I've been out of the office recently and
I'm only just catching up on a backlog of GCC related emails.

I'm in two minds about this; I can potentially see the need for
attributes to enable long calls for specific calls, and maybe also for
pragmas that can be used to efficiently mark a group of functions in
that way; but I don't really see the value in adding a -mlong-calls
option to do this globally.

The reasoning is as follows: long calls are generally very expensive and
relatively few functions should need them in most applications (since
code that needs to span more than a single block of 128Mbytes - the span
of a BL or B instruction - will be very rare in reality).

The best way to handle very large branches for those rare cases where
you do have a very large contiguous block of code more than 128MB is by
having the linker insert veneers when needed; the code will branch to
the veneer which will insert an indirect branch at that point (the ABI
guarantees that at function call boundaries IP0 and IP1 will not contain
live values, making them available for such purposes).
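
As a rough illustration of the 128MB figure: BL and B encode a signed 26-bit
word offset, so a direct branch reaches a 4-byte-aligned target between -2^27
and 2^27 - 4 bytes away. The sketch below shows the test a linker would apply
before deciding that a veneer is needed; the function name is made up and this
is not code from any linker.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if a BL or B placed at address FROM can branch directly to
   address TO, i.e. the byte offset is a multiple of 4 and fits in a
   signed 26-bit word offset (+/-128 MiB).  */
static bool
direct_branch_reaches (uint64_t from, uint64_t to)
{
  int64_t off = (int64_t) (to - from);

  return (off & 3) == 0
         && off >= -(INT64_C (1) << 27)
         && off < (INT64_C (1) << 27);
}
```

Only call sites for which this test fails at link time need the indirect
branch, which is why handling them with veneers costs nothing for the common
case.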

In a very small number of cases it might be desirable to mark specific
functions as being too far away to reach; in those cases the attributes
and pragma methods can be used to mark such calls as being far calls.

Aside: The reason -mlong-calls was added to GCC for ARM is that the code
there pre-dates the EABI, which introduced the concept of link-time
veneering of calls - the option should be unnecessary now that almost
everyone uses the EABI as the basis for their platform ABI.  We don't
have such a legacy for AArch64 and I'd need to see strong justification
for its use before adding the option there as well.

So please can you rework the patch to remove the -mlong-calls option and
just leave the attribute and pragma interfaces.

R.

 
 Index: gcc/ChangeLog
 ===
 --- gcc/ChangeLog (revision 216558)
 +++ gcc/ChangeLog (working copy)
 @@ -1,3 +1,26 @@
 +2014-10-27  Felix Yang  felix.y...@huawei.com
 + Haijian Zhang  z.zhanghaij...@huawei.com
 +
 + * config/aarch64/aarch64.opt (mlong-calls): New option.
 + * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): Define.
 + * config/aarch64/aarch64.c (aarch64_set_default_type_attributes,
 + aarch64_attribute_table, aarch64_comp_type_attributes,
 + aarch64_decl_is_long_call_p, aarch64_function_in_section_p,
 + aarch64_pr_long_calls, aarch64_pr_no_long_calls,
 + aarch64_pr_long_calls_off): New functions.
 + (TARGET_SET_DEFAULT_TYPE_ATTRIBUTES): Define as
 + aarch64_set_default_type_attributes.
 + (TARGET_ATTRIBUTE_TABLE): Define as aarch64_attribute_table.
 + (TARGET_COMP_TYPE_ATTRIBUTES): Define as aarch64_comp_type_attribute.
 + (aarch64_pragma_enum): New enum.
 + (aarch64_attribute_table): New attribute table.
 + * config/aarch64/aarch64-protos.h (aarch64_pr_long_calls,
 + aarch64_pr_no_long_calls, aarch64_pr_long_calls_off): New declarations.
 + * config/aarch64/aarch64.md (sibcall, sibcall_value): Modified to
 + generate indirect call for sibling call when needed.
 + * config/aarch64/predicate.md (aarch64_call_insn_operand): Modified to
 + exclude a symbol_ref for an indirect call.
 +
  2014-10-22  Richard Sandiford  richard.sandif...@arm.com
  
   * lra.c (lra): Remove call to recog_init.
 Index: gcc/config/aarch64/predicates.md
 ===
 --- gcc/config/aarch64/predicates.md  (revision 216558)
 +++ gcc/config/aarch64/predicates.md  (working copy)
 @@ -27,7 +27,8 @@
  )
  
  (define_predicate "aarch64_call_insn_operand"
 -  (ior (match_code "symbol_ref")
 +  (ior (and (match_code "symbol_ref")
 + (match_test "!aarch64_is_long_call_p (op)"))
 (match_operand 0 "register_operand")))
  
  (define_predicate "aarch64_simd_register"
 Index: gcc/config/aarch64/aarch64.md
 ===
 --- gcc/config/aarch64/aarch64.md (revision 216558)
 +++ gcc/config/aarch64/aarch64.md (working copy)
 @@ -581,11 +581,13 @@
   

Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness

2014-11-11 Thread Alan Lawrence

[Resending in gcc-patches-accepted form]

I'm working on a patch for vget_lane (that removes the be_checked_get_lane thing,
which isn't an intrinsic). Other than that, no, not yet - for loads and stores I
was thinking to wait until David Sherwood + Alan Hayward's patches have been
settled, but there's still ARM, indeed.


If you have any way/ideas to get better error messages (i.e. line numbers),
that'd be particularly good, tho  :)

Cheers, Alan

Charles Baylis wrote:



On 6 November 2014 10:19, Alan Lawrence alan.lawre...@arm.com wrote:


This generates out-of-range errors at compile- (rather than
assemble-)time for the vqdm*_lane intrinsics, and also provides a
single place to do bigendian lane-swapping for all those intrinsics
(and others to follow in later patches). This allows us to remove
many define_expands that just do a range-check and endian-swap
before outputting the RTL for a corresponding _internal insn.

Changes to aarch64-simd.md are not as big
as they look, they are highly repetitive, like the code they are
removing! Testcases are also repetitive, as unfortunately dg-error
doesn't care *how many* errors there were matching its pattern, as
long as at least 1, hence having to separate each into own file -
the last 0 in the dg-error disables the line-number checking, as
the line numbers in our error messages refer to lines within
arm_neon.h rather than within the test case. (They do at least
mention the user function containing the call to the intrinsic.)

Ok for trunk?


It looks like there are a few places where you have 8 spaces where a tab 
ought to be. Other than that, it looks good to me (but I can't approve)


I am looking at making errors found in arm_neon.h a bit more user friendly, 
which depends on checking bounds on constant int parameters as you've 
done here.


Do you plan to do similar changes for loads/stores/shifts, and also for 
the ARM back-end? I can help out if you don't already have patches in 
development.


Charles




