Re: [PING][AARCH64, Question] Does AARCH64 GCC support long calls?

2014-10-24 Thread Andrew Pinski
On Fri, Oct 24, 2014 at 11:28 PM, Yangfei (Felix)  wrote:
>> On Fri, Oct 24, 2014 at 8:35 PM, Yangfei (Felix)  
>> wrote:
>> >> Hi,
>> >>
>> >> I find that the -mlong-calls option is not there for AARCH64. So
>> >> can this port generate long calls?
>> >> Any plan on this option? I would like to have a try on this if it's 
>> >> missing :-)
>> >> Thanks.
>> >
>> >
>> > Any comments?
>>
>> Yes you can use -mcmodel=large to this effect I think.
>>
>> Thanks,
>> Andrew Pinski
>
>
> Thanks for the reply.  It seems that -mcmodel=large is different from 
> -mlong-calls.
> GCC still emits the BL instruction with -mcmodel=large.  I think GCC should 
> emit a BLR instruction with -mlong-calls, right?


Oh right.  Also it looks like it is not hooked up but the support is
partly there:
/* Return true if calls to DECL should be treated as
   long-calls (ie called via a register).  */
static bool
aarch64_decl_is_long_call_p (const_tree decl ATTRIBUTE_UNUSED)
{
  return false;
}

I added the attribute for this in one version of the toolchain, and its
users have not reported a bug about it, so it seems to be working.
I don't have time right now to add support for the option or port the
attribute to the latest version, but it should be as easy as returning
true from that function when the option is turned on.
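
A minimal sketch of that idea, not the actual GCC change.  The flag
name long_calls_enabled and the decl_t typedef are hypothetical
stand-ins so that the fragment compiles on its own; the real hook takes
a const_tree and would test whatever variable the new option ends up
setting.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the variable a -mlong-calls switch would
   set; the real name is not given in this thread.  */
bool long_calls_enabled = false;

/* Hypothetical stand-in for GCC's const_tree, so the sketch is
   self-contained.  */
typedef const void *decl_t;

/* Return true if calls to DECL should be treated as long calls
   (i.e. made via a register with BLR instead of a direct BL).  */
bool
decl_is_long_call_p (decl_t decl)
{
  (void) decl;  /* A per-declaration attribute check could go here.  */
  return long_calls_enabled;
}
```

With the flag clear the hook answers false and a direct BL is kept;
once the flag is set, every call would be treated as a long call.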

Thanks,
Andrew Pinski

>
> void test();
> int main()
> {
> test();
> }
>
> Assembly for this main with -mcmodel=large option:
>
> main:
> stp x29, x30, [sp, -16]!
> add x29, sp, 0
> bl  test
> ldp x29, x30, [sp], 16
> ret
>


Re: [PING][AARCH64, Question] Does AARCH64 GCC support long calls?

2014-10-24 Thread Yangfei (Felix)
> On Fri, Oct 24, 2014 at 8:35 PM, Yangfei (Felix)  
> wrote:
> >> Hi,
> >>
> >> I find that the -mlong-calls option is not there for AARCH64. So
> >> can this port generate long calls?
> >> Any plan on this option? I would like to have a try on this if it's 
> >> missing :-)
> >> Thanks.
> >
> >
> > Any comments?
> 
> Yes you can use -mcmodel=large to this effect I think.
> 
> Thanks,
> Andrew Pinski


Thanks for the reply.  It seems that -mcmodel=large is different from 
-mlong-calls. 
GCC still emits the BL instruction with -mcmodel=large.  I think GCC should 
emit a BLR instruction with -mlong-calls, right? 

void test();
int main()
{
test();
}

Assembly for this main with -mcmodel=large option:

main:
stp x29, x30, [sp, -16]!
add x29, sp, 0
bl  test
ldp x29, x30, [sp], 16
ret



[pr/63582] Don't even store __int128 types if not supported.

2014-10-24 Thread DJ Delorie

Fixed PR/63582.  Tested with no regressions on x86-64 and ix86.  Ok?

* tree.c (build_common_tree_nodes): Don't even store the
__int128 types if they're not supported.

Index: tree.c
===================================================================
--- tree.c  (revision 216676)
+++ tree.c  (working copy)
@@ -9623,13 +9623,14 @@ build_common_tree_nodes (bool signed_cha
 {
   int_n_trees[i].signed_type = make_signed_type (int_n_data[i].bitsize);
   int_n_trees[i].unsigned_type = make_unsigned_type (int_n_data[i].bitsize);
   TYPE_SIZE (int_n_trees[i].signed_type) = bitsize_int (int_n_data[i].bitsize);
   TYPE_SIZE (int_n_trees[i].unsigned_type) = bitsize_int (int_n_data[i].bitsize);
 
-  if (int_n_data[i].bitsize > LONG_LONG_TYPE_SIZE)
+  if (int_n_data[i].bitsize > LONG_LONG_TYPE_SIZE
+ && int_n_enabled_p[i])
{
  integer_types[itk_intN_0 + i * 2] = int_n_trees[i].signed_type;
  integer_types[itk_unsigned_intN_0 + i * 2] = int_n_trees[i].unsigned_type;
}
 }
 


Re: [PING][AARCH64, Question] Does AARCH64 GCC support long calls?

2014-10-24 Thread Andrew Pinski
On Fri, Oct 24, 2014 at 8:35 PM, Yangfei (Felix)  wrote:
>> Hi,
>>
>> I find that the -mlong-calls option is not there for AARCH64. So can 
>> this port
>> generate long calls?
>> Any plan on this option? I would like to have a try on this if it's 
>> missing :-)
>> Thanks.
>
>
> Any comments?

Yes you can use -mcmodel=large to this effect I think.

Thanks,
Andrew Pinski


[PING][AARCH64, Question] Does AARCH64 GCC support long calls?

2014-10-24 Thread Yangfei (Felix)
> Hi,
> 
> I find that the -mlong-calls option is not there for AARCH64. So can this 
> port
> generate long calls?
> Any plan on this option? I would like to have a try on this if it's 
> missing :-)
> Thanks.


Any comments?


[COMMITTED][PATCH PR63173] [AARCH64, NEON] Improve vld[234](q?)_dup intrinsics

2014-10-24 Thread Yangfei (Felix)
> On 24 October 2014 03:21, Yangfei (Felix)  wrote:
> 
> > Thanks for the comments. I updated the patch with the intrinsic moved to its
> place.
> > Attached please find the new version of the patch.
> > OK for the trunk?
> >
> >
> > Index: gcc/ChangeLog
> >
> =
> ==
> > --- gcc/ChangeLog   (revision 216558)
> > +++ gcc/ChangeLog   (working copy)
> > @@ -1,3 +1,39 @@
> > +2014-10-23  Felix Yang  
> > +   Jiji Jiang 
> 
> Double space before "<".  Otherwise OK.
> Thanks /Marcus

Committed to trunk as r216630 with the space issue fixed. Thanks.



[patch] only emit one DIE for external declarations in the local scope

2014-10-24 Thread Aldy Hernandez
[Jason approved this patch off-line, and I am committing now that tests 
have successfully run.]


This is a bug I found while investigating early dwarf generation, but 
it is broken on mainline as well.


For the following code:

namespace S
{
  int i=777;
  int
  f()
  {
    int i = 42;
    {
      extern int i;
      return i;
    }
  }
}

...we end up emitting an extern declaration for "i" twice: once in the 
innermost lexical scope, and once in the namespace.  The one in the 
namespace is unnecessary (and incorrect), although it does not really 
affect any testcases, since the second one is the one that takes 
precedence within the innermost scope.
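
The same scoping rule exists in plain C, which allows a small
standalone illustration.  This is a sketch, not part of the patch: a
file-scope variable stands in for the namespace member, and f is reused
from the example above.  The block-scope extern declaration names the
file-scope i, not the local one.

```c
/* File-scope object with external linkage (stands in for S::i).  */
int i = 777;

int
f (void)
{
  int i = 42;      /* Local object with no linkage; hides the outer i.  */
  (void) i;
  {
    extern int i;  /* Re-declares the file-scope i inside this block.  */
    return i;      /* Reads the file-scope object, not the local 42.  */
  }
}
```

Calling f () yields 777, which is why the debugger needs the extern
declaration recorded in the innermost scope.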


With this patch, we get one less DIE for this scenario on mainline, 
while fixing other problems in early dwarf.  Double yay!


Aldy
commit 81de7d658e94160426b38e6c43c09a484fb530ba
Author: Aldy Hernandez 
Date:   Fri Oct 24 18:00:49 2014 -0600

* dwarf2out.c (declare_in_namespace): Only emit external
declarations in the local scope once.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index a87f9c0..3bce20f 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -20476,6 +20476,26 @@ declare_in_namespace (tree thing, dw_die_ref context_die)
   if (debug_info_level <= DINFO_LEVEL_TERSE)
 return context_die;
 
+  /* External declarations in the local scope only need to be emitted
+ once, not once in the namespace and once in the scope.
+
+ This avoids declaring the `extern' below in the
+ namespace DIE as well as in the innermost scope:
+
+  namespace S
+ {
+int i=5;
+int foo()
+   {
+  int i=8;
+  extern int i;
+ return i;
+   }
+  }
+  */
+  if (DECL_P (thing) && DECL_EXTERNAL (thing) && local_scope_p (context_die))
+return context_die;
+
   /* If this decl is from an inlined function, then don't try to emit it in its
  namespace, as we will get confused.  It would have already been emitted
  when the abstract instance of the inline function was emitted anyways.  */


Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-10-24 Thread David Edelsohn
On Fri, Oct 24, 2014 at 8:06 AM, Alan Lawrence  wrote:
> This migrates the reduction patterns in altivec.md and vector.md to the new
> names. I've not touched paired.md as I wasn't really sure how to fix that
> (how do I vec_extractv2sf ?), moreover the testing I did didn't seem to
> exercise any of those patterns (iow: I'm not sure what would be an
> appropriate target machine?).
>
> I note the reduc_uplus_v16qi (which I've removed, as unsigned and signed
> addition should be equivalent) differed from reduc_splus_v16qi in using
> gen_altivec_vsum4ubs rather than gen_altivec_vsum4sbs.  Testcases
> gcc.dg/vect/{slp-24-big-array.c,slp-24.c,vect-reduc-1char-big-array.c,vect-reduc-1char.c}
> thus produce assembly which differs from previously (only) in that
> "vsum4ubs" becomes "vsum4sbs". These tests are still passing so I assume
> this is OK.
>
> The combining of signed and unsigned addition also improves
> gcc.dg/vect/{vect-outer-4i.c,vect-reduc-1short.c,vect-reduc-dot-u8b.c,vect-reduc-pattern-1c-big-array.c,vect-reduc-pattern-1c.c}
> : these are now reduced using direct vector reduction, rather than with
> shifts as previously (because there was only a reduc_splus rather than the
> reduc_uplus these tests looked for).
>
> ((Side note: the RTL changes to vector.md are to match the combine patterns
> in vsx.md; now that we no longer depend upon combine to generate those
> patterns (as the optab outputs them directly), one might wish to remove the
> smaller pattern from vsx.md, and/or simplify the RTL. I theorize that a
> reduction of a two-element vector is just adding the first element to the
> second, so maybe to something like
>
>   [(parallel [(set (match_operand:DF 0 "vfloat_operand" "")
>(VEC_reduc:V2DF
> (vec_select:DF
>  (match_operand:V2DF 1 "vfloat_operand" "")
>  (parallel [(const_int 1)]))
> (vec_select:DF
>  (match_dup 1)
>  (parallel [(const_int 0)]
>   (clobber (match_scratch:V2DF 2 ""))])]
>
> but I think it's best for me to leave that to the port maintainers.))
>
> Bootstrapped and check-gcc on powerpc64-none-linux-gnu
> (gcc110.fsffrance.org, with thanks to the GCC Compile Farm).
>
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (reduc_splus_): Rename to...
> (reduc_plus_scal_): ...this, and rs6000_expand_vector_extract.
> (reduc_uplus_v16qi): Remove.
>
> * config/rs6000/vector.md (VEC_reduc_name): change "splus" to "plus"
> (reduc__v2df): Rename to...
> (reduc__scal_v2df): ...this, wrap VEC_reduc in a
> vec_select of element 1.
> (reduc__v4sf): Rename to...
> (reduc__scal_v4sf): ...this, wrap VEC_reduc in a
> vec_select of element 3, add scratch register.
>

This needs some input from Bill, but he will be busy with a conference
this week.

- David


Re: Optimize powerpc*-*-linux* 32-bit classic hard/soft float hardfp/soft-fp use

2014-10-24 Thread David Edelsohn
On Wed, Oct 22, 2014 at 10:02 PM, Joseph S. Myers
 wrote:
> Continuing the cleanups of libgcc soft-fp configuration for
> powerpc*-*-linux* in preparation for implementing
> TARGET_ATOMIC_ASSIGN_EXPAND_FENV for soft-float and e500, this patch
> optimizes the choice of which functions to build for the 32-bit
> classic hard-float and soft-float cases.  (e500 will be dealt with in
> a separate patch which will need to add new features to t-hardfp and
> t-softfp; this patch keeps the status quo for e500.)
>
> For hard-float, while the functions in question are part of the libgcc
> ABI there is no need for them to contain software floating point code:
> no newly built code should use them, and if anything does use them
> it's most efficient (space and speed) for them to pass straight
> through to floating-point hardware instructions; this case is made to
> use t-hardfp to achieve that.  For soft-float, direct use of soft-fp
> functions for operations involving DImode or unsigned integers is more
> efficient than using the libgcc2.c versions of those operations to
> convert to operations on other types (which then end up calling
> soft-fp functions for those other types, possibly more than once);
> this case is thus stopped from using t-softfp-excl.  (A future patch
> will stop the e500 cases from using t-softfp-excl as well.)
>
> Tested with no regressions for crosses to powerpc-linux-gnu (soft
> float and classic hard float); also checked that the same set of
> symbols and versions is exported from shared libgcc before and after
> the patch.  OK to commit?
>
> 2014-10-23  Joseph Myers  
>
> * configure.ac (ppc_fp_type): Set variable on powerpc*-*-linux*.
> * configure: Regenerate.
> * config.host (powerpc*-*-linux*): Use $ppc_fp_type to determine
> additions to tmake_file.  Use t-hardfp-sfdf and t-hardfp instead
> of soft-fp for 32-bit classic hard float.  Do not use
> t-softfp-excl for soft float.

Okay.

Thanks, David


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-24 Thread Jeff Law

On 10/24/14 17:37, Evgeny Stupachenko wrote:

> What if we remove the check?
> glibc build pass?

That would be my inclination...   But it's not my decision to make.

The first check is to verify glibc builds and passes its testsuite with 
the new compiler and that check removed.


jeff



Re: Only allow e500 double in SPE_SIMD_REGNO_P registers

2014-10-24 Thread David Edelsohn
On Fri, Oct 24, 2014 at 7:09 PM, Joseph S. Myers
 wrote:
> rs6000_hard_regno_nregs_internal allows SPE vectors in single
> registers satisfying SPE_SIMD_REGNO_P (i.e. register numbers 0 to
> 31).  However, the corresponding test for e500 double treats all
> registers as being able to store a 64-bit value, rather than just
> those GPRs.
>
> Logically this inconsistency is wrong; in addition, it causes problems
> unwinding from signal handlers.  linux-unwind.h uses
> ARG_POINTER_REGNUM as a place to store the return address from a
> signal handler, but this logic in rs6000_hard_regno_nregs_internal
> results in that being considered an 8-byte register, resulting in
> assertion failures.
> ( first
> needs to be applied for unwinding to work in general on e500.)  This
> patch makes rs6000_hard_regno_nregs_internal handle the e500 double
> case consistently with SPE vectors.
>
> Tested with no regressions with cross to powerpc-linux-gnuspe (given
> the aforementioned patch applied).  Failures of signal handling
> unwinding tests such as gcc.dg/cleanup-{8,9,10,11}.c are fixed by this
> patch.  OK to commit?
>
> 2014-10-24  Joseph Myers  
>
> * config/rs6000/rs6000.c (rs6000_hard_regno_nregs_internal): Do
> not allow e500 double in registers not satisfying
> SPE_SIMD_REGNO_P.

Okay.

Thanks, David


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-24 Thread Evgeny Stupachenko
What if we remove the check?
glibc build pass?

On Sat, Oct 25, 2014 at 3:09 AM, Andrew Pinski  wrote:
> On Fri, Oct 10, 2014 at 12:43 AM, Evgeny Stupachenko  
> wrote:
>> i386 specific part of the patch:
>>
>> 2014-10-08  Ilya Enkovich  
>> Vladimir Makarov  
>> * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
>> (ix86_init_pic_reg): New.
>> (ix86_select_alt_pic_regnum): Add check on pseudo register.
>> (ix86_save_reg): Likewise.
>> (ix86_expand_prologue): Remove irrelevant code.
>> (ix86_output_function_epilogue): Add check on pseudo register.
>> (set_pic_reg_ever_alive): New.
>> (legitimize_pic_address): Replace df_set_regs_ever_live with new
>> set_pic_reg_ever_alive.
>> (legitimize_tls_address): Likewise.
>> (ix86_pic_register_p): New check.
>> (ix86_delegitimize_address): Add check on pseudo register.
>> (ix86_expand_call): Insert move from pseudo PIC register to ABI
>> defined REAL_PIC_OFFSET_TABLE_REGNUM.
>> (TARGET_INIT_PIC_REG): New.
>> (TARGET_USE_PSEUDO_PIC_REG): New.
>> (PIC_OFFSET_TABLE_REGNUM): New check.
>
>
> This patch breaks glibc's ld.so on i686.
> glibc has a check to make sure the PIC register is set up correctly:
> /* Consistency check for position-independent code.  */
> #ifdef __PIC__
> # define check_consistency()  \
>   ({ int __res;  \
>  __asm__ __volatile__  \
>(LOAD_PIC_REG_STR (cx) ";"  \
> "subl %%ebx, %%ecx;"  \
> "je 1f;"  \
> "ud2;"  \
> "1:\n"  \
> : "=c" (__res));  \
>  __res; })
> #endif
>
> This depends on ebx being the PIC register.  Now we don't have this so
> we get ud2 in some cases.
>
>
>
> Thanks,
> Andrew Pinski


genmatch infinite loop during bootstrap on AIX

2014-10-24 Thread David Edelsohn
genmatch is hanging when bootstrapping on AIX (gcc111).  When I attach
to the process:

#0  0x1007efac in std::basic_string,
std::allocator >::basic_string ()
#1  0x1000e6b0 in _ZN6parser13parse_captureEP7operand (this=0x300594b8, op=0x0)
at /home/dje/src/src/gcc/genmatch.c:2607
#2  0x1000e9f0 in _ZN6parser10parse_exprEv (this=0x2ff20208)
at /home/dje/src/src/gcc/genmatch.c:2669
#3  0x1000ee38 in _ZN6parser8parse_opEv (this=0x2ff20208)
at /home/dje/src/src/gcc/genmatch.c:2728
#4  0x1000efc4 in
_ZN6parser14parse_simplifyEjR3vecIP8simplify7va_heap6vl_ptrEP12predicate_idP4expr
(this=0x2ff20208, match_location=4614, simplifiers=...,
matcher=0x0, result=0x0) at /home/dje/src/src/gcc/genmatch.c:2792
#5  0x100102fc in _ZN6parser13parse_patternEv (this=0x2ff20208)
at /home/dje/src/src/gcc/genmatch.c:3052
#6  0x10010c0c in _ZN6parser9parse_forEj (this=0x2ff20208)
at /home/dje/src/src/gcc/genmatch.c:2991
#7  0x10010350 in _ZN6parser13parse_patternEv (this=0x2ff20208)
at /home/dje/src/src/gcc/genmatch.c:3090
#8  0x1001122c in _ZN6parserC2EP10cpp_reader (this=0x2ff20208, r_=0x3003bbec)
at /home/dje/src/src/gcc/genmatch.c:3122
#9  0x10004acc in main (argc=,
argv=) at  _start_ :3204


Re: PR debug/60655, debug loc expressions

2014-10-24 Thread Maciej W. Rozycki
On Mon, 20 Oct 2014, Alan Modra wrote:

> You were correct to be suspicious that we weren't simplifying as we
> should.  After more time in the debugger than I care to admit, I found
> the underlying cause.
> 
> One of the var loc expressions is
> (plus:SI (plus:SI (not:SI (debug_expr:SI D#9))
> (value/u:SI 58:4373 @0x18d3968/0x18ef230))
> (debug_expr:SI D#5))
> 
> which after substitution (in bb7) becomes
> (plus:SI (plus:SI (not:SI (plus:SI (reg:SI 5 5 [orig:212 D.2333 ] [212])
> (const:SI (plus:SI (symbol_ref:SI ("*.LANCHOR0") [flags 
> 0x182])
> (const_int -1 [0x])
> (reg:SI 10 10 [orig:223 ivtmp.33 ] [223]))
> (plus:SI (reg:SI 5 5 [orig:212 D.2333 ] [212])
> (const:SI (plus:SI (symbol_ref:SI ("*.LANCHOR0") [flags 0x182])
> (const_int 323 [0x143])
> 
> The above has 8 ops by the time you turn ~x into -x - 1, and exceeds
> the allowed number of elements in the simplify_plus_minus ops array.
> Note that the ops array has 8 elements but the code only allows 7 to
> be entered, a bug since the "spare" element isn't a sentinel or used
> in any other way.
> 
> This resulted in a partial simplification of the expression to
> (plus:SI (plus:SI (reg:SI 10 10 [orig:223 ivtmp.33 ] [223])
> (symbol_ref:SI ("*.LANCHOR0") [flags 0x182]))
> (const:SI (minus:SI (const_int 323 [0x143])
> (symbol_ref:SI ("*.LANCHOR0") [flags 0x182]
> 
> I also noticed another small bug in simplify_plus_minus.  n_constants
> ought to be the number of constants in ops, not the number of times
> we look at a constant.
> 
> The "Handle CONST wrapped NOT, NEG and MINUS" in the previous patch
> seems to no longer be necessary, so I took that out (didn't hit the
> code in powerpc64-linux, powerpc-linux and x86_64-linux bootstrap and
> regression tests).
> 
> Bootstrapped and regression tested powerpc64-linux and x86_64-linux.

 For the record, I have now regression-tested your updated change too with 
my usual powerpc-linux-gnu multilibs:

-mcpu=603e
-mcpu=603e -msoft-float
-mcpu=8540 -mfloat-gprs=single -mspe=yes -mabi=spe
-mcpu=8548 -mfloat-gprs=double -mspe=yes -mabi=spe
-mcpu=7400 -maltivec -mabi=altivec
-mcpu=e6500 -maltivec -mabi=altivec
-mcpu=e5500 -m64
-mcpu=e6500 -m64 -maltivec -mabi=altivec

observing no issues.  Thanks for the fix!

  Maciej
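
The rewrite of ~x into -x - 1 mentioned in the quoted analysis is the
usual two's-complement identity.  A small check of it, written as a
sketch with unsigned arithmetic so that wraparound is well defined (the
function name is made up for illustration):

```c
#include <stdint.h>

/* Return 1 if ~x equals -x - 1 for the given 32-bit value, computed
   modulo 2^32 so that no signed overflow can occur.  */
int
not_matches_neg_minus_one (uint32_t x)
{
  uint32_t via_not = (uint32_t) ~x;
  uint32_t via_neg = (uint32_t) (0u - x - 1u);
  return via_not == via_neg;
}
```

The identity turns one NOT operand into a negated operand plus a
constant, which is how the expression above grows past the size the ops
array accepts.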


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-24 Thread Andrew Pinski
On Fri, Oct 10, 2014 at 12:43 AM, Evgeny Stupachenko  wrote:
> i386 specific part of the patch:
>
> 2014-10-08  Ilya Enkovich  
> Vladimir Makarov  
> * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
> (ix86_init_pic_reg): New.
> (ix86_select_alt_pic_regnum): Add check on pseudo register.
> (ix86_save_reg): Likewise.
> (ix86_expand_prologue): Remove irrelevant code.
> (ix86_output_function_epilogue): Add check on pseudo register.
> (set_pic_reg_ever_alive): New.
> (legitimize_pic_address): Replace df_set_regs_ever_live with new
> set_pic_reg_ever_alive.
> (legitimize_tls_address): Likewise.
> (ix86_pic_register_p): New check.
> (ix86_delegitimize_address): Add check on pseudo register.
> (ix86_expand_call): Insert move from pseudo PIC register to ABI
> defined REAL_PIC_OFFSET_TABLE_REGNUM.
> (TARGET_INIT_PIC_REG): New.
> (TARGET_USE_PSEUDO_PIC_REG): New.
> (PIC_OFFSET_TABLE_REGNUM): New check.


This patch breaks glibc's ld.so on i686.
glibc has a check to make sure the PIC register is set up correctly:
/* Consistency check for position-independent code.  */
#ifdef __PIC__
# define check_consistency()  \
  ({ int __res;  \
 __asm__ __volatile__  \
   (LOAD_PIC_REG_STR (cx) ";"  \
"subl %%ebx, %%ecx;"  \
"je 1f;"  \
"ud2;"  \
"1:\n"  \
: "=c" (__res));  \
 __res; })
#endif

This depends on ebx being the PIC register.  Now we don't have this so
we get ud2 in some cases.



Thanks,
Andrew Pinski


Only allow e500 double in SPE_SIMD_REGNO_P registers

2014-10-24 Thread Joseph S. Myers
rs6000_hard_regno_nregs_internal allows SPE vectors in single
registers satisfying SPE_SIMD_REGNO_P (i.e. register numbers 0 to
31).  However, the corresponding test for e500 double treats all
registers as being able to store a 64-bit value, rather than just
those GPRs.

Logically this inconsistency is wrong; in addition, it causes problems
unwinding from signal handlers.  linux-unwind.h uses
ARG_POINTER_REGNUM as a place to store the return address from a
signal handler, but this logic in rs6000_hard_regno_nregs_internal
results in that being considered an 8-byte register, resulting in
assertion failures.
( first
needs to be applied for unwinding to work in general on e500.)  This
patch makes rs6000_hard_regno_nregs_internal handle the e500 double
case consistently with SPE vectors.

Tested with no regressions with cross to powerpc-linux-gnuspe (given
the aforementioned patch applied).  Failures of signal handling
unwinding tests such as gcc.dg/cleanup-{8,9,10,11}.c are fixed by this
patch.  OK to commit?

2014-10-24  Joseph Myers  

* config/rs6000/rs6000.c (rs6000_hard_regno_nregs_internal): Do
not allow e500 double in registers not satisfying
SPE_SIMD_REGNO_P.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 216673)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1721,7 +1721,7 @@ rs6000_hard_regno_nregs_internal (int regno, enum
  SCmode so as to pass the value correctly in a pair of
  registers.  */
   else if (TARGET_E500_DOUBLE && FLOAT_MODE_P (mode) && mode != SCmode
-  && !DECIMAL_FLOAT_MODE_P (mode))
+  && !DECIMAL_FLOAT_MODE_P (mode) && SPE_SIMD_REGNO_P (regno))
 reg_size = UNITS_PER_FP_WORD;
 
   else

-- 
Joseph S. Myers
jos...@codesourcery.com


[hsa] Better workaround in locating the file with HSAIL

2014-10-24 Thread Martin Jambor
Hi,

this patch was written mostly by Ganesh; I have amended it only
slightly.  It improves the workaround we currently use to locate HSAIL
(until we can link it properly) by attempting to derive the file name
from the current input file name.  More details are in the README.hsa
changes.  Committed to the branch.

Thanks,

Martin


2014-10-24  Ganesh Gopalasubramanian  
Martin Jambor  

gcc/
* README.hsa: Removed the part about extracting the HSA ELF sections.
* hsa-gen.c (wrap_hsa): Deduce the file name and pass it to libgomp.

libgomp/
* hsaokra.c (__hsa_launch_kernel): Do not segfault if file cannot be
opened.

diff --git a/gcc/ChangeLog.hsa b/gcc/ChangeLog.hsa
index 04ecd06..8e592e7 100644
--- a/gcc/ChangeLog.hsa
+++ b/gcc/ChangeLog.hsa
@@ -1,3 +1,9 @@
+2014-10-24  Martin Jambor  
+
+   * README.hsa: Removed the part about extracting the HSA ELF sections.
+   * hsa-gen.c (wrap_hsa): Deduce the file name and pass it to libgomp.
+
+
 2014-10-24  Martin Jambor  
 
* hsa.h (hsa_symbol): Renamed offset to directive_offset.  Updated all
diff --git a/gcc/README.hsa b/gcc/README.hsa
index e41d762..a15f4ec 100644
--- a/gcc/README.hsa
+++ b/gcc/README.hsa
@@ -65,8 +65,8 @@ have not tested anything else):
$ git clone 'https://github.com/HSAFoundation/Okra-Interface-to-HSA-Device'
$ export 
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/testhsa/Okra-Interface-to-HSA-Device/okra/dist/bin
 
-   These and the steps below were tested with revision 4c4dc65 from
-   August 14 2014.
+   These and the steps below were tested with revision d457e8c from
+   October 6 2014.
 
You can now also try the sample check shipped with OKRA to test
whether your setup works.  Go into directory
@@ -135,19 +135,21 @@ example, however, it reports success like this:
 
 omp_veccopy.c:15:12: note: Parallel construct will be turned into an HSA kernel
 
-The next step is to extract the HSA ELF sections to a 32-bit file
-called hsakernel.o.  OKRA currently handles 32-bit ELF files only.
-This step is currently necessary but there is ongoing work aimed at
-eliminating it (yes, it means that currently there is no way of
-combining HSA from different compilation units):
-
-  $ objcopy -O elf32-i386 -j hsa_data -j hsa_code -j hsa_operand omp_veccopy.o 
hsakernel.o
-  
 Now link the program as usual, providing a path to libgomp of the
 installed hsa branch gcc:
 
   $ $HOME/gcc/hsa/inst/bin/gcc omp_veccopy.o -lm -fopenmp 
-Wl,-rpath,$HOME/gcc/hsa/inst/lib64 -o omp_veccopy
 
+The reason why it is necessary to proceed in two steps rather than do
+both at once is that at this point, HSA run time will expect the HSA
+kernel in object file with the same name as the input file, only with
+the suffix changed to .o, in the current working directory when
+executing the program.  If you use LTO, there is no input file (such
+as when compiling from standard input) or the input file name does not
+have a dot in it, run-time will expect the HSA ELF sections in a file
+called hsakernel.o.  This is a temporary situation and will be fixed,
+of course.
+
 Assuming you have all necessary libraries in your LD_LIBRARY_PATH, you
 can now run the example:
 
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 05e3015..1080ea2 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -2102,8 +2102,25 @@ wrap_hsa (void)
str = build_string_literal (1, "");
bool kern_p = lookup_attribute ("hsakernel",
DECL_ATTRIBUTES (fndecl));
+   if (!in_lto_p && main_input_filename)
+ {
+   char *filename;
+   const char *part = strrchr (main_input_filename, '/');
+   if (!part)
+ part = main_input_filename;
+   asprintf (&filename, "%s", part);
+   char* extension = strchr (filename, '.');
+   if (extension)
+ {
+   strcpy (extension, "\0");
+   asprintf (&extension, "%s", ".o\0");
+   strcat (filename, extension);
+   str = build_string_literal (strlen(filename)+1,filename);
+ }
+ }
CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, str);
 
+
int slen = IDENTIFIER_LENGTH (DECL_ASSEMBLER_NAME (fndecl));
if (asprintf (&tmpname, "&%s",
  IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fndecl))) < 
0)
diff --git a/libgomp/ChangeLog.hsa b/libgomp/ChangeLog.hsa
index 1deae12..1ff5486 100644
--- a/libgomp/ChangeLog.hsa
+++ b/libgomp/ChangeLog.hsa
@@ -1,3 +1,8 @@
+2014-10-24  Martin Jambor  
+
+   * hsaokra.c (__hsa_launch_kernel): Do not segfault if file cannot be
+   opened.
+
 2014-09-26  Saravanan Ekanathan 
 
* hsaokra.c (__hsa_launch_kernel): Use BRIG generated by
diff --git a/libgomp/hsaokra.c b/libgomp/hsaokra.c
index c9e0b1e..c41b86b 100644
--- a/libgomp/hsaokra.c
+++ b/libgomp/hsaokra.
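
The file-name rule described in the README.hsa change above can be
sketched as a standalone helper.  This is an illustration under stated
assumptions, not the code from the patch: the function name and its
buffer interface are hypothetical, the directory part is dropped
including the slash, and the suffix is taken to start at the last dot
of the base name.  When there is no input name or no dot, it falls back
to hsakernel.o as the README text says.

```c
#include <stdio.h>
#include <string.h>

/* Write into OUT the object-file name expected for INPUT: the base
   name with its suffix replaced by ".o", or "hsakernel.o" if INPUT is
   NULL or its base name has no dot.  Return 0 on success, -1 if the
   result does not fit into OUT_SIZE bytes.  */
int
derive_kernel_file_name (const char *input, char *out, size_t out_size)
{
  const char *base = NULL;
  const char *dot = NULL;
  int n;

  if (input)
    {
      /* Skip any directory part, including the slash itself.  */
      base = strrchr (input, '/');
      base = base ? base + 1 : input;
      dot = strrchr (base, '.');
    }

  if (dot)
    n = snprintf (out, out_size, "%.*s.o", (int) (dot - base), base);
  else
    n = snprintf (out, out_size, "%s", "hsakernel.o");

  return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```

Writing into a caller-supplied buffer avoids the allocations that
asprintf would need and makes the truncation case explicit.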

[hsa] Simple cleanup of different offsets in hsa-gen.h

2014-10-24 Thread Martin Jambor
Hi,

this is a very simple cleanup that renames two fields called offset,
in different classes, to more descriptive names and removes a third
which is no longer used.

Committed to the branch.

Thanks,

Martin


2014-10-24  Martin Jambor  

* hsa.h (hsa_symbol): Renamed offset to directive_offset.  Updated all
users.
(hsa_op_base): Renamed offset to brig_op_offset. Updated all
users.
(hsa_op_immed): Removed unused field offset.

diff --git a/gcc/ChangeLog.hsa b/gcc/ChangeLog.hsa
index d9695fd..04ecd06 100644
--- a/gcc/ChangeLog.hsa
+++ b/gcc/ChangeLog.hsa
@@ -1,3 +1,11 @@
+2014-10-24  Martin Jambor  
+
+   * hsa.h (hsa_symbol): Renamed offset to directive_offset.  Updated all
+   users.
+   (hsa_op_base): Renamed offset to brig_op_offset. Updated all
+   users.
+   (hsa_op_immed): Removed unused field offset.
+
 2014-10-07  Martin Jambor  
 
* README.hsa: Added known tested revisions of required git
diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 1598adb..b74f557 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -387,8 +387,8 @@ emit_directive_variable (struct hsa_symbol *symbol)
   static unsigned res_name_offset;
   char prefix;
 
-  if (symbol->offset)
-return symbol->offset;
+  if (symbol->directive_offset)
+return symbol->directive_offset;
 
   dirvar.base.byteCount = htole16 (sizeof (dirvar));
   dirvar.base.kind = htole16 (BRIG_KIND_DIRECTIVE_VARIABLE);
@@ -432,8 +432,8 @@ emit_directive_variable (struct hsa_symbol *symbol)
   dirvar.modifier = BRIG_SYMBOL_DEFINITION;
   dirvar.reserved = 0;
 
-  symbol->offset = brig_code.add (&dirvar, sizeof (dirvar));
-  return symbol->offset;
+  symbol->directive_offset = brig_code.add (&dirvar, sizeof (dirvar));
+  return symbol->directive_offset;
 }
 
 /* Emit directives describing the function, for example its input and output
@@ -658,11 +658,11 @@ enqueue_op (hsa_op_base *op)
 {
   unsigned ret;
 
-  if (op->offset)
-return op->offset;
+  if (op->brig_op_offset)
+return op->brig_op_offset;
 
   ret = op_queue.projected_size;
-  op->offset = op_queue.projected_size;
+  op->brig_op_offset = op_queue.projected_size;
 
   if (!op_queue.first_op)
 op_queue.first_op = op;
@@ -873,7 +873,7 @@ emit_queued_operands (void)
 {
   for (hsa_op_base *op = op_queue.first_op; op; op = op->next)
 {
-  gcc_assert (op->offset == brig_operand.total_size);
+  gcc_assert (op->brig_op_offset == brig_operand.total_size);
   if (hsa_op_immed *imm = dyn_cast  (op))
emit_immediate_operand (imm);
   else if (hsa_op_reg *reg = dyn_cast  (op))
diff --git a/gcc/hsa.h b/gcc/hsa.h
index fb66fc4..eeb8386 100644
--- a/gcc/hsa.h
+++ b/gcc/hsa.h
@@ -48,7 +48,7 @@ struct hsa_symbol
 
   /* Once written, this is the offset of the associated symbol directive.  Zero
  means the symbol has not been written yet.  */
-  unsigned offset;
+  unsigned directive_offset;
 
   /* HSA type of the parameter.  */
   BrigType16_t type;
@@ -70,7 +70,7 @@ struct hsa_op_base
 
   /* Offset to which the associated operand structure will be written.  Zero if
  yet not scheduled for writing.  */
-  unsigned offset;
+  unsigned brig_op_offset;
 
   /* The type of a particular operand.  */
   BrigKinds16_t kind;
@@ -83,10 +83,6 @@ struct hsa_op_immed : public hsa_op_base
   /* Type of the. */
   BrigType16_t type;
 
-  /* Offset to which the associated immediate operand structure will be 
written.
- Zero if not yet scheduled for writing */
-  unsigned offset;
-
   /* Value as represented by middle end.  */
   tree value;
 };


Re: [patch] should not define bool, true or false as macros for C++

2014-10-24 Thread Jason Merrill
OK.  Gerald, were you thinking of specific software that would be 
affected by this change?


Jason


Re: [PATCH][Commited] Fix PR sanitizer/63638

2014-10-24 Thread Yuri Gribov
On Sat, Oct 25, 2014 at 12:19 AM, Yuri Gribov  wrote:
> The same fix for 5.0 will probably be committed in

Forgot to mention that this one is for 4.9.

-Y


Re: update address taken: don't drop clobbers

2014-10-24 Thread Jeff Law

On 10/17/14 14:41, Marc Glisse wrote:

On Thu, 16 Oct 2014, Jeff Law wrote:


BTW, I dislike having multiple DCE implementations...

Similarly.  The proposal above was just to determine if we should
schedule DCE or not.


Thinking about it some more, I don't think we should need any kind of
DCE here. The rewriting in update_ssa already does a form of forward
propagation that avoids generating dead assignments, the problem only
occurs if we explicitly introduce this new assignment. So I believe we
should go back to an earlier version, like the attached, which is less
work for the compiler.

And now I can go re-read the old discussion (apparently I should avoid
gsi_replace, and there may be other ways to handle the coalescing).

I'm starting to agree -- a later message indicated you wanted to drop 
the unlink_stmt_vdef call and you wanted to avoid gsi_replace, that 
seems fine.  I'll approve once those things are taken care of.


jeff


[PATCH][Commited] Fix PR sanitizer/63638

2014-10-24 Thread Yuri Gribov
Hi all,

This patch was approved by Jakub in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63638 . Same fix for 5.0
will probably be committed in
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02066.html .

-Y
commit 87a472b057af66d46a5437137e9535c70de5c786
Author: Yury Gribov 
Date:   Fri Oct 24 19:20:37 2014 +0400

2014-10-24  Yury Gribov  

gcc/
* asan.c (enum asan_check_flags): Fixed ASAN_CHECK_LAST.

gcc/testsuite/
* c-c++-common/asan/pr63638.c: New test.

diff --git a/gcc/asan.c b/gcc/asan.c
index 7c27fe7..f6c42a1 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -250,7 +250,7 @@ enum asan_check_flags
   ASAN_CHECK_NON_ZERO_LEN = 1 << 2,
   ASAN_CHECK_START_INSTRUMENTED = 1 << 3,
   ASAN_CHECK_END_INSTRUMENTED = 1 << 4,
-  ASAN_CHECK_LAST
+  ASAN_CHECK_LAST = 1 << 5
 };
 
 /* Hashtable support for memory references used by gimple
diff --git a/gcc/testsuite/c-c++-common/asan/pr63638.c b/gcc/testsuite/c-c++-common/asan/pr63638.c
new file mode 100644
index 000..a8bafc5
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/asan/pr63638.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+extern
+#ifdef __cplusplus
+"C"
+#endif
+void *memcpy (void *, const void *, __SIZE_TYPE__);
+
+struct S{
+  long d0, d1, d2, d3, d4, d5, d6;
+};
+
+struct S s[6];
+
+int f(struct S *p)
+{
+  memcpy(p, &s[2], sizeof(*p));
+  memcpy(p, &s[1], sizeof(*p));
+}
+
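
For readers wondering why the one-line change to ASAN_CHECK_LAST matters: in C an enumerator without an explicit value is the previous enumerator plus one, so a sentinel placed after `1 << 4` becomes 17 rather than 32. The sketch below uses hypothetical names (CHK_*, in_range_*) and assumes the sentinel serves as an exclusive upper bound on a flags word; the use site is not shown in this thread, so treat that part as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags shaped like the enum in the patch, before the fix.  */
enum check_flags_implicit
{
  CHK_START = 1 << 3,
  CHK_END = 1 << 4,
  CHK_LAST_IMPLICIT            /* previous enumerator + 1, i.e. 17 */
};

/* The same flags with the sentinel given an explicit value.  */
enum check_flags_explicit
{
  CHK2_START = 1 << 3,
  CHK2_END = 1 << 4,
  CHK_LAST_EXPLICIT = 1 << 5   /* 32, above every combination of flags */
};

/* The implicit sentinel really is 17, not the next power of two.  */
static_assert (CHK_LAST_IMPLICIT == 17, "implicit enumerator is previous + 1");

/* Assumed usage: FLAGS is valid when it is below the sentinel.  */
bool
in_range_implicit (int flags)
{
  return flags < CHK_LAST_IMPLICIT;
}

bool
in_range_explicit (int flags)
{
  return flags < CHK_LAST_EXPLICIT;
}
```

With the implicit sentinel, a legitimate combination such as `CHK_START | CHK_END` (24) falls outside the range, while the explicit `1 << 5` accepts it.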


Re: update address taken: don't drop clobbers

2014-10-24 Thread Jeff Law

On 10/18/14 15:59, Marc Glisse wrote:

On Thu, 10 Jul 2014, Richard Biener wrote:


On Sun, Jun 29, 2014 at 12:33 AM, Marc Glisse 
wrote:


we currently drop clobbers on variables whose address is not taken
anymore.
However, rewrite_stmt has code to replace them with an SSA_NAME with a
default definition (an uninitialized variable), and I believe
rewrite_update_stmt should do the same. This allows us to warn sometimes
(see testcase), but during the debugging I also noticed several
places where
it allowed CCP to simplify further PHIs, so this is also an
optimization.

In an earlier version of the patch, I was using
get_or_create_ssa_default_def (cfun, sym);
(I was reusing the same variable). This passed bootstrap+testsuite on
all
languages except for ada. Indeed, the compiler wanted to coalesce
several
SSA_NAMEs, including those new ones, in out-of-ssa, but couldn't.
There are
abnormal PHIs involved. Maybe it shouldn't have insisted on
coalescing an
undefined ssa_name, maybe something should have prevented us from
reaching
such a situation, but creating a new variable was the simplest
workaround.


Hmm.  We indeed notice "late" that the new names are used in abnormal
PHIs.  Note that usually rewriting a symbol into SSA form does not
create overlapping life-ranges - but of course you are possibly
introducing
those by the new use of the default definitions.

Apart from the out-of-SSA patch you proposed elsewhere a possibility
is to simply never mark undefined SSA names as
SSA_NAME_OCCURS_IN_ABNORMAL_PHI ... (or not mark those
as must-coalesce).


For "not mark those as must-coalesce", replacing the liveness patch with
the attached patch also passed the testsuite: I skip undefined variables
when handling must-coalesce, and let the regular coalescing code handle
them. I am not sure what happens during expansion though, and bootstrap
only hits this issue a couple times in ada so it doesn't prove much.

This patch doesn't conflict with the liveness patch, they are rather
complementary. I didn't test but I am quite confident that having both
patches would also pass bootstrap+testsuite.

Of course that all becomes unnecessary if we use default definitions of
new variables instead of always the same old variable, but I can
understand not wanting all those new artificial variables.

I would be ok with the patch at
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01787.html
(minus the line with unlink_stmt_vdef, which is indeed unnecessary)
(I looked at what gsi_replace does when replacing a clobber by a nop,
and it seems harmless, but I can manually inline the non-dead parts of
it if you want)

So I'm still trying to get comfortable with this patch.  I guess my 
concerns about having one of the undefined value SSA_NAMEs appearing in 
two conflicting coalesce lists are alleviated by the twiddle to 
coalesce_partitions where we essentially ignore them.


So in the end, they don't end up a part of any partition?  What happens 
when we expand them?  I guess they get a new pseudo since they're a 
distinct partition?  If we had a sensible story for expansion, then I 
could probably get on board with this patch.


And I'm still going to look at the other as well -- as you mention, 
they're independent.


jeff



Re: [PATCH] Add missing requirement to crossmodule-indircall-1a.c

2014-10-24 Thread Jeff Law

On 10/23/14 08:30, jb...@gmx.de wrote:

"Jeff Law" :


On 10/21/14 12:21, jb...@gmx.de wrote:

"Jeff Law" :

On 10/21/14 16:13, Haswell wrote:

The additional source must have the same requirement crossmodule-indircall-1.c 
has.

* crossmodule-indircall-1a.c: Add missing requirement.

Why?  When used by crossmodule-indircall-1.c we'll have already tested
the marker and when used by itself, it does nothing.



So I don't see why you think a marker is needed for this source file.


When configuring --disable-lto it gets compiled twice:

FAIL: gcc.dg/tree-prof/crossmodule-indircall-1a.c compilation,  -fprofile-generate -D_PROFILE_GENERATE
UNRESOLVED: gcc.dg/tree-prof/crossmodule-indircall-1a.c execution,    -fprofile-generate -D_PROFILE_GENERATE
UNRESOLVED: gcc.dg/tree-prof/crossmodule-indircall-1a.c compilation,  -fprofile-use -D_PROFILE_USE
UNRESOLVED: gcc.dg/tree-prof/crossmodule-indircall-1a.c execution,    -fprofile-use -D_PROFILE_USE

I'd recommend looking deeper.  I believe that file should be collapsing
down to main () { return 0; } when LTO is not enabled.


I'm not a dejagnu expert, but this is what happens:

/tmp/build/gcc/xgcc -B/tmp/build/gcc/ 
/tmp/gcc-4.9.1/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1a.c 
-fno-diagnostics-show-caret -fdiagnostics-color=never 
/tmp/gcc-4.9.1/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1a.c 
-fprofile-generate -D_PROFILE_GENERATE -lm -o 
/tmp/build/gcc/testsuite/gcc/crossmodule-indircall-1a.x01
/tmp/cc4rrWCn.o: In function `main':
crossmodule-indircall-1a.c:(.text+0x0): multiple definition of `main'
/tmp/ccgMlXGi.o:crossmodule-indircall-1a.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
compiler exited with status 1

Thanks.

What's weird here is the source file is listed twice on the command 
line!  No wonder it's failing.


I can't typically decipher tcl code without trace info and some 
send_user commands to see what the values of various things are.  But 
the following seems, umm, odd:


profopt.exp:
set extra_flags [profopt-get-options $src]
[ ... ]

set options "$extra_options"
lappend options "additional_flags=$option $extra_flags $profile_option"


set comp_output [${tool}_target_compile "$src" "$execname1" executable $options]



Note $src passed to the target_compile procedure as an argument and also 
inside the string $options.


I'd think that's the real problem here.  Though I have no idea how 
that's expected to work in an LTO enabled compile.


jeff






Re: The nvptx port [11/11] More tools.

2014-10-24 Thread Jeff Law

On 10/22/14 15:11, Bernd Schmidt wrote:

On 10/22/2014 10:31 PM, Jeff Law wrote:

These tools currently require GNU extensions - something I probably
ought to fix if we decide to add them to the gcc build itself.

Would these be more appropriate in binutils?


I don't think so, given that we don't need any piece of regular
binutils. There's no meaningful way to build libbfd. It would be strange
to build binutils and have everything that's normally part of it
disabled at configure time.

Fair enough, but I'm having trouble seeing these in GCC.  Makes me 
wonder if they ought to be a package unto themselves, nvptxtools or 
somesuch.


Note that as a separate package, you don't have to remove the GNU 
extensions :-)


jeff


Re: Fix 63615 - FAIL: gcc.target/i386/addr-sel-1.c

2014-10-24 Thread Jeff Law

On 10/23/14 00:56, Alan Modra wrote:

PR 63615 was caused by r21642 fixing the accounting of "n_constants"
in simplify_plus_minus.  Previously, any expression handled by
simplify_plus_minus that expanded to more than two elements and
contained at least one constant would have resulted in "n_constants"
being larger than one, even if it had only one constant.  This had the
effect of setting "canonicalized" for such expressions.

The missed optimisation had these operands to simplify_plus_minus:
(gdb) p debug_rtx(op0)
(plus:SI (reg:SI 0 ax [96])
 (const_int 1 [0x1]))
$1 = void
(gdb) p debug_rtx(op1)
(symbol_ref:SI ("a") )
$2 = void

resulting in the ops array being populated as
(gdb) p n_ops
$3 = 3
(gdb) p ops[0]@3
$4 = {{op = 0x76d4b360, neg = 0}, {op = 0x76d483a8, neg = 0}, {op = 0x76c29490, neg = 0}}
(gdb) p debug_rtx(ops[0].op)
(reg:SI 0 ax [96])
$5 = void
(gdb) p debug_rtx(ops[1].op)
(symbol_ref:SI ("a") )
$6 = void
(gdb) p debug_rtx(ops[2].op)
(const_int 1 [0x1])
$7 = void

Of note here is that the operands have been reordered from their
original positions.  What was ax + 1 + sym is now ax + sym + 1, and it
happens that this ordering is correct in the sense that
simplify_plus_minus_op_data_cmp sorting of the ops array produces no
changes.  Now any change made during sorting sets "canonicalized", so
I figure reordering while decomposing operands ought to set
"canonicalized" too.  Indeed, the reordering seen above has
canonicalized the expression.  (Of course the reordering during
decomposition might be exactly cancelled by later sorting, but it
hardly seems worth fixing that, and other cases where we might return
the input expression unchanged..)
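
The reordering described above can be reproduced with a toy model. This is only a sketch, not the code in simplify-rtx.c: `struct expr` and `flatten_plus` are hypothetical names, and the model handles PLUS only. It assumes, as the description suggests, that when a slot holding a PLUS is expanded, the first operand stays in that slot and the second is appended at the end of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression: a leaf when NAME is non-NULL, otherwise LHS + RHS.  */
struct expr
{
  const char *name;
  const struct expr *lhs, *rhs;
};

/* Flatten OP0 + OP1 into OPS (capacity MAX_OPS, at least 2).  When a slot
   holds a PLUS, its first operand stays in that slot and its second
   operand is appended at the end.  *REORDERED is set when an appended
   operand does not land directly after its sibling.  Returns the number
   of operands stored.  */
int
flatten_plus (const struct expr *op0, const struct expr *op1,
              const struct expr *ops[], int max_ops, int *reordered)
{
  int n_ops = 0;
  int changed;

  assert (max_ops >= 2);
  ops[n_ops++] = op0;
  ops[n_ops++] = op1;
  *reordered = 0;

  do
    {
      changed = 0;
      for (int i = 0; i < n_ops; i++)
        {
          const struct expr *e = ops[i];
          if (e->name == NULL && n_ops < max_ops)
            {
              ops[i] = e->lhs;
              ops[n_ops] = e->rhs;
              /* Siblings are adjacent only if the free slot is i + 1.  */
              if (n_ops != i + 1)
                *reordered = 1;
              n_ops++;
              changed = 1;
            }
        }
    }
  while (changed);

  return n_ops;
}
```

Flattening (ax + 1) + a with this model yields ax, a, 1 and sets the flag, matching the gdb session above; flattening ax + (a + 1) leaves the operands in their original order and does not set it.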

I'm still running bootstrap and regression tests on x86_64-linux,
this time with both -m64 and -m32 regression tests.
OK to apply assuming no regressions?

PR rtl-optimization/63615
* simplify-rtx.c (simplify_plus_minus): Set "canonicalized" on
decomposing PLUS or MINUS if operands are not placed adjacent
in the "ops" array.

OK assuming no regressions.

jeff



Re: [patch] should not define bool, true or false as macros for C++

2014-10-24 Thread Jonathan Wakely
I want to resurrect this patch that I didn't pursue for 4.8, because
our <stdbool.h> violates this very explicit requirement in the C++11
and C++14 standards:

18.10 [support.runtime] p8
"The header <cstdbool> and the header <stdbool.h> shall not define
macros named bool, true, or false."

Is the gcc/ginclude/stdbool.h change OK for trunk?

Tested x86_64-linux. I also looked in the Debian Code Search and it
seems that all the code which cares whether bool is a macro follows
one of these patterns:

#ifdef bool
#undef bool

#ifdef bool
  /* Leave if macro is from C99 stdbool.h */
  #ifndef __bool_true_false_are_defined
#undef bool
(The __bool_true_false_are_defined macro *is* defined in C++)

#ifdef bool
#error bool should not be defined
#endif
(That's in the libc++ testsuite for , we fail that test obviously)

#ifdef bool
#define HAS_BOOL
#endif

So although that search only covers FOSS code I don't think we're
going to break much C++ code by doing this, and it's needed for C++11
anyway.



On 5 February 2012 13:00, Jonathan Wakely wrote:
> On 4 February 2012 23:35, Gerald Pfeifer wrote:
>> For what it's worth, I strongly suggest that you only define those when
>> __cplusplus is pre-C++11.
>>
>> There is simply too much software out there which will run into this
>
> Really? Why would any C++ code assume "bool" is defined as a macro?
> It's been a keyword in C++ for longer than C99 has defined it as a macro.
>
>> and being aggressive in breaking (admittedly non-standard conforming
>> programs) gives GCC a bad reputation and is not nice to our users to
>> begin with.
>
> Fair enough, this revised patch still defines bool, true and false for
> C++98 mode, but not for C++11 mode.
>
> gcc/
> * ginclude/stdbool.h (true, false, bool): Do not define for C++11.
>
> libstdc++/
>* testsuite/18_support/headers/cstdbool/macros.cc: New.
>
> Tested x86_64-linux, OK for trunk?
commit 4c76750dfd43a8102ee3de5ea0c5699be3359d7f
Author: Jonathan Wakely 
Date:   Fri Oct 24 10:50:27 2014 +0100

C++11 explicitly forbids macros for bool, true and false.

gcc:
* ginclude/stdbool.h: Do not define bool, true or false in C++11.

libstdc++-v3:
* testsuite/18_support/headers/cstdbool/macros.cc: New.

diff --git a/gcc/ginclude/stdbool.h b/gcc/ginclude/stdbool.h
index f4e802f..a06f17f2 100644
--- a/gcc/ginclude/stdbool.h
+++ b/gcc/ginclude/stdbool.h
@@ -36,11 +36,15 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #else /* __cplusplus */
 
-/* Supporting <stdbool.h> in C++ is a GCC extension.  */
+/* Supporting _Bool in C++ is a GCC extension.  */
 #define _Bool  bool
+
+#if __cplusplus < 201103L
+/* Defining these macros in C++98 is a GCC extension.  */
 #define bool   bool
 #define false  false
 #define true   true
+#endif
 
 #endif /* __cplusplus */
 
diff --git a/libstdc++-v3/testsuite/18_support/headers/cstdbool/macros.cc b/libstdc++-v3/testsuite/18_support/headers/cstdbool/macros.cc
new file mode 100644
index 000..58631d8
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/headers/cstdbool/macros.cc
@@ -0,0 +1,38 @@
+// { dg-do compile }
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2012 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <cstdbool>
+
+#ifndef __bool_true_false_are_defined
+# error "The header <cstdbool> fails to define a macro named __bool_true_false_are_defined"
+#endif
+
+#ifdef bool
+# error "The header <cstdbool> defines a macro named bool"
+#endif
+
+#ifdef true
+# error "The header <cstdbool> defines a macro named true"
+#endif
+
+#ifdef false
+# error "The header <cstdbool> defines a macro named false"
+#endif
+


Re: [GOOGLE] Fix LIPO resolved node reference fixup

2014-10-24 Thread Teresa Johnson
On Fri, Oct 24, 2014 at 10:55 AM, Xinliang David Li  wrote:
> When orig_addr == addr, is there a guarantee that this assert:
>
>  gcc_assert (TREE_OPERAND (op,0) == addr);
>
> is always true?

It should be, that is the assumption of the code that we are trying to
enforce with the assert.

Teresa

>
> David
>
>
>
> On Fri, Oct 24, 2014 at 10:21 AM, Teresa Johnson  wrote:
>> This patch makes a fix to the reference fixups performed after LIPO
>> node resolution, to better handle the case where we are updating the
>> base address of a reference.
>>
>> Fixes google benchmark and passes regression tests. Ok for google/4_9?
>>
>> Thanks,
>> Teresa
>>
>> 2014-10-24  Teresa Johnson  
>>
>> Google ref b/18110567.
>> * cgraphbuild.c (get_base_address_expr): New function.
>> (fixup_ref): Update the op expression for new base address.
>>
>> Index: cgraphbuild.c
>> ===
>> --- cgraphbuild.c   (revision 216667)
>> +++ cgraphbuild.c   (working copy)
>> @@ -665,13 +665,35 @@ record_references_in_initializer (tree decl, bool
>>pointer_set_destroy (visited_nodes);
>>  }
>>
>> +/* Similar to get_base_address but returns the ADDR_EXPR pointing
>> +   to the base address corresponding to T.  */
>> +
>> +static tree
>> +get_base_address_expr (tree t)
>> +{
>> +  while (handled_component_p (t))
>> +t = TREE_OPERAND (t, 0);
>> +
>> +  if ((TREE_CODE (t) == MEM_REF
>> +   || TREE_CODE (t) == TARGET_MEM_REF)
>> +  && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR)
>> +return TREE_OPERAND (t, 0);
>> +
>> +  return NULL_TREE;
>> +}
>> +
>>  /* Update any function decl references in base ADDR of operand OP to refer to
>> the resolved node.  */
>>
>>  static bool
>>  fixup_ref (gimple, tree addr, tree op)
>>  {
>> +  tree orig_addr = addr;
>>addr = get_base_address (addr);
>> +  /* If the address was changed, update the operand OP to be the
>> + ADDR_EXPR pointing to the new base address.  */
>> +  if (orig_addr != addr)
>> +op = get_base_address_expr (orig_addr);
>>if (addr && TREE_CODE (addr) == FUNCTION_DECL)
>>  {
>>gcc_assert (TREE_CODE (op) == ADDR_EXPR);
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH][optabs] PR63442 libgcc_cmp_return_mode not always return word_mode

2014-10-24 Thread Jeff Law

On 10/24/14 08:09, Jiong Wang wrote:

ping~

thanks.

Regards,
Jiong

On 17/10/14 13:04, Jiong Wang wrote:

The cause appears to be a minor bug in prepare_cmp_insn.

The last mode parameter "pmode" of "prepare_cmp_insn" should match the
mode of the first parameter "x".  In the recursive call to
"prepare_cmp_insn", however, x has the mode returned by
targetm.libgcc_cmp_return_mode () while pmode is set to word_mode.

This is usually harmless, because the default libgcc_cmp_return_mode
hook always returns word_mode.  AArch64, however, has a target-private
implementation that always returns SImode, so the two modes disagree,
which causes an ICE later.

The issue has stayed hidden because nearly all other targets use the
default hook, and this compare is rarely invoked.

Thanks

gcc/
PR target/63442
* optabs.c (prepare_cmp_insn): Use target hook
"libgcc_cmp_return_mode" instead of word_mode.

This is fine once you have run it through a bootstrap and regression test.

Any reason not to use RET_MODE since that's already set up with the 
result of the target hook?


Jeff


Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2014-10-24 Thread Ilya Verbin
On 13 Oct 12:19, Jakub Jelinek wrote:
> On Sat, Oct 11, 2014 at 06:49:00PM +0400, Ilya Verbin wrote:
> > It introduces 2 new options:
> > 1. -foffload=<targets>=<options>
> >    By default, GCC will build offload images for all offload targets specified
> > in configure, with non-target-specific options passed to host compiler.
> > This option is used to control offload targets and options for them.
> > 
> > It can be used in a few ways:
> > * -foffload=disable
> >   Tells GCC to disable offload support.
> >   OpenMP target regions will be run in host fallback mode.
> > * -foffload=<targets>
> >   Tells GCC to build offload images for <targets>.
> >   They will be built with non-target-specific options passed to host compiler.
> > * -foffload=<options>
> >   Tells GCC to build offload images for all targets specified in configure.
> >   They will be built with non-target-specific options passed to host compiler
> >   plus <options>.
> > * -foffload=<targets>=<options>
> >   Tells GCC to build offload images for <targets>.
> >   They will be built with non-target-specific options passed to host compiler
> >   plus <options>.
> > 
> > Options specified by -foffload are appended to the end of option set, so in case
> > of option conflicts they have more priority.
> 
> This looks good to me.
> 
> > 2. -foffload-abi=[lp64|ilp32]
> >    This option is supposed to tell mkoffload (and offload compiler) which ABI is
> > used in streamed GIMPLE.  This option is desirable, because host and offload
> > compilers must have the same ABI.  The option is generated by the host compiler
> > automatically, it should not be specified by user.
> 
> But I'd like to understand why is this one needed.
> Why should the compilers care?  Aggregates layout and alignment of
> integral/floating types must match between host and offload compilers, sure,
> but isn't that something streamed already in the LTO bytecode?
> Or is LTO streamer not streaming some types like long_type_node?
> I'd expect if host and offload compiler disagree on long type size that
> you'd just use a different integral type with the same size as long on the
> host.
> Different sized pointers are of course a bigger problem, but can't you just
> error out on that during reading of the LTO, or even handle it (just use
> some integral type for when is the pointer stored in memory, and just
> convert to pointer after reads from memory, and convert back before storing
> to memory).  Erroring out during LTO streaming in sounds just fine to me
> though.

Here is an additional check.  Is the whole 'option handling' patch OK?

Thanks,
  -- Ilya


diff --git a/gcc/opts.c b/gcc/opts.c
index 9b2e1af..d1a626c 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1732,6 +1732,13 @@ common_handle_option (struct gcc_options *opts,
   /* Deferred.  */
   break;
 
+#ifndef ACCEL_COMPILER
+case OPT_foffload_abi_:
+  error_at (loc, "-foffload-abi option can be specified only for "
+   "offload compiler");
+  break;
+#endif
+
 case OPT_fpack_struct_:
   if (value <= 0 || (value & (value - 1)) || value > 16)
error_at (loc,


Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 10:29:50PM +0400, Ilya Verbin wrote:
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 9b2e1af..d1a626c 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -1732,6 +1732,13 @@ common_handle_option (struct gcc_options *opts,
>/* Deferred.  */
>break;
>  
> +#ifndef ACCEL_COMPILER

ifndef ?  I would have expected ifdef.

> +case OPT_foffload_abi_:
> +  error_at (loc, "-foffload-abi option can be specified only for "
> + "offload compiler");
> +  break;
> +#endif
> +
>  case OPT_fpack_struct_:
>if (value <= 0 || (value & (value - 1)) || value > 16)
>   error_at (loc,

Jakub


Re: Patch committed: Don't define TARGET_HAS_F_SETLKW

2014-10-24 Thread Ian Taylor
On Fri, Oct 24, 2014 at 6:26 AM, Andreas Schwab  wrote:
> Ian Taylor  writes:
>
>> 2014-10-23  Ian Lance Taylor  
>>
>> * config/mep/mep.h (TARGET_HAS_F_SETLKW): Don't define.
>
> s/define/undefine/

Thanks.  Fixed.  (Changes to ChangeLog files do not themselves require
ChangeLog entries.)

Ian
Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 216675)
+++ gcc/ChangeLog   (working copy)
@@ -200,7 +200,7 @@
 
 2014-10-23  Ian Lance Taylor  
 
-   * config/mep/mep.h (TARGET_HAS_F_SETLKW): Don't define.
+   * config/mep/mep.h (TARGET_HAS_F_SETLKW): Don't undefine.
 
 2014-10-23  Jakub Jelinek  
 


Re: [GOOGLE] Fix LIPO resolved node reference fixup

2014-10-24 Thread Xinliang David Li
When orig_addr == addr, is there a guarantee that this assert:

 gcc_assert (TREE_OPERAND (op,0) == addr);

is always true?

David



On Fri, Oct 24, 2014 at 10:21 AM, Teresa Johnson  wrote:
> This patch makes a fix to the reference fixups performed after LIPO
> node resolution, to better handle the case where we are updating the
> base address of a reference.
>
> Fixes google benchmark and passes regression tests. Ok for google/4_9?
>
> Thanks,
> Teresa
>
> 2014-10-24  Teresa Johnson  
>
> Google ref b/18110567.
> * cgraphbuild.c (get_base_address_expr): New function.
> (fixup_ref): Update the op expression for new base address.
>
> Index: cgraphbuild.c
> ===
> --- cgraphbuild.c   (revision 216667)
> +++ cgraphbuild.c   (working copy)
> @@ -665,13 +665,35 @@ record_references_in_initializer (tree decl, bool
>pointer_set_destroy (visited_nodes);
>  }
>
> +/* Similar to get_base_address but returns the ADDR_EXPR pointing
> +   to the base address corresponding to T.  */
> +
> +static tree
> +get_base_address_expr (tree t)
> +{
> +  while (handled_component_p (t))
> +t = TREE_OPERAND (t, 0);
> +
> +  if ((TREE_CODE (t) == MEM_REF
> +   || TREE_CODE (t) == TARGET_MEM_REF)
> +  && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR)
> +return TREE_OPERAND (t, 0);
> +
> +  return NULL_TREE;
> +}
> +
>  /* Update any function decl references in base ADDR of operand OP to refer to
> the resolved node.  */
>
>  static bool
>  fixup_ref (gimple, tree addr, tree op)
>  {
> +  tree orig_addr = addr;
>addr = get_base_address (addr);
> +  /* If the address was changed, update the operand OP to be the
> + ADDR_EXPR pointing to the new base address.  */
> +  if (orig_addr != addr)
> +op = get_base_address_expr (orig_addr);
>if (addr && TREE_CODE (addr) == FUNCTION_DECL)
>  {
>gcc_assert (TREE_CODE (op) == ADDR_EXPR);
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [patch] Final basic-block.h flattening patch

2014-10-24 Thread Jeff Law

On 10/24/14 09:22, Andrew MacLeod wrote:

Don't let its size scare you, this is actually fairly trivial now. I
split it into the more interesting patch and the big, boring, mechanical
one.  all-in-all, it touches 351 files :-P.

This patch completely flattens basic-block.h.  I manually adjusted some
of the remaining .h files and replicated the include list to any source
and include files which included those... and so on until full closure
was accomplished.  basic-block.h is not included from anything but
source files now.

Very few actual tweaks... "enum profile_status_d" and "struct
control_flow_graph" were moved from basic-block.h to cfg.h.  They are
used from very few places and there was a tiny bit of code in cfg.c
which allocated the struct.  This is also the logical place for it.

symbol_table::create_empty() was moved from cgraph.h to cgraph.c. It
uses REG_BR_PROB_BASE which is defined in basic-block.h, and by moving
it to the .c file, there is no dependency between basic-block.h and
cgraph.h any more.

a few of the gen*.c needed to have the generated file's include list
changed a little to still compile properly.

gcc-plugin.h now includes all the stuff basic-block.h used to.  It's
still not perfect, as the testsuite showed.  One of the tests included
gcc-plugin.h almost last, so an existing #include "basic-block.h"
earlier in the include list broke...   Can't help everyone I guess.  If
gcc-plugin.h is included first, no problem. :-)

I also ran the include file reducer on all the new .h files this patch
series introduced (they should be safe since there is no conditional
macro stuff in them).. specifically  on:
predict.h, cfgrtl.h, cfg.h, cfganal.h, lcm.h, cfgbuild.h, cfgcleanup.h,
dominance.h, and of course, basic-block.h itself.  The only exception is
that I didn't try to reduce any config/* files

I have bootstrapped on x86_64-unknown-linux-gnu, run the testsuite with
no regressions, and run all the targets in contrib/config-list.mk to
catch any #includes that might be required by #ifdef'd code.   Hopefully
I got them all :-)  I'm re-running on a fresh checkout over the
weekend just to be sure.

Assuming no issues pop up, OK for trunk?

Yes, this is OK.  And thanks for splitting it up like that :-)


jeff


Re: [PATCH][6/n] Merge from match-and-simplify, make forwprop fold all stmts

2014-10-24 Thread Jeff Law

On 10/24/14 07:16, Richard Biener wrote:


This patch makes GIMPLE forwprop fold all statements, following
single-use SSA edges only (as suggested by Jeff and certainly
how this will regress the least until we replace manual
simplification code that does not restrict itself this way).

forwprop is run up to 4 times at the moment (once only for -Og,
not at all for -O0), which still seems reasonable.  IMHO the
forwprop pass immediately after inlining is somewhat superfluous,
it was added there just for its ADDR_EXPR propagation.  We should
eventually split this pass into two.

Note that just folding what we propagated into (like the SSA
propagators do during substitute-and-fold phase) will miss
cases where we propagate into a stmt feeding the one we could
simplify.  Unless we always fold all single-use (and their use)
stmts we have to fold everything from time to time.  Changing
how / when we fold stuff is certainly something to look into with
fold_stmt now being able to follow SSA edges.

Bootstrapped on x86_64-unknown-linux-gnu, testing still in progress.

From earlier testing I remember I need to adjust a few testcases
that don't expect the early folding - notably two strlenopt cases
(previously XFAILed but then PASSed again).

I also expect to massage the single-use heuristic as I get to
merging the patterns I added for the various forwprop manual
pattern matchings to trunk (a lot of them do not restrict themselves
this way).

Does this otherwise look ok?

Thanks,
Richard.

2014-10-24  Richard Biener  

* tree-ssa-forwprop.c: Include tree-cfgcleanup.h and tree-into-ssa.h.
(lattice): New global.
(fwprop_ssa_val): New function.
(fold_all_stmts): Likewise.
(pass_forwprop::execute): Finally fold all stmts.

Seems reasonable.  After all, we can iterate on the single-use heuristic.

jeff



[GOOGLE] Fix LIPO resolved node reference fixup

2014-10-24 Thread Teresa Johnson
This patch makes a fix to the reference fixups performed after LIPO
node resolution, to better handle the case where we are updating the
base address of a reference.

Fixes google benchmark and passes regression tests. Ok for google/4_9?

Thanks,
Teresa

2014-10-24  Teresa Johnson  

Google ref b/18110567.
* cgraphbuild.c (get_base_address_expr): New function.
(fixup_ref): Update the op expression for new base address.

Index: cgraphbuild.c
===
--- cgraphbuild.c   (revision 216667)
+++ cgraphbuild.c   (working copy)
@@ -665,13 +665,35 @@ record_references_in_initializer (tree decl, bool
   pointer_set_destroy (visited_nodes);
 }

+/* Similar to get_base_address but returns the ADDR_EXPR pointing
+   to the base address corresponding to T.  */
+
+static tree
+get_base_address_expr (tree t)
+{
+  while (handled_component_p (t))
+t = TREE_OPERAND (t, 0);
+
+  if ((TREE_CODE (t) == MEM_REF
+   || TREE_CODE (t) == TARGET_MEM_REF)
+  && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR)
+return TREE_OPERAND (t, 0);
+
+  return NULL_TREE;
+}
+
 /* Update any function decl references in base ADDR of operand OP to refer to
the resolved node.  */

 static bool
 fixup_ref (gimple, tree addr, tree op)
 {
+  tree orig_addr = addr;
   addr = get_base_address (addr);
+  /* If the address was changed, update the operand OP to be the
+ ADDR_EXPR pointing to the new base address.  */
+  if (orig_addr != addr)
+op = get_base_address_expr (orig_addr);
   if (addr && TREE_CODE (addr) == FUNCTION_DECL)
 {
   gcc_assert (TREE_CODE (op) == ADDR_EXPR);


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[COMMITTED][PATCH][ARM] gnu11 cleanup for aapcs testcases

2014-10-24 Thread Jiong Wang


On 24/10/14 12:50, Marek Polacek wrote:


diff --git a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
index 06a92c3..7bce58b 100644
--- a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
+++ b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
@@ -49,6 +49,8 @@
  
  
  extern void abort (void);

+typedef unsigned int size_t;
+extern int memcmp (const void *s1, const void *s2, size_t n);
You can use __SIZE_TYPE__ and then you don't need the typedef.


ok, fixed, thanks for pointing this out.

Committed the attached patch as obvious after talking with Ramana.

gcc/testsuite/
  * gcc.target/arm/aapcs/abitest.h: Declare memcmp.
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
index 06a92c3..7bce58b 100644
--- a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
+++ b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
@@ -49,6 +49,7 @@
 
 
 extern void abort (void);
+extern int memcmp (const void *s1, const void *s2, __SIZE_TYPE__ n);
 
 __attribute__((naked))  void dumpregs () __asm("myfunc");
 __attribute__((naked))  void dumpregs ()

Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-10-24 Thread Alan Lawrence
Well, I'd be happy with that. Curiously, that patch is identical to what I have 
here...and that's not what I got having built gcc with 
--target=sparc-sun-solaris2.11, but on further investigation it looks like the 
require-effective-target-ilp32/64 is not working correctly on my setup.


FWIW I've also tried MIPS (both 32- and 64-bit tests appear to work), IA64 
(64-bit test fine, not sure if ilp32 is supported), microblaze (32-bit), and 
AArch64 with -mabi=ilp32. Hence, I attach an alternative/possible patch which 
adds these platforms too.


However, since we have talked about removing the target selector altogether: the 
only arch I've found so far which has failed, was Alpha, and widening the regex 
to match "le" as well as "ge" then passes on Alpha too. To be honest I expect 
that if we were to go that route, we would find a deluge of smaller platforms on 
which the test fails; but if as testsuite maintainer you think that's 
appropriate - certainly I'd be willing to try that i.e. to exclude them as they 
turn up. A second alternative patch also attached. (FWIW I'll be away and unable 
to commit anything before Monday.)


More generally: really the test wants to be a unit test on combine_simplify_rtx, 
independent of front-end, expand, platform-specific insns, etc., but since we 
can't do that - whilst adding more platforms generally seems good, it is maybe 
not essential, and may increase fragility.


(In answer to your point about adding an effective-target in target-supports.exp 
- yes, could do that, but it's difficult to come up with a good characterization 
of what the criteria are, and I don't see it'd generalize to any other tests at 
all :()


--Alan

Rainer Orth wrote:

Alan Lawrence  writes:


Rainer Orth wrote:

However, as a quick first step, does adding the ilp32 / lp64 (and keeping
the architectures list for now) solve the immediate problem? Patch
attached, OK for trunk?

No, as I said this is wrong for biarch targets like sparc and i386.

When you say no this does not solve the immediate problem, are you saying
that you are (still) seeing test failures with the require-effective-target
patch applied? Or is the issue that this would not execute the tests as


I didn't try that patch yet, but the target part is wrong, as I tried to
explain.  Consider the sparc case: 


* if you configure for sparc-sun-solaris2.11, you default to -m32
  (i.e. ilp32), while -m64 is lp64

* if you configure for sparcv9-sun-solaris2.11 instead, you default to
  -m64 (lp64), but get ilp32 with -m32

So, irrespective of the sparc vs. sparc64 (which is wrong, btw., the
canonical form for 64-bit-default sparc is sparcv9) forms, you can get
ilp32 and lp64 with both.

Similar issues hold for i?86 vs. x86_64 and probably other biarch
targets like powerpc vs. powerpc64, so you need to use the most generic
forms of the target names in your target lists.
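
To make the ilp32 / lp64 distinction concrete, here is a small sketch of how the data model can be told apart from inside a C translation unit, independent of the configured target triplet. The function names are made up for this example; as far as I understand, the testsuite's effective-target checks rely on the same kind of sizeof comparison, but that is an assumption and not taken from this thread.

```c
/* Return 1 if the current multilib uses the ILP32 data model:
   int, long and pointers are all 32 bits wide.  */
int
is_ilp32 (void)
{
  return sizeof (int) == 4 && sizeof (long) == 4 && sizeof (void *) == 4;
}

/* Return 1 if the current multilib uses the LP64 data model:
   int is 32 bits, long and pointers are 64 bits.  */
int
is_lp64 (void)
{
  return sizeof (int) == 4 && sizeof (long) == 8 && sizeof (void *) == 8;
}
```

Compiled with -m32 and with -m64 on the same biarch compiler, the two functions swap results, which is why the triplet alone cannot select the right test.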


widely as might be possible? In principle I'm quite happy to relax the
target patterns, although have been having issues with sparc (below)...

Re. "what the architectures have in common" is largely that these are the
primary/secondary archs on which I've checked the test passes! I can now
add mips and microblaze to this list, however I'm nervous of dropping the
target entirely given the very large number of target architectures gcc
supports; and e.g. IA64 (in ILP32 mode) generates an ashiftrt:DI by 31
places, not ashiftrt:SI, which does not match the simplification criteria
in combine.c.
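
As background for the subject line, a sketch of the arithmetic identity behind commuting XOR with an arithmetic right shift. This only illustrates the arithmetic on 32-bit values; it is not the combine.c code, whose exact conditions are not shown in this thread, and the function names are invented for the example.

```c
#include <stdint.h>

/* Arithmetic right shift of a 32-bit value by N (0..31), written so
   that it does not depend on how the compiler shifts negative numbers:
   for negative X, shift the non-negative complement and complement
   the result again.  */
static int32_t
asr32 (int32_t x, unsigned n)
{
  return x < 0 ? ~(~x >> n) : x >> n;
}

/* Return 1 if shifting (X ^ C) gives the same value as XOR-ing the
   two shifted operands.  Each result bit of an arithmetic shift is a
   copy of one input bit, so this is expected to hold for all inputs.  */
int
xor_ashiftrt_commute (int32_t x, int32_t c, unsigned n)
{
  return asr32 (x ^ c, n) == (asr32 (x, n) ^ asr32 (c, n));
}
```

The identity itself is width-independent; the test in question looks for the SImode form of the shift, so a target that emits the shift in DImode produces different RTL even though the arithmetic is the same.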


As I stated before, such target lists without any explanation are bound
to confuse future readers/testers: at the very least, add comments
explaining what those lists have in common.  OTOH, at this stage it
might be best to just drop the target list for now, learn which targets
pass and fail the tests, and then reintroduce them or, better yet, add
an effective-target keyword which matches them.  Otherwise, you'll never
get test coverage beyond your current list.

This should be something like 


  { target aarch64*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* }

E.g. sparc-sun-solaris2.11 with -m64 is lp64, but would be excluded by
your target list.  Keep the list sorted alphabetically and best add an
explanation so others know what those targets have in common.

So I've built a stage-1 compiler with --target=sparc-sun-solaris2.11, and I find

  * without -m64, my "dg-require-effective-target ilp32" causes the 32-bit
test to execute, and pass; "dg-require-effective-target lp64" prevents
execution of the 64-bit test (which would fail) - so all as expected and
desired.

  * with -lp64, behaviour is as previous (this is probably expected)


Huh?  What's -lp64?


  * with -m64, "dg-require-effective-target ilp32" still causes the test to
execute (but it fails, as the RTL now has an ashiftrt:DI by 31 places,
which doesn't meet the simplification criteria in combine.c - this is
pretty much as expected). "dg-require-effective-target lp64" stops the
64-bit test from executing howe

Re: PATCH: fix breakage from "[PATCH] Fix genmatch linking"

2014-10-24 Thread Hans-Peter Nilsson
> From: Richard Biener 
> Date: Fri, 24 Oct 2014 09:56:51 +0200
> On Fri, 24 Oct 2014, Hans-Peter Nilsson wrote:
> > Still, I don't understand exactly how your patch
> > introduces build-subdirectories where there were none before.
> > Maybe that "+all-gcc: maybe-all-build-libcpp" was wrong and
> > should be different?
> 
> No, we do need a build-libcpp to build gcc/build/genmatch.
> Not sure how you got around without a build-libiberty as other
> gen* programs surely require that.

Regular cross-configurations got around fine as they used the
"host"-build libiberty, which for crosses seemed to differ from
"build"-builds(!) only in that they're built at the objdir top
instead of objdir/build-.  Crosses *could* still use
the host libraries, but whatever; we're avoiding a
cross-or-native-conditional now.  I haven't given
canadian-crosses any thought, maybe they were broken before.

brgds, H-P


Re: [PATCH 5/5] add libcc1

2014-10-24 Thread Jeff Law

On 10/24/14 01:15, Phil Muldoon wrote:

On 10/10/14 22:58, Jeff Law wrote:

On 10/09/14 03:07, Phil Muldoon wrote:


Sorry for taking so long to reply.  We've talked, on irc and elsewhere
a little (some at the Cauldron too!).  I think the consensus is as
nobody has explicitly mentioned anything, this is OK to go in?

Yes, please go ahead and check it in.  You'll be the first contact point if 
something goes wrong :-)

Given the length of time since the original post and now, can you please do 
sanity bootstrap to make sure nothing's bitrotted before you commit?


I rebased the patch on top of GCC head (from the git repository),
updated the ChangeLogs, etc from two days ago (it takes two days to do
a full rebase, pristine and patched bootstrap and testrun on my poor
laptop ;).
Get a new laptop :-)  The process for getting one from corporate isn't 
bad.  In fact, I just got my refreshed laptop last week.




I've built both pristine and patched branches with bootstrap enabled.
I ran both testsuites and used contrib/compare_tests to make sure
everything was as it should be.  compare_tests reports everything as
fine.  One minor change I found, was due to some ongoing work on
hash_tables.  It seems the parameterless constructor call for a new
hash table has been removed.  This was trivially fixed with the patch
attached.  Even though (to me) it is obvious, what do you think?

Looks fine to me.

jeff



Re: avoid alignment of static variables affecting stack's

2014-10-24 Thread Jeff Law

On 10/24/14 04:12, Jan Beulich wrote:

On 24.10.14 at 11:52,  wrote:

On Fri, Oct 24, 2014 at 11:18 AM, Jakub Jelinek  wrote:

On Fri, Oct 24, 2014 at 11:10:08AM +0200, Richard Biener wrote:

For something in static storage, this seems OK.  However, I think a hard
register variable ought to be left alone -- even if we can't spill it to
a stack slot today, there's a reasonable chance we might add that
capability in the future.
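
A sketch of the kind of code the subject line is about, as I read it: an over-aligned object in static storage that is merely referenced from a function. The names are hypothetical, and whether a given compiler version actually raises the stack alignment for such a function is not demonstrated here.

```c
#include <stdint.h>

/* An over-aligned object in static storage.  Its alignment is a
   property of the data section, not of any stack frame.  */
static _Alignas (64) unsigned char table[64];

/* Expose the address so the alignment can be inspected.  */
uintptr_t
table_address (void)
{
  return (uintptr_t) table;
}

/* A function that only reads the static object.  It has no
   over-aligned automatic variables, so nothing in it should need a
   more strictly aligned stack.  */
unsigned
sum_table (void)
{
  unsigned sum = 0;
  for (unsigned i = 0; i < sizeof table; i++)
    sum += table[i];
  return sum;
}
```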


Hmm, but then wouldn't it need to be the code generating the spill
that's responsible for enforcing suitable alignment? I can certainly
re-submit without the hard register special cased (as it would still
fix the original issue I'm seeing), but it feels wrong to do so.


Yes, ISTR the spilling code is supposed to update the required
stack alignment.  After all the RA decision might affect required
alignment of spills.


 From what I remember, at RA time you already have to know conservatively
that you'll want to do dynamic stack realignment and what the highest needed
alignment will be, so various parts of expansion etc. conservatively compute
what will be needed.  I think that is because you e.g. need to reserve some
registers (vDRAP, etc.) if doing dynamic realignment.
If you conservatively assume you'll need dynamic stack realignment and after
RA you find you really don't need it, there are some optimizations in
prologue threading where it attempts to at least decrease amount of
unnecessary code, but the harm has already been done.

Might be that with LRA perhaps this could be changed and not conservatively
assume more alignment than proven to be needed, but such code isn't there I
think.


I stand corrected then.


So am I to conclude then that I need to take out the hard register
check in order for this to be accepted?
At least for now, yes.  We can always revisit hard registers if/when 
IRA/LRA can be enhanced to deal with these issues.



jeff



Re: [PATCH] Fix typedef-name printing (PR c/56980)

2014-10-24 Thread Jeff Law

On 10/24/14 08:16, Marek Polacek wrote:

Our current C pretty printer output sometimes looks a bit goofy:
"expected ‘enum F *’ but argument is of type ‘enum F *’".
It's because it always prints "struct"/"union"/"enum" even though
the type is a typedef name.  This patch ought to fix this.
We've got a bunch of reports about this over the years...

The C++ printer can also print "B* {aka A*}", I'll try to teach
c_tree_printer to do something similar as well.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-10-24  Marek Polacek  

PR c/56980
* c-pretty-print.c (c_pretty_printer::simple_type_specifier): Don't
print "struct"/"union"/"enum" for typedefed names.

* gcc.dg/pr56980.c: New test.

OK.
Jeff



Re: [PATCH diagnostics/fortran] dynamically generate locations from offset + handle %C

2014-10-24 Thread Manuel López-Ibáñez
On 23 October 2014 21:10, Dodji Seketeli  wrote:
> Sorry, I forgot to make it clear that I have no power to approve the
> line-map changes.  I just gave my casual point view; so please take it
> for what it is worth.
>
> I am CC-ing Tom and Jason who can approve this.

I don't see the CCs in the copy I received. Please, Tom and Jason,
could you nominate Dodji as line-map.[ch] and input.[ch] maintainer?
He is surely the person who knows more about it right now. Or simply
state that line-map.[ch] and input.[ch] fall within the diagnostic
machinery.

Cheers,

Manuel.


Re: [PATCH 5/n] OpenMP 4.0 offloading infrastructure: libgomp

2014-10-24 Thread Ilya Verbin
On 24 Oct 18:07, Jakub Jelinek wrote:
> On Fri, Oct 24, 2014 at 08:03:42PM +0400, Ilya Verbin wrote:
> > A small addition, refcount and copy_from were uninitialized for globals.
> > 
> > 
> > diff --git a/libgomp/target.c b/libgomp/target.c
> > index 4ace170..5b4873b 100644
> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -647,6 +647,8 @@ gomp_init_device (struct gomp_device_descr *devicep)
> >k->host_start = table[i].host_start;
> >k->host_end = table[i].host_end;
> >k->tgt_offset = 0;
> > +  k->refcount = 1;
> > +  k->copy_from = false;
> >k->tgt = tgt;
> >node->left = NULL;
> >node->right = NULL;
> 
> Is that what Kirill reported today on IRC?  LGTM.

Right.

  -- Ilya


Re: [PATCH 5/n] OpenMP 4.0 offloading infrastructure: libgomp

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 08:03:42PM +0400, Ilya Verbin wrote:
> A small addition, refcount and copy_from were uninitialized for globals.
> 
> 
> diff --git a/libgomp/target.c b/libgomp/target.c
> index 4ace170..5b4873b 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -647,6 +647,8 @@ gomp_init_device (struct gomp_device_descr *devicep)
>k->host_start = table[i].host_start;
>k->host_end = table[i].host_end;
>k->tgt_offset = 0;
> +  k->refcount = 1;
> +  k->copy_from = false;
>k->tgt = tgt;
>node->left = NULL;
>node->right = NULL;

Is that what Kirill reported today on IRC?  LGTM.

Jakub


Re: [PATCH 5/n] OpenMP 4.0 offloading infrastructure: libgomp

2014-10-24 Thread Ilya Verbin
On 08 Oct 13:08, Jakub Jelinek wrote:
> On Mon, Oct 06, 2014 at 07:53:17PM +0400, Ilya Verbin wrote:
> > libgomp/
> > * libgomp.map (GOMP_4.0.1): New symbol version.
> > Add GOMP_offload_register.
> > * libgomp_target.h: New file.
> > * splay-tree.h: New file.
> > * target.c: Include config.h, libgomp_target.h, dlfcn.h, splay-tree.h.
> > (gomp_target_init): New forward declaration.
> > (gomp_is_initialized): New static variable.
> > (splay_tree_node, splay_tree, splay_tree_key): New typedefs.
> > (struct target_mem_desc, struct splay_tree_key_s, offload_image_descr):
> > New structures.
> > (offload_images, num_offload_images, devices, num_devices): New static
> > variables.
> > (splay_compare): New static function.
> > (struct gomp_device_descr): New structure.
> > (gomp_get_num_devices): Call gomp_target_init.
> > (resolve_device, gomp_map_vars_existing, gomp_map_vars, gomp_unmap_tgt)
> > (gomp_unmap_vars, gomp_update, gomp_init_device): New static functions.
> > (GOMP_offload_register): New function.
> > (GOMP_target): Arrange for host callback to be performed in a separate
> > initial thread and contention group, inheriting ICVs from
> > gomp_global_icv etc.  Call gomp_map_vars and gomp_unmap_vars.
> > Add device initialization and lookup for target function in splay tree.
> > (GOMP_target_data): Add device initialization and call gomp_map_vars.
> > (GOMP_target_end_data): Call gomp_unmap_vars.
> > (GOMP_target_update): Add device initialization and call gomp_update.
> > (gomp_load_plugin_for_device, gomp_register_images_for_device)
> > (gomp_target_init): New static functions.
> 
> This looks good to me.

A small addition, refcount and copy_from were uninitialized for globals.


diff --git a/libgomp/target.c b/libgomp/target.c
index 4ace170..5b4873b 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -647,6 +647,8 @@ gomp_init_device (struct gomp_device_descr *devicep)
   k->host_start = table[i].host_start;
   k->host_end = table[i].host_end;
   k->tgt_offset = 0;
+  k->refcount = 1;
+  k->copy_from = false;
   k->tgt = tgt;
   node->left = NULL;
   node->right = NULL;
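
A stand-alone sketch of what the two added lines achieve. The struct below is a hypothetical stand-in that only borrows the field names visible in the diff; it is not the real libgomp splay-tree key, and the field types are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the key type; only the fields touched in
   the diff are modelled, and their types are guesses.  */
struct key_sketch
{
  uintptr_t host_start;
  uintptr_t host_end;
  uintptr_t tgt_offset;
  uintptr_t refcount;
  bool copy_from;
};

/* Initialise a key for a global variable.  Every field is assigned
   explicitly, so no later reader sees an indeterminate refcount or
   copy_from value.  */
void
init_global_key (struct key_sketch *k, uintptr_t start, uintptr_t end)
{
  k->host_start = start;
  k->host_end = end;
  k->tgt_offset = 0;
  k->refcount = 1;
  k->copy_from = false;
}
```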


  -- Ilya


Re: [PATCH] PR58867 ASan and UBSan tests not run for installed testing.

2014-10-24 Thread Maxim Ostapenko


On 10/24/2014 02:43 PM, Eric Botcazou wrote:

Some time ago, Andrew wrote a patch that fixes PR58867
(http://patchwork.ozlabs.org/patch/286866/), but for some reason it
wasn't committed to trunk.

This is Andrew's patch resurrected and extended to support the Tsan testsuite.

This patch broke --disable-libsanitizer though, i.e. you now get gazillions of
sanitizer failures in the C and C++ testsuites.


Hi,

do you have any other (system) version of GCC, configured without 
--disable-libsanitizer? If so, perhaps your GCC links with system 
asan_preinit.o and links dummy int main () { return 0; } in 
check_effective_target_fsanitize_address successfully, but fails to find 
libasan.so in execution tests because LD_LIBRARY_PATH does not contain 
any path to libasan.so.2.


If so, I see two ways to fix this:

1) Add path to system libs in LD_LIBRARY_PATH explicitly.

2) Make check_effective_target_fsanitize_address not only link dummy 
executable, but also run it and verify that exit code equals zero.


-Maxim


Re: [Patch, AArch64] Enable Address sanitizer and UB sanitizer

2014-10-24 Thread Andrew Pinski
On Fri, Oct 24, 2014 at 8:44 AM, Christophe Lyon
 wrote:
> On 29 September 2014 15:01, Christophe Lyon  
> wrote:
>> On 26 September 2014 23:05, Andreas Schwab  wrote:
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34c65c4
>>>
>>> * sanitizer_common/sanitizer_platform_limits_posix.h
>>> (__sanitizer___kernel_old_uid_t, __sanitizer___kernel_old_gid_t)
>>> [__aarch64__]: Define to unsigned short.
>>>
>>
>> Thanks for pointing this.
>>
>> My understanding is that this kind of patch has to be submitted to the
>> libsanitizer maintainers via the LLVM project.
>>
>> I'm going to take care of it.
>>
>> Christophe.
>>
>
> Hi,
>
> Although I tried to speed things up, this patch has not yet made it to
> GCC trunk, but it has been committed in upstream libsanitizer.
>
> While testing a cherry-pick of the relevant commit, I realized that we
> already have aarch64 machines running older kernels, and applying this
> patch means GCC would no longer build on such configurations.
>
> I'm unsure about the desirable approach:
> A- modify upstream libsanitizer so that
> __sanitizer___kernel_old_[gu]id_t are defined to match the definition
> of the kernel headers used to build GCC
> B- drop backward compatibility and make it impossible to build
> gcc+libsanitizer on aarch64 with a kernel older than 3.15.3
>
> A: means I have to iterate with upstream libsanitizer, to discuss &
> agree on a patch, then cherry-pick it to GCC
> B: I can do it now, but since 3.15.3 is rather new, it's a bit harsh
> for users, and maybe a libsanitizer/configure.tgt update would be
> desirable to cleanly prevent trying to build libsanitizer in a no
> longer supported configuration.
>
> It also means that binary toolchains have another implicit dependency
> on the kernel versions (runtime and build-time ones).

The binary toolchain I am not worried about.  I and others use either
3.10 or 3.14 since those are the two long term supported kernels.

Thanks,
Andrew


>
>
> Thanks,
>
> Christophe.
>
>
>
>
>>> ---
>>>  libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git 
>>> a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h 
>>> b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
>>> index caa36a4..139fe0a 100644
>>> --- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
>>> +++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
>>> @@ -470,7 +470,7 @@ namespace __sanitizer {
>>>typedef long __sanitizer___kernel_off_t;
>>>  #endif
>>>
>>> -#if defined(__powerpc__) || defined(__aarch64__) || defined(__mips__)
>>> +#if defined(__powerpc__) || defined(__mips__)
>>>typedef unsigned int __sanitizer___kernel_old_uid_t;
>>>typedef unsigned int __sanitizer___kernel_old_gid_t;
>>>  #else
>>> --
>>> 2.1.1
>>>
>>> --
>>> Andreas Schwab, sch...@linux-m68k.org
>>> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
>>> "And now for something completely different."


Re: [PATCH, x86, 63534] Fix '-p' profile for 32 bit PIC mode

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 07:19:53PM +0400, Evgeny Stupachenko wrote:
> >What is wrong in emitting the set_got right before the PROLOGUE_END
> >note and that way sharing a single load from both?
> Can you please explain the idea? Now set_got is emitted right after
> PROLOGUE_END, what is the advantage in emitting it right before?
> Which load is going to be shared?

I thought I've already explained.
In ix86_init_pic_reg 32-bit part, if crtl->profile, instead of
  rtx insn = emit_insn (gen_set_got (pic_offset_table_rtx));
  RTX_FRAME_RELATED_P (insn) = 1;
do:
  rtx reg = crtl->profile
	    ? gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM)
	    : pic_offset_table_rtx;
  rtx insn = emit_insn (gen_set_got (reg));
  RTX_FRAME_RELATED_P (insn) = 1;
  if (crtl->profile)
    emit_move_insn (pic_offset_table_rtx, reg);
or so.  That will ensure the RA will most likely allocate the pic pseudo
to %ebx at the start of the function, and even if it doesn't, it will still
be loaded into that reg and only then moved to some other reg.

Then, supposedly you need to tweak the condition in ix86_save_reg:
  if (pic_offset_table_rtx
      && !ix86_use_pseudo_pic_reg ()
      && regno == REAL_PIC_OFFSET_TABLE_REGNUM
      && (df_regs_ever_live_p (REAL_PIC_OFFSET_TABLE_REGNUM)
	  || crtl->profile
	  || crtl->calls_eh_return
	  || crtl->uses_const_pool
	  || cfun->has_nonlocal_label))
    return ix86_select_alt_pic_regnum () == INVALID_REGNUM;
to something like:
  if (regno == REAL_PIC_OFFSET_TABLE_REGNUM
      && pic_offset_table_rtx)
    {
      if (ix86_use_pseudo_pic_reg ())
	{
	  /* %ebx needed by call to _mcount after the prologue.  */
	  if (!TARGET_64BIT && flag_pic && crtl->profile)
	    return true;
	}
      else if (df_regs_ever_live_p (REAL_PIC_OFFSET_TABLE_REGNUM)
	       || crtl->profile
	       || crtl->calls_eh_return
	       || crtl->uses_const_pool
	       || cfun->has_nonlocal_label)
	return ix86_select_alt_pic_regnum () == INVALID_REGNUM;
    }
which will make sure the prologue/epilogue saves/restores %ebx properly.
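
To make the control flow of the suggested condition easier to follow, here is a stand-alone model of it. Everything in it is hypothetical: the struct replaces GCC's global state with plain booleans, and SAVE_UNDECIDED stands for "the real function carries on with its remaining checks", which are not modelled.

```c
#include <stdbool.h>

/* Plain-boolean stand-ins for the GCC state consulted above.  */
struct fn_state_sketch
{
  bool target_64bit;          /* TARGET_64BIT */
  bool flag_pic;              /* flag_pic */
  bool profile;               /* crtl->profile */
  bool has_pic_reg;           /* pic_offset_table_rtx is non-null */
  bool use_pseudo_pic_reg;    /* ix86_use_pseudo_pic_reg () */
  bool ebx_ever_live;         /* df_regs_ever_live_p (...) */
  bool calls_eh_return;       /* crtl->calls_eh_return */
  bool uses_const_pool;       /* crtl->uses_const_pool */
  bool has_nonlocal_label;    /* cfun->has_nonlocal_label */
  bool alt_pic_reg_available; /* alt regnum != INVALID_REGNUM */
};

enum save_decision { SAVE_NO, SAVE_YES, SAVE_UNDECIDED };

/* Model of the proposed test for regno == REAL_PIC_OFFSET_TABLE_REGNUM.  */
enum save_decision
pic_reg_save_decision (const struct fn_state_sketch *s)
{
  if (!s->has_pic_reg)
    return SAVE_UNDECIDED;

  if (s->use_pseudo_pic_reg)
    {
      /* %ebx is needed by the _mcount call after the prologue.  */
      if (!s->target_64bit && s->flag_pic && s->profile)
	return SAVE_YES;
      return SAVE_UNDECIDED;
    }

  if (s->ebx_ever_live || s->profile || s->calls_eh_return
      || s->uses_const_pool || s->has_nonlocal_label)
    return s->alt_pic_reg_available ? SAVE_NO : SAVE_YES;

  return SAVE_UNDECIDED;
}
```

The new branch is the first one: with a pseudo PIC register, 32-bit PIC profiling alone is enough to force the save of %ebx.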

And, finally, for the !TARGET_64BIT && flag_pic && crtl->profile case
e.g. at the end of ix86_expand_prologue, check if the prologue is followed
by a series of notes (one of which is the PROLOGUE_END note), but no real
insns, and followed by set_got pattern (perhaps check that it recogs to
CODE_FOR_set_got) that loads into %ebx.  If it does, then fine, and just
move that insn from where it is emitted to before those notes.
If you don't find it there, emit set_got insn to %ebx yourself at the end
of the prologue.  Then no need to change the _mcount call in any way.
The profiler code is emitted on the PROLOGUE_END note, so if you managed
to move the set_got across the PROLOGUE_END note, or if you added an extra
one (e.g. for the case when no set_got was really needed in the rest of the
function), at that point the pic register will be allocated in %ebx.

> >> --- a/gcc/config/i386/i386.c
> >> +++ b/gcc/config/i386/i386.c
> >> @@ -39124,13 +39124,22 @@ x86_function_profiler (FILE *file, int
> >> labelno ATTRIBUTE_UNUSED)
> >>else
> >> x86_print_call_or_nop (file, mcount_name);
> >>  }
> >> +  /* At this stage we can't detrmine where GOT register is, as RA can 
> >> allocate
> >> + it to any hard register.  Therefore we need to set it once again.  */
> >>else if (flag_pic)
> >>  {
> >> +  pic_labels_used |= 1 << BX_REG;
> >> +  fprintf (file,"\tsub\t$16, %%esp\n");
> >> +  fprintf (file,"\tmovl\t%%ebx, (%%esp)\n");
> >> +  fprintf (file,"\tcall\t__x86.get_pc_thunk.bx\n");
> >> +  fprintf (file,"\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n");
> >>  #ifndef NO_PROFILE_COUNTERS
> >>fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%"
> >> PROFILE_COUNT_REGISTER "\n",
> >>LPREFIX, labelno);
> >>  #endif
> >>fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
> >> +  fprintf (file,"\tmovl\t(%%esp), %%ebx\n");
> >> +  fprintf (file,"\tadd\t$16, %%esp\n");

Note, the unwind info is wrong even in this case.  Whenever you are in
between that call\t__x86.get_pc_thunk.bx and movl\t(%%esp), %%ebx,
there is no unwind info telling the debug info consumers that %ebx has been
saved to the stack and where, so any time the debugger or anything else
looks up at outer frames e.g. from _mcount, the %ebx will contain bogus
value in the function that calls the function with _mcount call.

Jakub


Re: [Patch, AArch64] Enable Address sanitizer and UB sanitizer

2014-10-24 Thread Christophe Lyon
On 29 September 2014 15:01, Christophe Lyon  wrote:
> On 26 September 2014 23:05, Andreas Schwab  wrote:
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34c65c4
>>
>> * sanitizer_common/sanitizer_platform_limits_posix.h
>> (__sanitizer___kernel_old_uid_t, __sanitizer___kernel_old_gid_t)
>> [__aarch64__]: Define to unsigned short.
>>
>
> Thanks for pointing this.
>
> My understanding is that this kind of patch has to be submitted to the
> libsanitizer maintainers via the LLVM project.
>
> I'm going to take care of it.
>
> Christophe.
>

Hi,

Although I tried to speed things up, this patch has not yet made it to
GCC trunk, but it has been committed in upstream libsanitizer.

While testing a cherry-pick of the relevant commit, I realized that we
already have aarch64 machines running older kernels, and applying this
patch means GCC would no longer build on such configurations.

I'm unsure about the desirable approach:
A- modify upstream libsanitizer so that
__sanitizer___kernel_old_[gu]id_t are defined to match the definition
of the kernel headers used to build GCC
B- drop backward compatibility and make it impossible to build
gcc+libsanitizer on aarch64 with a kernel older than 3.15.3

A: means I have to iterate with upstream libsanitizer, to discuss &
agree on a patch, then cherry-pick it to GCC
B: I can do it now, but since 3.15.3 is rather new, it's a bit harsh
for users, and maybe a libsanitizer/configure.tgt update would be
desirable to cleanly prevent trying to build libsanitizer in a no
longer supported configuration.

It also means that binary toolchains have another implicit dependency
on the kernel versions (runtime and build-time ones).
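
To illustrate what option A could look like in principle, a rough sketch that takes the type from the installed kernel headers instead of hard-coding it per architecture. It assumes a Linux system where <linux/posix_types.h> provides __kernel_old_uid_t and __kernel_old_gid_t; the sketch_ names are invented, and whether libsanitizer can include kernel headers at this point is a separate question that this thread does not settle.

```c
#include <stddef.h>
#include <linux/posix_types.h> /* assumed to provide __kernel_old_[gu]id_t */

/* Hypothetical typedefs that simply follow whatever the kernel headers
   used for the build say, so old and new kernels both match.  */
typedef __kernel_old_uid_t sketch_kernel_old_uid_t;
typedef __kernel_old_gid_t sketch_kernel_old_gid_t;

/* Report the width picked up from the headers, for inspection.  */
size_t
old_uid_size (void)
{
  return sizeof (sketch_kernel_old_uid_t);
}

size_t
old_gid_size (void)
{
  return sizeof (sketch_kernel_old_gid_t);
}
```

The trade-off is the one described above: the result depends on the headers present at build time, so a binary toolchain still carries an implicit kernel-version dependency.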


Thanks,

Christophe.




>> ---
>>  libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h 
>> b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
>> index caa36a4..139fe0a 100644
>> --- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
>> +++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
>> @@ -470,7 +470,7 @@ namespace __sanitizer {
>>typedef long __sanitizer___kernel_off_t;
>>  #endif
>>
>> -#if defined(__powerpc__) || defined(__aarch64__) || defined(__mips__)
>> +#if defined(__powerpc__) || defined(__mips__)
>>typedef unsigned int __sanitizer___kernel_old_uid_t;
>>typedef unsigned int __sanitizer___kernel_old_gid_t;
>>  #else
>> --
>> 2.1.1
>>
>> --
>> Andreas Schwab, sch...@linux-m68k.org
>> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
>> "And now for something completely different."


[patch] Final basic-block.h flattening patch

2014-10-24 Thread Andrew MacLeod
Don't let its size scare you, this is actually fairly trivial now. I 
split it into the more interesting patch and the big, boring, mechanical 
one.  all-in-all, it touches 351 files :-P.


This patch completely flattens basic-block.h.  I manually adjusted some 
of the remaining .h files and replicated the include list to any source 
and include files which included those... and so on until full closure 
was accomplished.  basic-block.h is not included from anything but 
source files now.


Very few actual tweaks... "enum profile_status_d" and "struct 
control_flow_graph" were moved from basic-block.h to cfg.h.  They are 
used from very few places and there was a tiny bit of code in cfg.c 
which allocated the struct.  This is also the logical place for it.


symbol_table::create_empty() was moved from cgraph.h to cgraph.c. It 
uses REG_BR_PROB_BASE which is defined in basic-block.h, and by moving 
it to the .c file, there is no dependency between basic-block.h and 
cgraph.h any more.
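
The effect of that move can be sketched in a single file. All names are hypothetical stand-ins: the first part plays the role of the header, which now only declares the function, and the second part plays the role of the source file, which is the only place that needs the constant from the other header.

```c
/* ---- would live in the cgraph.h-like header ---- */

struct node_sketch
{
  int count_scale;
};

/* Declaration only: nothing here refers to the constant, so the
   header no longer has to pull in the header that defines it.  */
struct node_sketch create_empty_sketch (void);

/* ---- would live in the cgraph.c-like source file ---- */

/* Stand-in for a constant owned by another header (the role that
   REG_BR_PROB_BASE plays in the description above).  The value is
   chosen only for the example.  */
#define PROB_BASE_SKETCH 10000

struct node_sketch
create_empty_sketch (void)
{
  struct node_sketch n = { PROB_BASE_SKETCH };
  return n;
}
```

The cost is that the function can no longer be inlined from the header, which matters little for a function that allocates and initialises a node.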


a few of the gen*.c needed to have the generated file's include list 
changed a little to still compile properly.


gcc-plugin.h now includes all the stuff basic-block.h used to.  It's 
still not perfect as the testsuite showed.  One of the tests included 
gcc-plugin.h almost last, so an existing #include "basic-block.h" 
earlier in the include list broke...   Can't help everyone I guess.  If 
gcc-plugin.h is included first, no problem. :-)


I also ran the include file reducer on all the new .h files this patch 
series introduced (they should be safe since there is no conditional 
macro stuff in them).. specifically  on:
predict.h, cfgrtl.h, cfg.h, cfganal.h, lcm.h, cfgbuild.h, cfgcleanup.h, 
dominance.h, and of course, basic-block.h itself.  The only exception is 
that I didn't try to reduce any config/* files


I have bootstrapped on x86_64-unknown-linux-gnu, run the testsuite with 
no regressions, and run all the targets in contrib/config-list.mk to 
catch any #includes that might be required by #ifdef'd code.  Hopefully 
I got them all :-)  I'm re-running on a fresh checkout over the 
weekend just to be sure.


Assuming no issues pop up, OK for trunk?

Andrew



	* basic-block.h: Remove all includes.
	(enum profile_status_d, struct control_flow_graph): Move to cfg.h
	* cfg.h (profile_status_d, struct control_flow_graph): Relocate here.
	* Makefile.in (GTFILES): Add cfg.h to list.
	* cgraph.h (symbol_table::create_empty): Move to cgraph.c.
	* cgraph.c (symbol_table::create_empty): Relocate from cgraph.h.
	* genconditions.c (write_header): Add predict.h and basic-block.h to
	list of includes.
	* genemit.c (main): Ditto.
	* genpreds.c (write_insn_preds_c): Ditto.
	* genrecog.c (write_header): Ditto.
	* gengtype.c (open_base_files): Add predict.h, basic-block.h, and cfg.h
	to list of includes.
	* testsuite/gcc.dg/plugin/ggcplug.c: Shuffle includes to include
	gcc-plugin.h earlier.

Index: basic-block.h
===
--- basic-block.h	(revision 216559)
+++ basic-block.h	(working copy)
@@ -20,22 +20,6 @@
 #ifndef GCC_BASIC_BLOCK_H
 #define GCC_BASIC_BLOCK_H
 
-#include "predict.h"
-#include "vec.h"
-#include "hashtab.h"
-#include "hash-set.h"
-#include "machmode.h"
-#include "tm.h"
-#include "hard-reg-set.h"
-#include "input.h"
-#include "function.h"
-#include "cfgrtl.h"
-#include "cfg.h"
-#include "cfganal.h"
-#include "lcm.h"
-#include "cfgbuild.h"
-#include "cfgcleanup.h"
-#include "dominance.h"
 
 /* Use gcov_type to hold basic block counters.  Should be at least
64bit.  Although a counter cannot be negative, we use a signed
@@ -220,57 +204,6 @@
 #define BB_COPY_PARTITION(dstbb, srcbb) \
   BB_SET_PARTITION (dstbb, BB_PARTITION (srcbb))
 
-/* What sort of profiling information we have.  */
-enum profile_status_d
-{
-  PROFILE_ABSENT,
-  PROFILE_GUESSED,
-  PROFILE_READ,
-  PROFILE_LAST	/* Last value, used by profile streaming.  */
-};
-
-/* A structure to group all the per-function control flow graph data.
-   The x_* prefixing is necessary because otherwise references to the
-   fields of this struct are interpreted as the defines for backward
-   source compatibility following the definition of this struct.  */
-struct GTY(()) control_flow_graph {
-  /* Block pointers for the exit and entry of a function.
- These are always the head and tail of the basic block list.  */
-  basic_block x_entry_block_ptr;
-  basic_block x_exit_block_ptr;
-
-  /* Index by basic block number, get basic block struct info.  */
-  vec<basic_block, va_gc> *x_basic_block_info;
-
-  /* Number of basic blocks in this flow graph.  */
-  int x_n_basic_blocks;
-
-  /* Number of edges in this flow graph.  */
-  int x_n_edges;
-
-  /* The first free basic block number.  */
-  int x_last_basic_block;
-
-  /* UIDs for LABEL_DECLs.  */
-  int last_label_uid;
-
-  /* Mapping of labels to their associated blocks.  At present
- only used for the gimple CFG.  */
-  vec<basic_block, va_gc> *x_label_to_block_map;
-
-  enum profile_s

Re: [PATCH, x86, 63534] Fix '-p' profile for 32 bit PIC mode

2014-10-24 Thread Evgeny Stupachenko
>What is wrong in emitting the set_got right before the PROLOGUE_END
>note and that way sharing a single load from both?
Can you please explain the idea? Now set_got is emitted right after
PROLOGUE_END, what is the advantage in emitting it right before?
Which load is going to be shared?

>This looks just as a hack.
Isn't it similar to what was before but just adding additional "prints"?


On Fri, Oct 24, 2014 at 6:29 PM, Jakub Jelinek  wrote:
> On Fri, Oct 24, 2014 at 06:12:15PM +0400, Evgeny Stupachenko wrote:
>> The following patch aligns the stack for mcount and there should be no
>> problems with unwind as ix86_frame_pointer_required is true when
>> crtl->profile is true and flag_fentry is false (we call mcount after
>> function prolog).
>> When flag_fentry is true it is set to false in 32bit PIC mode:
>>   if (!TARGET_64BIT_P (opts->x_ix86_isa_flags) && opts->x_flag_pic)
>> {
>>   if (opts->x_flag_fentry > 0)
>> sorry ("-mfentry isn%'t supported for 32-bit in combination "
>>   "with -fpic");
>>   opts->x_flag_fentry = 0;
>> }
>
> What is wrong in emitting the set_got right before the PROLOGUE_END
> note and that way sharing a single load from both?
> This looks just as a hack.
>
>> 2014-10-24  Evgeny Stupachenko  
>>
>> PR target/63534
>> * config/i386/i386.c (x86_function_profiler): Add GOT register init
>> for mcount call.
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 6235c4f..2dff29c 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -39124,13 +39124,22 @@ x86_function_profiler (FILE *file, int
>> labelno ATTRIBUTE_UNUSED)
>>else
>> x86_print_call_or_nop (file, mcount_name);
>>  }
>> +  /* At this stage we can't determine where GOT register is, as RA can allocate
>> + it to any hard register.  Therefore we need to set it once again.  */
>>else if (flag_pic)
>>  {
>> +  pic_labels_used |= 1 << BX_REG;
>> +  fprintf (file,"\tsub\t$16, %%esp\n");
>> +  fprintf (file,"\tmovl\t%%ebx, (%%esp)\n");
>> +  fprintf (file,"\tcall\t__x86.get_pc_thunk.bx\n");
>> +  fprintf (file,"\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n");
>>  #ifndef NO_PROFILE_COUNTERS
>>fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%"
>> PROFILE_COUNT_REGISTER "\n",
>>LPREFIX, labelno);
>>  #endif
>>fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
>> +  fprintf (file,"\tmovl\t(%%esp), %%ebx\n");
>> +  fprintf (file,"\tadd\t$16, %%esp\n");
>>  }
>>else
>>  {
>>
>
> Jakub


Re: [PATCH 3/4] Add libgomp plugin for Intel MIC

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 07:08:44PM +0400, Ilya Verbin wrote:
> On 24 Oct 16:35, Jakub Jelinek wrote:
> > On Thu, Oct 23, 2014 at 07:41:12PM +0400, Ilya Verbin wrote:
> > > > malloc can fail, SIGSEGV in response to that is not desirable.
> > > > Can't you fallback to alloca, or use just alloca, or use alloca
> > > > with malloc fallback?
> > > 
> > > I replaced it with alloca.
> > 
> > There is a risk if a suid or otherwise privilege-escalated program
> > uses it and an attacker passes huge env vars.
> > Perhaps use alloca if it is <= 2KB and malloc otherwise, and in that case
> > if malloc fails, just do a fatal error?
> 
> Why is this preferable to just a malloc + fatal error?
> This function is executed only once at plugin initialization, therefore no
> real performance gain could be achieved.

Even if it is executed once, using malloc for short env vars, which will be
99% of all cases, sounds like a waste of resources to me.
You already know the strlen of the vars, so it is just a matter of
comparing that and setting a bool flag.

Jakub
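
A minimal sketch of the strategy suggested above, not the plugin's actual
code: copy the string onto the stack with alloca when it is short, fall back
to malloc for long input, and treat a malloc failure as a fatal error.  The
names count_tokens and SMALL_COPY_LIMIT are hypothetical, the 2KB threshold is
the figure from the message, and <alloca.h> is assumed to be available (as on
glibc).

```c
#include <alloca.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold, taken from the 2KB figure suggested above.  */
#define SMALL_COPY_LIMIT 2048

/* Count the non-empty tokens of VALUE separated by any character in DELIMS.
   strtok modifies its argument, so we tokenize a private copy and leave the
   caller's string (for example a getenv result) untouched.  */
static size_t
count_tokens (const char *value, const char *delims)
{
  size_t len = strlen (value);
  /* The length is already known, so choosing the allocator is one compare.  */
  bool on_heap = len + 1 > SMALL_COPY_LIMIT;
  char *copy;

  if (on_heap)
    {
      copy = malloc (len + 1);
      if (copy == NULL)
        {
          /* Fail loudly instead of dereferencing NULL later.  */
          fprintf (stderr, "fatal error: out of memory\n");
          exit (EXIT_FAILURE);
        }
    }
  else
    copy = alloca (len + 1);

  memcpy (copy, value, len + 1);

  size_t count = 0;
  for (char *tok = strtok (copy, delims); tok != NULL;
       tok = strtok (NULL, delims))
    count++;

  /* Only the heap copy needs releasing; the alloca copy goes away on return.  */
  if (on_heap)
    free (copy);
  return count;
}
```

The bound on the alloca size is what addresses the concern about huge
attacker-controlled environment variables: the stack never grows by more than
SMALL_COPY_LIMIT bytes here.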


RE: [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral

2014-10-24 Thread Matthew Fortune
Alan Lawrence  writes:
> Patches 7-11 migrate ARM, x86, IA64 (I think), and mostly PowerPC, to
> the new reduc_(plus|[us](min|max))_scal_optab. I have not managed to work
> out how to do the same for MIPS (specifically what I need to add to
> mips_expand_vec_reduc), and have had no response from the maintainers, so
> am

Sorry, I was looking at this but failed to send an email saying so. The lack
of vec_extract appears to be the stumbling point here so at the very least
we need to add a naïve version of that I believe.

> (2) also renaming reduc_..._scal_optab back to reduc_..._optab; would
> break the MIPS backend if something were not done with its existing patterns.

I suspect we can deal with this in time to make a rename OK.

One thing occurred to me about this change in general: on the whole,
reduction to a scalar seems good for an epilogue, but is there a problem
if the result is then replicated across a vector for further processing?
I.e. a vector is reduced to a scalar, which moves the value
from a SIMD register to a GP register (because scalar modes are not
supported in SIMD registers generally), and then gets moved back to a
SIMD register to form part of a new vector. Would you expect the
redundant moves to get eliminated?

Thanks,
Matthew
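
The pattern in question can be sketched in plain C.  This is only an
illustration of the data flow, assuming a 4-lane integer vector; the vec4
type and the function names are hypothetical stand-ins, not GCC internals.
On a target where the scalar result lives in a GP register, reduce_plus
would move SIMD to GP and splat would move GP back to SIMD.

```c
#define LANES 4

/* Stand-in for a SIMD register; a real target would use a vector mode.  */
typedef struct { int lane[LANES]; } vec4;

/* Reduce a vector to a scalar (conceptually: SIMD register -> GP register).  */
static int
reduce_plus (vec4 v)
{
  int sum = 0;
  for (int i = 0; i < LANES; i++)
    sum += v.lane[i];
  return sum;
}

/* Replicate a scalar across all lanes (conceptually: GP -> SIMD register).  */
static vec4
splat (int s)
{
  vec4 r;
  for (int i = 0; i < LANES; i++)
    r.lane[i] = s;
  return r;
}

/* The reduced value is immediately needed as a vector again, so the
   round trip through a scalar is the potentially redundant part.  */
static vec4
subtract_total (vec4 v)
{
  vec4 total = splat (reduce_plus (v));
  vec4 r;
  for (int i = 0; i < LANES; i++)
    r.lane[i] = v.lane[i] - total.lane[i];
  return r;
}
```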


Re: [PATCH 2/2] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*

2014-10-24 Thread Charles Baylis
On 24 October 2014 11:23, Marcus Shawcroft  wrote:
> On 23 October 2014 18:51, Charles Baylis  wrote:
>
>>> Otherwise this and the previous 1/2 associated patch look good, can
>>> you respin with these tidy ups?
>>
>> OK for trunk?
>
> OK
> /Marcus

Committed to trunk as r216671 and r216672.


Re: [PATCH 3/4] Add libgomp plugin for Intel MIC

2014-10-24 Thread Ilya Verbin
On 24 Oct 16:35, Jakub Jelinek wrote:
> On Thu, Oct 23, 2014 at 07:41:12PM +0400, Ilya Verbin wrote:
> > > malloc can fail, SIGSEGV in response to that is not desirable.
> > > Can't you fallback to alloca, or use just alloca, or use alloca
> > > with malloc fallback?
> > 
> > I replaced it with alloca.
> 
> There is a risk if a suid or otherwise privilege-escalated program
> uses it and an attacker passes huge env vars.
> Perhaps use alloca if it is <= 2KB and malloc otherwise, and in that case
> if malloc fails, just do a fatal error?

Why is this preferable to just a malloc + fatal error?
This function is executed only once at plugin initialization, therefore no real
performance gain could be achieved.

Thanks,
  -- Ilya


[PATCH v2] avoid alignment of static variables affecting stack's

2014-10-24 Thread Jan Beulich
Function (or narrower) scope static variables (as well as others not
placed on the stack) should also not have any effect on the stack
alignment. I first noticed the issue with Linux's dynamic_pr_debug()
construct using an 8-byte aligned sub-file-scope local variable.

According to my checking, the bad behavior started with 4.6.x (4.5.3 was
still okay), but generated code got quite a bit worse as of 4.9.0.

[v2: Drop inclusion of hard register variables, as requested by
 Jakub and Richard.]

gcc/
2014-10-24  Jan Beulich  

* cfgexpand.c (expand_one_var): Exclude static and external
variables when adjusting stack alignment related state.

gcc/testsuite/
2014-10-24  Jan Beulich  

* gcc.c-torture/execute/stkalign.c: New.

--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1233,12 +1233,15 @@ static HOST_WIDE_INT
 expand_one_var (tree var, bool toplevel, bool really_expand)
 {
   unsigned int align = BITS_PER_UNIT;
+  bool stack = true;
   tree origvar = var;
 
   var = SSAVAR (var);
 
   if (TREE_TYPE (var) != error_mark_node && TREE_CODE (var) == VAR_DECL)
 {
+  stack = !TREE_STATIC (var) && !DECL_EXTERNAL (var);
+
   /* Because we don't know if VAR will be in register or on stack,
 we conservatively assume it will be on stack even if VAR is
 eventually put into register after RA pass.  For non-automatic
@@ -1267,22 +1270,25 @@ expand_one_var (tree var, bool toplevel,
align = POINTER_SIZE;
 }
 
-  if (SUPPORTS_STACK_ALIGNMENT
-  && crtl->stack_alignment_estimated < align)
+  if (stack)
 {
-  /* stack_alignment_estimated shouldn't change after stack
- realign decision made */
-  gcc_assert (!crtl->stack_realign_processed);
-  crtl->stack_alignment_estimated = align;
+  if (SUPPORTS_STACK_ALIGNMENT
+ && crtl->stack_alignment_estimated < align)
+   {
+ /* stack_alignment_estimated shouldn't change after stack
+realign decision made */
+ gcc_assert (!crtl->stack_realign_processed);
+ crtl->stack_alignment_estimated = align;
+   }
+
+  /* stack_alignment_needed > PREFERRED_STACK_BOUNDARY is permitted.
+So here we only make sure stack_alignment_needed >= align.  */
+  if (crtl->stack_alignment_needed < align)
+   crtl->stack_alignment_needed = align;
+  if (crtl->max_used_stack_slot_alignment < align)
+   crtl->max_used_stack_slot_alignment = align;
 }
 
-  /* stack_alignment_needed > PREFERRED_STACK_BOUNDARY is permitted.
- So here we only make sure stack_alignment_needed >= align.  */
-  if (crtl->stack_alignment_needed < align)
-crtl->stack_alignment_needed = align;
-  if (crtl->max_used_stack_slot_alignment < align)
-crtl->max_used_stack_slot_alignment = align;
-
   if (TREE_CODE (origvar) == SSA_NAME)
 {
   gcc_assert (TREE_CODE (var) != VAR_DECL
--- a/gcc/testsuite/gcc.c-torture/execute/stkalign.c
+++ b/gcc/testsuite/gcc.c-torture/execute/stkalign.c
@@ -0,0 +1,26 @@
+/* { dg-options "-fno-inline" } */
+
+#include <assert.h>
+
+#define ALIGNMENT 64
+
+unsigned test(unsigned n, unsigned p)
+{
+  static struct { char __attribute__((__aligned__(ALIGNMENT))) c; } s;
+  unsigned x;
+
+  assert(__alignof__(s) == ALIGNMENT);
+  asm ("" : "=g" (x), "+m" (s) : "0" (&x));
+
+  return n ? test(n - 1, x) : (x ^ p);
+}
+
+int main (int argc, char *argv[] __attribute__((unused)))
+{
+  unsigned int x = test(argc, 0);
+
+  x |= test(argc + 1, 0);
+  x |= test(argc + 2, 0);
+
+  return !(x & (ALIGNMENT - 1));
+}



Re: [PATCH 3/4] Add libgomp plugin for Intel MIC

2014-10-24 Thread Jakub Jelinek
On Thu, Oct 23, 2014 at 07:41:12PM +0400, Ilya Verbin wrote:
> > malloc can fail, SIGSEGV in response to that is not desirable.
> > Can't you fallback to alloca, or use just alloca, or use alloca
> > with malloc fallback?
> 
> I replaced it with alloca.

There is a risk if a suid or otherwise privilege-escalated program
uses it and an attacker passes huge env vars.
Perhaps use alloca if it is <= 2KB and malloc otherwise, and in that case
if malloc fails, just do a fatal error?

> > Where does this artificial limit come from?  Using libNNN.so library names?
> > Can't you use lib%d.so instead?
> 
> Yes, it comes from the Image structure
> (liboffloadmic/runtime/offload_host.h:52).
> It must contain a null-terminated name, therefore I need to allocate some
> space for the name in plugin's struct TargetImage.  But the structure can't
> contain any bytes after the trailing zero and before the actual data.
> So, now I extended the name to 10 digits and removed the comparison with 1000.

Ok.

> > Also, seeing register_image, shouldn't there be
> > GOMP_OFFLOAD_unregister_image which would be invoked when the library
> > containing MIC offloading regions is dlclosed?
> > One could use __cxa_atexit or similar for that, something that is given
> > &__dso_handle.  Or is no cleanup necessary?  At least unregistering it
> > from translation tables, because the same addresses might be reused by a
> > different shared library?
> > With dlopen/dlclose in mind, 1000 might be easily reached, consider 1
> > times dlopening/dlclosing (perhaps over longer time, by long running daemon)
> > a shared library containing #pragma omp target region.
> 
> Hmm, previously we've tested only cases when all libraries are loaded before
> the first offload.  Offloading from a dlopened library after the call to
> gomp_target_init isn't working.  So, this will require some changes in
> libgomp/target.c.  Is it ok to fix this bug in a separate patch?

I guess it can be done incrementally, even during stage3.

Jakub


Re: [PATCH, x86, 63534] Fix '-p' profile for 32 bit PIC mode

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 06:12:15PM +0400, Evgeny Stupachenko wrote:
> The following patch aligns the stack for mcount, and there should be no
> problems with unwind, as ix86_frame_pointer_required is true when
> crtl->profile is true and flag_fentry is false (we call mcount after
> the function prologue).
> When flag_fentry is true, it is set to false in 32-bit PIC mode:
>   if (!TARGET_64BIT_P (opts->x_ix86_isa_flags) && opts->x_flag_pic)
> {
>   if (opts->x_flag_fentry > 0)
> sorry ("-mfentry isn%'t supported for 32-bit in combination "
>   "with -fpic");
>   opts->x_flag_fentry = 0;
> }

What is wrong in emitting the set_got right before the PROLOGUE_END
note and that way sharing a single load from both?
This looks just as a hack.

> 2014-10-24  Evgeny Stupachenko  
> 
> PR target/63534
> * config/i386/i386.c (x86_function_profiler): Add GOT register init
> for mcount call.
> 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 6235c4f..2dff29c 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -39124,13 +39124,22 @@ x86_function_profiler (FILE *file, int
> labelno ATTRIBUTE_UNUSED)
>else
> x86_print_call_or_nop (file, mcount_name);
>  }
> +  /* At this stage we can't determine where GOT register is, as RA can allocate
> + it to any hard register.  Therefore we need to set it once again.  */
>else if (flag_pic)
>  {
> +  pic_labels_used |= 1 << BX_REG;
> +  fprintf (file,"\tsub\t$16, %%esp\n");
> +  fprintf (file,"\tmovl\t%%ebx, (%%esp)\n");
> +  fprintf (file,"\tcall\t__x86.get_pc_thunk.bx\n");
> +  fprintf (file,"\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n");
>  #ifndef NO_PROFILE_COUNTERS
>fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%"
> PROFILE_COUNT_REGISTER "\n",
>LPREFIX, labelno);
>  #endif
>fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
> +  fprintf (file,"\tmovl\t(%%esp), %%ebx\n");
> +  fprintf (file,"\tadd\t$16, %%esp\n");
>  }
>else
>  {
> 

Jakub


Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 06:16:01PM +0400, Ilya Verbin wrote:
> We have to set the global have_offload flag in a few places in omp-low.c and
> in the FE (c/c-decl.c:c_decl_attributes, fortran/trans-common.c:build_common_decl,
> fortran/trans-decl.c:add_attributes_to_decl).
> This looks to me a bit more complicated than the current approach.
> 
> Actually, we could follow Jakub's suggestion of caching the attribute in a bit
> field, and set the global have_offload flag on the fly without any changes in
> the FE.  However, I don't know a suitable place for it.  If you agree with the
> approach, could you please specify the place?

Can't you do that when creating the cgraph or varpool nodes?
I'd expect the attribute to be already present on the decls at those spots.

Jakub
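
A toy model of that suggestion, under the assumption that the attribute is
already on the decl when the node is created: look the attribute up once at
node creation, cache the answer in a one-bit field, and raise a global flag as
a side effect.  All names here (toy_decl, toy_node, toy_create_node,
toy_lookup_attribute) are hypothetical and deliberately do not mirror the real
cgraph/varpool API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ATTRS 4

/* Hypothetical stand-in for a decl carrying a NULL-terminated attribute list.  */
struct toy_decl
{
  const char *attributes[TOY_MAX_ATTRS];
};

/* Hypothetical stand-in for a cgraph/varpool node.  */
struct toy_node
{
  const struct toy_decl *decl;
  unsigned offloadable : 1;   /* Cached result of the attribute lookup.  */
};

/* Set as soon as any offloadable node is created; no later scan is needed.  */
static bool have_offload;

static bool
toy_lookup_attribute (const char *name, const struct toy_decl *decl)
{
  for (size_t i = 0; i < TOY_MAX_ATTRS && decl->attributes[i] != NULL; i++)
    if (strcmp (decl->attributes[i], name) == 0)
      return true;
  return false;
}

/* Node creation is the single place where the attribute is inspected.  */
static struct toy_node
toy_create_node (const struct toy_decl *decl)
{
  struct toy_node node = { decl, 0 };
  if (toy_lookup_attribute ("omp declare target", decl))
    {
      node.offloadable = 1;
      have_offload = true;
    }
  return node;
}
```

With this shape, later passes would test node.offloadable instead of repeating
the string-based attribute lookup, and the front ends would not need to set
the global flag themselves.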


[PATCH] Fix typedef-name printing (PR c/56980)

2014-10-24 Thread Marek Polacek
Our current C pretty printer output sometimes looks a bit goofy:
"expected ‘enum F *’ but argument is of type ‘enum F *’".
That's because it always prints "struct"/"union"/"enum" even when
the type is a typedef name.  This patch ought to fix this.
We've got a bunch of reports about this over the years...

The C++ printer can also print "B* {aka A*}"; I'll try to teach
c_tree_printer to do something similar as well.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-10-24  Marek Polacek  

PR c/56980
* c-pretty-print.c (c_pretty_printer::simple_type_specifier): Don't
print "struct"/"union"/"enum" for typedefed names.

* gcc.dg/pr56980.c: New test.

diff --git gcc/c-family/c-pretty-print.c gcc/c-family/c-pretty-print.c
index 3b2dbc1..9096a07 100644
--- gcc/c-family/c-pretty-print.c
+++ gcc/c-family/c-pretty-print.c
@@ -416,7 +416,9 @@ c_pretty_printer::simple_type_specifier (tree t)
 case UNION_TYPE:
 case RECORD_TYPE:
 case ENUMERAL_TYPE:
-  if (code == UNION_TYPE)
+  if (TYPE_NAME (t) && TREE_CODE (TYPE_NAME (t)) == TYPE_DECL)
+   /* Don't decorate the type if this is a typedef name.  */;
+  else if (code == UNION_TYPE)
pp_c_ws_string (this, "union");
   else if (code == RECORD_TYPE)
pp_c_ws_string (this, "struct");
diff --git gcc/testsuite/gcc.dg/pr56980.c gcc/testsuite/gcc.dg/pr56980.c
index e69de29..f48379a 100644
--- gcc/testsuite/gcc.dg/pr56980.c
+++ gcc/testsuite/gcc.dg/pr56980.c
@@ -0,0 +1,24 @@
+/* PR c/56980 */
+/* { dg-do compile } */
+
+typedef struct A { int i; } B;
+typedef union U { int i; } V;
+typedef enum E { G } F;
+
+void foo_s (struct A); /* { dg-message "expected .struct A. but argument is of type .B \\*." } */
+void foo_u (union U); /* { dg-message "expected .union U. but argument is of type .V \\*." } */
+void foo_e (enum E); /* { dg-message "expected .enum E. but argument is of type .F \\*." } */
+void foo_sp (B *); /* { dg-message "expected .B \\*. but argument is of type .struct B \\*." } */
+void foo_up (V *); /* { dg-message "expected .V \\*. but argument is of type .union V \\*." } */
+void foo_ep (F *); /* { dg-message "expected .F \\*. but argument is of type .enum F \\*." } */
+
+void 
+bar (B *b, V *v, F *f)
+{
+  foo_s (b); /* { dg-error "incompatible" } */
+  foo_u (v); /* { dg-error "incompatible" } */
+  foo_e (f); /* { dg-error "incompatible" } */
+  foo_sp ((struct B *) b); /* { dg-error "passing argument" } */
+  foo_up ((union V *) v); /* { dg-error "passing argument" } */
+  foo_ep (__extension__ (enum F *) f); /* { dg-error "passing argument" } */
+}

Marek


Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming

2014-10-24 Thread Ilya Verbin
On 20 Oct 15:19, Ilya Verbin wrote:
> On 15 Oct 16:23, Richard Biener wrote:
> > > +static bool
> > > +initialize_offload (void)
> > > +{
> > > +  bool have_offload = false;
> > > +  struct cgraph_node *node;
> > > +  struct varpool_node *vnode;
> > > +
> > > +  FOR_EACH_DEFINED_FUNCTION (node)
> > > +if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES 
> > > (node->decl)))
> > > +  {
> > > + have_offload = true;
> > > + break;
> > > +  }
> > > +
> > > +  FOR_EACH_DEFINED_VARIABLE (vnode)
> > > +{
> > > +  if (!lookup_attribute ("omp declare target",
> > > +  DECL_ATTRIBUTES (vnode->decl))
> > > +   || TREE_CODE (vnode->decl) != VAR_DECL
> > > +   || DECL_SIZE (vnode->decl) == 0)
> > > + continue;
> > > +  have_offload = true;
> > > +}
> > > +
> > > +  return have_offload;
> > > +}
> > > +
> > 
> > I wonder if we can avoid the above by means of a global have_offload
> > flag?  (or inside gcc::context)
>
> > > +/* Select what needs to be streamed out.  In regular lto mode stream 
> > > everything.
> > > +   In offload lto mode stream only stuff marked with an attribute.  */
> > > +void
> > > +select_what_to_stream (bool offload_lto_mode)
> > > +{
> > > +  struct symtab_node *snode;
> > > +  FOR_EACH_SYMBOL (snode)
> > > +snode->need_lto_streaming
> > > +  = !offload_lto_mode || lookup_attribute ("omp declare target",
> > > +DECL_ATTRIBUTES (snode->decl));
> > 
> > I suppose I suggested this already earlier this year.  Why keep this
> > artificial attribute when you have a cgraph node flag?
> 
> > > +   /* If '#pragma omp critical' is inside target region, the symbol must
> > > +  have an 'omp declare target' attribute.  */
> > > +   omp_context *octx;
> > > +   for (octx = ctx->outer; octx; octx = octx->outer)
> > > + if (is_targetreg_ctx (octx))
> > > +   {
> > > + DECL_ATTRIBUTES (decl)
> > > +   = tree_cons (get_identifier ("omp declare target"),
> > > +NULL_TREE, DECL_ATTRIBUTES (decl));
> > 
> > Here - why not set a flag on cgraph_get_node (decl) instead?
> 
> I thought that select_what_to_stream is exactly what you've suggested.
> Could you please clarify this?  You propose to replace the "omp declare target"
> attribute with some cgraph node flag like need_offload?  But we'll need
> need_lto_streaming anyway, since for LTO it should be 1 for all nodes, but for
> offloading it should be equal to need_offload.

We have to set the global have_offload flag in a few places in omp-low.c and in
the FE (c/c-decl.c:c_decl_attributes, fortran/trans-common.c:build_common_decl,
fortran/trans-decl.c:add_attributes_to_decl).
This looks to me a bit more complicated than the current approach.

Actually, we could follow Jakub's suggestion of caching the attribute in a bit
field, and set the global have_offload flag on the fly without any changes in
the FE.  However, I don't know a suitable place for it.  If you agree with the
approach, could you please specify the place?

Thanks,
  -- Ilya


Unifying std::atomic_int and std::atomic

2014-10-24 Thread Jonathan Wakely

Our <atomic> was implemented (by Benjamin IIRC) based on an early
C++0x draft when the spec was still trying to be valid for both C and
C++. Part of the C compatibility aspect was that std::atomic_int is
allowed to be either a typedef for std::atomic<int> or a base class of
it, so that a C library could define std::atomic_int and then the C++
library could make std::atomic<int> derive from that.

In the final C11 spec atomics work completely differently, and
atomic_int is a typedef for _Atomic int, which is not a valid base
class. So the old C++0x draft's compatibility aim is impossible,
atomic_int can never be the same type in C and C++.
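
For reference, the C11 side can be checked with a short C sketch.  It assumes
a C11 compiler that provides <stdatomic.h>; the function names are
hypothetical.  The _Generic selection inspects the pointer type, because the
controlling expression itself would lose the atomic qualifier.

```c
#include <stdatomic.h>

/* Returns 1 when a pointer to atomic_int has type "_Atomic int *",
   i.e. when atomic_int is the _Atomic-qualified int, and 0 otherwise.  */
static int
atomic_int_is_atomic_qualified_int (void)
{
  atomic_int value;
  atomic_init (&value, 0);
  return _Generic (&value, _Atomic int *: 1, default: 0);
}

/* A function written against atomic_int accepts an object declared with
   the _Atomic keyword directly; returns the incremented value.  */
static int
bump (atomic_int *counter)
{
  return atomic_fetch_add (counter, 1) + 1;
}
```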

In our implementation, std::atomic_int is a base class of
std::atomic<int>, which has no benefit I can see, but causes
https://gcc.gnu.org/PR60940

Rather than overloading every atomic_op() non-member function to
handle the derived class and the base class, it would be simpler to
just get rid of the base classes and make atomic_xxx a typedef for
atomic<xxx>, as the attached patch does for atomic_{bool,char,schar}.

Does anyone object to that change?

If you object, are you prepared to do the work to fix PR60940? :-)



[Note:- it could probably be simplified even further so atomic<char>
is just:

 template<>
   struct atomic<char> : public __atomic_base<char>
   {
     using __atomic_base<char>::__atomic_base;
   };

But that could be done later as it wouldn't change anything
observable, making atomic_char a typedef for atomic<char> is the
observable and IMHO important change. -end note]

diff --git a/libstdc++-v3/include/bits/atomic_base.h 
b/libstdc++-v3/include/bits/atomic_base.h
index 1fc0ebb..a591c46 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -120,12 +120,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _IntTp>
     struct __atomic_base;
 
-  /// atomic_char
-  typedef __atomic_base<char>  atomic_char;
-
-  /// atomic_schar
-  typedef __atomic_base<signed char>   atomic_schar;
-
   /// atomic_uchar
   typedef __atomic_base<unsigned char> atomic_uchar;
 
diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 85dc252..c58853e 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -49,21 +49,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @{
*/
 
-  /// atomic_bool
+  template<typename _Tp>
+    struct atomic;
+
+  /// atomic<bool>
   // NB: No operators or fetch-operations for this type.
-  struct atomic_bool
+  template<>
+  struct atomic<bool>
   {
   private:
     __atomic_base<bool> _M_base;
 
   public:
-atomic_bool() noexcept = default;
-~atomic_bool() noexcept = default;
-atomic_bool(const atomic_bool&) = delete;
-atomic_bool& operator=(const atomic_bool&) = delete;
-atomic_bool& operator=(const atomic_bool&) volatile = delete;
+atomic() noexcept = default;
+~atomic() noexcept = default;
+atomic(const atomic&) = delete;
+atomic& operator=(const atomic&) = delete;
+atomic& operator=(const atomic&) volatile = delete;
 
-constexpr atomic_bool(bool __i) noexcept : _M_base(__i) { }
+constexpr atomic(bool __i) noexcept : _M_base(__i) { }
 
 bool
 operator=(bool __i) noexcept
@@ -151,6 +155,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return _M_base.compare_exchange_strong(__i1, __i2, __m); }
   };
 
+  /// atomic_bool
+  typedef atomic<bool> atomic_bool;
+
 
   /**
*  @brief Generic atomic type, primary class template.
@@ -485,31 +492,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
 
-  /// Explicit specialization for bool.
-  template<>
-    struct atomic<bool> : public atomic_bool
-{
-  typedef bool __integral_type;
-  typedef atomic_bool  __base_type;
-
-  atomic() noexcept = default;
-  ~atomic() noexcept = default;
-  atomic(const atomic&) = delete;
-  atomic& operator=(const atomic&) = delete;
-  atomic& operator=(const atomic&) volatile = delete;
-
-  constexpr atomic(__integral_type __i) noexcept : __base_type(__i) { }
-
-  using __base_type::operator __integral_type;
-  using __base_type::operator=;
-};
-
   /// Explicit specialization for char.
   template<>
-    struct atomic<char> : public atomic_char
+    struct atomic<char> : public __atomic_base<char>
     {
       typedef char __integral_type;
-      typedef atomic_char  __base_type;
+      typedef __atomic_base<char>  __base_type;
 
   atomic() noexcept = default;
   ~atomic() noexcept = default;
@@ -523,12 +511,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 };
 
+  /// atomic_char
+  typedef atomic<char> atomic_char;
+
   /// Explicit specialization for signed char.
   template<>
-    struct atomic<signed char> : public atomic_schar
+    struct atomic<signed char> : public __atomic_base<signed char>
     {
       typedef signed char  __integral_type;
-      typedef atomic_schar __base_type;
+      typedef __atomic_base<signed char>   __base_type;
 
   atomic() noexcept= default;
 

Re: [PATCH, x86, 63534] Fix '-p' profile for 32 bit PIC mode

2014-10-24 Thread Evgeny Stupachenko
The following patch aligns the stack for mcount, and there should be no
problems with unwind, as ix86_frame_pointer_required is true when
crtl->profile is true and flag_fentry is false (we call mcount after
the function prologue).
When flag_fentry is true, it is set to false in 32-bit PIC mode:
  if (!TARGET_64BIT_P (opts->x_ix86_isa_flags) && opts->x_flag_pic)
{
  if (opts->x_flag_fentry > 0)
sorry ("-mfentry isn%'t supported for 32-bit in combination "
  "with -fpic");
  opts->x_flag_fentry = 0;
}



2014-10-24  Evgeny Stupachenko  

PR target/63534
* config/i386/i386.c (x86_function_profiler): Add GOT register init
for mcount call.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6235c4f..2dff29c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -39124,13 +39124,22 @@ x86_function_profiler (FILE *file, int
labelno ATTRIBUTE_UNUSED)
   else
x86_print_call_or_nop (file, mcount_name);
 }
+  /* At this stage we can't determine where GOT register is, as RA can allocate
+ it to any hard register.  Therefore we need to set it once again.  */
   else if (flag_pic)
 {
+  pic_labels_used |= 1 << BX_REG;
+  fprintf (file,"\tsub\t$16, %%esp\n");
+  fprintf (file,"\tmovl\t%%ebx, (%%esp)\n");
+  fprintf (file,"\tcall\t__x86.get_pc_thunk.bx\n");
+  fprintf (file,"\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n");
 #ifndef NO_PROFILE_COUNTERS
   fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%"
PROFILE_COUNT_REGISTER "\n",
   LPREFIX, labelno);
 #endif
   fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
+  fprintf (file,"\tmovl\t(%%esp), %%ebx\n");
+  fprintf (file,"\tadd\t$16, %%esp\n");
 }
   else
 {

On Fri, Oct 17, 2014 at 6:38 PM, Jakub Jelinek  wrote:
> On Fri, Oct 17, 2014 at 06:30:42PM +0400, Evgeny Stupachenko wrote:
>> Hi,
>>
>> The patch fixes profile in 32bits PIC mode (only -p option affected).
>>
>> x86 bootstrap, make check passed
>>
>> spec2000 o2 -p train data on Corei7:
>> CINT -5%
>> CFP  +1,5
>> compared to a compiler before "enabling ebx".
>>
>> There is a potential performance improve after the patch applied
>> suggested by Jakub:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63534#c8
>> There is opened bug on this: PR63527. However the fix of the bug is
>> more complicated.
>>
>> Is it ok?
>
> Unfortunately I don't think it is ok.
> 1) you don't set the appropriate bit in pic_labels_used (for ebx)
> 2) more importantly, it causes the stack to be misaligned (i.e. violating
>ABI) for the _mcount call, and, break unwind info.
>
>> 2014-10-16  Evgeny Stupachenko  
>>
>> PR target/63534
>> * config/i386/i386.c (x86_function_profiler): Add GOT register init
>> for mcount call.
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index a3ca2ed..5117572 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -39119,11 +39126,15 @@ x86_function_profiler (FILE *file, int
>> labelno ATTRIBUTE_UNUSED)
>>  }
>>else if (flag_pic)
>>  {
>> +  fprintf (file,"\tpush\t%%ebx\n");
>> +  fprintf (file,"\tcall\t__x86.get_pc_thunk.bx\n");
>> +  fprintf (file,"\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n");
>>  #ifndef NO_PROFILE_COUNTERS
>>fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%"
>> PROFILE_COUNT_REGISTER "\n",
>>LPREFIX, labelno);
>>  #endif
>>fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
>> +  fprintf (file,"\tpop\t%%ebx\n");
>>  }
>>else
>>  {
>
> Jakub


Re: [PATCH][optabs] PR63442 libgcc_cmp_return_mode not always return word_mode

2014-10-24 Thread Jiong Wang

ping~

thanks.

Regards,
Jiong

On 17/10/14 13:04, Jiong Wang wrote:

The cause appears to be a minor bug in prepare_cmp_insn.

The last mode parameter "pmode" of "prepare_cmp_insn" should match the
mode of the first parameter "x", but during the recursive call of
"prepare_cmp_insn", x has the mode of targetm.libgcc_cmp_return_mode ()
while pmode is assigned word_mode.

Generally this is OK, because the default libgcc_cmp_return_mode hook always
returns word_mode, but AArch64 has a target-private implementation which
always returns SImode, so there is a mismatch which causes an ICE later.

This minor issue has stayed hidden because nearly all other targets use the
default hook, and the compare is rarely invoked.

Thanks

gcc/
PR target/63442
* optabs.c (prepare_cmp_insn): Use target hook "libgcc_cmp_return_mode"
instead of word_mode.





Re: [PATCHv5][Kasan] Allow to override Asan shadow offset from command line

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 05:56:37PM +0400, Yury Gribov wrote:
> >From 1882c41de6c8ae53b7e199b3cc655b6f4b31e8fb Mon Sep 17 00:00:00 2001
> From: Yury Gribov 
> Date: Thu, 16 Oct 2014 18:31:10 +0400
> Subject: [PATCH 1/2] Add strtoll and strtoull to libiberty.
> 
> 2014-10-20  Yury Gribov  
> 
> include/
>   * libiberty.h (strtol, strtoul, strtoll, strtoull): New prototypes.
> 
> libiberty/
>   * strtoll.c: New file.
>   * strtoull.c: New file.
>   * configure.ac: Add long long checks. Add harness for strtoll and
>   strtoull. Check decls for strtol, strtoul, strtoll, strtoull.
>   * Makefile.in (CFILES, CONFIGURED_OFILES): Added strtoll and strtoull.
>   * config.in: Regenerate.
>   * configure: Regenerate.
>   * functions.texi: Regenerate.
>   * testsuite/Makefile.in (check-strtol): New rule.
>   (test-strtol): Likewise.
>   (mostlyclean): Clean up strtol test.
>   * testsuite/test-strtol.c: New test.

Ian, can you please review this?

> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -883,6 +883,10 @@ fsanitize=
>  Common Driver Report Joined
>  Select what to sanitize
>  
> +fasan-shadow-offset=
> +Common Joined RejectNegative Var(common_deferred_options) Defer
> +-fasan-shadow-offset=<string>	Use custom shadow memory offset.

Shouldn't that be = or = instead of string?

> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -297,7 +297,7 @@ Objective-C and Objective-C++ Dialects}.
>  @xref{Debugging Options,,Options for Debugging Your Program or GCC}.
>  @gccoptlist{-d@var{letters}  -dumpspecs  -dumpmachine  -dumpversion @gol
>  -fsanitize=@var{style} -fsanitize-recover -fsanitize-recover=@var{style} @gol
> --fsanitize-undefined-trap-on-error @gol
> +-fasan-shadow-offset=@var{string} -fsanitize-undefined-trap-on-error @gol

Likewise here, @var{number} instead.

>  -fdbg-cnt-list -fdbg-cnt=@var{counter-value-list} @gol
>  -fdisable-ipa-@var{pass_name} @gol
>  -fdisable-rtl-@var{pass_name} @gol
> @@ -5642,6 +5642,12 @@ While @option{-ftrapv} causes traps for signed 
> overflows to be emitted,
>  @option{-fsanitize=undefined} gives a diagnostic message.
>  This currently works only for the C family of languages.
>  
> +@item -fasan-shadow-offset=@var{string}

And here.

Otherwise looks good to me.

Jakub


Re: [patch,avr] tweak sign extensions, take #2

2014-10-24 Thread Denis Chertykov
2014-10-24 14:37 GMT+04:00 Georg-Johann Lay :
> Am 10/23/2014 08:16 PM schrieb Denis Chertykov:
>>>
>>> This optimization makes most sign-extensions one instruction shorter in
>>> the
>>> case when the source register may be clobbered and the register numbers
>>> are
>>> different.  Source and destination may overlap.
>>>
>>> Ok for trunk?
>>>
>>> Johann
>>>
>>> gcc/
>>>  * config/avr/avr.md (extendqihi2, extendqipsi2, extendqisi2)
>>>  (extendhipsi2, extendhisi2): Optimize if source reg is unused
>>>  after the insns and has different REGNO than destination.
>>
>>
>> Approved.
>>
>> Denis.
>
>
> Finally I switched to a solution that avoids all the ugly asm snippets and
> special casing, and which is exact w.r.t code size.  So allow me drop the
> patch from above and to propose this one for trunk.  Sorry for the
> inconvenience.
>
> In any case it uses LSL/SBC idiom instead of the old CLR/SBRC/COM.
>
>
> Johann
>
> * avr-protos.h (avr_out_sign_extend): New.
> * avr.c (avr_adjust_insn_length) [ADJUST_LEN_SEXT]: Handle.
> (avr_out_sign_extend): New function.
> * avr.md (extendqihi2, extendqipsi2, extendqisi2, extendhipsi2)
> (extendhisi2, extendpsisi2): Use it.
> (adjust_len) [sext]: New.
>
>
>

I agree with you. It's better.
Approved.

Denis.
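
The two idioms mentioned above can be compared with a small C model. This is only a sketch: the function names are hypothetical, the carry flag is modeled as a plain variable, and which register the LSL is applied to in the real avr.md output is not modeled. It relies only on the instruction semantics (SBRC skips the next instruction when the tested bit is clear, SBC Rd,Rd computes Rd - Rd - C).

```c
#include <assert.h>
#include <stdint.h>

/* Old idiom: CLR hi; SBRC lo,7; COM hi.  */
uint8_t
sext_high_clr_sbrc_com (uint8_t lo)
{
  uint8_t hi = 0;               /* CLR hi */
  if (lo & 0x80)                /* SBRC lo,7: skip COM if bit 7 is clear */
    hi = (uint8_t) ~hi;         /* COM hi */
  return hi;
}

/* New idiom: LSL moves bit 7 into carry, then SBC hi,hi yields
   hi - hi - C, i.e. 0x00 or 0xff whatever hi held before.  */
uint8_t
sext_high_lsl_sbc (uint8_t lo)
{
  unsigned carry = (lo >> 7) & 1u;      /* LSL: bit 7 goes to C */
  uint8_t hi = 0x5a;                    /* arbitrary stale contents */
  hi = (uint8_t) (hi - hi - carry);     /* SBC hi,hi */
  return hi;
}

/* Combine the computed high byte with the low byte (QI -> HI).  */
int16_t
sign_extend_qi_to_hi (uint8_t lo)
{
  uint8_t hi = sext_high_lsl_sbc (lo);
  assert (hi == 0x00 || hi == 0xff);
  int value = (hi << 8) | lo;
  if (value >= 0x8000)
    value -= 0x10000;
  return (int16_t) value;
}
```

Both helpers agree for every 8-bit input; the second form needs one instruction fewer for the high byte.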


[PATCHv5][Kasan] Allow to override Asan shadow offset from command line

2014-10-24 Thread Yury Gribov

Hi all,

On 10/17/2014 11:53 AM, Yury Gribov wrote:

On 09/29/2014 09:21 PM, Yury Gribov wrote:

Kasan developers have asked for an option to override the offset of the Asan
shadow memory region. This should simplify experimenting with memory
layouts on 64-bit architectures.


New patch which checks that -fasan-shadow-offset is only enabled for
-fsanitize=kernel-address. I (unfortunately) can't make this --param
because this can be a 64-bit value.


New patchset that adds strtoull to libiberty (blind copy-paste of
already existing strtoul.c) and uses it to parse -fasan-shadow-offset
(to avoid problems when compiling for a 64-bit target on a 32-bit host).


A new version of patchset which does a proper implementation of 
strtoll/strtoull in libiberty (with tests, docs and stuff).


Bootstrapped and regtested on x64.

As mentioned previously, I'm not sure how to properly test strtoll 
implementation (strtoll is already part of Linux glibc so my 
implementation is not compiled in by default).  I've manually embedded 
strtoll.o/strtoull.o into libiberty.a and verified that regression tests 
passed.
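
For illustration, parsing a 64-bit offset with strtoull could look roughly like the sketch below. The helper name parse_shadow_offset is hypothetical and is not taken from the patch; it only uses the standard strtoull interface and assumes unsigned long long is 64 bits wide.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the patch: parse STR as an unsigned
   64-bit offset.  Base 0 lets strtoull accept decimal, octal and
   0x-prefixed hex.  Returns false on empty input, trailing junk or
   overflow.  Note that strtoull itself accepts a leading minus sign and
   wraps the value; a real option parser may want to reject that.  */
bool
parse_shadow_offset (const char *str, unsigned long long *offset)
{
  char *end;

  assert (str != NULL && offset != NULL);
  errno = 0;
  unsigned long long val = strtoull (str, &end, 0);
  if (end == str || *end != '\0' || errno == ERANGE)
    return false;
  *offset = val;
  return true;
}
```

Using unsigned long long rather than unsigned long is the point: on a 32-bit host, strtoul could not represent a 64-bit shadow offset.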


-Y
From 1882c41de6c8ae53b7e199b3cc655b6f4b31e8fb Mon Sep 17 00:00:00 2001
From: Yury Gribov 
Date: Thu, 16 Oct 2014 18:31:10 +0400
Subject: [PATCH 1/2] Add strtoll and strtoull to libiberty.

2014-10-20  Yury Gribov  

include/
	* libiberty.h (strtol, strtoul, strtoll, strtoull): New prototypes.

libiberty/
	* strtoll.c: New file.
	* strtoull.c: New file.
	* configure.ac: Add long long checks. Add harness for strtoll and
	strtoull. Check decls for strtol, strtoul, strtoll, strtoull.
	* Makefile.in (CFILES, CONFIGURED_OFILES): Added strtoll and strtoull.
	* config.in: Regenerate.
	* configure: Regenerate.
	* functions.texi: Regenerate.
	* testsuite/Makefile.in (check-strtol): New rule.
	(test-strtol): Likewise.
	(mostlyclean): Clean up strtol test.
	* testsuite/test-strtol.c: New test.
---
 include/libiberty.h   |   27 ++
 libiberty/Makefile.in |   46 +++---
 libiberty/config.in   |   31 +++
 libiberty/configure   |  122 +++-
 libiberty/configure.ac|   14 ++-
 libiberty/functions.texi  |   18 
 libiberty/strtoll.c   |  175 +++
 libiberty/strtoull.c  |  122 
 libiberty/testsuite/Makefile.in   |   12 ++-
 libiberty/testsuite/test-strtol.c |  184 +
 10 files changed, 733 insertions(+), 18 deletions(-)
 create mode 100644 libiberty/strtoll.c
 create mode 100644 libiberty/strtoull.c
 create mode 100644 libiberty/testsuite/test-strtol.c

diff --git a/include/libiberty.h b/include/libiberty.h
index d09c9a5..26355a9 100644
--- a/include/libiberty.h
+++ b/include/libiberty.h
@@ -655,6 +655,33 @@ extern size_t strnlen (const char *, size_t);
 extern int strverscmp (const char *, const char *);
 #endif
 
+#if defined(HAVE_DECL_STRTOL) && !HAVE_DECL_STRTOL
+extern long int strtol (const char *nptr,
+char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRTOUL) && !HAVE_DECL_STRTOUL
+extern unsigned long int strtoul (const char *nptr,
+  char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRTOLL) && !HAVE_DECL_STRTOLL
+__extension__
+extern long long int strtoll (const char *nptr,
+  char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRTOULL) && !HAVE_DECL_STRTOULL
+__extension__
+extern unsigned long long int strtoull (const char *nptr,
+char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRVERSCMP) && !HAVE_DECL_STRVERSCMP
+/* Compare version strings.  */
+extern int strverscmp (const char *, const char *);
+#endif
+
 /* Set the title of a process */
 extern void setproctitle (const char *name, ...);
 
diff --git a/libiberty/Makefile.in b/libiberty/Makefile.in
index 9b87720..1b0d8ae 100644
--- a/libiberty/Makefile.in
+++ b/libiberty/Makefile.in
@@ -152,8 +152,8 @@ CFILES = alloca.c argv.c asprintf.c atexit.c\
 	 spaces.c splay-tree.c stack-limit.c stpcpy.c stpncpy.c		\
 	 strcasecmp.c strchr.c strdup.c strerror.c strncasecmp.c	\
 	 strncmp.c strrchr.c strsignal.c strstr.c strtod.c strtol.c	\
-	 strtoul.c strndup.c strnlen.c strverscmp.c			\
-	timeval-utils.c tmpnam.c	\
+	 strtoll.c strtoul.c strtoull.c strndup.c strnlen.c \
+	 strverscmp.c timeval-utils.c tmpnam.c\
 	unlink-if-ordinary.c		\
 	vasprintf.c vfork.c vfprintf.c vprintf.c vsnprintf.c vsprintf.c	\
 	waitpid.c			\
@@ -219,8 +219,8 @@ CONFIGURED_OFILES = ./asprintf.$(objext) ./atexit.$(objext)		\
 	 ./strchr.$(objext) ./strdup.$(objext) ./strncasecmp.$(objext)	\
 	 ./strncmp.$(objext) ./strndup.$(objext) ./strnlen.$(objext)	\
 	 ./strrchr.$(objext) ./strstr.$(objext) ./strtod.$(objext)	\
-	 ./strtol.$(objext) ./strtoul.$(objext) ./strverscmp.$(objext)	\
-	./tmpnam.$(objext)		\

Re: [PATCH] Fix modulo patterns in match.pd

2014-10-24 Thread Richard Biener
On Fri, 24 Oct 2014, Jakub Jelinek wrote:

> On Fri, Oct 24, 2014 at 03:27:19PM +0200, Richard Biener wrote:
> > As noted by Marc I forgot to actually utilize the iterator variable.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > 
> > Richard.
> > 
> > PS: How do we want to refer to patterns in ChangeLogs?
> 
> Perhaps the syntax should be (simplify "name" (...) { ... })
> (maybe the name being optional?), where you'd give some name to the
> simplification, say "0 % X" or "0 % X => 0" or "0 % X variant 3"
> or whatever, then you could easily refer to those strings in ChangeLog,
> on gcc-patches, in comments etc.

I ripped out optional name support when I added user-defined predicates
which look like

(match truth_valued_p
  (truth_not @0))

or

(match (logical_inverted_value @0)
 (bit_not truth_valued_p@0))

(un-)conveniently the parsers for (simplify...) and (match...)
are shared.

I can see to re-add the optional pattern naming.  OTOH it will be
fun to invent a unique name for each of them ;)  (patternN
anyone? ...)

Richard.

> > 2014-10-24  Richard Biener  
> > 
> > * match.pd (0 % X): Properly use the iterator iterating over
> > all modulo operators.
> > (X % 1): Likewise.
> > 
> > Index: gcc/match.pd
> > ===
> > --- gcc/match.pd(revision 216648)
> > +++ gcc/match.pd(working copy)
> > @@ -64,13 +64,13 @@ (define_predicates
> >  (for op (ceil_mod floor_mod round_mod trunc_mod)
> >   /* 0 % X is always zero.  */
> >   (simplify
> > -  (trunc_mod integer_zerop@0 @1)
> > +  (op integer_zerop@0 @1)
> >/* But not for 0 % 0 so that we can get the proper warnings and errors.  
> > */
> >(if (!integer_zerop (@1))
> > @0))
> >   /* X % 1 is always zero.  */
> >   (simplify
> > -  (trunc_mod @0 integer_onep)
> > +  (op @0 integer_onep)
> >{ build_zero_cst (type); }))
> >  
> >  /* x | ~0 -> ~0  */


Re: [PATCH] Fix modulo patterns in match.pd

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 03:27:19PM +0200, Richard Biener wrote:
> As noted by Marc I forgot to actually utilize the iterator variable.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> Richard.
> 
> PS: How do we want to refer to patterns in ChangeLogs?

Perhaps the syntax should be (simplify "name" (...) { ... })
(maybe the name being optional?), where you'd give some name to the
simplification, say "0 % X" or "0 % X => 0" or "0 % X variant 3"
or whatever, then you could easily refer to those strings in ChangeLog,
on gcc-patches, in comments etc.

> 2014-10-24  Richard Biener  
> 
>   * match.pd (0 % X): Properly use the iterator iterating over
>   all modulo operators.
>   (X % 1): Likewise.
> 
> Index: gcc/match.pd
> ===
> --- gcc/match.pd  (revision 216648)
> +++ gcc/match.pd  (working copy)
> @@ -64,13 +64,13 @@ (define_predicates
>  (for op (ceil_mod floor_mod round_mod trunc_mod)
>   /* 0 % X is always zero.  */
>   (simplify
> -  (trunc_mod integer_zerop@0 @1)
> +  (op integer_zerop@0 @1)
>/* But not for 0 % 0 so that we can get the proper warnings and errors.  */
>(if (!integer_zerop (@1))
> @0))
>   /* X % 1 is always zero.  */
>   (simplify
> -  (trunc_mod @0 integer_onep)
> +  (op @0 integer_onep)
>{ build_zero_cst (type); }))
>  
>  /* x | ~0 -> ~0  */

Jakub


[PATCH] Fix modulo patterns in match.pd

2014-10-24 Thread Richard Biener

As noted by Marc I forgot to actually utilize the iterator variable.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

PS: How do we want to refer to patterns in ChangeLogs?

2014-10-24  Richard Biener  

* match.pd (0 % X): Properly use the iterator iterating over
all modulo operators.
(X % 1): Likewise.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 216648)
+++ gcc/match.pd(working copy)
@@ -64,13 +64,13 @@ (define_predicates
 (for op (ceil_mod floor_mod round_mod trunc_mod)
  /* 0 % X is always zero.  */
  (simplify
-  (trunc_mod integer_zerop@0 @1)
+  (op integer_zerop@0 @1)
   /* But not for 0 % 0 so that we can get the proper warnings and errors.  */
   (if (!integer_zerop (@1))
@0))
  /* X % 1 is always zero.  */
  (simplify
-  (trunc_mod @0 integer_onep)
+  (op @0 integer_onep)
   { build_zero_cst (type); }))
 
 /* x | ~0 -> ~0  */
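
The two identities the patterns rely on can be spot-checked in plain C. This is a sketch under assumptions: the helper names mirror the tree codes but are written here for illustration only, round_mod is left out, and operands are assumed to avoid the INT_MIN % -1 case.

```c
#include <assert.h>

/* trunc_mod: C's % operator; the remainder takes the dividend's sign.  */
int
trunc_mod (int a, int b)
{
  assert (b != 0);
  return a % b;
}

/* floor_mod: the remainder takes the divisor's sign.  */
int
floor_mod (int a, int b)
{
  assert (b != 0);
  int r = a % b;
  if (r != 0 && ((r < 0) != (b < 0)))
    r += b;
  return r;
}

/* ceil_mod: a nonzero remainder has the opposite sign of the divisor.  */
int
ceil_mod (int a, int b)
{
  assert (b != 0);
  int r = a % b;
  if (r != 0 && ((r < 0) == (b < 0)))
    r -= b;
  return r;
}

/* Nonzero if "0 % X" is zero for all three variants.  X must be nonzero,
   matching the !integer_zerop guard in the pattern.  */
int
zero_mod_x_is_zero (int x)
{
  return trunc_mod (0, x) == 0 && floor_mod (0, x) == 0 && ceil_mod (0, x) == 0;
}

/* Nonzero if "X % 1" is zero for all three variants.  */
int
x_mod_one_is_zero (int x)
{
  return trunc_mod (x, 1) == 0 && floor_mod (x, 1) == 0 && ceil_mod (x, 1) == 0;
}
```

Since the identities hold regardless of rounding direction, matching on the iterator variable op instead of the hard-coded trunc_mod is what makes the for loop meaningful.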


Re: Patch committed: Don't define TARGET_HAS_F_SETLKW

2014-10-24 Thread Andreas Schwab
Ian Taylor  writes:

> 2014-10-23  Ian Lance Taylor  
>
> * config/mep/mep.h (TARGET_HAS_F_SETLKW): Don't define.

s/define/undefine/

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


[PATCH][6/n] Merge from match-and-simplify, make forwprop fold all stmts

2014-10-24 Thread Richard Biener

This patch makes GIMPLE forwprop fold all statements, following
single-use SSA edges only (as suggested by Jeff and certainly
how this will regress the least until we replace manual
simplification code that does not restrict itself this way).

forwprop is run up to 4 times at the moment (once only for -Og,
not at all for -O0), which still seems reasonable.  IMHO the
forwprop pass immediately after inlining is somewhat superfluous,
it was added there just for its ADDR_EXPR propagation.  We should
eventually split this pass into two.

Note that just folding what we propagated into (like the SSA
propagators do during substitute-and-fold phase) will miss
cases where we propagate into a stmt feeding the one we could
simplify.  Unless we always fold all single-use (and their use)
stmts we have to fold everything from time to time.  Changing
how / when we fold stuff is certainly something to look after with
fold_stmt now being able to follow SSA edges.
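
The single-use restriction can be modeled outside of GCC with a toy lattice. The sketch below is an assumption-laden simplification: SSA names are plain integer indices, the lattice holds only copies (no constants), and all identifiers are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NO_VALUE (-1)

/* Toy stand-in for an SSA name: only its number of uses matters here.  */
struct toy_name
{
  int num_uses;
};

/* Valueize NAME through a const-and-copy style LATTICE, then refuse to
   look through it (return NO_VALUE) unless it has exactly one use.
   NAMES must have an entry for every index stored in LATTICE.  */
int
toy_ssa_val (const int *lattice, size_t lattice_len,
             const struct toy_name *names, int name)
{
  assert (name >= 0);
  if ((size_t) name < lattice_len && lattice[name] != NO_VALUE)
    name = lattice[name];
  if (names[name].num_uses != 1)
    return NO_VALUE;
  return name;
}
```

A name with two uses is reported as NO_VALUE, so a matcher built on this callback stops at it instead of combining across a multi-use definition.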

Bootstrapped on x86_64-unknown-linux-gnu, testing still in progress.

From earlier testing I remember I need to adjust a few testcases
that don't expect the early folding - notably two strlenopt cases
(previously XFAILed but then PASSed again).

I also expect to massage the single-use heuristic as I get to
merging the patterns I added for the various forwprop manual
pattern matchings to trunk (a lot of them do not restrict themselves
this way).

Does this otherwise look ok?

Thanks,
Richard.

2014-10-24  Richard Biener  

* tree-ssa-forwprop.c: Include tree-cfgcleanup.h and tree-into-ssa.h.
(lattice): New global.
(fwprop_ssa_val): New function.
(fold_all_stmts): Likewise.
(pass_forwprop::execute): Finally fold all stmts.

Index: gcc/tree-ssa-forwprop.c
===
--- gcc/tree-ssa-forwprop.c 
(svn+ssh://rgue...@gcc.gnu.org/svn/gcc/trunk/gcc/tree-ssa-forwprop.c)   
(revision 216631)
+++ gcc/tree-ssa-forwprop.c (.../gcc/tree-ssa-forwprop.c)   (working copy)
@@ -54,6 +54,8 @@ along with GCC; see the file COPYING3.
 #include "tree-ssa-propagate.h"
 #include "tree-ssa-dom.h"
 #include "builtins.h"
+#include "tree-cfgcleanup.h"
+#include "tree-into-ssa.h"
 
 /* This pass propagates the RHS of assignment statements into use
sites of the LHS of the assignment.  It's basically a specialized
@@ -3586,6 +3588,93 @@ simplify_mult (gimple_stmt_iterator *gsi
 
   return false;
 }
+
+
+/* Const-and-copy lattice for fold_all_stmts.  */
+static vec lattice;
+
+/* Primitive "lattice" function for gimple_simplify.  */
+
+static tree
+fwprop_ssa_val (tree name)
+{
+  /* First valueize NAME.  */
+  if (TREE_CODE (name) == SSA_NAME
+  && SSA_NAME_VERSION (name) < lattice.length ())
+{
+  tree val = lattice[SSA_NAME_VERSION (name)];
+  if (val)
+   name = val;
+}
+  /* If NAME is not the only use signal we don't want to continue
+ matching into its definition.  */
+  if (TREE_CODE (name) == SSA_NAME
+  && !has_single_use (name))
+return NULL_TREE;
+  return name;
+}
+
+/* Fold all stmts using fold_stmt following only single-use chains
+   and using a simple const-and-copy lattice.  */
+
+static bool
+fold_all_stmts (struct function *fun)
+{
+  bool cfg_changed = false;
+
+  /* Combine stmts with the stmts defining their operands.  Do that
+ in an order that guarantees visiting SSA defs before SSA uses.  */
+  lattice.create (num_ssa_names);
+  lattice.quick_grow_cleared (num_ssa_names);
+  int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (fun));
+  int postorder_num = inverted_post_order_compute (postorder);
+  for (int i = 0; i < postorder_num; ++i)
+{
+  basic_block bb = BASIC_BLOCK_FOR_FN (fun, postorder[i]);
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+  !gsi_end_p (gsi); gsi_next (&gsi))
+   {
+ gimple stmt = gsi_stmt (gsi);
+ gimple orig_stmt = stmt;
+
+ if (fold_stmt (&gsi, fwprop_ssa_val))
+   {
+ stmt = gsi_stmt (gsi);
+ if (maybe_clean_or_replace_eh_stmt (orig_stmt, stmt)
+ && gimple_purge_dead_eh_edges (bb))
+   cfg_changed = true;
+ /* Cleanup the CFG if we simplified a condition to
+true or false.  */
+ if (gimple_code (stmt) == GIMPLE_COND
+ && (gimple_cond_true_p (stmt)
+ || gimple_cond_false_p (stmt)))
+   cfg_changed = true;
+ update_stmt (stmt);
+   }
+
+ /* Fill up the lattice.  */
+ if (gimple_assign_single_p (stmt))
+   {
+ tree lhs = gimple_assign_lhs (stmt);
+ tree rhs = gimple_assign_rhs1 (stmt);
+ if (TREE_CODE (lhs) == SSA_NAME)
+   {
+ if (TREE_CODE (rhs) == SSA_NAME)
+   lattice[SSA_NAME_VERSION (lhs)] = fwprop_ssa_val (rhs);
+ else if (is_gimple_min_invariant (rhs))
+  

Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-10-24 Thread Rainer Orth
Alan Lawrence  writes:

> Rainer Orth wrote:
>>> However, as a quick first step, does adding the ilp32 / lp64 (and keeping
>>> the architectures list for now) solve the immediate problem? Patch
>>> attached, OK for trunk?
>>
>> No, as I said this is wrong for biarch targets like sparc and i386.
>
> When you say no this does not solve the immediate problem, are you saying
> that you are (still) seeing test failures with the require-effective-target
> patch applied? Or is the issue that this would not execute the tests as

I didn't try that patch yet, but the target part is wrong, as I tried to
explain.  Consider the sparc case: 

* if you configure for sparc-sun-solaris2.11, you default to -m32
  (i.e. ilp32), while -m64 is lp64

* if you configure for sparcv9-sun-solaris2.11 instead, you default to
  -m64 (lp64), but get ilp32 with -m32

So, irrespective of the sparc vs. sparc64 (which is wrong, btw., the
canonical form for 64-bit-default sparc is sparcv9) forms, you can get
ilp32 and lp64 with both.

Similar issues hold for i?86 vs. x86_64 and probably other biarch
targets like powerpc vs. powerpc64, so you need to use the most generic
forms of the target names in your target lists.
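
As a rough illustration of why the data model has to be checked per compilation rather than inferred from the target triplet, the distinction boils down to type sizes seen by the compiler for the multilib in use. The names below are hypothetical and this is not how the testsuite implements its ilp32/lp64 keywords.

```c
#include <assert.h>
#include <stddef.h>

enum data_model
{
  MODEL_ILP32,  /* int, long and pointers are all 32 bits */
  MODEL_LP64,   /* int is 32 bits, long and pointers are 64 bits */
  MODEL_OTHER
};

/* Classify a data model from type sizes given in bytes.  */
enum data_model
classify_data_model (size_t int_size, size_t long_size, size_t ptr_size)
{
  assert (int_size != 0 && long_size != 0 && ptr_size != 0);
  if (int_size == 4 && long_size == 4 && ptr_size == 4)
    return MODEL_ILP32;
  if (int_size == 4 && long_size == 8 && ptr_size == 8)
    return MODEL_LP64;
  return MODEL_OTHER;
}

/* The model of whatever multilib this file is compiled for, e.g. the
   result differs between -m32 and -m64 with the same compiler.  */
enum data_model
current_data_model (void)
{
  return classify_data_model (sizeof (int), sizeof (long), sizeof (void *));
}
```
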

> widely as might be possible? In principle I'm quite happy to relax the
> target patterns, although have been having issues with sparc (below)...
>
> Re. "what the architectures have in common" is largely that these are the
> primary/secondary archs on which I've checked the test passes! I can now
> add mips and microblaze to this list, however I'm nervous of dropping the
> target entirely given the very large number of target architectures gcc
> supports; and e.g. IA64 (in ILP32 mode) generates an ashiftrt:DI by 31
> places, not ashiftrt:SI, which does not match the simplification criteria
> in combine.c.

As I stated before, such target lists without any explanation are bound
to confuse future readers/testers: at the very least, add comments
explaining what those lists have in common.  OTOH, at this stage it
might be best to just drop the target list for now, learn which targets
pass and fail the tests, and then reintroduce them or, better yet, add
an effective-target keyword which matches them.  Otherwise, you'll never
get test coverage beyond your current list.

>> This should be something like 
>>
>>   { target aarch64*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* }
>>
>> E.g. sparc-sun-solaris2.11 with -m64 is lp64, but would be excluded by
>> your target list.  Keep the list sorted alphabetically and best add an
>> explanation so others know what those targets have in common.
>
> So I've built a stage-1 compiler with --target=sparc-sun-solaris2.11, and I 
> find
>
>   * without -m64, my "dg-require-effective-target ilp32" causes the 32-bit
> test to execute, and pass; "dg-require-effective-target lp64" prevents
> execution of the 64-bit test (which would fail) - so all as expected and
> desired.
>
>   * with -lp64, behaviour is as previous (this is probably expected)

Huh?  What's -lp64?

>   * with -m64, "dg-require-effective-target ilp32" still causes the test to
> execute (but it fails, as the RTL now has an ashiftrt:DI by 31 places,
> which doesn't meet the simplification criteria in combine.c - this is
> pretty much as expected). "dg-require-effective-target lp64" stops the
> 64-bit test from executing however (despite that it would now pass).
>
> Can you clarify what I should be doing on sparc, therefore?

It's not only about sparc, but about all biarch targets.  The following
patch (which only includes the parts strictly necessary to avoid the
failures, nothing else I suggested above) works for me on
sparc-sun-solaris2.11 (-m32 and -m64), x86_64-unknown-linux-gnu (-m64
and -m32), and i686-unknown-linux-gnu (-m32 and -m64): the first test is
run for 64-bit only, while the second one only for 32-bit:

diff --git a/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c b/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c
--- a/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c
+++ b/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c
@@ -1,4 +1,5 @@
-/* { dg-do compile {target sparc64*-*-* aarch64*-*-* x86_64-*-* powerpc64*-*-*} } */
+/* { dg-do compile { target aarch64*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* } } */
+/* { dg-require-effective-target lp64 } */
 /* { dg-options "-O2 -fdump-rtl-combine-all" } */
 
 typedef long long int int64_t;
diff --git a/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c b/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c
--- a/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c
+++ b/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c
@@ -1,4 +1,5 @@
-/* { dg-do compile {target arm*-*-* i?86-*-* powerpc-*-* sparc-*-*} } */
+/* { dg-do compile { target arm*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* } } */
+/* { dg-require-effective-target ilp32 } */
 /* { dg-options "-O2 -fdump-rtl-combine-all" } */
 
 typedef long int32_t;

Rainer

-- 
-
Rainer Orth, Cente

[v3] Minor tweaks

2014-10-24 Thread Paolo Carlini

Hi,

tested x86_64-linux.

Thanks,
Paolo.

///
2014-10-24  Paolo Carlini  

* include/bits/atomic_base.h: Avoid including .
* include/std/atomic: When __cplusplus < 201103L skip the rest of
the header.
* testsuite/29_atomics/headers/atomic/std_c++0x_neg.cc: Adjust.
Index: include/bits/atomic_base.h
===
--- include/bits/atomic_base.h  (revision 216624)
+++ include/bits/atomic_base.h  (working copy)
@@ -33,7 +33,6 @@
 #pragma GCC system_header
 
 #include 
-#include 
 #include 
 #include 
 
Index: include/std/atomic
===
--- include/std/atomic  (revision 216624)
+++ include/std/atomic  (working copy)
@@ -36,7 +36,7 @@
 
 #if __cplusplus < 201103L
 # include 
-#endif
+#else
 
 #include 
 
@@ -1129,4 +1129,6 @@
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
-#endif
+#endif // C++11
+
+#endif // _GLIBCXX_ATOMIC
Index: testsuite/29_atomics/headers/atomic/std_c++0x_neg.cc
===
--- testsuite/29_atomics/headers/atomic/std_c++0x_neg.cc(revision 
216624)
+++ testsuite/29_atomics/headers/atomic/std_c++0x_neg.cc(working copy)
@@ -18,7 +18,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-#include   // { dg-excess-errors "In file included from" }
+#include 
 
 // { dg-error "ISO C.. 2011" "" { target *-*-* } 32 }
 


Re: [Protopatch 11/11][IA64] Migrate to reduc_(plus|min|max)_scal_v2df optab

2014-10-24 Thread Alan Lawrence

Ooops, attached.

commit 56296417b9f6795e541b1101dce6e6ac1789de9a
Author: Alan Lawrence 
Date:   Wed Oct 8 15:58:27 2014 +0100

IA64 (?!)

diff --git a/gcc/config/ia64/vect.md b/gcc/config/ia64/vect.md
index e3ce292..45f4156 100644
--- a/gcc/config/ia64/vect.md
+++ b/gcc/config/ia64/vect.md
@@ -1217,45 +1217,54 @@
   "fpmin %0 = %1, %2"
   [(set_attr "itanium_class" "fmisc")])
 
-(define_expand "reduc_splus_v2sf"
-  [(match_operand:V2SF 0 "fr_register_operand" "")
+(define_expand "reduc_plus_scal_v2sf"
+  [(match_operand:SF 0 "fr_register_operand" "")
(match_operand:V2SF 1 "fr_register_operand" "")]
   ""
 {
   rtx tmp = gen_reg_rtx (V2SFmode);
+  rtx tmp2 = gen_reg_rtx (V2SFmode);
+
   if (TARGET_BIG_ENDIAN)
 emit_insn (gen_fswap (tmp, CONST0_RTX (V2SFmode), operands[1]));
   else
 emit_insn (gen_fswap (tmp, operands[1], CONST0_RTX (V2SFmode)));
-  emit_insn (gen_addv2sf3 (operands[0], operands[1], tmp));
+  emit_insn (gen_addv2sf3 (tmp2, operands[1], tmp));
+  emit_insn (gen_vec_extractv2sf (operands[0], tmp2, GEN_INT (0)));
   DONE;
 })
 
-(define_expand "reduc_smax_v2sf"
-  [(match_operand:V2SF 0 "fr_register_operand" "")
+(define_expand "reduc_smax_scal_v2sf"
+  [(match_operand:SF 0 "fr_register_operand" "")
(match_operand:V2SF 1 "fr_register_operand" "")]
   ""
 {
   rtx tmp = gen_reg_rtx (V2SFmode);
+  rtx tmp2 = gen_reg_rtx (V2SFmode);
+
   if (TARGET_BIG_ENDIAN)
 emit_insn (gen_fswap (tmp, CONST0_RTX (V2SFmode), operands[1]));
   else
 emit_insn (gen_fswap (tmp, operands[1], CONST0_RTX (V2SFmode)));
-  emit_insn (gen_smaxv2sf3 (operands[0], operands[1], tmp));
+  emit_insn (gen_smaxv2sf3 (tmp2, operands[1], tmp));
+  emit_insn (gen_vec_extractv2sf (operands[0], tmp2, GEN_INT (0)));
   DONE;
 })
 
-(define_expand "reduc_smin_v2sf"
-  [(match_operand:V2SF 0 "fr_register_operand" "")
+(define_expand "reduc_smin_scal_v2sf"
+  [(match_operand:SF 0 "fr_register_operand" "")
(match_operand:V2SF 1 "fr_register_operand" "")]
   ""
 {
   rtx tmp = gen_reg_rtx (V2SFmode);
+  rtx tmp2 = gen_reg_rtx (V2SFmode);
+
   if (TARGET_BIG_ENDIAN)
 emit_insn (gen_fswap (tmp, CONST0_RTX (V2SFmode), operands[1]));
   else
 emit_insn (gen_fswap (tmp, operands[1], CONST0_RTX (V2SFmode)));
-  emit_insn (gen_sminv2sf3 (operands[0], operands[1], tmp));
+  emit_insn (gen_sminv2sf3 (tmp2, operands[1], tmp));
+  emit_insn (gen_vec_extractv2sf (operands[0], tmp2, GEN_INT (0)));
   DONE;
 })
 

Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-10-24 Thread Alan Lawrence

Ooops, attached.

commit e48d59399722ce8316d4b1b4f28b40d87b1193fa
Author: Alan Lawrence 
Date:   Tue Oct 7 15:28:47 2014 +0100

PowerPC v2 (but not paired.md)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 02ea142..92bb5d0 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2596,35 +2596,22 @@
   operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
 })
 
-(define_expand "reduc_splus_"
-  [(set (match_operand:VIshort 0 "register_operand" "=v")
+(define_expand "reduc_plus_scal_"
+  [(set (match_operand: 0 "register_operand" "=v")
 (unspec:VIshort [(match_operand:VIshort 1 "register_operand" "v")]
 			UNSPEC_REDUC_PLUS))]
   "TARGET_ALTIVEC"
 {
   rtx vzero = gen_reg_rtx (V4SImode);
   rtx vtmp1 = gen_reg_rtx (V4SImode);
-  rtx dest = gen_lowpart (V4SImode, operands[0]);
+  rtx vtmp2 = gen_reg_rtx (mode);
+  rtx dest = gen_lowpart (V4SImode, vtmp2);
+  HOST_WIDE_INT last_elem = GET_MODE_NUNITS (mode) - 1;
 
   emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
   emit_insn (gen_altivec_vsum4ss (vtmp1, operands[1], vzero));
   emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
-  DONE;
-})
-
-(define_expand "reduc_uplus_v16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=v")
-(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")]
-		  UNSPEC_REDUC_PLUS))]
-  "TARGET_ALTIVEC"
-{
-  rtx vzero = gen_reg_rtx (V4SImode);
-  rtx vtmp1 = gen_reg_rtx (V4SImode);
-  rtx dest = gen_lowpart (V4SImode, operands[0]);
-
-  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
-  emit_insn (gen_altivec_vsum4ubs (vtmp1, operands[1], vzero));
-  emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
+  rs6000_expand_vector_extract (operands[0], vtmp2, last_elem);
   DONE;
 })
 
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 237724e..54b18aa 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -81,7 +81,7 @@
 ;; Vector reduction code iterators
 (define_code_iterator VEC_reduc [plus smin smax])
 
-(define_code_attr VEC_reduc_name [(plus "splus")
+(define_code_attr VEC_reduc_name [(plus "plus")
   (smin "smin")
   (smax "smax")])
 
@@ -1077,18 +1077,20 @@
 
 ;; Vector reduction expanders for VSX
 
-(define_expand "reduc__v2df"
-  [(parallel [(set (match_operand:V2DF 0 "vfloat_operand" "")
-		   (VEC_reduc:V2DF
-		(vec_concat:V2DF
-		 (vec_select:DF
-		  (match_operand:V2DF 1 "vfloat_operand" "")
-		  (parallel [(const_int 1)]))
-		 (vec_select:DF
-		  (match_dup 1)
-		  (parallel [(const_int 0)])))
-		(match_dup 1)))
-	  (clobber (match_scratch:V2DF 2 ""))])]
+(define_expand "reduc__scal_v2df"
+  [(parallel [(set (match_operand:DF 0 "vfloat_operand" "")
+		   (vec_select:DF
+		(VEC_reduc:V2DF
+		 (vec_concat:V2DF
+		  (vec_select:DF
+		   (match_operand:V2DF 1 "vfloat_operand" "")
+		   (parallel [(const_int 1)]))
+		  (vec_select:DF
+		   (match_dup 1)
+		   (parallel [(const_int 0)])))
+		 (match_dup 1))
+		(parallel [(const_int 1)])))
+	  (clobber (match_scratch:DF 2 ""))])]
   "VECTOR_UNIT_VSX_P (V2DFmode)"
   "")
 
@@ -1099,13 +1101,16 @@
 ; is to allow us to use a code iterator, but not completely list all of the
 ; vector rotates, etc. to prevent canonicalization
 
-(define_expand "reduc__v4sf"
-  [(parallel [(set (match_operand:V4SF 0 "vfloat_operand" "")
-		   (VEC_reduc:V4SF
-		(unspec:V4SF [(const_int 0)] UNSPEC_REDUC)
-		(match_operand:V4SF 1 "vfloat_operand" "")))
+(define_expand "reduc__scal_v4sf"
+  [(parallel [(set (match_operand:SF 0 "vfloat_operand" "")
+		   (vec_select:SF
+		(VEC_reduc:V4SF
+		 (unspec:V4SF [(const_int 0)] UNSPEC_REDUC)
+		 (match_operand:V4SF 1 "vfloat_operand" ""))
+		(parallel [(const_int 3)])))
 	  (clobber (match_scratch:V4SF 2 ""))
-	  (clobber (match_scratch:V4SF 3 ""))])]
+	  (clobber (match_scratch:V4SF 3 ""))
+	  (clobber (match_scratch:V4SF 4 ""))])]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "")
 

[PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-10-24 Thread Alan Lawrence
This migrates the reduction patterns in altivec.md and vector.md to the new 
names. I've not touched paired.md as I wasn't really sure how to fix that (how 
do I vec_extractv2sf ?), moreover the testing I did didn't seem to exercise any 
of those patterns (iow: I'm not sure what would be an appropriate target machine?).


I note the reduc_uplus_v16qi (which I've removed, as unsigned and signed 
addition should be equivalent) differed from reduc_splus_v16qi in using 
gen_altivec_vsum4ubs rather than gen_altivec_vsum4sbs.  Testcases 
gcc.dg/vect/{slp-24-big-array.c,slp-24.c,vect-reduc-1char-big-array.c,vect-reduc-1char.c} 
thus produce assembly which differs from previously (only) in that "vsum4ubs" 
becomes "vsum4sbs". These tests are still passing so I assume this is OK.
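
The equivalence claimed above can be illustrated for plain wrapping arithmetic: summing bytes as signed or as unsigned values gives the same low 8 bits. This sketch deliberately does not model the saturating behaviour of the AltiVec vsum4 instructions, and the function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum N bytes treating each as unsigned; keep the low 8 bits.  */
uint8_t
sum_bytes_unsigned (const uint8_t *v, size_t n)
{
  assert (n <= 65536);          /* keeps the accumulator far from overflow */
  unsigned acc = 0;
  for (size_t i = 0; i < n; ++i)
    acc += v[i];
  return (uint8_t) acc;
}

/* Sum the same N bytes reinterpreted as two's complement signed values;
   keep the low 8 bits of the result.  */
uint8_t
sum_bytes_signed (const uint8_t *v, size_t n)
{
  assert (n <= 65536);
  int acc = 0;
  for (size_t i = 0; i < n; ++i)
    acc += v[i] < 128 ? (int) v[i] : (int) v[i] - 256;
  return (uint8_t) ((unsigned) acc & 0xffu);
}
```

The two results are identical because each byte's signed and unsigned interpretations differ by a multiple of 256, which vanishes when the sum is truncated to 8 bits.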


The combining of signed and unsigned addition also improves 
gcc.dg/vect/{vect-outer-4i.c,vect-reduc-1short.c,vect-reduc-dot-u8b.c,vect-reduc-pattern-1c-big-array.c,vect-reduc-pattern-1c.c} 
: these are now reduced using direct vector reduction, rather than with shifts 
as previously (because there was only a reduc_splus rather than the reduc_uplus 
these tests looked for).


((Side note: the RTL changes to vector.md are to match the combine patterns in 
vsx.md; now that we no longer depend upon combine to generate those patterns 
(as the optab outputs them directly), one might wish to remove the smaller 
pattern from vsx.md, and/or simplify the RTL. I theorize that a reduction of a 
two-element vector is just adding the first element to the second, so maybe 
simplify to something like


  [(parallel [(set (match_operand:DF 0 "vfloat_operand" "")
   (VEC_reduc:V2DF
(vec_select:DF
 (match_operand:V2DF 1 "vfloat_operand" "")
 (parallel [(const_int 1)]))
(vec_select:DF
 (match_dup 1)
 (parallel [(const_int 0)]
  (clobber (match_scratch:V2DF 2 ""))])]

but I think it's best for me to leave that to the port maintainers.))
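
In scalar terms, what the reduc_..._scal expanders for a two-element vector are meant to compute can be sketched as below. The v2df struct and function names are stand-ins chosen for this example; they say nothing about how the RTL should be written.

```c
#include <assert.h>

/* Stand-in for a V2DF vector: two double lanes.  */
typedef struct
{
  double elt[2];
} v2df;

/* Scalar result of a plus reduction: first element plus second.  */
double
reduc_plus_scal (v2df v)
{
  return v.elt[0] + v.elt[1];
}

/* Scalar result of a signed-min reduction.  */
double
reduc_smin_scal (v2df v)
{
  return v.elt[0] < v.elt[1] ? v.elt[0] : v.elt[1];
}

/* Scalar result of a signed-max reduction.  */
double
reduc_smax_scal (v2df v)
{
  return v.elt[0] > v.elt[1] ? v.elt[0] : v.elt[1];
}
```

The older optabs produced a vector from which the caller had to extract the result; returning the scalar directly is what removes the element-selection step from the consumer.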

Bootstrapped and check-gcc on powerpc64-none-linux-gnu (gcc110.fsffrance.org, 
with thanks to the GCC Compile Farm).


gcc/ChangeLog:

* config/rs6000/altivec.md (reduc_splus_): Rename to...
(reduc_plus_scal_): ...this, and rs6000_expand_vector_extract.
(reduc_uplus_v16qi): Remove.

* config/rs6000/vector.md (VEC_reduc_name): change "splus" to "plus"
(reduc__v2df): Rename to...
(reduc__scal_v2df): ...this, wrap VEC_reduc in a
vec_select of element 1.
(reduc__v4sf): Rename to...
(reduc__scal_v4sf): ...this, wrap VEC_reduc in a
vec_select of element 3, add scratch register.



[Protopatch 11/11][IA64] Migrate to reduc_(plus|min|max)_scal_v2df optab

2014-10-24 Thread Alan Lawrence
This is an attempt to migrate IA64 to the newer optabs, however, I found none of 
the tests in gcc.dg/vect seemed to touch any of the affected patterns, so this 
is only really tested by building a stage-1 compiler.


gcc/ChangeLog:

* config/ia64/vect.md (reduc_splus_v2sf): Rename to...
(reduc_plus_scal_v2sf): ...this, add a vec_extractv2sf.
(reduc_smin_v2sf): Rename to...
(reduc_smin_scal_v2sf): ...this, add a vec_extractv2sf.
(reduc_smax_v2sf): Rename to...
(reduc_smax_scal_v2sf): ...this, add a vec_extractv2sf.



[PATCH] c11-atomic-exec-5: Avoid dead code where LDBL_MANT_DIG is 106

2014-10-24 Thread Maciej W. Rozycki
Hi,

 Commit 216437 missed a part of Adhemerval's original change that made 
`long_double_add_overflow', `complex_long_double_add_overflow', 
`long_double_sub_overflow' and `complex_long_double_sub_overflow' tests 
consistently defined only if called.  These tests are now only made under 
the `LDBL_MANT_DIG != 106' condition, otherwise there is no need to 
provide definitions that become dead code.

 Here's the missing part, I have verified the source still builds after 
the change manually with:

$ gcc -U__LDBL_MANT_DIG__ -D__LDBL_MANT_DIG__=113 -Wunused-function -std=c11 
-pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -latomic -o 
c11-atomic-exec-5 c11-atomic-exec-5.c

and:

$ gcc -U__LDBL_MANT_DIG__ -D__LDBL_MANT_DIG__=106 -Wunused-function -std=c11 
-pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -latomic -o 
c11-atomic-exec-5 c11-atomic-exec-5.c

It also passed regression testing with the powerpc-gnu-linux target and my 
usual multilibs that have LDBL_MANT_DIG set to 106, which is the only case 
this change really affects.

 Without this change I get this instead:

$ gcc -U__LDBL_MANT_DIG__ -D__LDBL_MANT_DIG__=113 -Wunused-function -std=c11 
-pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -latomic -o 
c11-atomic-exec-5 c11-atomic-exec-5.c
$

(OK), and:

$ gcc -U__LDBL_MANT_DIG__ -D__LDBL_MANT_DIG__=106 -Wunused-function -std=c11 
-pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -latomic -o 
c11-atomic-exec-5 c11-atomic-exec-5.c
c11-atomic-exec-5.c:62:1: warning: 'test_main_long_double_add_overflow' 
defined but not used [-Wunused-function]
 test_main_##NAME (void)   \
 ^
c11-atomic-exec-5.c:334:1: note: in expansion of macro 'TEST_FUNCS'
 TEST_FUNCS (long_double_add_overflow, long double, , += LDBL_MAX, 0,
 ^
c11-atomic-exec-5.c:62:1: warning: 
'test_main_complex_long_double_add_overflow' defined but not used 
[-Wunused-function]
 test_main_##NAME (void)   \
 ^
c11-atomic-exec-5.c:352:1: note: in expansion of macro 'TEST_FUNCS'
 TEST_FUNCS (complex_long_double_add_overflow, _Complex long double, , += 
LDBL_MAX, 0,
 ^
c11-atomic-exec-5.c:62:1: warning: 'test_main_long_double_sub_overflow' 
defined but not used [-Wunused-function]
 test_main_##NAME (void)   \
 ^
c11-atomic-exec-5.c:358:1: note: in expansion of macro 'TEST_FUNCS'
 TEST_FUNCS (long_double_sub_overflow, long double, , -= LDBL_MAX, 0,
 ^
c11-atomic-exec-5.c:62:1: warning: 
'test_main_complex_long_double_sub_overflow' defined but not used 
[-Wunused-function]
 test_main_##NAME (void)   \
 ^
c11-atomic-exec-5.c:376:1: note: in expansion of macro 'TEST_FUNCS'
 TEST_FUNCS (complex_long_double_sub_overflow, _Complex long double, , -= 
LDBL_MAX, 0,
 ^
$

(not quite so).

 This also wraps the definitions of the `NOT_LDBL_EPSILON_2' and 
`NOT_MINUS_LDBL_EPSILON_2' macros into this condition, but these aren't 
referred to if `LDBL_MANT_DIG' is 106 either.

 No changes compared to original code so all credit goes to Adhemerval.  
OK to apply?

2014-10-24  Adhemerval Zanella  

gcc/testsuite/
* gcc.dg/atomic/c11-atomic-exec-5.c 
(test_main_long_double_add_overflow): Only actually define if
LDBL_MANT_DIG != 106.
(test_main_complex_long_double_add_overflow): Likewise.
(test_main_long_double_sub_overflow): Likewise.
(test_main_complex_long_double_sub_overflow): Likewise.

(NOT_LDBL_EPSILON_2): Likewise.
(NOT_MINUS_LDBL_EPSILON_2): Likewise.

  Maciej

gcc-r216437-azanella-rs6000-atomic-assign-expand-env-update.diff
Index: gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
===
--- gcc-fsf-trunk-quilt.orig/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
2014-10-22 21:59:45.788954624 +0100
+++ gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c 
2014-10-22 21:59:15.788143775 +0100
@@ -331,11 +331,11 @@ TEST_FUNCS (complex_double_div_overflow,
 TEST_FUNCS (long_double_add_invalid, long double, , += __builtin_infl (), 0,
0, __builtin_isinf, 0,
-__builtin_infl (), FE_INVALID)
+#if LDBL_MANT_DIG != 106
 TEST_FUNCS (long_double_add_overflow, long double, , += LDBL_MAX, 0,
LDBL_MAX, __builtin_isinf, FE_OVERFLOW | FE_INEXACT,
0, 0)
 #define NOT_LDBL_EPSILON_2(X) ((X) != LDBL_EPSILON / 2)
-#if LDBL_MANT_DIG != 106
 TEST_FUNCS (long_double_add_inexact, long double, , += LDBL_EPSILON / 2, 0,
1.0L, NOT_LDBL_EPSILON_2, FE_INEXACT,
0, 0)
@@ -348,18 +348,18 @@ TEST_FUNCS (long_double_preinc_inexact, 
 TEST_FUNCS (long_double_postinc_inexact, long double, , ++, 0,
LDBL_EPSILON / 2, NOT_MINUS_1, FE_INEXACT,
-1, 0)
-#endif
 TEST_FUNCS (complex_long_double_add_overflow, _Complex long double, , += 
LDBL_MAX, 0,
LDBL_MAX, REAL_ISINF, FE_OVERFLOW | FE_INEXACT,
0, 0)
+#endif
 TEST_FUNCS (long_double_sub_invalid, long double, , -= __builtin_infl (),

[PATCH 9/11][i386] Migrate reduction optabs to reduc_..._scal

2014-10-24 Thread Alan Lawrence

Bootstrapped and check-gcc on x86_64-none-linux-gnu.

gcc/ChangeLog:

* config/i386/i386.c (ix86_expand_reduc): Extract result into scalar.
* config/i386/sse.md (reduc_splus_v8df, reduc__ * 3,
reduc_umin_v8hi): Rename to...
(reduc_plus_scal_v8df, reduc__scal_ * 3,
reduc_umin_scal_v8hi): ...these, changing result mode to scalar.

(reduc_splus_v4df, reduc_splus_v2df, reduc_splus_v16sf,
reduc_splus_v8sf, reduc_splus_v4sf): Rename to...
(reduc_plus_scal_v4df, reduc_plus_scal_v2df, reduc_plus_scal_v16sf,
reduc_plus_scal_v8sf, reduc_plus_scal_v4sf): ...these, adding
gen_vec_extract for scalar result.

commit 80b0d10a78b2f3e86325f373e99e9cf71e42e622
Author: Alan Lawrence 
Date:   Tue Oct 7 13:25:08 2014 +0100

i386

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4c4a6eb..670a5f5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41211,12 +41211,12 @@ emit_reduc_half (rtx dest, rtx src, int i)
 }
 
 /* Expand a vector reduction.  FN is the binary pattern to reduce;
-   DEST is the destination; IN is the input vector.  */
+   DEST is the (scalar) destination; IN is the input vector.  */
 
 void
 ix86_expand_reduc (rtx (*fn) (rtx, rtx, rtx), rtx dest, rtx in)
 {
-  rtx half, dst, vec = in;
+  rtx half, dst = NULL_RTX, vec = in;
   enum machine_mode mode = GET_MODE (in);
   int i;
 
@@ -41225,23 +41225,21 @@ ix86_expand_reduc (rtx (*fn) (rtx, rtx, rtx), rtx dest, rtx in)
   && mode == V8HImode
   && fn == gen_uminv8hi3)
 {
-  emit_insn (gen_sse4_1_phminposuw (dest, in));
-  return;
+  dst = gen_reg_rtx (mode);
+  emit_insn (gen_sse4_1_phminposuw (dst, in));
 }
-
-  for (i = GET_MODE_BITSIZE (mode);
-   i > GET_MODE_BITSIZE (GET_MODE_INNER (mode));
-   i >>= 1)
-{
+  else
+for (i = GET_MODE_BITSIZE (mode);
+	  i > GET_MODE_BITSIZE (GET_MODE_INNER (mode));
+	  i >>= 1)
+  {
   half = gen_reg_rtx (mode);
   emit_reduc_half (half, vec, i);
-  if (i == GET_MODE_BITSIZE (GET_MODE_INNER (mode)) * 2)
-	dst = dest;
-  else
-	dst = gen_reg_rtx (mode);
+  dst = gen_reg_rtx (mode);
   emit_insn (fn (dst, half, vec));
   vec = dst;
 }
+  ix86_expand_vector_extract (false, dest, dst, 0);
 }
 
 /* Target hook for scalar_mode_supported_p.  */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e7646d7..e4e0b95 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2238,8 +2238,8 @@
(set_attr "prefix_rep" "1,*")
(set_attr "mode" "V4SF")])
 
-(define_expand "reduc_splus_v8df"
-  [(match_operand:V8DF 0 "register_operand")
+(define_expand "reduc_plus_scal_v8df"
+  [(match_operand:DF 0 "register_operand")
(match_operand:V8DF 1 "register_operand")]
   "TARGET_AVX512F"
 {
@@ -2247,30 +2247,35 @@
   DONE;
 })
 
-(define_expand "reduc_splus_v4df"
-  [(match_operand:V4DF 0 "register_operand")
+(define_expand "reduc_plus_scal_v4df"
+  [(match_operand:DF 0 "register_operand")
(match_operand:V4DF 1 "register_operand")]
   "TARGET_AVX"
 {
   rtx tmp = gen_reg_rtx (V4DFmode);
   rtx tmp2 = gen_reg_rtx (V4DFmode);
+  rtx tmp3 = gen_reg_rtx (V4DFmode);
+  
   emit_insn (gen_avx_haddv4df3 (tmp, operands[1], operands[1]));
   emit_insn (gen_avx_vperm2f128v4df3 (tmp2, tmp, tmp, GEN_INT (1)));
-  emit_insn (gen_addv4df3 (operands[0], tmp, tmp2));
+  emit_insn (gen_addv4df3 (tmp3, tmp, tmp2));
+  emit_insn (gen_vec_extractv4df (operands[0], tmp3, GEN_INT (1)));
   DONE;
 })
 
-(define_expand "reduc_splus_v2df"
-  [(match_operand:V2DF 0 "register_operand")
+(define_expand "reduc_plus_scal_v2df"
+  [(match_operand:DF 0 "register_operand")
(match_operand:V2DF 1 "register_operand")]
   "TARGET_SSE3"
 {
-  emit_insn (gen_sse3_haddv2df3 (operands[0], operands[1], operands[1]));
+  rtx tmp = gen_reg_rtx (V2DFmode);
+  emit_insn (gen_sse3_haddv2df3 (tmp, operands[1], operands[1]));
+  emit_insn (gen_vec_extractv2df (operands[0], tmp, GEN_INT (0)));
   DONE;
 })
 
-(define_expand "reduc_splus_v16sf"
-  [(match_operand:V16SF 0 "register_operand")
+(define_expand "reduc_plus_scal_v16sf"
+  [(match_operand:SF 0 "register_operand")
(match_operand:V16SF 1 "register_operand")]
   "TARGET_AVX512F"
 {
@@ -2278,30 +2283,35 @@
   DONE;
 })
 
-(define_expand "reduc_splus_v8sf"
-  [(match_operand:V8SF 0 "register_operand")
+(define_expand "reduc_plus_scal_v8sf"
+  [(match_operand:SF 0 "register_operand")
(match_operand:V8SF 1 "register_operand")]
   "TARGET_AVX"
 {
   rtx tmp = gen_reg_rtx (V8SFmode);
   rtx tmp2 = gen_reg_rtx (V8SFmode);
+  rtx tmp3 = gen_reg_rtx (V8SFmode);
+  
   emit_insn (gen_avx_haddv8sf3 (tmp, operands[1], operands[1]));
   emit_insn (gen_avx_haddv8sf3 (tmp2, tmp, tmp));
   emit_insn (gen_avx_vperm2f128v8sf3 (tmp, tmp2, tmp2, GEN_INT (1)));
-  emit_insn (gen_addv8sf3 (operands[0], tmp, tmp2));
+  emit_insn (gen_addv8sf3 (tmp3, tmp, tmp2));
+  emit_insn (gen_vec_extractv8sf (operands[0], tmp3, 

Re: [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral

2014-10-24 Thread Richard Biener
On Fri, 24 Oct 2014, Alan Lawrence wrote:

> This is the first half of my previous patch series
> (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01456.html), that is the part
> making the REDUC_..._EXPR tree codes endian-neutral, and adding a new
> reduce-to-scalar optab in place of the endianness-dependent
> reduc_[us](plus|min|max)_optab.
> 
> I'm leaving the vec_shr portion out of this patch series, as the link between
> the two halves is only the end goal of removing an "if (BYTES_BIG_ENDIAN)"
> from tree-vect-loop.c; this series removes that from one code path so can
> stand alone.
> 
> Patches 1-6 are as previously posted apart from rebasing and removing the
> old/poisoned AArch64 patterns as per maintainer's request. Patches 1, 2, 4, 5
> and 6 have already been approved; patch 3 was discussed somewhat but I think
> we decided against most of the ideas raised, I have added comment to
> scalar_reduc_to_vector. I now reread Richie's "Otherwise the patch looks good
> to me" and wonder if I should have taken that as an approval but I didn't read
> it that way at the time...???

Yes, it was an approval ;)

> Patches 7-11 migrate ARM, x86, IA64 (I think), and mostly PowerPC, to
> the new reduc_(plus|[us](min|max))_scal_optab. I have not managed to work out
> how to do the same for MIPS (specifically what I need to add to
> mips_expand_vec_reduc), and have had no response from the maintainers, so am
> leaving that for now. Also I haven't migrated (or worked out how to target)
> rs6000/paired.md, help would be most welcome.
> 
> 
> The suggestion was then to "complete" the migration, by removing the old
> optabs. There are a few options here and I'll follow up with appropriate
> patches according to feedback received. I see options:
> 
> (1) just delete the old optabs (and the migration code). This would
> performance-regress the MIPS backend, but should not break it, although one
> should really do *something* with the then-unused
> reduc_[us](plus|min|max)_optab in config/mips/loongson.md.
>
> (2) also renaming reduc_..._scal_optab back to reduc_..._optab; would break
> the MIPS backend if something were not done with its existing patterns.
> 
> (2a) Alternatively I could just use a different new name, e.g. reduce_,
> reduct_, vec_reduc_..., anything that's less of a mouthful than
> reduc_..._scal. Whilst being only-very-slightly-different from the current
> reduc_... might be confusing, so might changing the meaning of the optab, and
> its signature, with the existing name, so am open to suggestions?

I definitely prefer (2).

Thanks,
Richard.


[PATCH 8/11][ARM] Migrate to new reduc_[us](min|max)_scal_optab

2014-10-24 Thread Alan Lawrence

Similarly to last patch.

Tested, in combination with previous patch:
bootstrap on arm-none-linux-gnueabihf
cross-tested check-gcc on arm-none-eabi.

gcc/ChangeLog:

config/arm/neon.md (reduc_smin_ *2): Rename to...
(reduc_smin_scal_ *2): ...this; extract scalar result.
(reduc_smax_ *2): Rename to...
(reduc_smax_scal_ *2): ...this; extract scalar result.
(reduc_umin_ *2): Rename to...
(reduc_umin_scal_ *2): ...this; extract scalar result.
(reduc_umax_ *2): Rename to...
(reduc_umax_scal_ *2): ...this; extract scalar result.

commit 537c31561933f8054a2289198f35b19cf5c4196e
Author: Alan Lawrence 
Date:   Thu Aug 28 16:49:24 2014 +0100

ARM reduc_[us](min|max)_scal, V_elem not V_ext, rm old non-_scal version.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index d13fe5d..19e1ba0 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1398,104 +1398,109 @@
   [(set_attr "type" "neon_add_q")]
 )
 
-(define_expand "reduc_smin_"
-  [(match_operand:VD 0 "s_register_operand" "")
+(define_expand "reduc_smin_scal_"
+  [(match_operand: 0 "nonimmediate_operand" "")
(match_operand:VD 1 "s_register_operand" "")]
   "TARGET_NEON && (! || flag_unsafe_math_optimizations)"
 {
-  neon_pairwise_reduce (operands[0], operands[1], mode,
+  rtx vec = gen_reg_rtx (mode);
+
+  neon_pairwise_reduce (vec, operands[1], mode,
 			&gen_neon_vpsmin);
+  /* The result is computed into every element of the vector.  */
+  emit_insn (gen_vec_extract (operands[0], vec, const0_rtx));
   DONE;
 })
 
-(define_expand "reduc_smin_"
-  [(match_operand:VQ 0 "s_register_operand" "")
+(define_expand "reduc_smin_scal_"
+  [(match_operand: 0 "nonimmediate_operand" "")
(match_operand:VQ 1 "s_register_operand" "")]
   "TARGET_NEON && (! || flag_unsafe_math_optimizations)
&& !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (mode);
-  rtx res_d = gen_reg_rtx (mode);
 
   emit_insn (gen_quad_halves_smin (step1, operands[1]));
-  emit_insn (gen_reduc_smin_ (res_d, step1));
-  emit_insn (gen_move_lo_quad_ (operands[0], res_d));
+  emit_insn (gen_reduc_smin_scal_ (operands[0], step1));
 
   DONE;
 })
 
-(define_expand "reduc_smax_"
-  [(match_operand:VD 0 "s_register_operand" "")
+(define_expand "reduc_smax_scal_"
+  [(match_operand: 0 "nonimmediate_operand" "")
(match_operand:VD 1 "s_register_operand" "")]
   "TARGET_NEON && (! || flag_unsafe_math_optimizations)"
 {
-  neon_pairwise_reduce (operands[0], operands[1], mode,
+  rtx vec = gen_reg_rtx (mode);
+  neon_pairwise_reduce (vec, operands[1], mode,
 			&gen_neon_vpsmax);
+  /* The result is computed into every element of the vector.  */
+  emit_insn (gen_vec_extract (operands[0], vec, const0_rtx));
   DONE;
 })
 
-(define_expand "reduc_smax_"
-  [(match_operand:VQ 0 "s_register_operand" "")
+(define_expand "reduc_smax_scal_"
+  [(match_operand: 0 "nonimmediate_operand" "")
(match_operand:VQ 1 "s_register_operand" "")]
   "TARGET_NEON && (! || flag_unsafe_math_optimizations)
&& !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (mode);
-  rtx res_d = gen_reg_rtx (mode);
 
   emit_insn (gen_quad_halves_smax (step1, operands[1]));
-  emit_insn (gen_reduc_smax_ (res_d, step1));
-  emit_insn (gen_move_lo_quad_ (operands[0], res_d));
+  emit_insn (gen_reduc_smax_scal_ (operands[0], step1));
 
   DONE;
 })
 
-(define_expand "reduc_umin_"
-  [(match_operand:VDI 0 "s_register_operand" "")
+(define_expand "reduc_umin_scal_"
+  [(match_operand: 0 "nonimmediate_operand" "")
(match_operand:VDI 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
-  neon_pairwise_reduce (operands[0], operands[1], mode,
+  rtx vec = gen_reg_rtx (mode);
+  neon_pairwise_reduce (vec, operands[1], mode,
 			&gen_neon_vpumin);
+  /* The result is computed into every element of the vector.  */
+  emit_insn (gen_vec_extract (operands[0], vec, const0_rtx));
   DONE;
 })
 
-(define_expand "reduc_umin_"
-  [(match_operand:VQI 0 "s_register_operand" "")
+(define_expand "reduc_umin_scal_"
+  [(match_operand: 0 "nonimmediate_operand" "")
(match_operand:VQI 1 "s_register_operand" "")]
   "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (mode);
-  rtx res_d = gen_reg_rtx (mode);
 
   emit_insn (gen_quad_halves_umin (step1, operands[1]));
-  emit_insn (gen_reduc_umin_ (res_d, step1));
-  emit_insn (gen_move_lo_quad_ (operands[0], res_d));
+  emit_insn (gen_reduc_umin_scal_ (operands[0], step1));
 
   DONE;
 })
 
-(define_expand "reduc_umax_"
-  [(match_operand:VDI 0 "s_register_operand" "")
+(define_expand "reduc_umax_scal_"
+  [(match_operand: 0 "nonimmediate_operand" "")
(match_operand:VDI 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
-  neon_pairwise_reduce (operands[0], operands[1], mode,
+  rtx vec = gen_reg_rtx (mode);
+  neon_pairwise_reduce (vec, operands[1], mode,
 			&gen_neon_vpumax);
+  /* The result is computed into every element of the vector.  */
+  emit_insn (gen_vec_extract (operands[0], vec, const0

Re: [PATCH][ARM] revert changes on check_effective_target_arm_*_ok

2014-10-24 Thread Ramana Radhakrishnan
On Fri, Oct 24, 2014 at 12:47 PM, Jiong Wang  wrote:
> we should not add explicit declaration there.
>
> arm_neon.h contains those prototype already. they will be available if the
> compiler configuration is with related builtin predefine, for example
> __ARM_FEATURE_CRYPTO.
>
> so, actually, if there is any warning when compile these test programs, they
> are expected,
> and we rely on these warnings to check whether certain features are
> available.
>
> previously, I only verified on arm-none-linux-gnueabi cross check, so have
> not exposed
> these regressions.

 I had also missed the vaes and vfma turning on by default by this
change. This is OK.

ramana
>
> now verified on arm-none-linux-gnueabihf, regression gone away on arm
> directory.
>
> make check RUNTESTFLAGS="aapcs.exp neon.exp acle.exp simd.exp arm.exp"
>
> ok for trunk?
>
> gcc/testsuite/
>
> * lib/target-supports.exp
> (check_effective_target_arm_crypto_ok_nocache): Remove declaration
> for
> vaeseq_u8.
> (check_effective_target_arm_neon_fp16_ok_nocache): Remove
> declaration for
> vcvt_f16_f32.
> (check_effective_target_arm_neonv2_ok_nocache): Remove declaration
> for
> vfma_f32.


[PATCH 7/11][ARM] Migrate to new reduc_plus_scal_optab

2014-10-24 Thread Alan Lawrence
This migrates ARM from reduc_splus_optab and reduc_uplus_optab to a single 
reduc_plus_scal_optab.


Tested, in combination with next patch:
bootstrap on arm-none-linux-gnueabihf
cross-tested check-gcc on arm-none-eabi.

gcc/ChangeLog:

config/arm/neon.md (reduc_plus_*): Rename to...
(reduc_plus_scal_*): ...this; reduce to temp and extract scalar result.

commit 22e60bd46f2a591f5357a543d76b19ed89f401ed
Author: Alan Lawrence 
Date:   Thu Aug 28 16:12:24 2014 +0100

ARM reduc_plus_scal, V_elem not V_ext, rm old reduc_[us]plus, emit the extract!

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 41cf913..d13fe5d 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1349,33 +1349,47 @@
 
 ;; Reduction operations
 
-(define_expand "reduc_splus_"
-  [(match_operand:VD 0 "s_register_operand" "")
+(define_expand "reduc_plus_scal_"
+  [(match_operand: 0 "nonimmediate_operand" "")
(match_operand:VD 1 "s_register_operand" "")]
   "TARGET_NEON && (! || flag_unsafe_math_optimizations)"
 {
-  neon_pairwise_reduce (operands[0], operands[1], mode,
+  rtx vec = gen_reg_rtx (mode);
+  neon_pairwise_reduce (vec, operands[1], mode,
 			&gen_neon_vpadd_internal);
+  /* The same result is actually computed into every element.  */
+  emit_insn (gen_vec_extract (operands[0], vec, const0_rtx));
   DONE;
 })
 
-(define_expand "reduc_splus_"
-  [(match_operand:VQ 0 "s_register_operand" "")
+(define_expand "reduc_plus_scal_"
+  [(match_operand: 0 "nonimmediate_operand" "")
(match_operand:VQ 1 "s_register_operand" "")]
   "TARGET_NEON && (! || flag_unsafe_math_optimizations)
&& !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (mode);
-  rtx res_d = gen_reg_rtx (mode);
 
   emit_insn (gen_quad_halves_plus (step1, operands[1]));
-  emit_insn (gen_reduc_splus_ (res_d, step1));
-  emit_insn (gen_move_lo_quad_ (operands[0], res_d));
+  emit_insn (gen_reduc_plus_scal_ (operands[0], step1));
+
+  DONE;
+})
+
+(define_expand "reduc_plus_scal_v2di"
+  [(match_operand:DI 0 "nonimmediate_operand" "=w")
+   (match_operand:V2DI 1 "s_register_operand" "")]
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
+{
+  rtx vec = gen_reg_rtx (V2DImode);
+
+  emit_insn (gen_arm_reduc_plus_internal_v2di (vec, operands[1]));
+  emit_insn (gen_vec_extractv2di (operands[0], vec, const0_rtx));
 
   DONE;
 })
 
-(define_insn "reduc_splus_v2di"
+(define_insn "arm_reduc_plus_internal_v2di"
   [(set (match_operand:V2DI 0 "s_register_operand" "=w")
 	(unspec:V2DI [(match_operand:V2DI 1 "s_register_operand" "w")]
 		 UNSPEC_VPADD))]
@@ -1384,17 +1398,6 @@
   [(set_attr "type" "neon_add_q")]
 )
 
-;; NEON does not distinguish between signed and unsigned addition except on
-;; widening operations.
-(define_expand "reduc_uplus_"
-  [(match_operand:VDQI 0 "s_register_operand" "")
-   (match_operand:VDQI 1 "s_register_operand" "")]
-  "TARGET_NEON && ( || !BYTES_BIG_ENDIAN)"
-{
-  emit_insn (gen_reduc_splus_ (operands[0], operands[1]));
-  DONE;
-})
-
 (define_expand "reduc_smin_"
   [(match_operand:VD 0 "s_register_operand" "")
(match_operand:VD 1 "s_register_operand" "")]

[PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral

2014-10-24 Thread Alan Lawrence
This is the first half of my previous patch series 
(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01456.html), that is the part 
making the REDUC_..._EXPR tree codes endian-neutral, and adding a new 
reduce-to-scalar optab in place of the endianness-dependent 
reduc_[us](plus|min|max)_optab.
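
The difference between the two optab styles can be modelled in plain C. This is a sketch, not GCC code: the `demo_*` names are hypothetical, the vector is an ordinary array, and the old-style convention (result in element 0 if little-endian, element N-1 if big-endian) is the one described in the expr.c comment later in this series.

```c
#include <stddef.h>

#define DEMO_N 4

/* Models the old reduc_[us]plus style: the result is a whole vector and the
   sum lives in element 0 (little-endian) or element N-1 (big-endian).  The
   other elements are zeroed here purely for the sketch.  */
void
demo_reduc_plus_vec (const int in[DEMO_N], int out[DEMO_N], int big_endian)
{
  int sum = 0;
  for (size_t i = 0; i < DEMO_N; i++)
    {
      sum += in[i];
      out[i] = 0;
    }
  out[big_endian ? DEMO_N - 1 : 0] = sum;
}

/* What a caller of the old style has to do: pick the element according to
   endianness.  This is the kind of "if (BYTES_BIG_ENDIAN)" the series aims
   to remove.  */
int
demo_extract_from_vec (const int in[DEMO_N], int big_endian)
{
  int tmp[DEMO_N];
  demo_reduc_plus_vec (in, tmp, big_endian);
  return tmp[big_endian ? DEMO_N - 1 : 0];
}

/* Models the new reduce-to-scalar style: the result is the scalar itself,
   so no element index and no endianness check is involved.  */
int
demo_reduc_plus_scal (const int in[DEMO_N])
{
  int sum = 0;
  for (size_t i = 0; i < DEMO_N; i++)
    sum += in[i];
  return sum;
}
```

In this model both styles give the same value; only the old one makes the caller care which element holds it.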


I'm leaving the vec_shr portion out of this patch series, as the link between 
the two halves is only the end goal of removing an "if (BYTES_BIG_ENDIAN)" from 
tree-vect-loop.c; this series removes that from one code path so can stand alone.


Patches 1-6 are as previously posted apart from rebasing and removing the 
old/poisoned AArch64 patterns as per maintainer's request. Patches 1, 2, 4, 5 
and 6 have already been approved; patch 3 was discussed somewhat but I think we 
decided against most of the ideas raised, I have added comment to 
scalar_reduc_to_vector. I now reread Richie's "Otherwise the patch looks good to 
me" and wonder if I should have taken that as an approval but I didn't read it 
that way at the time...???


Patches 7-11 migrate ARM, x86, IA64 (I think), and mostly PowerPC, to 
the new reduc_(plus|[us](min|max))_scal_optab. I have not managed to work out 
how to do the same for MIPS (specifically what I need to add to 
mips_expand_vec_reduc), and have had no response from the maintainers, so am 
leaving that for now. Also I haven't migrated (or worked out how to target) 
rs6000/paired.md, help would be most welcome.



The suggestion was then to "complete" the migration, by removing the old optabs. 
There are a few options here and I'll follow up with appropriate patches 
according to feedback received. I see options:


(1) just delete the old optabs (and the migration code). This would 
performance-regress the MIPS backend, but should not break it, although one 
should really do *something* with the then-unused reduc_[us](plus|min|max)_optab 
in config/mips/loongson.md.


(2) also renaming reduc_..._scal_optab back to reduc_..._optab; would break the 
MIPS backend if something were not done with its existing patterns.


(2a) Alternatively I could just use a different new name, e.g. reduce_, 
reduct_, vec_reduc_..., anything that's less of a mouthful than 
reduc_..._scal. Whilst being only-very-slightly-different from the current 
reduc_... might be confusing, so might changing the meaning of the optab, and 
its signature, with the existing name, so am open to suggestions?


Cheers, Alan

commit 9819291c17610dcdcca19a3d9ea3a4260df0577e
Author: Alan Lawrence 
Date:   Thu Aug 21 13:05:43 2014 +0100

Temporarily remove gimple_fold

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 3dba1b2..a49da89 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1188,6 +1188,9 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args,
   return NULL_TREE;
 }
 
+/* Handling of reduction operations temporarily removed so as to decouple
+   changes to tree codes from AArch64 NEON Intrinsics.  */
+#if 0
 bool
 aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 {
@@ -1259,6 +1262,7 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 
   return changed;
 }
+#endif
 
 void
 aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index db5ff59..27d82f3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10015,8 +10015,8 @@ aarch64_asan_shadow_offset (void)
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required
 
-#undef TARGET_GIMPLE_FOLD_BUILTIN
-#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
+//#undef TARGET_GIMPLE_FOLD_BUILTIN
+//#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
 
 #undef TARGET_GIMPLIFY_VA_ARG_EXPR
 #define TARGET_GIMPLIFY_VA_ARG_EXPR aarch64_gimplify_va_arg_expr

commit bf6d5d32c552ce1c6ccd890f501db4f39291088f
Author: Alan Lawrence 
Date:   Tue Jul 29 11:46:01 2014 +0100

Make tree codes produce scalar, with NOP_EXPRs. (tree-vect-loop.c mess)

diff --git a/gcc/expr.c b/gcc/expr.c
index a6233f3..c792028 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9044,7 +9044,17 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
   {
 op0 = expand_normal (treeop0);
 this_optab = optab_for_tree_code (code, type, optab_default);
-temp = expand_unop (mode, this_optab, op0, target, unsignedp);
+enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0));
+temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp);
+gcc_assert (temp);
+/* The tree code produces a scalar result, but (somewhat by convention)
+   the optab produces a vector with the result in element 0 if
+   little-endian, or element N-1 if big-endian.  So pull the scalar
+   result out of that eleme

Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-10-24 Thread Alan Lawrence



Rainer Orth wrote:

However, as a quick first step, does adding the ilp32 / lp64 (and keeping
the architectures list for now) solve the immediate problem? Patch
attached, OK for trunk?


No, as I said this is wrong for biarch targets like sparc and i386.


When you say no this does not solve the immediate problem, are you saying that 
you are (still) seeing test failures with the require-effective-target patch 
applied? Or is the issue that this would not execute the tests as widely as 
might be possible? In principle I'm quite happy to relax the target patterns, 
although have been having issues with sparc (below)...


Re. "what the architectures have in common" is largely that these are the 
primary/secondary archs on which I've checked the test passes! I can now add 
mips and microblaze to this list, however I'm nervous of dropping the target 
entirely given the very large number of target architectures gcc supports; and 
e.g. IA64 (in ILP32 mode) generates an ashiftrt:DI by 31 places, not 
ashiftrt:SI, which does not match the simplification criteria in combine.c.


This should be something like 


  { target aarch64*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* }

E.g. sparc-sun-solaris2.11 with -m64 is lp64, but would be excluded by
your target list.  Keep the list sorted alphabetically and best add an
explanation so others know what those targets have in common.


So I've built a stage-1 compiler with --target=sparc-sun-solaris2.11, and I find

  * without -m64, my "dg-require-effective-target ilp32" causes the 32-bit test 
to execute, and pass; "dg-require-effective-target lp64" prevents execution of 
the 64-bit test (which would fail) - so all as expected and desired.


  * with -lp64, behaviour is as previous (this is probably expected)

  * with -m64, "dg-require-effective-target ilp32" still causes the test to 
execute (but it fails, as the RTL now has an ashiftrt:DI by 31 places, which 
doesn't meet the simplification criteria in combine.c - this is pretty much as 
expected). "dg-require-effective-target lp64" stops the 64-bit test from 
executing however (despite that it would now pass).
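
As a way to cross-check which data model a given set of flags actually selects, a small probe like the following can be compiled with and without -m64. It is a sketch that assumes the ilp32 and lp64 effective-target keywords are decided by the sizes of int, long and pointers; `demo_data_model` is a hypothetical helper, not part of the testsuite.

```c
/* Classify the data model the compiler targets with the current flags.
   Returns 32 for ILP32 (int, long and pointers all 4 bytes), 64 for LP64
   (int 4 bytes, long and pointers 8 bytes), and 0 for anything else.  */
int
demo_data_model (void)
{
  if (sizeof (int) == 4 && sizeof (long) == 4 && sizeof (void *) == 4)
    return 32;
  if (sizeof (int) == 4 && sizeof (long) == 8 && sizeof (void *) == 8)
    return 64;
  return 0;
}
```

If the probe reports 64 under -m64 while the ilp32 test still runs, that would suggest the flag is not reaching the effective-target check rather than a problem in the test itself.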


Can you clarify what I should be doing on sparc, therefore?

Thanks for your help!

Alan



Re: [PATCH][ARM] gnu11 cleanup for aapcs testcases

2014-10-24 Thread Marek Polacek
On Fri, Oct 24, 2014 at 12:48:24PM +0100, Jiong Wang wrote:
> a further cleanup under aapcs sub-directory.
> 
> ok for trunk?
> 
> gcc/testsuite/
>   * gcc.target/arm/aapcs/abitest.h: Declare memcmp.

> diff --git a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h 
> b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
> index 06a92c3..7bce58b 100644
> --- a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
> +++ b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
> @@ -49,6 +49,8 @@
>  
>  
>  extern void abort (void);
> +typedef unsigned int size_t;
> +extern int memcmp (const void *s1, const void *s2, size_t n);

You can use __SIZE_TYPE__ and then you don't need the typedef.
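
A minimal sketch of that suggestion, assuming a GCC-compatible compiler that predefines __SIZE_TYPE__ (`demo_same_bytes` is a hypothetical helper added only to exercise the declaration):

```c
/* Declare memcmp without introducing a size_t typedef.  __SIZE_TYPE__ is
   predefined by GCC as the underlying type of size_t for the current
   target, so the declaration stays correct whatever that type is.  */
extern int memcmp (const void *s1, const void *s2, __SIZE_TYPE__ n);

/* Returns nonzero if the first N bytes of A and B are equal.  */
int
demo_same_bytes (const void *a, const void *b, __SIZE_TYPE__ n)
{
  return memcmp (a, b, n) == 0;
}
```

This also avoids hard-coding `unsigned int`, which would not match size_t on targets where size_t is wider.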

Marek


[PATCH][ARM] gnu11 cleanup for aapcs testcases

2014-10-24 Thread Jiong Wang

a further cleanup under aapcs sub-directory.

ok for trunk?

gcc/testsuite/
  * gcc.target/arm/aapcs/abitest.h: Declare memcmp.
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
index 06a92c3..7bce58b 100644
--- a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
+++ b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
@@ -49,6 +49,8 @@
 
 
 extern void abort (void);
+typedef unsigned int size_t;
+extern int memcmp (const void *s1, const void *s2, size_t n);
 
 __attribute__((naked))  void dumpregs () __asm("myfunc");
 __attribute__((naked))  void dumpregs ()

[PATCH][ARM] revert changes on check_effective_target_arm_*_ok

2014-10-24 Thread Jiong Wang

We should not add explicit declarations there.

arm_neon.h already contains those prototypes. They are available when the
compiler configuration provides the related builtin predefine, for example 
__ARM_FEATURE_CRYPTO.

So any warning when compiling these test programs is expected, and we rely
on these warnings to check whether certain features are available.
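
The predefine mentioned above can be inspected directly. The following is a sketch only; `demo_have_arm_crypto` is a hypothetical helper and not part of target-supports.exp.

```c
/* Returns 1 if the compiler predefines __ARM_FEATURE_CRYPTO for the current
   target and flags, 0 otherwise.  */
int
demo_have_arm_crypto (void)
{
#ifdef __ARM_FEATURE_CRYPTO
  return 1;
#else
  return 0;
#endif
}
```

When this returns 0, a probe that calls vaeseq_u8 without an explicit extern declaration should draw a diagnostic, which is the signal the effective-target check depends on.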

Previously I only verified with an arm-none-linux-gnueabi cross check, so
these regressions were not exposed.

Now verified on arm-none-linux-gnueabihf; the regressions in the arm
directory have gone away.

make check RUNTESTFLAGS="aapcs.exp neon.exp acle.exp simd.exp arm.exp"

ok for trunk?

gcc/testsuite/

* lib/target-supports.exp
(check_effective_target_arm_crypto_ok_nocache): Remove declaration for
vaeseq_u8.
(check_effective_target_arm_neon_fp16_ok_nocache): Remove declaration 
for
vcvt_f16_f32.
(check_effective_target_arm_neonv2_ok_nocache): Remove declaration for
vfma_f32.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 91460c2..4398345 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2404,7 +2404,6 @@ proc check_effective_target_arm_crypto_ok_nocache { } {
 	foreach flags {"" "-mfloat-abi=softfp" "-mfpu=crypto-neon-fp-armv8" "-mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp"} {
 	if { [check_no_compiler_messages_nocache arm_crypto_ok object {
 		#include "arm_neon.h"
-		extern uint8x16_t vaeseq_u8 (uint8x16_t, uint8x16_t);
 		uint8x16_t
 		foo (uint8x16_t a, uint8x16_t b)
 		{
@@ -2549,7 +2548,6 @@ proc check_effective_target_arm_neon_fp16_ok_nocache { } {
 	   "-mfpu=neon-fp16 -mfloat-abi=softfp"} {
 	if { [check_no_compiler_messages_nocache arm_neon_fp_16_ok object {
 		#include "arm_neon.h"
-		extern float16x4_t vcvt_f16_f32 (float32x4_t);
 		float16x4_t
 		foo (float32x4_t arg)
 		{
@@ -2625,7 +2623,6 @@ proc check_effective_target_arm_neonv2_ok_nocache { } {
 	foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-vfpv4" "-mfpu=neon-vfpv4 -mfloat-abi=softfp"} {
 	if { [check_no_compiler_messages_nocache arm_neonv2_ok object {
 		#include "arm_neon.h"
-		extern float32x2_t vfma_f32 (float32x2_t, float32x2_t, float32x2_t);
 		float32x2_t 
 		foo (float32x2_t a, float32x2_t b, float32x2_t c)
 {

Re: [PATCH][3,4,5/n] Merge from match-and-simplify, fold, fold_stmt and first patterns

2014-10-24 Thread Richard Biener
On Fri, 24 Oct 2014, Marc Glisse wrote:

> 
> > + /* Same applies to modulo operations, but fold is inconsistent here
> > +and simplifies 0 % x to 0, only preserving literal 0 % 0.  */
> > + (for op (ceil_mod floor_mod round_mod trunc_mod)
> > +  /* 0 % X is always zero.  */
> > +  (simplify
> > +   (trunc_mod integer_zerop@0 @1)
> > +   /* But not for 0 % 0 so that we can get the proper warnings and errors.  */
> > +   (if (!integer_zerop (@1))
> > +@0))
> > +  /* X % 1 is always zero.  */
> > +  (simplify
> > +   (trunc_mod @0 integer_onep)
> > +   { build_zero_cst (type); }))
> 
> "op" is unused, you probably meant to replace trunc_mod with it.

Oh, indeed.  I'll fix that up next week (heh - sth for a first warning
from genmatch!).

Thanks,
Richard.
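
As a side note on the exchange above: the two identities in the quoted patterns (0 % X is 0 for nonzero X, and X % 1 is 0) hold for every modulo flavour named in the `for` list, which is why substituting `op` for trunc_mod looks safe. Below is a minimal C sketch that checks this on small values. The helper functions are illustrative re-implementations on plain ints written for this note, not GCC's tree codes, and round_mod is assumed here to round ties away from zero.

```c
#include <assert.h>

/* Truncating modulo: C's built-in %, quotient rounded toward zero.  */
int trunc_mod(int a, int b) { return a % b; }

/* Flooring modulo: quotient rounded toward negative infinity.  */
int floor_mod(int a, int b) {
    int r = a % b;
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;
    return r;
}

/* Ceiling modulo: quotient rounded toward positive infinity.  */
int ceil_mod(int a, int b) {
    int r = a % b;
    if (r != 0 && ((r < 0) == (b < 0)))
        r -= b;
    return r;
}

/* Rounding modulo: quotient rounded to nearest, ties away from zero
   (an assumption made for this sketch).  */
int round_mod(int a, int b) {
    long long r = a % b;
    long long abs_r = r < 0 ? -r : r;
    long long abs_b = b < 0 ? -(long long)b : b;
    if (2 * abs_r >= abs_b)
        r += ((r < 0) == (b < 0)) ? -(long long)b : b;
    return (int)r;
}

/* Returns 1 if 0 % x == 0 (x != 0) and x % 1 == 0 for all four
   flavours over a small range of x, else 0.  */
int mod_identities_hold(void) {
    int (*ops[4])(int, int) = { trunc_mod, floor_mod, ceil_mod, round_mod };
    for (int i = 0; i < 4; i++)
        for (int x = -20; x <= 20; x++) {
            if (x != 0 && ops[i](0, x) != 0)
                return 0;
            if (ops[i](x, 1) != 0)
                return 0;
        }
    return 1;
}
```

The flavours do differ on other inputs (for example floor_mod(-7, 2) is 1 while trunc_mod(-7, 2) is -1), so the check is not vacuous; only the zero-remainder cases coincide.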


Re: [PATCH][3,4,5/n] Merge from match-and-simplify, fold, fold_stmt and first patterns

2014-10-24 Thread Marc Glisse



+ /* Same applies to modulo operations, but fold is inconsistent here
+and simplifies 0 % x to 0, only preserving literal 0 % 0.  */
+ (for op (ceil_mod floor_mod round_mod trunc_mod)
+  /* 0 % X is always zero.  */
+  (simplify
+   (trunc_mod integer_zerop@0 @1)
+   /* But not for 0 % 0 so that we can get the proper warnings and errors.  */
+   (if (!integer_zerop (@1))
+@0))
+  /* X % 1 is always zero.  */
+  (simplify
+   (trunc_mod @0 integer_onep)
+   { build_zero_cst (type); }))


"op" is unused, you probably meant to replace trunc_mod with it.

--
Marc Glisse


Re: [PATCH] Fix genmatch linking

2014-10-24 Thread Richard Biener
On Fri, 24 Oct 2014, Rainer Orth wrote:

> Richard Biener  writes:
> 
> > Dominique reported that this fails for system libiconv but built libintl.
> >
> > Which might be fixed by the following.  Does that still work for you?
> 
> It does: an i386-pc-solaris2.10 bootstrap has finished by now and make
> check is running.

Dominique reported an ok as well.  Bootstrapped myself on
x86_64-unknown-linux-gnu and committed as r216632.

Richard.

2014-10-24  Richard Biener  

* Makefile.in (BUILD_CPPLIB): Move $(LIBINTL) $(LIBICONV)
to genmatch BUILD_LIBS instead.

Index: gcc/Makefile.in
===
--- gcc/Makefile.in (revision 216626)
+++ gcc/Makefile.in (working copy)
@@ -981,15 +981,6 @@ else
 LIBIBERTY = ../libiberty/libiberty.a
 BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a
 endif
-# For stage1 and when cross-compiling use the build libcpp which is
-# built with NLS disabled.  For stage2+ use the host library and
-# its dependencies.
-ifeq ($(build_objdir),$(build_libobjdir))
-BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
-else
-BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV)
-build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP)
-endif
 
 # Dependencies on the intl and portability libraries.
 LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \
@@ -2529,6 +2520,17 @@ genprog = $(genprogerr) check checksum c
 # These programs need libs over and above what they get from the above list.
 build/genautomata$(build_exeext) : BUILD_LIBS += -lm
 
+# For stage1 and when cross-compiling use the build libcpp which is
+# built with NLS disabled.  For stage2+ use the host library and
+# its dependencies.
+ifeq ($(build_objdir),$(build_libobjdir))
+BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
+else
+BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY)
+build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP)
+build/genmatch$(build_exeext): BUILD_LIBS += $(LIBINTL) $(LIBICONV)
+endif
+
 build/genmatch$(build_exeext) : $(BUILD_CPPLIB) \
   $(BUILD_ERRORS) build/vec.o build/hash-table.o
 


Re: [PATCH][AArch64] LINK_SPEC changes for Cortex-A53 erratum 835769 workaround

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 12:04:52PM +0100, Marcus Shawcroft wrote:
> On 22 October 2014 15:20, Kyrill Tkachov  wrote:
> > Hi all,
> >
> > This patch contains the LINK_SPEC changes required to pass on the linker
> > option --fix-cortex-a53-835769 when compiling with -mfix-cortex-a53-835769
> > (or by default when configured with --enable-fix-cortex-a53-835769).
> >
> > This requires a binutils installation with the patch posted at
> > https://sourceware.org/ml/binutils/2014-10/msg00198.html applied.
> >
> >
> > Bootstrapped and tested on aarch64-none-linux-gnu and built various
> > benchmarks.
> > This patch applies to 4.9 (4.8 version will be posted separately) and has
> > been tested there as well.
> >
> > Ok for trunk and 4.9?
> 
> The corresponding binutils changes are  committed on binutils trunk,
> 2.25 and 2.24.
> 
> The trunk patch is OK.
> 
> Given that Jakub is in the process of preparing a  4.9.2 I'd like an
> explicit OK before we commit on 4.9. Jakub?

Is that a regression on the 4.9 branch?  If not, I'd prefer if it could wait
for 4.9.3.

Jakub


[match-and-simplify] Merge from trunk

2014-10-24 Thread Richard Biener

2014-10-24  Richard Biener  

Merge from trunk r216543 through r216631.

Brings back second merge piece.



Re: [PATCH][AArch64] LINK_SPEC changes for Cortex-A53 erratum 835769 workaround

2014-10-24 Thread Marcus Shawcroft
On 22 October 2014 15:20, Kyrill Tkachov  wrote:
> Hi all,
>
> This patch contains the LINK_SPEC changes required to pass on the linker
> option --fix-cortex-a53-835769 when compiling with -mfix-cortex-a53-835769
> (or by default when configured with --enable-fix-cortex-a53-835769).
>
> This requires a binutils installation with the patch posted at
> https://sourceware.org/ml/binutils/2014-10/msg00198.html applied.
>
>
> Bootstrapped and tested on aarch64-none-linux-gnu and built various
> benchmarks.
> This patch applies to 4.9 (4.8 version will be posted separately) and has
> been tested there as well.
>
> Ok for trunk and 4.9?

The corresponding binutils changes are  committed on binutils trunk,
2.25 and 2.24.

The trunk patch is OK.

Given that Jakub is in the process of preparing a  4.9.2 I'd like an
explicit OK before we commit on 4.9. Jakub?

Cheers
/Marcus
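
For readers unfamiliar with how a LINK_SPEC forwards a driver option to the linker, here is a rough C sketch of the mechanism the quoted message describes. The patch itself is not shown in this thread, so the macro and function names below (TOY_...) are hypothetical. Only the spec-string syntax is standard: `%{S:X}` substitutes X when option -S was given, and `%{!S:X}` substitutes X when -S was not given.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro names for illustration; not taken from the patch.  */
#ifdef TOY_FIX_A53_835769_DEFAULT
/* Workaround enabled at configure time: pass the linker option unless
   the user explicitly turns it off.  */
#define TOY_A53_835769_SPEC \
  " %{!mno-fix-cortex-a53-835769:--fix-cortex-a53-835769}"
#else
/* Default case: pass the linker option only when the user asks for it.  */
#define TOY_A53_835769_SPEC \
  " %{mfix-cortex-a53-835769:--fix-cortex-a53-835769}"
#endif

/* Returns the spec fragment that would be appended to the link spec.  */
const char *toy_a53_835769_spec(void) {
    return TOY_A53_835769_SPEC;
}

/* Returns 1 if the spec fragment mentions the given text, else 0.  */
int toy_spec_mentions(const char *text) {
    return strstr(toy_a53_835769_spec(), text) != NULL;
}
```

Building the sketch with -DTOY_FIX_A53_835769_DEFAULT selects the first form, mirroring what --enable-fix-cortex-a53-835769 is described as doing at configure time.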


Re: [PATCH] Fix genmatch linking

2014-10-24 Thread Rainer Orth
Richard Biener  writes:

> Dominique reported that this fails for system libiconv but built libintl.
>
> Which might be fixed by the following.  Does that still work for you?

It does: an i386-pc-solaris2.10 bootstrap has finished by now and make
check is running.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH][AArch64][4.8] Add --enable-fix-cortex-a53-835769 configure option

2014-10-24 Thread Marcus Shawcroft
On 17 October 2014 16:55, Kyrill Tkachov  wrote:
> Hi all,
>
> This is the 4.8 backport of the configure option
> --enable-fix-cortex-a53-835769 to enable the workaround
> for the Cortex-A53 erratum 835769 by default. The patch is very similar to
> the trunk version, just some
> differences in the placement of the relevant sections.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for the 4.8 branch together with the -mfix-cortex-a53-835769 option
> backport?

OK /Marcus


[PATCH][3,4,5/n] Merge from match-and-simplify, fold, fold_stmt and first patterns

2014-10-24 Thread Richard Biener

This combines the already posted 3/n (first simple patterns),
4/n (hook into fold-const.c) and not yet posted 5/n (hook into
fold_stmt).  Compared with the first posting this also contains
recent improvements to the generator from the branch regarding
TREE_SIDE_EFFECTS and NON_LVALUE_EXPR handling.

The hook into fold_stmt leaves all existing callers doing
what fold_stmt does currently (the match-and-simplify machinery
will not follow SSA edges).  Following SSA edges can be enabled,
though, via an overload taking a valueization hook as
argument (this is how tree-ssa-forwprop.c will exercise it).
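
The hook arrangement described above can be pictured with a toy model in C. Everything here is hypothetical and simplified: the toy_* types and functions stand in for SSA names and fold_stmt and are not GCC's data structures or signatures. C has no overloading, so the two entry points get different names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an SSA name: either a known constant or a copy of
   another name (hypothetical, not GCC's tree).  */
typedef struct toy_name {
    int is_constant;              /* nonzero if 'value' is known */
    int value;
    const struct toy_name *def;   /* name this one copies, or NULL */
} toy_name;

/* Valueization hook: maps a name to the name to look at next,
   or NULL to stop.  */
typedef const toy_name *(*toy_valueize_fn)(const toy_name *);

/* Hook that never looks through a definition.  */
const toy_name *toy_no_follow(const toy_name *n) {
    (void)n;
    return NULL;
}

/* Hook that follows the copy edge to the defining name.  */
const toy_name *toy_follow_copy(const toy_name *n) {
    return n->def;
}

/* Try to resolve N to a constant, consulting VALUEIZE whenever N is
   not itself constant.  Returns 1 and stores to *OUT on success.  */
int toy_fold_with_hook(const toy_name *n, toy_valueize_fn valueize, int *out) {
    while (n != NULL && !n->is_constant)
        n = valueize(n);
    if (n == NULL)
        return 0;
    *out = n->value;
    return 1;
}

/* Entry point without a hook: behaves as before, never following edges.  */
int toy_fold(const toy_name *n, int *out) {
    return toy_fold_with_hook(n, toy_no_follow, out);
}
```

In this sketch existing callers keep using toy_fold and see no change in behavior, while a pass that wants to look through definitions supplies its own hook, which is the division of labor the message describes.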

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2014-10-24  Richard Biener  

* genmatch.c (expr::gen_transform): Use fold_buildN_loc
and build_call_expr_loc.
(dt_simplify::gen): Drop non_lvalue for GIMPLE, use
non_lvalue_loc to build it for GENERIC.
(decision_tree::gen_generic): Add location argument to
generic_simplify prototype.
(capture_info): New class.
(capture_info::capture_info): New constructor.
(capture_info::walk_match): New method.
(capture_info::walk_result): New method.
(capture_info::walk_c_expr): New method.
(dt_simplify::gen): Handle preserving side-effects for
GENERIC code generation.
(decision_tree::gen_generic): Do not reject operands
with TREE_SIDE_EFFECTS.
* generic-match.h: New file.
* generic-match-head.c: Include generic-match.h, not gimple-match.h.
* match.pd: Add some constant folding patterns from fold-const.c.
* fold-const.c: Include generic-match.h.
(fold_unary_loc): Dispatch to generic_simplify.
(fold_ternary_loc): Likewise.
(fold_binary_loc): Likewise.  Remove patterns now implemented
by generic_simplify.
* gimple-fold.c (replace_stmt_with_simplification): New function.
(fold_stmt_1): Add valueize parameter, dispatch to gimple_simplify.
(no_follow_ssa_edges): New function.
(fold_stmt): New overload with valueization hook.  Use
no_follow_ssa_edges for the overload without hook.
(fold_stmt_inplace): Likewise.
* gimple-fold.h (no_follow_ssa_edges): Declare.

Index: gcc/generic-match.h
===
*** /dev/null   1970-01-01 00:00:00.0 +
--- gcc/generic-match.h 2014-10-23 15:45:28.322836040 +0200
***
*** 0 
--- 1,33 
+ /* Generic simplify definitions.
+ 
+Copyright (C) 2011-2014 Free Software Foundation, Inc.
+Contributed by Richard Guenther 
+ 
+ This file is part of GCC.
+ 
+ GCC is free software; you can redistribute it and/or modify it under
+ the terms of the GNU General Public License as published by the Free
+ Software Foundation; either version 3, or (at your option) any later
+ version.
+ 
+ GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+ 
+ You should have received a copy of the GNU General Public License
+ along with GCC; see the file COPYING3.  If not see
+ <http://www.gnu.org/licenses/>.  */
+ 
+ #ifndef GCC_GENERIC_MATCH_H
+ #define GCC_GENERIC_MATCH_H
+ 
+ /* Note the following functions are supposed to be only used from
+fold_unary_loc, fold_binary_loc and fold_ternary_loc respectively.
+They are not considered a public API.  */
+ 
+ tree generic_simplify (location_t, enum tree_code, tree, tree);
+ tree generic_simplify (location_t, enum tree_code, tree, tree, tree);
+ tree generic_simplify (location_t, enum tree_code, tree, tree, tree, tree);
+ 
+ #endif  /* GCC_GENERIC_MATCH_H */
Index: gcc/generic-match-head.c
===
*** gcc/generic-match-head.c.orig   2014-10-23 15:45:26.935836135 +0200
--- gcc/generic-match-head.c2014-10-23 15:45:28.322836040 +0200
*** along with GCC; see the file COPYING3.
*** 43,48 
  #include "tree-phinodes.h"
  #include "ssa-iterators.h"
  #include "dumpfile.h"
! #include "gimple-match.h"
  
  
--- 43,48 
  #include "tree-phinodes.h"
  #include "ssa-iterators.h"
  #include "dumpfile.h"
! #include "generic-match.h"
  
  
Index: gcc/fold-const.c
===
*** gcc/fold-const.c.orig   2014-10-23 15:44:38.601839463 +0200
--- gcc/fold-const.c2014-10-23 15:45:51.976834411 +0200
*** along with GCC; see the file COPYING3.
*** 70,75 
--- 70,76 
  #include "hash-table.h"  /* Required for ENABLE_FOLD_CHECKING.  */
  #include "builtins.h"
  #include "cgraph.h"
+ #include "generic-match.h"
  
  /* Nonzero if we are folding constants inside an initializer; zero
 otherwise.  */
*** fold_unary_loc (location_t loc, enum tre
*** 7564,7569 **

Re: [PATCH][AArch64][4.8] Backport Cortex-A53 erratum 835769 workaround

2014-10-24 Thread Marcus Shawcroft
On 17 October 2014 16:55, Kyrill Tkachov  wrote:
> Hi all,
>
> This is the 4.8 backport of the Cortex-A53 erratum 835769 workaround.
> 4.8 doesn't have rtx_insns and the type attributes are different.
> Other than that there's not much different from the trunk version.
>
> Bootstrapped and tested on aarch64-none-linux-gnu with and without the
> workaround enabled.
> Compiled various large benchmarks with it.
>
> Ok for the 4.8 branch?

OK /Marcus

