date:20130911

RE: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction

2013-09-11 Thread bin.cheng


On Tue, Sep 10, 2013 at 9:30 PM, Bill Schmidt  
wrote:
>
>
> On Tue, 2013-09-10 at 15:41 +0800, bin.cheng wrote:
>> On Mon, Sep 9, 2013 at 11:35 PM, Bill Schmidt  
>> wrote:
>> >
>> >> > I rely on size_binop to convert T2 into sizetype, because T2' may be in 
>> >> > other kind of type.  Otherwise there will be ssa_verify error later.
>> >>
>> >> OK, I see now.  I had thought this was handled by fold_build2, but
>> >> apparently not.  I guess all T2's formerly handled were already sizetype
>> >> as expected.  Thanks for the explanation!
>> >
>> > So, wouldn't it suffice to change t2 to fold_convert (sizetype, t2) in
>> > the argument list to fold_build2?  It's picking nits, but that would be
>> > slightly more efficient.
>>
>> Hi Bill,
>>
>> This is the 2nd version of patch with your comments incorporated.
>> Bootstrap and re-test on x86.  Re-test on ARM ongoing.  Is it ok if tests 
>> pass?
>
> Looks good to me!  Thanks, Bin.
>

Sorry, I have to hold this patch back since it causes several test failures on ARM.
I will investigate and get back ASAP.

Thanks.
bin

Re: [patch][PR/42955] Don't install $(target)/bin/gcc, gfortran, etc.

2013-09-11 Thread Steve Kargl

On Wed, Sep 11, 2013 at 09:54:57PM -0700, Brooks Moses wrote:
> Ping^3?
> 

There is little to no Fortran-specific content that would
require a Fortran review.  If no one steps up, just
commit the change.

-- 
Steve

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-11 Thread Wei Mi

Thanks! Your method of adjusting 'last' is more concise. I tried it and it
works for small testcases. Bootstrap and regression tests are OK. More
performance testing is in progress.

I agree with you that explicit handling in sched-deps.c for this
feature does not look good. So I moved it to sched_init (instead of
ix86_sched_init_global, because ix86_sched_init_global is used to
install scheduling hooks), which makes it possible for other
architectures to use it.
I also need the two hooks: one is used as the gate for
macro-fusion, controlled by -mtune-ctrl=fuse_cmp_and_branch on x86, and
the other is used to check which kinds of cmp and branch pairs
macro-fusion supports on the target platform. But I am not sure whether it
is proper to put those two hooks under the TARGET_SCHED hook vector.
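
To make the split between the two hooks concrete, here is a minimal standalone sketch in plain C. It is only a toy model, not GCC code: the toy_* names, the struct, and the enum are made up for illustration, and the pair rules only mirror the Nehalem comments in the patch below (cmp/test fuse, the MEM-IMM form does not, add/sub do not).

```c
#include <stdbool.h>

/* Toy stand-ins for real insns; none of these names exist in GCC.  */
enum toy_kind { TOY_CMP, TOY_TEST, TOY_ADD, TOY_SUB };

struct toy_insn
{
  enum toy_kind kind;
  bool has_mem_operand;   /* one operand is a memory reference */
  bool has_imm_operand;   /* the other operand is an immediate */
};

/* Stands in for the tuning flag that enables the feature.  */
bool toy_fusion_enabled = true;

/* Gate hook: is macro-fusion enabled at all?  */
bool
toy_macro_fusion_p (void)
{
  return toy_fusion_enabled;
}

/* Pair hook, modelled on the Nehalem rules in the patch comments:
   only cmp/test can fuse, and not in the MEM-IMM form.  */
bool
toy_macro_fusion_pair_p (const struct toy_insn *condgen)
{
  if (condgen->kind != TOY_CMP && condgen->kind != TOY_TEST)
    return false;
  if (condgen->has_mem_operand && condgen->has_imm_operand)
    return false;
  return true;
}

/* Scheduler-side use: consult the gate first, then the pair check.  */
bool
toy_should_group (const struct toy_insn *condgen)
{
  return toy_macro_fusion_p () && toy_macro_fusion_pair_p (condgen);
}
```

With the gate on, a register-register cmp would be grouped with its branch, while a cmp in MEM-IMM form or an add would not; with the gate off nothing is grouped.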

Thanks,
Wei Mi.

updated patch:

Index: doc/tm.texi.in
===
--- doc/tm.texi.in  (revision 201771)
+++ doc/tm.texi.in  (working copy)
@@ -6455,6 +6455,10 @@ scheduling one insn causes other insns t
 cycle.  These other insns can then be taken into account properly.
 @end deftypefn

+@hook TARGET_SCHED_MACRO_FUSION_P
+
+@hook TARGET_SCHED_MACRO_FUSION_PAIR_P
+
 @hook TARGET_SCHED_DEPENDENCIES_EVALUATION_HOOK
 This hook is called after evaluation forward dependencies of insns in
 chain given by two parameter values (@var{head} and @var{tail}
Index: doc/tm.texi
===
--- doc/tm.texi (revision 201771)
+++ doc/tm.texi (working copy)
@@ -6551,6 +6551,17 @@ scheduling one insn causes other insns t
 cycle.  These other insns can then be taken into account properly.
 @end deftypefn

+@deftypefn {Target Hook} bool TARGET_SCHED_MACRO_FUSION_P (void)
+This hook is used to check whether target platform supports macro fusion.
+@end deftypefn
+
+@deftypefn {Target Hook} bool TARGET_SCHED_MACRO_FUSION_PAIR_P (rtx
@var{condgen}, rtx @var{condjmp})
+This hook is used to check whether two insns could be macro fused for
+target microarchitecture. If this hook returns true for the given insn pair
+(@var{condgen} and @var{condjmp}), scheduler will put them into a sched
+group, and they will not be scheduled apart.
+@end deftypefn
+
 @deftypefn {Target Hook} void
TARGET_SCHED_DEPENDENCIES_EVALUATION_HOOK (rtx @var{head}, rtx
@var{tail})
 This hook is called after evaluation forward dependencies of insns in
 chain given by two parameter values (@var{head} and @var{tail}
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 201771)
+++ config/i386/i386.c  (working copy)
@@ -2004,7 +2004,7 @@ static unsigned int initial_ix86_tune_fe
   /* X86_TUNE_FUSE_CMP_AND_BRANCH: Fuse a compare or test instruction
  with a subsequent conditional jump instruction into a single
  compare-and-branch uop.  */
-  m_BDVER,
+  m_COREI7 | m_BDVER,

   /* X86_TUNE_OPT_AGU: Optimize for Address Generation Unit. This flag
  will impact LEA instruction selection. */
@@ -24845,6 +24845,99 @@ ia32_multipass_dfa_lookahead (void)
 }
 }

+/* Return true if target platform supports macro-fusion.  */
+
+static bool
+ix86_macro_fusion_p ()
+{
+  if (TARGET_FUSE_CMP_AND_BRANCH)
+return true;
+  else
+return false;
+}
+
+/* Check whether current microarchitecture support macro fusion
+   for insn pair "CONDGEN + CONDJMP". Refer to
+   "Intel Architectures Optimization Reference Manual". */
+
+static bool
+ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp)
+{
+  rtx src;
+  if (!strcmp (ix86_tune_string, "corei7"))
+{
+  /* For Nehalem.  */
+  rtx single_set = single_set (condgen);
+  /* Nehalem doesn't support macro-fusion for add/sub+jmp.  */
+  if (single_set == NULL_RTX)
+return false;
+
+  src = SET_SRC (single_set);
+  if (GET_CODE (src) != COMPARE)
+   return false;
+
+  /* Nehalem doesn't support macro-fusion for cmp/test MEM-IMM
+insn pattern.  */
+  if ((MEM_P (XEXP (src, 0))
+  && CONST_INT_P (XEXP (src, 1)))
+ || (MEM_P (XEXP (src, 1))
+ && CONST_INT_P (XEXP (src, 0))))
+   return false;
+
+  /* Nehalem doesn't support macro-fusion for add/sub/dec/inc + jmp.  */
+  if (get_attr_type (condgen) != TYPE_TEST
+ && get_attr_type (condgen) != TYPE_ICMP)
+   return false;
+  return true;
+}
+  else if (!strcmp (ix86_tune_string, "corei7-avx"))
+{
+  /* For Sandybridge.  */
+  enum rtx_code ccode;
+  rtx compare_set = NULL_RTX, test_if, cond;
+  rtx single_set = single_set (condgen);
+  if (single_set != NULL_RTX)
+compare_set = single_set;
+  else
+   {
+ int i;
+ rtx pat = PATTERN (condgen);
+ for (i = 0; i < XVECLEN (pat, 0); i++)
+   if (GET_CODE (XVECEXP (pat, 0, i)) == SET
+   && GET_CODE (SET_SRC (XVECEXP (pat, 0, i))) == COMPARE)
+ compare_set = XVECEXP (p

Re: [patch][PR/42955] Don't install $(target)/bin/gcc, gfortran, etc.

2013-09-11 Thread Brooks Moses

Ping^3?

Joseph, I'd been cc'ing you on this because it's driver-related and I
didn't find a more-obvious reviewer.  Is there someone else I should
be asking to review it?  Alternately, is this a change that should be
discussed on gcc@ before having the actual patch reviewed?

On Tue, Sep 3, 2013 at 9:32 AM, Brooks Moses  wrote:
> Ping^2?
>
> On 08/22/2013 02:00 PM, Brooks Moses wrote:
>> Ping?
>>
>> On 08/08/2013 02:10 PM, Brooks Moses wrote:
>>> As discussed in PR/42955, when GCC is built as a cross-compiler, it
>>> will install "gcc", "g++", "c++", and "gfortran" binaries in
>>> $(target)/bin, as well as installing the $target-gcc and so forth in
>>> bin.  However, these binaries in $(target)/bin do not work; they
>>> cannot find libexec.
>>>
>>> More to the point, this bug has been open for three years with no
>>> traffic, and the failure started significantly before that.  Clearly,
>>> making these work is not a priority.  Further, these binaries are real
>>> files, not symlinks or hard links; they take up actual space.
>>>
>>> As discussed on the bug, Joseph argues that $(target)/bin "contains
>>> executables from binutils for internal use by GCC; that's its sole
>>> purpose. The files installed by GCC there aren't used by GCC (rather,
>>> the public installed copy of the driver gets used when collect2 needs
>>> to call back to the driver), so shouldn't be installed."
>>>
>>> Thus, this patch, which simply removes these broken executables.
>>> Tested by building a cross-compiler and confirming that they are gone,
>>> and by building a native compiler and confirming that the expected
>>> bin/gcc, bin/g++, bin/c++, and bin/gfortran are still present.
>>>
>>> Ok to commit?
>>>
>>> - Brooks
>>>
>>> 
>>> 2013-08-08  Brooks Moses  
>>>
>>>  PR driver/42955
>>>  * Makefile.in: Do not install driver binaries in $(target)/bin.
>>>
>>>  PR driver/42955
>>>  * Make-lang.in: Do not install driver binaries in $(target)/bin.
>>>
>>
>>
>

Re: [PATCH, PowerPC] Fix PR57949 (ABI alignment issue)

2013-09-11 Thread Alan Modra

On Wed, Sep 11, 2013 at 07:55:43AM -0500, Bill Schmidt wrote:
> On Wed, 2013-09-11 at 21:08 +0930, Alan Modra wrote:
> > On Wed, Aug 14, 2013 at 10:32:01AM -0500, Bill Schmidt wrote:
> > > This fixes a long-standing problem with GCC's implementation of the
> > > PPC64 ELF ABI.  If a structure contains a member requiring 128-bit
> > > alignment, and that structure is passed as a parameter, the parameter
> > > currently receives only 64-bit alignment.  This is an error, and is
> > > incompatible with correct code generated by the IBM XL compilers.
> > 
> > This caused multiple failures in the libffi testsuite:
> > libffi.call/cls_align_longdouble.c
> > libffi.call/cls_align_longdouble_split.c
> > libffi.call/cls_align_longdouble_split2.c
> > libffi.call/nested_struct5.c
> > 
> > Fixed by making the same alignment adjustment in libffi to structures
> > passed by value.  Bill, I think your patch needs to go on all active
> > gcc branches as otherwise we'll need different versions of libffi for
> > the next gcc releases.
> 
> Hm, the libffi case is unfortunate. :(
> 
> The alternative is to leave libffi alone, and require code that calls
> these interfaces with "bad" structs passed by value to be built using
> -mcompat-align-parm, which was provided for such compatibility issues.
> Hopefully there is a small number of cases where this can happen, and
> this could be documented with libffi and gcc.  What do you think?

We have precedent for compiling libffi based on gcc preprocessor
defines, eg. __NO_FPRS__, so here's a way of making upstream libffi
compatible with the various versions of gcc out there.  I've taken the
condition under which we align aggregates from
rs6000_function_arg_boundary, and defined a macro with a value of the
maximum alignment.
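
As a rough standalone illustration of the adjustment (not the libffi code itself), the clamp-then-round-up step looks like this. TOY_STRUCT_PARM_ALIGN and TOY_ALIGN_UP are hypothetical stand-ins for the proposed macro and for libffi's ALIGN, and the value 16 is taken from the rs6000-c.c hunk below.

```c
/* Hypothetical stand-in for the proposed __STRUCT_PARM_ALIGN__ value.  */
#define TOY_STRUCT_PARM_ALIGN 16UL

/* Round V up to a multiple of A; A must be a power of two.  */
#define TOY_ALIGN_UP(v, a) (((v) + (a) - 1) & ~((a) - 1))

/* Return the offset at which a by-value struct argument would start:
   the struct's own alignment is capped at TOY_STRUCT_PARM_ALIGN, then
   the next free argument offset is rounded up to that alignment.  */
unsigned long
toy_align_struct_arg (unsigned long next_arg, unsigned long type_align)
{
  unsigned long align = type_align;
  if (align > TOY_STRUCT_PARM_ALIGN)
    align = TOY_STRUCT_PARM_ALIGN;
  return TOY_ALIGN_UP (next_arg, align);
}
```

For example, a struct with 16-byte alignment arriving at offset 8 is moved to offset 16, and a struct claiming 32-byte alignment is treated the same way because of the cap.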

Bootstrapped and regression tested powerpc64-linux.  OK for mainline?

gcc/
* config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): Define
__STRUCT_PARM_ALIGN__.
libffi/
* src/powerpc/ffi.c (ffi_prep_args64): Align FFI_TYPE_STRUCT.
(ffi_closure_helper_LINUX64): Likewise.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 202428)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -473,6 +473,12 @@ rs6000_cpu_cpp_builtins (cpp_reader *pfile)
   if (TARGET_SOFT_FLOAT || !TARGET_FPRS)
 builtin_define ("__NO_FPRS__");
 
+  /* Whether aggregates passed by value are aligned to a 16 byte boundary
+ if their alignment is 16 bytes or larger.  */
+  if ((TARGET_MACHO && rs6000_darwin64_abi)
+  || (DEFAULT_ABI == ABI_AIX && !rs6000_compat_align_parm))
+builtin_define ("__STRUCT_PARM_ALIGN__=16");
+
   /* Generate defines for Xilinx FPU. */
   if (rs6000_xilinx_fpu) 
 {
Index: libffi/src/powerpc/ffi.c
===
--- libffi/src/powerpc/ffi.c(revision 202428)
+++ libffi/src/powerpc/ffi.c(working copy)
@@ -462,6 +462,9 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long
 double **d;
   } p_argv;
   unsigned long gprvalue;
+#ifdef __STRUCT_PARM_ALIGN__
+  unsigned long align;
+#endif
 
   stacktop.c = (char *) stack + bytes;
   gpr_base.ul = stacktop.ul - ASM_NEEDS_REGISTERS64 - NUM_GPR_ARG_REGISTERS64;
@@ -532,6 +535,12 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long
 #endif
 
case FFI_TYPE_STRUCT:
+#ifdef __STRUCT_PARM_ALIGN__
+ align = (*ptr)->alignment;
+ if (align > __STRUCT_PARM_ALIGN__)
+   align = __STRUCT_PARM_ALIGN__;
+ next_arg.ul = ALIGN (next_arg.ul, align);
+#endif
  words = ((*ptr)->size + 7) / 8;
  if (next_arg.ul >= gpr_base.ul && next_arg.ul + words > gpr_end.ul)
{
@@ -1349,6 +1358,9 @@ ffi_closure_helper_LINUX64 (ffi_closure *closure,
   long i, avn;
   ffi_cif *cif;
   ffi_dblfl *end_pfr = pfr + NUM_FPR_ARG_REGISTERS64;
+#ifdef __STRUCT_PARM_ALIGN__
+  unsigned long align;
+#endif
 
   cif = closure->cif;
   avalue = alloca (cif->nargs * sizeof (void *));
@@ -1399,6 +1411,12 @@ ffi_closure_helper_LINUX64 (ffi_closure *closure,
  break;
 
case FFI_TYPE_STRUCT:
+#ifdef __STRUCT_PARM_ALIGN__
+ align = arg_types[i]->alignment;
+ if (align > __STRUCT_PARM_ALIGN__)
+   align = __STRUCT_PARM_ALIGN__;
+ pst = ALIGN (pst, align);
+#endif
 #ifndef __LITTLE_ENDIAN__
  /* Structures with size less than eight bytes are passed
 left-padded.  */

-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH] [vectorizer] Fixing a bug in tree-vect-patterns.c in GCC vectorizer.

2013-09-11 Thread Xinliang David Li

Can you add a test case to the regression suite?

When the arguments are of type unsigned short/unsigned int, GCC does
not vectorize the loop anymore -- this is worth a separate bug to
track. The punpcklwd instruction can be used to do zero extension of the
short type.

David

On Wed, Sep 11, 2013 at 6:16 PM, Cong Hou  wrote:
> Hi
>
> There is a bug in the function vect_recog_dot_prod_pattern() in
> tree-vect-patterns.c. This function checks whether a loop matches the
> dot-product pattern. Specifically, according to the comment of this
> function:
>
> /*
>  Try to find the following pattern:
>
>  type x_t, y_t;
>  TYPE1 prod;
>  TYPE2 sum = init;
>loop:
>  sum_0 = phi <init, sum_1>
>  S1  x_t = ...
>  S2  y_t = ...
>  S3  x_T = (TYPE1) x_t;
>  S4  y_T = (TYPE1) y_t;
>  S5  prod = x_T * y_T;
>  [S6  prod = (TYPE2) prod;  #optional]
>  S7  sum_1 = prod + sum_0;
>
>where 'TYPE1' is exactly double the size of type 'type', and
> 'TYPE2' is the same size of 'TYPE1' or bigger. This is a special case
> of a reduction computation.
> */
>
> This function should check that x_t and y_t have the same type (type),
> which is half the size of TYPE1. The corresponding code is shown
> below:
>
>   oprnd0 = gimple_assign_rhs1 (stmt);
>   oprnd1 = gimple_assign_rhs2 (stmt);
>   if (!types_compatible_p (TREE_TYPE (oprnd0), prod_type) ||
> !types_compatible_p (TREE_TYPE (oprnd1), prod_type))
> return NULL;
>   if (!type_conversion_p (oprnd0, stmt, true, &half_type0,
> &def_stmt, &promotion) || !promotion)
> return NULL;
>   oprnd00 = gimple_assign_rhs1 (def_stmt);
>
> /*==V  see here! */
>   if (!type_conversion_p (oprnd0, stmt, true, &half_type1,
> &def_stmt, &promotion) || !promotion)
> return NULL;
>   oprnd01 = gimple_assign_rhs1 (def_stmt);
>   if (!types_compatible_p (half_type0, half_type1))
> return NULL;
>   if (TYPE_PRECISION (prod_type) != TYPE_PRECISION (half_type0) * 2)
> return NULL;
>
> Here the function uses x_T (oprnd0) to check the type of y_t, which is
> incorrect. The fix is simple: just replace it with oprnd1.
>
> The failing test case for this bug is shown below:
>
> int foo(short *a, int *b, int n) {
>   int sum = 0;
>   for (int i = 0; i < n; ++i)
> sum += a[i] * b[i];
>   return sum;
> }
>
>
> thanks,
> Cong
>
>
> Index: gcc/tree-vect-patterns.c
> ===
> --- gcc/tree-vect-patterns.c (revision 200988)
> +++ gcc/tree-vect-patterns.c (working copy)
> @@ -397,7 +397,7 @@ vect_recog_dot_prod_pattern (vec
>|| !promotion)
>  return NULL;
>oprnd00 = gimple_assign_rhs1 (def_stmt);
> -  if (!type_conversion_p (oprnd0, stmt, true, &half_type1, &def_stmt,
> +  if (!type_conversion_p (oprnd1, stmt, true, &half_type1, &def_stmt,
>  &promotion)
>|| !promotion)
>  return NULL;

[PATCH] [vectorizer] Fixing a bug in tree-vect-patterns.c in GCC vectorizer.

2013-09-11 Thread Cong Hou

Hi

There is a bug in the function vect_recog_dot_prod_pattern() in
tree-vect-patterns.c. This function checks whether a loop matches the
dot-product pattern. Specifically, according to the comment of this
function:

/*
 Try to find the following pattern:

 type x_t, y_t;
 TYPE1 prod;
 TYPE2 sum = init;
   loop:
 sum_0 = phi <init, sum_1>
 S1  x_t = ...
 S2  y_t = ...
 S3  x_T = (TYPE1) x_t;
 S4  y_T = (TYPE1) y_t;
 S5  prod = x_T * y_T;
 [S6  prod = (TYPE2) prod;  #optional]
 S7  sum_1 = prod + sum_0;

   where 'TYPE1' is exactly double the size of type 'type', and
'TYPE2' is the same size of 'TYPE1' or bigger. This is a special case
of a reduction computation.
*/

This function should check that x_t and y_t have the same type (type),
which is half the size of TYPE1. The corresponding code is shown
below:

  oprnd0 = gimple_assign_rhs1 (stmt);
  oprnd1 = gimple_assign_rhs2 (stmt);
  if (!types_compatible_p (TREE_TYPE (oprnd0), prod_type) ||
!types_compatible_p (TREE_TYPE (oprnd1), prod_type))
return NULL;
  if (!type_conversion_p (oprnd0, stmt, true, &half_type0,
&def_stmt, &promotion) || !promotion)
return NULL;
  oprnd00 = gimple_assign_rhs1 (def_stmt);

/*==V  see here! */
  if (!type_conversion_p (oprnd0, stmt, true, &half_type1,
&def_stmt, &promotion) || !promotion)
return NULL;
  oprnd01 = gimple_assign_rhs1 (def_stmt);
  if (!types_compatible_p (half_type0, half_type1))
return NULL;
  if (TYPE_PRECISION (prod_type) != TYPE_PRECISION (half_type0) * 2)
return NULL;

Here the function uses x_T (oprnd0) to check the type of y_t, which is
incorrect. The fix is simple: just replace it with oprnd1.

The failing test case for this bug is shown below:

int foo(short *a, int *b, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
sum += a[i] * b[i];
  return sum;
}
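
For contrast, here is a small sketch of the two shapes side by side, assuming a target where short is 16 bits and int is 32 bits. The function names are made up for illustration. The first loop fits the pattern comment above (both multiplicands are widened from the same half-size type); the second is the failing case, where only one operand is a promotion and the pattern should be rejected.

```c
/* Both multiplicands are short and are widened to int before the
   multiply: type = short, TYPE1 = int in the pattern comment.  */
int
dot_same_type (const short *a, const short *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += a[i] * b[i];
  return sum;
}

/* Only a[i] is widened; b[i] is already int, so the second operand
   of the multiply is not a promotion from a half-size type.  */
int
dot_mixed_type (const short *a, const int *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += a[i] * b[i];
  return sum;
}
```

Both functions must return the same value for the same element values; the difference is only in whether the loop is a candidate for the dot-product pattern.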


thanks,
Cong


Index: gcc/tree-vect-patterns.c
===
--- gcc/tree-vect-patterns.c (revision 200988)
+++ gcc/tree-vect-patterns.c (working copy)
@@ -397,7 +397,7 @@ vect_recog_dot_prod_pattern (vec
   || !promotion)
 return NULL;
   oprnd00 = gimple_assign_rhs1 (def_stmt);
-  if (!type_conversion_p (oprnd0, stmt, true, &half_type1, &def_stmt,
+  if (!type_conversion_p (oprnd1, stmt, true, &half_type1, &def_stmt,
 &promotion)
   || !promotion)
 return NULL;

Re: gcc_GAS_FLAGS: Add more gcc_cv_as_flags overrides

2013-09-11 Thread Joseph S. Myers

On Wed, 11 Sep 2013, Thomas Schwinge wrote:

> configure:23559: checking assembler for thread-local storage support

I don't expect anyone to work on this, but such a configure test is silly 
- it's a test where a failure is more likely to indicate the test is buggy 
than that the feature tested for is actually missing.  I think each GCC 
version should have a minimum version of GNU binutils, to be used except 
for a whitelist of targets where some other assembler/linker are known to 
be supported, and should check the binutils version number (unless 
configured for one of the special targets not to use GNU binutils) and 
fail if it's too old, but not check for assembler or linker features in 
cases where the binutils version requirement means they can be assumed.  
(If the relevant conditional code in GCC wouldn't be used anyway for the 
proprietary Unix targets with non-GNU system assembler/linker, it can be 
made unconditional.)

So only targets where TLS support was a recent addition to the GNU 
assembler, or those supporting a non-GNU assembler where TLS may or may 
not be supported, would need such a test at all.

-- 
Joseph S. Myers
jos...@codesourcery.com

[ping**(n+1)] reimplement -fstrict-volatile-bitfields, v3

2013-09-11 Thread Sandra Loosemore


Ping?

On 09/02/2013 12:56 PM, Sandra Loosemore wrote:

On 09/02/2013 03:10 AM, Richard Biener wrote:

Can someone, in a new thread, ping the patches that are still in
flight?  ISTR having approved bits of some patches before my leave.


Here's the current state of the patch set I put together.  I've lost
track of where the canonical version of Bernd's followup patch is.

On 07/09/2013 10:23 AM, Sandra Loosemore wrote:

On 06/30/2013 09:24 PM, Sandra Loosemore wrote:

Here is my third attempt at cleaning up -fstrict-volatile-bitfields.



Part 1 removes the warnings and packedp flag.  It is the same as in the
last version, and has already been approved.  I'll skip reposting it
since the patch is here already:

http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00908.html

Part 2 replaces parts 2, 3, and 4 in the last version.  I've re-worked
this code significantly to try to address Bernd Edlinger's comments on
the last version in PR56997.


Part 2:  http://gcc.gnu.org/ml/gcc-patches/2013-07/msg1.html


Part 3 is the test cases, which are the same as in the last version.
Nobody has reviewed these but I assume they are OK if Part 2 is
approved?

http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00912.html

Part 4 is new; it makes -fstrict-volatile-bitfields not be the default
for any target any more.  It is independent of the other changes.


Part 4:  http://gcc.gnu.org/ml/gcc-patches/2013-07/msg2.html


-Sandra


-Sandra

[PATCH] disable use_vector_fp_converts for m_CORE_ALL

2013-09-11 Thread Wei Mi

For the following testcase 1.c, on westmere and sandybridge,
performance with the option -mtune=^use_vector_fp_converts is better
(improves from 3.46s to 2.83s). It means cvtss2sd is often better than
unpcklps+cvtps2pd on recent x86 platforms.

1.c:
float total = 0.2;
int k = 5;

int main() {
 int i;

 for (i = 0; i < 10; i++) {
   total += (0.5 + k);
 }

 return total == 0.3;
}

assembly generated by gcc-r201963 without -mtune=^use_vector_fp_converts
.L2:
        unpcklps %xmm0, %xmm0
        subl     $1, %eax
        cvtps2pd %xmm0, %xmm0
        addsd    %xmm1, %xmm0
        unpcklpd %xmm0, %xmm0
        cvtpd2ps %xmm0, %xmm0
        jne      .L2

assembly generated by gcc-r201963 with -mtune=^use_vector_fp_converts
.L2:
        cvtss2sd %xmm0, %xmm0
        subl     $1, %eax
        addsd    %xmm1, %xmm0
        cvtsd2ss %xmm0, %xmm0
        jne      .L2

But for testcase 2.c (thanks to Igor Zamyatin for the testcase),
performance with the option -mtune=^use_vector_fp_converts is worse.
Analysis of the assembly shows that the performance degradation comes from
a partial reg stall caused by cvtsd2ss. Adding pxor %xmm0, %xmm0 before
cvtsd2ss b(,%rdx,8), %xmm0 gets the performance back.

2.c:
double b[1024];

float a[1024];

int main()
{
int i;
for(i = 0 ; i < 1024 * 1024 * 256; i++)
  a[i & 1023] = a[i & 1023] * (float)b[i & 1023];
return (int)a[512];
}

without -mtune-ctrl=^use_vector_fp_converts
.L2:
        movl     %eax, %edx
        addl     $1, %eax
        andl     $1023, %edx
        cmpl     $268435456, %eax
        movsd    b(,%rdx,8), %xmm0
        cvtpd2ps %xmm0, %xmm0   ==> without partial reg stall because of movsd.
        mulss    a(,%rdx,4), %xmm0
        movss    %xmm0, a(,%rdx,4)
        jne      .L2

with -mtune-ctrl=^use_vector_fp_converts
.L2:
        movl     %eax, %edx
        addl     $1, %eax
        andl     $1023, %edx
        cmpl     $268435456, %eax
        cvtsd2ss b(,%rdx,8), %xmm0   ==> with partial reg stall. Needs "pxor %xmm0, %xmm0" inserted before this insn.
        mulss    a(,%rdx,4), %xmm0
        movss    %xmm0, a(,%rdx,4)
        jne      .L2

So the patch turns off use_vector_fp_converts for m_CORE_ALL to
use cvtss2sd/cvtsd2ss directly, and adds "pxor %xmmreg, %xmmreg" before
cvtss2sd/cvtsd2ss to break the partial reg stall (similar to what r201308
does for cvtsi2ss/cvtsi2sd). Bootstrap and regression tests pass. OK for
trunk?

Thanks,
Wei Mi.

2013-09-11  Wei Mi  

* config/i386/x86-tune.def (DEF_TUNE): Remove
m_CORE_ALL.
* config/i386/i386.md: Add define_peephole2 to
break partial reg stall for cvtss2sd/cvtsd2ss.

Index: config/i386/x86-tune.def
===
--- config/i386/x86-tune.def(revision 201963)
+++ config/i386/x86-tune.def(working copy)
@@ -189,7 +189,7 @@ DEF_TUNE (X86_TUNE_NOT_VECTORMODE, "not_
 /* X86_TUNE_USE_VECTOR_FP_CONVERTS: Prefer vector packed SSE conversion
from FP to FP. */
 DEF_TUNE (X86_TUNE_USE_VECTOR_FP_CONVERTS, "use_vector_fp_converts",
-  m_CORE_ALL | m_AMDFAM10 | m_GENERIC)
+  m_AMDFAM10 | m_GENERIC)
 /* X86_TUNE_USE_VECTOR_CONVERTS: Prefer vector packed SSE conversion
from integer to FP. */
 DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, "use_vector_converts", m_AMDFAM10)
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 201963)
+++ config/i386/i386.md (working copy)
@@ -5075,6 +5075,63 @@
   emit_move_insn (operands[0], CONST0_RTX (mode));
 })

+;; Break partial reg stall for cvtsd2ss.
+
+(define_peephole2
+  [(set (match_operand:SF 0 "register_operand")
+(float_truncate:SF
+ (match_operand:DF 1 "nonimmediate_operand")))]
+  "TARGET_SSE2 && TARGET_SSE_MATH
+   && TARGET_SSE_PARTIAL_REG_DEPENDENCY
+   && optimize_function_for_speed_p (cfun)
+   && reload_completed && SSE_REG_P (operands[0])
+   && peep2_reg_dead_p (0, operands[0])
+   && (!SSE_REG_P (operands[1])
+   || REGNO (operands[0]) != REGNO (operands[1]))"
+  [(set (match_dup 0)
+   (vec_merge:V4SF
+ (vec_duplicate:V4SF
+   (float_truncate:V2SF
+ (match_dup 1)))
+ (match_dup 0)
+ (const_int 1)))]
+{
+  operands[0] = simplify_gen_subreg (V4SFmode, operands[0],
+SFmode, 0);
+  operands[1] = simplify_gen_subreg (V2DFmode, operands[1],
+DFmode, 0);
+  emit_move_insn (operands[0], CONST0_RTX (V4SFmode));
+})
+
+;; Break partial reg stall for cvtss2sd.
+
+(define_peephole2
+  [(set (match_operand:DF 0 "register_operand")
+(float_extend:DF
+  (match_operand:SF 1 "nonimmediate_operand")))]
+  "TARGET_SSE2 && TARGET_SSE_MATH
+   && TARGET_SSE_PARTIAL_REG_DEPENDENCY
+   && optimize_function_for_speed_p (cfun)
+   && reload_completed && SSE_REG_P (operands[0])
+   && peep2_reg_dead_p (0, oper

Re: gcc_GAS_FLAGS: Add more gcc_cv_as_flags overrides

2013-09-11 Thread Thomas Schwinge

Hi!

On Wed, 11 Sep 2013 14:53:58 -0700, "H.J. Lu"  wrote:
> On Wed, Sep 11, 2013 at 2:23 PM, Thomas Schwinge
>  wrote:
> > --- gcc/acinclude.m4
> > +++ gcc/acinclude.m4
> > @@ -444,8 +444,16 @@ AC_DEFUN([gcc_GAS_FLAGS],
> >  [AC_CACHE_CHECK([assembler flags], gcc_cv_as_flags,
> >  [ case "$target" in
> >i[[34567]]86-*-linux*)
> > -dnl Always pass --32 to ia32 Linux assembler.
> > -gcc_cv_as_flags="--32"
> > +dnl Override the default, which may be incompatible.
> > +gcc_cv_as_flags=--32
> > +;;
> > +  x86_64-*-linux-gnux32)
> > +dnl Override the default, which may be incompatible.
> > +gcc_cv_as_flags=--x32
> 
> I don't think it is necessary.  I configure x32 gcc with
> 
> CC="gcc -mx32" CXX="g++ -mx32" RUNTESTFLAGS=" .../configure
> --with-multilib-list=m32,m64,mx32 --with-abi=mx32
> 
> There is no difference in as/ld features between x32 and x86-64.

Ah, OK, I thought there was because of the --x32 option being available.
However, if the x86_64 change is approved, I'd suggest to add it anyway,
both for documentation purposes (so that nobody wonders why it is not
handled here), and for consistency (because ASM_SPEC is also handling it
separately).

> > +;;
> > +  x86_64-*-linux*)
> > +dnl Override the default, which may be incompatible.
> > +gcc_cv_as_flags=--64
> >  ;;


Grüße,
 Thomas



Re: gcc_GAS_FLAGS: Add more gcc_cv_as_flags overrides

2013-09-11 Thread H.J. Lu

On Wed, Sep 11, 2013 at 2:23 PM, Thomas Schwinge
 wrote:
> Hi!
>
> In a toolchain where GCC has been configured for
> --target=i686-pc-linux-gnu, and where an x86_64 multilib has also been
> configured, by default GCC will generate code for 32-bit mode, and for
> 64-bit mode only when -m64 is passed.  Per ASM_SPEC definition, the -m64
> option is translated to --64 when invoking the assembler.  This is
> working fine.
>
> What however is not working fine is GCC's very own configure testing.
> When using said toolchain for building on/for x86_64-linux-gnu by
> configuring a native GCC build with CC='gcc -m64' CXX='g++ -m64', I'm
> seeing the following happen in [GCC]/gcc/configure:
>
> configure:23559: checking assembler for thread-local storage support
> configure:23572: [...]/bin/as   --fatal-warnings -o conftest.o conftest.s 
> >&5
> conftest.s: Assembler messages:
> conftest.s:5: Error: bad register name `%rax'
> [...]
> configure: failed program was
>
> .section ".tdata","awT",@progbits
> foo:.long   25
> .text
> movq%fs:0, %rax
> [...]
> configure:23586: result: no
>
> In this toolchain, an unadorned invocation of as will default to 32-bit
> mode, and enter 64-bit mode only when an explicit --64 is passed -- which
> is not done here.
>
> This is, from a quick glance, the "inverse" thing as has once been
> discussed in ,
> .
>
> Wanting to use this toolchain, do I have to configure GCC pointing to an
> assembler that defaults to 64-bit mode, or use something like
> --with-as='as --64' (not checked whether that works), or should the GCC
> configure system deduce from $CXX' -m64 that it needs to pass --64 to the
> assembler (à la ASM_SPEC as indicated above), or is the following patch
> the way to go, or something else?  H.J., the gnux32 change is just a
> guess; please comment if that's not right.  (And, of course, this is not
> a problem specific to *-*-linux*, but changing that is for another day.)
>
> gcc/
> * acinclude.m4 (gcc_GAS_FLAGS): Add more gcc_cv_as_flags
> overrides.
> * configure: Regenerate.
>
> --- gcc/acinclude.m4
> +++ gcc/acinclude.m4
> @@ -444,8 +444,16 @@ AC_DEFUN([gcc_GAS_FLAGS],
>  [AC_CACHE_CHECK([assembler flags], gcc_cv_as_flags,
>  [ case "$target" in
>i[[34567]]86-*-linux*)
> -dnl Always pass --32 to ia32 Linux assembler.
> -gcc_cv_as_flags="--32"
> +dnl Override the default, which may be incompatible.
> +gcc_cv_as_flags=--32
> +;;
> +  x86_64-*-linux-gnux32)
> +dnl Override the default, which may be incompatible.
> +gcc_cv_as_flags=--x32

I don't think it is necessary.  I configure x32 gcc with

CC="gcc -mx32" CXX="g++ -mx32" RUNTESTFLAGS=" .../configure
--with-multilib-list=m32,m64,mx32 --with-abi=mx32

There is no difference in as/ld features between x32 and x86-64.

> +;;
> +  x86_64-*-linux*)
> +dnl Override the default, which may be incompatible.
> +gcc_cv_as_flags=--64
>  ;;
>powerpc*-*-darwin*)
>  dnl Always pass -arch ppc to assembler.
>
>


-- 
H.J.

Re: [patch] Make cxxfilt demangle internal-linkage templates

2013-09-11 Thread Paul Pluzhnikov

Ping x2?

Original message:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00394.html

On Fri, Aug 16, 2013 at 6:10 PM, Paul Pluzhnikov  wrote:
> Ping?


--
Paul Pluzhnikov

[PATCH] [x86] Incorrect naming of FMA Builtins in documentation.

2013-09-11 Thread Cameron McInally

Hey guys,

It appears that there are errors in the x86 FMA Builtins
documentation. The FMA instruction names should be prefixed with a
'v'. Also the FMA instructions operate on XMM/YMM registers, not MMX
registers.

Please find a Patch and ChangeLog attached for both issues. I do not
have commit access.

Thanks,
Cameron


ChangeLog
Description: Binary data


Patch
Description: Binary data

Re: [PATCH][RFC] Move IVOPTs closer to RTL expansion

2013-09-11 Thread Pat Haugen


On 09/04/2013 04:20 AM, Richard Biener wrote:

Any help with benchmarking this on targets other than x86_64
is appreciated (I'll re-do x86_64).

I ran CPU2000 and CPU2006 on PowerPC to evaluate the patch. CPU2000 had 3
benchmarks degrade in the 4%-6% range (254.gap, 168.wupwise,
173.applu). CPU2006 showed one benchmark, 410.bwaves, degrading 9%. I have
dug into applu/bwaves and saw nothing directly attributable to your
patch; the degradations looked to be caused by idioms/oddities of the
PowerPC architecture related to minor scheduling differences and/or data
location.

gcc_GAS_FLAGS: Add more gcc_cv_as_flags overrides

2013-09-11 Thread Thomas Schwinge

Hi!

In a toolchain where GCC has been configured for
--target=i686-pc-linux-gnu, and where an x86_64 multilib has also been
configured, by default GCC will generate code for 32-bit mode, and for
64-bit mode only when -m64 is passed.  Per ASM_SPEC definition, the -m64
option is translated to --64 when invoking the assembler.  This is
working fine.

What however is not working fine is GCC's very own configure testing.
When using said toolchain for building on/for x86_64-linux-gnu by
configuring a native GCC build with CC='gcc -m64' CXX='g++ -m64', I'm
seeing the following happen in [GCC]/gcc/configure:

configure:23559: checking assembler for thread-local storage support
configure:23572: [...]/bin/as   --fatal-warnings -o conftest.o conftest.s 
>&5
conftest.s: Assembler messages:
conftest.s:5: Error: bad register name `%rax'
[...]
configure: failed program was

.section ".tdata","awT",@progbits
foo:.long   25
.text
movq%fs:0, %rax
[...]
configure:23586: result: no

In this toolchain, an unadorned invocation of as will default to 32-bit
mode, and enter 64-bit mode only when an explicit --64 is passed -- which
is not done here.

This is, from a quick glance, the "inverse" thing as has once been
discussed in ,
.

Wanting to use this toolchain, do I have to configure GCC pointing to an
assembler that defaults to 64-bit mode, or use something like
--with-as='as --64' (not checked whether that works), or should the GCC
configure system deduce from $CXX' -m64 that it needs to pass --64 to the
assembler (à la ASM_SPEC as indicated above), or is the following patch
the way to go, or something else?  H.J., the gnux32 change is just a
guess; please comment if that's not right.  (And, of course, this is not
a problem specific to *-*-linux*, but changing that is for another day.)

gcc/
* acinclude.m4 (gcc_GAS_FLAGS): Add more gcc_cv_as_flags
overrides.
* configure: Regenerate.

--- gcc/acinclude.m4
+++ gcc/acinclude.m4
@@ -444,8 +444,16 @@ AC_DEFUN([gcc_GAS_FLAGS],
 [AC_CACHE_CHECK([assembler flags], gcc_cv_as_flags,
 [ case "$target" in
   i[[34567]]86-*-linux*)
-dnl Always pass --32 to ia32 Linux assembler.
-gcc_cv_as_flags="--32"
+dnl Override the default, which may be incompatible.
+gcc_cv_as_flags=--32
+;;
+  x86_64-*-linux-gnux32)
+dnl Override the default, which may be incompatible.
+gcc_cv_as_flags=--x32
+;;
+  x86_64-*-linux*)
+dnl Override the default, which may be incompatible.
+gcc_cv_as_flags=--64
 ;;
   powerpc*-*-darwin*)
 dnl Always pass -arch ppc to assembler.
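
The selection logic in the patch above can be sketched in C. This is only an illustration of how the three Linux x86 arms of the case statement pick a flag; the function name as_flags_for_target is hypothetical and not part of GCC, and POSIX fnmatch is used here as a stand-in for shell pattern matching.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper (not part of GCC): return the assembler flag that
   the patched gcc_GAS_FLAGS case statement would select for a target
   triplet, or NULL when none of the modelled arms applies.  Only the
   three Linux x86 arms from the patch are modelled.  */
const char *
as_flags_for_target (const char *target)
{
  /* Order matters, as in the shell case statement: the gnux32 pattern
     has to be tried before the more general x86_64 Linux pattern.  */
  static const struct { const char *pattern; const char *flags; } table[] = {
    { "i[34567]86-*-linux*",   "--32"  },
    { "x86_64-*-linux-gnux32", "--x32" },
    { "x86_64-*-linux*",       "--64"  },
  };

  assert (target != NULL);
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (fnmatch (table[i].pattern, target, 0) == 0)
      return table[i].flags;
  return NULL;
}
```

With this ordering a gnux32 triplet never falls through to the --64 arm, which is the same reason the x32 arm precedes the generic x86_64 arm in the patch.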


Grüße,
 Thomas



Re: [RS6000] powerpc64 -mcmodel=medium large symbol offsets

2013-09-11 Thread David Edelsohn

On Wed, Sep 11, 2013 at 7:12 AM, Alan Modra  wrote:
> On Mon, Sep 09, 2013 at 06:37:03PM +0930, Alan Modra wrote:
>> gcc/
>>   * config/rs6000/predicates.md (add_cint_operand): New.
>>   (reg_or_add_cint_operand): Use add_cint_operand.
>>   * config/rs6000/rs6000.md (largetoc_high_plus): Restrict offset
>>   using add_cint_operand.
>>   (largetoc_high_plus_aix, small_toc_ref): Likewise.
>> gcc/testsuite/
>>   * gcc.target/powerpc/medium_offset.c: New.
>
> I missed seeing one testcase regression caused by this patch.  :-(
> gcc.c-torture/compile/pr41634.c at -O3 gets an "insn does not satisfy
> its constraints".  Fixed with the following.  OK to apply?
>
> * config/rs6000/rs6000.c (toc_relative_expr_p): Use add_cint_operand.

Okay.

Thanks, David

Re: [PATCH, libvtv] Fix configure/testsuite issues with libvtv

2013-09-11 Thread H.J. Lu

On Wed, Sep 11, 2013 at 12:27 PM, Caroline Tice  wrote:
>
>> 2. Why does libvtv/configure.ac have
>>
>> echo 'MULTISUBDIR =' >> $ac_file
>> ml_norecursion=yes
>> . ${multi_basedir}/config-ml.in
>>
>> when AM_ENABLE_MULTILIB is used?  I got
>>
>> make[3]: Entering directory
>> `/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libvtv'
>> Makefile:844: warning: overriding recipe for target `multi-do'
>> Makefile:775: warning: ignoring old recipe for target `multi-do'
>> Makefile:892: warning: overriding recipe for target `multi-clean'
>> Makefile:823: warning: ignoring old recipe for target `multi-clean'
>>
>
> I do not know exactly why it was done this way, but I see that the
> configure.ac files for libsanitizer and libstdc++-v3 seem to do it
> exactly this way as well.

They have

AC_CONFIG_FILES(AC_FOREACH([DIR], [tsan], [DIR/Makefile ])

libvtv has

AC_CONFIG_FILES(AC_FOREACH([DIR], [. testsuite], [DIR/Makefile ]),
                                   ^
Is the extra "." needed?
>
> Are either of these issues actually causing you problems at the moment?

No.


-- 
H.J.

Re: [PATCH, ARM, LRA] Prepare ARM build with LRA

2013-09-11 Thread Yvan Roux

Here is the new patch discussed in the other thread.

Thanks
Yvan

2013-09-11  Yvan Roux  
Vladimir Makarov  

* rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield operations
from the least significant bit.
(strip_address_mutations): Add bitfield operations handling.
(shift_code_p): New predicate for shifting operations.
(must_be_index_p): Add shifting operations handling.
(set_address_index): Likewise.


On 11 September 2013 09:00, Yvan Roux  wrote:
> New attempt, with fixes from Richard's comments (discussed in the other 
> thread).
>
> Thanks,
> Yvan
>
> 2013-09-09  Yvan Roux  
> Vladimir Makarov  
>
> * rtlanal.c (strip_address_mutations): Add bitfield operations
> handling.
> (shift_code_p): New predicate for shifting operations.
> (must_be_index_p): Add shifting operations handling.
> (set_address_index): Likewise.
>
>
> On 9 September 2013 10:01, Yvan Roux  wrote:
>> Hi,
>>
>> here are the modifications, discussed in another thread, needed in
>> rtlanal.c by ARM targets (AArch32 and AArch64) to build GCC with LRA.
>>
>> Is it ok for trunk ?
>>
>> Thanks,
>> Yvan
>>
>> 2013-09-09  Yvan Roux  
>> Vladimir Makarov  
>>
>> * rtlanal.c (must_be_index_p, set_address_index): Add ASHIFTRT,
>> LSHIFTRT, ROTATE, ROTATERT and SIGN_EXTRACT handling.
>> (set_address_base): Add SIGN_EXTRACT handling.



Re: [PATCH, libvtv] Fix configure/testsuite issues with libvtv

2013-09-11 Thread H.J. Lu

On Wed, Sep 11, 2013 at 11:22 AM, Caroline Tice  wrote:
> This  patch should fix the issues that people were having with the
> libvtv testsuite being run (and failing) when GCC was not configured
> with --enable-vtable-verify.  I am still in the process of running
> some tests, but at this point I am fairly confident that this patch
> works correctly.  Even though I am the libvtv maintainer, I would
> appreciate someone else reviewing this patch too, as a sanity check.
>
> -- Caroline Tice
> cmt...@google.com
>
> 2013-09-11 Caroline Tice  
>
> * Makefile.am: Reinstate ENABLE_VTABLE_VERIFY checks, to make
> sure testsuite is not run if libstdc++ and libgcc were not built
> with vtable verification.
> * Makefile.in: Regenerated.
> * configure.ac: Reinstate checks for --enable-vtable-verify flag,
> to make sure testsuite is not run if libstdc++ and libgcc were not
> built with vtable verification.
> * configure: Regenerated.

I tried it.  I noticed 2 issues:

1. Correct me if I am wrong: libvtv doesn't work correctly if
--enable-vtable-verify isn't used.  Why is libvtv built/installed at
all when --enable-vtable-verify isn't used?
2. Why does libvtv/configure.ac have

echo 'MULTISUBDIR =' >> $ac_file
ml_norecursion=yes
. ${multi_basedir}/config-ml.in

when AM_ENABLE_MULTILIB is used?  I got

make[3]: Entering directory
`/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libvtv'
Makefile:844: warning: overriding recipe for target `multi-do'
Makefile:775: warning: ignoring old recipe for target `multi-do'
Makefile:892: warning: overriding recipe for target `multi-clean'
Makefile:823: warning: ignoring old recipe for target `multi-clean'

-- 
H.J.

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-11 Thread Andrew Pinski

On Wed, Sep 4, 2013 at 12:33 PM, Alexander Monakov  wrote:
> On Wed, Sep 4, 2013 at 9:53 PM, Steven Bosscher  wrote:
>>
>> On Wed, Sep 4, 2013 at 10:58 AM, Alexander Monakov wrote:
>> > Hello,
>> >
>> > Could you use the existing facilities instead, such as adjust_priority 
>> > hook,
>> > or making the compare-branch insn sequence a SCHED_GROUP?
>>
>>
>> Or a define_bypass?
>
> Hm, I don't think define_bypass would work: it still leaves the
> scheduler freedom to move the compare up.

Even though it allows the scheduler freedom to move the compare up,
the scheduler only does so because the scheduling model is not correct
for the processor.  I have done the same for Octeon2, where it is able to
combine the compare and the branch, and found the resulting schedule is
much better than even what this hack could do, because the instructions
still take an issue slot.  Is it true that for these two processors it
takes an issue slot, or is it being done before issue?

Thanks,
Andrew Pinski

>
> IMO adjust_priority would be preferable if it allows to achieve the goal.
>
> Alexander

Re: [PATCH] PR tree-optimization/58380

2013-09-11 Thread Jeff Law


On 09/11/2013 12:20 PM, Paolo Carlini wrote:
> Hi,
>
> On 09/11/2013 07:40 PM, domi...@lps.ens.fr wrote:
>> The failures are (at least on x86_64-apple-darwin10):
>>
>> /opt/gcc/work/gcc/testsuite/g++.dg/torture/pr58380.C: In static member
>> function 'static iplugin_factory& selection_to_stdout::get_factory()':
>> /opt/gcc/work/gcc/testsuite/g++.dg/torture/pr58380.C:167:79: warning:
>> deprecated conversion from string constant to 'char*' [-Wwrite-strings]
>> /opt/gcc/work/gcc/testsuite/g++.dg/torture/pr58380.C:167:79: warning:
>> deprecated conversion from string constant to 'char*' [-Wwrite-strings]
>
> Yeah, everywhere.
>
> By the way, the warning is definitely correct, thus either tweak that
> basic_string constructor to take a const char* (assuming the testcase is
> still fine as reproducer), or add a -w, or some other simple tweak will do.

-w is the right thing to do, warnings or the lack thereof aren't 
important for what that test is detecting.  I'll fix it up momentarily.


jeff

Re: V4 Lambda templates and implicit function templates.

2013-09-11 Thread Jason Merrill


On 09/11/2013 02:22 PM, Adam Butcher wrote:

> Okay for the attached to go to trunk with suitable changelog?


Yes.

Jason

[C++ Patch] Improve finish_pseudo_destructor_expr location

2013-09-11 Thread Paolo Carlini


Hi,

Yesterday, when I analyzed c++/58363 a bit and eventually committed a 
pretty printing fix, I noticed that the column was wrong for the pseudo 
destructor expression m.~f: it pointed at the end. A fix turns out to be 
rather simple, because finish_pseudo_destructor_expr was simply not 
getting a location_t argument. I think it's also rather complete vs 
templates; otherwise it would not do the right thing for c++/58363 
itself, for example, where the error message is produced by 
unify_arg_conversion.


Note that in tsubst_copy_and_build I simply pass input_location. I do not 
pass EXPR_LOCATION (t), which would cause a regression for the error @ line 
22 of pseudodtor3.C (because the location of t is UNKNOWN at that point 
during substitution, whereas in fact the final location of the error 
messages was already Ok). Nor do I pass EXPR_LOC_OR_HERE, which would buy 
nothing because tsubst_copy_and_build begins with code which assigns 
EXPR_LOCATION (t) to input_location, in case it's known.
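
The reasoning about why a "location or here" fallback buys nothing at that point can be sketched with a toy model. Everything below is hypothetical (toy_location, toy_input_location and the two functions are not GCC names); 0 plays the role of an unknown location, and the sketch assumes the fallback simply prefers the expression's own location when it has one.

```c
#include <assert.h>

/* Toy model, not GCC's real types: 0 stands for an unknown location.  */
typedef unsigned int toy_location;
#define TOY_UNKNOWN_LOCATION 0u

/* Stand-in for a global "current location".  */
toy_location toy_input_location = TOY_UNKNOWN_LOCATION;

/* Prefer the expression's own location, fall back to the current one.  */
toy_location
toy_loc_or_here (toy_location expr_loc)
{
  return expr_loc != TOY_UNKNOWN_LOCATION ? expr_loc : toy_input_location;
}

/* Models the start of a substitution routine that adopts the
   expression's location whenever it is known.  Once that has happened,
   the fallback above and the current location always agree, so passing
   either one gives the same result.  */
toy_location
toy_enter_substitution (toy_location expr_loc)
{
  if (expr_loc != TOY_UNKNOWN_LOCATION)
    toy_input_location = expr_loc;
  assert (toy_loc_or_here (expr_loc) == toy_input_location);
  return toy_input_location;
}
```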


Tested x86_64-linux.

Thanks!
Paolo.


2013-09-11  Paolo Carlini  

* semantics.c (finish_pseudo_destructor_expr): Add location_t
parameter.
* pt.c (unify_arg_conversion): Use EXPR_LOC_OR_HERE.
(tsubst_copy_and_build): Adjust finish_pseudo_destructor_expr
calls.
* parser.c (cp_parser_postfix_dot_deref_expression): Likewise.
(cp_parser_postfix_expression): Pass the proper location to
cp_parser_postfix_dot_deref_expression.

/testsuite
2013-09-11  Paolo Carlini  

* g++.dg/template/pseudodtor2.C: Add column number to dg-error
strings.
* g++.dg/template/pseudodtor3.C: Likewise.
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 202503)
+++ cp/cp-tree.h(working copy)
@@ -5734,7 +5734,7 @@ extern tree finish_call_expr  (tree, 
veclexer);
+  sloc = token->location;
   /* Some of the productions are determined by keywords.  */
   keyword = token->keyword;
   switch (keyword)
@@ -6019,7 +6021,7 @@ cp_parser_postfix_expression (cp_parser *parser, b
= cp_parser_postfix_dot_deref_expression (parser, token->type,
  postfix_expression,
  false, &idk,
- token->location);
+ sloc);
 
   is_member_access = true;
  break;
@@ -6338,7 +6340,7 @@ cp_parser_postfix_dot_deref_expression (cp_parser
  pseudo_destructor_p = true;
  postfix_expression
= finish_pseudo_destructor_expr (postfix_expression,
-s, type);
+s, type, location);
}
 }
 
Index: cp/pt.c
===
--- cp/pt.c (revision 202503)
+++ cp/pt.c (working copy)
@@ -5398,7 +5398,8 @@ unify_arg_conversion (bool explain_p, tree to_type
  tree from_type, tree arg)
 {
   if (explain_p)
-inform (input_location, "  cannot convert %qE (type %qT) to type %qT",
+inform (EXPR_LOC_OR_HERE (arg),
+   "  cannot convert %qE (type %qT) to type %qT",
arg, from_type, to_type);
   return 1;
 }
@@ -14292,9 +14293,10 @@ tsubst_copy_and_build (tree t,
 
 case PSEUDO_DTOR_EXPR:
   RETURN (finish_pseudo_destructor_expr
-   (RECUR (TREE_OPERAND (t, 0)),
-RECUR (TREE_OPERAND (t, 1)),
-tsubst (TREE_OPERAND (t, 2), args, complain, in_decl)));
+ (RECUR (TREE_OPERAND (t, 0)),
+  RECUR (TREE_OPERAND (t, 1)),
+  tsubst (TREE_OPERAND (t, 2), args, complain, in_decl),
+  input_location));
 
 case TREE_LIST:
   {
@@ -14423,7 +14425,8 @@ tsubst_copy_and_build (tree t,
  {
dtor = TREE_OPERAND (dtor, 0);
if (TYPE_P (dtor))
- RETURN (finish_pseudo_destructor_expr (object, s, dtor));
+ RETURN (finish_pseudo_destructor_expr
+ (object, s, dtor, input_location));
  }
  }
  }
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 202503)
+++ cp/semantics.c  (working copy)
@@ -2361,7 +2361,8 @@ finish_this_expr (void)
was of the form `OBJECT.SCOPE::~DESTRUCTOR'.  */
 
 tree
-finish_pseudo_destructor_expr (tree object, tree scope, tree destructor)
+finish_pseudo_destructor_expr (tree object, tree scope, tree destructor,
+  location_t loc)
 {
   if (object == error_mark_node || destructor == error_mark_node)
 return error_mark_node;
@@ -2372,15 +2373,16 @@ tree
 {
   if (scope == error_mark_node)

Re: [patch driver]: Fix relocatable toolchain path-replacement in driver

2013-09-11 Thread Joseph S. Myers

On Wed, 11 Sep 2013, Kai Tietz wrote:

> > update_path should, as I understand it, always be called with PATH being a
> > relocated path (one that has had the configured prefix converted to the
> > prefix where the toolchain is in fact installed, using
> > make_relative_prefix, or that doesn't need any relocation, e.g. a path
> > passed with a -B option).  In incpath.c, for example, you see how a path
> > is computed using make_relative_prefix before update_path is called.
> > Thus, it is correct for std_prefix to be the relocated prefix rather than
> > the unrelocated one.
> 
> I understand this differently.  See the translate_name function (used
> by update_path).  If path is empty it falls back to PREFIX (not
> std_prefix).  So I understand that this routine tries to cut off the
> compiled-in prefix from paths and replaces it by the key's path.  But
> well, I might have got this wrong.

It doesn't really make sense to me for there to be a mixture of direct 
replacement of the configure-time prefix based on keys, and first 
replacing the configure-time prefix with the install location and then 
replacing that based on keys.

I think the most appropriate design is as I said: make_relative_prefix, 
and nothing else, deals with replacing the configure-time prefix with the 
install location, and then prefix.c deals with any subsequent replacements 
of the install location.  On that basis, translate_name should be using 
std_prefix not PREFIX, and any callers of these prefix.c interfaces that 
pass paths still using the configure-time prefix should be fixed so that 
the make_relative_prefix substitution occurs first.

You've found evidence of an inconsistency.  A fix needs to be based on a 
clear design - a clear understanding of what sort of paths should be used 
where and what interfaces should translate them to other kinds of paths - 
rather than just locally changing the paths used in one particular place 
without a basis for arguing that the change fits a coherent design and so 
won't cause problems elsewhere.
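
As a rough illustration of the two-stage design described above (not the actual prefix.c or make_relative_prefix code), the sketch below uses hypothetical helpers replace_prefix and translate_path. It assumes plain string-prefix matching, which ignores directory boundaries, and treats the key-based replacement as one more prefix substitution.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if PATH starts with OLD_PREFIX, write PATH with
   that prefix replaced by NEW_PREFIX into OUT; otherwise copy PATH
   unchanged.  Returns 0 on success, -1 if OUT is too small.
   Simplification: this is a plain string match and does not check that
   the prefix ends at a directory separator.  */
int
replace_prefix (const char *path, const char *old_prefix,
                const char *new_prefix, char *out, size_t out_size)
{
  size_t old_len;
  int n;

  assert (path != NULL && old_prefix != NULL && new_prefix != NULL);
  old_len = strlen (old_prefix);
  if (strncmp (path, old_prefix, old_len) == 0)
    n = snprintf (out, out_size, "%s%s", new_prefix, path + old_len);
  else
    n = snprintf (out, out_size, "%s", path);
  return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
}

/* Stage 1 maps the configure-time prefix to the install location.
   Stage 2 applies a later substitution to the install location only,
   so it never has to know about the configure-time prefix.  */
int
translate_path (const char *path, const char *configured_prefix,
                const char *install_prefix, const char *key_prefix,
                char *out, size_t out_size)
{
  char relocated[1024];

  if (replace_prefix (path, configured_prefix, install_prefix,
                      relocated, sizeof relocated) != 0)
    return -1;
  return replace_prefix (relocated, install_prefix, key_prefix,
                         out, out_size);
}
```

The point of the split is that only one place ever mentions the configure-time prefix; mixing the two substitutions is what the message above calls an inconsistency.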

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH, libgcc] Disable JCR section when java is not enabled

2013-09-11 Thread Ian Lance Taylor

On Tue, Sep 10, 2013 at 2:01 AM, Joey Ye  wrote:
> Updated to http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01097.html
>
> Build passes on arm-none-eabi and bootstrap passes on x86.
>
> OK to trunk?
>
> ChangeLog
>   * libgcc/Makefile.in: Include JAVA_IS_ENABLED in CFLAGS.
>   * libgcc/configure.ac (java_is_enabled): New variable.
>   * libgcc/configure: Regenerated.
>   * libgcc/crtstuff.c: Check JAVA_IS_ENABLED.


The ChangeLog entries should be in libgcc/ChangeLog, and they should
not have the libgcc/ prefix on the file names.  Compare to the other
entries in that file.

This patch is OK for libgcc.

However, before committing it, I would like it to be approved by a
Java maintainer.  I've CC'ed the Java maintainers on this message.

Thanks.

Ian




> Index: Makefile.in
> ===
> --- Makefile.in (revision 194467)
> +++ Makefile.in (working copy)
> @@ -281,7 +281,8 @@
>-finhibit-size-directive -fno-inline -fno-exceptions \
>-fno-zero-initialized-in-bss -fno-toplevel-reorder -fno-tree-vectorize \
>-fno-stack-protector \
> -  $(INHIBIT_LIBC_CFLAGS)
> +  $(INHIBIT_LIBC_CFLAGS) \
> +  -DJAVA_IS_ENABLED=@java_is_enabled@
>
>  # Extra flags to use when compiling crt{begin,end}.o.
>  CRTSTUFF_T_CFLAGS =
> Index: configure.ac
> ===
> --- configure.ac(revision 194467)
> +++ configure.ac(working copy)
> @@ -204,6 +204,17 @@
> esac],
>[enable_sjlj_exceptions=auto])
>
> +# Disable jcr section if we are not building java
> +case ,${enable_languages}, in
> +  *,java,*)
> +java_is_enabled=1
> +;;
> +  *)
> +java_is_enabled=0
> +;;
> +esac
> +AC_SUBST(java_is_enabled)
> +
>  AC_CACHE_CHECK([whether to use setjmp/longjmp exceptions],
>  [libgcc_cv_lib_sjlj_exceptions],
>  [AC_LANG_CONFTEST(
> Index: crtstuff.c
> ===
> --- crtstuff.c  (revision 194467)
> +++ crtstuff.c  (working copy)
> @@ -145,6 +145,10 @@
>  # define USE_TM_CLONE_REGISTRY 1
>  #endif
>
> +#if !JAVA_IS_ENABLED
> +#undef JCR_SECTION_NAME
> +#endif
> +
>  /* We do not want to add the weak attribute to the declarations of these
> routines in unwind-dw2-fde.h because that will cause the definition of
> these symbols to be weak as well.
> Index: configure
> ===
> --- configure   (revision 194467)
> +++ configure   (working copy)
> @@ -566,6 +566,7 @@
>  set_use_emutls
>  set_have_cc_tls
>  vis_hide
> +java_is_enabled
>  fixed_point
>  enable_decimal_float
>  decimal_float
> @@ -4191,6 +4192,17 @@
>  fi
>
>
> +# Disable jcr section if we are not building java
> +case ,${enable_languages}, in
> +  *,java,*)
> +java_is_enabled=1
> +;;
> +  *)
> +java_is_enabled=0
> +;;
> +esac
> +
> +
>  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to use
> setjmp/longjmp exceptions" >&5
>  $as_echo_n "checking whether to use setjmp/longjmp exceptions... " >&6; }
>  if test "${libgcc_cv_lib_sjlj_exceptions+set}" = set; then :
>
>
>
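
The configure.ac fragment quoted above relies on a small shell idiom: wrapping ${enable_languages} in commas so that *,java,* only matches a whole list element. A minimal C sketch of the same check follows; language_enabled is a hypothetical name and the fixed buffer sizes are arbitrary assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the shell idiom
     case ,${enable_languages}, in *,java,*) ...
   Both the list and the name are wrapped in commas so that "java" does
   not match inside a longer name such as "javascript".  Returns 1 if
   LANG is an element of the comma-separated LIST, 0 otherwise (also 0
   if an input does not fit the fixed-size buffers).  */
int
language_enabled (const char *list, const char *lang)
{
  char wrapped_list[512];
  char wrapped_lang[64];
  int n;

  assert (list != NULL && lang != NULL);
  n = snprintf (wrapped_list, sizeof wrapped_list, ",%s,", list);
  if (n < 0 || (size_t) n >= sizeof wrapped_list)
    return 0;
  n = snprintf (wrapped_lang, sizeof wrapped_lang, ",%s,", lang);
  if (n < 0 || (size_t) n >= sizeof wrapped_lang)
    return 0;
  return strstr (wrapped_list, wrapped_lang) != NULL;
}
```

In the patch the result becomes a 0/1 value passed as -DJAVA_IS_ENABLED, which is what lets crtstuff.c test it with a plain #if.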

Re: [PATCH, libvtv] Fix configure/testsuite issues with libvtv

2013-09-11 Thread Caroline Tice

On Wed, Sep 11, 2013 at 12:07 PM, H.J. Lu  wrote:
> On Wed, Sep 11, 2013 at 11:22 AM, Caroline Tice  wrote:
>> This  patch should fix the issues that people were having with the
>> libvtv testsuite being run (and failing) when GCC was not configured
>> with --enable-vtable-verify.  I am still in the process of running
>> some tests, but at this point I am fairly confident that this patch
>> works correctly.  Even though I am the libvtv maintainer, I would
>> appreciate someone else reviewing this patch too, as a sanity check.
>>
>> -- Caroline Tice
>> cmt...@google.com
>>
>> 2013-09-11 Caroline Tice  
>>
>> * Makefile.am: Reinstate ENABLE_VTABLE_VERIFY checks, to make
>> sure testsuite is not run if libstdc++ and libgcc were not built
>> with vtable verification.
>> * Makefile.in: Regenerated.
>> * configure.ac: Reinstate checks for --enable-vtable-verify flag,
>> to make sure testsuite is not run if libstdc++ and libgcc were not
>> built with vtable verification.
>> * configure: Regenerated.
>
> I tried it.  I noticed 2 issues:
>
> 1. Correct me if I am wrong: libvtv doesn't work correctly if
> --enable-vtable-verify isn't used.  Why is libvtv built/installed at
> all when --enable-vtable-verify isn't used?


The short answer is "because Benjamin Kosnik thought we should do it
this way".  Benjamin, who helped me write most of the original
configure and Makefile stuff for libvtv, seemed to feel that it was
best to build libvtv whenever the architecture would allow it.  I do
not fully understand his reasoning, so it may be best to let him
answer this one.


> 2. Why does libvtv/configure.ac have
>
> echo 'MULTISUBDIR =' >> $ac_file
> ml_norecursion=yes
> . ${multi_basedir}/config-ml.in
>
> when AM_ENABLE_MULTILIB is used?  I got
>
> make[3]: Entering directory
> `/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libvtv'
> Makefile:844: warning: overriding recipe for target `multi-do'
> Makefile:775: warning: ignoring old recipe for target `multi-do'
> Makefile:892: warning: overriding recipe for target `multi-clean'
> Makefile:823: warning: ignoring old recipe for target `multi-clean'
>

I do not know exactly why it was done this way, but I see that the
configure.ac files for libsanitizer and libstdc++-v3 seem to do it
exactly this way as well.


Are either of these issues actually causing you problems at the moment?

-- Caroline
cmt...@google.com


> --
> H.J.

Re: RFC: patch to build GCC for arm with LRA

2013-09-11 Thread Yvan Roux

> Yeah, good point.  TBH I prefer it with separate ifs though, because the
> three cases are dealing with three different types of rtl (unary, binary
> and ternary).  But I don't mind much either way.

Ok, it's fine for me too.

> The new patch looks good to me, thanks.  Just one minor style nit:
> "return false" rather than "return 0" for the bool.  Maybe also change:
>
> /* Bitfield operations [SIGN|ZERO]_EXTRACT from the least significant
>bit can be used too.  */
>
> to something like:
>
> /* A [SIGN|ZERO]_EXTRACT from the least significant bit effectively
>acts as a combined truncation and extension.  */

Yeah, it's clearer.  I'll post the new patch in the other thread.

> I really will try to make that my last comment and leave things open
> for an official review :-)

:-) once again many thanks for your help Richard.

Cheers,
Yvan

Re: Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS

2013-09-11 Thread Graham Stott

Hi Richard,

Thanks, I'll give it a go tomorrow.

Not sure why it has only been seen/reported for MIPS so far;
I can't see why it couldn't happen in other backends.


Cheers
Graham

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-11 Thread Wei Mi

Taking the same issue slot is not enough for x86. The compare and
branch need to be consecutive in binary to be macro-fused on x86.
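
The adjacency requirement can be pictured with a toy instruction stream. The sketch below is purely illustrative: toy_insn and count_adjacent_cmp_branch are hypothetical, and real fusion rules depend on more than position.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction kinds; illustrative only, not GCC's rtx codes.  */
enum toy_insn { TOY_OTHER, TOY_COMPARE, TOY_BRANCH };

/* Count compare/branch pairs that satisfy the adjacency rule: the
   branch must immediately follow the compare in the emitted stream.  */
size_t
count_adjacent_cmp_branch (const enum toy_insn *insns, size_t n)
{
  size_t pairs = 0;

  assert (insns != NULL || n == 0);
  for (size_t i = 0; i + 1 < n; i++)
    if (insns[i] == TOY_COMPARE && insns[i + 1] == TOY_BRANCH)
      pairs++;
  return pairs;
}
```

A schedule that places any other instruction between the compare and the branch yields zero candidate pairs in this model, even though both instructions are still present.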

Thanks,
Wei Mi.

On Wed, Sep 11, 2013 at 10:45 AM, Andrew Pinski  wrote:
> On Wed, Sep 4, 2013 at 12:33 PM, Alexander Monakov  wrote:
>> On Wed, Sep 4, 2013 at 9:53 PM, Steven Bosscher wrote:
>>>
>>> On Wed, Sep 4, 2013 at 10:58 AM, Alexander Monakov wrote:
>>> > Hello,
>>> >
>>> > Could you use the existing facilities instead, such as adjust_priority 
>>> > hook,
>>> > or making the compare-branch insn sequence a SCHED_GROUP?
>>>
>>>
>>> Or a define_bypass?
>>
>> Hm, I don't think define_bypass would work: it still leaves the
>> scheduler freedom to move the compare up.
>
> Even though it allows the scheduler freedom to move the compare up,
> the scheduler only does so because the scheduling model is not correct
> for the processor.  I have done the same for Octeon2, where it is able to
> combine the compare and the branch, and found the resulting schedule is
> much better than even what this hack could do, because the instructions
> still take an issue slot.  Is it true that for these two processors it
> takes an issue slot, or is it being done before issue?
>
> Thanks,
> Andrew Pinski
>
>>
>> IMO adjust_priority would be preferable if it allows to achieve the goal.
>>
>> Alexander

Re: crash fix for unhanded operation

2013-09-11 Thread Mike Stump

On Sep 10, 2013, at 3:43 PM, Joseph S. Myers  wrote:
> For target-specific types with more fine-grained 
> restrictions on permitted operations, there are several target hooks such 
> as invalid_unary_op, invalid_binary_op and invalid_parameter_type.  That's 
> how the errors should be given, so that the invalid GIMPLE is never 
> generated.

Ah…  yes, that does the trick; however, the disconnect between rtl and gimple 
is annoying.  gimple (or rtl) is free to decompose operations: for example, xor 
can be decomposed into n independent xors of the parts of a larger piece of 
data.  If it does that, then the port should not give an error, and if 
gimple does not do this, then the port should.  But the port can't know 
whether to do this or not and still retain the flexibility to allow gimple 
lowering to improve over time.

As a concrete example, xor, or, gt, ge, lt, le lower on plain integer modes; 
but eq and ne don't.  Odd that.  I'd claim it is a mere implementation detail 
of lowering and requiring port work for an internal implementation detail is 
odd.
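
To make "decomposed into n independent xors of the parts" concrete, here is a small sketch on a toy wide integer stored as 32-bit parts. The names wide_xor and wide_eq are hypothetical. The second function shows one possible way an equality test could be built from parts; it is not a claim about how or why GCC's lowering treats eq and ne.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy wide integer made of N 32-bit parts; purely illustrative.  */

/* XOR decomposes directly: each result part depends only on the
   matching input parts, so this is N independent narrow XORs.  */
void
wide_xor (uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
  assert (dst != NULL && a != NULL && b != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] ^ b[i];
}

/* Equality yields a single result from all parts.  One possible
   part-wise formulation ORs the per-part differences together and
   tests the accumulated value once.  */
int
wide_eq (const uint32_t *a, const uint32_t *b, size_t n)
{
  uint32_t diff = 0;

  assert (a != NULL && b != NULL);
  for (size_t i = 0; i < n; i++)
    diff |= a[i] ^ b[i];
  return diff == 0;
}
```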

But, with the interfaces you mentioned I can solve the problem…  I'll plan on 
doing it that way, not ideal, but reasonable.

Thanks for the help.

Re: Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS

2013-09-11 Thread Richard Sandiford

Graham Stott  writes:
> Hi Richard,
>
> There is some minor testsuite fallout with these patches on MIPS: a
> couple of tests (see below) ICE in gen_int_mode ().  In both of these
> ICEs the mode is CCmode.

Hmm, interesting.  I suppose gen_int_mode should handle CC modes,
since there's no other constant rtx that can be used instead.  OTOH,
like you say, it doesn't really make sense to apply try_const_anchor
to CCmode.

How does the following patch look?

Thanks,
Richard


gcc/
* emit-rtl.c (gen_int_mode): Handle CC modes.
* cse.c (try_const_anchors): ...but punt on them here.

Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  2013-09-08 11:52:15.0 +0100
+++ gcc/emit-rtl.c  2013-09-11 19:32:35.702377902 +0100
@@ -417,6 +417,11 @@ gen_rtx_CONST_INT (enum machine_mode mod
 rtx
 gen_int_mode (HOST_WIDE_INT c, enum machine_mode mode)
 {
+  /* CONST_INT is used for CC modes too.  We can't make any assumptions
+ about the precision or bitsize in that case, so just pass the value
+ through unchanged.  */
+  if (GET_MODE_CLASS (mode) == MODE_CC)
+return GEN_INT (c);
   return GEN_INT (trunc_int_for_mode (c, mode));
 }
 
Index: gcc/cse.c
===
--- gcc/cse.c   2013-09-08 11:52:15.0 +0100
+++ gcc/cse.c   2013-09-11 19:38:17.664399826 +0100
@@ -1354,6 +1354,11 @@ try_const_anchors (rtx src_const, enum m
   rtx lower_exp = NULL_RTX, upper_exp = NULL_RTX;
   unsigned lower_old, upper_old;
 
+  /* CONST_INT is used for CC modes, but we should leave those alone.  */
+  if (GET_MODE_CLASS (mode) == MODE_CC)
+return NULL_RTX;
+
+  gcc_assert (SCALAR_INT_MODE_P (mode));
   if (!compute_const_anchors (src_const, &lower_base, &lower_offs,
  &upper_base, &upper_offs))
 return NULL_RTX;
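
The shape of the gen_int_mode change can be mimicked with a toy model. The sketch assumes that truncating a constant to an integer mode amounts to sign-extending its low bits; toy_mode_class, toy_trunc_for_width and toy_gen_int_mode are hypothetical names, not GCC interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Toy mode classes; illustrative stand-ins, not GCC's machine modes.  */
enum toy_mode_class { TOY_MODE_INT, TOY_MODE_CC };

/* Sign-extend the low BITS bits of C.  BITS must be in 1..64.  */
int64_t
toy_trunc_for_width (int64_t c, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  if (bits == 64)
    return c;
  uint64_t mask = (UINT64_C (1) << bits) - 1;
  uint64_t sign = UINT64_C (1) << (bits - 1);
  uint64_t low = (uint64_t) c & mask;
  /* Flip and then subtract the sign bit to propagate it upwards.  */
  return (int64_t) ((low ^ sign) - sign);
}

/* Same shape as the patched function: condition-code modes pass the
   value through untouched, integer modes canonicalise it.  */
int64_t
toy_gen_int_mode (int64_t c, enum toy_mode_class cls, unsigned bits)
{
  if (cls == TOY_MODE_CC)
    return c;
  return toy_trunc_for_width (c, bits);
}
```

The early return matters because a width is not meaningful for the condition-code case, so no truncation is attempted there.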

[PATCH, libvtv] Fix configure/testsuite issues with libvtv

2013-09-11 Thread Caroline Tice

This  patch should fix the issues that people were having with the
libvtv testsuite being run (and failing) when GCC was not configured
with --enable-vtable-verify.  I am still in the process of running
some tests, but at this point I am fairly confident that this patch
works correctly.  Even though I am the libvtv maintainer, I would
appreciate someone else reviewing this patch too, as a sanity check.

-- Caroline Tice
cmt...@google.com

2013-09-11 Caroline Tice  

* Makefile.am: Reinstate ENABLE_VTABLE_VERIFY checks, to make
sure testsuite is not run if libstdc++ and libgcc were not built
with vtable verification.
* Makefile.in: Regenerated.
* configure.ac: Reinstate checks for --enable-vtable-verify flag,
to make sure testsuite is not run if libstdc++ and libgcc were not
built with vtable verification.
* configure: Regenerated.



Re: RFC: patch to build GCC for arm with LRA

2013-09-11 Thread Richard Sandiford

Yvan Roux  writes:
> Yes indeed ! here is a fixed patch.
>
> In strip_address_mutations we now have 3 if/else if statements with
> the same body which could be factorized in:
>
>   if ((GET_RTX_CLASS (code) == RTX_UNARY)
>       /* Things like SIGN_EXTEND, ZERO_EXTEND and TRUNCATE can be
>          used to convert between pointer sizes.  */
>       || (lsb_bitfield_op_p (*loc))
>       /* Bitfield operations [SIGN|ZERO]_EXTRACT from the least significant
>          bit can be used too.  */
>       || (code == AND && CONST_INT_P (XEXP (*loc, 1))))
>     /* (and ... (const_int -X)) is used to align to X bytes.  */
>     loc = &XEXP (*loc, 0);
>
> if you think that it doesn't affect too much the readability.

Yeah, good point.  TBH I prefer it with separate ifs though, because the
three cases are dealing with three different types of rtl (unary, binary
and ternary).  But I don't mind much either way.

The new patch looks good to me, thanks.  Just one minor style nit:
"return false" rather than "return 0" for the bool.  Maybe also change:

/* Bitfield operations [SIGN|ZERO]_EXTRACT from the least significant
   bit can be used too.  */

to something like:

/* A [SIGN|ZERO]_EXTRACT from the least significant bit effectively
   acts as a combined truncation and extension.  */

I really will try to make that my last comment and leave things open
for an official review :-)

Thanks,
Richard
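
The suggested comment, that an extract from the least significant bit acts as a combined truncation and extension, can be checked on plain integers. The two functions below are a hypothetical sketch on 32-bit values, not rtl code, and they restrict the field length to 1..31 to keep the shifts simple.

```c
#include <assert.h>
#include <stdint.h>

/* Extract LEN bits starting at bit 0 of X, zero-extended.  */
uint32_t
toy_zero_extract_lsb (uint32_t x, unsigned len)
{
  assert (len >= 1 && len <= 31);
  /* Truncate to LEN bits; the upper bits become zero, which is the
     zero extension back to 32 bits.  */
  return x & ((UINT32_C (1) << len) - 1);
}

/* Extract LEN bits starting at bit 0 of X, sign-extended.  */
int32_t
toy_sign_extract_lsb (uint32_t x, unsigned len)
{
  assert (len >= 1 && len <= 31);
  uint32_t low = x & ((UINT32_C (1) << len) - 1);   /* truncation */
  uint32_t sign = UINT32_C (1) << (len - 1);
  /* Sign-extend the LEN-bit value back to 32 bits.  */
  return (int32_t) ((low ^ sign) - sign);
}
```

For a length of 8 or 16 the results match what a C cast through uint8_t/int8_t or uint16_t/int16_t would give, which is the "truncate, then extend" reading of the comment.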

Re: [patch 1/2] tree-flow.h restructuring

2013-09-11 Thread Andrew MacLeod

OK, here's the patch... virtually the same except I moved 
useless_type_conversion_p() and  types_compatible_p() to gimple.[ch] 
instead of tree.[ch].


Upon closer examination, the comments  for those functions are fine, 
they say that this is for middle-end types...


Bootstrapped with no new regressions on x86_64-unknown-linux-gnu, so just 
checking: OK to check in?

I'll follow it with the tree-ssanames.c patch once it finishes running.

Andrew





Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-11 Thread Alexander Monakov



On Wed, 11 Sep 2013, Wei Mi wrote:

> I tried that and it caused some regressions, so I chose to do
> chain_to_prev_insn another time in add_branch_dependences. There could
> be some dependence between those two functions.

(please don't top-post on this list)

In that case you can adjust 'last' in add_branch_dependences so that the
dependences pin the compare rather than the jump to the end, like this
(untested):

diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
index 2c971e2..a774d5d 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -2443,6 +2443,9 @@ add_branch_dependences (rtx head, rtx tail)
  cc0 setters remain at the end because they can't be moved away from
  their cc0 user.
 
+ Predecessors of SCHED_GROUP_P instructions that remain at the end also
+ remain at the end.
+
  COND_EXEC insns cannot be moved past a branch (see e.g. PR17808).
 
  Insns setting TARGET_CLASS_LIKELY_SPILLED_P registers (usually return
@@ -2465,6 +2468,7 @@ add_branch_dependences (rtx head, rtx tail)
 #endif
 || (!reload_completed
 && sets_likely_spilled (PATTERN (insn)
+|| (last != 0 && SCHED_GROUP_P (last))
 || NOTE_P (insn))
 {
   if (!NOTE_P (insn))

I'm also not a fan of adding two scheduler hooks and explicit handling in
sched-deps.c for this feature.  You probably could handle that with sched_init
hook entirely in the x86 backend (just loop over basic blocks and mark
suitable jumps with SCHED_GROUP_P), but on the other hand I can see an
argument that this might be useful in the future for other architectures.
Have you considered that?  What do other maintainers say?
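
The "loop over basic blocks and mark suitable jumps" idea can be sketched on a toy data structure. Nothing below is GCC code: toy_sched_insn, its sched_group field and toy_mark_fusible_jump are hypothetical, and the sketch assumes the compare already sits directly before the jump when the marking pass runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction record; the names are illustrative only.  */
enum toy_kind { TOY_K_OTHER, TOY_K_COMPARE, TOY_K_COND_JUMP };

struct toy_sched_insn
{
  enum toy_kind kind;
  bool sched_group;   /* "keep me right after the previous instruction" */
};

/* Look at one basic block and, if it ends in a conditional jump directly
   preceded by a compare, flag the jump so that a scheduler honouring the
   flag keeps the two together.  Returns true if the flag was set.  */
bool
toy_mark_fusible_jump (struct toy_sched_insn *block, size_t n)
{
  assert (block != NULL || n == 0);
  if (n < 2)
    return false;
  if (block[n - 1].kind != TOY_K_COND_JUMP
      || block[n - 2].kind != TOY_K_COMPARE)
    return false;
  block[n - 1].sched_group = true;
  return true;
}
```

Calling this once per block from an initialisation hook would keep the policy in one place, which is the appeal of the backend-only approach mentioned above.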

Thanks.

Alexander

Re: V4 Lambda templates and implicit function templates.

2013-09-11 Thread Adam Butcher


On 11.09.2013 16:25, Jason Merrill wrote:
> On 09/11/2013 10:42 AM, Jason Merrill wrote:
>> Sounds like the problem is that the compiler is trying to instantiate a
>> function while cp_unevaluated_operand is set.  But that shouldn't be an
>> issue because push_to_top_level clears cp_unevaluated_operand.  How does
>> it come to be set when instantiating the local variable?
>
> Ah, I see: it's because instantiate_decl doesn't push_to_top_level
> for function-local templates.  We still need to save/restore
> cp_unevaluated_operand in that case, and let's also do
> c_inhibit_evaluation_warnings.

Great, that fixes it.  Hadn't noticed it didn't happen at namespace 
scope.


Okay for the attached to go to trunk with suitable changelog?
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 22087fb..16e57b5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -18947,6 +18947,8 @@ instantiate_decl (tree d, int defer_ok,
   tree gen_tmpl;
   bool pattern_defined;
   location_t saved_loc = input_location;
+  int saved_unevaluated_operand = cp_unevaluated_operand;
+  int saved_inhibit_evaluation_warnings = c_inhibit_evaluation_warnings;
   bool external_p;
   tree fn_context;
   bool nested;
@@ -19158,8 +19160,13 @@ instantiate_decl (tree d, int defer_ok,
   nested = (current_function_decl != NULL_TREE);
   if (!fn_context)
 push_to_top_level ();
-  else if (nested)
-push_function_context ();
+  else
+{
+  if (nested)
+	push_function_context ();
+  cp_unevaluated_operand = 0;
+  c_inhibit_evaluation_warnings = 0;
+}
 
   /* Mark D as instantiated so that recursive calls to
  instantiate_decl do not try to instantiate it again.  */
@@ -19283,6 +19290,8 @@ instantiate_decl (tree d, int defer_ok,
 
 out:
   input_location = saved_loc;
+  cp_unevaluated_operand = saved_unevaluated_operand;
+  c_inhibit_evaluation_warnings = saved_inhibit_evaluation_warnings;
   pop_deferring_access_checks ();
   pop_tinst_level ();

Re: [PATCH] PR tree-optimization/58380

2013-09-11 Thread Paolo Carlini


Hi,

On 09/11/2013 07:40 PM, domi...@lps.ens.fr wrote:

The failures are (at least on x86_64-apple-darwin10):

/opt/gcc/work/gcc/testsuite/g++.dg/torture/pr58380.C: In static member function 
'static iplugin_factory& selection_to_stdout::get_factory()':
/opt/gcc/work/gcc/testsuite/g++.dg/torture/pr58380.C:167:79: warning: 
deprecated conversion from string constant to 'char*' [-Wwrite-strings]
/opt/gcc/work/gcc/testsuite/g++.dg/torture/pr58380.C:167:79: warning: 
deprecated conversion from string constant to 'char*' [-Wwrite-strings]

Yeah, everywhere.

By the way, the warning is definitely correct, thus either tweak that 
basic_string constructor to take a const char* (assuming the testcase is 
still fine as reproducer), or add a -w, or some other simple tweak will do.


Paolo.
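
The first option above (taking a const char*) can be sketched as follows.
This is only an illustration: tiny_string and literal_length are
hypothetical names standing in for the reduced testcase's string class,
which is not reproduced here.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the testcase's string class.
struct tiny_string {
    std::size_t len;
    // A const char* parameter lets a string literal bind directly, so the
    // deprecated literal-to-'char*' conversion never takes place.
    explicit tiny_string(const char* s) : len(std::strlen(s)) {}
};

// Constructs from a literal; with a 'char*' parameter this is the call
// that would draw the -Wwrite-strings warning.
inline std::size_t literal_length() {
    tiny_string s("abc");
    return s.len;
}
```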

Re: [PATCH] PR tree-optimization/58380

2013-09-11 Thread Dominique Dhumieres

The test g++.dg/torture/pr58380.C fails:

FAIL: g++.dg/torture/pr58380.C  -O0  (test for excess errors)
FAIL: g++.dg/torture/pr58380.C  -O1  (test for excess errors)
FAIL: g++.dg/torture/pr58380.C  -O2  (test for excess errors)
FAIL: g++.dg/torture/pr58380.C  -O3 -fomit-frame-pointer  (test for excess 
errors)
FAIL: g++.dg/torture/pr58380.C  -O3 -fomit-frame-pointer -funroll-loops  (test 
for excess errors)
FAIL: g++.dg/torture/pr58380.C  -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  (test for excess errors)
FAIL: g++.dg/torture/pr58380.C  -O3 -g  (test for excess errors)
FAIL: g++.dg/torture/pr58380.C  -Os  (test for excess errors)
FAIL: g++.dg/torture/pr58380.C  -O2 -flto -flto-partition=none  (test for 
excess errors)
FAIL: g++.dg/torture/pr58380.C  -O2 -flto  (test for excess errors)

(see also http://gcc.gnu.org/ml/gcc-testresults/2013-09/msg00839.html ).

The failures are (at least on x86_64-apple-darwin10):

/opt/gcc/work/gcc/testsuite/g++.dg/torture/pr58380.C: In static member function 
'static iplugin_factory& selection_to_stdout::get_factory()':
/opt/gcc/work/gcc/testsuite/g++.dg/torture/pr58380.C:167:79: warning: 
deprecated conversion from string constant to 'char*' [-Wwrite-strings]
/opt/gcc/work/gcc/testsuite/g++.dg/torture/pr58380.C:167:79: warning: 
deprecated conversion from string constant to 'char*' [-Wwrite-strings]


TIA

Dominique

Re: [patch driver]: Fix relocatable toolchain path-replacement in driver

2013-09-11 Thread Kai Tietz

2013/9/11 Joseph S. Myers :
> On Wed, 11 Sep 2013, Kai Tietz wrote:
>
>> This change fixes a quirk happening for relocated toolchains.  Driver
>> remembers original-build directory
>
> The original *build* directory should never be known to the driver; only
> the *configured* prefix.
>
> This area is complicated and subtle; you need to be very careful and
> precise in the analysis included in any patch submission related to it.
> Get things absolutely clear in your head and then write a complete,
> careful, precise self-contained explanation of everything relevant in GCC,
> as if for an audience not at all familiar with the code in question.

That I agree on, and it took me some time to find this.

>> in std_prefix variable for being able later to modify path.  Sadly
>> this std_prefix variable gets modified
>> later on, and so update_path can't work any longer as desired.  This
>
> You don't say what "as desired" is.  prefix.c contains a long description
> of *what* the path updates are, but no explanation of *why* or any overall
> design; it appears to be something Windows-specific.

Well, it might be Windows-specific.  It shows its bad side on Windows
because the old build prefix survives the translation and leads to an
attempt to access an invalid path.  That isn't necessarily a problem as
long as the drive letter exists.  Otherwise the user gets system failures.
I am pretty sure that it isn't related to make_relative_path, as
otherwise we would see the same behavior on other targets, too.

> update_path should, as I understand it, always be called with PATH being a
> relocated path (one that has had the configured prefix converted to the
> prefix where the toolchain is in fact installed, using
> make_relative_prefix, or that doesn't need any relocation, e.g. a path
> passed with a -B option).  In incpath.c, for example, you see how a path
> is computed using make_relative_prefix before update_path is called.
> Thus, it is correct for std_prefix to be the relocated prefix rather than
> the unrelocated one.

I understand this differently.  See the translate_name function (used
by update_path).  If path is empty it falls back to PREFIX (not
std_prefix).  So I understand that this routine tries to cut off the
compiled-in prefix from paths and replace it with the key's path.  But
well, I might have got this wrong.
The bad point about all this on native Windows targets is the use of
msys and its stupid auto-magic PATH translation (meaning it simply
converts everything it sees that begins with a slash ...).  Because of
this, it might indeed be a Windows-specific thing here.
Nevertheless, it is one I think we need to address ...

> If there is a bug, I'd say it's in whatever code is calling update_path
> without having first done the make_relative_prefix relocation on the path
> being passed to update_path.  Thus, it is that caller that you need to
> fix.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com

Regards,
Kai

Re: [PATCH V4 1/2] Support lambda templates.

2013-09-11 Thread Jason Merrill


OK.

Jason

Re: [PATCH, i386, MPX 1/X] Support of Intel MPX ISA

2013-09-11 Thread Uros Bizjak

On Tue, Sep 10, 2013 at 1:38 PM, Ilya Enkovich  wrote:
> Ping^4
>
> Could please someone look at this patch? It is mostly i386 target
> specific and is basic for further MPX based features.
>
> Thanks,
> Ilya
>
> 2013/9/2 Ilya Enkovich :
>> Ping^3
>>
>> Attached is the same patch but against the current trunk.
>>
>> 2013/8/26 Ilya Enkovich :
>>> Ping
>>>
>>> 2013/8/19 Ilya Enkovich :
>>>> Ping
>>>>
>>>> 2013/8/12 Ilya Enkovich :
> 2013/8/10 Joseph S. Myers :
>> On Mon, 29 Jul 2013, Ilya Enkovich wrote:
>>
>>> Hi,
>>>
>>> Here is updated version of the patch. I removed redundant
>>> mode_for_bound, added comments to BOUND_TYPE and added -mmpx option.
>>> I also fixed bndmk/bndldx/bndstx constraints to avoid incorrect
>>> register allocation (created two new constraints for that).
>>
>> I think the -mmpx option should be documented in invoke.texi, and the new
>> machine modes / mode class should be documented in rtl.texi where other
>> machine modes / mode classes are documented.  Beyond that, I have no
>> comments on this patch revision.
>>
>> --
>> Joseph S. Myers
>> jos...@codesourcery.com
>
> Thanks! Here is a new revision with -mmpx and new machine modes /
> class documented.
> Is it good to install to trunk?
>
> Thanks,
> Ilya
> ---
> 2013-08-12  Ilya Enkovich  
>
> * mode-classes.def (MODE_BOUND): New.
> * tree.def (BOUND_TYPE): New.
> * genmodes.c (complete_mode): Support MODE_BOUND.
> (BOUND_MODE): New.
> (make_bound_mode): New.
> * machmode.h (BOUND_MODE_P): New.
> * stor-layout.c (int_mode_for_mode): Support MODE_BOUND.
> (layout_type): Support BOUND_TYPE.
> * tree-pretty-print.c (dump_generic_node): Support BOUND_TYPE.
> * tree.c (build_int_cst_wide): Support BOUND_TYPE.
> (type_contains_placeholder_1): Likewise.
> * tree.h (BOUND_TYPE_P): New.
> * varasm.c (output_constant): Support BOUND_TYPE.
> * config/i386/constraints.md (B): New.
> (Ti): New.
> (Tb): New.
> * config/i386/i386-modes.def (BND32): New.
> (BND64): New.
> * config/i386/i386-protos.h (ix86_bnd_prefixed_insn_p): New.
> * config/i386/i386.c (isa_opts): Add mmpx.
> (regclass_map): Add bound registers.
> (dbx_register_map): Likewise.
> (dbx64_register_map): Likewise.
> (svr4_dbx_register_map): Likewise.
> (PTA_MPX): New.
> (ix86_option_override_internal): Support MPX ISA.
> (ix86_code_end): Add MPX bnd prefix.
> (output_set_got): Likewise.
> (ix86_output_call_insn): Likewise.
> (get_some_local_dynamic_name): Add '!' (MPX bnd) print prefix 
> support.
> (ix86_print_operand_punct_valid_p): Likewise.
> (ix86_print_operand_address): Support UNSPEC_BNDMK_ADDR and
> UNSPEC_BNDLDX_ADDR.
> (ix86_class_likely_spilled_p): Add bound regs support.
> (ix86_hard_regno_mode_ok): Likewise.
> (x86_order_regs_for_local_alloc): Likewise.
> (ix86_bnd_prefixed_insn_p): New.
> * config/i386/i386.h (FIRST_PSEUDO_REGISTER): Fix to new value.
> (FIXED_REGISTERS): Add bound registers.
> (CALL_USED_REGISTERS): Likewise.
> (REG_ALLOC_ORDER): Likewise.
> (HARD_REGNO_NREGS): Likewise.
> (TARGET_MPX): New.
> (VALID_BND_REG_MODE): New.
> (FIRST_BND_REG): New.
> (LAST_BND_REG): New.
> (reg_class): Add BND_REGS.
> (REG_CLASS_NAMES): Likewise.
> (REG_CLASS_CONTENTS): Likewise.
> (BND_REGNO_P): New.
> (ANY_BND_REG_P): New.
> (BNDmode): New.
> (HI_REGISTER_NAMES): Add bound registers.
> * config/i386/i386.md (UNSPEC_BNDMK): New.
> (UNSPEC_BNDMK_ADDR): New.
> (UNSPEC_BNDSTX): New.
> (UNSPEC_BNDLDX): New.
> (UNSPEC_BNDLDX_ADDR): New.
> (UNSPEC_BNDCL): New.
> (UNSPEC_BNDCU): New.
> (UNSPEC_BNDCN): New.
> (UNSPEC_MPX_FENCE): New.
> (BND0_REG): New.
> (BND1_REG): New.
> (type): Add mpxmov, mpxmk, mpxchk, mpxld, mpxst.
> (length_immediate): Likewise.
> (prefix_0f): Likewise.
> (memory): Likewise.
> (prefix_rep): Check for bnd prefix.
> (BND): New.
> (bnd_ptr): New.
> (BNDCHECK): New.
> (bndcheck): New.
> (*jcc_1): Add MPX bnd prefix and fix length.
> (*jcc_2): Likewise.
> (jump): Likewise.
> (simple_return_internal): Likewise.
> (simple_return_po

Re: [PATCH, PowerPC] Fix PR57949 (ABI alignment issue)

2013-09-11 Thread Jay

Isn't mixing and matching and mismatching somewhat inevitable? Libffi & gcc 
don't always come along with each other? One must never change the ABI?

 - Jay

On Sep 11, 2013, at 5:55 AM, Bill Schmidt  wrote:

> On Wed, 2013-09-11 at 21:08 +0930, Alan Modra wrote:
>> On Wed, Aug 14, 2013 at 10:32:01AM -0500, Bill Schmidt wrote:
>>> This fixes a long-standing problem with GCC's implementation of the
>>> PPC64 ELF ABI.  If a structure contains a member requiring 128-bit
>>> alignment, and that structure is passed as a parameter, the parameter
>>> currently receives only 64-bit alignment.  This is an error, and is
>>> incompatible with correct code generated by the IBM XL compilers.
>> 
>> This caused multiple failures in the libffi testsuite:
>> libffi.call/cls_align_longdouble.c
>> libffi.call/cls_align_longdouble_split.c
>> libffi.call/cls_align_longdouble_split2.c
>> libffi.call/nested_struct5.c
>> 
>> Fixed by making the same alignment adjustment in libffi to structures
>> passed by value.  Bill, I think your patch needs to go on all active
>> gcc branches as otherwise we'll need different versions of libffi for
>> the next gcc releases.
> 
> Hm, the libffi case is unfortunate. :(
> 
> The alternative is to leave libffi alone, and require code that calls
> these interfaces with "bad" structs passed by value to be built using
> -mcompat-align-parm, which was provided for such compatibility issues.
> Hopefully there is a small number of cases where this can happen, and
> this could be documented with libffi and gcc.  What do you think?
> 
> Thanks,
> Bill
> 
>> 
>> The following was bootstrapped and regression checked powerpc64-linux.
>> OK for mainline, and the 4.7 and 4.8 branches when/if Bill's patch
>> goes in there?
>> 
>>* src/powerpc/ffi.c (ffi_prep_args64): Align FFI_TYPE_STRUCT.
>>(ffi_closure_helper_LINUX64): Likewise.
>> 
>> Index: libffi/src/powerpc/ffi.c
>> ===
>> --- libffi/src/powerpc/ffi.c(revision 202428)
>> +++ libffi/src/powerpc/ffi.c(working copy)
>> @@ -462,6 +462,7 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long
>> double **d;
>>   } p_argv;
>>   unsigned long gprvalue;
>> +  unsigned long align;
>> 
>>   stacktop.c = (char *) stack + bytes;
>>   gpr_base.ul = stacktop.ul - ASM_NEEDS_REGISTERS64 - 
>> NUM_GPR_ARG_REGISTERS64;
>> @@ -532,6 +533,10 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long
>> #endif
>> 
>>case FFI_TYPE_STRUCT:
>> +  align = (*ptr)->alignment;
>> +  if (align > 16)
>> +align = 16;
>> +  next_arg.ul = ALIGN (next_arg.ul, align);
>>  words = ((*ptr)->size + 7) / 8;
>>  if (next_arg.ul >= gpr_base.ul && next_arg.ul + words > gpr_end.ul)
>>{
>> @@ -1349,6 +1354,7 @@ ffi_closure_helper_LINUX64 (ffi_closure *closure,
>>   long i, avn;
>>   ffi_cif *cif;
>>   ffi_dblfl *end_pfr = pfr + NUM_FPR_ARG_REGISTERS64;
>> +  unsigned long align;
>> 
>>   cif = closure->cif;
>>   avalue = alloca (cif->nargs * sizeof (void *));
>> @@ -1399,6 +1405,10 @@ ffi_closure_helper_LINUX64 (ffi_closure *closure,
>>  break;
>> 
>>case FFI_TYPE_STRUCT:
>> +  align = arg_types[i]->alignment;
>> +  if (align > 16)
>> +align = 16;
>> +  pst = ALIGN (pst, align);
>> #ifndef __LITTLE_ENDIAN__
>>  /* Structures with size less than eight bytes are passed
>> left-padded.  */
>
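
The adjustment made in the quoted libffi patch can be sketched in
isolation.  This is a minimal sketch and not libffi's code: align_up and
struct_arg_slot are hypothetical names, and it assumes the alignment is a
power of two.

```cpp
#include <cstdint>

// Round addr up to the next multiple of align.
// Assumption: align is a power of two.
constexpr std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t align) {
    return (addr + align - 1) & ~(align - 1);
}

// Address at which a by-value struct argument would start: the type's
// alignment is capped at 16 bytes before rounding, as in the patch.
constexpr std::uintptr_t struct_arg_slot(std::uintptr_t next_arg,
                                         std::uintptr_t type_align) {
    return align_up(next_arg, type_align > 16 ? 16 : type_align);
}
```

With next_arg at 0x1008, a struct needing 16-byte alignment moves up to
0x1010, while one needing only 8-byte alignment stays at 0x1008.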

Re: [PATCH V4 2/2] Support using 'auto' in a function parameter list to introduce an implicit template parameter.

2013-09-11 Thread Jason Merrill


On 09/09/2013 10:19 PM, Adam Butcher wrote:

+ if (current_class_type && LAMBDA_TYPE_P (current_class_type))
+   {
+ if (cxx_dialect < cxx1y)
+   pedwarn (location_of (type), 0,
+"use of %<auto%> in lambda parameter declaration "
+"only available with "
+"-std=c++1y or -std=gnu++1y");
+   }
+ else
+   pedwarn (location_of (type), OPT_Wpedantic,
+"ISO C++ forbids use of %<auto%> in parameter "
+"declaration");


I think we want to limit the implicit template extension to C++1y mode 
as well.


OK with that change.

Jason
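
For reference, the lambda form that the first diagnostic above concerns
looks like this in user code.  It is a small sketch that assumes a
compiler in C++14 (at the time, -std=c++1y) mode; generic_lambda_demo is
a made-up name.

```cpp
// 'auto' in the lambda's parameter list makes its call operator a
// template, so one lambda object can be called with different types.
inline double generic_lambda_demo() {
    auto twice = [](auto x) { return x + x; };
    return twice(2) + twice(1.5);  // one int call, one double call
}
```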

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-11 Thread Wei Mi

I tried that and it caused some regressions, so I chose to call
chain_to_prev_insn a second time in add_branch_dependences. There could
be some dependence between those two functions.

On Wed, Sep 11, 2013 at 2:58 AM, Alexander Monakov  wrote:
>
>
> On Tue, 10 Sep 2013, Wei Mi wrote:
>
>> Because deps_analyze_insn only analyzes data deps but no control deps.
>> Control deps are included by add_branch_dependences. Without the
>> chain_to_prev_insn in the end of add_branch_dependences, jmp will be
>> control dependent on every previous insn in the same bb, and the cmp
>> and jmp group could still be scheduled apart since they will not be
>> put in ready list at the same time.
>
> Would calling add_branch_dependences before sched_analyze solve that, then?
>
> Alexander

Re: V4 Lambda templates and implicit function templates.

2013-09-11 Thread Jason Merrill


On 09/11/2013 10:42 AM, Jason Merrill wrote:

Sounds like the problem is that the compiler is trying to instantiate a
function while cp_unevaluated_operand is set.  But that shouldn't be an
issue because push_to_top_level clears cp_unevaluated_operand.  How does
it come to be set when instantiating the local variable?


Ah, I see: it's because instantiate_decl doesn't push_to_top_level for 
function-local templates.  We still need to save/restore 
cp_unevaluated_operand in that case, and let's also do 
c_inhibit_evaluation_warnings.


Jason
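
The save/clear/restore idea described above can be sketched outside of
GCC.  Everything here is a stand-in: the two globals only imitate
cp_unevaluated_operand and c_inhibit_evaluation_warnings, and the
ScopedClearFlags guard is a hypothetical RAII variant, not the code that
would go into instantiate_decl.

```cpp
// Stand-ins for the two global counters named in the thread.
int unevaluated_operand = 0;
int inhibit_evaluation_warnings = 0;

// Hypothetical guard: remembers both flags, clears them for the lifetime
// of the object, and restores the saved values on destruction.
class ScopedClearFlags {
public:
    ScopedClearFlags()
        : saved_unevaluated_(unevaluated_operand),
          saved_inhibit_(inhibit_evaluation_warnings) {
        unevaluated_operand = 0;
        inhibit_evaluation_warnings = 0;
    }
    ~ScopedClearFlags() {
        unevaluated_operand = saved_unevaluated_;
        inhibit_evaluation_warnings = saved_inhibit_;
    }
    ScopedClearFlags(const ScopedClearFlags&) = delete;
    ScopedClearFlags& operator=(const ScopedClearFlags&) = delete;

private:
    int saved_unevaluated_;
    int saved_inhibit_;
};

// Returns the flag value observed while the guard is active.
inline int instantiate_local_sketch() {
    ScopedClearFlags guard;
    return unevaluated_operand;
}
```

If the flag is 1 before the call, the function sees 0 inside and the
caller sees 1 again afterwards.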

Re: [patch driver]: Fix relocatable toolchain path-replacement in driver

2013-09-11 Thread Joseph S. Myers

On Wed, 11 Sep 2013, Kai Tietz wrote:

> This change fixes a quirk happening for relocated toolchains.  Driver
> remembers original-build directory

The original *build* directory should never be known to the driver; only 
the *configured* prefix.

This area is complicated and subtle; you need to be very careful and 
precise in the analysis included in any patch submission related to it.  
Get things absolutely clear in your head and then write a complete, 
careful, precise self-contained explanation of everything relevant in GCC, 
as if for an audience not at all familiar with the code in question.

> in std_prefix variable for being able later to modify path.  Sadly
> this std_prefix variable gets modified
> later on, and so update_path can't work any longer as desired.  This

You don't say what "as desired" is.  prefix.c contains a long description 
of *what* the path updates are, but no explanation of *why* or any overall 
design; it appears to be something Windows-specific.

update_path should, as I understand it, always be called with PATH being a 
relocated path (one that has had the configured prefix converted to the 
prefix where the toolchain is in fact installed, using 
make_relative_prefix, or that doesn't need any relocation, e.g. a path 
passed with a -B option).  In incpath.c, for example, you see how a path 
is computed using make_relative_prefix before update_path is called.  
Thus, it is correct for std_prefix to be the relocated prefix rather than 
the unrelocated one.

If there is a bug, I'd say it's in whatever code is calling update_path 
without having first done the make_relative_prefix relocation on the path 
being passed to update_path.  Thus, it is that caller that you need to 
fix.

-- 
Joseph S. Myers
jos...@codesourcery.com
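
The relocation step described above (configured prefix converted to the
installed prefix before update_path is called) can be illustrated with a
toy function.  This is an assumption-laden sketch: relocate_path is a
hypothetical helper, not make_relative_prefix or update_path, and it
only handles '/'-separated paths.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: if path lies under configured_prefix, rebase it
// onto installed_prefix; otherwise return it unchanged.
inline std::string relocate_path(const std::string& path,
                                 const std::string& configured_prefix,
                                 const std::string& installed_prefix) {
    const std::size_t n = configured_prefix.size();
    // Match whole path components only, so "/usr/localx" is not treated
    // as being under "/usr/local".
    const bool under_prefix =
        path.compare(0, n, configured_prefix) == 0 &&
        (path.size() == n || path[n] == '/');
    if (!under_prefix)
        return path;  // e.g. a path given with -B: nothing to relocate
    return installed_prefix + path.substr(n);
}
```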

[linaro/gcc-4_8-branch] Backports from trunk and merge from gcc-4_8-branch

2013-09-11 Thread Christophe Lyon

Hi,

We have recently committed backports from trunk to linaro/gcc-4_8-branch:

201624,201666   (as 202196)
201589  (as 202276)
201599  (as 202277)
201730,201731   (as 202278)
199527,199792,199814,201435 (as 202280)
201342,201566   (as 202294)
201249,201267   (as 202295)
201412  (as 202383)
200593,201024,201025,201122,201124,201126 (as 202424)
200197,201411(as 202430)

I have also merged the gcc-4_8-branch into linaro/gcc-4_8-branch up to
revision 202157 as 202439.

Christophe.

Re: V4 Lambda templates and implicit function templates.

2013-09-11 Thread Jason Merrill


On 09/11/2013 03:38 AM, Adam Butcher wrote:

This is not a complete enough description.  It only ICEs instantiating
the call op through the decltype return of the conversion op if the
return type of the call op is a deduced one (i.e. unspecified or
specified explicitly as 'auto').  If the lambda call op is instantiated
explicitly (i.e. the lambda is called) prior to using the conversion op
then all is well.  It seems to only occur if there are variables
declared within the lambda body or accessible via the lambda's 'this'.


Sounds like the problem is that the compiler is trying to instantiate a 
function while cp_unevaluated_operand is set.  But that shouldn't be an 
issue because push_to_top_level clears cp_unevaluated_operand.  How does 
it come to be set when instantiating the local variable?


Jason
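
The shape of code being discussed (a generic lambda with a deduced return
type and a local variable in its body, used only through its conversion
to a function pointer) can be sketched like this.  It is not the failing
testcase from the thread, just an assumed minimal example of the pattern;
call_through_pointer is a made-up name.

```cpp
// The lambda is never called directly: the conversion to 'int (*)(int)'
// is what forces the call operator to be instantiated for 'int'.
inline int call_through_pointer(int v) {
    int (*fp)(int) = [](auto x) {
        auto local = x + 1;  // variable declared inside the lambda body
        return local;        // return type is deduced
    };
    return fp(v);
}
```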

Re: [PATCH, PR 57748] Check for out of bounds access

2013-09-11 Thread Richard Biener

On Wed, Sep 11, 2013 at 3:41 PM, Bernd Edlinger
 wrote:
> On Tue, 10 Sep 2013 21:32:29, Martin Jambor wrote:
>> Hi,
>>
>> On Fri, Sep 06, 2013 at 11:19:18AM +0200, Richard Biener wrote:
>>> On Fri, Sep 6, 2013 at 10:35 AM, Bernd Edlinger
>>>  wrote:
>>>> Richard,
>>>>
>>>>> But the movmisalign path skips all this code and with the
>>>>> current code thinks the actual access is in the mode of the
>>>>> whole structure. (and also misses the address adjustment
>>>>> as shown in the other testcase for the bug)
>>>>>
>>>>> The movmisalign handling in this path is just broken. And
>>>>> it's _not_ just for optimization! If you have
>>>>>
>>>>> struct __attribute__((packed)) {
>>>>> double x;
>>>>> v2df y;
>>>>> } *p;
>>>>>
>>>>> and do
>>>>>
>>>>> p = malloc (48); // gets you 8-byte aligned memory
>>>>> p->y = (v2df) { 1., 2. };
>>>>>
>>>>> then you cannot skip the movmisalign handling because
>>>>> we'd generate an aligned move (based on the mode) then.
>>>>>
>>>>
>>>> Ok, test examples are really helpful here.
>>>>
>>>> This time the structure is BLKmode, unaligned,
>>>> movmisalign = false anyway.
>>>>
>>>> I tried to make a test case out of your example,
>>>> and as I expected, the code is also correct:
>>>>
>>>> foo:
>>>> .cfi_startproc
>>>> movdqa .LC1(%rip), %xmm0
>>>> movq $-1, (%rdi)
>>>> movl $0x4048f5c3, 8(%rdi)
>>>> movdqu %xmm0, 12(%rdi)
>>>> ret
>>>>
>>>> movdqu.
>>>>
>>>> The test executes without trap.
>>>> And I did everything to make the object unaligned.
>>>
>>> Yeah, well - for the expand path with BLKmode it's working fine.
>>> We're doing all
>>> the dances because of correctness for non-BLKmode expand paths.
>>>
>>>> I am sure we could completely remove the
>>>> movmisalign path, and nothing would happen.
>>>> probably. except maybe for a performance regression.
>>>
>>> In this place probably yes. But we can also fix it to use the proper mode 
>>> and
>>> properly do the offset adjustment. But just adding a bounds check looks
>>> broken to me.
>>>
>>> Note that there may have been a correctness reason for this code in the
>>> light of the IPA-SRA code. Maybe Martin remembers.
>>>
>>
>> The misalignp path was added by you during the 4.7 development to fix
>> PR 50444 which was indeed about expansion of a SRA generated statement
>>
>> MEM[(struct Engine *)e_1(D) + 40B].m = SR.18_17;
>>
>> If I disable this path on the 4.7 branch, the testcase is compiled
>> incorrectly and aborts when run, apparently at least the 4.7's
>> combination of expand_normal and store_field cannot cope with it.
>>
>> The path no longer tests the testcase though, because the component
>> ref is not present in trunk, the LHS is now just
>>
>> MEM[(struct Engine *)e_3(D) + 40B]
>>
>> and so it is now handled just fine by the misaligned mem-ref case at
>> the beginning of expand_assignment.
>
> I tried to remove the misaligned mem-ref case at the beginning of the
> expand_assignment, just to see how it fails. I did this on trunk.
>
> But still all testcases pass,  including pr50444.c...
>
> How can this be?

Not sure.  Can you try reverting the fix itself and bisect when the testcase
will start to pass?  (maybe the testcase passed even without the fix?)

Richard.

>>> If removing the movmisalign handling inside the handled-component
>>> case in expand_assignment works out (make sure to also test a
>>> strict-alignment target) then that is probably fine.
>>>
>>
>> I think we'd also better check that we do have a test where we expand
>> a COMPONENT_REF encapsulating a misaligned MEM_REF and a misaligned
>> MEM_REF that is mem_ref_refers_to_non_mem_p.
>>
>> I'm now going through all the new comments in bugzilla and the
>> testcases to see if I can still be of any help.
>>
>> Martin

RE: [PATCH, PR 57748] Check for out of bounds access

2013-09-11 Thread Bernd Edlinger

On Tue, 10 Sep 2013 21:32:29, Martin Jambor wrote:
> Hi,
>
> On Fri, Sep 06, 2013 at 11:19:18AM +0200, Richard Biener wrote:
>> On Fri, Sep 6, 2013 at 10:35 AM, Bernd Edlinger
>>  wrote:
>>> Richard,
>>>
>>>> But the movmisalign path skips all this code and with the
>>>> current code thinks the actual access is in the mode of the
>>>> whole structure. (and also misses the address adjustment
>>>> as shown in the other testcase for the bug)
>>>>
>>>> The movmisalign handling in this path is just broken. And
>>>> it's _not_ just for optimization! If you have
>>>>
>>>> struct __attribute__((packed)) {
>>>> double x;
>>>> v2df y;
>>>> } *p;
>>>>
>>>> and do
>>>>
>>>> p = malloc (48); // gets you 8-byte aligned memory
>>>> p->y = (v2df) { 1., 2. };
>>>>
>>>> then you cannot skip the movmisalign handling because
>>>> we'd generate an aligned move (based on the mode) then.
>>>>
>>>
>>> Ok, test examples are really helpful here.
>>>
>>> This time the structure is BLKmode, unaligned,
>>> movmisalign = false anyway.
>>>
>>> I tried to make a test case out of your example,
>>> and as I expected, the code is also correct:
>>>
>>> foo:
>>> .cfi_startproc
>>> movdqa .LC1(%rip), %xmm0
>>> movq $-1, (%rdi)
>>> movl $0x4048f5c3, 8(%rdi)
>>> movdqu %xmm0, 12(%rdi)
>>> ret
>>>
>>> movdqu.
>>>
>>> The test executes without trap.
>>> And I did everything to make the object unaligned.
>>
>> Yeah, well - for the expand path with BLKmode it's working fine.
>> We're doing all
>> the dances because of correctness for non-BLKmode expand paths.
>>
>>> I am sure we could completely remove the
>>> movmisalign path, and nothing would happen.
>>> probably. except maybe for a performance regression.
>>
>> In this place probably yes. But we can also fix it to use the proper mode and
>> properly do the offset adjustment. But just adding a bounds check looks
>> broken to me.
>>
>> Note that there may have been a correctness reason for this code in the
>> light of the IPA-SRA code. Maybe Martin remembers.
>>
>
> The misalignp path was added by you during the 4.7 development to fix
> PR 50444 which was indeed about expansion of a SRA generated statement
>
> MEM[(struct Engine *)e_1(D) + 40B].m = SR.18_17;
>
> If I disable this path on the 4.7 branch, the testcase is compiled
> incorrectly and aborts when run, apparently at least the 4.7's
> combination of expand_normal and store_field cannot cope with it.
>
> The path no longer tests the testcase though, because the component
> ref is not present in trunk, the LHS is now just
>
> MEM[(struct Engine *)e_3(D) + 40B]
>
> and so it is now handled just fine by the misaligned mem-ref case at
> the beginning of expand_assignment.

I tried to remove the misaligned mem-ref case at the beginning of the
expand_assignment, just to see how it fails. I did this on trunk.

But still all testcases pass,  including pr50444.c...

How can this be?

>> If removing the movmisalign handling inside the handled-component
>> case in expand_assignment works out (make sure to also test a
>> strict-alignment target) then that is probably fine.
>>
>
> I think we'd also better check that we do have a test where we expand
> a COMPONENT_REF encapsulating a misaligned MEM_REF and a misaligned
> MEM_REF that is mem_ref_refers_to_non_mem_p.
>
> I'm now going through all the new comments in bugzilla and the
> testcases to see if I can still be of any help.
>
> Martin
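
The layout from the quoted packed-struct example can be checked with a
small standalone sketch.  It assumes the GCC/Clang vector_size
extension; packed_pair, store_y and load_y_lane are hypothetical names,
and the memcpy-based access is only one way to avoid assuming 16-byte
alignment in user code, not a description of what expand_assignment
does.

```cpp
#include <cstddef>
#include <cstring>

// 16-byte vector of two doubles (GCC/Clang extension).
typedef double v2df __attribute__((vector_size(16)));

struct __attribute__((packed)) packed_pair {
    double x;
    v2df y;  // placed at offset 8, so it is at best 8-byte aligned
};

// Store both lanes of y through memcpy so that no 16-byte alignment is
// assumed for the destination.
inline void store_y(packed_pair* p, double a, double b) {
    const double vals[2] = {a, b};
    std::memcpy(reinterpret_cast<char*>(p) + offsetof(packed_pair, y),
                vals, sizeof vals);
}

// Read back one lane (0 or 1) of y the same way.
inline double load_y_lane(const packed_pair* p, int lane) {
    double vals[2];
    std::memcpy(vals,
                reinterpret_cast<const char*>(p) + offsetof(packed_pair, y),
                sizeof vals);
    return vals[lane];
}
```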

Re: [patch 1/2] tree-flow.h restructuring

2013-09-11 Thread Richard Biener

On Wed, Sep 11, 2013 at 2:30 PM, Andrew MacLeod  wrote:
> On 09/11/2013 04:45 AM, Richard Biener wrote:
>>
>> On Tue, Sep 10, 2013 at 9:19 PM, Andrew MacLeod 
>> wrote:
>>>
>>> Here's a start at restructuring the whole tree-flow.h mess that we
>>> created
>>> back in the original wild west tree-ssa days.
>>>
>>> First, almost everyone includes tree-flow.h because it became the kitchen
>>> sink of functionality.  Really, what we ought to have is a tree-ssa.h
>>> which
>>> anything that uses basic tree-ssa functionality includes, and that'll be
>>> the
>>> main include for SSA passes. Other prototypes and such should come from
>>> other appropriate places. Doing this in one step is basically impossible.
>>> so
>>> here is that first few steps, structured so that it's easier to review.
>>>
>>> I changed everywhere which includes tree-flow.h  to include tree-ssa.h
>>> instead.   tree-ssa.h includes tree-flow.h first, which makes it
>>> functionally the same from a compiling point of view.
>>
>> You mean that doing the restructure and only adding includes that are
>> required to make compile work again wasn't feasible...?
>
> Not really for this first split... everything that required tree-flow.h
> basically requires tree-ssa.h. Once everything has been separated from
> tree-flow.h and put in its right place, it's easy to see exactly which
> parts are required where, and by tackling each .c file that uses
> tree-ssa.h once, I can get it down to the bare bones of what was required
> from the original bloated tree-flow.h.
>
>>> I also moved everything from tree-flow.h and tree-flow-inline.h that is
>>> related to tree-ssa.c functions into tree-ssa.h.  There were also a few
>>> function prototypes sprinkled in a  couple of other headers which I moved
>>> into tree-ssa.h as well.  I have verified that every exported function in
>>> tree-ssa.c has a proto in tree-ssa.h, and is listed in the same order as
>>> the
>>> .h file.
>>
>> Note that in general there isn't a thing like "tree SSA" anymore - it's
>> "GIMPLE SSA" now ;)  Not that we want gigantic renaming occurring now,
>> just if you invent completely new .[ch] files then keep that in mind.
>
>
> I considered just making it just ssa.h, but maybe this isn't the time...
> Probably easier to do a mass rename very early in stage 1... until then I
> figure it probably makes sense to make the .h match the .c file.. yes,
> for a new pair, I'd not bother with tree-*
>
>>
>> That is, if you moved stuff to tree-ssa.h that has no definition in
>> tree-ssa.c
>> then I'd rather have a new gimple-ssa.h that doesn't have a corresponding
>> .c file (for now).
>
> Ah, I see.   Although mulling it over last night, I'm marginally in favour
> of having a separate .h for each .c... Then things like
> tree-ssa-ununit.h which are only used in one or two places can simply be
> included in the one or two .c files they are required for.  That way we
> preserve tree-ssa.h to only include the "commonly required" .h files which 6
> or more .c files require. (to pick an arbitrary number :-)   This will help
> reduce the include web for rebuilding/compiling.  I also note your response
> in the other patch... First I will try to restructure what's in the file
> before resorting to a new .h file in these cases.  I'll keep a
> gimple-ssa.h "aggregator" on the backburner until it looks like it will be
> needed.
>
>>> Compiling this change indicated that there were a few files which
>>> required
>>> functionality from tree-ssa.c which really don't belong there.  In
>>> particular,  useless_type_conversion_p and types_compatible_p are used by
>>> the C++ front end and other places which don't really care about SSA.  So
>>> I
>>> moved them into more appropriate places... tree.c and tree.h
>>
>> Well, those functions are only valid when we are in GIMPLE (the GIMPLE
>> type system is less strict than the GENERIC one), so tree.[ch] isn't
>> appropriate.
>> gimple.[ch] would be.  But I wonder where it's used from the FEs - that's
>> surely
>> looking wrong (unless it's in a langhook).  But yes, the functions are not
>> in any way about SSA but only GIMPLE.
>
>
> OK, that wasn't obvious from the function itself (it only ever deals with
> trees), but that being the case, I'll move them both to gimple.[ch] instead.
> That seems more appropriate and I will comment it as gimple compatible only.
>
> Looks like there is a langhook...
> c-family/c-common.c:  && lang_hooks.types_compatible_p (TREE_TYPE (t1),
> TREE_TYPE (t2)))
>
> However each front end seems to provide their own version of
> types_compatible_p and use that... ( c_types_compatible_p and
> cxx_types_compatible_p).   lto.c has gimple_types_compatible_p()... but does
> also use types_compatible_p() in lto-symtab.c  :-P

Yeah, but lang_hooks.types_compatible_p () is not equal to
types_compatible_p () :P

The langhook seems to be only used from c-family/ and thus is a C-family hook
now ;)

Richard.

>
> Andrew

[PATCH, testsuite] Add testcase from PR 58371

2013-09-11 Thread Martin Jambor

Hi,

the bug I fixed yesterday was reported a day earlier as PR 58371.
This patch adds the testcase from that PR to the testsuite.  Honza
asked for it, so I assume it has been pre-approved and I will commit
it later today or early tomorrow if there are no objections.

Thanks,

Martin


2013-09-11  Martin Jambor  

PR ipa/58371
* g++.dg/ipa/pr58371.C: New test.

Index: src/gcc/testsuite/g++.dg/ipa/pr58371.C
===
--- /dev/null
+++ src/gcc/testsuite/g++.dg/ipa/pr58371.C
@@ -0,0 +1,204 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+
+typedef int size_t;
+namespace {
+template < typename > struct char_traits;
+}
+namespace __gnu_cxx {
+template < typename > class new_allocator {
+};
+}
+namespace std {
+template < typename _Tp > class allocator:__gnu_cxx::new_allocator < _Tp > {
+public:
+  size_t size_type;
+  typedef _Tp & const_reference;
+  template < typename > struct rebind {
+typedef allocator other;
+  };
+};
+}
+namespace __gnu_cxx {
+template < typename _Alloc > struct __alloc_traits {
+  typedef typename _Alloc::const_reference const_reference;
+  template < typename _Tp > struct rebind {
+typedef typename _Alloc::template rebind < _Tp >::other other;
+  };
+};
+}
+namespace std {
+struct __numeric_limits_base {
+};
+template < typename _Tp > struct numeric_limits:__numeric_limits_base {
+  static _Tp max () {
+  }
+};
+template < typename _Tp, typename _Alloc > struct _Vector_base {
+  typedef typename __gnu_cxx::__alloc_traits < _Alloc >::template rebind <
+  _Tp >::other _Tp_alloc_type;
+};
+template < typename _Tp, typename _Alloc = std::allocator < _Tp > >class vector:_Vector_base < _Tp,
+  _Alloc
+> {
+  typedef _Vector_base < _Tp, _Alloc > _Base;
+  typedef typename _Base::_Tp_alloc_type _Tp_alloc_type;
+  typedef __gnu_cxx::__alloc_traits < _Tp_alloc_type > _Alloc_traits;
+public:
+  _Tp value_type;
+  typedef typename _Alloc_traits::const_reference const_reference;
+  typedef size_t size_type;
+  size_type size () {
+  } const_reference operator[] (size_type) {
+  }
+};
+template < typename _CharT, typename =
+char_traits < _CharT > >class basic_ostream;
+typedef basic_ostream < int >ostream;
+class ios_base {
+};
+template < typename, typename > class basic_ios:ios_base {
+};
+template < typename _CharT, typename _Traits > class basic_ostream:basic_ios < _CharT,
+  _Traits
+> {
+public:
+  _CharT char_type;
+  typedef basic_ostream __ostream_type;
+  __ostream_type & operator<< (const void *) {
+  }
+};
+}
+namespace logging {
+int GetMinLogLevel ();
+typedef int LogSeverity;
+LogSeverity LOG_ERROR_REPORT;
+LogSeverity LOG_DCHECK;
+class LogMessage {
+public:
+  LogMessage (const char *, int, LogSeverity);
+  std::ostream & stream () {
+  }
+};
+class LogMessageVoidify {
+public:
+  LogMessageVoidify () {
+  } void operator& (std::ostream &) {
+  }
+};
+}
+namespace base {
+namespace internal {
+class WeakPtrBase {
+};
+class SupportsWeakPtrBase {
+};
+} template < typename T > class WeakPtr:internal::WeakPtrBase {
+public:
+  WeakPtr () :ptr_ () {
+  } T *operator-> () {
+logging:0 &&
+logging::LOG_DCHECK >=
+logging::GetMinLogLevel () ? (void) 0 : logging::
+LogMessageVoidify () & logging::
+LogMessage ("../../base/memory/weak_ptr.h", 0,
+logging::LOG_ERROR_REPORT).stream () << ". ";
+  } T *ptr_;
+};
+template < class > class SupportsWeakPtr:internal::SupportsWeakPtrBase {
+};
+}
+template < class ObserverType > class ObserverListBase:base::SupportsWeakPtr < ObserverListBase < ObserverType >
+> {
+public:
+  class Iterator {
+  public:
+Iterator (ObserverListBase & list) :max_index_ (0 ? std::numeric_limits <
+  size_t >::max () : list.observers_.
+  size () ) {
+} ObserverType *
+GetNext () {
+  ListType & observers = list_->observers_;
+  if (observers[0])
+++index_;
+}
+base::WeakPtr < ObserverListBase > list_;
+size_t
+index_;
+size_t
+max_index_;
+  };
+  typedef
+  std::vector <
+  ObserverType * >
+  ListType;
+  ListType
+  observers_;
+};
+template < class ObserverType, bool > class ObserverList:public ObserverListBase <
+ObserverType > {
+};
+namespace
+ProxyPrefs {
+enum ConfigState
+{ };
+}
+namespace
+net {
+class
+ProxyConfig {
+};
+class
+ProxyConfigService {
+public:
+  enum ConfigAvailability
+  { };
+  class
+  Observer {
+  public:
+Observer () {
+} virtual void
+OnProxyConfigChanged (const ProxyConfig &, ConfigAvailability) = 0;
+  };
+  virtual void
+  OnLazyPoll () {
+  }
+};
+}
+class
+  ChromeProxyConfigService:
+  net::ProxyConfigService,
+net::ProxyConfigService::Observer {
+  ConfigAvailability
+  GetLatestProxyConfig (net::ProxyConfig *);
+  void
+  UpdateProxyConfig (ProxyPrefs::ConfigState, const net::ProxyConfig &);
+  void
+  OnProxyConfigChanged (const net::ProxyConfig &, ConfigAvailability);
+

[PATCH] Move edge_within_scc to ipa-utils.c

2013-09-11 Thread Martin Jambor

Hi,

edge_within_scc should really be a part of API accompanying
ipa_reduced_postorder just like ipa_get_nodes_in_cycle and so this
patch moves it to ipa-utils.c and gives it the ipa_ prefix.

Bootstrapped and tested on x86_64-linux.  OK for trunk?

Thanks,

Martin



2013-09-10  Martin Jambor  

* ipa-utils.h (ipa_edge_within_scc): Declare.
* ipa-cp.c (edge_within_scc): Moved...
* ipa-utils.c (ipa_edge_within_scc): ...here.  Updated all callers.

Index: src/gcc/ipa-utils.c
===
--- src.orig/gcc/ipa-utils.c
+++ src/gcc/ipa-utils.c
@@ -253,6 +253,22 @@ ipa_get_nodes_in_cycle (struct cgraph_no
   return v;
 }
 
+/* Return true iff the CS is an edge within a strongly connected component as
+   computed by ipa_reduced_postorder.  */
+
+bool
+ipa_edge_within_scc (struct cgraph_edge *cs)
+{
+  struct ipa_dfs_info *caller_dfs = (struct ipa_dfs_info *) cs->caller->symbol.aux;
+  struct ipa_dfs_info *callee_dfs;
+  struct cgraph_node *callee = cgraph_function_node (cs->callee, NULL);
+
+  callee_dfs = (struct ipa_dfs_info *) callee->symbol.aux;
+  return (caller_dfs
+ && callee_dfs
+ && caller_dfs->scc_no == callee_dfs->scc_no);
+}
+
 struct postorder_stack
 {
   struct cgraph_node *node;
Index: src/gcc/ipa-utils.h
===
--- src.orig/gcc/ipa-utils.h
+++ src/gcc/ipa-utils.h
@@ -42,6 +42,7 @@ int ipa_reduced_postorder (struct cgraph
  bool (*ignore_edge) (struct cgraph_edge *));
 void ipa_free_postorder_info (void);
 vec ipa_get_nodes_in_cycle (struct cgraph_node *);
+bool ipa_edge_within_scc (struct cgraph_edge *);
 int ipa_reverse_postorder (struct cgraph_node **);
 tree get_base_var (tree);
 void ipa_merge_profiles (struct cgraph_node *dst,
Index: src/gcc/ipa-cp.c
===
--- src.orig/gcc/ipa-cp.c
+++ src/gcc/ipa-cp.c
@@ -287,22 +287,6 @@ ipa_lat_is_single_const (struct ipcp_lat
 return true;
 }
 
-/* Return true iff the CS is an edge within a strongly connected component as
-   computed by ipa_reduced_postorder.  */
-
-static inline bool
-edge_within_scc (struct cgraph_edge *cs)
-{
-  struct ipa_dfs_info *caller_dfs = (struct ipa_dfs_info *) cs->caller->symbol.aux;
-  struct ipa_dfs_info *callee_dfs;
-  struct cgraph_node *callee = cgraph_function_node (cs->callee, NULL);
-
-  callee_dfs = (struct ipa_dfs_info *) callee->symbol.aux;
-  return (caller_dfs
- && callee_dfs
- && caller_dfs->scc_no == callee_dfs->scc_no);
-}
-
 /* Print V which is extracted from a value in a lattice to F.  */
 
 static void
@@ -957,7 +941,7 @@ add_value_to_lattice (struct ipcp_lattic
   for (val = lat->values; val; val = val->next)
 if (values_equal_for_ipcp_p (val->value, newval))
   {
-   if (edge_within_scc (cs))
+   if (ipa_edge_within_scc (cs))
  {
struct ipcp_value_source *s;
for (s = val->sources; s ; s = s->next)
@@ -1030,7 +1014,7 @@ propagate_vals_accross_pass_through (str
  are arithmetic functions with circular dependencies, there is infinite
  number of them and we would just make lattices bottom.  */
   if ((ipa_get_jf_pass_through_operation (jfunc) != NOP_EXPR)
-  && edge_within_scc (cs))
+  && ipa_edge_within_scc (cs))
 ret = set_lattice_contains_variable (dest_lat);
   else
 for (src_val = src_lat->values; src_val; src_val = src_val->next)
@@ -1061,7 +1045,7 @@ propagate_vals_accross_ancestor (struct
   struct ipcp_value *src_val;
   bool ret = false;
 
-  if (edge_within_scc (cs))
+  if (ipa_edge_within_scc (cs))
 return set_lattice_contains_variable (dest_lat);
 
   for (src_val = src_lat->values; src_val; src_val = src_val->next)
@@ -2129,7 +2113,7 @@ propagate_constants_topo (struct topo_in
  struct cgraph_edge *cs;
 
  for (cs = v->callees; cs; cs = cs->next_callee)
-   if (edge_within_scc (cs)
+   if (ipa_edge_within_scc (cs)
&& propagate_constants_accross_call (cs))
  push_node_to_stack (topo, cs->callee);
  v = pop_node_from_stack (topo);
@@ -2146,7 +2130,7 @@ propagate_constants_topo (struct topo_in
estimate_local_effects (v);
add_all_node_vals_to_toposort (v);
for (cs = v->callees; cs; cs = cs->next_callee)
- if (!edge_within_scc (cs))
+ if (!ipa_edge_within_scc (cs))
propagate_constants_accross_call (cs);
  }
   cycle_nodes.release ();
@@ -3462,7 +3446,7 @@ spread_undeadness (struct cgraph_node *n
   struct cgraph_edge *cs;
 
   for (cs = node->callees; cs; cs = cs->next_callee)
-if (edge_within_scc (cs))
+if (ipa_edge_within_scc (cs))
   {
struct cgraph_node *callee;
struct ipa_node_params *info;
@@ -3493,7 +3477,7 @@ has_undead_caller_from_outside_scc_p (st
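
For readers following the patch above: the predicate being moved only
compares the SCC numbers that the postorder walk stored in each node's aux
pointer.  A minimal standalone sketch of that idea follows.  The toy_*
types and names are hypothetical stand-ins, not GCC's cgraph structures,
and the real function also resolves the callee through
cgraph_function_node, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node info, standing in for struct ipa_dfs_info.  */
struct toy_dfs_info
{
  int scc_no;		/* Number of the SCC this node belongs to.  */
};

/* Hypothetical call-graph node; AUX is NULL when no DFS info exists.  */
struct toy_node
{
  void *aux;
};

/* Hypothetical call-graph edge from CALLER to CALLEE.  */
struct toy_edge
{
  struct toy_node *caller;
  struct toy_node *callee;
};

/* Return true iff both ends of E carry DFS info and share an SCC number.  */
static bool
toy_edge_within_scc (const struct toy_edge *e)
{
  assert (e != NULL && e->caller != NULL && e->callee != NULL);

  const struct toy_dfs_info *caller_dfs = e->caller->aux;
  const struct toy_dfs_info *callee_dfs = e->callee->aux;

  return (caller_dfs != NULL
	  && callee_dfs != NULL
	  && caller_dfs->scc_no == callee_dfs->scc_no);
}
```

As in the original, a node without DFS info is never considered to be
inside an SCC, so the predicate is safe to call on edges to nodes the
postorder walk did not visit.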

Re: C++ demangler fix

2013-09-11 Thread Jakub Jelinek

On Wed, Sep 11, 2013 at 01:49:46PM +0100, Gary Benson wrote:
> Jakub Jelinek wrote:
> > cp-demangle.c isn't used just in libiberty, where using hashtab,
> > xcalloc, XNEW etc. is fine, but also in libsupc++/libstdc++, where
> > none of that is fine.  That is why cp-demangle.c only uses
> > e.g. realloc, checks for allocation failures and propagates those to
> > the caller if they happen (see allocation_failure field).  hashtab.o
> > isn't linked into libstdc++ nor libsupc++, and the question is if we
> > really do want to link all the hashtable code into libstdc++.
> > How many hash table entries are there typically?  Is a hashtable
> > required?
> 
> Three entries were required for the symbol in the testcase:
> 
>   "_ZSt7forwardIRN1x14refobjiteratorINS0_3refINS0_4mime30multipart_se" \
>   "ction_processorObjIZ15get_body_parserIZZN14mime_processor21make_se" \
>   "ction_iteratorERKNS2_INS3_10sectionObjENS0_10ptrrefBaseEEEbENKUlvE" \
>   "_clEvEUlSB_bE_ZZNS6_21make_section_iteratorESB_bENKSC_clEvEUlSB_E0" \
>   "_ENS1_INS2_INS0_20outputrefiteratorObjIiEES8_RKSsSB_OT_OT0_EUl" \
>   "mE_NS3_32make_multipart_default_discarderISP_S8_EOT_RNSt16" \
>   "remove_referenceISW_E4typeE"
> 
> I don't think there will be many symbols with very many entries required.
> I'm guessing that most symbols will require zero (which is why I made
> it defer hashtable creation until it was required).
> 
> What kind of data structure would you like to see here, a realloc'd
> array?  Do libsupc++ and libstdc++ use the demangler for anything more
> performance-sensitive than exception printing?

I don't know, I guess it isn't very performance sensitive, especially
if it is very rare to need too many of the scopes (at least on real-world
symbols, of course somebody can try to demangle something artificially
hacked up).

Jakub

Re: [PATCH, PowerPC] Fix PR57949 (ABI alignment issue)

2013-09-11 Thread Bill Schmidt

On Wed, 2013-09-11 at 21:08 +0930, Alan Modra wrote:
> On Wed, Aug 14, 2013 at 10:32:01AM -0500, Bill Schmidt wrote:
> > This fixes a long-standing problem with GCC's implementation of the
> > PPC64 ELF ABI.  If a structure contains a member requiring 128-bit
> > alignment, and that structure is passed as a parameter, the parameter
> > currently receives only 64-bit alignment.  This is an error, and is
> > incompatible with correct code generated by the IBM XL compilers.
> 
> This caused multiple failures in the libffi testsuite:
> libffi.call/cls_align_longdouble.c
> libffi.call/cls_align_longdouble_split.c
> libffi.call/cls_align_longdouble_split2.c
> libffi.call/nested_struct5.c
> 
> Fixed by making the same alignment adjustment in libffi to structures
> passed by value.  Bill, I think your patch needs to go on all active
> gcc branches as otherwise we'll need different versions of libffi for
> the next gcc releases.

Hm, the libffi case is unfortunate. :(

The alternative is to leave libffi alone, and require code that calls
these interfaces with "bad" structs passed by value to be built using
-mcompat-align-parm, which was provided for such compatibility issues.
Hopefully there is a small number of cases where this can happen, and
this could be documented with libffi and gcc.  What do you think?

Thanks,
Bill

> 
> The following was bootstrapped and regression checked powerpc64-linux.
> OK for mainline, and the 4.7 and 4.8 branches when/if Bill's patch
> goes in there?
> 
>   * src/powerpc/ffi.c (ffi_prep_args64): Align FFI_TYPE_STRUCT.
>   (ffi_closure_helper_LINUX64): Likewise.
> 
> Index: libffi/src/powerpc/ffi.c
> ===
> --- libffi/src/powerpc/ffi.c  (revision 202428)
> +++ libffi/src/powerpc/ffi.c  (working copy)
> @@ -462,6 +462,7 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long
>  double **d;
>} p_argv;
>unsigned long gprvalue;
> +  unsigned long align;
> 
>stacktop.c = (char *) stack + bytes;
>gpr_base.ul = stacktop.ul - ASM_NEEDS_REGISTERS64 - 
> NUM_GPR_ARG_REGISTERS64;
> @@ -532,6 +533,10 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long
>  #endif
> 
>   case FFI_TYPE_STRUCT:
> +   align = (*ptr)->alignment;
> +   if (align > 16)
> + align = 16;
> +   next_arg.ul = ALIGN (next_arg.ul, align);
> words = ((*ptr)->size + 7) / 8;
> if (next_arg.ul >= gpr_base.ul && next_arg.ul + words > gpr_end.ul)
>   {
> @@ -1349,6 +1354,7 @@ ffi_closure_helper_LINUX64 (ffi_closure *closure,
>long i, avn;
>ffi_cif *cif;
>ffi_dblfl *end_pfr = pfr + NUM_FPR_ARG_REGISTERS64;
> +  unsigned long align;
> 
>cif = closure->cif;
>avalue = alloca (cif->nargs * sizeof (void *));
> @@ -1399,6 +1405,10 @@ ffi_closure_helper_LINUX64 (ffi_closure *closure,
> break;
> 
>   case FFI_TYPE_STRUCT:
> +   align = arg_types[i]->alignment;
> +   if (align > 16)
> + align = 16;
> +   pst = ALIGN (pst, align);
>  #ifndef __LITTLE_ENDIAN__
> /* Structures with size less than eight bytes are passed
>left-padded.  */
> 
>

Re: C++ demangler fix

2013-09-11 Thread Gary Benson

Jakub Jelinek wrote:
> cp-demangle.c isn't used just in libiberty, where using hashtab,
> xcalloc, XNEW etc. is fine, but also in libsupc++/libstdc++, where
> none of that is fine.  That is why cp-demangle.c only uses
> e.g. realloc, checks for allocation failures and propagates those to
> the caller if they happen (see allocation_failure field).  hashtab.o
> isn't linked into libstdc++ nor libsupc++, and the question is if we
> really do want to link all the hashtable code into libstdc++.
> How many hash table entries are there typically?  Is a hashtable
> required?

Three entries were required for the symbol in the testcase:

  "_ZSt7forwardIRN1x14refobjiteratorINS0_3refINS0_4mime30multipart_se" \
  "ction_processorObjIZ15get_body_parserIZZN14mime_processor21make_se" \
  "ction_iteratorERKNS2_INS3_10sectionObjENS0_10ptrrefBaseEEEbENKUlvE" \
  "_clEvEUlSB_bE_ZZNS6_21make_section_iteratorESB_bENKSC_clEvEUlSB_E0" \
  "_ENS1_INS2_INS0_20outputrefiteratorObjIiEES8_RKSsSB_OT_OT0_EUl" \
  "mE_NS3_32make_multipart_default_discarderISP_S8_EOT_RNSt16" \
  "remove_referenceISW_E4typeE"

I don't think there will be many symbols with very many entries required.
I'm guessing that most symbols will require zero (which is why I made
it defer hashtable creation until it was required).

What kind of data structure would you like to see here, a realloc'd
array?  Do libsupc++ and libstdc++ use the demangler for anything more
performance-sensitive than exception printing?

Thanks,
Gary

-- 
http://gbenson.net/
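
A realloc'd array of the kind asked about above could look roughly like
the following.  This is only a sketch under assumptions: the scope_vec
name and its functions are hypothetical and do not come from
cp-demangle.c.  It uses nothing beyond realloc and free, and it records
allocation failure in a flag for the caller to check instead of aborting,
which is the constraint described for libsupc++/libstdc++.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of pointers with a sticky failure flag.  */
struct scope_vec
{
  void **items;
  size_t len;
  size_t cap;
  int allocation_failure;	/* Set once realloc has failed.  */
};

static void
scope_vec_init (struct scope_vec *v)
{
  assert (v != NULL);
  v->items = NULL;
  v->len = 0;
  v->cap = 0;
  v->allocation_failure = 0;
}

/* Append ITEM to V.  Return 1 on success, 0 if allocation failed now or
   earlier; nothing is allocated until the first push.  */
static int
scope_vec_push (struct scope_vec *v, void *item)
{
  assert (v != NULL);
  if (v->allocation_failure)
    return 0;
  if (v->len == v->cap)
    {
      size_t new_cap = v->cap ? v->cap * 2 : 4;
      void **new_items = realloc (v->items, new_cap * sizeof *new_items);
      if (new_items == NULL)
	{
	  v->allocation_failure = 1;
	  return 0;
	}
      v->items = new_items;
      v->cap = new_cap;
    }
  v->items[v->len++] = item;
  return 1;
}

/* Linear search; adequate when only a handful of entries are expected.  */
static int
scope_vec_contains (const struct scope_vec *v, const void *item)
{
  for (size_t i = 0; i < v->len; i++)
    if (v->items[i] == item)
      return 1;
  return 0;
}

static void
scope_vec_free (struct scope_vec *v)
{
  free (v->items);
  scope_vec_init (v);
}
```

With three entries for the testcase symbol and zero for most symbols, a
linear scan is unlikely to matter, and deferring the first allocation
keeps the common case free of any heap use.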

Re: [patch 1/2] tree-flow.h restructuring

2013-09-11 Thread Andrew MacLeod


On 09/11/2013 04:45 AM, Richard Biener wrote:

On Tue, Sep 10, 2013 at 9:19 PM, Andrew MacLeod  wrote:

Here's a start at restructuring the whole tree-flow.h mess that we created
back in the original wild west tree-ssa days.

First, almost everyone includes tree-flow.h because it became the kitchen
sink of functionality.  Really, what we ought to have is a tree-ssa.h which
anything that uses basic tree-ssa functionality includes, and that'll be the
main include for SSA passes. Other prototypes and such should come from
other appropriate places. Doing this in one step is basically impossible, so
here are the first few steps, structured so that it's easier to review.

I changed everywhere which includes tree-flow.h  to include tree-ssa.h
instead.   tree-ssa.h includes tree-flow.h first, which makes it
functionally the same from a compiling point of view.

You mean that doing the restructure and only adding includes that are
required to make compile work again wasn't feasible...?

Not really for this first split... everything that required tree-flow.h
basically requires tree-ssa.h. Once everything has been separated
from tree-flow.h and put in its right place, it's easy to see
exactly which parts are required where, and by tackling each .c file
that uses tree-ssa.h once, I can get it down to the bare bones of what
was required from the original bloated tree-flow.h.

I also moved everything from tree-flow.h and tree-flow-inline.h that is
related to tree-ssa.c functions into tree-ssa.h.  There were also a few
function prototypes sprinkled in a  couple of other headers which I moved
into tree-ssa.h as well.  I have verified that every exported function in
tree-ssa.c has a proto in tree-ssa.h, and is listed in the same order as the
.h file.

Note that in general there isn't a thing like "tree SSA" anymore - it's
"GIMPLE SSA" now ;)  Not that we want gigantic renaming occurring now,
just if you invent completely new .[ch] files then keep that in mind.


I considered just making it just ssa.h, but maybe this isn't the 
time...  Probably easier to do a mass rename very early in stage 1... 
until then I figure it probably makes sense to make the .h match the .c 
file.. yes, for a new pair, I'd not bother with tree-*


That is, if you moved stuff to tree-ssa.h that has no definition in tree-ssa.c
then I'd rather have a new gimple-ssa.h that doesn't have a corresponding
.c file (for now).

Ah, I see.  Although mulling it over last night, I'm marginally in
favour of having a separate .h for each .c... Then things like
tree-ssa-uninit.h which are only used in one or two places can simply be
included in the one or two .c files they are required for.  That way we
preserve tree-ssa.h to only include the "commonly required" .h files
which 6 or more .c files require. (to pick an arbitrary number :-)
This will help reduce the include web for rebuilding/compiling.  I also
note your response in the other patch... First I will try to restructure
what's in the file before resorting to a new .h file in these cases.
I'll keep a gimple-ssa.h "aggregator" on the backburner until it looks
like it will be needed.

Compiling this change indicated that there were a few files which required
functionality from tree-ssa.c which really don't belong there.  In
particular,  useless_type_conversion_p and types_compatible_p are used by
the C++ front end and other places which don't really care about SSA.  So I
moved them into more appropriate places... tree.c and tree.h

Well, those functions are only valid when we are in GIMPLE (the GIMPLE
type system is less strict than the GENERIC one), so tree.[ch] isn't
appropriate.
gimple.[ch] would be.  But I wonder where it's used from the FEs - that's surely
looking wrong (unless it's in a langhook).  But yes, the functions are not
in any way about SSA but only GIMPLE.


OK, that wasn't obvious from the function itself (it only ever deals 
with trees), but that being the case, I'll move them both to gimple.[ch] 
instead. That seems more appropriate and I will comment it as gimple 
compatible only.


Looks like there is a langhook...
c-family/c-common.c:  && lang_hooks.types_compatible_p (TREE_TYPE 
(t1), TREE_TYPE (t2)))


However each front end seems to provide their own version of 
types_compatible_p and use that... ( c_types_compatible_p and 
cxx_types_compatible_p).   lto.c has gimple_types_compatible_p()... but 
does also use types_compatible_p() in lto-symtab.c  :-P



Andrew

Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64

2013-09-11 Thread Bill Schmidt

On Wed, 2013-09-11 at 10:32 +0200, Richard Biener wrote:
> On Tue, Sep 10, 2013 at 5:53 PM, Yufeng Zhang  wrote:
> > Hi,
> >
> > Following Bin's patch in
> > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00695.html, this patch tweaks
> > backtrace_base_for_ref () to strip of any widening conversion after the
> > first TREE_CODE check fails.  Without this patch, the test
> > (gcc.dg/tree-ssa/slsr-39.c) in Bin's patch will fail on AArch64, as
> > backtrace_base_for_ref () will stop if not seeing an ssa_name since the tree
> > code can be nop_expr instead.
> >
> > Regtested on arm and aarch64; still bootstrapping x86_64.
> >
> > OK for the trunk if the x86_64 bootstrap succeeds?
> 
> Please add a testcase.

Also, the comment "Strip of" should read "Strip off".  Otherwise I have
no comments.

Thanks,
Bill

> 
> Richard.
> 
> > Thanks,
> > Yufeng
> >
> > gcc/
> >
> > * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): Call
> > get_unwidened and check 'base_in' again.
>

Re: [patch 2/2] tree-flow.h restructuring

2013-09-11 Thread Andrew MacLeod


On 09/11/2013 04:55 AM, Richard Biener wrote:

On Tue, Sep 10, 2013 at 9:27 PM, Andrew MacLeod  wrote:

This splits out tree-ssaname related things to tree-ssanames.h. This is then
included from tree-ssa.h

This patch is ok as-is.


similar treatment can  be given to tree-phinodes.c

I notice a number of the other ssa passes only export a couple of functions,
and that's it.. no structs or anything like that.  (like tree-ssa-uninit.c
which exports ssa_undefined_value_p but that can't be easily relocated since
it depends on static objects in the file which are constructed earlier by
the pass)

Well - usually the reason is a bad design choice.  There should have been
a ssa_undefined_value_p predicate without the special
possibly_undefined_names pointer-set handling available in generic code
and tree-ssa-uninit.c wrapping that adding its own special handling.

Most of the awkwardness here is because the generic uninit warning
machinery resides in tree-ssa.c.  Consider moving that (the
early uninit pass and its helpers) to tree-ssa-uninit.c.

I originally did try moving that one function to tree-ssa.c, and
discovered the static pointer-set bits... and toyed with seeing what
else would need to be moved, but didn't spend any real effort on it.
I'll put a little more effort into it  :-)




Keep in mind the tree (GENERIC) vs. GIMPLE (non-SSA specific) and
SSA (SSA-on-GIMPLE specific) distinction.


Someday it will be more obvious :-)  Yes, I will keep that in mind,
and you can catch what I miss, like types_compatible_p  :-)


Andrew

Re: [PATCH] Split critical edges before late uninit warning pass

2013-09-11 Thread Richard Biener

On Wed, 11 Sep 2013, Jakub Jelinek wrote:

> On Wed, Sep 11, 2013 at 12:00:04PM +0200, Richard Biener wrote:
> > It should make -Og more usable on the branch and as it is quite
> > new I suppose the change qualifies for the branch (I lately
> > moved some passes there for other diagnostic issues).
> > 
> > Opinions on generally splitting critical edges before late
> > uninit warnings?
> 
> Looks reasonable to me.

Bootstrapped and tested on x86_64-unknown-linux-gnu and applied.

Richard.

2013-09-11  Richard Biener  

PR middle-end/58377
* passes.def: Split critical edges before late uninit warning passes.
* tree-cfg.c (pass_split_crit_edges): Implement clone method.

* g++.dg/uninit-pred-4.C: New testcase.

Index: gcc/passes.def
===
*** gcc/passes.def  (revision 202492)
--- gcc/passes.def  (working copy)
*** along with GCC; see the file COPYING3.
*** 249,254 
--- 249,257 
 account for the predicates protecting the set and the use of each
 variable.  Using a representation like Gated Single Assignment
 may help.  */
+   /* Split critical edges before late uninit warning to reduce the
+  number of false positives from it.  */
+   NEXT_PASS (pass_split_crit_edges);
NEXT_PASS (pass_late_warn_uninitialized);
NEXT_PASS (pass_dse);
NEXT_PASS (pass_forwprop);
*** along with GCC; see the file COPYING3.
*** 282,287 
--- 285,293 
/* ???  We do want some kind of loop invariant motion, but we possibly
   need to adjust LIM to be more friendly towards preserving accurate
 debug information here.  */
+   /* Split critical edges before late uninit warning to reduce the
+  number of false positives from it.  */
+   NEXT_PASS (pass_split_crit_edges);
NEXT_PASS (pass_late_warn_uninitialized);
NEXT_PASS (pass_uncprop);
NEXT_PASS (pass_local_pure_const);
Index: gcc/tree-cfg.c
===
*** gcc/tree-cfg.c  (revision 202492)
--- gcc/tree-cfg.c  (working copy)
*** public:
*** 7929,7934 
--- 7949,7955 
/* opt_pass methods: */
unsigned int execute () { return split_critical_edges (); }
  
+   opt_pass * clone () { return new pass_split_crit_edges (ctxt_); }
  }; // class pass_split_crit_edges
  
  } // anon namespace
Index: gcc/testsuite/g++.dg/uninit-pred-4.C
===
*** gcc/testsuite/g++.dg/uninit-pred-4.C(revision 0)
--- gcc/testsuite/g++.dg/uninit-pred-4.C(working copy)
***
*** 0 
--- 1,16 
+ /* { dg-do compile } */
+ /* { dg-options "-Wuninitialized -Og" } */
+ 
+ int pop ();
+ int pop_first_bucket;
+ 
+ int my_pop ()
+ {
+   int out;  // { dg-bogus "uninitialized" "uninitialized variable warning" }
+ 
+   while (pop_first_bucket)
+ if (pop_first_bucket && (out = pop()))
+   return out;
+ 
+   return 0;
+ }
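
As background for the passes.def change above: an edge is critical when
its source block has more than one successor and its destination block has
more than one predecessor, and splitting it means placing a new empty
block on that edge.  The following is a minimal standalone sketch of that
definition on a toy adjacency matrix.  All toy_* names are hypothetical,
and this is not how GCC's split_critical_edges is implemented.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BLOCKS 16

/* Toy CFG: edge[s][d] is nonzero when there is an edge from s to d.  */
struct toy_cfg
{
  int nblocks;
  unsigned char edge[TOY_MAX_BLOCKS][TOY_MAX_BLOCKS];
};

static int
toy_succ_count (const struct toy_cfg *cfg, int b)
{
  int n = 0;
  for (int d = 0; d < cfg->nblocks; d++)
    n += cfg->edge[b][d] != 0;
  return n;
}

static int
toy_pred_count (const struct toy_cfg *cfg, int b)
{
  int n = 0;
  for (int s = 0; s < cfg->nblocks; s++)
    n += cfg->edge[s][b] != 0;
  return n;
}

/* An existing edge SRC->DST is critical if SRC has several successors
   and DST has several predecessors.  */
static bool
toy_edge_is_critical (const struct toy_cfg *cfg, int src, int dst)
{
  return (cfg->edge[src][dst]
	  && toy_succ_count (cfg, src) > 1
	  && toy_pred_count (cfg, dst) > 1);
}

/* Replace SRC->DST by SRC->NEW->DST and return the index of NEW.  The
   rows and columns beyond nblocks are assumed to be zero.  */
static int
toy_split_edge (struct toy_cfg *cfg, int src, int dst)
{
  assert (cfg->edge[src][dst] && cfg->nblocks < TOY_MAX_BLOCKS);
  int mid = cfg->nblocks++;
  cfg->edge[src][dst] = 0;
  cfg->edge[src][mid] = 1;
  cfg->edge[mid][dst] = 1;
  return mid;
}
```

After the split neither new edge is critical: the new block has exactly
one predecessor and one successor, which gives a per-edge location for
anything that must hold on that path only.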

Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS

2013-09-11 Thread Graham Stott

Hi Richard,

There is some minor testsuite fallout with these patches on MIPS: a
couple of tests (see below) ICE in gen_int_mode ().  In both of these ICEs
the mode is CCmode.

Here we arrive at gen_int_mode () via a use of plus_constant () in cse.c
with a register with CCmode, called from find_reg_offset_for_const ().

The plus_constant () ends up trying to create the reg+const rtx via


  if (c != 0)
    x = gen_rtx_PLUS (mode, x, gen_int_mode (c, mode));


with the gen_int_mode () triggering the ICE.

It obviously doesn't make sense generating a reg + const where the
reg has CCmode.

I've only got results from check-gcc-c testsuite so far.


Graham


testsuite/c-c++-common/cilk-plus/AN/builtin_func_double.c
testsuite/gcc.c-torture/execute/921013-1.c

Re: [PATCH, PowerPC] Fix PR57949 (ABI alignment issue)

2013-09-11 Thread Jakub Jelinek

On Wed, Sep 11, 2013 at 09:08:46PM +0930, Alan Modra wrote:
> The following was bootstrapped and regression checked powerpc64-linux.
> OK for mainline, and the 4.7 and 4.8 branches when/if Bill's patch
> goes in there?

IMHO ABI changing patches shouldn't be backported to release branches.

Jakub

Re: RFC: patch to build GCC for arm with LRA

2013-09-11 Thread Yvan Roux

Yes indeed ! here is a fixed patch.

In strip_address_mutations we now have 3 if/else if statements with
the same body which could be factorized in:

  if ((GET_RTX_CLASS (code) == RTX_UNARY)
  /* Things like SIGN_EXTEND, ZERO_EXTEND and TRUNCATE can be
 used to convert between pointer sizes.  */
  || (lsb_bitfield_op_p (*loc))
  /* Bitfield operations [SIGN|ZERO]_EXTRACT from the least significant
 bit can be used too.  */
  || (code == AND && CONST_INT_P (XEXP (*loc, 1))))
  /* (and ... (const_int -X)) is used to align to X bytes.  */
loc = &XEXP (*loc, 0);

if you think that it doesn't hurt readability too much.

Many Thanks,
Yvan

On 11 September 2013 09:32, Richard Sandiford
 wrote:
> Yvan Roux  writes:
>> @@ -5454,6 +5454,16 @@ strip_address_mutations (rtx *loc, enum rtx_code 
>> *outer_code)
>>   /* Things like SIGN_EXTEND, ZERO_EXTEND and TRUNCATE can be
>>  used to convert between pointer sizes.  */
>>   loc = &XEXP (*loc, 0);
>> +  else if (GET_RTX_CLASS (code) == RTX_BITFIELD_OPS)
>> + {
>> +   /* Bitfield operations [SIGN|ZERO]_EXTRACT can be used too.  */
>> +   enum machine_mode mode = GET_MODE(*loc);
>> +   unsigned HOST_WIDE_INT len = INTVAL (XEXP (*loc, 1));
>> +   HOST_WIDE_INT pos = INTVAL (XEXP (*loc, 2));
>> +
>> +   if (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0))
>> + loc = &XEXP (*loc, 0);
>> + }
>
> This means that the other values of "pos" bypass the:
>
>   else
> return loc;
>
> so you'll get an infinite loop.  I think it would be neater to split
> this out into:
>
> /* Return true if X is a sign_extract or zero_extract from the least
>significant bit.  */
>
> static bool
> lsb_bitfield_op_p (rtx X)
> {
>   ...;
> }
>
> else if (lsb_bitfield_op_p (*loc))
>   loc = &XEXP (*loc, 0);
>
> Looks good to me otherwise FWIW.
>
> Thanks,
> Richard

arm-lra.patch
Description: Binary data

[patch config.gcc]: Add a missed 64-bit cygwin case

2013-09-11 Thread Kai Tietz

Hi,

One missing nit in config.gcc to choose proper cpu default and split
cygwin and mingw targets.

ChangeLog

2013-09-11  Kai Tietz  

* config.gcc: Separate cases for mingw and cygwin targets,
and add 64-bit cygwin target case.

Tested for x86_64-pc-cygwin, i686-pc-cygwin, i686-w64-mingw32, and for
x86_64-w64-mingw32.  I will apply this patch in a couple of hours, if
there are no objections.

Regards,
Kai

Index: config.gcc
===
--- config.gcc(Revision 202491)
+++ config.gcc(Arbeitskopie)
@@ -3858,8 +3858,10 @@ case ${target} in
 ;;
 i[34567]86-*-solaris2* | x86_64-*-solaris2.1[0-9]*)
 ;;
-i[34567]86-*-cygwin* | i[34567]86-*-mingw* | x86_64-*-mingw*)
+i[34567]86-*-cygwin* | x86_64-*-cygwin*)
 ;;
+i[34567]86-*-mingw* | x86_64-*-mingw*)
+;;
 i[34567]86-*-freebsd* | x86_64-*-freebsd*)
 ;;
 ia64*-*-linux*)

Re: [PATCH, PowerPC] Fix PR57949 (ABI alignment issue)

2013-09-11 Thread Alan Modra

On Wed, Aug 14, 2013 at 10:32:01AM -0500, Bill Schmidt wrote:
> This fixes a long-standing problem with GCC's implementation of the
> PPC64 ELF ABI.  If a structure contains a member requiring 128-bit
> alignment, and that structure is passed as a parameter, the parameter
> currently receives only 64-bit alignment.  This is an error, and is
> incompatible with correct code generated by the IBM XL compilers.

This caused multiple failures in the libffi testsuite:
libffi.call/cls_align_longdouble.c
libffi.call/cls_align_longdouble_split.c
libffi.call/cls_align_longdouble_split2.c
libffi.call/nested_struct5.c

Fixed by making the same alignment adjustment in libffi to structures
passed by value.  Bill, I think your patch needs to go on all active
gcc branches as otherwise we'll need different versions of libffi for
the next gcc releases.

The following was bootstrapped and regression checked powerpc64-linux.
OK for mainline, and the 4.7 and 4.8 branches when/if Bill's patch
goes in there?

* src/powerpc/ffi.c (ffi_prep_args64): Align FFI_TYPE_STRUCT.
(ffi_closure_helper_LINUX64): Likewise.

Index: libffi/src/powerpc/ffi.c
===
--- libffi/src/powerpc/ffi.c(revision 202428)
+++ libffi/src/powerpc/ffi.c(working copy)
@@ -462,6 +462,7 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long
 double **d;
   } p_argv;
   unsigned long gprvalue;
+  unsigned long align;
 
   stacktop.c = (char *) stack + bytes;
   gpr_base.ul = stacktop.ul - ASM_NEEDS_REGISTERS64 - NUM_GPR_ARG_REGISTERS64;
@@ -532,6 +533,10 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long
 #endif
 
case FFI_TYPE_STRUCT:
+ align = (*ptr)->alignment;
+ if (align > 16)
+   align = 16;
+ next_arg.ul = ALIGN (next_arg.ul, align);
  words = ((*ptr)->size + 7) / 8;
  if (next_arg.ul >= gpr_base.ul && next_arg.ul + words > gpr_end.ul)
{
@@ -1349,6 +1354,7 @@ ffi_closure_helper_LINUX64 (ffi_closure *closure,
   long i, avn;
   ffi_cif *cif;
   ffi_dblfl *end_pfr = pfr + NUM_FPR_ARG_REGISTERS64;
+  unsigned long align;
 
   cif = closure->cif;
   avalue = alloca (cif->nargs * sizeof (void *));
@@ -1399,6 +1405,10 @@ ffi_closure_helper_LINUX64 (ffi_closure *closure,
  break;
 
case FFI_TYPE_STRUCT:
+ align = arg_types[i]->alignment;
+ if (align > 16)
+   align = 16;
+ pst = ALIGN (pst, align);
 #ifndef __LITTLE_ENDIAN__
  /* Structures with size less than eight bytes are passed
 left-padded.  */


-- 
Alan Modra
Australia Development Lab, IBM

Re: [RS6000] powerpc64 -mcmodel=medium large symbol offsets

2013-09-11 Thread Alan Modra

On Mon, Sep 09, 2013 at 06:37:03PM +0930, Alan Modra wrote:
> gcc/
>   * config/rs6000/predicates.md (add_cint_operand): New.
>   (reg_or_add_cint_operand): Use add_cint_operand.
>   * config/rs6000/rs6000.md (largetoc_high_plus): Restrict offset
>   using add_cint_operand.
>   (largetoc_high_plus_aix, small_toc_ref): Likewise.
> gcc/testsuite/
>   * gcc.target/powerpc/medium_offset.c: New.

I missed seeing one testcase regression caused by this patch.  :-(
gcc.c-torture/compile/pr41634.c at -O3 gets an "insn does not satisfy
its constraints".  Fixed with the following.  OK to apply?

* config/rs6000/rs6000.c (toc_relative_expr_p): Use add_cint_operand.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 202428)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5926,7 +5906,7 @@ toc_relative_expr_p (const_rtx op, bool strict)
 
   tocrel_base = op;
   tocrel_offset = const0_rtx;
-  if (GET_CODE (op) == PLUS && CONST_INT_P (XEXP (op, 1)))
+  if (GET_CODE (op) == PLUS && add_cint_operand (XEXP (op, 1), GET_MODE (op)))
 {
   tocrel_base = XEXP (op, 0);
   tocrel_offset = XEXP (op, 1);

-- 
Alan Modra
Australia Development Lab, IBM

Re: folding (vec_)cond_expr in a binary operation

2013-09-11 Thread Marc Glisse


Any other comments on this patch?

http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00129.html

On Tue, 3 Sep 2013, Marc Glisse wrote:


(attaching the latest version. I only added comments since regtesting it)

On Tue, 3 Sep 2013, Richard Biener wrote:

Btw, as for the patch I don't like that you basically feed everything
into fold.  Yes, I know we do that for conditions because that's quite
important and nobody has re-written the condition folding as gimple
pattern matcher.

I doubt that this transform is important enough to warrant another
exception ;)



I am not sure what you want here. When I notice the pattern (a?b:c) op d,
I need to check whether b op d and c op d are likely to simplify before
transforming it to a?(b op d):(c op d). And currently I can't see any way
to do that other than feeding (b op d) to fold. Even if/when we get a gimple
folder, we will still need to call it in about the same way.


With a gimple folder you'd avoid building trees.


Ah, so the problem is the cost of building those 2 trees? It will indeed
be better when we can avoid it, but I really don't expect the cost to be
that much...


You are testing for
a quite complex pattern here - (a?b:c) & (d?e:f).


But it is really handled in several steps. IIRC:
(a?1:0)&x becomes a?(x&1):0.
Since x is d?1:0, x&1 becomes d?1:0.
a?(d?1:0):0 is not (yet?) simplified further.

Now if we compare to 0: a?(d?1:0):0 != 0 simplifies to a?(d?1:0)!=0:0
then a?(d?-1:0):0 and finally a?d:0.

Each step can also trigger individually.
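
As a sanity check of the steps listed above, here is a small scalar
sketch in plain C.  It is not GCC code and the function names are made
up for the example.  It uses 1 for "true", whereas the vector case in
the thread uses -1.

```c
#include <assert.h>

/* (a ? 1 : 0) & x with x = (d ? 1 : 0): the form before folding.  */
int
before_fold (int a, int d)
{
  int x = d ? 1 : 0;
  return (a ? 1 : 0) & x;
}

/* a ? (x & 1) : 0, with x & 1 already simplified to d ? 1 : 0.  */
int
after_fold (int a, int d)
{
  return a ? (d ? 1 : 0) : 0;
}

/* The comparison against zero, folded down to a ? d : 0.  This
   assumes D is already a boolean (0 or 1).  */
int
compare_folded (int a, int d)
{
  return a ? d : 0;
}

/* Count the boolean inputs for which the folded forms disagree with
   the unfolded ones.  */
int
count_mismatches (void)
{
  int mismatches = 0;

  for (int a = 0; a <= 1; a++)
    for (int d = 0; d <= 1; d++)
      {
        if (before_fold (a, d) != after_fold (a, d))
          mismatches++;
        if ((after_fold (a, d) != 0) != compare_folded (a, d))
          mismatches++;
      }
  return mismatches;
}
```

Over all four boolean inputs the count stays at zero, which is what
the chain of individual simplifications relies on.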


It seems to me that
for example

+  vec c=(a>3)?o:z;
+  vec d=(b>2)?o:z;
+  vec e=c&d;

would be better suited in the combine phase (you are interested in
combining the comparisons).  That is, look for a [&|^] b where
a and b are [VEC_]COND_EXPRs with the same values.


Hmm, I am already looking for a binary op which has at least one operand
which is a [VEC_]COND_EXPR, in the combine (=backward) part of
tree-ssa-forwprop.  Isn't that almost what you are suggesting?

Heh, and it's not obvious to me with looking for a minute what this should 
be simplified to ...


(a?x:y)&(b?x:y) doesn't really simplify in general.


(so the code and the testcase needs some
comments on what you are trying to catch ...)


a

--
Marc Glisse

Re: [PATCH] Split critical edges before late uninit warning pass

2013-09-11 Thread Jakub Jelinek

On Wed, Sep 11, 2013 at 12:00:04PM +0200, Richard Biener wrote:
> It should make -Og more usable on the branch and as it is quite
> new I suppose the change qualifies for the branch (I lately
> moved some passes there for other diagnostic issues).
> 
> Opinions on generally splitting critical edges before late
> uninit warnings?

Looks reasonable to me.

> 2013-09-11  Richard Biener  
> 
>   PR middle-end/58377
>   * passes.c (init_optimization_passes): Split critical edges
>   before late uninit warning pass.
> 
>   * g++.dg/uninit-pred-4.C: New testcase.

Jakub

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-11 Thread Alexander Monakov



On Tue, 10 Sep 2013, Wei Mi wrote:

> Because deps_analyze_insn only analyzes data deps but no control deps.
> Control deps are included by add_branch_dependences. Without the
> chain_to_prev_insn in the end of add_branch_dependences, jmp will be
> control dependent on every previous insn in the same bb, and the cmp
> and jmp group could still be scheduled apart since they will not be
> put in ready list at the same time.

Would calling add_branch_dependences before sched_analyze solve that, then?

Alexander

[PATCH] Split critical edges before late uninit warning pass

2013-09-11 Thread Richard Biener


This patch adds critical edge splitting before the late uninit
warning pass in the -Og pipeline (the patch is for the 4.8 branch
where the testcase otherwise fails).  If I commit this then I'll
commit a corresponding change to trunk as well.

On trunk we can also split critical edges in the regular pipeline
to mitigate the issue the late uninit warning pass has with
critical edges, possibly reducing the number of false positives.

Critical edges are unsplit by the next CFG cleanup, which happens
at the latest during pass_cleanup_cfg_post_optimizing.
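
For readers not familiar with the term: an edge is critical when its
source block has more than one successor and its destination block has
more than one predecessor.  A minimal standalone sketch of that test
follows.  It does not use GCC's CFG data structures; struct edge,
sample_cfg and edge_is_critical are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* A CFG edge between two basic blocks, identified by index.  */
struct edge
{
  int src;
  int dest;
};

/* Example CFG: block 0 branches to 1 and 2, block 1 falls into 2.  */
const struct edge sample_cfg[3] = { { 0, 1 }, { 0, 2 }, { 1, 2 } };

/* Return nonzero if EDGES[INDEX] is critical, i.e. its source has
   several successors and its destination has several predecessors.  */
int
edge_is_critical (const struct edge *edges, size_t n_edges, size_t index)
{
  size_t succs = 0, preds = 0;

  assert (index < n_edges);
  for (size_t i = 0; i < n_edges; i++)
    {
      if (edges[i].src == edges[index].src)
        succs++;
      if (edges[i].dest == edges[index].dest)
        preds++;
    }
  return succs > 1 && preds > 1;
}
```

In sample_cfg only the edge 0 -> 2 is critical.  Splitting it means
inserting a new empty block on that edge, so that code can be placed
on it without affecting the other paths.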

Bootstrap / regtest running on the branch for x86_64-unknown-linux-gnu.

It should make -Og more usable on the branch and as it is quite
new I suppose the change qualifies for the branch (I lately
moved some passes there for other diagnostic issues).

Opinions on generally splitting critical edges before late
uninit warnings?

Thanks,
Richard.

2013-09-11  Richard Biener  

PR middle-end/58377
* passes.c (init_optimization_passes): Split critical edges
before late uninit warning pass.

* g++.dg/uninit-pred-4.C: New testcase.

Index: gcc/passes.c
===
*** gcc/passes.c(revision 202445)
--- gcc/passes.c(working copy)
*** init_optimization_passes (void)
*** 1543,1548 
--- 1543,1551 
/* ???  We do want some kind of loop invariant motion, but we possibly
   need to adjust LIM to be more friendly towards preserving accurate
 debug information here.  */
+   /* Split critical edges before late uninit warning to reduce the
+  number of false positives from it.  */
+   NEXT_PASS (pass_split_crit_edges);
NEXT_PASS (pass_late_warn_uninitialized);
NEXT_PASS (pass_uncprop);
NEXT_PASS (pass_local_pure_const);
Index: gcc/testsuite/g++.dg/uninit-pred-4.C
===
*** gcc/testsuite/g++.dg/uninit-pred-4.C(revision 0)
--- gcc/testsuite/g++.dg/uninit-pred-4.C(working copy)
***
*** 0 
--- 1,16 
+ /* { dg-do compile } */
+ /* { dg-options "-Wuninitialized -Og" } */
+ 
+ int pop ();
+ int pop_first_bucket;
+ 
+ int my_pop ()
+ {
+   int out;  // { dg-bogus "uninitialized" "uninitialized variable warning" }
+ 
+   while (pop_first_bucket)
+ if (pop_first_bucket && (out = pop()))
+   return out;
+ 
+   return 0;
+ }

[PATCH] Move RDG from tree-data-ref.c to tree-loop-distribution.c

2013-09-11 Thread Richard Biener


This moves the RDG implementation private to its only user,
loop distribution, and away from the unrelated data-reference/dependence
code.

The patch also moves the loops bitmap passed around into the
partition struct.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2013-09-11  Richard Biener  

* tree-data-ref.c (dump_rdg_vertex, debug_rdg_vertex,
dump_rdg_component, debug_rdg_component, dump_rdg, debug_rdg,
dot_rdg_1, dot_rdg, rdg_vertex_for_stmt, create_rdg_edge_for_ddr,
create_rdg_edges_for_scalar, create_rdg_edges, create_rdg_vertices,
stmts_from_loop, known_dependences_p, build_empty_rdg,
build_rdg, free_rdg, rdg_defs_used_in_other_loops_p): Move ...
* tree-loop-distribution.c: ... here.
* tree-data-ref.h (struct rdg_vertex, RDGV_STMT, RDGV_DATAREFS,
RDGV_HAS_MEM_WRITE, RDGV_HAS_MEM_READS, RDG_STMT, RDG_DATAREFS,
RDG_MEM_WRITE_STMT, RDG_MEM_READS_STMT, enum rdg_dep_type,
struct rdg_edge, RDGE_TYPE, RDGE_LEVEL, RDGE_RELATION): Move ...
* tree-loop-distribution.c: ... here.
* tree-loop-distribution.c: Include gimple-pretty-print.h.
(struct partition_s): Add loops member.
(partition_alloc, partition_free, rdg_flag_uses, rdg_flag_vertex,
rdg_flag_vertex_and_dependent, rdg_flag_loop_exits,
build_rdg_partition_for_component, rdg_build_partitions): Adjust.

Index: gcc/tree-data-ref.c
===
*** gcc/tree-data-ref.c (revision 202431)
--- gcc/tree-data-ref.c (working copy)
*** free_data_refs (vec da
*** 4798,5268 
  free_data_ref (dr);
datarefs.release ();
  }
- 
- 
- 
- /* Dump vertex I in RDG to FILE.  */
- 
- static void
- dump_rdg_vertex (FILE *file, struct graph *rdg, int i)
- {
-   struct vertex *v = &(rdg->vertices[i]);
-   struct graph_edge *e;
- 
-   fprintf (file, "(vertex %d: (%s%s) (in:", i,
-  RDG_MEM_WRITE_STMT (rdg, i) ? "w" : "",
-  RDG_MEM_READS_STMT (rdg, i) ? "r" : "");
- 
-   if (v->pred)
- for (e = v->pred; e; e = e->pred_next)
-   fprintf (file, " %d", e->src);
- 
-   fprintf (file, ") (out:");
- 
-   if (v->succ)
- for (e = v->succ; e; e = e->succ_next)
-   fprintf (file, " %d", e->dest);
- 
-   fprintf (file, ")\n");
-   print_gimple_stmt (file, RDGV_STMT (v), 0, TDF_VOPS|TDF_MEMSYMS);
-   fprintf (file, ")\n");
- }
- 
- /* Call dump_rdg_vertex on stderr.  */
- 
- DEBUG_FUNCTION void
- debug_rdg_vertex (struct graph *rdg, int i)
- {
-   dump_rdg_vertex (stderr, rdg, i);
- }
- 
- /* Dump component C of RDG to FILE.  If DUMPED is non-null, set the
-dumped vertices to that bitmap.  */
- 
- static void
- dump_rdg_component (FILE *file, struct graph *rdg, int c, bitmap dumped)
- {
-   int i;
- 
-   fprintf (file, "(%d\n", c);
- 
-   for (i = 0; i < rdg->n_vertices; i++)
- if (rdg->vertices[i].component == c)
-   {
-   if (dumped)
- bitmap_set_bit (dumped, i);
- 
-   dump_rdg_vertex (file, rdg, i);
-   }
- 
-   fprintf (file, ")\n");
- }
- 
- /* Call dump_rdg_vertex on stderr.  */
- 
- DEBUG_FUNCTION void
- debug_rdg_component (struct graph *rdg, int c)
- {
-   dump_rdg_component (stderr, rdg, c, NULL);
- }
- 
- /* Dump the reduced dependence graph RDG to FILE.  */
- 
- void
- dump_rdg (FILE *file, struct graph *rdg)
- {
-   int i;
-   bitmap dumped = BITMAP_ALLOC (NULL);
- 
-   fprintf (file, "(rdg\n");
- 
-   for (i = 0; i < rdg->n_vertices; i++)
- if (!bitmap_bit_p (dumped, i))
-   dump_rdg_component (file, rdg, rdg->vertices[i].component, dumped);
- 
-   fprintf (file, ")\n");
-   BITMAP_FREE (dumped);
- }
- 
- /* Call dump_rdg on stderr.  */
- 
- DEBUG_FUNCTION void
- debug_rdg (struct graph *rdg)
- {
-   dump_rdg (stderr, rdg);
- }
- 
- static void
- dot_rdg_1 (FILE *file, struct graph *rdg)
- {
-   int i;
- 
-   fprintf (file, "digraph RDG {\n");
- 
-   for (i = 0; i < rdg->n_vertices; i++)
- {
-   struct vertex *v = &(rdg->vertices[i]);
-   struct graph_edge *e;
- 
-   /* Highlight reads from memory.  */
-   if (RDG_MEM_READS_STMT (rdg, i))
-fprintf (file, "%d [style=filled, fillcolor=green]\n", i);
- 
-   /* Highlight stores to memory.  */
-   if (RDG_MEM_WRITE_STMT (rdg, i))
-fprintf (file, "%d [style=filled, fillcolor=red]\n", i);
- 
-   if (v->succ)
-for (e = v->succ; e; e = e->succ_next)
-  switch (RDGE_TYPE (e))
-{
-case input_dd:
-  fprintf (file, "%d -> %d [label=input] \n", i, e->dest);
-  break;
- 
-case output_dd:
-  fprintf (file, "%d -> %d [label=output] \n", i, e->dest);
-  break;
- 
-case flow_dd:
-  /* These are the most common dependences: don't print these. */
-  fprintf (file, "%d -> %d \n", i, e->dest);
-  break;
- 
-case anti_dd:
-

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-11 Thread Richard Biener

On Wed, 11 Sep 2013, Jakub Jelinek wrote:

> On Wed, Sep 11, 2013 at 11:02:30AM +0200, Richard Biener wrote:
> > Make sure to add a predicate that can tell whether it's an anti-range
> > then.
> 
> Or perhaps extraction routine, that given an SSA_NAME will give you
> a triplet, 
> { enum value_range_type vr_type; double_int min, max; }
> For non-integral SSA_NAMEs, or SSA_NAMEs with NULL RANGE_INFO
> (should include integral types with > 2 * HOST_BITS_PER_WIDE_INT
> precision) will give you VR_VARYING, for the case where
> min <= max VR_RANGE and otherwise decode the max > min range
> into VR_ANTI_RANGE with adjusted min/max?
> 
> Then the min/max encoding of anti range would be just a more compact
> way of encoding it to lower memory usage.

Yeah, that also works.

Richard.

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-11 Thread Jakub Jelinek

On Wed, Sep 11, 2013 at 11:02:30AM +0200, Richard Biener wrote:
> Make sure to add a predicate that can tell whether it's an anti-range
> then.

Or perhaps extraction routine, that given an SSA_NAME will give you
a triplet, 
{ enum value_range_type vr_type; double_int min, max; }
For non-integral SSA_NAMEs, or SSA_NAMEs with NULL RANGE_INFO
(should include integral types with > 2 * HOST_BITS_PER_WIDE_INT
precision) will give you VR_VARYING, for the case where
min <= max VR_RANGE and otherwise decode the max > min range
into VR_ANTI_RANGE with adjusted min/max?

Then the min/max encoding of anti range would be just a more compact
way of encoding it to lower memory usage.
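
A rough sketch of such an extraction routine, under simplifying
assumptions: int64_t stands in for double_int, a has_info flag stands
in for a non-NULL RANGE_INFO, and the type and function names are
invented here rather than taken from any patch.

```c
#include <stdint.h>

/* Kinds of decoded range; hypothetical stand-ins for the
   value_range_type enumerators.  */
enum range_kind
{
  KIND_VARYING,
  KIND_RANGE,
  KIND_ANTI_RANGE
};

struct decoded_range
{
  enum range_kind kind;
  int64_t min;
  int64_t max;
};

/* Decode a stored (MIN, MAX) pair.  Without range info the result is
   varying.  MIN <= MAX is an ordinary range.  MIN > MAX encodes the
   anti-range ~[MAX + 1, MIN - 1].  */
struct decoded_range
decode_range (int has_info, int64_t min, int64_t max)
{
  struct decoded_range result = { KIND_VARYING, 0, 0 };

  if (!has_info)
    return result;

  if (min <= max)
    {
      result.kind = KIND_RANGE;
      result.min = min;
      result.max = max;
    }
  else
    {
      /* min > max, so neither max + 1 nor min - 1 can overflow.  */
      result.kind = KIND_ANTI_RANGE;
      result.min = max + 1;
      result.max = min - 1;
    }
  return result;
}
```

With this scheme a stored pair [8, 2] decodes to the anti-range
~[3, 7], matching the example given earlier in the thread.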

Jakub

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-09-11 Thread Richard Biener

On Wed, 11 Sep 2013, Kugan wrote:

> Thanks Jakub for the review.
> 
> On 10/09/13 23:10, Jakub Jelinek wrote:
> > On Tue, Sep 10, 2013 at 03:17:50PM +0200, Richard Biener wrote:
> > > > unsigned short s;
> > > > s.1_3 = (short unsigned int) l.0_2;
> > > > l.0_2: VARYING
> > > > s.1_3: [0, +INF]
> > > 
> > > Note that [0, +INF] is the same as VARYING and [-INF, +INF] and VARYING
> > > for
> > > l.0_2 is the same as [-INF, +INF].
> > 
> > Yeah, I don't see much value in differentiating between VR_VARYING and
> > VR_RANGE [TYPE_MIN_VALUE, TYPE_MAX_VALUE] (perhaps a question is what to do
> > for types with precisions different from TYPE_MODE's bitsize, if we should
> > store for VARYING/UNDEFINED a range of all possible values in the mode).
> > Unsigned type will be always >= 0, even if it is VARYING or UNDEFINED.
> > What is the valid bit good for?  Is it meant just for integrals with >
> > 2*HOST_BITS_PER_WIDE_INT precision, which we can't represent in double_int?
> > I'd say we just don't want to keep track on the value ranges for those.
> 
> Ok, I will remove the valid.
> 
> > And, do we need to distinguish between VR_RANGE and VR_ANTI_RANGE?
> > I mean, can't we always store the range in VR_RANGE format?  Instead of
> > -[3,7] we'd store [8,2] and define that if the min double_int is bigger than
> > max double_int, then it is [min,+infinity] merged with [-infinity,max] range
> > (i.e. -[max+1,min-1])?
> > 
> 
> Ok, I will change this too.

Make sure to add a predicate that can tell whether it's an anti-range
then.

Richard.

Re: [patch 2/2] tree-flow.h restructuring

2013-09-11 Thread Richard Biener

On Tue, Sep 10, 2013 at 9:27 PM, Andrew MacLeod  wrote:
> This splits out tree-ssaname related things to tree-ssanames.h. This is then
> included from tree-ssa.h

This patch is ok as-is.

> similar treatment can  be given to tree-phinodes.c
>
> I notice a number of the other ssa passes only export a couple of functions,
> and that's it.. no structs or anything like that.  (like tree-ssa-uninit.c
> which exports ssa_undefined_value_p but that can't be easily relocated since
> it depends on static objects in the file which are constructed earlier by
> the pass)

Well - usually the reason is a bad design choice.  There should have been
a ssa_undefined_value_p predicate without the special
possibly_undefined_names pointer-set handling available in generic code
and tree-ssa-uninit.c wrapping that adding its own special handling.

Most of the awkwardness here is because the generic uninit warning
machinery resides in tree-ssa.c.  Consider moving that (the
early uninit pass and its helpers) to tree-ssa-uninit.c.

> I was thinking that rather than create tree-ssa-passxyz.h in these cases
> we could simply put those prototypes into tree-ssa.h since they are SSA
> related... but I'm ok creating those pass headers if that is the direction
> we want to go  then we have 1:1 correspondences rather than recreating
> the /* In file.c */ setup again in tree-ssa.h :-)

Please not ;)  Just keep the existing mess rather than changing it to a
different one.  Or go the full way of restructuring things like outlined above
for this special case so no mess is required.

> It looks like there is a bunch of tree-ssa-loop stuff in there as well, I
> would think all of that would be good to put into a tree-ssa-loop.h, and
> then any non-loop files won't need to see these structs and functions unless
> they want to include that file.  (ie tree-ssa.h wouldn't include
> tree-ssa-loop.h, but all the tree-ssa-loop*.c files would)

Sure.  Note that for generic loop stuff we have the IL agnostic cfgloop*
files.

> after that, tree-flow.h will end up with some gimple flow and other
> miscellaneous things which can be looked at, as well as the SSA immediate
> use code which should go somewhere else... perhaps  in
> tree-ssa-operands.h...  Sometimes it's hard to tell until you try moving it
> :-)

Immediate use stuff indeed looks like belonging to tree-ssa-operands.[ch].

> then I'd go tackle the stuff in gimple.h and tree.h that doesn't belong
> there.

Keep in mind the tree (GENERIC) vs. GIMPLE (non-SSA specific) and
SSA (SSA-on-GIMPLE specific) distinction.

Richard.

> Andrew

[patch driver]: Fix relocatable toolchain path-replacement in driver

2013-09-11 Thread Kai Tietz

Hi,

This change fixes a quirk that shows up with relocated toolchains.  The
driver remembers the original build directory in the std_prefix variable
so that it can modify paths later.  Unfortunately, std_prefix itself gets
modified later on, so update_path no longer works as intended.  This
patch fixes that by introducing a constant variable that holds the
initial path and is never modified afterwards.
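
To illustrate why update_path needs the unmodified prefix, here is a
simplified standalone version of its prefix test.  It is an
approximation only: path_starts_with_prefix is a made-up name, plain
strncmp replaces filename_ncmp, and only '/' is treated as a
directory separator.

```c
#include <string.h>

/* Return nonzero if PATH begins with PREFIX and the match ends on a
   path component boundary (a separator or the end of PATH).  */
int
path_starts_with_prefix (const char *path, const char *prefix)
{
  size_t len = strlen (prefix);

  return strncmp (path, prefix, len) == 0
         && (path[len] == '/' || path[len] == '\0');
}
```

Configure-time paths are built from the original prefix, so comparing
them against a prefix that has since been rewritten to the relocated
location no longer matches and the path is left untranslated.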

ChangeLog

2013-09-11  Kai Tietz  

* prefix.c (org_prefix): New static variable.
(update_path): Use org_prefix instead of std_prefix.

Tested for i686-w64-mingw32, for x86_64-w64-mingw32, and for
x86_64-unknown-linux-gnu.  OK to apply?

Regards,
Kai

Index: prefix.c
===
--- prefix.c(Revision 202491)
+++ prefix.c(Arbeitskopie)
@@ -73,6 +73,9 @@ License along with GCC; see the file COPYING3.  If
 #include "common/common-target.h"

 static const char *std_prefix = PREFIX;
+/* Original prefix used on initial build.  This might be different
+   to std_prefix for relocatable-configure.  */
+static const char *org_prefix = PREFIX;

 static const char *get_key_value (char *);
 static char *translate_name (char *);
@@ -247,9 +250,9 @@ char *
 update_path (const char *path, const char *key)
 {
   char *result, *p;
-  const int len = strlen (std_prefix);
+  const int len = strlen (org_prefix);

-  if (! filename_ncmp (path, std_prefix, len)
+  if (! filename_ncmp (path, org_prefix, len)
   && (IS_DIR_SEPARATOR(path[len])
   || path[len] == '\0')
   && key != 0)

Re: [patch 1/2] tree-flow.h restructuring

2013-09-11 Thread Richard Biener

On Tue, Sep 10, 2013 at 9:19 PM, Andrew MacLeod  wrote:
> Here's a start at restructuring the whole tree-flow.h mess that we created
> back in the original wild west tree-ssa days.
>
> First, almost everyone includes tree-flow.h because it became the kitchen
> sink of functionality.  Really, what we ought to have is a tree-ssa.h which
> anything that uses basic tree-ssa functionality includes, and that'll be the
> main include for SSA passes. Other prototypes and such should come from
> other appropriate places. Doing this in one step is basically impossible, so
> here are the first few steps, structured so that it's easier to review.
>
> I changed everywhere which includes tree-flow.h  to include tree-ssa.h
> instead.   tree-ssa.h includes tree-flow.h first, which makes it
> functionally the same from a compiling point of view.

You mean that doing the restructure and only adding includes that are
required to make compile work again wasn't feasible...?

> I also moved everything from tree-flow.h and tree-flow-inline.h that is
> related to tree-ssa.c functions into tree-ssa.h.  There were also a few
> function prototypes sprinkled in a  couple of other headers which I moved
> into tree-ssa.h as well.  I have verified that every exported function in
> tree-ssa.c has a proto in tree-ssa.h, and is listed in the same order as the
> .h file.

Note that in general there isn't a thing like "tree SSA" anymore - it's
"GIMPLE SSA" now ;)  Not that we want gigantic renaming occurring now,
just if you invent completely new .[ch] files then keep that in mind.

That is, if you moved stuff to tree-ssa.h that has no definition in tree-ssa.c
then I'd rather have a new gimple-ssa.h that doesn't have a corresponding
.c file (for now).

> Compiling this change indicated that there were a few files which required
> functionality from tree-ssa.c which really don't belong there.  In
> particular,  useless_type_conversion_p and types_compatible_p are used by
> the C++ front end and other places which don't really care about SSA.  So I
> moved them into more appropriate places... tree.c and tree.h

Well, those functions are only valid when we are in GIMPLE (the GIMPLE
type system is less strict than the GENERIC one), so tree.[ch] isn't
appropriate.
gimple.[ch] would be.  But I wonder where it's used from the FEs - that's surely
looking wrong (unless it's in a langhook).  But yes, the functions are not
in any way about SSA but only GIMPLE.

For now I'd say just keep them in tree-ssa.[ch] or move them to gimple.[ch].

> We'll continue moving stuff out of tree-flow.h into appropriate places,
> until all that is left makes sense where it is and doesn't include
> prototypes from other .c files.
>
> Once that is finished, I will go back and revisit tree-ssa.h to see what it
> actually needs for includes. Then visit every .c file which includes
> tree-ssa.h, remove it from the include list, compile and see what routines
> the file is looking for, then include the appropriate file(s).  This will
> likely identify other things like types_compatible_p() which are really in
> the wrong place, and those can then be moved.
>
> This patch implements that starting point, and it's the most painful.
> The next one shows how we proceed and is much easier.
> Do we want to proceed this way? It seems reasonable to me.
>
> The changes bootstrap fine, I haven't finished running the regression tests
> yet... I wanted to see any comments before proceeding.

Apart from that (well, I can certainly bear with the bulk include change...)
the rest looks fine.

Thanks,
Richard.

> Andrew
>
>
>

[patch windows i386]: Make sure that really-external functions aren't dllexported

2013-09-11 Thread Kai Tietz

Hi,

This patch makes sure that C/C++ inline functions aren't dll-exported.

ChangeLog

2013-09-11  Kai Tietz  

* config/i386/winnt-cxx.c (i386_pe_type_dllexport_p): Don't
dll-export inline-functions.
* config/i386/winnt.c (i386_pe_determine_dllexport_p): Likewise.

Tested for x86_64-w64-mingw32 and i686-w64-mingw32.  I will apply this in
a couple of hours, if there are no objections.

Regards,
Kai

Index: winnt-cxx.c
===
--- winnt-cxx.c(Revision 202491)
+++ winnt-cxx.c(Arbeitskopie)
@@ -65,6 +65,13 @@ i386_pe_type_dllexport_p (tree decl)
   if (TREE_CODE (TREE_TYPE (decl)) == METHOD_TYPE
   && DECL_ARTIFICIAL (decl) && !DECL_THUNK_P (decl))
 return false;
+  if (TREE_CODE (decl) == FUNCTION_DECL
+  && DECL_DECLARED_INLINE_P (decl))
+{
+  if (DECL_REALLY_EXTERN (decl)
+  || !flag_keep_inline_dllexport)
+return false;
+}
   return true;
 }

Index: winnt.c
===
--- winnt.c(Revision 202491)
+++ winnt.c(Arbeitskopie)
@@ -110,6 +110,11 @@ i386_pe_determine_dllexport_p (tree decl)
   if (!TREE_PUBLIC (decl))
 return false;

+  if (TREE_CODE (decl) == FUNCTION_DECL
+  && DECL_DECLARED_INLINE_P (decl)
+  && !flag_keep_inline_dllexport)
+return false;
+
   if (lookup_attribute ("dllexport", DECL_ATTRIBUTES (decl)))
 return true;

Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64

2013-09-11 Thread Richard Biener

On Tue, Sep 10, 2013 at 5:53 PM, Yufeng Zhang  wrote:
> Hi,
>
> Following Bin's patch in
> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00695.html, this patch tweaks
> backtrace_base_for_ref () to strip off any widening conversion after the
> first TREE_CODE check fails.  Without this patch, the test
> (gcc.dg/tree-ssa/slsr-39.c) in Bin's patch will fail on AArch64, as
> backtrace_base_for_ref () will stop if not seeing an ssa_name since the tree
> code can be nop_expr instead.
>
> Regtested on arm and aarch64; still bootstrapping x86_64.
>
> OK for the trunk if the x86_64 bootstrap succeeds?

Please add a testcase.

Richard.

> Thanks,
> Yufeng
>
> gcc/
>
> * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): Call
> get_unwidened and check 'base_in' again.

Re: [1/4] Using gen_int_mode instead of GEN_INT

2013-09-11 Thread Eric Botcazou

> Now bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?

Sure, modulo...

> Thanks,
> Richard
> 
> 
> gcc/
>   * simplify-rtx.c (simplify_unary_operation_1): Use simplify_gen_binary
>   for (not (neg ...)) and (neg (not ...)) casees.

'cases'

-- 
Eric Botcazou

Re: V4 Lambda templates and implicit function templates.

2013-09-11 Thread Adam Butcher


On 10.09.2013 03:19, Adam Butcher wrote:
 - Instantiation of generic conversion op ICEs if the call op contains
   declarations and hasn't already been instantiated.

This is not a complete enough description.  It only ICEs instantiating 
the call op through the decltype return of the conversion op if the 
return type of the call op is a deduced one (i.e. unspecified or 
specified explicitly as 'auto').  If the lambda call op is instantiated 
explicitly (i.e. the lambda is called) prior to using the conversion op 
then all is well.  It seems to only occur if there are variables 
declared within the lambda body or accessible via the lambda's 'this'.


Specifically, the ICE is in tsubst_decl (cp/pt.c:10839) asserting 
'gcc_unreachable' due to being 'cp_unevaluated_operand'.  The 
instantiation chain starts from attempting to 'mark_used' the call op in 
the decltype expression.


The same ICE can be caused in user code by attempting to take the
decltype of a generic lambda call having a deduced return type and
declarations:


This is fine:

   auto f = []  (T) {};
   decltype (f (4.f)) *p;

This is not; it ICEs doing 'tsubst_decl' on the declaration 'x'.

   auto f = []  (T) { int x; };
   decltype (f (4.f)) *p;

The conversion op is clearly not a factor here but can be removed from 
the equation completely by adding a capture.  The ICE still occurs.  In 
this case it occurs trying to do 'tsubst_decl' on the capture decl 'i'.


   int i = 0;
   auto f = [i]  (T) {};
   decltype (f (4.f)) *p;

ice.cpp: In instantiation of ‘main():: [with T = float]’:
ice.cpp:5:20:   required from here
ice.cpp:4:34: internal compiler error: in tsubst_decl, at cp/pt.c:10839
auto f = [i]  (T) {};
  ^
Any ideas?  Looks like it's something to do with how the call operator 
is defined.  Is there some flag I'm missing in the generic case?


Cheers,
Adam

Re: [PATCH] Fix build_range_check (PR tree-optimization/58385)

2013-09-11 Thread Richard Biener

On Tue, 10 Sep 2013, Jakub Jelinek wrote:

> Hi!
> 
> If exp has side effects, for [-,-] ranges we would optimize them away.
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk/4.8?

Ok.

Thanks,
Richard.

> 2013-09-10  Jakub Jelinek  
> 
>   PR tree-optimization/58385
>   * fold-const.c (build_range_check): If both low and high are NULL,
>   use omit_one_operand_loc to preserve exp side-effects.
> 
>   * gcc.c-torture/execute/pr58385.c: New test.
> 
> --- gcc/fold-const.c.jj   2013-09-09 11:32:39.0 +0200
> +++ gcc/fold-const.c  2013-09-10 20:48:12.018170411 +0200
> @@ -4299,7 +4299,7 @@ build_range_check (location_t loc, tree
>  }
>  
>if (low == 0 && high == 0)
> -return build_int_cst (type, 1);
> +return omit_one_operand_loc (loc, type, build_int_cst (type, 1), exp);
>  
>if (low == 0)
>  return fold_build2_loc (loc, LE_EXPR, type, exp,
> --- gcc/testsuite/gcc.c-torture/execute/pr58385.c.jj  2013-09-10 
> 20:50:02.909589473 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr58385.c 2013-09-10 
> 20:48:38.0 +0200
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/58385 */
> +
> +extern void abort (void);
> +
> +int a, b = 1;
> +
> +int
> +foo ()
> +{
> +  b = 0;
> +  return 0;
> +}
> +
> +int
> +main ()
> +{
> +  ((0 || a) & foo () >= 0) <= 1 && 1;
> +  if (b)
> +abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend

Re: [PATCH i386 2/8] [AVX512] Add mask registers.

2013-09-11 Thread Kirill Yukhin

Hello,
On 10 Sep 11:51, Richard Henderson wrote:
> On 09/10/2013 11:25 AM, Kirill Yukhin wrote:
> > Is it ok now?
> 
> 
> Yes.
Thanks a lot!
Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00354.html

--
Thanks, K

Re: [1/4] Using gen_int_mode instead of GEN_INT

2013-09-11 Thread Richard Sandiford

James Greenhalgh  writes:
> On Tue, Sep 10, 2013 at 08:09:42PM +0100, Richard Sandiford wrote:
>> Sorry for the breakage.  gen_int_mode and GEN_INT really are only for
>> scalar integers though.  (So is plus_constant.)  Vector constants should
>> be CONST_VECTORs rather than CONST_INTs.
>> 
>> I think the gcc.target/aarch64/vect-fcm-eq-d.c failure is from a latent
>> bug in the way (neg (not ...)) and (not (neg ...)) are handled.
>> Could you give the attached patch a go?
>
> Thanks Richard, this patch fixes the test FAILs I was seeing.

Now bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* simplify-rtx.c (simplify_unary_operation_1): Use simplify_gen_binary
for (not (neg ...)) and (neg (not ...)) casees.
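
The two identities behind this change can be checked on ordinary
scalar integers.  The sketch below uses uint32_t so that negation
wraps instead of overflowing.  It is not GCC code and
identities_hold is a name made up for the example.  In the patch
itself, CONSTM1_RTX (mode) and CONST1_RTX (mode) supply the -1 and 1
in whatever mode is being simplified.

```c
#include <stdint.h>

/* Return nonzero if both identities hold for X in 32-bit wrapping
   arithmetic:
     (not (neg X)) == (plus X -1)
     (neg (not X)) == (plus X 1)  */
int
identities_hold (uint32_t x)
{
  uint32_t neg_x = (uint32_t) (0u - x);
  uint32_t not_x = (uint32_t) ~x;
  uint32_t not_of_neg = (uint32_t) ~neg_x;
  uint32_t neg_of_not = (uint32_t) (0u - not_x);

  return not_of_neg == (uint32_t) (x - 1u)
         && neg_of_not == (uint32_t) (x + 1u);
}
```

For a vector mode the same identities hold lane by lane, which is why
the constant has to be built for the mode rather than as a scalar
CONST_INT.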

Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  2013-09-10 20:02:08.756091875 +0100
+++ gcc/simplify-rtx.c  2013-09-10 20:02:09.002093907 +0100
@@ -825,7 +825,8 @@ simplify_unary_operation_1 (enum rtx_cod
 
   /* Similarly, (not (neg X)) is (plus X -1).  */
   if (GET_CODE (op) == NEG)
-   return plus_constant (mode, XEXP (op, 0), -1);
+   return simplify_gen_binary (PLUS, mode, XEXP (op, 0),
+   CONSTM1_RTX (mode));
 
   /* (not (xor X C)) for C constant is (xor X D) with D = ~C.  */
   if (GET_CODE (op) == XOR
@@ -932,7 +933,8 @@ simplify_unary_operation_1 (enum rtx_cod
 
   /* Similarly, (neg (not X)) is (plus X 1).  */
   if (GET_CODE (op) == NOT)
-   return plus_constant (mode, XEXP (op, 0), 1);
+   return simplify_gen_binary (PLUS, mode, XEXP (op, 0),
+   CONST1_RTX (mode));
 
   /* (neg (minus X Y)) can become (minus Y X).  This transformation
 isn't safe for modes with signed zeros, since if X and Y are

Re: RFC: patch to build GCC for arm with LRA

2013-09-11 Thread Richard Sandiford

Yvan Roux  writes:
> @@ -5454,6 +5454,16 @@ strip_address_mutations (rtx *loc, enum rtx_code 
> *outer_code)
>   /* Things like SIGN_EXTEND, ZERO_EXTEND and TRUNCATE can be
>  used to convert between pointer sizes.  */
>   loc = &XEXP (*loc, 0);
> +  else if (GET_RTX_CLASS (code) == RTX_BITFIELD_OPS)
> + {
> +   /* Bitfield operations [SIGN|ZERO]_EXTRACT can be used too.  */
> +   enum machine_mode mode = GET_MODE(*loc);
> +   unsigned HOST_WIDE_INT len = INTVAL (XEXP (*loc, 1));
> +   HOST_WIDE_INT pos = INTVAL (XEXP (*loc, 2));
> +
> +   if (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0))
> + loc = &XEXP (*loc, 0);
> + }

This means that the other values of "pos" bypass the:

  else
return loc;

so you'll get an infinite loop.  I think it would be neater to split
this out into:

/* Return true if X is a sign_extract or zero_extract from the least
   significant bit.  */

static bool
lsb_bitfield_op_p (rtx X)
{
  ...;
}

else if (lsb_bitfield_op_p (*loc))
  loc = &XEXP (*loc, 0);
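
The position test that such a predicate would wrap can be written on
its own, without any RTL.  The following is only a sketch of that
arithmetic: extract_starts_at_lsb is a hypothetical name, and the
precision, length, position and bit-endianness are passed in as plain
integers instead of being read from an rtx.

```c
#include <assert.h>

/* Return nonzero if an extraction of LEN bits at position POS from a
   value of PRECISION bits starts at the least significant bit.
   BITS_BIG_ENDIAN_P says whether bit positions count from the most
   significant end.  */
int
extract_starts_at_lsb (unsigned int precision, unsigned int len,
                       unsigned int pos, int bits_big_endian_p)
{
  unsigned int lsb_pos;

  assert (len <= precision);
  lsb_pos = bits_big_endian_p ? precision - len : 0;
  return pos == lsb_pos;
}
```

Folding the whole test into one predicate leaves the caller with a
single "else if", so extractions at other positions fall through to
the final "else return loc;" instead of looping.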

Looks good to me otherwise FWIW.

Thanks,
Richard

Re: [i386, doc] Add documentation for fxsr, xsave, xsaveopt

2013-09-11 Thread Kirill Yukhin

Hello,
On 10 Sep 19:42, Uros Bizjak wrote:
> The patch is OK for mainline.
Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00353.html

--
Thanks, K

Re: [PATCH, ARM, LRA] Prepare ARM build with LRA

2013-09-11 Thread Yvan Roux

New attempt, with fixes from Richard's comments (discussed in the other thread).

Thanks,
Yvan

2013-09-09  Yvan Roux  
Vladimir Makarov  

* rtlanal.c (strip_address_mutations): Add bitfield operations
handling.
(shift_code_p): New predicate for shifting operations.
(must_be_index_p): Add shifting operations handling.
(set_address_index): Likewise.

On 9 September 2013 10:01, Yvan Roux  wrote:
> Hi,
>
> here are the modifications, discussed in another thread, needed in
> rtlanal.c by ARM targets (AArch32 and AArch64) to build GCC with LRA.
>
> Is it ok for trunk ?
>
> Thanks,
> Yvan
>
> 2013-09-09  Yvan Roux  
> Vladimir Makarov  
>
> * rtlanal.c (must_be_index_p, set_address_index): Add ASHIFTRT,
> LSHIFTRT, ROTATE, ROTATERT and SIGN_EXTRACT handling.
> (set_address_base): Add SIGN_EXTRACT handling.

arm-lra-rtl.patch
Description: Binary data
