date:20120913

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Richard Guenther

On Thu, Sep 13, 2012 at 7:37 PM, Mike Stump  wrote:
> On Sep 13, 2012, at 6:52 AM, Robert Dewar  wrote:
>> Sure, it is obvious that you don't want -g to affect -O1 or -O2 code,
>> but I think if you have -Og (if and when we have that), it would not
>> be a bad thing for -g to affect that.
>
> No, instead think of -Og as affecting the -g output itself.  If it does, then 
> there is nothing for -g to affect when used with -Og.  So, I agree, -g -Og 
> can have any impact on code-gen that we want, I just dis-agree that -Og 
> should be any different; I just don't see the need.

I think it's going to make GCC harder to maintain if we drop the -g0
vs. -g no-code-difference requirement for just some optimization
levels.

Richard.

Re: [PATCH] Changes in mode switching

2012-09-13 Thread Vladimir Yakovlev

Hello,

I reproduced the failure and found the reason for it. I understand how to
resolve it, and now I need only a small change: an additional argument to
EMIT_MODE_SET. Is it good for trunk?

Thank you,
Vladimir

2012-09-14  Vladimir Yakovlev  

* (optimize_mode_switching): Added an argument to EMIT_MODE_SET calls.

* config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument.

* config/i386/i386.h (EMIT_MODE_SET): Added an argument.

* config/sh/sh.h (EMIT_MODE_SET): Added an argument.


2012/8/29 Vladimir Yakovlev :
> I built using last configure.
>
> Thank you,
> Vladimir
>
> 2012/8/29 Kaz Kojima :
>>> I tried
>>>
>>> ../gcc/configure --host=i686-pc-linux-gnu
>>> --target=sh4-unknown-linux-gnu --enable-build-with-cxx --enable-lto
>>> --enable-shared --enable-threads=posix --enable-clocale=gnu
>>> --enable-libitm --enable-libgcj
>>> --with-ld=/usr/local/bin/sh4-unknown-linux-gnu-ld
>>> --with-as=/usr/local/bin/sh4-unknown-linux-gnu-as
>>> --with-sysroot=/exp/ldroot --with-mpfr=/opt2/i686-pc-linux-gnu
>>> --with-mpc=/opt2/i686-pc-linux-gnu
>>> --with-libelf=/opt2/i686-pc-linux-gnu --with-ppl=no
>>> --enable-languages=c,c++,fortran,java,lto,objc
>>> --prefix=/export/users/mstester/stability/work/trunk/64/install_sh4
>>>
>>> and have got build error. make.log attached. Could you take a look?
>>
>> make.log says
>>
>>> make[2]: i686-pc-linux-gnu-ar: Command not found
>>
>> It looks like your build system is x86_64-unknown-linux-gnu.
>> Perhaps with specifying --host=x86_64-unknown-linux-gnu instead
>> of --host=i686-pc-linux-gnu in your configuration, that error
>> could be resolved, though
>>
>>> --with-ld=/usr/local/bin/sh4-unknown-linux-gnu-ld
>>> --with-as=/usr/local/bin/sh4-unknown-linux-gnu-as
>>> --with-sysroot=/exp/ldroot --with-mpfr=/opt2/i686-pc-linux-gnu
>>> --with-mpc=/opt2/i686-pc-linux-gnu
>>> --with-libelf=/opt2/i686-pc-linux-gnu
>>
>> are strongly specific to my environment.  Maybe
>>
>>   ../gcc/configure --host=x86_64-unknown-linux-gnu 
>> --target=sh4-unknown-linux-gnu --enable-languages=c
>>
>> and
>>
>>   make all-gcc
>>
>> is enough to get cc1 for sh4-unknown-linux-gnu.
>>
>> Best Regards,
>> kaz


middle.patch
Description: Binary data

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Dehao Chen

On Fri, Sep 14, 2012 at 12:49 AM, Tom Tromey  wrote:
>> "Dehao" == Dehao Chen  writes:
>
> Dehao> + static htab_t location_adhoc_data_htab;
> Dehao> + static source_location curr_adhoc_loc;
> Dehao> + static struct location_adhoc_data *location_adhoc_data;
> Dehao> + static unsigned int allocated_location_adhoc_data;
>
> libcpp was written to allow multiple preprocessor objects to be created
> and used in one process.  I think introducing globals like this breaks
> this part of the design.  It seems to me they should instead be fields
> of cpp_reader or line_maps.

Okay. I've moved these into line_maps. Will send out the whole patch soon...

Thanks,
Dehao

>
> Tom
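
Tom's point about keeping the state per object can be sketched roughly as follows. This is a simplified illustration, not the actual patch: line_maps_sketch, add_adhoc_location and the field layout are hypothetical stand-ins, and a std::vector replaces the hash table and hand-managed array from the quoted declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the libcpp declarations.
typedef unsigned int source_location;

struct location_adhoc_data
{
  source_location locus;  // the underlying location
  void *data;             // payload attached to it, e.g. a block pointer
};

// The table lives inside the per-instance object instead of in file-scope
// statics, so two preprocessor objects in one process keep separate tables.
struct line_maps_sketch
{
  std::vector<location_adhoc_data> adhoc_data;
};

// Record (locus, data) in SET and return the index of the new entry.
std::size_t
add_adhoc_location (line_maps_sketch *set, source_location locus, void *data)
{
  assert (set != nullptr);
  set->adhoc_data.push_back (location_adhoc_data{locus, data});
  return set->adhoc_data.size () - 1;
}
```

With this shape, entries added through one object are never visible through another, which is the property the file-scope statics would have lost.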

PATCH: PR debug/54568: --eh-frame-hdr should also be enabled for static executable

2012-09-13 Thread H.J. Lu

Hi,

There is no reason why --eh-frame-hdr can't be used with static
executable on Linux.  This patch enables --eh-frame-hdr for static
executable on Linux and adds an exception test for static executable.
Other platforms may also work correctly.  But I can't verify it.

Tested on Linux/x86-64.  OK to install?

Thanks.


H.J.

gcc/

2012-09-13  H.J. Lu  

PR debug/54568
* config/gnu-user.h (USE_LD_EH_FRAME_HDR_FOR_STATIC): Defined
if HAVE_LD_EH_FRAME_HDR is defined.
(LINK_EH_SPEC): Drop "!static".

gcc/testsuite/

2012-09-13  H.J. Lu  

PR debug/54568
* g++.dg/eh/spec3-static.C: New test.

libgcc/

2012-09-13  H.J. Lu  

PR debug/54568
* crtstuff.c (USE_PT_GNU_EH_FRAME): Check CRTSTUFFT_O together
with USE_LD_EH_FRAME_HDR_FOR_STATIC.
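
As a rough way to observe the effect of --eh-frame-hdr from inside a program, the sketch below counts PT_GNU_EH_FRAME program headers using glibc's dl_iterate_phdr. It assumes a glibc-based Linux system; the function names count_eh_frame_hdr and eh_frame_hdr_segments are made up for this illustration and are not part of the patch.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info (glibc)
#include <elf.h>    // PT_GNU_EH_FRAME
#include <cstddef>

// Callback for dl_iterate_phdr: add the number of PT_GNU_EH_FRAME
// segments of one loaded object to the counter passed through DATA.
static int
count_eh_frame_hdr (struct dl_phdr_info *info, size_t, void *data)
{
  int *count = static_cast<int *> (data);
  for (int i = 0; i < info->dlpi_phnum; ++i)
    if (info->dlpi_phdr[i].p_type == PT_GNU_EH_FRAME)
      ++*count;
  return 0;  // zero means "keep iterating"
}

// Total number of PT_GNU_EH_FRAME segments over all loaded objects,
// including the main executable.
int
eh_frame_hdr_segments ()
{
  int count = 0;
  dl_iterate_phdr (count_eh_frame_hdr, &count);
  return count;
}
```

The expectation, under the assumptions above, is that a static executable linked without --eh-frame-hdr reports no such segment for itself, while one linked with it does; the sketch does not verify this.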

diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
index cb45749..aa4e78d 100644
--- a/gcc/config/gnu-user.h
+++ b/gcc/config/gnu-user.h
@@ -82,7 +82,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define LIB_SPEC GNU_USER_TARGET_LIB_SPEC
 
 #if defined(HAVE_LD_EH_FRAME_HDR)
-#define LINK_EH_SPEC "%{!static:--eh-frame-hdr} "
+#define USE_LD_EH_FRAME_HDR_FOR_STATIC
+#define LINK_EH_SPEC "--eh-frame-hdr "
 #endif
 
 #undef LINK_GCC_C_SEQUENCE_SPEC
diff --git a/gcc/testsuite/g++.dg/eh/spec3-static.C b/gcc/testsuite/g++.dg/eh/spec3-static.C
new file mode 100644
index 000..15408ef
--- /dev/null
+++ b/gcc/testsuite/g++.dg/eh/spec3-static.C
@@ -0,0 +1,25 @@
+// PR c++/4381
+// Test that exception-specs work properly for classes with virtual bases.
+
+// { dg-do run }
+// { dg-options "-static" }
+
+class Base {};
+
+struct A : virtual public Base
+{
+  A() {}
+};
+
+struct B {};
+
+void func() throw (B,A)
+{
+  throw A();
+}
+
+int main(void)
+{
+  try {func(); }
+  catch (A& a) { }
+}
diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c
index 973956a..01cf254 100644
--- a/libgcc/crtstuff.c
+++ b/libgcc/crtstuff.c
@@ -84,7 +84,8 @@ call_ ## FUNC (void)  \
 #if defined(OBJECT_FORMAT_ELF) \
 && !defined(OBJECT_FORMAT_FLAT) \
 && defined(HAVE_LD_EH_FRAME_HDR) \
-&& !defined(inhibit_libc) && !defined(CRTSTUFFT_O) \
+&& !defined(inhibit_libc) \
+&& (defined(USE_LD_EH_FRAME_HDR_FOR_STATIC) || !defined(CRTSTUFFT_O)) \
 && defined(__FreeBSD__) && __FreeBSD__ >= 7
 #include 
 # define USE_PT_GNU_EH_FRAME
@@ -93,7 +94,8 @@ call_ ## FUNC (void)  \
 #if defined(OBJECT_FORMAT_ELF) \
 && !defined(OBJECT_FORMAT_FLAT) \
 && defined(HAVE_LD_EH_FRAME_HDR) && defined(TARGET_DL_ITERATE_PHDR) \
-&& !defined(inhibit_libc) && !defined(CRTSTUFFT_O) \
+&& !defined(inhibit_libc) \
+&& (defined(USE_LD_EH_FRAME_HDR_FOR_STATIC) || !defined(CRTSTUFFT_O)) \
 && defined(__sun__) && defined(__svr4__)
 #include 
 # define USE_PT_GNU_EH_FRAME
@@ -102,7 +104,8 @@ call_ ## FUNC (void)  \
 #if defined(OBJECT_FORMAT_ELF) \
 && !defined(OBJECT_FORMAT_FLAT) \
 && defined(HAVE_LD_EH_FRAME_HDR) \
-&& !defined(inhibit_libc) && !defined(CRTSTUFFT_O) \
+&& !defined(inhibit_libc) \
+&& (defined(USE_LD_EH_FRAME_HDR_FOR_STATIC) || !defined(CRTSTUFFT_O)) \
 && defined(__GLIBC__) && __GLIBC__ >= 2
 #include 
 /* uClibc pretends to be glibc 2.2 and DT_CONFIG is defined in its link.h.
@@ -117,7 +120,7 @@ call_ ## FUNC (void)  \
 #if defined(OBJECT_FORMAT_ELF) \
 && !defined(OBJECT_FORMAT_FLAT) \
 && defined(HAVE_LD_EH_FRAME_HDR) \
-&& !defined(CRTSTUFFT_O) \
+&& (defined(USE_LD_EH_FRAME_HDR_FOR_STATIC) || !defined(CRTSTUFFT_O)) \
 && defined(inhibit_libc) \
 && (defined(__GLIBC__) || defined(__gnu_linux__) || defined(__GNU__))
 /* On systems using glibc, an inhibit_libc build of libgcc is only

PR libstdc++/54576: random_device isn't protected by _GLIBCXX_USE_C99_STDINT_TR1

2012-09-13 Thread H.J. Lu

Hi,

include/random has

#ifdef _GLIBCXX_USE_C99_STDINT_TR1

#include  // For uint_fast32_t, uint_fast64_t, uint_least32_t
#include 
#include 

#endif // _GLIBCXX_USE_C99_STDINT_TR1

random_device is defined in . But src/c++11/random.cc
has

#include 
...
  void
  random_device::_M_init(const std::string& token)
  {

It doesn't check if _GLIBCXX_USE_C99_STDINT_TR1 is defined.  This
patch checks it.  OK to install?

Thanks.


H.J.
--
2012-09-13  H.J. Lu  

PR libstdc++/54576
* libstdc++-v3/src/c++11/random.cc: Check if
_GLIBCXX_USE_C99_STDINT_TR1 is defined.

diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc
index 4342df4..bb51fba 100644
--- a/libstdc++-v3/src/c++11/random.cc
+++ b/libstdc++-v3/src/c++11/random.cc
@@ -24,6 +24,7 @@
 
 #include 
 
+#ifdef  _GLIBCXX_USE_C99_STDINT_TR1
 #if defined __i386__ || defined __x86_64__
 # include 
 #endif
@@ -144,3 +145,4 @@ namespace std _GLIBCXX_VISIBILITY(default)
 0xefc6UL, 18, 1812433253UL>;
 
 }
+#endif

Re: vector comparisons in C++

2012-09-13 Thread Marc Glisse


On Thu, 13 Sep 2012, Jason Merrill wrote:

> Furthermore, this builtin support would be useful for implementing 
> a C++ class for vector arithmetic, just as it is with std::complex.  I'm not 
> aware of any other portable way to implement such a class.


I forgot to say: it is always possible to do the operations elementwise:
__m128d x,y;
__m128i cmp={(x[0] [...]

This counts on the middle-end to recognize the patterns. But that will
have to be implemented anyway, possibly for sqrt, and certainly for type 
conversions:


__m128i i;
__m128d x={i[0],i[1]};

because the sensible syntax (__m128d)i has been taken to mean 
*(__m128d*)&i :-(


(it is also possible to add __builtin_convert, __builtin_math_sqrt, etc)
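
The difference between the two spellings of a conversion can be sketched with GCC's generic vector extension. This is an illustration only: v2di and v2df are local typedefs chosen for the example rather than the <emmintrin.h> types, and the code assumes a GCC or Clang compiler that accepts vector subscripting in C++.

```cpp
#include <cassert>
#include <cstring>

// Generic vector types (GCC extension): two 64-bit lanes each.
typedef long long v2di __attribute__ ((vector_size (16)));
typedef double    v2df __attribute__ ((vector_size (16)));

// Value conversion, written one lane at a time as suggested above.
v2df
convert_elementwise (v2di i)
{
  v2df x = { static_cast<double> (i[0]), static_cast<double> (i[1]) };
  return x;
}

// The cast syntax does not convert values: it reinterprets the 16 bytes.
v2df
reinterpret_bits (v2di i)
{
  return (v2df) i;
}
```

For an input of {1, 2}, the elementwise version yields the doubles 1.0 and 2.0, while the cast yields whatever doubles share the bit patterns of the integers 1 and 2.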

--
Marc Glisse

Re: vector comparisons in C++

2012-09-13 Thread Marc Glisse


On Fri, 14 Sep 2012, Marc Glisse wrote:


While checking my facts for the previous paragraph, I got an ICE :-(

typedef int vec __attribute__((vector_size(16)));
vec const f(vec x,vec y){return x [...]

The same program compiles with gcc (prepare_cmp_insn isn't called), but ICEs 
with g++. Looking at the 003t.original tree dump, the C one looks like:


 return VIEW_CONVERT_EXPR(x < y);

while the C++ one looks like:

return  = x < y ? { -1, -1, -1, -1 } : { 0, 0, 0, 0 };

or in raw form:

 gimple_cond , >
 gimple_label <>
 gimple_assign 
 gimple_goto <>
 gimple_label <>
 gimple_assign 
 gimple_label <>
 gimple_assign 
 gimple_return 

That doesn't look very vector-like... I'll investigate before resending the 
patch.


Looks like a latent bug in fold_unary. The following seems to work in this 
case. We then end up with a VIEW_CONVERT_EXPR in C and a NOP_EXPR in C++, 
both seem fine. I'll post a combined patch once I have tested it.
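
For reference, the result that both front ends are expected to produce for a vector comparison can be sketched as below: each lane is -1 where the comparison holds and 0 where it does not. This assumes a compiler with vector comparison support in C++ (the feature under discussion in this thread); less_than and less_than_elementwise are names invented for the example.

```cpp
#include <cassert>

typedef int vec __attribute__ ((vector_size (16)));

// Lanewise less-than using the built-in vector comparison.
vec
less_than (vec x, vec y)
{
  return x < y;
}

// The same result spelled out lane by lane, as a scalar reference.
vec
less_than_elementwise (vec x, vec y)
{
  vec r = { 0, 0, 0, 0 };
  for (int k = 0; k < 4; ++k)
    r[k] = x[k] < y[k] ? -1 : 0;
  return r;
}
```

The all-ones/all-zeros lanes correspond to the { -1, -1, -1, -1 } and { 0, 0, 0, 0 } constants visible in the C++ tree dump quoted above.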


--- ../fold-const.c (revision 191279)
+++ ../fold-const.c (working copy)
@@ -7764,21 +7764,21 @@ fold_unary_loc (location_t loc, enum tre
  /* If we have (type) (a CMP b) and type is an integral type, return
 new expression involving the new type.  Canonicalize
 (type) (a CMP b) to (a CMP b) ? (type) true : (type) false for
 non-integral type.
 Do not fold the result as that would not simplify further, also
 folding again results in recursions.  */
  if (TREE_CODE (type) == BOOLEAN_TYPE)
return build2_loc (loc, TREE_CODE (op0), type,
   TREE_OPERAND (op0, 0),
   TREE_OPERAND (op0, 1));
- else if (!INTEGRAL_TYPE_P (type))
+ else if (!INTEGRAL_TYPE_P (type) && TREE_CODE (type) != VECTOR_TYPE)
return build3_loc (loc, COND_EXPR, type, op0,
   constant_boolean_node (true, type),
   constant_boolean_node (false, type));
}

   /* Handle cases of two conversions in a row.  */
   if (CONVERT_EXPR_P (op0))
{
  tree inside_type = TREE_TYPE (TREE_OPERAND (op0, 0));
  tree inter_type = TREE_TYPE (op0);


--
Marc Glisse

[Patch, moxie] bi-endian support for moxie

2012-09-13 Thread Anthony Green

Here are some changes in support of a little-endian soft-core
implementation of moxie.   We now build multilibs for both endians.
Corresponding binutils changes have already been committed and I am
checking this in.  (note: it does include  what I believe are
trivially correct doc changes - apologies in advance if I should have
waited for a review).

Thanks,

Anthony Green



2012-09-13  Anthony Green  

* config/moxie/moxie.h (LINK_SPEC): Add bi-endian support.
(MULTILIB_DEFAULTS): Define.
(ASM_SPEC): Define.
(BYTES_BIG_ENDIAN, WORDS_BIG_ENDIAN): Add bi-endian support.
(TARGET_CPU_CPP_BUILTINS): Add __MOXIE_LITTLE_ENDIAN__ and
__MOXIE_BIG_ENDIAN__.
* config/moxie/t-moxie (MULTILIB_DIRNAMES, MULTILIB_OPTIONS):
Define.
* config/moxie/moxie.opt: New file.
* doc/invoke.texi (Moxie Options): Add section documenting -mel
and -meb.
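
User code could key off the two new predefined macros roughly as follows. This is a sketch: only __MOXIE_LITTLE_ENDIAN__ and __MOXIE_BIG_ENDIAN__ come from the patch, while moxie_endianness and host_is_little_endian are hypothetical helpers written for the illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Compile-time view: which of the macros from the patch is defined.
// On any non-moxie compiler neither is, so the last branch is taken.
const char *
moxie_endianness ()
{
#if defined (__MOXIE_LITTLE_ENDIAN__)
  return "little";
#elif defined (__MOXIE_BIG_ENDIAN__)
  return "big";
#else
  return "not moxie";
#endif
}

// Run-time cross-check: inspect the first byte of a known 32-bit value.
bool
host_is_little_endian ()
{
  const std::uint32_t probe = 1;
  unsigned char first;
  std::memcpy (&first, &probe, 1);
  return first == 1;
}
```

On a moxie target built with -mel the two functions would be expected to agree ("little" and true); with -meb or the default, "big" and false.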


Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 191083)
+++ gcc/doc/invoke.texi (working copy)
@@ -776,6 +776,9 @@
 -mreturn-pointer-on-d0 @gol
 -mno-crt0  -mrelax -mliw -msetlb}

+@emph{Moxie Options}
+@gccoptlist{-meb  -mel}
+
 @emph{PDP-11 Options}
 @gccoptlist{-mfpu  -msoft-float  -mac0  -mno-ac0  -m40  -m45  -m10 @gol
 -mbcopy  -mbcopy-builtin  -mint32  -mno-int16 @gol
@@ -10514,6 +10517,7 @@
 * MIPS Options::
 * MMIX Options::
 * MN10300 Options::
+* Moxie Options::
 * PDP-11 Options::
 * picoChip Options::
 * PowerPC Options::
@@ -16433,6 +16437,23 @@

 @end table

+@node Moxie Options
+@subsection Moxie Options
+@cindex Moxie Options
+
+@table @gcctabopt
+
+@item -meb
+@opindex meb
+Generate big-endian code.  This is the default for @samp{moxie-*-*}
+configurations.
+
+@item -mel
+@opindex mel
+Generate little-endian code.
+
+@end table
+
 @node PDP-11 Options
 @subsection PDP-11 Options
 @cindex PDP-11 Options
Index: gcc/config/moxie/moxie.h
===
--- gcc/config/moxie/moxie.h (revision 191083)
+++ gcc/config/moxie/moxie.h (working copy)
@@ -41,9 +41,13 @@
 #define LIB_SPEC "%{!shared:%{!symbolic:-lc}}"

 #undef  LINK_SPEC
-#define LINK_SPEC "%{h*} %{v:-V} \
+#define LINK_SPEC "%{h*} %{v:-V} %{!mel:-EB} %{mel:-EL}\
   %{static:-Bstatic} %{shared:-shared} %{symbolic:-Bsymbolic}"

+#ifndef MULTILIB_DEFAULTS
+#define MULTILIB_DEFAULTS { "meb" }
+#endif
+
 /* Layout of Source Language Data Types */

 #define INT_TYPE_SIZE 32
@@ -192,6 +196,7 @@
 /* The Overall Framework of an Assembler File */

 #undef  ASM_SPEC
+#define ASM_SPEC "%{!mel:-EB} %{mel:-EL}"
 #define ASM_COMMENT_START "#"
 #define ASM_APP_ON ""
 #define ASM_APP_OFF ""
@@ -291,8 +296,8 @@
 /* Storage Layout */

 #define BITS_BIG_ENDIAN 0
-#define BYTES_BIG_ENDIAN 1
-#define WORDS_BIG_ENDIAN 1
+#define BYTES_BIG_ENDIAN ( ! TARGET_LITTLE_ENDIAN )
+#define WORDS_BIG_ENDIAN ( ! TARGET_LITTLE_ENDIAN )

 /* Alignment required for a function entry point, in bits.  */
 #define FUNCTION_BOUNDARY 16
@@ -473,8 +478,12 @@

 #define TARGET_CPU_CPP_BUILTINS() \
   { \
-builtin_define_std ("moxie");  \
-builtin_define_std ("MOXIE");  \
+builtin_define_std ("moxie");  \
+builtin_define_std ("MOXIE");  \
+if (TARGET_LITTLE_ENDIAN)  \
+  builtin_define ("__MOXIE_LITTLE_ENDIAN__");  \
+else   \
+  builtin_define ("__MOXIE_BIG_ENDIAN__"); \
   }

 #define HAS_LONG_UNCOND_BRANCH true
Index: gcc/config/moxie/t-moxie
===
--- gcc/config/moxie/t-moxie (revision 191083)
+++ gcc/config/moxie/t-moxie (working copy)
@@ -18,3 +18,6 @@
 # along with GCC; see the file COPYING3.  If not see
 # .

+MULTILIB_OPTIONS = meb/mel
+MULTILIB_DIRNAMES= eb el
+
Index: gcc/config/moxie/moxie.opt
===
--- gcc/config/moxie/moxie.opt  (revision 0)
+++ gcc/config/moxie/moxie.opt  (working copy)
@@ -0,0 +1,27 @@
+; Options for the moxie compiler port.
+
+; Copyright (C) 2012 Free Software Foundation, Inc.
+;
+; This file is part of GCC.
+;
+; GCC is free software; you can redistribute it and/or modify it under
+; the terms of the GNU General Public License as published by the Free
+; Software Foundation; either version 3, or (at your option) any later
+; version.
+;
+; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+; for more details.
+;
+; You should have received a copy of the GNU General Public License
+; along with GCC; see the file COPYING3.  If not see
+; .
+
+meb
+Target Rejec

Re: [SH] Fix bootstrap failures with --enable-checking

2012-09-13 Thread Kaz Kojima

Christian Bruel  wrote:
> This patch fixes a couple of assertions while building libgcc, when
> configured with --enable-checking=all.
> 
> OK for trunk ?

OK.

Regards,
kaz

[SPARC] Implement TImode support

2012-09-13 Thread Eric Botcazou

Now that TImode support is enabled on SPARC 64-bit, let's implement it. :-)
This is modeled on the TFmode support and, consequently, inherits its relative 
verbosity.  A future cleanup could simplify it a little and unify it with the 
TFmode support, as e.g. for Alpha.

Bootstrapped/regtested on SPARC/Solaris and SPARC64/Solaris, applied on the 
mainline.


2012-09-13  Eric Botcazou  

* config/sparc/predicates.md (input_operand): Do not consider TImode
constants as 1-instruction integer constants.
Use register_or_zero_operand instead of register_operand and tidy up.
* config/sparc/sparc.md (movti): New expander.
(movti_insn_sp64): New instruction.
(movti_insn_sp64_hq): Likewise.
(TImode splitters): New splitters.
* config/sparc/sparc.c (sparc_expand_move) : New case.
(sparc_legitimate_address_p): Return 0 for REG+REG in TImode.

* config/sparc/sparc-protos.h (arith_double_4096_operand): Delete.
(arith_4096_operand): Likewise.
(zero_operand): Likewise.
(fp_zero_operand): Likewise.
(reg_or_0_operand): Likewise.
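
The ordering concern behind the new TImode splitters can be illustrated with a toy model. The sketch below is not the SPARC back end: it treats a 128-bit value as two consecutive 64-bit slots of a plain array, with the high half in the lower-numbered slot, and move_pair is a hypothetical helper that picks the copy order so that an overlapping source half is read before it is overwritten.

```cpp
#include <cassert>
#include <cstdint>

// Copy the pair (regs[src], regs[src + 1]) to (regs[dest], regs[dest + 1]).
// The two pairs may overlap by one slot.
void
move_pair (std::uint64_t *regs, int dest, int src)
{
  if (dest == src)
    return;

  if (dest == src + 1)
    {
      // regs[dest] still holds the source's low half, so move that
      // half out of the way first, then copy the high half.
      regs[dest + 1] = regs[src + 1];
      regs[dest] = regs[src];
    }
  else
    {
      // No overlap, or regs[dest + 1] is the source's high half:
      // copying the high half first is safe in both cases.
      regs[dest] = regs[src];
      regs[dest + 1] = regs[src + 1];
    }
}
```

Copying in the other order when dest == src + 1 would overwrite the low half before it is read, which is the kind of clobber the splitters avoid by swapping the order when overlap is detected.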


-- 
Eric Botcazou
Index: config/sparc/predicates.md
===
--- config/sparc/predicates.md	(revision 191198)
+++ config/sparc/predicates.md	(working copy)
@@ -357,7 +357,7 @@ (define_predicate "arith_double_operand"
 (define_predicate "arith_add_operand"
   (ior (match_operand 0 "arith_operand")
(match_operand 0 "const_4096_operand")))
-   
+
 ;; Return true if OP is suitable as second double operand for add/sub.
 (define_predicate "arith_double_add_operand"
   (match_code "const_int,const_double,reg,subreg")
@@ -427,6 +427,7 @@ (define_predicate "input_operand"
 
   /* Allow any 1-instruction integer constant.  */
   if (mclass == MODE_INT
+  && mode != TImode
   && (small_int_operand (op, mode) || const_high_operand (op, mode)))
 return true;
 
@@ -440,12 +441,10 @@ (define_predicate "input_operand"
   if (mclass == MODE_FLOAT && GET_CODE (op) == CONST_DOUBLE)
 return true;
 
-  if (mclass == MODE_VECTOR_INT && GET_CODE (op) == CONST_VECTOR
-  && (const_zero_operand (op, mode)
-  || const_all_ones_operand (op, mode)))
+  if (mclass == MODE_VECTOR_INT && const_all_ones_operand (op, mode))
 return true;
 
-  if (register_operand (op, mode))
+  if (register_or_zero_operand (op, mode))
 return true;
 
   /* If this is a SUBREG, look inside so that we handle paradoxical ones.  */
Index: config/sparc/sparc.md
===
--- config/sparc/sparc.md	(revision 191198)
+++ config/sparc/sparc.md	(working copy)
@@ -2034,6 +2034,164 @@ (define_split
   DONE;
 })
 
+(define_expand "movti"
+  [(set (match_operand:TI 0 "nonimmediate_operand" "")
+	(match_operand:TI 1 "general_operand" ""))]
+  "TARGET_ARCH64"
+{
+  if (sparc_expand_move (TImode, operands))
+DONE;
+})
+
+;; We need to prevent reload from splitting TImode moves, because it
+;; might decide to overwrite a pointer with the value it points to.
+;; In that case we have to do the loads in the appropriate order so
+;; that the pointer is not destroyed too early.
+
+(define_insn "*movti_insn_sp64"
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=r , o,?*e,?o,b")
+(match_operand:TI 1 "input_operand""roJ,rJ, eo, e,J"))]
+  "TARGET_ARCH64
+   && ! TARGET_HARD_QUAD
+   && (register_operand (operands[0], TImode)
+   || register_or_zero_operand (operands[1], TImode))"
+  "#"
+  [(set_attr "length" "2,2,2,2,2")
+   (set_attr "cpu_feature" "*,*,fpu,fpu,vis")])
+
+(define_insn "*movti_insn_sp64_hq"
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=r , o,?*e,?*e,?m,b")
+(match_operand:TI 1 "input_operand""roJ,rJ,  e,  m, e,J"))]
+  "TARGET_ARCH64
+   && TARGET_HARD_QUAD
+   && (register_operand (operands[0], TImode)
+   || register_or_zero_operand (operands[1], TImode))"
+  "@
+  #
+  #
+  fmovq\t%1, %0
+  ldq\t%1, %0
+  stq\t%1, %0
+  #"
+  [(set_attr "type" "*,*,fpmove,fpload,fpstore,*")
+   (set_attr "length" "2,2,*,*,*,2")])
+
+;; Now all the splits to handle multi-insn TI mode moves.
+(define_split
+  [(set (match_operand:TI 0 "register_operand" "")
+(match_operand:TI 1 "register_operand" ""))]
+  "reload_completed
+   && ((TARGET_FPU
+&& ! TARGET_HARD_QUAD)
+   || (! fp_register_operand (operands[0], TImode)
+   && ! fp_register_operand (operands[1], TImode)))"
+  [(clobber (const_int 0))]
+{
+  rtx set_dest = operands[0];
+  rtx set_src = operands[1];
+  rtx dest1, dest2;
+  rtx src1, src2;
+
+  dest1 = gen_highpart (DImode, set_dest);
+  dest2 = gen_lowpart (DImode, set_dest);
+  src1 = gen_highpart (DImode, set_src);
+  src2 = gen_lowpart (DImode, set_src);
+
+  /* Now emit using the real source and destination we found, swapping
+ the order if we detect overlap.  */
+  if (reg_overlap_mentioned_p (des

Re: vector comparisons in C++

2012-09-13 Thread Marc Glisse


On Thu, 13 Sep 2012, Jason Merrill wrote:


I don't know either.


+  if (TREE_TYPE (type0) != TREE_TYPE (type1))


I think this should use same_type_ignoring_top_level_qualifiers_p.


Hmm, I assume you mean

same_type_ignoring_top_level_qualifiers_p (type0, type1)

which would replace both this test and

TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1)

below?


I was thinking just for the first test, but I suppose that would work too. 
My concern is that vectors of typedefs of the same type need to be 
compatible.


Oh, so you meant something like:

same_type_ignoring_top_level_qualifiers_p (TREE_TYPE (type0), TREE_TYPE (type1))

? I thought the element types of vectors were already kind of 
canonicalized. build_vector_type uses TYPE_MAIN_VARIANT (although it does 
handle the case where this is not its own TYPE_CANONICAL). And when I 
tried making a vector of T where T was a typedef for const int, I got a 
const vector of int instead, which I thought was pretty cool. But I agree 
that using same_type_ignoring_top_level_qualifiers_p is safer and can't 
hurt, plus it will keep the code close enough to the one in the C 
front-end.


While checking my facts for the previous paragraph, I got an ICE :-(

typedef int vec __attribute__((vector_size(16)));
vec const f(vec x,vec y){return x [...]

The same program compiles with gcc (prepare_cmp_insn isn't called), but 
ICEs with g++. Looking at the 003t.original tree dump, the C one looks 
like:


  return VIEW_CONVERT_EXPR(x < y);

while the C++ one looks like:

return  = x < y ? { -1, -1, -1, -1 } : { 0, 0, 0, 0 };

or in raw form:

  gimple_cond , >
  gimple_label <>
  gimple_assign 
  gimple_goto <>
  gimple_label <>
  gimple_assign 
  gimple_label <>
  gimple_assign 
  gimple_return 

That doesn't look very vector-like... I'll investigate before resending 
the patch.


As Mike says, we want code that works in C to work in C++ too as much as 
possible.  Furthermore, this builtin support would be useful for implementing 
a C++ class for vector arithmetic, just as it is with std::complex.  I'm not 
aware of any other portable way to implement such a class.


Ok, thanks to both of you.

--
Marc Glisse

Re: [PATCH] Add slim-lto-bootstrap build-config

2012-09-13 Thread Andi Kleen

Markus Trippelsdorf  writes:

> Because there is no enthusiastic support for a full libtool update,
> here is a minimal version that adds a new slim-lto-bootstrap
> build-config. 

Can you split the two patches? libtool and ltmain? Thanks for extracting
those out.

Looks good to me, but eventually this should just be the default for
lto-bootstrap.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only

Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT

2012-09-13 Thread Steven Bosscher

On Wed, Sep 12, 2012 at 4:52 PM, Steven Bosscher wrote:
> On Wed, Sep 12, 2012 at 4:02 PM, Richard Guenther wrote:
>> for a followup (and I bet sth else than PRE blows up at -O2 as well).
>
> Actually, the only thing that really blows up is that enemy of scalability, 
> VRP.

FWIW, this appears to be due to the well-known problem with the equiv
set, but also due to the liveness computations that tree-vrp performs,
since your commit in r139263.

Any reason why you didn't just re-use the tree-ssa-live machinery?

And any reason why you don't let a DEF kill a live SSA name? AFAICT
you're exposing all SSA names up without ever killing a USE :-)

Ciao!
Steven
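
The "DEF kills a live name" rule that Steven asks about is the usual backward liveness step, live_in = (live_out - defs) + uses. A minimal sketch over one basic block follows; it uses toy data structures (stmt_sketch and integer name ids are assumptions of this example) and is not the tree-vrp or tree-ssa-live code.

```cpp
#include <cassert>
#include <set>
#include <vector>

// A toy statement: the name it defines (-1 for none) and the names it uses.
struct stmt_sketch
{
  int def;
  std::vector<int> uses;
};

// Walk the block backwards from LIVE (the live-out set).  A definition
// removes its name from the set; uses then add their names to it.
std::set<int>
live_in (const std::vector<stmt_sketch> &block, std::set<int> live)
{
  for (auto it = block.rbegin (); it != block.rend (); ++it)
    {
      if (it->def >= 0)
        live.erase (it->def);
      live.insert (it->uses.begin (), it->uses.end ());
    }
  return live;
}
```

For a block "x1 = x0 + 1; x2 = x1 * 2" with x2 live on exit, only x0 is live on entry; without the kill step, x1 and x2 would be propagated upwards as well.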

Re: vector comparisons in C++

2012-09-13 Thread Jason Merrill


On 09/13/2012 11:47 AM, Marc Glisse wrote:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033

In comments 1 and 7, Richard Guenther didn't seem too enthusiastic about
any vector-related extension to the C++ front-end.

Some users (other PRs) asked instead that we make vector types
class-like so users can define their own operator<(vec,vec).


As Mike says, we want code that works in C to work in C++ too as much as 
possible.  Furthermore, this builtin support would be useful for 
implementing a C++ class for vector arithmetic, just as it is with 
std::complex.  I'm not aware of any other portable way to implement such 
a class.



Following the OpenCL standard makes sense to me.


I should really take a look at that standard...


My impression is that the C vector support was written to follow OpenCL, 
so extending the same semantics to C++ would also follow OpenCL.



I don't know either.


+  if (TREE_TYPE (type0) != TREE_TYPE (type1))


I think this should use same_type_ignoring_top_level_qualifiers_p.


Hmm, I assume you mean

same_type_ignoring_top_level_qualifiers_p (type0, type1)

which would replace both this test and

TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1)

below?


I was thinking just for the first test, but I suppose that would work 
too.  My concern is that vectors of typedefs of the same type need to be 
compatible.


Jason

[PATCH] Add slim-lto-bootstrap build-config

2012-09-13 Thread Markus Trippelsdorf

Because there is no enthusiastic support for a full libtool update,
here is a minimal version that adds a new slim-lto-bootstrap
build-config. 

Comments are welcome.
Thanks.

Tested on x86_64-pc-linux-gnu

2012-09-13  Markus Trippelsdorf  

* Makefile.in (configure-build-fixincludes): Pass CFLAGS.
* Makefile.in (configure-fixincludes): Likewise.
* config/slim-lto-bootstrap.mk: New build-config.
* libtool.m4: Handle slim-lto objects.
* ltmain.sh: Likewise.

diff --git a/Makefile.in b/Makefile.in
index 0108162..891168d 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -2835,6 +2835,7 @@ configure-build-fixincludes:
test ! -f $(BUILD_SUBDIR)/fixincludes/Makefile || exit 0; \
$(SHELL) $(srcdir)/mkinstalldirs $(BUILD_SUBDIR)/fixincludes ; \
$(BUILD_EXPORTS)  \
+   CFLAGS="$(STAGE_CFLAGS)"; export CFLAGS; \
echo Configuring in $(BUILD_SUBDIR)/fixincludes; \
cd "$(BUILD_SUBDIR)/fixincludes" || exit 1; \
case $(srcdir) in \
@@ -2870,6 +2871,7 @@ all-build-fixincludes: configure-build-fixincludes
$(BUILD_EXPORTS)  \
(cd $(BUILD_SUBDIR)/fixincludes && \
  $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_BUILD_FLAGS)  \
+   CFLAGS="$(STAGE_CFLAGS)" \
$(TARGET-build-fixincludes))
 @endif build-fixincludes
 
@@ -7745,6 +7747,7 @@ configure-fixincludes:
test ! -f $(HOST_SUBDIR)/fixincludes/Makefile || exit 0; \
$(SHELL) $(srcdir)/mkinstalldirs $(HOST_SUBDIR)/fixincludes ; \
$(HOST_EXPORTS)  \
+   CFLAGS="$(STAGE_CFLAGS)"; export CFLAGS; \
echo Configuring in $(HOST_SUBDIR)/fixincludes; \
cd "$(HOST_SUBDIR)/fixincludes" || exit 1; \
case $(srcdir) in \
@@ -7779,6 +7782,7 @@ all-fixincludes: configure-fixincludes
$(HOST_EXPORTS)  \
(cd $(HOST_SUBDIR)/fixincludes && \
  $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_HOST_FLAGS)  \
+   CFLAGS="$(STAGE_CFLAGS)" \
$(TARGET-fixincludes))
 @endif fixincludes
 
diff --git a/config/slim-lto-bootstrap.mk b/config/slim-lto-bootstrap.mk
new file mode 100644
index 000..11d1252
--- /dev/null
+++ b/config/slim-lto-bootstrap.mk
@@ -0,0 +1,9 @@
+# This option enables slim LTO for stage2 and stage3.
+
+STAGE2_CFLAGS += -flto=jobserver -fno-fat-lto-objects -frandom-seed=1
+STAGE3_CFLAGS += -flto=jobserver -fno-fat-lto-objects -frandom-seed=1
+STAGE_CFLAGS += -fuse-linker-plugin
+STAGEprofile_CFLAGS += -fno-lto
+AR = gcc-ar
+NM = gcc-nm
+RANLIB = gcc-ranlib
diff --git a/libtool.m4 b/libtool.m4
index a7f99ac..5754fb1 100644
--- a/libtool.m4
+++ b/libtool.m4
@@ -3434,6 +3434,7 @@ for ac_symprfx in "" "_"; do
   else
 lt_cv_sys_global_symbol_pipe="sed -n -e 's/^.*[[
]]\($symcode$symcode*\)[[   ]][[
]]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'"
   fi
+  lt_cv_sys_global_symbol_pipe="$lt_cv_sys_global_symbol_pipe | sed '/ __gnu_lto/d'"
 
   # Check to see that the pipe works correctly.
   pipe_works=no
@@ -4451,7 +4452,7 @@ _LT_EOF
   if $LD --help 2>&1 | $EGREP ': supported targets:.* elf' > /dev/null \
 && test "$tmp_diet" = no
   then
-   tmp_addflag=
+   tmp_addflag=' $pic_flag'
tmp_sharedflag='-shared'
case $cc_basename,$host_cpu in
 pgcc*) # Portland Group C compiler
@@ -5517,8 +5518,8 @@ if test "$_lt_caught_CXX_error" != yes; then
   # Check if GNU C++ uses GNU ld as the underlying linker, since the
   # archiving commands below assume that GNU ld is being used.
   if test "$with_gnu_ld" = yes; then
-_LT_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib'
-_LT_TAGVAR(archive_expsym_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib'
+_LT_TAGVAR(archive_cmds, $1)='$CC $pic_flag -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib'
+_LT_TAGVAR(archive_expsym_cmds, $1)='$CC $pic_flag -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib'
 
 _LT_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir'
 _LT_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic'
@@ -6495,6 +6496,13 @@ public class foo {
 };
 _LT_EOF
 ])
+
+_lt_libdeps_save_CFLAGS=$CFLAGS
+case "$CC $CFLAGS " in #(
+*\ -flto*\ *) CFLAGS="$CFLAGS -fno-lto" ;;
+*\ -fwhopr*\ *) CFLAGS="$CFLAGS -fno-whopr" ;;
+esac
+
 dnl Parse the compiler output and extract the necessary
 dnl objects, libraries and library flags.
 if AC_TRY_EVAL(ac_compile); then
@@ -6543,6 +6551,7 @@ if AC_TRY_EVAL(ac_compile); then
fi
;;
 
+*.lto.$objext) ;; # Ignore

Re: Backtrace library [3/3]

2012-09-13 Thread Diego Novillo


On 2012-09-12 10:48, Ian Lance Taylor wrote:
> On Tue, Sep 11, 2012 at 3:55 PM, Ian Lance Taylor wrote:
>> This patch is the actual implementation of libbacktrace.
>
> This is the updated version of this patch with a state parameter.

This is OK.

Thank you so much for doing this!  This will help reduce the testing 
matrix a bit.  There should be fewer differences between 
--enable-checking values.


So, now we only need to hook gcc_assert to use this library.  I think we 
want a --param to specify how deep the backtrace should be.  A default 
value of 3-5 stack frames should be good to start, I think.



Diego.
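
The idea of a depth-limited backtrace on assertion failure can be sketched as below. Note the substitutions: this uses glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h> as a stand-in, not libbacktrace, and SKETCH_ASSERT, print_limited_backtrace and sketch_backtrace_depth are hypothetical names playing the roles of gcc_assert and the proposed --param.

```cpp
#include <execinfo.h>  // glibc backtrace (), used here instead of libbacktrace
#include <cassert>
#include <cstdio>
#include <cstdlib>

const int kMaxFrames = 64;

// Hypothetical equivalent of the proposed --param: frames to print.
int sketch_backtrace_depth = 5;

// Print at most DEPTH frames of the current call stack to stderr and
// return the number of frames actually captured.
int
print_limited_backtrace (int depth)
{
  void *frames[kMaxFrames];
  if (depth < 1)
    depth = 1;
  if (depth > kMaxFrames)
    depth = kMaxFrames;
  const int n = backtrace (frames, depth);
  backtrace_symbols_fd (frames, n, 2);  // file descriptor 2 is stderr
  return n;
}

// Assertion macro that reports a short backtrace before aborting.
#define SKETCH_ASSERT(expr)                                          \
  ((expr) ? (void) 0                                                 \
          : (std::fprintf (stderr, "assertion failed: %s\n", #expr), \
             print_limited_backtrace (sketch_backtrace_depth),       \
             std::abort ()))
```

Keeping the depth in one variable makes it easy to experiment with the 3-5 frame default suggested above.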

Re: Backtrace library [2/3]

2012-09-13 Thread Diego Novillo


On 2012-09-11 18:54, Ian Lance Taylor wrote:
> 2012-09-11  Ian Lance Taylor  
>
> * MAINTAINERS (Various Maintainers): Add libbacktrace.
> * configure.ac (host_libs): Add libbacktrace.
> (target_libraries): Add libbacktrace.
> * Makefile.def (host_modules): Add libbacktrace.
> (target_modules): Likewise.
> * configure, Makefile.in: Rebuild.


OK.


Diego.

Re: Backtrace library [1/3]

2012-09-13 Thread Diego Novillo


On 2012-09-11 18:53, Ian Lance Taylor wrote:
> 2012-09-11  Ian Lance Taylor  
>
> * Initial implementation.


OK.


Diego.

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Dehao Chen

On Thu, Sep 13, 2012 at 8:00 PM, Richard Guenther
 wrote:
> On Wed, Sep 12, 2012 at 7:20 PM, Dehao Chen  wrote:
>> There is another bug in the patch (not covered by unittests,
>> discovered through spec benchmarks).
>>
>> When we remove unused locals, we do not mark the block as used for
>> debug stmt, but gimple-streamer-out will still stream out blocks for
>> debug stmt. There can be 2 fixes:
>
> Because doing so would create code generation differences -g vs. -g0.
>
>> 1.
>> --- a/gcc/gimple-streamer-out.c
>> +++ b/gcc/gimple-streamer-out.c
>> @@ -77,7 +77,8 @@ output_gimple_stmt (struct output_block *ob, gimple stmt)
>>lto_output_location (ob, LOCATION_LOCUS (gimple_location (stmt)));
>>
>>/* Emit the lexical block holding STMT.  */
>> -  stream_write_tree (ob, gimple_block (stmt), true);
>> +  if (!is_gimple_debug (stmt))
>> +stream_write_tree (ob, gimple_block (stmt), true);
>>
>>/* Emit the operands.  */
>>switch (gimple_code (stmt))
>>
>> 2.
>> --- a/gcc/tree-ssa-live.c
>> +++ b/gcc/tree-ssa-live.c
>> @@ -726,9 +726,6 @@ remove_unused_locals (void)
>>   gimple stmt = gsi_stmt (gsi);
>>   tree b = gimple_block (stmt);
>>
>> - if (is_gimple_debug (stmt))
>> -   continue;
>> -
>>   if (gimple_clobber_p (stmt))
>> {
>>   have_local_clobbers = true;
>>
>> Either fix could work. Any suggestions on which one we should go with?
>
> The 2nd one will not work and is not acceptable.  The 1st one - well ...
> what happens on trunk right now?  The debug stmt points to a
> BLOCK that is possibly removed from the BLOCK tree?  In this case
> I think the fix is 3. make sure remove_unused_scope_block_p will
> clear BLOCKs from all stmts / expressions that have been removed.
>

Thanks, updated the patch for this issue. Only attached the diff here,
will send out the whole patch with updated ChangeLog later.
Bootstrapped and passed all gcc regression tests. And also passed
SPEC2006 with LTO.

Dehao

diff --git a/gcc/tree-ssa-live.c b/gcc/tree-ssa-live.c
index 1381693..af09806 100644
--- a/gcc/tree-ssa-live.c
+++ b/gcc/tree-ssa-live.c
@@ -612,6 +612,47 @@ mark_all_vars_used (tree *expr_p)
   walk_tree (expr_p, mark_all_vars_used_1, NULL, NULL);
 }

+/* Helper function for clear_unused_block_pointer, called via walk_tree.  */
+
+static tree
+clear_unused_block_pointer_1 (tree *tp, int *, void *)
+{
+  if (EXPR_P (*tp) && TREE_BLOCK (*tp)
+  && !TREE_USED (TREE_BLOCK (*tp)))
+TREE_SET_BLOCK (*tp, NULL);
+  if (TREE_CODE (*tp) == VAR_DECL && DECL_DEBUG_EXPR_IS_FROM (*tp))
+{
+  tree debug_expr = DECL_DEBUG_EXPR (*tp);
+  walk_tree (&debug_expr, clear_unused_block_pointer_1, NULL, NULL);
+}
+  return NULL_TREE;
+}
+
+/* Set all block pointer in debug stmt to NULL if the block is unused,
+   so that they will not be streamed out.  */
+
+static void
+clear_unused_block_pointer ()
+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+  FOR_EACH_BB (bb)
+for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+  {
+   unsigned i;
+   tree b;
+   gimple stmt = gsi_stmt (gsi);
+
+   if (!is_gimple_debug (stmt))
+ continue;
+   b = gimple_block (stmt);
+   if (b && !TREE_USED (b))
+ gimple_set_block (stmt, NULL);
+   for (i = 0; i < gimple_num_ops (stmt); i++)
+ walk_tree (gimple_op_ptr (stmt, i), clear_unused_block_pointer_1,
+NULL, NULL);
+  }
+}

 /* Dump scope blocks starting at SCOPE to FILE.  INDENT is the
indentation level and FLAGS is as in print_generic_expr.  */
@@ -841,6 +882,7 @@ remove_unused_locals (void)
 VEC_truncate (tree, cfun->local_decls, dstidx);

   remove_unused_scope_block_p (DECL_INITIAL (current_function_decl));
+  clear_unused_block_pointer ();

   BITMAP_FREE (usedvars);



> Richard.
>
>> Thanks,
>> Dehao
>>
>> On Wed, Sep 12, 2012 at 10:05 AM, Dehao Chen  wrote:
>>> There are two parts that need memory management:
>>>
>>> 1. The BLOCK structure. This is managed by GC. I originally thought
>>> that removing blocks from tree.gsbase would paralyze GC. This turned
>>> out not to be a concern because DECL_INITIAL will still mark those
>>> used tree nodes. This patch may decrease the memory consumption by
>>> removing blocks from tree/gimple. However, as it makes more blocks
>>> become used, it also increases the memory consumption.
>>> 2. The data structure in libcpp that maintains the hashtable for the
>>> location->block mapping. This is relatively minor because for the
>>> largest source I've seen, it only maintains less than 100K entries in
>>> the array (less than 1M total memory consumption). However, as it is a
>>> global data structure, it may make LTO unhappy. Honza is helping
>>> testing the memory consumption on LTO (but we first need to make this
>>> patch work for LTO). If the LTO result turns out ok, we probably don't
>>> want to put these under GC because: 1. it'll make things much more
>>> compli

Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-13 Thread Richard Henderson

On 09/13/2012 12:03 PM, Jakub Jelinek wrote:
> How could you extend these builtins to FMA4 BTW?  Doesn't FMA4 zero up the
> high elements?

Duh.  You're absolutely right.  I'd mis-read the document as clearing the
high lane of the %ymm register only.


r~

Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-13 Thread Richard Henderson

On 09/13/2012 11:49 AM, Jakub Jelinek wrote:
> On Thu, Sep 13, 2012 at 11:25:42AM -0700, Richard Henderson wrote:
>> (1) Negating the second argument is arguably non-canonical rtl.
> 
> That is why I've put in the *fmai_fnm{add,sub}_ patterns
> operands 2 with the neg as first operand of the FMA rtl.  That way it is
> canonical (otherwise it didn't match in combine).  The FMA rtl operand
> order doesn't need to imply the order of instruction operands.

Sorry, I didn't read the unidiff properly.

>   (fma:VF_128
> (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
> (match_operand:VF_128 2 "nonimmediate_operand" "xm, x")
> (match_operand:VF_128 3 "nonimmediate_operand" " x,xm"))
>   (match_operand:VF_128   4 "nonimmediate_operand" " 0, 0")
...
> which was apparently too much for reload (supposedly the two "0" constraint
> operands, even when the expander used (match_dup 1)).

Yes.  We'd have to have two different patterns to "properly" support fma4.

Though I suppose now that I think about it this is extremely similar to
the vfmadd231 case, in that in order to want to generate

vfmaddss  %xmm3, %xmm2, %xmm1, %xmm0

given the semantics of the builtin we'd have had to emit a copy of %xmm1
or %xmm2 into %xmm0 anyway.  So we might as well not support this and just do

(define_insn "*fmai_fmadd_"
  [(set (match_operand:VF_128 0 "register_operand" "=x,x,x,x")
(vec_merge:VF_128
  (fma:VF_128
(match_operand:VF_128 1 "nonimmediate_operand" "%0, 0, 0,0")
(match_operand:VF_128 2 "nonimmediate_operand" "xm, x, x,m")
(match_operand:VF_128 3 "nonimmediate_operand" " x,xm,xm,x"))
  (match_dup 0)
  (const_int 1)))]
  "TARGET_FMA || TARGET_FMA4"
  "@
   vfmadd132\t{%2, %3, %0|%0, %3, %2}
   vfmadd213\t{%3, %2, %0|%0, %2, %3}
   vfmadd\t{%3, %2, %1, %0|%0, %1, %2, %3}
   vfmadd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
  [(set_attr "isa" "fma,fma,fma4,fma4")
   (set_attr "type" "ssemuladd")
   (set_attr "mode" "")])


r~

Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-13 Thread Jakub Jelinek

On Thu, Sep 13, 2012 at 11:25:42AM -0700, Richard Henderson wrote:
> (2) It's not the best match if we were to extend these builtins to FMA4.
> There we really do have 4 inputs.  Thus

How could you extend these builtins to FMA4 BTW?  Doesn't FMA4 zero up the
high elements?  In that case you'd need to expand it as copy of the X
operand register to DEST, doing vfmadd{ss,sd} to a temp register and
followed by vmovss/vmovsd instruction.

> (define_insn "*fmai_fmadd__4"
>   [(set (match_operand:VF_128 0 "register_operand" "=x,x")
> (vec_merge:VF_128
>   (fma:VF_128
> (match_operand:VF_128 1 "nonimmediate_operand" "%x,x")
> (match_operand:VF_128 2 "nonimmediate_operand" " x,m")
> (match_operand:VF_128 3 "nonimmediate_operand" "xm,x"))
>   (match_operand:VF_128 4 "register_operand" "0,0")
>   (const_int 1)))]
>   "TARGET_FMA4"
>   "vfmadd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
>   [(set_attr "type" "ssemuladd")
>(set_attr "mode" "")])

Jakub

Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-13 Thread Jakub Jelinek

On Thu, Sep 13, 2012 at 11:25:42AM -0700, Richard Henderson wrote:
> On 09/13/2012 08:52 AM, Jakub Jelinek wrote:
> > The following patch fixes it, by tweaking the header so that the first
> > argument is not negated (we negate the second one instead), as we don't want
> > to negate the high elements if e.g. for whatever reason combiner doesn't
> > match it.  It fixes the expander to use a dup of the X operand as the high
> > element provider for the pattern, removes the 231 alternatives (because
> > those provide different destination high elements) and removes commutative
> > marker (again, that would mean different high elements).
> 
> I don't think this is the best way to fix this up.
> 
> (1) Negating the second argument is arguably non-canonical rtl.

That is why I've put in the *fmai_fnm{add,sub}_ patterns
operands 2 with the neg as first operand of the FMA rtl.  That way it is
canonical (otherwise it didn't match in combine).  The FMA rtl operand
order doesn't need to imply the order of instruction operands.
The reason why I don't want to negate in fmaintrin.h the first operand
is that the higher elements of it shouldn't be negated.
So, when the second argument to the builtin is negated, combiner substitutes
... (fma (reg1) (neg (reg2)) (reg3)) (reg1) ..., then canonicalizes
to ... (fma (neg (reg2)) (reg1) (reg3)) (reg1) ... and it matches that way.
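
The difference between negating the first and the second argument can be modelled in plain C. This is only a sketch under the assumption discussed in this thread, namely that the scalar intrinsics compute the low element and copy the remaining elements from the first argument; `vec4`, `model_fmadd_ss` and the two `fnmadd_*` helpers are hypothetical names, not the fmaintrin.h implementation, and the multiply-add here is not a fused operation.

```c
#define LANES 4

/* Hypothetical 4-lane vector standing in for __m128.  */
typedef struct
{
  float v[LANES];
} vec4;

/* Assumed scalar semantics: lane 0 is a*b+c, lanes 1..3 come from the
   first argument unchanged.  */
static vec4
model_fmadd_ss (vec4 a, vec4 b, vec4 c)
{
  vec4 r = a;
  r.v[0] = a.v[0] * b.v[0] + c.v[0];
  return r;
}

/* Negate every lane, as a whole-vector negation in the header would.  */
static vec4
negate_all (vec4 x)
{
  for (int i = 0; i < LANES; i++)
    x.v[i] = -x.v[i];
  return x;
}

/* fnmadd built by negating the first argument: the low lane is right, but
   the preserved high lanes come from the negated vector.  */
static vec4
fnmadd_negate_first (vec4 a, vec4 b, vec4 c)
{
  return model_fmadd_ss (negate_all (a), b, c);
}

/* fnmadd built by negating the second argument: same low lane, and the
   high lanes of the first argument are preserved as they were.  */
static vec4
fnmadd_negate_second (vec4 a, vec4 b, vec4 c)
{
  return model_fmadd_ss (a, negate_all (b), c);
}
```

For a = {2, 10, 20, 30}, b = {3, 1, 1, 1} and c = {1, 0, 0, 0}, both helpers produce -5 in the low lane, but only the second keeps 10, 20, 30 in the high lanes; the first yields -10, -20, -30.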

> (2) It's not the best match if we were to extend these builtins to FMA4.
> There we really do have 4 inputs.  Thus

The first thing I've tried was for the *fmai_fmadd_ patterns to use
match_operand:VF_128 4 instead of the match_dup I'm using now.  But that way
I've ended up with:

(define_insn "*fmai_fmadd_"
  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
(vec_merge:VF_128
  (fma:VF_128
(match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
(match_operand:VF_128 2 "nonimmediate_operand" "xm, x")
(match_operand:VF_128 3 "nonimmediate_operand" " x,xm"))
  (match_operand:VF_128   4 "nonimmediate_operand" " 0, 0")
  (const_int 1)))]
  "TARGET_FMA"
  "@
   vfmadd132\t{%2, %3, %0|%0, %3, %2}
   vfmadd213\t{%3, %2, %0|%0, %2, %3}"
  [(set_attr "type" "ssemuladd")
   (set_attr "mode" "")])

which was apparently too much for reload (supposedly the two "0" constraint
operands, even when the expander used (match_dup 1)).

> It would be nice if Intel cleaned up their documentation for the
> builtin, explicitly saying which high bits to copy.  I agree that
> the testcase is probably as normative as we'll get though.

Usually the docs just document instructions and list intrinsics.
When there is non-obvious match between the intrinsics and instructions, we
have a problem of insufficiently documented semantics.

Jakub

Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-13 Thread Jakub Jelinek

On Thu, Sep 13, 2012 at 11:28:11AM -0700, Richard Henderson wrote:
> On 09/13/2012 10:42 AM, Uros Bizjak wrote:
> > Can we introduce additional "*fmai_fmadd__1" pattern (and
> > others) that would cover missing 231 alternative?
> 
> I really don't think that's necessary.  For that you'd need to
> be computing fma(x, y, x), at which point vfmadd213 matches as well.

I agree.  I've posted the patch just because Uros requested it.

Jakub

Re: Finish up PR rtl-optimization/44194

2012-09-13 Thread Eric Botcazou

> Sounds like a good cleanup to me.

Thanks.  I managed to screw up the computation of the new right end of the 
memory access in adjust_address_1 so I'll fix and retest.

-- 
Eric Botcazou

Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-13 Thread Richard Henderson

On 09/13/2012 10:42 AM, Uros Bizjak wrote:
> Can we introduce additional "*fmai_fmadd__1" pattern (and
> others) that would cover missing 231 alternative?

I really don't think that's necessary.  For that you'd need to
be computing fma(x, y, x), at which point vfmadd213 matches as well.


r~

Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-13 Thread Richard Henderson

On 09/13/2012 08:52 AM, Jakub Jelinek wrote:
> The following patch fixes it, by tweaking the header so that the first
> argument is not negated (we negate the second one instead), as we don't want
> to negate the high elements if e.g. for whatever reason combiner doesn't
> match it.  It fixes the expander to use a dup of the X operand as the high
> element provider for the pattern, removes the 231 alternatives (because
> those provide different destination high elements) and removes commutative
> marker (again, that would mean different high elements).

I don't think this is the best way to fix this up.

(1) Negating the second argument is arguably non-canonical rtl.

(2) It's not the best match if we were to extend these builtins to FMA4.
There we really do have 4 inputs.  Thus

(define_insn "*fmai_fmadd__4"
  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
(vec_merge:VF_128
  (fma:VF_128
(match_operand:VF_128 1 "nonimmediate_operand" "%x,x")
(match_operand:VF_128 2 "nonimmediate_operand" " x,m")
(match_operand:VF_128 3 "nonimmediate_operand" "xm,x"))
  (match_operand:VF_128 4 "register_operand" "0,0")
  (const_int 1)))]
  "TARGET_FMA4"
  "vfmadd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
  [(set_attr "type" "ssemuladd")
   (set_attr "mode" "")])

which we can just as easily do by passing the appropriate input to
the expander.

It would be nice if Intel cleaned up their documentation for the
builtin, explicitly saying which high bits to copy.  I agree that
the testcase is probably as normative as we'll get though.

r~

Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-13 Thread Jakub Jelinek

On Thu, Sep 13, 2012 at 07:42:17PM +0200, Uros Bizjak wrote:
> Can we introduce additional "*fmai_fmadd__1" pattern (and
> others) that would cover missing 231 alternative?

Here is the patch for that.  But, I don't see how it would ever match
(unless perhaps x is equal to z, but then the other insns could do the right
job too).

2012-09-13  Jakub Jelinek  

PR target/54564
* config/i386/sse.md (*fmai_fmadd__1, *fmai_fmsub__1,
*fmai_fnmadd__1, *fmai_fnmsub__1): New patterns.

--- gcc/config/i386/sse.md.jj   2012-09-13 19:58:42.0 +0200
+++ gcc/config/i386/sse.md  2012-09-13 20:07:15.181214004 +0200
@@ -2092,6 +2092,20 @@ (define_insn "*fmai_fmadd_"
   [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
+(define_insn "*fmai_fmadd__1"
+  [(set (match_operand:VF_128 0 "register_operand" "=x")
+(vec_merge:VF_128
+ (fma:VF_128
+   (match_operand:VF_128 1 "nonimmediate_operand" "%x")
+   (match_operand:VF_128 2 "nonimmediate_operand" "xm")
+   (match_operand:VF_128 3 "nonimmediate_operand" " 0"))
+ (match_dup 3)
+ (const_int 1)))]
+  "TARGET_FMA"
+  "vfmadd231\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "")])
+
 (define_insn "*fmai_fmsub_"
   [(set (match_operand:VF_128 0 "register_operand" "=x,x")
 (vec_merge:VF_128
@@ -2109,6 +2123,21 @@ (define_insn "*fmai_fmsub_"
   [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
+(define_insn "*fmai_fmsub__1"
+  [(set (match_operand:VF_128 0 "register_operand" "=x")
+(vec_merge:VF_128
+ (fma:VF_128
+   (match_operand:VF_128   1 "nonimmediate_operand" "%x")
+   (match_operand:VF_128   2 "nonimmediate_operand" "xm")
+   (neg:VF_128
+ (match_operand:VF_128 3 "nonimmediate_operand" " 0")))
+ (match_dup 3)
+ (const_int 1)))]
+  "TARGET_FMA"
+  "vfmsub231\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "")])
+
 (define_insn "*fmai_fnmadd_"
   [(set (match_operand:VF_128 0 "register_operand" "=x,x")
 (vec_merge:VF_128
@@ -2126,6 +2155,21 @@ (define_insn "*fmai_fnmadd_"
   [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
+(define_insn "*fmai_fnmadd__1"
+  [(set (match_operand:VF_128 0 "register_operand" "=x")
+(vec_merge:VF_128
+ (fma:VF_128
+   (neg:VF_128
+ (match_operand:VF_128 1 "nonimmediate_operand" "%x"))
+   (match_operand:VF_128   2 "nonimmediate_operand" "xm")
+   (match_operand:VF_128   3 "nonimmediate_operand" " 0"))
+ (match_dup 3)
+ (const_int 1)))]
+  "TARGET_FMA"
+  "vfnmadd231\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "")])
+
 (define_insn "*fmai_fnmsub_"
   [(set (match_operand:VF_128 0 "register_operand" "=x,x")
 (vec_merge:VF_128
@@ -2144,6 +2188,22 @@ (define_insn "*fmai_fnmsub_"
   [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
+(define_insn "*fmai_fnmsub__1"
+  [(set (match_operand:VF_128 0 "register_operand" "=x")
+(vec_merge:VF_128
+ (fma:VF_128
+   (neg:VF_128
+ (match_operand:VF_128 1 "nonimmediate_operand" "%x"))
+   (match_operand:VF_128   2 "nonimmediate_operand" "xm")
+   (neg:VF_128
+ (match_operand:VF_128 3 "nonimmediate_operand" " 0")))
+ (match_dup 3)
+ (const_int 1)))]
+  "TARGET_FMA"
+  "vfnmsub231\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "")])
+
 ;; FMA4 floating point scalar intrinsics.  These write the
 ;; entire destination register, with the high-order elements zeroed.
 


Jakub

minor cleanup in forwprop: use get_prop_source_stmt more

2012-09-13 Thread Marc Glisse


Hello,

this patch is a minor cleanup of my previous forwprop patches for vectors. 
I have known about get_prop_source_stmt from the beginning, but for some 
reason I always used SSA_NAME_DEF_STMT. This makes the source code 
slightly shorter, and together with PR 54565 it should help get some 
optimizations to apply as early as forwprop1 instead of forwprop2.


There is one line I had badly indented. I am not sure what the policy is 
for that. Silently bundling it with this patch as I am doing is probably 
not so good. I should probably just fix it in svn without asking the list, 
but I was wondering if I should add a ChangeLog entry and post the 
committed patch to the list afterwards? (that's what I would do by 
default, at worst it is a bit of spam)


passes bootstrap+testsuite

2012-09-14  Marc Glisse  

* tree-ssa-forwprop.c (simplify_bitfield_ref): Call
get_prop_source_stmt.
(simplify_permutation): Likewise.
(simplify_vector_constructor): Likewise.

--
Marc Glisse

Index: tree-ssa-forwprop.c
===
--- tree-ssa-forwprop.c (revision 191247)
+++ tree-ssa-forwprop.c (working copy)
@@ -2599,23 +2599,22 @@ simplify_bitfield_ref (gimple_stmt_itera
   elem_type = TREE_TYPE (TREE_TYPE (op0));
   if (TREE_TYPE (op) != elem_type)
 return false;
 
   size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
   op1 = TREE_OPERAND (op, 1);
   n = TREE_INT_CST_LOW (op1) / size;
   if (n != 1)
 return false;
 
-  def_stmt = SSA_NAME_DEF_STMT (op0);
-  if (!def_stmt || !is_gimple_assign (def_stmt)
-  || !can_propagate_from (def_stmt))
+  def_stmt = get_prop_source_stmt (op0, false, NULL);
+  if (!def_stmt || !can_propagate_from (def_stmt))
 return false;
 
   op2 = TREE_OPERAND (op, 2);
   idx = TREE_INT_CST_LOW (op2) / size;
 
   code = gimple_assign_rhs_code (def_stmt);
 
   if (code == VEC_PERM_EXPR)
 {
   tree p, m, index, tem;
@@ -2630,21 +2629,21 @@ simplify_bitfield_ref (gimple_stmt_itera
{
  p = gimple_assign_rhs1 (def_stmt);
}
   else
{
  p = gimple_assign_rhs2 (def_stmt);
  idx -= nelts;
}
   index = build_int_cst (TREE_TYPE (TREE_TYPE (m)), idx * size);
   tem = build3 (BIT_FIELD_REF, TREE_TYPE (op),
-unshare_expr (p), op1, index);
+   unshare_expr (p), op1, index);
   gimple_assign_set_rhs1 (stmt, tem);
   fold_stmt (gsi);
   update_stmt (gsi_stmt (*gsi));
   return true;
 }
 
   return false;
 }
 
 /* Determine whether applying the 2 permutations (mask1 then mask2)
@@ -2682,40 +2681,40 @@ is_combined_permutation_identity (tree m
 /* Combine a shuffle with its arguments.  Returns 1 if there were any
changes made, 2 if cfg-cleanup needs to run.  Else it returns 0.  */
  
 static int
 simplify_permutation (gimple_stmt_iterator *gsi)
 {
   gimple stmt = gsi_stmt (*gsi);
   gimple def_stmt;
   tree op0, op1, op2, op3, arg0, arg1;
   enum tree_code code;
+  bool single_use_op0 = false;
 
   gcc_checking_assert (gimple_assign_rhs_code (stmt) == VEC_PERM_EXPR);
 
   op0 = gimple_assign_rhs1 (stmt);
   op1 = gimple_assign_rhs2 (stmt);
   op2 = gimple_assign_rhs3 (stmt);
 
   if (TREE_CODE (op2) != VECTOR_CST)
 return 0;
 
   if (TREE_CODE (op0) == VECTOR_CST)
 {
   code = VECTOR_CST;
   arg0 = op0;
 }
   else if (TREE_CODE (op0) == SSA_NAME)
 {
-  def_stmt = SSA_NAME_DEF_STMT (op0);
-  if (!def_stmt || !is_gimple_assign (def_stmt)
- || !can_propagate_from (def_stmt))
+  def_stmt = get_prop_source_stmt (op0, false, &single_use_op0);
+  if (!def_stmt || !can_propagate_from (def_stmt))
return 0;
 
   code = gimple_assign_rhs_code (def_stmt);
   arg0 = gimple_assign_rhs1 (def_stmt);
 }
   else
 return 0;
 
   /* Two consecutive shuffles.  */
   if (code == VEC_PERM_EXPR)
@@ -2740,35 +2739,31 @@ simplify_permutation (gimple_stmt_iterat
   return remove_prop_source_from_use (op0) ? 2 : 1;
 }
 
   /* Shuffle of a constructor.  */
   else if (code == CONSTRUCTOR || code == VECTOR_CST)
 {
   tree opt;
   bool ret = false;
   if (op0 != op1)
{
- if (TREE_CODE (op0) == SSA_NAME && !has_single_use (op0))
+ if (TREE_CODE (op0) == SSA_NAME && !single_use_op0)
return 0;
 
  if (TREE_CODE (op1) == VECTOR_CST)
arg1 = op1;
  else if (TREE_CODE (op1) == SSA_NAME)
{
  enum tree_code code2;
 
- if (!has_single_use (op1))
-   return 0;
-
- gimple def_stmt2 = SSA_NAME_DEF_STMT (op1);
- if (!def_stmt2 || !is_gimple_assign (def_stmt2)
- || !can_propagate_from (def_stmt2))
+ gimple def_stmt2 = get_prop_source_stmt (op1, true, NULL);
+ if (!def_stmt2 || !can_propagate_from (def_stmt2))
return 0;
 
  code2 = g

Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-13 Thread Jakub Jelinek

On Thu, Sep 13, 2012 at 07:42:17PM +0200, Uros Bizjak wrote:
> Can we introduce additional "*fmai_fmadd__1" pattern (and
> others) that would cover missing 231 alternative?

Sure.  Will cook up a patch soon.

> > 2012-09-13  Jakub Jelinek  
> >
> > PR target/54564
> > * config/i386/sse.md (fmai_vmfmadd_): Use (match_dup 1)
> > instead of (match_dup 0) as second argument to vec_merge.
> > (*fmai_fmadd_, *fmai_fmsub_): Likewise.
> > Remove third alternative.
> > (*fmai_fnmadd_, *fmai_fnmsub_): Likewise.  Negate
> > operand 2 instead of operand 1, but put it as first argument
> > of fma.
> >
> > * config/i386/fmaintrin.h (_mm_fnmadd_sd, _mm_fnmadd_ss,
> > _mm_fnmsub_sd, _mm_fnmsub_ss): Negate the second argument instead
> > of the first.
> 
> OK, but header change should be also reviewed by H.J.

H.J., are you ok with this?

Jakub

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Mike Stump

On Sep 13, 2012, at 9:51 AM, Robert Dewar  wrote:
> I routinely debugged code at -O1, but then the
> compiler got better at optimization, and things deteriorated so much
> at -O1 that now I don't even attempt it.

An example of a non-feature for me would be the scheduler reordering
instructions when there is no benefit; it should reorder only if there is a
non-zero benefit.  This causes the jumpy debug experience, and needlessly so in
some cases.  This is also an example that gdb can't fix.  The fix would be to
compute costs as a tuple and have the second part of the cost be the original
instruction ordering.  Then the scheduler actively works to preserve order,
unless the upper part of the cost indicates there is a win to be had.
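
A minimal sketch of that tuple comparison, assuming a lower primary cost is better; `struct sched_cost`, `cost_before` and `pick_next` are hypothetical names for illustration and are not taken from GCC's scheduler.

```c
#include <stdbool.h>

/* Hypothetical two-part cost: primary is the scheduler's estimate (lower
   is better), orig_order is the position in the original insn stream.  */
struct sched_cost
{
  int primary;
  int orig_order;
};

/* Lexicographic comparison: the primary cost decides when it differs,
   otherwise the original order is preserved.  */
static bool
cost_before (struct sched_cost a, struct sched_cost b)
{
  if (a.primary != b.primary)
    return a.primary < b.primary;
  return a.orig_order < b.orig_order;
}

/* Return the index of the best entry in READY (N must be at least 1).  */
static int
pick_next (const struct sched_cost *ready, int n)
{
  int best = 0;
  for (int i = 1; i < n; i++)
    if (cost_before (ready[i], ready[best]))
      best = i;
  return best;
}
```

When all ready instructions have the same primary cost, `pick_next` returns the one that came first in the original order, so nothing moves; an instruction is hoisted only when its primary cost is strictly lower.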

Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-13 Thread Uros Bizjak

On Thu, Sep 13, 2012 at 5:52 PM, Jakub Jelinek  wrote:

> The fma-*.c testcases show that these intrinsics probably mean to preserve
> the high elements (other than the lowest) of the first argument of the
> fmaintrin.h *_s{s,d} intrinsics in the destination (the HW insns preserve
> there the destination register, but that varies - for 132 and 213 it is the
> first one (but the negation performed for _mm_fnm*_s[sd] breaks it anyway),
> for 231 it is the last one).  What the expander did was to put there
> an uninitialized pseudo, so we ended up with pretty random content, before
> H.J's http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190492 it happened
> to work by accident, but when things changed slightly and reload chose
> different alternative, this broke.
>
> The following patch fixes it, by tweaking the header so that the first
> argument is not negated (we negate the second one instead), as we don't want
> to negate the high elements if e.g. for whatever reason combiner doesn't
> match it.  It fixes the expander to use a dup of the X operand as the high
> element provider for the pattern, removes the 231 alternatives (because
> those provide different destination high elements) and removes commutative
> marker (again, that would mean different high elements).

Can we introduce additional "*fmai_fmadd__1" pattern (and
others) that would cover missing 231 alternative?

> 2012-09-13  Jakub Jelinek  
>
> PR target/54564
> * config/i386/sse.md (fmai_vmfmadd_): Use (match_dup 1)
> instead of (match_dup 0) as second argument to vec_merge.
> (*fmai_fmadd_, *fmai_fmsub_): Likewise.
> Remove third alternative.
> (*fmai_fnmadd_, *fmai_fnmsub_): Likewise.  Negate
> operand 2 instead of operand 1, but put it as first argument
> of fma.
>
> * config/i386/fmaintrin.h (_mm_fnmadd_sd, _mm_fnmadd_ss,
> _mm_fnmsub_sd, _mm_fnmsub_ss): Negate the second argument instead
> of the first.

OK, but header change should be also reviewed by H.J.

Thanks,
Uros.

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Mike Stump

On Sep 13, 2012, at 6:52 AM, Robert Dewar  wrote:
> Sure, it is obvious that you don't want -g to affect -O1 or -O2 code,
> but I think if you have -Og (if and when we have that), it would not
> be a bad thing for -g to affect that.

No, instead think of -Og as affecting the -g output itself.  If it does, then 
there is nothing for -g to affect when used with -Og.  So, I agree, -g -Og can 
have any impact on code-gen that we want, I just dis-agree that -Og should be 
any different; I just don't see the need.

[patch, mips] Patch for new mips triplet - mips-mti-elf

2012-09-13 Thread Steve Ellcey


Here is a patch to add a new mips*-mti-elf target to GCC.  This is similar
to the mips*-mti-linux-gnu target but for bare metal instead of linux.
The main difference between this new target and the existing mips*-sde-elf
target is that this version does not get built for as many different mips
architectures making the overall GCC build and test process faster and the
resulting cross compiler smaller because there are fewer different runtime
libraries.

I have attached 4 separate patches because I made changes at the top level,
the top level config directory, the gcc directory, and the gcc test directory.

The top-level change is to not build the gprof directory for mips-mti-elf,
copying what we already do for mips-sde-elf.  The config directory change
is to use -mcode-readable=pcrel instead of -mcode-xonly during compilation.
The flags do the same thing but -mcode-readable=pcrel is the preferred flag
to use and for my new target I have it set up to only understand the new
flag instead of both the old and new flags.  The existing mips-sde-elf target
recognizes both old and new flags so it should not be affected by this
change.  Most of the changes are two new files in the gcc directory for the
new target, and then there is one test that is skipped on mips-sde-elf and I
now skip it for mips-mti-elf too.

For this patch to work I also need a binutils patch which I will submit later
today, it adds the mips*-mti-elf target to binutils as a target that behaves
identically to mips*-sde-elf.

I tested each variation that the new target supports using newlib and the
gnu simulator.  OK to checkin?


Top-level:


2012-09-12  Steve Ellcey  

* configure.ac: Add mips*-mti-elf* target.
* configure: Regenerate.


diff --git a/configure.ac b/configure.ac
index a6f5828..3ddefdb 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1032,7 +1032,7 @@ case "${target}" in
   microblaze*)
 noconfigdirs="$noconfigdirs gprof"
 ;;
-  mips*-sde-elf*)
+  mips*-sde-elf* | mips*-mti-elf*)
 if test x$with_newlib = xyes; then
   noconfigdirs="$noconfigdirs gprof"
 fi
@@ -2251,7 +2251,7 @@ case "${target}" in
   spu-*-*)
 target_makefile_frag="config/mt-spu"
 ;;
-  mips*-sde-elf*)
+  mips*-sde-elf* | mips*-mti-elf*)
 target_makefile_frag="config/mt-sde"
 ;;
   mipsisa*-*-elfoabi*)


===


2012-09-13  Steve Ellcey  

* mt-sde: Change -mcode-xonly to -mcode-readable=pcrel.


diff --git a/config/mt-sde b/config/mt-sde
index d6992e4..a3fc1e1 100644
--- a/config/mt-sde
+++ b/config/mt-sde
@@ -1,10 +1,10 @@
 # We default to building libraries optimised for size.  We use
 # -minterlink-mips16 so that the non-MIPS16 libraries can still be
-# linked against partly-MIPS16 code.  The -mcode-xonly option allows
+# linked against partly-MIPS16 code.  The -mcode-readable=pcrel option allows
 # MIPS16 libraries to run on Harvard-style split I/D memories, so long
 # as they have the D-to-I redirect for PC-relative loads.  -mno-gpopt
 # has two purposes: it allows libraries to be used in situations where
 # $gp != our _gp, and it allows them to be built with -G8 while
 # retaining link compatibility with -G0 and -G4.
-CFLAGS_FOR_TARGET += -Os -minterlink-mips16 -mcode-xonly -mno-gpopt
-CXXFLAGS_FOR_TARGET += -Os -minterlink-mips16 -mcode-xonly -mno-gpopt
+CFLAGS_FOR_TARGET += -Os -minterlink-mips16 -mcode-readable=pcrel -mno-gpopt
+CXXFLAGS_FOR_TARGET += -Os -minterlink-mips16 -mcode-readable=pcrel -mno-gpopt



===


2012-09-13  Steve Ellcey  

* config.gcc (mips*-mti-elf*): New target.
* config/mips/mti-elf.h: New file.
* config/mips/t-mti-elf: New file.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index ba366b3..9f5e170 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1741,6 +1741,11 @@ mips*-*-linux*)  # Linux MIPS, either endian.
 esac
test x$with_llsc != x || with_llsc=yes
;;
+mips*-mti-elf*)
+   tm_file="elfos.h newlib-stdint.h ${tm_file} mips/elf.h mips/sde.h mips/mti-elf.h"
+   tmake_file="mips/t-mti-elf"
+   tm_defines="${tm_defines} MIPS_ISA_DEFAULT=33 MIPS_ABI_DEFAULT=ABI_32"
+   ;;
 mips*-sde-elf*)
tm_file="elfos.h newlib-stdint.h ${tm_file} mips/elf.h mips/sde.h"
tmake_file="mips/t-sde"
diff --git a/gcc/config/mips/mti-elf.h b/gcc/config/mips/mti-elf.h
new file mode 100644
index 000..229555e
--- /dev/null
+++ b/gcc/config/mips/mti-elf.h
@@ -0,0 +1,56 @@
+/* Target macros for mips*-mti-elf targets.
+   Copyright (C) 2012
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will

Re: vector comparisons in C++

2012-09-13 Thread Mike Stump

On Sep 13, 2012, at 8:47 AM, Marc Glisse  wrote:
>> What was the reluctance?  It seems clear to me that if we support the type, 
>> we should support these operations.
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033
> 
> In comments 1 and 7, Richard Guenther didn't seem too enthusiastic about any 
> vector-related extension to the C++ front-end.

C++, as designed, wants to be a proper superset of C (not that it is, or ever
will be), so, in general, it wants things that work in C to also work in C++.
Richard is right that there are more interaction problems involving 2, 3 and 4
features at a time, which tend to make C++ support for any new feature
slightly non-trivial, but, while true, I don't think that should dissuade us.
Now, we might recommend people use some better, more C++ish way to solve their
problems, but, for people that just want existing C code to compile as C++
code…  having the support is nice.

Re: [Patch ARM] big-endian support for Neon vext tests

2012-09-13 Thread Mike Stump

On Sep 13, 2012, at 2:45 AM, Christophe Lyon  wrote:
> Ping?
> http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00330.html

So, two things I thought I'd ask about:

> +/* __attribute__ ((noinline)) is currently required, otherwise the
> +   generated code computes wrong results in big-endian.  */

and:

> +#ifdef __ARMEL__
> +  uint64x2_t __mask1 = {1, 0};
> +#else
>uint64x2_t __mask1 = {1, 0};
> +#endif

>> * In the case of the test which is executed, I had to force the
>> noinline attribute on the helper functions, otherwise the computed
>> results are wrong in big-endian. It is probably an overkill workaround
>> but it works :-)
>>  I am going to file a bugzilla for this problem.

I think that for developing the patches noinline was fine, we are confident 
there aren't any more bugs, but, for checkin, I think it is better to leave the 
test case as is, and let it fail until the PR you filed is fixed.  We usually 
don't put hack arounds for code-gen compiler bugs into the testsuite just to 
make them pass…  :-)

The second (occurs more than once) just looks odd.  I thought I'd mention it, 
not sure what my preference is.

Re: [C++ Patch] PR 53210

2012-09-13 Thread Paolo Carlini

"Manuel López-Ibáñez"  wrote:

>But then the warning should report Winit-self (that is, use
>OPT_Winit_self for warning) and not OPT_Wuninitialized. Because it is
>what people should use to disable it.

Ok, I'll do the change.

Thanks,
Paolo

Re: [C PATCH] Fix another _Complex C ICE (PR c/54559)

2012-09-13 Thread Joseph S. Myers

On Thu, 13 Sep 2012, Jakub Jelinek wrote:

> Hi!
> 
> This ICE is because c_finish_return calls convert after c_fully_fold
> is performed on the argument, and doesn't call it again, thus we need
> in_late_binary_op.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for trunk/4.7?
> 
> 2012-09-13  Jakub Jelinek  
> 
>   PR c/54559
>   * c-typeck.c (c_finish_return): Do convert to BOOLEAN_TYPE or
>   COMPLEX_TYPE with in_late_binary_op set temporarily to true.
> 
>   * gcc.c-torture/compile/pr54559.c: New test.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Robert Dewar


On 9/13/2012 12:46 PM, Tom Tromey wrote:
>> "Robert" == Robert Dewar  writes:
>
> Robert> Sometimes I wonder whether the insistence on -g not changing code
> Robert> generation is warranted. In practice, gdb for me is so weak in handling
> Robert> -O1 or -O2, that if I want to debug something I have to recompile
> Robert> with -O0 -g, which causes quite a bit of code generation change :-)
>
> If those are gdb bugs, please file them.

Well I think everyone knows about the failings of gdb in -O1 mode, they
have been much discussed, and they are not really gdb bugs, more an
issue of it being basically hard to debug optimized code. Things used
to be a LOT better, I routinely debugged code at -O1, but then the
compiler got better at optimization, and things deteriorated so much
at -O1 that now I don't even attempt it.

> Tom

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Tom Tromey

> "Dehao" == Dehao Chen  writes:

Dehao> + static htab_t location_adhoc_data_htab;
Dehao> + static source_location curr_adhoc_loc;
Dehao> + static struct location_adhoc_data *location_adhoc_data;
Dehao> + static unsigned int allocated_location_adhoc_data;

libcpp was written to allow multiple preprocessor objects to be created
and used in one process.  I think introducing globals like this breaks
this part of the design.  It seems to me they should instead be fields
of cpp_reader or line_maps.
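
A minimal sketch of the suggestion above, under stated assumptions: `struct demo_reader`, `struct demo_adhoc_table` and the helper functions are hypothetical stand-ins, not the actual libcpp types or the layout used in the patch. The point is only that the table lives inside the per-reader object, so two readers in one process never share state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ad-hoc location table.  The field names
   echo the statics in the patch, but the layout is an assumption.  */
struct demo_adhoc_table
{
  unsigned int *entries;    /* payload, one slot per ad-hoc location */
  unsigned int allocated;   /* capacity of ENTRIES */
  unsigned int used;        /* number of slots handed out so far */
};

/* Hypothetical reader object: each instance owns its own table.  */
struct demo_reader
{
  struct demo_adhoc_table adhoc;
};

/* Append VALUE to R's own table and return its index, or -1 if the
   table cannot grow.  */
static int
demo_reader_add_adhoc (struct demo_reader *r, unsigned int value)
{
  struct demo_adhoc_table *t = &r->adhoc;
  if (t->used == t->allocated)
    {
      unsigned int n = t->allocated ? t->allocated * 2 : 4;
      unsigned int *p = realloc (t->entries, n * sizeof *p);
      if (p == NULL)
        return -1;
      t->entries = p;
      t->allocated = n;
    }
  t->entries[t->used] = value;
  return (int) t->used++;
}

/* Free R's table and reset it to the empty state.  */
static void
demo_reader_release (struct demo_reader *r)
{
  free (r->adhoc.entries);
  r->adhoc.entries = NULL;
  r->adhoc.allocated = 0;
  r->adhoc.used = 0;
}
```

With this shape, creating a second `struct demo_reader` starts from an empty table regardless of what the first one has recorded.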

Tom

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Tom Tromey

> "Robert" == Robert Dewar  writes:

Robert> Sometimes I wonder whether the insistence on -g not changing code
Robert> generation is warranted. In practice, gdb for me is so weak in handling
Robert> -O1 or -O2, that if I want to debug something I have to recompile
Robert> with -O0 -g, which causes quite a bit of code generation change :-)

If those are gdb bugs, please file them.

Tom

[google/main] Fix regression - SUBTARGET_EXTRA_SPECS overridden by LINUX_GRTE_EXTRA_SPECS

2012-09-13 Thread 沈涵

Hi, google/gcc-main fails to link anything (on x86-generic chromeos).

By looking into specs file, it seems that 'link_emulation' section is
missing in specs.

The problem is in config/i386/linux.h, SUBTARGET_EXTRA_SPECS (which is
not empty for chrome x86-generic) is overridden by
"LINUX_GRTE_EXTRA_SPECS".

My fix is to prepend LINUX_GRTE_EXTRA_SPECS to SUBTARGET_EXTRA_SPECS in linux.h

This fix was submitted to google/gcc-4_7 and has gone through testing
over the past weeks.

Tested by crosstool-validate.py --crosstool_ver=v16 --gcc_dir=`pwd`
--testers=crosstool

Jing, could you take a look at this?

--
Han Shen

2012-09-13 Han Shen  
* gcc/config/i386/linux.h (SUBTARGET_EXTRA_SPECS): Compute
new value by prepending LINUX_GRTE_EXTRA_SPECS to
SUBTARGET_EXTRA_SPECS_STR when the latter is defined.
* gcc/config/i386/gnu-user.h (SUBTARGET_EXTRA_SPECS_STR): Add
new macro to hold the value of SUBTARGET_EXTRA_SPECS so that
SUBTARGET_EXTRA_SPECS can be redefined later in linux.h.


diff --git a/gcc/config/i386/gnu-user.h b/gcc/config/i386/gnu-user.h
index 98d0a25..ba120b8 100644
--- a/gcc/config/i386/gnu-user.h
+++ b/gcc/config/i386/gnu-user.h
@@ -92,10 +92,12 @@ along with GCC; see the file COPYING3.  If not see
 #define ASM_SPEC \
   "--32 %{!mno-sse2avx:%{mavx:-msse2avx}} %{msse2avx:%{!mavx:-msse2avx}}"

-#undef  SUBTARGET_EXTRA_SPECS
-#define SUBTARGET_EXTRA_SPECS \
+#undef  SUBTARGET_EXTRA_SPECS_STR
+#define SUBTARGET_EXTRA_SPECS_STR \
   { "link_emulation", GNU_USER_LINK_EMULATION },\
   { "dynamic_linker", GNU_USER_DYNAMIC_LINKER }
+#undef  SUBTARGET_EXTRA_SPECS
+#define SUBTARGET_EXTRA_SPECS SUBTARGET_EXTRA_SPECS_STR

 #undef LINK_SPEC
 #define LINK_SPEC "-m %(link_emulation) %{shared:-shared} \
diff --git a/gcc/config/i386/linux.h b/gcc/config/i386/linux.h
index ade524c..61d5c68 100644
--- a/gcc/config/i386/linux.h
+++ b/gcc/config/i386/linux.h
@@ -32,5 +32,11 @@ along with GCC; see the file COPYING3.  If not see
 #endif

 #undef  SUBTARGET_EXTRA_SPECS
+#ifndef SUBTARGET_EXTRA_SPECS_STR
 #define SUBTARGET_EXTRA_SPECS \
   LINUX_GRTE_EXTRA_SPECS
+#else
+#define SUBTARGET_EXTRA_SPECS \
+  LINUX_GRTE_EXTRA_SPECS \
+  SUBTARGET_EXTRA_SPECS_STR
+#endif
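
The macro juxtaposition in the linux.h hunk can be illustrated with a small stand-alone sketch. All `DEMO_*` names and string values below are hypothetical. One assumption is worth stating loudly: placing two initializer-list macros side by side only compiles if the first expansion ends with a comma, so the sketch assumes that LINUX_GRTE_EXTRA_SPECS is written that way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_spec
{
  const char *name;
  const char *value;
};

/* Hypothetical stand-in for LINUX_GRTE_EXTRA_SPECS.  Assumption: the
   expansion ends with a trailing comma.  */
#define DEMO_GRTE_EXTRA_SPECS \
  { "demo_grte_option", "demo-grte-value" },

/* Hypothetical stand-in for SUBTARGET_EXTRA_SPECS_STR.  */
#define DEMO_SUBTARGET_EXTRA_SPECS_STR \
  { "link_emulation", "demo-emulation" }, \
  { "dynamic_linker", "demo-linker" }

/* Prepend the first list to the second, as the patched linux.h does.  */
#define DEMO_SUBTARGET_EXTRA_SPECS \
  DEMO_GRTE_EXTRA_SPECS \
  DEMO_SUBTARGET_EXTRA_SPECS_STR

static const struct demo_spec demo_extra_specs[] = {
  DEMO_SUBTARGET_EXTRA_SPECS
};

/* Return the value registered for NAME, or NULL if it is missing.  */
static const char *
demo_lookup_spec (const char *name)
{
  size_t n = sizeof demo_extra_specs / sizeof demo_extra_specs[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (demo_extra_specs[i].name, name) == 0)
      return demo_extra_specs[i].value;
  return NULL;
}
```

In this sketch `link_emulation` stays visible after the prepend, which is the property the report says was lost when one macro simply replaced the other.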

[google] Fix duplicate symbol error reported by assembler

2012-09-13 Thread Xinliang David Li

The following patch fixes a problem exposed in LIPO random stress
testing with large module groups -- the error is that multiple copies
of compiler-generated static functions (ctor of class in anonymous
namespace) get emitted.


David


Index: cgraphunit.c
===
--- cgraphunit.c(revision 191267)
+++ cgraphunit.c(revision 191268)
@@ -1430,7 +1430,12 @@ cgraph_add_output_node (struct cgraph_no
   if (!L_IPO_COMP_MODE)
 return node;

-  if (!TREE_PUBLIC (node->decl))
+  /* Never common non public names except for compiler
+ generated static functions. (they are not promoted
+ to globals either.  */
+  if (!TREE_PUBLIC (node->decl)
+  && !(DECL_ARTIFICIAL (node->decl)
+  && DECL_ASSEMBLER_NAME_SET_P (node->decl)))
 return node;

   if (!output_node_hash)
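
The new early-return condition reads more easily as a stand-alone predicate. This is a sketch only: `struct demo_decl` and its three flags are hypothetical stand-ins for TREE_PUBLIC, DECL_ARTIFICIAL and DECL_ASSEMBLER_NAME_SET_P, and nothing here touches the real cgraph data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags mirroring the three tests in the patch.  */
struct demo_decl
{
  bool is_public;      /* stands in for TREE_PUBLIC */
  bool is_artificial;  /* stands in for DECL_ARTIFICIAL */
  bool asm_name_set;   /* stands in for DECL_ASSEMBLER_NAME_SET_P */
};

/* True when the node is returned unchanged, i.e. it is not considered
   for commoning.  Non-public names are skipped unless they are
   compiler-generated and already have an assembler name.  */
static bool
demo_skip_commoning (const struct demo_decl *d)
{
  return !d->is_public && !(d->is_artificial && d->asm_name_set);
}
```

Before the patch the predicate was just `!is_public`, so an artificial static function with an assembler name was never commoned. That appears to be the case that produced the duplicate symbols.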

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Robert Dewar


On 9/13/2012 12:07 PM, Xinliang David Li wrote:
> It is very important to make sure -g does not affect code gen ---
> people do release build with -g with optimization, and strip the
> binary before sending it to production machines ..

Yes, of course, and for sure -g cannot affect optimized code, see
my follow on message.

> David
>
> On Thu, Sep 13, 2012 at 6:33 AM, Robert Dewar  wrote:
>> On 9/13/2012 8:00 AM, Richard Guenther wrote:
>>
>>> Because doing so would create code generation differences -g vs. -g0.
>>
>> Sometimes I wonder whether the insistence on -g not changing code
>> generation is warranted. In practice, gdb for me is so weak in handling
>> -O1 or -O2, that if I want to debug something I have to recompile
>> with -O0 -g, which causes quite a bit of code generation change :-)

Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-09-13 Thread Paolo Bonzini

On 13/09/2012 17:57, Jakub Jelinek wrote:
>>> > > Can we get this change in?  The current state is terribly annoying.
>> > 
>> > Yes, please go ahead.
> Here it is, bootstrapped/regtested on x86_64-linux and i686-linux,
> additionally tested on --disable-bootstrap tree, both by make cc1 inside of
> gcc subdir (no -O2) and make all-gcc above it (with -O2).

Ok.

Paolo

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Dehao Chen

On Thu, Sep 13, 2012 at 8:02 PM, Richard Guenther
 wrote:
> On Wed, Sep 12, 2012 at 6:39 PM, Xinliang David Li  wrote:
>> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther
>>  wrote:
>>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen  wrote:
>>>> Now I think we are facing a more complex problem. The data structure
>>>> we use to store the location_adhoc_data are file-static in linemap.c
>>>> in libcpp. These data structures are not guarded by GTY(()).
>>>> Meanwhile, as we have removed the block data structure from
>>>> gimple.gsbase as well as tree.exp (encoding them into an location_t).
>>>> This could cause block being GCed and the LOCATION_BLOCK becoming
>>>> dangling pointers.
>>>
>>> Uh.  Note that it is quite important that we are able to garbage-collect 
>>> unused
>>> BLOCKs, this is the whole point of removing unused BLOCK scopes in
>>> remove_unused_locals.  So this indeed becomes much more complicated ...
>>> What would be desired is that the garbage collector can NULL an entry in
>>> the mapping table when it is not referenced in any other way (that other
>>> reference would be the BLOCK tree as stored in a FUNCTION_DECLs 
>>> DECL_INITIAL).
>>
>> It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS
>> are created for a large C++ program. This patch saves memory by
>> shrinking tree size, is it a net win or loss without GC those BLOCKS?
>
> Memory usage issues pop up with C++ code using expression templates
> (try BOOST MPL or tramp3d or some larger spirit testcases).  Inlining

I compared the memory consumption for tramp3d, the patched version has
a peak of 504065kB, while non-patched version has a peak of 491853kB.

> creates tons of "empty" BLOCK trees that just wrap others.  It is important
> to be able to GC those.  Now, it might be that no expression / location
> which references the BLOCK survives, and if the line-table is not scanned

Those non-used blocks will still be GCed in this patch.

> by GC then we will just end up with never re-usable entries (the BLOCK address
> may get re-used - can we get false sharing here?)

That is true (memory wasted). However, in the tramp3d case, only
409600 entries are allocated in location_adhoc_data (4.9MB, 1% of the
peak mem consumption). Thus the wasted entry should not be
significant.

Concerning re-used BLOCK, if the block address and the location are
the same, the previously allocated entry will be reused. But it'll not
affect the correctness.
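
The reuse behaviour described above can be sketched as interning (location, block) pairs in a small hash table. This is an illustration under assumptions, not the patch's implementation: `struct demo_adhoc_map`, the fixed table size and the hash function are all made up for the example, and blocks are compared purely by address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TABLE_SIZE 64   /* power of two; fixed to keep the sketch short */

struct demo_adhoc_entry
{
  unsigned int locus;   /* plain source location */
  const void *block;    /* associated block, compared by address only */
  int in_use;
};

struct demo_adhoc_map
{
  struct demo_adhoc_entry slots[DEMO_TABLE_SIZE];
};

/* Mix the two key parts into a starting slot index.  */
static size_t
demo_hash_pair (unsigned int locus, const void *block)
{
  uintptr_t h = (uintptr_t) block;
  h ^= (uintptr_t) locus * 2654435761u;
  return (size_t) (h & (DEMO_TABLE_SIZE - 1));
}

/* Return the slot for (LOCUS, BLOCK), allocating one on first use.
   An identical pair always maps to the slot allocated earlier.
   Returns -1 when the table is full.  */
static int
demo_intern_location (struct demo_adhoc_map *m, unsigned int locus,
                      const void *block)
{
  size_t start = demo_hash_pair (locus, block);
  for (size_t i = 0; i < DEMO_TABLE_SIZE; i++)
    {
      size_t slot = (start + i) & (DEMO_TABLE_SIZE - 1);
      struct demo_adhoc_entry *e = &m->slots[slot];
      if (!e->in_use)
        {
          e->locus = locus;
          e->block = block;
          e->in_use = 1;
          return (int) slot;
        }
      if (e->locus == locus && e->block == block)
        return (int) slot;
    }
  return -1;
}
```

Because the key is the address, a new block that happens to be allocated at a previously used address with the same location lands on the old entry. The entry then describes the new block equally well, which matches the claim that reuse does not affect correctness.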

Thanks,
Dehao

>
> Richard.
>
>> thanks,
>>
>> David
>>
>>
>>>
>>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from
>>>> gimple.gsbase.location. However, neither nested_ptr nor mark_hook can
>>>> help me.
>>>>
>>>> Another approach would be to guard the location_adhoc_data and related
>>>> data structures in GTY(()). However, this is non-trivial because tree
>>>> is not visible in libcpp. At the same time, my implementation heavily
>>>> relies on hashtable to make the code efficient, thus it's quite tricky
>>>> to make "param_is" and "use_params" work.
>>>>
>>>> The final approach, which I'll try tomorrow, would be to move all my
>>>> implementation from libcpp to gcc, and guard them with GTY(()). I
>>>> still haven't thought of any potential problem of this approach. Any
>>>> comments?
>>>
>>> I think moving the mapping to GC in a lazy manner as I described above
>>> would be the way to go.  For hashtables GC already supports if_marked,
>>> not sure if similar support is available for arrays/vecs.
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen  wrote:
>>>>> I saw comments in tree-streamer-out.c:
>>>>>
>>>>>   /* Do not stream BLOCK_SOURCE_LOCATION.  We cannot handle debug information
>>>>>  for early inlining so drop it on the floor instead of ICEing in
>>>>>  dwarf2out.c.  */
>>>>>   streamer_write_chain (ob, BLOCK_VARS (expr), ref_p);
>>>>>
>>>>> However, what the code is doing seemed contradictory with the comment.
>>>>> Or am I missing something?
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz  wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Tue, 11 Sep 2012, Dehao Chen wrote:
>>>>>>
>>>>>>> Looks like we have two choices:
>>>>>>>
>>>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t)
>>>>>>
>>>>>> This will actually not work correctly in some cases.  The problem is, if
>>>>>> the prevailing decl is already part of another chain (say in another
>>>>>> block_var list) you would break the current chain.  Hence block vars need
>>>>>> special handling in the lto streamer (another reason why tree_chain is not
>>>>>> the most clever think to use for this chain).  This problem area needs to
>>>>>> be solved somehow if block info is to be preserved correctly.
>>>>>>
>>>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL
>>>>>>> (TREE_CHAIN (t)).
>>>>>>
>>>>>> That's also a large hammer as it basically will mean no debug info after
>>>>>> LTO :-/ Sigh,

Re: [PATCH] Add option for dumping to stderr (issue6190057)

2012-09-13 Thread Xinliang David Li

Yes, indeed.

thanks,

David

On Thu, Sep 13, 2012 at 4:08 AM, Richard Guenther
 wrote:
> On Wed, Sep 12, 2012 at 6:46 PM, Xinliang David Li  wrote:
>> On Wed, Sep 12, 2012 at 3:30 AM, Richard Guenther
>>  wrote:
>>> On Wed, Sep 12, 2012 at 10:12 AM, Sharad Singhai  wrote:
>>>> Thanks for your comments. Please see my responses inline.
>>>>
>>>> On Tue, Sep 11, 2012 at 1:16 PM, Xinliang David Li  
>>>> wrote:
>>>>> Can you resend your patch in text form (also need to resolve the
>>>>> latest conflicts) so that it can be commented inline?
>>>>
>>>> I tried to include inline patch earlier but my message was bounced
>>>> back from patches mailing list. I am trying it again.
>>>>
>>>>> Please also provide as summary a more up-to-date description of
>>>>> 1) Command line option syntax and semantics
>>>>
>>>> I added some documentation in the patch. Here are the relevant bits
>>>> from invoke.texi.
>>>>
>>>> `-fdump-tree-SWITCH-OPTIONS=FILENAME'
>>>>  Control the dumping at various stages of processing the
>>>>  intermediate language tree to a file.  The file name is generated
>>>>  by appending a switch-specific suffix to the source file name, and
>>>>  the file is created in the same directory as the output file. In
>>>>  case of `=FILENAME' option, the dump is output on the given file
>>>>  instead of the auto named dump files.
>>>>  ...
>>>>
>>>> `=FILENAME'
>>>>   Instead of an auto named dump file, output into the given file
>>>>   name. The file names `stdout' and `stderr' are treated
>>>>   specially and are considered already open standard streams.
>>>>   For example,
>>>>
>>>>     gcc -O2 -ftree-vectorize -fdump-tree-vect-details=foo.dump
>>>>      -fdump-tree-pre=stderr file.c
>>>>
>>>>   outputs vectorizer dump into `foo.dump', while the PRE dump
>>>>   is output on to `stderr'. If two conflicting dump filenames
>>>>   are given for the same pass, then the latter option
>>>>   overrides the earlier one.
>>>>
>>>> `-fopt-info-PASS'
>>>> `-fopt-info-PASS-OPTIONS'
>>>> `-fopt-info-PASS-OPTIONS=FILENAME'
>>>>  Controls optimization dumps from various passes. If the `-OPTIONS'
>>>>  form is used, OPTIONS is a list of `-' separated options which
>>>>  controls the details of the dump.  If OPTIONS is not specified, it
>>>>  defaults to `optimized'. If the FILENAME is not specified, it
>>>>  defaults to `stderr'. Note that the output FILENAME will be
>>>>  overwritten in case of multiple translation units. If a combined
>>>>  output from multiple the translation units is desired, `stderr'
>>>>  should be used instead.
>>>>
>>>>  The PASS could be one of the tree or rtl passes. The following
>>>>  options are available
>>>
>>> I don't like that we have -PASS here.  That makes it awfully similar
>>> to -fdump-PASS-OPTIONS=FILENAME.  Are we merely having
>>> -fopt-info because OPTIONS are "different"?
>>
>>
>> Having PASS is useful to do filtering. But as you said, the option
>> design here is very much oriented towards developers not end users
>> which fopt-info is also intended for.
>
> Just to add a comment here - -fopt-info is _only_ targeted at end users.
> Developers can use -fdump-tree-XXX=stderr now (which, with the correct
> pass / flags should produce identical output to -fopt-info - at least that
> was the whole point with the re-design of the dump API - to make it
> possible to implement -fopt-info in a way that it simply provides a nice
> interface to end-users to our existing dumping information.
>
> If it doesn't work like that right now we should make it work this way.
>
> Richard.
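
The `=FILENAME' rule quoted from invoke.texi above, where `stdout' and `stderr' name already-open streams and anything else names a file, can be sketched as a small helper. This is not GCC's dump code: `demo_open_dump_stream` and `demo_close_dump_stream` are hypothetical names, and opening with mode "w" is an assumption chosen to match the note that a named file is overwritten for each translation unit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the stream that NAME refers to.  *OWNED is set to 1 only when
   a real file was opened, so the caller knows whether to close it.  */
static FILE *
demo_open_dump_stream (const char *name, int *owned)
{
  *owned = 0;
  if (strcmp (name, "stderr") == 0)
    return stderr;
  if (strcmp (name, "stdout") == 0)
    return stdout;

  /* Anything else is an ordinary file; "w" truncates existing content.  */
  FILE *f = fopen (name, "w");
  if (f != NULL)
    *owned = 1;
  return f;
}

/* Close F only if it was opened by demo_open_dump_stream as a file.  */
static void
demo_close_dump_stream (FILE *f, int owned)
{
  if (f != NULL && owned)
    fclose (f);
}
```

Tracking ownership separately keeps the standard streams from being closed by accident when several passes dump to `stderr'.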

Re: Finish up PR rtl-optimization/44194

2012-09-13 Thread Eric Botcazou

> Will it help
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54315
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831

It won't help everywhere, since it's only for architectures that return 
structures in registers, so x86-64 but not x86 for example.

54315 pertains to single-fielded unions and uses a different code path 
(copy_blkmode_from_reg) although the issue is similar.  I'll have a look.

It will help 28831 on x86-64 if the structure is returned in registers, e.g.

struct val_t {
  int a, b, c;
};

extern struct val_t foo();
extern int bar(struct val_t);
int main() {
return bar(foo());
}

The patch eliminates the two extra stores mentioned in comment #15.  I'll add a 
reference to this one.

-- 
Eric Botcazou

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Xinliang David Li

It is very important to make sure -g does not affect code gen ---
people do release build with -g with optimization, and strip the
binary before sending it to production machines ..

David

On Thu, Sep 13, 2012 at 6:33 AM, Robert Dewar  wrote:
> On 9/13/2012 8:00 AM, Richard Guenther wrote:
>
>> Because doing so would create code generation differences -g vs. -g0.
>
>
> Sometimes I wonder whether the insistence on -g not changing code
> generation is warranted. In practice, gdb for me is so weak in handling
> -O1 or -O2, that if I want to debug something I have to recompile
> with -O0 -g, which causes quite a bit of code generation change :-)
>

Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-09-13 Thread Jakub Jelinek

On Thu, Sep 13, 2012 at 10:53:23AM +0200, Paolo Bonzini wrote:
> On 13/09/2012 10:46, Jakub Jelinek wrote:
> >> > # Remove the -O2: for historical reasons, unless bootstrapping we prefer
> >> > # optimizations to be activated explicitly by the toplevel.
> >> > case "$CC" in
> >> >   */prev-gcc/xgcc*) ;;
> >> >   *) CFLAGS=`echo $CFLAGS | sed "s/-O[[s0-9]]* *//" ` ;;
> >> > esac
> >> > AC_SUBST(CFLAGS)
> >> > 
> >> > in configure.ac does this.  I think if CXXFLAGS is also so done, we'd 
> >> > gain parity.
> > Can we get this change in?  The current state is terribly annoying.
> 
> Yes, please go ahead.

Here it is, bootstrapped/regtested on x86_64-linux and i686-linux,
additionally tested on --disable-bootstrap tree, both by make cc1 inside of
gcc subdir (no -O2) and make all-gcc above it (with -O2).

Ok for trunk?

2012-09-13  Jakub Jelinek  

* configure.ac (CXXFLAGS): Remove -O2 when not bootstrapping.
* configure: Regenerated.

--- gcc/configure.ac.jj 2012-09-13 07:54:41.0 +0200
+++ gcc/configure.ac2012-09-13 14:19:54.016741197 +0200
@@ -296,9 +296,11 @@ AC_SUBST(OUTPUT_OPTION)
 # optimizations to be activated explicitly by the toplevel.
 case "$CC" in
   */prev-gcc/xgcc*) ;;
-  *) CFLAGS=`echo $CFLAGS | sed "s/-O[[s0-9]]* *//" ` ;;
+  *) CFLAGS=`echo $CFLAGS | sed "s/-O[[s0-9]]* *//" `
+ CXXFLAGS=`echo $CXXFLAGS | sed "s/-O[[s0-9]]* *//" ` ;;
 esac
 AC_SUBST(CFLAGS)
+AC_SUBST(CXXFLAGS)
 
 # Determine PICFLAG for target gnatlib.
 GCC_PICFLAG_FOR_TARGET
--- gcc/configure.jj2012-09-13 07:54:39.0 +0200
+++ gcc/configure   2012-09-13 14:34:40.429269215 +0200
@@ -4863,10 +4863,12 @@ fi
 # optimizations to be activated explicitly by the toplevel.
 case "$CC" in
   */prev-gcc/xgcc*) ;;
-  *) CFLAGS=`echo $CFLAGS | sed "s/-O[s0-9]* *//" ` ;;
+  *) CFLAGS=`echo $CFLAGS | sed "s/-O[s0-9]* *//" `
+ CXXFLAGS=`echo $CXXFLAGS | sed "s/-O[s0-9]* *//" ` ;;
 esac
 
 
+
 # Determine PICFLAG for target gnatlib.
 
 
@@ -17782,7 +17784,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 17785 "configure"
+#line 17787 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -17888,7 +17890,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 17891 "configure"
+#line 17893 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H


Jakub

[PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-13 Thread Jakub Jelinek

Hi!

The fma-*.c testcases show that these intrinsics are probably meant to preserve
the high elements (all but the lowest) of the first argument of the
fmaintrin.h *_s{s,d} intrinsics in the destination (the HW insns preserve
the destination register there, but which operand that is varies - for 132
and 213 it is the first one (but the negation performed for _mm_fnm*_s[sd]
breaks it anyway), for 231 it is the last one).  What the expander did was
to put an uninitialized pseudo there, so we ended up with pretty random
content.  Before
H.J's http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190492 it happened
to work by accident, but when things changed slightly and reload chose a
different alternative, this broke.

The following patch fixes it, by tweaking the header so that the first
argument is not negated (we negate the second one instead), as we don't want
to negate the high elements if e.g. for whatever reason combiner doesn't
match it.  It fixes the expander to use a dup of the X operand as the high
element provider for the pattern, removes the 231 alternatives (because
those provide different destination high elements) and removes the commutative
marker (again, that would mean different high elements).
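
Why it matters which argument gets negated can be shown with a plain C emulation. This is a sketch under assumptions: `demo_vec4` and the `demo_*` functions are hypothetical, the arithmetic is an ordinary multiply and add rather than a fused operation, and the "lane 0 computed, lanes 1..3 copied from the first argument" rule is taken from the description above rather than from the intrinsics' documentation.

```c
#include <assert.h>

typedef struct { float e[4]; } demo_vec4;

/* Emulated scalar multiply-add: lane 0 is a*b + c, lanes 1..3 are
   copied from the first argument.  Not actually fused.  */
static demo_vec4
demo_fmadd_ss (demo_vec4 a, demo_vec4 b, demo_vec4 c)
{
  demo_vec4 r = a;
  r.e[0] = a.e[0] * b.e[0] + c.e[0];
  return r;
}

/* Negate every lane of V.  */
static demo_vec4
demo_negate (demo_vec4 v)
{
  for (int i = 0; i < 4; i++)
    v.e[i] = -v.e[i];
  return v;
}

/* Old header shape: negating the first argument also negates the
   lanes that are passed through to the result.  */
static demo_vec4
demo_fnmadd_ss_neg_first (demo_vec4 a, demo_vec4 b, demo_vec4 c)
{
  return demo_fmadd_ss (demo_negate (a), b, c);
}

/* New header shape: negating the second argument gives the same
   lane 0 but leaves the passed-through lanes of A untouched.  */
static demo_vec4
demo_fnmadd_ss_neg_second (demo_vec4 a, demo_vec4 b, demo_vec4 c)
{
  return demo_fmadd_ss (a, demo_negate (b), c);
}
```

Both variants agree on lane 0, since -(a*b) + c is the same either way. They differ only in the upper lanes, which is where the header change has its effect if the negation is not folded into a single instruction.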

Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
with
make check-gcc RUNTESTFLAGS='--target_board=valgrind-sim/-m64 i386.exp=\*fma\*'
Ok for trunk/4.7?

2012-09-13  Jakub Jelinek  

PR target/54564
* config/i386/sse.md (fmai_vmfmadd_): Use (match_dup 1)
instead of (match_dup 0) as second argument to vec_merge.
(*fmai_fmadd_, *fmai_fmsub_): Likewise.
Remove third alternative.
(*fmai_fnmadd_, *fmai_fnmsub_): Likewise.  Negate
operand 2 instead of operand 1, but put it as first argument
of fma.

* config/i386/fmaintrin.h (_mm_fnmadd_sd, _mm_fnmadd_ss,
_mm_fnmsub_sd, _mm_fnmsub_ss): Negate the second argument instead
of the first.

--- gcc/config/i386/sse.md.jj   2012-09-05 18:27:03.0 +0200
+++ gcc/config/i386/sse.md  2012-09-13 13:49:49.504968716 +0200
@@ -2072,79 +2072,75 @@ (define_expand "fmai_vmfmadd_"
(match_operand:VF_128 1 "nonimmediate_operand")
(match_operand:VF_128 2 "nonimmediate_operand")
(match_operand:VF_128 3 "nonimmediate_operand"))
- (match_dup 0)
+ (match_dup 1)
  (const_int 1)))]
   "TARGET_FMA")
 
 (define_insn "*fmai_fmadd_"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,x,x")
+  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
 (vec_merge:VF_128
  (fma:VF_128
-   (match_operand:VF_128 1 "nonimmediate_operand" "%0, 0,x")
-   (match_operand:VF_128 2 "nonimmediate_operand" "xm, x,xm")
-   (match_operand:VF_128 3 "nonimmediate_operand" " x,xm,0"))
- (match_dup 0)
+   (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
+   (match_operand:VF_128 2 "nonimmediate_operand" "xm, x")
+   (match_operand:VF_128 3 "nonimmediate_operand" " x,xm"))
+ (match_dup 1)
  (const_int 1)))]
   "TARGET_FMA"
   "@
vfmadd132\t{%2, %3, %0|%0, %3, %2}
-   vfmadd213\t{%3, %2, %0|%0, %2, %3}
-   vfmadd231\t{%2, %1, %0|%0, %1, %2}"
+   vfmadd213\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
 (define_insn "*fmai_fmsub_"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,x,x")
+  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
 (vec_merge:VF_128
  (fma:VF_128
-   (match_operand:VF_128   1 "nonimmediate_operand" "%0, 0,x")
-   (match_operand:VF_128   2 "nonimmediate_operand" "xm, x,xm")
+   (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
+   (match_operand:VF_128   2 "nonimmediate_operand" "xm, x")
(neg:VF_128
- (match_operand:VF_128 3 "nonimmediate_operand" " x,xm,0")))
- (match_dup 0)
+ (match_operand:VF_128 3 "nonimmediate_operand" " x,xm")))
+ (match_dup 1)
  (const_int 1)))]
   "TARGET_FMA"
   "@
vfmsub132\t{%2, %3, %0|%0, %3, %2}
-   vfmsub213\t{%3, %2, %0|%0, %2, %3}
-   vfmsub231\t{%2, %1, %0|%0, %1, %2}"
+   vfmsub213\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
 (define_insn "*fmai_fnmadd_"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,x,x")
+  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
 (vec_merge:VF_128
  (fma:VF_128
(neg:VF_128
- (match_operand:VF_128 1 "nonimmediate_operand" "%0, 0,x"))
-   (match_operand:VF_128   2 "nonimmediate_operand" "xm, x,xm")
-   (match_operand:VF_128   3 "nonimmediate_operand" " x,xm,0"))
- (match_dup 0)
+ (match_operand:VF_128 2 "nonimmediate_operand" "xm, x"))
+   (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
+   (match_operand:VF_128   3 "nonimmediate_operand" " x,

Re: vector comparisons in C++

2012-09-13 Thread Marc Glisse


On Thu, 13 Sep 2012, Jason Merrill wrote:

> On 08/31/2012 06:20 PM, Marc Glisse wrote:
>> this patch copies some more vector extensions from the C front-end to
>> the C++ front-end. There seemed to be some reluctance to add those, but
>> I guess a patch is the best way to ask
>
> What was the reluctance?  It seems clear to me that if we support the type, 
> we should support these operations.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033

In comments 1 and 7, Richard Guenther didn't seem too enthusiastic about 
any vector-related extension to the C++ front-end.

>> Some users (other PRs) asked instead that we make vector types class-like 
>> so users can define their own operator<(vec,vec).
>
> Following the OpenCL standard makes sense to me.

I should really take a look at that standard...

> I don't know either.
>
>> + if (TREE_TYPE (type0) != TREE_TYPE (type1))
>
> I think this should use same_type_ignoring_top_level_qualifiers_p.


Hmm, I assume you mean

same_type_ignoring_top_level_qualifiers_p (type0, type1)

which would replace both this test and

TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1)

below? I copied this directly from the C front-end, which I guess split 
the test to have 2 different error messages. I could call 
same_type_ignoring_top_level_qualifiers_p and only if it fails do more 
specific tests to determine the error message (or I could even merge the 2 
error messages, "vectors with different types" should be good enough).


Did I understand your suggestion correctly?


Thank you for your comments,

--
Marc Glisse

[C PATCH] Fix another _Complex C ICE (PR c/54559)

2012-09-13 Thread Jakub Jelinek

Hi!

This ICE is because c_finish_return calls convert after c_fully_fold
is performed on the argument, and doesn't call it again, thus we need
in_late_binary_op.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk/4.7?

2012-09-13  Jakub Jelinek  

PR c/54559
* c-typeck.c (c_finish_return): Do convert to BOOLEAN_TYPE or
COMPLEX_TYPE with in_late_binary_op set temporarily to true.

* gcc.c-torture/compile/pr54559.c: New test.

--- gcc/c/c-typeck.c.jj 2012-09-12 10:56:59.0 +0200
+++ gcc/c/c-typeck.c2012-09-13 10:57:24.318837832 +0200
@@ -8682,12 +8682,18 @@ c_finish_return (location_t loc, tree re
   npc, NULL_TREE, NULL_TREE, 0);
   tree res = DECL_RESULT (current_function_decl);
   tree inner;
+  bool save;
 
   current_function_returns_value = 1;
   if (t == error_mark_node)
return NULL_TREE;
 
+  save = in_late_binary_op;
+  if (TREE_CODE (TREE_TYPE (res)) == BOOLEAN_TYPE
+  || TREE_CODE (TREE_TYPE (res)) == COMPLEX_TYPE)
+in_late_binary_op = true;
   inner = t = convert (TREE_TYPE (res), t);
+  in_late_binary_op = save;
 
   /* Strip any conversions, additions, and subtractions, and see if
 we are returning the address of a local variable.  Warn if so.  */
--- gcc/testsuite/gcc.c-torture/compile/pr54559.c.jj	2012-09-13 11:23:32.049954011 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr54559.c	2012-09-13 11:23:15.0 +0200
@@ -0,0 +1,9 @@
+/* PR c/54559 */
+
+typedef double _Complex T;
+
+T
+foo (double x, double y)
+{
+  return x + y * (T) (__extension__ 1.0iF);
+}

Jakub
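
The fix follows a common save, set, restore pattern around a global flag. A minimal sketch of that pattern, with hypothetical names throughout: `demo_late_mode` stands in for a flag such as in_late_binary_op, and `demo_convert` only counts how often it runs with the flag set, since the real conversion logic is out of scope here.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_late_mode;   /* stand-in for a global mode flag */
static int demo_late_calls;   /* conversions that ran with the flag set */

/* Stand-in for a conversion routine whose behaviour depends on the
   global flag; it just records that it saw the flag.  */
static int
demo_convert (int value)
{
  if (demo_late_mode)
    demo_late_calls++;
  return value;
}

/* Set the flag only for the duration of one call, then put back
   whatever value the caller had.  */
static int
demo_finish_return (int value, bool needs_late_mode)
{
  bool save = demo_late_mode;
  if (needs_late_mode)
    demo_late_mode = true;
  int result = demo_convert (value);
  demo_late_mode = save;
  return result;
}
```

Restoring the saved value, rather than resetting to false, keeps the helper correct when the caller has already set the flag.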

Re: [C++ Patch] PR 53210

2012-09-13 Thread Manuel López-Ibáñez

On 13 September 2012 15:38, Jason Merrill  wrote:
>
> I think my preference would be to add -Winit-self to -Wall for C++; people
> can use -Wno-init-self if they don't want the warning.

But then the warning should report Winit-self (that is, use
OPT_Winit_self for warning) and not OPT_Wuninitialized. Because it is
what people should use to disable it.

Re: [PATCH] Add option for dumping to stderr (issue6190057)

2012-09-13 Thread Sharad Singhai

> Is -fopt-info-rtl-all also accepted?

Currently it is accepted. However, based on the recent comments, I am
going to remove the pass name from the flags.

>
> It would be useful to have a good default for -fopt-info so that users
> can get high level info about optimizations without having to specify
> a pass. Can -fopt-info be mapped to
> "-fopt-info-tree-all-optimized=stderr

Yes, I have updated the patch so that -fopt-info without any pass
specifier would map to all passes which support this. Currently, only
vectorizer.

> -fopt-info-rtl-all-optimized=stderr"? And perhaps
> -fopt-info-all-OPTIONS can be mapped to all tree and rtl passes?
>
> Thanks,
> Teresa

Thanks,
Sharad

Re: [PATCH] Add option for dumping to stderr (issue6190057)

2012-09-13 Thread Sharad Singhai

That is a good point. Currently I am making a distinction between dump
flags and opt-info flags, but it is not necessary since the opt-info
flags can be thought of as an extension of dump flags.

I will update the patch so that -fdump-tree-vect-optimized also works.

Thanks,
Sharad

On Thu, Sep 13, 2012 at 4:08 AM, Richard Guenther
 wrote:
> On Wed, Sep 12, 2012 at 6:46 PM, Xinliang David Li  wrote:
>> On Wed, Sep 12, 2012 at 3:30 AM, Richard Guenther
>>  wrote:
>>> On Wed, Sep 12, 2012 at 10:12 AM, Sharad Singhai  wrote:
 Thanks for your comments. Please see my responses inline.

 On Tue, Sep 11, 2012 at 1:16 PM, Xinliang David Li  
 wrote:
> Can you resend your patch in text form (also need to resolve the
> latest conflicts) so that it can be commented inline?

 I tried to include inline patch earlier but my message was bounced
 back from patches mailing list. I am trying it again.

> Please also provide as summary a more up-to-date description of
> 1) Command line option syntax and semantics

 I added some documentation in the patch. Here are the relevant bits
 from invoke.texi.

 `-fdump-tree-SWITCH-OPTIONS=FILENAME'
  Control the dumping at various stages of processing the
  intermediate language tree to a file.  The file name is generated
  by appending a switch-specific suffix to the source file name, and
  the file is created in the same directory as the output file. In
  case of `=FILENAME' option, the dump is output on the given file
  instead of the auto named dump files.
  ...

 `=FILENAME'
   Instead of an auto named dump file, output into the given file
   name. The file names `stdout' and `stderr' are treated
   specially and are considered already open standard streams.
   For example,

gcc -O2 -ftree-vectorize -fdump-tree-vect-details=foo.dump
 -fdump-tree-pre=stderr file.c

   outputs vectorizer dump into `foo.dump', while the PRE dump
   is output on to `stderr'. If two conflicting dump filenames
   are given for the same pass, then the latter option
   overrides the earlier one.

 `-fopt-info-PASS'
 `-fopt-info-PASS-OPTIONS'
 `-fopt-info-PASS-OPTIONS=FILENAME'
  Controls optimization dumps from various passes. If the `-OPTIONS'
  form is used, OPTIONS is a list of `-' separated options which
  controls the details of the dump.  If OPTIONS is not specified, it
  defaults to `optimized'. If the FILENAME is not specified, it
  defaults to `stderr'. Note that the output FILENAME will be
  overwritten in case of multiple translation units. If a combined
  output from multiple translation units is desired, `stderr'
  should be used instead.

  The PASS could be one of the tree or rtl passes. The following
  options are available
>>>
>>> I don't like that we have -PASS here.  That makes it awfully similar
>>> to -fdump-PASS-OPTIONS=FILENAME.  Are we merely having
>>> -fopt-info because OPTIONS are "different"?
>>
>>
> >> Having PASS is useful to do filtering. But as you said, the option
>> design here is very much oriented towards developers not end users
>> which fopt-info is also intended for.
>
> Just to add a comment here - -fopt-info is _only_ targeted at end users.
> Developers can use -fdump-tree-XXX=stderr now (which, with the correct
> pass / flags should produce identical output to -fopt-info - at least that
> was the whole point with the re-design of the dump API - to make it
> possible to implement -fopt-info in a way that it simply provides a nice
> interface to end-users to our existing dumping information).
>
> If it doesn't work like that right now we should make it work this way.
>
> Richard.

C++ PATCH for c++/53839 (ICE with constexpr)

2012-09-13 Thread Jason Merrill

We weren't requiring the result of an INDIRECT_REF to be a constant 
because we could end up taking its address again later, so in this case 
we ended up trying to handle a non-constant expression as a constant and 
failing.  But since we have the "addr" parameter we know whether or not 
we will end up taking the address again, and we can require a constant 
if not.


Tested x86_64-pc-linux-gnu, applying to trunk and 4.7.
commit 4156683505c976d284fe38eb3a5ca40812ae5d66
Author: Jason Merrill 
Date:   Tue Sep 11 06:15:25 2012 -0400

	PR c++/53839
	* semantics.c (cxx_eval_indirect_ref): If we aren't looking for an
	address, make sure the value is constant.

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index a6cdfb5..d19ff1c 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7474,7 +7474,11 @@ cxx_eval_indirect_ref (const constexpr_call *call, tree t,
 }
 
   if (r == NULL_TREE)
-return t;
+{
+  if (!addr)
+	VERIFY_CONSTANT (t);
+  return t;
+}
   return r;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-temp1.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-temp1.C
new file mode 100644
index 000..d065436
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-temp1.C
@@ -0,0 +1,9 @@
+// { dg-do compile { target c++11 } }
+
+struct A { int i; };
+constexpr A f2 (const A& a) { return a; }
+constexpr int f1 (const A &a) { return f2(a).i; }
+A g(const A &a)
+{
+  return { f1(a) };
+}

Re: [Patch, Fortran] PR54556 - fix (4.6/4.7/) 4.8 regression: wrong code with implicit_pure procedures

2012-09-13 Thread Mikael Morin

On 13/09/2012 14:35, Tobias Burnus wrote:
> gfortran wrongly marks some procedures as implicit_pure which aren't
> pure. implicit_pure exists since 2011-01-08 (= GCC 4.6), but was only
> used internally (FE optimization and trans*.c to avoid temporaries).
> Since 2012-08-28, implicit_pure also implies DECL_PURE_P.  The latter
> change exposes a bug and miscompiles CP2K.
> 
> The reason for the bug is that gfc_impure_variable() checks at some
> point whether it is invoked in a PURE procedure. For implicit_pure
> procedures, the answer is no - thus that check never triggered. I have
> now removed the check - the callee already takes care of that. (Which is
> also implied by the function name.)
> 
> I additionally allow VALUE for implicit_pure. That's in line with PURE
> where VALUE is allowed since Fortran 2008. (I think since F2008's first
> technical corrigendum.)
> 
> Built and regtested on x86-64-gnu-linux.
> OK for the trunk and for the 4.6/4.7 branches?
> 
> Tobias,
> who hopes that no additional implicit_pure bugs exist.

OK, thanks.

C++ PATCH for c++/54511 (anonymous union in template function)

2012-09-13 Thread Jason Merrill

Instantiating an anonymous union was problematic because we don't set up 
a mapping between the fake variables that point to the different 
members.  This patch fixes that by doing name lookup to find the 
corresponding fake variable in the instantiation, and then adding it to 
the hash table for later references.


Tested x86_64-pc-linux-gnu, applying to trunk, 4.7 and 4.6.
commit e1df1acb3674c52ee582ba8a5f756066e23e5016
Author: Jason Merrill 
Date:   Tue Sep 11 22:05:10 2012 -0400

	PR c++/54511
	* pt.c (tsubst_decl) [VAR_DECL]: Handle DECL_ANON_UNION_VAR_P.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4cf2ed8..5b7976a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10443,6 +10443,16 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 	break;
 	  }
 
+	if (TREE_CODE (t) == VAR_DECL && DECL_ANON_UNION_VAR_P (t))
+	  {
+	/* Just use name lookup to find a member alias for an anonymous
+	   union, but then add it to the hash table.  */
+	r = lookup_name (DECL_NAME (t));
+	gcc_assert (DECL_ANON_UNION_VAR_P (r));
+	register_local_specialization (r, t);
+	break;
+	  }
+
 	/* Create a new node for the specialization we need.  */
 	r = copy_decl (t);
 	if (type == NULL_TREE)
diff --git a/gcc/testsuite/g++.dg/template/anonunion2.C b/gcc/testsuite/g++.dg/template/anonunion2.C
new file mode 100644
index 000..cb3c12d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/anonunion2.C
@@ -0,0 +1,6 @@
+template 
+struct S
+{
+  S () { union { int a; }; a = 0; }
+};
+S<0> s;
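
The template parameter list of the test case above appears to have been lost in this archive copy. As a rough, self-contained sketch of the scenario the patch addresses (an anonymous union declared inside a member function of a class template), one could write something like the following. The parameter `int N`, the `value` member and the helper `anon_union_demo` are assumptions made for illustration and are not taken from the committed test.

```cpp
#include <cassert>

// Hypothetical reconstruction: the anonymous union lives inside the
// constructor of a class template, so its member alias `a` has to be
// found again when the template is instantiated.
template <int N>
struct S
{
  int value;

  S () : value (-1)
  {
    union { int a; };   // anonymous union local to the constructor
    a = N;              // refers to the union member through its alias
    value = a;
  }
};

// Illustrative helper (not part of the patch): instantiate twice and
// check that each instantiation sees its own union member.
int anon_union_demo ()
{
  S<0> s0;
  S<7> s7;
  assert (s0.value == 0);
  assert (s7.value == 7);
  return s0.value + s7.value;
}
```

With a compiler that has the fix, both instantiations compile and `anon_union_demo` returns 7.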

C++ PATCH for c++/53836 (dependent parenthesized initializer)

2012-09-13 Thread Jason Merrill

When a parenthesized initializer has dependent elements, we leave it as 
a TREE_LIST.  We shouldn't let that confuse us into thinking that it 
isn't value-dependent.


Tested x86_64-pc-linux-gnu, applying to trunk and 4.7.
commit 26bd4898faf6a74d3e5f1531790cabd1a5d25d8a
Author: Jason Merrill 
Date:   Tue Sep 11 21:49:30 2012 -0400

	PR c++/53836
	* pt.c (value_dependent_expression_p): A TREE_LIST initializer must
	be dependent.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 768f141..4cf2ed8 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -19199,10 +19199,15 @@ value_dependent_expression_p (tree expression)
 
 case VAR_DECL:
/* A constant with literal type and is initialized
-	  with an expression that is value-dependent.  */
+	  with an expression that is value-dependent.
+
+  Note that a non-dependent parenthesized initializer will have
+  already been replaced with its constant value, so if we see
+  a TREE_LIST it must be dependent.  */
   if (DECL_INITIAL (expression)
 	  && decl_constant_var_p (expression)
-	  && value_dependent_expression_p (DECL_INITIAL (expression)))
+	  && (TREE_CODE (DECL_INITIAL (expression)) == TREE_LIST
+	  || value_dependent_expression_p (DECL_INITIAL (expression))))
 	return true;
   return false;
 
diff --git a/gcc/testsuite/g++.dg/template/init10.C b/gcc/testsuite/g++.dg/template/init10.C
new file mode 100644
index 000..1480622
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/init10.C
@@ -0,0 +1,15 @@
+template 
+struct A { };
+
+template 
+void g()
+{
+const int M ( Q );
+
+A a;
+}
+
+void h()
+{
+g<3>();
+}
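
The test case above seems to have lost its template parameter and argument lists in this archive copy. A minimal sketch of the situation described in the message, assuming non-type `int` parameters (the names `N`, `Q`, the `value` member and the return values are illustrative guesses, not the committed test), could look like this:

```cpp
#include <cassert>

template <int N>
struct A { static const int value = N; };

template <int Q>
int g ()
{
  // Parenthesized initializer whose value depends on the template
  // parameter Q.  Inside the template, M must be treated as
  // value-dependent, otherwise A<M> cannot be handled correctly.
  const int M ( Q );

  A<M> a;
  (void) a;             // silence unused-variable warnings
  return A<M>::value;
}

// Illustrative driver: instantiating g<3> should give A<3>.
int h ()
{
  int r = g<3> ();
  assert (r == 3);
  return r;
}
```

The point of the sketch is only that `M` is usable as a template argument once `Q` is known, which requires the front end to defer the decision until instantiation.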

Re: [C++ Patch] PR 53210

2012-09-13 Thread Jason Merrill


OK.

Jason

Re: [C++ Patch] PR 53210

2012-09-13 Thread Paolo Carlini


Hi,

On 09/13/2012 03:38 PM, Jason Merrill wrote:

On 09/13/2012 09:28 AM, Paolo Carlini wrote:

Jon noticed that for this testcase we don't warn at all even with -Wall,
whereas the code doesn't really make much sense. Turns out that the
warning is currently controlled both by warn_init_self (not part of
-Wall) and OPT_Wuninitialized. Thus Manuel proposes to simply remove the
former, because this isn't the specific case of int x = x which we want
to keep on "supporting" as a GNU extension. Also, as mentioned by Jon,
the user can always leave 'i' out the mem-initializer-list.

Alternately, one may want to use OPT_Winit_self, but then we still have
the issue that -Winit-self is not part of -Wall.


I think my preference would be to add -Winit-self to -Wall for C++;
people can use -Wno-init-self if they don't want the warning.

Agreed. Then I'm finishing testing the below (already booted and tested
C++, the other languages next). Ok?


Thanks,
Paolo.

///
2012-09-13  Paolo Carlini  

PR c++/53210
* doc/invoke.texi ([Winit-self]): Document as enabled by -Wall in C++.

/c-family
2012-09-13  Paolo Carlini  

PR c++/53210
* c.opt ([Winit-self]): Enabled by -Wall in C++.

/testsuite
2012-09-13  Paolo Carlini  

PR c++/53210
* g++.dg/warn/Wuninitialized-self.C: New.
Index: testsuite/g++.dg/warn/Winit-self.C
===
--- testsuite/g++.dg/warn/Winit-self.C  (revision 0)
+++ testsuite/g++.dg/warn/Winit-self.C  (revision 0)
@@ -0,0 +1,8 @@
+// PR c++/53210
+// { dg-options "-Wall" }
+
+struct S
+{
+  S(int i) : j(j) { }  // { dg-warning "is initialized with itself" }
+  int j;
+};
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 191245)
+++ doc/invoke.texi (working copy)
@@ -3348,6 +3348,8 @@ int f()
 @end group
 @end smallexample
 
+This warning is enabled by @option{-Wall} in C++.
+
 @item -Wimplicit-int @r{(C and Objective-C only)}
 @opindex Wimplicit-int
 @opindex Wno-implicit-int
Index: c-family/c.opt
===
--- c-family/c.opt  (revision 191245)
+++ c-family/c.opt  (working copy)
@@ -408,7 +408,7 @@ C C++ Var(warn_ignored_qualifiers) Warning Enabled
 Warn whenever type qualifiers are ignored.
 
 Winit-self
-C ObjC C++ ObjC++ Var(warn_init_self) Warning
+C ObjC C++ ObjC++ Var(warn_init_self) Warning LangEnabledBy(C++ ObjC++,Wall)
 Warn about variables which are initialized to themselves
 
 Wimplicit

Re: vector comparisons in C++

2012-09-13 Thread Jason Merrill


On 08/31/2012 06:20 PM, Marc Glisse wrote:

this patch copies some more vector extensions from the C front-end to
the C++ front-end. There seemed to be some reluctance to add those, but
I guess a patch is the best way to ask


What was the reluctance?  It seems clear to me that if we support the 
type, we should support these operations.  Following the OpenCL standard 
makes sense to me.



I have some issues with the vector-compare-2.c torture test. It passes a
vector by value (argument and return type), which is likely to warn
(although for some reason it doesn't for me, with today's compiler). And
it takes -Wno-psabi through a .x file, but those are not read in
c-c++-common, so I put it in dg-options.


That sounds fine.


I would have changed the
function to use pointers, but I don't know if it specifically wants to
test passing by value...


I don't know either.


+ if (TREE_TYPE (type0) != TREE_TYPE (type1))


I think this should use same_type_ignoring_top_level_qualifiers_p.

Jason
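
For readers unfamiliar with the extension under discussion, here is a small sketch of an element-wise vector comparison using the GNU `vector_size` attribute. It assumes a compiler that accepts these operations in C++ (which is what the patch proposes for G++); the type name `v4si` and the helper functions are illustrative and are not taken from the patch. Vectors are passed by reference here to sidestep the pass-by-value ABI warning mentioned in the thread.

```cpp
#include <cassert>

// Four 32-bit ints packed in a 16-byte vector (GNU extension).
typedef int v4si __attribute__ ((vector_size (16)));

// Following the OpenCL convention, a < b yields a vector of the same
// shape whose lanes are -1 where the comparison holds and 0 elsewhere.
int lane_mask (const v4si &a, const v4si &b, int lane)
{
  v4si m = a < b;
  return m[lane];
}

// Count the lanes in which a is strictly less than b.
int count_less (const v4si &a, const v4si &b)
{
  v4si m = a < b;
  int n = 0;
  for (int i = 0; i < 4; ++i)
    if (m[i] != 0)
      ++n;
  return n;
}

// Illustrative usage.
void vector_compare_demo ()
{
  v4si a = {1, 5, 3, 7};
  v4si b = {2, 4, 6, 7};
  assert (count_less (a, b) == 2);
  assert (lane_mask (a, b, 0) == -1);
  assert (lane_mask (a, b, 1) == 0);
}
```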

[SH] Fix bootstrap failures with --enable-checking

2012-09-13 Thread Christian Bruel

Hello,

This patch fixes a couple of assertions while building libgcc, when
configured with --enable-checking=all.

OK for trunk ?

thanks

Christian

2012-09-13  Christian Bruel  

	* config/sh/predicates.md (t_reg_operand): Check REG_P for SUBREG.
	* config/sh/sh.c (sequence_insn_p): Check INSN_P for SEQUENCE.

Index: config/sh/predicates.md
===
--- config/sh/predicates.md	(revision 191222)
+++ config/sh/predicates.md	(working copy)
@@ -998,11 +998,12 @@
 	return REGNO (op) == T_REG;
 
   case SUBREG:
-	return REGNO (SUBREG_REG (op)) == T_REG;
+	return REG_P (SUBREG_REG (op)) && REGNO (SUBREG_REG (op)) == T_REG;
 
   case ZERO_EXTEND:
   case SIGN_EXTEND:
 	return GET_CODE (XEXP (op, 0)) == SUBREG
+	   && REG_P (SUBREG_REG (XEXP (op, 0)))
 	   && REGNO (SUBREG_REG (XEXP (op, 0))) == T_REG;
 
   default:
Index: config/sh/sh.c
===
--- config/sh/sh.c	(revision 191222)
+++ config/sh/sh.c	(working copy)
@@ -9876,7 +9876,7 @@ fpscr_set_from_mem (int mode, HARD_REG_SET regs_li
 static bool
 sequence_insn_p (rtx insn)
 {
-  rtx prev, next, pat;
+  rtx prev, next;
 
   prev = PREV_INSN (insn);
   if (prev == NULL)
@@ -9886,11 +9886,7 @@ sequence_insn_p (rtx insn)
   if (next == NULL)
 return false;
 
-  pat = PATTERN (next);
-  if (pat == NULL)
-return false;
-
-  return GET_CODE (pat) == SEQUENCE;
+  return INSN_P (next) && GET_CODE (PATTERN (next)) == SEQUENCE;
 }
 
 int

Re: [PATCH] Enable bbro for -Os

2012-09-13 Thread Eric Botcazou

> The updated patched is attached. Is it OK?

Yes, OK for mainline.

-- 
Eric Botcazou

Re: [C++ Patch] Remove uses of ATTRIBUTE_UNUSED in the function parameters

2012-09-13 Thread Gabriel Dos Reis

On Thu, Sep 13, 2012 at 9:00 AM, Paolo Carlini  wrote:
> On 09/11/2012 06:53 PM, Gabriel Dos Reis wrote:
>>
>> On Tue, Sep 11, 2012 at 10:37 AM, Jakub Jelinek  wrote:
>>>
>>> On Tue, Sep 11, 2012 at 05:29:12PM +0200, Paolo Carlini wrote:

 PS: slightly interesting, in a couple of cases -
 write_unnamed_type_name, wrap_cleanups_r - the parameters were
 actually used.
>>>
>>> Just a general comment, often an argument is only conditionally used,
>>> e.g. depending on some preprocessor macro (e.g. target hook).  In that
>>> case unnamed parameter is not an option, but dropping ATTRIBUTE_UNUSED is
>>> not desirable either.
>>
>> That a parameter is unused in a function body should be clear from the
>> context.  And in those cases, it is desirable that the parameter be
>> unnamed, and the attribute be dropped.  That is what Paolo's patch is
>> doing.  That should not be controversial.
>
> Thanks. Thus, I suppose I can go ahead... If I don't hear further comments,
> tonight I will.
>
> Paolo.

Indeed.

-- Gaby
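
The two cases contrasted in this thread can be sketched as follows. This is only an illustration under assumptions: `EXAMPLE_UNUSED` stands in for GCC's `ATTRIBUTE_UNUSED`, and `EXAMPLE_TARGET_HAS_SCALING` and both hook functions are made-up names, not anything from the patch.

```cpp
#include <cassert>

// Stand-in for ATTRIBUTE_UNUSED (hypothetical macro for this sketch).
#if defined (__GNUC__)
# define EXAMPLE_UNUSED __attribute__ ((unused))
#else
# define EXAMPLE_UNUSED
#endif

// Case 1: the parameter is never used.  In C++ it can simply be left
// unnamed, and no attribute is needed.
int hook_always_zero (int /* flags */)
{
  return 0;
}

// Case 2: the parameter is used only under some configuration macro.
// It has to keep its name, so the attribute is still what silences the
// unused-parameter warning when the macro is not defined.
int hook_scaled (int value, int scale EXAMPLE_UNUSED)
{
#ifdef EXAMPLE_TARGET_HAS_SCALING
  return value * scale;
#else
  return value;
#endif
}

// Illustrative usage covering both configurations.
void unused_parameter_demo ()
{
  assert (hook_always_zero (42) == 0);
#ifdef EXAMPLE_TARGET_HAS_SCALING
  assert (hook_scaled (5, 3) == 15);
#else
  assert (hook_scaled (5, 3) == 5);
#endif
}
```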

Re: [patch] Fix memory exhaustion during cunrolli

2012-09-13 Thread Richard Guenther

On Thu, Sep 13, 2012 at 4:00 PM, Eric Botcazou  wrote:
>> Indeed somewhat simple-minded - when originally fixing a similar testcase
>> (heh ...) I improved things by improving CFG cleanup to fold some more
>> conditions by looking at SSA defs, that improved things a lot.  I also
>> thought the real fix should involve some scalar optimization on a
>> sub-range of the CFG. That should be easiest when performing the copy in
>> the first place - after all we keep copy tables and such for the purpose
>> of update-SSA so we might as well create a lattice from PHI nodes we
>> disassemble for use by copy_bb ...
>
> I also thought of similar approaches, but couldn't really come up with
> something satisfactory.
>
>> On the patch itself - can you call the simple CCP before we call
>> cleanup_tree_cfg () please?
>
> That was the original implementation (in fact, in the original implementation
> the simple CCP was conditionalized on canonicalize_loop_induction_variables,
> but you broke it a few days ago by moving the call to update_ssa :-)
>
> The problem is that, if it is moved to before cleanup_tree_cfg, then:
>  1) you have more basic blocks (15 vs 2 for the testcase),
>  2) you also need to do simple CCP for degenerate PHI nodes.

Yes - now cfg_cleanup does that (and it really shouldn't be its job).  That
was the improvement I talked about - reducing the number of BBs a lot.

> That being said, I can certainly save in a bitmap the loop fathers for which
> canonicalize_loop_induction_variables unrolled a child and do the simple CCP
> (augmented for degenerate PHI nodes) on them between the calls to update_ssa
> and cleanup_tree_cfg.

Ah, indeed ;)  Or just push struct loop of changed loops onto a stack.

>> We might get rid of that weirdo SSA lookup
>> there again then:
>>
>> static bool
>> cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi)
>> {
>> ...
>> /* For conditions try harder and lookup single-argument
>>PHI nodes.  Only do so from the same basic-block though
>>as other basic-blocks may be dead already.  */
>> if (TREE_CODE (lhs) == SSA_NAME
>> && !name_registered_for_update_p (lhs))
>
> I'll investigate.
>
>> + FOR_EACH_IMM_USE_ON_STMT (use, iter)
>> +   propagate_value (use, gimple_assign_rhs1 (stmt));
>> +
>> + fold_stmt_inplace (&use_stmt_gsi);
>> + update_stmt (use_stmt);
>>
>> Use SET_USE (use, rhs1) and cache gimple_assign_rhs1 somewhere.
>>
>>   if (fold_stmt_inplace (&use_stmt_gsi))
>> update_stmt (use_stmt);
>
> OK, will adjust, thanks.

Thanks,
Richard.

> --
> Eric Botcazou

Re: [C++ Patch] Remove uses of ATTRIBUTE_UNUSED in the function parameters

2012-09-13 Thread Paolo Carlini


On 09/11/2012 06:53 PM, Gabriel Dos Reis wrote:

On Tue, Sep 11, 2012 at 10:37 AM, Jakub Jelinek  wrote:

On Tue, Sep 11, 2012 at 05:29:12PM +0200, Paolo Carlini wrote:

PS: slightly interesting, in a couple of cases -
write_unnamed_type_name, wrap_cleanups_r - the parameters were
actually used.

Just a general comment, often an argument is only conditionally used,
e.g. depending on some preprocessor macro (e.g. target hook).  In that
case unnamed parameter is not an option, but dropping ATTRIBUTE_UNUSED is
not desirable either.

That a parameter is unused in a function body should be clear from the context.
And in those cases, it is desirable that the parameter be unnamed, and
the attribute be dropped.  That is what Paolo's patch is doing.  That should
not be controversial.

Thanks. Thus, I suppose I can go ahead... If I don't hear further
comments, tonight I will.


Paolo.

Re: [patch] Fix memory exhaustion during cunrolli

2012-09-13 Thread Eric Botcazou

> Indeed somewhat simple-minded - when originally fixing a similar testcase
> (heh ...) I improved things by improving CFG cleanup to fold some more
> conditions by looking at SSA defs, that improved things a lot.  I also
> thought the real fix should involve some scalar optimization on a
> sub-range of the CFG. That should be easiest when performing the copy in
> the first place - after all we keep copy tables and such for the purpose
> of update-SSA so we might as well create a lattice from PHI nodes we
> disassemble for use by copy_bb ...

I also thought of similar approaches, but couldn't really come up with 
something satisfactory.

> On the patch itself - can you call the simple CCP before we call
> cleanup_tree_cfg () please?

That was the original implementation (in fact, in the original implementation 
the simple CCP was conditionalized on canonicalize_loop_induction_variables, 
but you broke it a few days ago by moving the call to update_ssa :-)

The problem is that, if it is moved to before cleanup_tree_cfg, then:
 1) you have more basic blocks (15 vs 2 for the testcase),
 2) you also need to do simple CCP for degenerate PHI nodes.

That being said, I can certainly save in a bitmap the loop fathers for which
canonicalize_loop_induction_variables unrolled a child and do the simple CCP
(augmented for degenerate PHI nodes) on them between the calls to update_ssa 
and cleanup_tree_cfg.

> We might get rid of that weirdo SSA lookup
> there again then:
> 
> static bool
> cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi)
> {
> ...
> /* For conditions try harder and lookup single-argument
>PHI nodes.  Only do so from the same basic-block though
>as other basic-blocks may be dead already.  */
> if (TREE_CODE (lhs) == SSA_NAME
> && !name_registered_for_update_p (lhs))

I'll investigate.

> + FOR_EACH_IMM_USE_ON_STMT (use, iter)
> +   propagate_value (use, gimple_assign_rhs1 (stmt));
> +
> + fold_stmt_inplace (&use_stmt_gsi);
> + update_stmt (use_stmt);
> 
> Use SET_USE (use, rhs1) and cache gimple_assign_rhs1 somewhere.
> 
>   if (fold_stmt_inplace (&use_stmt_gsi))
> update_stmt (use_stmt);

OK, will adjust, thanks.

-- 
Eric Botcazou

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Robert Dewar


On 9/13/2012 9:38 AM, Jakub Jelinek wrote:

On Thu, Sep 13, 2012 at 09:33:20AM -0400, Robert Dewar wrote:

On 9/13/2012 8:00 AM, Richard Guenther wrote:


Because doing so would create code generation differences -g vs. -g0.


Sometimes I wonder whether the insistence on -g not changing code
generation is warranted. In practice, gdb for me is so weak in handling


It is.  IMHO the most important reason is not that somebody would build
first with just -O2 and then later on to debug the code would build it again
with -g -O2 and hope the code is the same, but by making sure -g vs. -g0
doesn't change generated code we ensure -g doesn't pessimize the generated
code, and really many people compile even production code with -g -O2
or similar.  The debug info is then either stripped, or stripped into
separate files/not shipped or only optionally shipped with the product.

Jakub


Sure, it is obvious that you don't want -g to affect -O1 or -O2 code,
but I think if you have -Og (if and when we have that), it would not
be a bad thing for -g to affect that. I can even imagine that what
-Og means is -O1 if you don't have -g, and something good for
debugging if you do have -g.

[PATCH, i386]: Remove mode of address_operand predicate from prefetch patterns

2012-09-13 Thread Uros Bizjak

Hello!

The mode of address_operand predicate is ignored in ix86_legitimate_address_p.

2012-08-13  Uros Bizjak  

* config/i386/i386.md (prefetch): Do not assert mode of operand 0.
(*prefetch_sse_): Do not set mode of address_operand predicate.
Rename to ...
(*prefetch_sse): ... this.
(*prefetch_3dnow_): Do not set mode of address_operand predicate.
Rename to ...
(*prefetch_3dnow): ... this.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and
committed to mainline SVN.

Uros.
Index: i386.md
===
--- i386.md (revision 191240)
+++ i386.md (working copy)
@@ -17800,12 +17800,10 @@
   int locality = INTVAL (operands[2]);
 
   gcc_assert (rw == 0 || rw == 1);
-  gcc_assert (locality >= 0 && locality <= 3);
-  gcc_assert (GET_MODE (operands[0]) == Pmode
- || GET_MODE (operands[0]) == VOIDmode);
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
   if (TARGET_PRFCHW && rw)
 operands[2] = GEN_INT (3);
-
   /* Use 3dNOW prefetch in case we are asking for write prefetch not
  supported by SSE counterpart or the SSE prefetch is not available
  (K6 machines).  Otherwise use SSE prefetch as it allows specifying
@@ -17816,8 +17814,8 @@
 operands[1] = const0_rtx;
 })
 
-(define_insn "*prefetch_sse_"
-  [(prefetch (match_operand:P 0 "address_operand" "p")
+(define_insn "*prefetch_sse"
+  [(prefetch (match_operand 0 "address_operand" "p")
 (const_int 0)
 (match_operand:SI 1 "const_int_operand"))]
   "TARGET_PREFETCH_SSE"
@@ -17827,7 +17825,7 @@
   };
 
   int locality = INTVAL (operands[1]);
-  gcc_assert (locality >= 0 && locality <= 3);
+  gcc_assert (IN_RANGE (locality, 0, 3));
 
   return patterns[locality];
 }
@@ -17837,8 +17835,8 @@
(symbol_ref "memory_address_length (operands[0])"))
(set_attr "memory" "none")])
 
-(define_insn "*prefetch_3dnow_"
-  [(prefetch (match_operand:P 0 "address_operand" "p")
+(define_insn "*prefetch_3dnow"
+  [(prefetch (match_operand 0 "address_operand" "p")
 (match_operand:SI 1 "const_int_operand" "n")
 (const_int 3))]
   "TARGET_3DNOW || TARGET_PRFCHW"

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Jakub Jelinek

On Thu, Sep 13, 2012 at 09:33:20AM -0400, Robert Dewar wrote:
> On 9/13/2012 8:00 AM, Richard Guenther wrote:
> 
> >Because doing so would create code generation differences -g vs. -g0.
> 
> Sometimes I wonder whether the insistence on -g not changing code
> generation is warranted. In practice, gdb for me is so weak in handling

It is.  IMHO the most important reason is not that somebody would build
first with just -O2 and then later on to debug the code would build it again
with -g -O2 and hope the code is the same, but by making sure -g vs. -g0
doesn't change generated code we ensure -g doesn't pessimize the generated
code, and really many people compile even production code with -g -O2
or similar.  The debug info is then either stripped, or stripped into
separate files/not shipped or only optionally shipped with the product.

Jakub

Re: [patch] IPA cleanups and assorted cleanups

2012-09-13 Thread Steven Bosscher

On Thu, Sep 13, 2012 at 2:29 PM, Jan Hubicka  wrote:
> +/* Get the set of nodes for the cycle in the reduced call graph starting
> +   from NODE.  */
> +
> +VEC (cgraph_node_p, heap) *
> +ipa_get_nodes_in_cycle (struct cgraph_node *node)
>
> I never really liked the API of SCC searching that made the user walk across
> the AUX pointer.
>
> However, I also do not like allocating a temporary vector and requiring the
> user to remember to free it just to add an abstraction over a singly linked
> list walk.

I think this new ipa_get_nodes_in_cycle thing is an improvement over
what there is now, but I agree that something better is necessary in
the long term. Not just for visiting nodes in a cycle, but also for
how a cycle is represented. Linking via the aux pointer wouldn't have
been my choice to begin with...

(FWIW, I also dislike the get_loop_body stuff. Loop nodes should just
be a set directly available from the loop info without a CFG DFS.)

>  What
> about adding a convenient iterator API, especially now when we have the C++
> wonderland?

Maybe later. But I'll wait with C++ iterators until someone has put an
example in the trunk that I can copy-and-paste :-)

Ciao!
Steven

Re: [C++ Patch] PR 53210

2012-09-13 Thread Jason Merrill


On 09/13/2012 09:28 AM, Paolo Carlini wrote:

Jon noticed that for this testcase we don't warn at all even with -Wall,
whereas the code doesn't really make much sense. Turns out that the
warning is currently controlled both by warn_init_self (not part of
-Wall) and OPT_Wuninitialized. Thus Manuel proposes to simply remove the
former, because this isn't the specific case of int x = x which we want
to keep on "supporting" as a GNU extension. Also, as mentioned by Jon,
the user can always leave 'i' out the mem-initializer-list.

Alternately, one may want to use OPT_Winit_self, but then we still have
the issue that -Winit-self is not part of -Wall.


I think my preference would be to add -Winit-self to -Wall for C++; 
people can use -Wno-init-self if they don't want the warning.


Jason
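
For context, the pattern the warning is meant to catch and the presumably intended form can be sketched as below. The problematic form appears only in a comment so that the sketch itself never reads an uninitialized member; `init_self_demo` is an illustrative helper, not part of any patch.

```cpp
#include <cassert>

struct S
{
  // The form discussed in the thread is
  //     S (int i) : j (j) { }
  // which initializes the member from itself, i.e. from an
  // indeterminate value.  The presumably intended form uses the
  // constructor parameter instead:
  S (int i) : j (i) { }
  int j;
};

// Illustrative usage of the corrected form.
int init_self_demo ()
{
  S s (42);
  assert (s.j == 42);
  return s.j;
}
```

With -Winit-self enabled by -Wall for C++, the commented-out form would be diagnosed by default, and -Wno-init-self would remain available to opt out.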

Re: [patch] IPA cleanups and assorted cleanups

2012-09-13 Thread Richard Guenther

On Thu, Sep 13, 2012 at 3:08 PM, Steven Bosscher  wrote:
>> +/* Compute X &= Y, taking into account the possibility that
>> +   X may become the maximum set.  */
>>
>> Hmm, how can X become the maximum set if it was not the maximum set
>> before?  Thus, shouldn't this simply be
>>
>>   if (y == all_module_statics)
>>/* do nothing */;
>>  else
>> ...
>>
>> ?
>
> No. The local sets contain all initially considered static vars, but
> this set is later pruned to the "really" static vars. See the code
> below the comment "/* Now we know what vars are really statics; prune
> out those that aren't.  */". It may happen (and apparently it
> frequently does happen, going on my memory numbers) that after
> pruning, a local set becomes equivalent to the maximum set.

Hm, ok - so the maximum set isn't really "maximum" after all ;)

>> Otherwise ok
>
> Also OK with the above not changed?

Yes.

Thanks,
Richard.

>
>> (the patch could have been split though).
>
> I know. I'll try to behave next time :-)
>
> Ciao!
> Steven
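
The point about a set "becoming" the maximum set can be illustrated with a toy model. This is a sketch under stated assumptions, not GCC's code: GCC uses bitmaps and a shared `all_module_statics` set, whereas this sketch uses `std::set<int>` and pointer identity to mark the canonical maximum set. All type and function names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <set>

typedef std::set<int> VarSet;
typedef std::shared_ptr<VarSet> SetRef;

// Compute x &= y, assuming every set is a subset of *all.  If the
// result has the same contents as the maximum set, share `all` itself
// instead of keeping a separate copy.
void intersect_into (SetRef &x, const SetRef &y, const SetRef &all)
{
  if (y == all)
    return;                                   // x & all == x

  if (x == all)
    x = std::make_shared<VarSet> (*y);        // all & y == y
  else
    {
      VarSet result;
      std::set_intersection (x->begin (), x->end (),
                             y->begin (), y->end (),
                             std::inserter (result, result.begin ()));
      x = std::make_shared<VarSet> (result);
    }

  // After pruning, a set that was built independently may turn out to
  // hold exactly the same elements as the maximum set.
  if (*x == *all)
    x = all;
}

// Illustrative usage: x and y were built separately from `all` but
// contain the same elements, so the result is canonicalized to `all`.
void max_set_demo ()
{
  SetRef all = std::make_shared<VarSet> (VarSet {1, 2, 3});
  SetRef x = std::make_shared<VarSet> (VarSet {1, 2, 3});
  SetRef y = std::make_shared<VarSet> (VarSet {1, 2, 3});
  assert (x != all);
  intersect_into (x, y, all);
  assert (x == all);
}
```

Sharing the canonical object is what saves memory in this model: many sets that happen to equal the maximum set end up pointing at one copy.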

Re: [patch] Fix memory exhaustion during cunrolli

2012-09-13 Thread Richard Guenther

On Thu, Sep 13, 2012 at 3:11 PM, Eric Botcazou  wrote:
> Hi,
>
> the attached testcase triggers a memory exhaustion at -O2 during the cunrolli
> pass on the mainline and 4.7 branch.  The problem is that the size estimates
> disregard induction variable computations on the ground that they will be
> folded later.  But they aren't folded between the iterations of the loop so
> they can add up and exhaust the memory during SSA updating if stars are
> properly aligned.
>
> The patch is a somewhat simple-minded fix...  Bootstrapped/regtested on
> x86_64-suse-linux.  OK for mainline and 4.7 branch?

Indeed somewhat simple-minded - when originally fixing a similar testcase
(heh ...) I improved things by improving CFG cleanup to fold some more
conditions by looking at SSA defs, that improved things a lot.  I also thought
the real fix should involve some scalar optimization on a sub-range of the CFG.
That should be easiest when performing the copy in the first place - after all
we keep copy tables and such for the purpose of update-SSA so we might as
well create a lattice from PHI nodes we disassemble for use by copy_bb ...

On the patch itself - can you call the simple CCP before we call
cleanup_tree_cfg () please?  We might get rid of that weirdo SSA lookup
there again then:

static bool
cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi)
{
...
/* For conditions try harder and lookup single-argument
   PHI nodes.  Only do so from the same basic-block though
   as other basic-blocks may be dead already.  */
if (TREE_CODE (lhs) == SSA_NAME
&& !name_registered_for_update_p (lhs))
...

+ FOR_EACH_IMM_USE_ON_STMT (use, iter)
+   propagate_value (use, gimple_assign_rhs1 (stmt));
+
+ fold_stmt_inplace (&use_stmt_gsi);
+ update_stmt (use_stmt);

Use SET_USE (use, rhs1) and cache gimple_assign_rhs1 somewhere.

  if (fold_stmt_inplace (&use_stmt_gsi))
update_stmt (use_stmt);

Thanks,
Richard.

>
> 2012-09-13  Eric Botcazou  
>
> * tree-ssa-loop-ivcanon.c (propagate_constants_for_unrolling): New.
> (tree_unroll_loops_completely): Starting from the second iteration,
> propagate constants within the innermost loops.
>
>
> 2012-09-13  Eric Botcazou  
>
> * gnat.dg/loop_optimization12.ad[sb]: New test.
>
>
> --
> Eric Botcazou

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Robert Dewar


On 9/13/2012 8:00 AM, Richard Guenther wrote:


Because doing so would create code generation differences -g vs. -g0.


Sometimes I wonder whether the insistence on -g not changing code
generation is warranted. In practice, gdb for me is so weak in handling
-O1 or -O2, that if I want to debug something I have to recompile
with -O0 -g, which causes quite a bit of code generation change :-)

[C++ Patch] PR 53210

2012-09-13 Thread Paolo Carlini


Hi,

Jon noticed that for this testcase we don't warn at all even with -Wall, 
whereas the code doesn't really make much sense. Turns out that the 
warning is currently controlled both by warn_init_self (not part of 
-Wall) and OPT_Wuninitialized. Thus Manuel proposes to simply remove the 
former, because this isn't the specific case of int x = x which we want 
to keep on "supporting" as a GNU extension. Also, as mentioned by Jon, 
the user can always leave 'i' out the mem-initializer-list.


Alternately, one may want to use OPT_Winit_self, but then we still have 
the issue that -Winit-self is not part of -Wall.


Anyway, I booted and tested the patchlet preferred by Manuel as-is.

Thanks,
Paolo.

///
/cp
2012-09-13  Manuel López-Ibáñez  

PR c++/53210
* init.c (perform_member_init): Remove warn_init_self check from
"initialized with itself" warning.

/testsuite
2012-09-13  Manuel López-Ibáñez  

PR c++/53210
* g++.dg/warn/Wuninitialized-self.C: New.
Index: testsuite/g++.dg/warn/Wuninitialized-self.C
===
--- testsuite/g++.dg/warn/Wuninitialized-self.C (revision 0)
+++ testsuite/g++.dg/warn/Wuninitialized-self.C (revision 0)
@@ -0,0 +1,8 @@
+// PR c++/53210
+// { dg-options "-Wuninitialized" }
+
+struct S
+{
+  S(int i) : j(j) { }  // { dg-warning "is initialized with itself" }
+  int j;
+};
Index: cp/init.c
===
--- cp/init.c   (revision 191245)
+++ cp/init.c   (working copy)
@@ -571,7 +571,7 @@ perform_member_init (tree member, tree init)
   if (decl == error_mark_node)
 return;
 
-  if (warn_init_self && init && TREE_CODE (init) == TREE_LIST
+  if (init && TREE_CODE (init) == TREE_LIST
   && TREE_CHAIN (init) == NULL_TREE)
 {
   tree val = TREE_VALUE (init);

Re: [SH] Add simple_return pattern

2012-09-13 Thread Kaz Kojima

Christian Bruel  wrote:
> The failure turned out to be issues with the profile count and handling
> or region partitioning. So, I prefer to handle those separately.
> For now, I disable shrink-wrap when partitioning, even if the problem
> seems to have disappeared with the more constrained heuristics. This is
> probably latent also on other targets BTW.
> 
> I added a sh_can_use_simple_return_p function that makes the heuristic
> refinements more convenient. For instance, I measured that shrink-wrap is
> generally not good when optimizing for size because we might introduce
> new return instructions or split blocks to avoid the epilogue, which is
> still in the code somewhere anyway.
> 
> Cycle-accurate benchmarks show a few very small improvements (here and
> there, about 2% at most; accordingly, the prologue is rarely in the critical
> path...) but no regression. Manual inspection of the CSiBE assembly shows
> that the transformations are decent.
> 
> Checked with all assertions this time. Candidate for trunk.

The patch is OK for trunk.  Thanks for looking into the problem.

Regards,
kaz

[patch] Fix memory exhaustion during cunrolli

2012-09-13 Thread Eric Botcazou

Hi,

the attached testcase triggers a memory exhaustion at -O2 during the cunrolli 
pass on the mainline and 4.7 branch.  The problem is that the size estimates 
disregard induction variable computations on the grounds that they will be 
folded later.  But they aren't folded between the iterations of the loop so 
they can add up and exhaust the memory during SSA updating if stars are 
properly aligned.
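
As a rough illustration of why propagating constants between unrolling iterations keeps the IR small, here is a toy model. It is not the GCC implementation: Operand, Stmt, cst, var and propagate_constants are hypothetical names, the "IR" only knows additions, and every name is assumed to be assigned once.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy operand: either an integer constant or a named value.
struct Operand {
  bool is_const;
  long value;        // meaningful when is_const is true
  std::string name;  // meaningful when is_const is false
};

// Toy statement: lhs = a + b.  Each lhs is assigned once (SSA-like).
struct Stmt {
  std::string lhs;
  Operand a;
  Operand b;
};

inline Operand cst(long v) { return Operand{true, v, ""}; }
inline Operand var(const std::string &n) { return Operand{false, 0, n}; }

// Replace a named operand by its value if that value is already known.
inline void substitute(Operand &op, const std::map<std::string, long> &known) {
  if (op.is_const) return;
  auto it = known.find(op.name);
  if (it != known.end()) op = cst(it->second);
}

// Walk the block once, propagate known constants into uses, fold statements
// whose operands are both constant, and return the statements that remain.
inline std::vector<Stmt> propagate_constants(const std::vector<Stmt> &block) {
  std::map<std::string, long> known;
  std::vector<Stmt> kept;
  for (Stmt s : block) {
    substitute(s.a, known);
    substitute(s.b, known);
    if (s.a.is_const && s.b.is_const)
      known[s.lhs] = s.a.value + s.b.value;  // folded: the definition goes away
    else
      kept.push_back(s);
  }
  return kept;
}

// Two unrolled iterations of an induction variable and an address computation.
inline void check_unrolled_body() {
  std::vector<Stmt> block = {
      {"i1", cst(0), cst(1)},
      {"a1", var("base"), var("i1")},
      {"i2", var("i1"), cst(1)},
      {"a2", var("base"), var("i2")},
  };
  std::vector<Stmt> kept = propagate_constants(block);
  assert(kept.size() == 2);
  assert(kept[1].lhs == "a2" && kept[1].b.is_const && kept[1].b.value == 2);
}
```

In this toy, the induction variable updates vanish and only the two address computations survive; without the propagation step all four statements would be carried into the next unrolling round.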

The patch is a somewhat simple-minded fix...  Bootstrapped/regtested on
x86_64-suse-linux.  OK for mainline and 4.7 branch?


2012-09-13  Eric Botcazou  

* tree-ssa-loop-ivcanon.c (propagate_constants_for_unrolling): New.
(tree_unroll_loops_completely): Starting from the second iteration,
propagate constants within the innermost loops.


2012-09-13  Eric Botcazou  

* gnat.dg/loop_optimization12.ad[sb]: New test.


-- 
Eric Botcazou
Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 191198)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -503,6 +503,48 @@ canonicalize_induction_variables (void)
   return 0;
 }
 
+/* Propagate constant SSA_NAMEs defined in basic block BB.  */
+
+static void
+propagate_constants_for_unrolling (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); )
+{
+  gimple stmt = gsi_stmt (gsi);
+  tree lhs;
+
+  /* Look for assignments to SSA names with constant RHS.  */
+  if (is_gimple_assign (stmt)
+	  && (lhs = gimple_assign_lhs (stmt), TREE_CODE (lhs) == SSA_NAME)
+	  && gimple_assign_rhs_code (stmt) == INTEGER_CST)
+	{
+	  imm_use_iterator iter;
+	  gimple use_stmt;
+
+	  /* Propagate the RHS into all the uses of the SSA name.  */
+	  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
+	{
+	  gimple_stmt_iterator use_stmt_gsi = gsi_for_stmt (use_stmt);
+	  use_operand_p use;
+
+	  FOR_EACH_IMM_USE_ON_STMT (use, iter)
+	propagate_value (use, gimple_assign_rhs1 (stmt));
+
+	  fold_stmt_inplace (&use_stmt_gsi);
+	  update_stmt (use_stmt);
+	}
+
+	  /* And get rid of the now unused SSA name.  */
+	  gsi_remove (&gsi, true);
+	  release_ssa_name (lhs);
+	}
+  else
+	gsi_next (&gsi);
+}
+}
+
 /* Unroll LOOPS completely if they iterate just few times.  Unless
MAY_INCREASE_SIZE is true, perform the unrolling only if the
size of the code does not increase.  */
@@ -522,6 +564,19 @@ tree_unroll_loops_completely (bool may_i
 
   FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
 	{
+	  /* If we have already unrolled, we need to propagate constants
+	 within the new basic blocks to fold away induction variable
+	 computations; otherwise, the size might blow up before the
+	 iteration is complete and the IR eventually cleaned up.  */
+	  if (iteration > 0)
+	{
+	  unsigned i;
+	  basic_block *body = get_loop_body_in_dom_order (loop);
+	  for (i = 0; i < loop->num_nodes; i++)
+		propagate_constants_for_unrolling (body[i]);
+	  free (body);
+	}
+
 	  if (may_increase_size && optimize_loop_for_speed_p (loop)
 	  /* Unroll outermost loops only if asked to do so or they do
 		 not cause code growth.  */
@@ -530,6 +585,7 @@ tree_unroll_loops_completely (bool may_i
 	ul = UL_ALL;
 	  else
 	ul = UL_NO_GROWTH;
+
 	  changed |= canonicalize_loop_induction_variables
 		   (loop, false, ul, !flag_tree_loop_ivcanon);
 	}
-- { dg-do compile }
-- { dg-options "-O2" }

package body Loop_Optimization12 is

  procedure Reset (S : Rec_Ptr) is
  begin
    for I in Enum1 loop
      S.F (I).all := (others =>
                       (others =>
                         (others =>
                           (others =>
                             (others =>
                               (others =>
                                 (others =>
                                   (others =>
                                     (others =>
                                       (others => 0))))))))));
    end loop;
  end;

end Loop_Optimization12;
package Loop_Optimization12 is

  type Enum1 is (A, B, C, D, E, F, G, H, I, J);

  type Enum2 is (A, B, C);

  type Enum3 is (A, B, C, D, E, F);

  type Enum4 is (A, B, C, D);

  type Enum5 is (A, B, C, D, E);

  type Arr is array (Enum3, Enum4, Enum4, Enum5, Enum5, Enum3,
                     Enum2, Enum3, Enum5, Enum3) of Natural;

  type Arr_Ptr is access Arr;
  type Ext_Arr is array (Enum1) of Arr_Ptr;

  type Rec is record
    F : Ext_Arr;
  end record;

  type Rec_Ptr is access Rec;

  procedure Reset (S : Rec_Ptr);

end Loop_Optimization12;

Re: [patch] IPA cleanups and assorted cleanups

2012-09-13 Thread Steven Bosscher

> +/* Compute X &= Y, taking into account the possibility that
> +   X may become the maximum set.  */
>
> Hmm, how can X become the maximum set if it was not the maximum set
> before?  Thus, shouldn't this simply be
>
>   if (y == all_module_statics)
>/* do nothing */;
>  else
> ...
>
> ?

No. The local sets contain all initially considered static vars, but
this set is later pruned to the "really" static vars. See the code
below the comment "/* Now we know what vars are really statics; prune
out those that aren't.  */". It may happen (and apparently it
frequently does happen, going on my memory numbers) that after
pruning, a local set becomes equivalent to the maximum set.
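
A small sketch of the point, under stated assumptions: VarSet and intersect are hypothetical names, a flag stands in for "this set is the shared maximum set", and `all` is the maximum set as it looks after pruning. The final check is what lets a private set that has become equal to the maximum set collapse back onto it.

```cpp
#include <cassert>
#include <set>

struct VarSet {
  bool is_all;         // true: stands for the shared maximum set
  std::set<int> vars;  // explicit members, used only when is_all is false
};

// Compute x &= y, where `all` is the current (already pruned) maximum set.
inline void intersect(VarSet &x, const VarSet &y, const std::set<int> &all) {
  if (x.is_all) {
    x = y;  // max & y == y
  } else if (!y.is_all) {
    std::set<int> common;
    for (int v : x.vars)
      if (y.vars.count(v)) common.insert(v);
    x.vars.swap(common);
  }
  // x may hold exactly the members of the maximum set without being flagged
  // as such, e.g. because the maximum set shrank when it was pruned.
  if (!x.is_all && x.vars == all) {
    x.is_all = true;
    x.vars.clear();
  }
}

inline void check_intersect() {
  std::set<int> all = {1, 2};
  VarSet x{false, {1, 2}};  // equal to the maximum set, but stored privately
  VarSet y{true, {}};
  intersect(x, y, all);
  assert(x.is_all && x.vars.empty());
}
```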

> Otherwise ok

Also OK with the above not changed?


> (the patch could have been split though).

I know. I'll try to behave next time :-)

Ciao!
Steven

Re: [PATCH] Expand pow(x,n) into multiplies in cse_sincos pass (PR46728, patch 2)

2012-09-13 Thread H.J. Lu

On Tue, May 24, 2011 at 1:35 PM, William J. Schmidt
 wrote:
> Here's a small patch to expand pow(x,n) for integer n using the
> powi(x,n) logic in the cse_sincos pass.  OK for trunk?
>
> For the next patch, I'll plan on expanding pow(x,n) for n in
> {0.5, 0.25, 0.75, 1./3., 1./6.}.  This logic will be added to
> gimple_expand_builtin_pow.
>
> Bill
>
>
> 2011-05-24  Bill Schmidt  
> PR tree-optimization/46728
> * tree-ssa-math-opts.c (gimple_expand_builtin_pow): New.
> (execute_cse_sincos): Add switch case for BUILT_IN_POW.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54563
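
For readers not familiar with the quoted transformation, the general idea of turning pow(x,n) with integer n into multiplies can be sketched with plain binary exponentiation. This is only an illustration: powi_by_squaring is a hypothetical name, and the multiplication sequence GCC actually emits may well differ.

```cpp
#include <cassert>

// Compute x**n for integer n using only multiplies (plus one division for
// negative n), by repeated squaring.
inline double powi_by_squaring(double x, int n) {
  long long m = n;                              // widen so -m cannot overflow
  unsigned long long e = m < 0 ? -m : m;
  double result = 1.0;
  double base = x;
  while (e != 0) {
    if (e & 1) result *= base;                  // this bit of n contributes
    e >>= 1;
    if (e != 0) base *= base;                   // square for the next bit
  }
  return m < 0 ? 1.0 / result : result;
}

inline void check_powi() {
  assert(powi_by_squaring(2.0, 10) == 1024.0);
  assert(powi_by_squaring(2.0, -2) == 0.25);
  assert(powi_by_squaring(3.0, 0) == 1.0);
}
```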

-- 
H.J.

[Patch, Fortran] PR54556 - fix (4.6/4.7/) 4.8 regression: wrong code with implicit_pure procedures

2012-09-13 Thread Tobias Burnus

gfortran wrongly marks some procedures as implicit_pure which aren't 
pure. implicit_pure exists since 2011-01-08 (= GCC 4.6), but was only 
used internally (FE optimization and trans*.c to avoid temporaries). 
Since 2012-08-28, implicit_pure also implies DECL_PURE_P.  The latter
change exposes a bug and miscompiles CP2K.


The reason for the bug is that gfc_impure_variable() checks at some
point whether it is invoked in a PURE procedure. For implicit_pure
procedures, the answer is no - thus that check never triggered. I have
now removed the check - the callee already takes care of that. (Which is
also implied by the function name.)


I additionally allow VALUE for implicit_pure. That's in line with PURE
where VALUE is allowed since Fortran 2008. (I think since F2008's first
technical corrigendum.)


Built and regtested on x86-64-gnu-linux.
OK for the trunk and for the 4.6/4.7 branches?

Tobias,
who hopes that no additional implicit_pure bugs exist.
2012-09-13  Tobias Burnus  

	PR fortran/54556
	* resolve.c (resolve_formal_arglist): Allow VALUE arguments
	with implicit_pure.
	(gfc_impure_variable): Don't check gfc_pure such that the
	function also works for gfc_implicit_pure procedures.

2012-09-13  Tobias Burnus  

	PR fortran/54556
	* gfortran.dg/implicit_pure_3.f90: New.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 28eea5d..0748b6a 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -426,10 +426,12 @@ resolve_formal_arglist (gfc_symbol *proc)
 	}
 	  else if (!sym->attr.pointer)
 	{
-	  if (proc->attr.function && sym->attr.intent != INTENT_IN)
+	  if (proc->attr.function && sym->attr.intent != INTENT_IN
+		  && !sym->value)
 		proc->attr.implicit_pure = 0;
 
-	  if (proc->attr.subroutine && sym->attr.intent == INTENT_UNKNOWN)
+	  if (proc->attr.subroutine && sym->attr.intent == INTENT_UNKNOWN
+		  && !sym->value)
 		proc->attr.implicit_pure = 0;
 	}
 	}
@@ -13565,10 +13567,9 @@ gfc_impure_variable (gfc_symbol *sym)
 }
 
   proc = sym->ns->proc_name;
-  if (sym->attr.dummy && gfc_pure (proc)
-	&& ((proc->attr.subroutine && sym->attr.intent == INTENT_IN)
-		||
-	 proc->attr.function))
+  if (sym->attr.dummy
+  && ((proc->attr.subroutine && sym->attr.intent == INTENT_IN)
+	  || proc->attr.function))
 return 1;
 
   /* TODO: Sort out what can be storage associated, if anything, and include
--- /dev/null	2012-09-13 07:07:28.691771313 +0200
+++ gcc/gcc/testsuite/gfortran.dg/implicit_pure_3.f90	2012-09-13 14:17:18.0 +0200
@@ -0,0 +1,109 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+!
+! PR fortran/54556
+!
+! Contributed by Joost VandeVondele
+!
+MODULE parallel_rng_types
+
+  IMPLICIT NONE
+
+  ! Global parameters in this module
+  INTEGER, PARAMETER :: dp=8
+
+  TYPE rng_stream_type
+PRIVATE
+CHARACTER(LEN=40) :: name
+INTEGER   :: distribution_type
+REAL(KIND=dp), DIMENSION(3,2) :: bg,cg,ig
+LOGICAL   :: antithetic,extended_precision
+REAL(KIND=dp) :: buffer
+LOGICAL   :: buffer_filled
+  END TYPE rng_stream_type
+
+  REAL(KIND=dp), DIMENSION(3,3) :: a1p0,a1p76,a1p127,&
+   a2p0,a2p76,a2p127,&
+   inv_a1,inv_a2
+
+  INTEGER, PARAMETER  :: GAUSSIAN = 1,&
+ UNIFORM  = 2
+
+  REAL(KIND=dp), PARAMETER :: norm  = 2.328306549295727688e-10_dp,&
+  m1= 4294967087.0_dp,&
+  m2= 4294944443.0_dp,&
+  a12   = 1403580.0_dp,&
+  a13n  = 810728.0_dp,&
+  a21   = 527612.0_dp,&
+  a23n  = 1370589.0_dp,&
+  two17 = 131072.0_dp,&! 2**17
+  two53 = 9007199254740992.0_dp,&  ! 2**53
+  fact  = 5.9604644775390625e-8_dp ! 1/2**24
+
+
+CONTAINS
+
+  FUNCTION rn32(rng_stream) RESULT(u)
+
+TYPE(rng_stream_type), POINTER   :: rng_stream
+REAL(KIND=dp):: u
+
+INTEGER  :: k
+REAL(KIND=dp):: p1, p2
+
+! -
+! Component 1
+
+p1 = a12*rng_stream%cg(2,1) - a13n*rng_stream%cg(1,1)
+k = INT(p1/m1)
+p1 = p1 - k*m1
+IF (p1 < 0.0_dp) p1 = p1 + m1
+rng_stream%cg(1,1) = rng_stream%cg(2,1)
+rng_stream%cg(2,1) = rng_stream%cg(3,1)
+rng_stream%cg(3,1) = p1
+
+! Component 2
+
+p2 = a21*rng_stream%cg(3,2) - a23n*rng_stream%cg(1,2)
+k = INT(p2/m2)
+p2 = p2 - k*m2
+IF (p2 < 0.0_dp) p2 = p2 + m2
+rng_stream%cg(1,2) = rng_stream%cg(2,2)
+rng_stream%cg(2,2) = rng_stream%cg(3,2)
+rng_stream%cg(3,2) = p2
+
+! Comb

Re: [patch] IPA cleanups and assorted cleanups

2012-09-13 Thread Jan Hubicka

> Hello,
> 
> This patch cleans up some things that annoyed me in ipa-reference.c
> and ipa-pure-const.c. These two passes are very important but they
> show all the signs of being developed when GCC's IPA infrastructure
> was still (or at least even more than today) in its infancy: Walking

Yeah, those really developed from really weird initial implementations and were
getting updated for a long while.  Cleaning this up has been on my TODO list for
a while, but it never got high enough priority.  Thanks for looking into it.

> 2. The management of sets is clarified using functions with intuitive
> names that also help reduce the cost of the transitive closure by
> detecting when a set union(X,Y) is the maximum set (reducing the
> memory footprint from 3 GB to just ~120MB for a large C++ test case)

nice... I thought it was already done :))

Index: ipa-utils.c
===================================================================
--- ipa-utils.c (revision 191214)
+++ ipa-utils.c (working copy)
@@ -154,8 +154,11 @@ searchc (struct searchc_env* env, struct cgraph_no
 
 /* Topsort the call graph by caller relation.  Put the result in ORDER.
 
-   The REDUCE flag is true if you want the cycles reduced to single nodes.  Set
-   ALLOW_OVERWRITABLE if nodes with such availability should be included.
+   The REDUCE flag is true if you want the cycles reduced to single nodes.
+   You can use ipa_get_nodes_in_cycle to obtain a vector containing all real
+   call graph nodes in a reduced node.
+
+   Set ALLOW_OVERWRITABLE if nodes with such availability should be included.
IGNORE_EDGE, if non-NULL is a hook that may make some edges insignificant
for the topological sort.   */
 
@@ -231,6 +234,23 @@ ipa_free_postorder_info (void)
 }
 }
 
+/* Get the set of nodes for the cycle in the reduced call graph starting
+   from NODE.  */
+
+VEC (cgraph_node_p, heap) *
+ipa_get_nodes_in_cycle (struct cgraph_node *node)

I never really liked the API of the SCC search, which made the user walk across
the AUX pointer.

However, I also do not like allocating a temporary vector, and requiring the
user to remember to free it, just to add an abstraction over a singly linked
list walk.  What about adding a convenient iterator API, especially now that we
have the C++ wonderland?
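
Something along these lines, as a rough sketch only: Node, next_in_cycle, CycleIterator and CycleRange are made-up names standing in for the call graph node and the link currently reached through the AUX pointer. Nothing is allocated, so there is nothing for the user to free.

```cpp
#include <cassert>

struct Node {
  int id;
  Node *next_in_cycle;  // hypothetical stand-in for the link behind AUX
};

// Forward iterator over the nodes of one reduced cycle.
class CycleIterator {
 public:
  explicit CycleIterator(Node *n) : node_(n) {}
  Node &operator*() const { return *node_; }
  CycleIterator &operator++() {
    node_ = node_->next_in_cycle;
    return *this;
  }
  bool operator!=(const CycleIterator &other) const {
    return node_ != other.node_;
  }

 private:
  Node *node_;
};

// Range wrapper so that a range-based for loop can walk the list in place.
struct CycleRange {
  Node *head;
  CycleIterator begin() const { return CycleIterator(head); }
  CycleIterator end() const { return CycleIterator(nullptr); }
};

inline int sum_ids(Node *head) {
  int sum = 0;
  for (Node &n : CycleRange{head}) sum += n.id;
  return sum;
}

inline void check_cycle_walk() {
  Node c{3, nullptr};
  Node b{2, &c};
  Node a{1, &b};
  assert(sum_ids(&a) == 6);
  assert(sum_ids(nullptr) == 0);
}
```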

Thanks,
Honza

[PATCH] Fix ADDR_EXPR handling in SCCVN and PRE

2012-09-13 Thread Richard Guenther


This unifies the two code paths that try to figure out which
VN handling routines are responsible for value-numbering.  It
also fixes ADDR_EXPR handling so that we handle those properly.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2012-09-13  Richard Guenther  

* tree-ssa-sccvn.h (enum vn_kind): New.
(vn_get_stmt_kind): Likewise.
* tree-ssa-sccvn.c (vn_get_stmt_kind): New function, adjust
ADDR_EXPR handling.
(visit_use): Use it.
* tree-ssa-pre.c (compute_avail): Likewise, simplify further.

* gcc.dg/tree-ssa/ssa-fre-37.c: New testcase.

Index: gcc/tree-ssa-sccvn.h
===================================================================
--- gcc/tree-ssa-sccvn.h(revision 191247)
+++ gcc/tree-ssa-sccvn.h(working copy)
@@ -121,6 +121,9 @@ typedef struct vn_constant_s
   tree constant;
 } *vn_constant_t;
 
+enum vn_kind { VN_NONE, VN_CONSTANT, VN_NARY, VN_REFERENCE, VN_PHI };
+enum vn_kind vn_get_stmt_kind (gimple);
+
 /* Hash the constant CONSTANT with distinguishing type incompatible
constants in the types_compatible_p sense.  */
 
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c(revision 191247)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -287,6 +287,63 @@ vn_get_expr_for (tree name)
   return expr;
 }
 
+/* Return the vn_kind the expression computed by the stmt should be
+   associated with.  */
+
+enum vn_kind
+vn_get_stmt_kind (gimple stmt)
+{
+  switch (gimple_code (stmt))
+{
+case GIMPLE_CALL:
+  return VN_REFERENCE;
+case GIMPLE_PHI:
+  return VN_PHI;
+case GIMPLE_ASSIGN:
+  {
+   enum tree_code code = gimple_assign_rhs_code (stmt);
+   tree rhs1 = gimple_assign_rhs1 (stmt);
+   switch (get_gimple_rhs_class (code))
+ {
+ case GIMPLE_UNARY_RHS:
+ case GIMPLE_BINARY_RHS:
+ case GIMPLE_TERNARY_RHS:
+   return VN_NARY;
+ case GIMPLE_SINGLE_RHS:
+   switch (TREE_CODE_CLASS (code))
+ {
+ case tcc_reference:
+   /* VOP-less references can go through unary case.  */
+   if ((code == REALPART_EXPR
+|| code == IMAGPART_EXPR
+|| code == VIEW_CONVERT_EXPR
+|| code == BIT_FIELD_REF)
+   && TREE_CODE (TREE_OPERAND (rhs1, 0)) == SSA_NAME)
+ return VN_NARY;
+
+   /* Fallthrough.  */
+ case tcc_declaration:
+   return VN_REFERENCE;
+
+ case tcc_constant:
+   return VN_CONSTANT;
+
+ default:
+   if (code == ADDR_EXPR)
+ return (is_gimple_min_invariant (rhs1)
+ ? VN_CONSTANT : VN_REFERENCE);
+   else if (code == CONSTRUCTOR)
+ return VN_NARY;
+   return VN_NONE;
+ }
+ default:
+   return VN_NONE;
+ }
+  }
+default:
+  return VN_NONE;
+}
+}
 
 /* Free a phi operation structure VP.  */
 
@@ -3364,44 +3421,13 @@ visit_use (tree use)
}
  else
{
- switch (get_gimple_rhs_class (code))
+ switch (vn_get_stmt_kind (stmt))
{
-   case GIMPLE_UNARY_RHS:
-   case GIMPLE_BINARY_RHS:
-   case GIMPLE_TERNARY_RHS:
+   case VN_NARY:
  changed = visit_nary_op (lhs, stmt);
  break;
-   case GIMPLE_SINGLE_RHS:
- switch (TREE_CODE_CLASS (code))
-   {
-   case tcc_reference:
- /* VOP-less references can go through unary case.  */
- if ((code == REALPART_EXPR
-  || code == IMAGPART_EXPR
-  || code == VIEW_CONVERT_EXPR
-  || code == BIT_FIELD_REF)
- && TREE_CODE (TREE_OPERAND (rhs1, 0)) == SSA_NAME)
-   {
- changed = visit_nary_op (lhs, stmt);
- break;
-   }
- /* Fallthrough.  */
-   case tcc_declaration:
- changed = visit_reference_op_load (lhs, rhs1, stmt);
- break;
-   default:
- if (code == ADDR_EXPR)
-   {
- changed = visit_nary_op (lhs, stmt);
- break;
-   }
- else if (code == CONSTRUCTOR)
-   {
- changed = visit_nary_op (lhs, stmt);
- break;

[AArch64, AArch64-4.7] Fix target ordering in config.gcc.

2012-09-13 Thread Sofiane Naci

Hi,

I've just committed the attached patch on the branches

   ARM/aarch64-branch
   ARM/aarch64-4.7-branch

to fix the target ordering in supported_defaults in config.gcc.

Thank you
Sofiane

[AArch64] Merge from upstream trunk r191124

2012-09-13 Thread Sofiane Naci

Hi,

I have just merged upstream trunk on the aarch64-branch up to r191124.
As a result, I have also updated the AArch64 backend with the attached
patch.

Thanks
Sofiane


aarch64-191124-rebase.patch
Description: Binary data

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Richard Guenther

On Wed, Sep 12, 2012 at 10:44 PM, Dehao Chen  wrote:
> Attached is the memory consumption report for a very large source
> file. Looks like this patch actually reduced the memory consumption by
> 2%.

Please make sure to test large C++ expression template users.  Large
regular programs do not stress this part.

Richard.

> Dehao
>
> On Thu, Sep 13, 2012 at 1:18 AM, Xinliang David Li  wrote:
>> On Wed, Sep 12, 2012 at 10:05 AM, Dehao Chen  wrote:
>>> There are two parts that needs memory management:
>>>
>>> 1. The BLOCK structure. This is managed by GC. I originally thought
>>> that removing blocks from tree.gsbase would paralyze GC. This turned
>>> out not to be a concern because DECL_INITIAL will still mark those
>>> used tree nodes. This patch may decrease the memory consumption by
>>> removing blocks from tree/gimple. However, as it makes more blocks
>>> become used, they also increase the memory consumption.
>>
>> You mean when you also make the location table GC root.
>>
>> Can you share the mem-stats information for the large program with and
>> without your patch?
>>
>> thanks,
>>
>> David
>>
>>> 2. The data structure in libcpp that maintains the hashtable for the
>>> location->block mapping. This is relatively minor because for the
>>> largest source I've seen, it only maintains less than 100K entries in
>>> the array (less than 1M total memory consumption). However, as it is a
>>> global data structure, it may make LTO unhappy. Honza is helping
>>> testing the memory consumption on LTO (but we first need to make this
>>> patch work for LTO). If the LTO result turns out ok, we probably don't
>>> want to put these under GC because: 1. it'll make things much more
>>> complicated. 2. using self managed memory is more efficient (as this
>>> is frequently used in many passes). 3. not using GC actually saves
>>> memory because even though the block is in the map, it can still be
>>> GCed as soon as it's not reachable from DECL_INITIAL.
>>>
>>> I've tested this on some very large C++ files (each one takes more
>>> than 10s to build), the memory consumption does not see noticeable
>>> increase/decrease.
>>>
>>> Thanks,
>>> Dehao
>>>
>>> On Wed, Sep 12, 2012 at 9:39 AM, Xinliang David Li  
>>> wrote:
>>>> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther
>>>>  wrote:
>>>>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen  wrote:
>>>>>> Now I think we are facing a more complex problem. The data structure
>>>>>> we use to store the location_adhoc_data are file-static in linemap.c
>>>>>> in libcpp. These data structures are not guarded by GTY(()).
>>>>>> Meanwhile, as we have removed the block data structure from
>>>>>> gimple.gsbase as well as tree.exp (encoding them into an location_t).
>>>>>> This could cause block being GCed and the LOCATION_BLOCK becoming
>>>>>> dangling pointers.
>>>>>
>>>>> Uh.  Note that it is quite important that we are able to garbage-collect
>>>>> unused
>>>>> BLOCKs, this is the whole point of removing unused BLOCK scopes in
>>>>> remove_unused_locals.  So this indeed becomes much more complicated ...
>>>>> What would be desired is that the garbage collector can NULL an entry in
>>>>> the mapping table when it is not referenced in any other way (that other
>>>>> reference would be the BLOCK tree as stored in a FUNCTION_DECLs
>>>>> DECL_INITIAL).
>>>>
>>>> It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS
>>>> are created for a large C++ program. This patch saves memory by
>>>> shrinking tree size, is it a net win or loss without GC those BLOCKS?
>>>>
>>>> thanks,
>>>>
>>>> David
>>>>
>>>>
>>>>>
>>>>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from
>>>>>> gimple.gsbase.location. However, neither nested_ptr nor mark_hook can
>>>>>> help me.
>>>>>>
>>>>>> Another approach would be guard the location_adhoc_data and related
>>>>>> data structures in GTY(()). However, this is non-trivial because tree
>>>>>> is not visible in libcpp. At the same time, my implementation heavily
>>>>>> relies on hashtable to make the code efficient, thus it's quite tricky
>>>>>> to make "param_is" and "use_params" work.
>>>>>>
>>>>>> The final approach, which I'll try tomorrow, would be move all my
>>>>>> implementation from libcpp to gcc, and guard them with GTY(()). I
>>>>>> still haven't thought of any potential problem of this approach. Any
>>>>>> comments?
>>>>>
>>>>> I think moving the mapping to GC in a lazy manner as I described above
>>>>> would be the way to go.  For hashtables GC already supports if_marked,
>>>>> not sure if similar support is available for arrays/vecs.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Thanks,
>>>>>> Dehao
>>>>>>
>>>>>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen  wrote:
>>>>>>> I saw comments in tree-streamer-out.c:
>>>>>>>
>>>>>>>   /* Do not stream BLOCK_SOURCE_LOCATION.  We cannot handle debug
>>>>>>> information
>>>>>>>  for early inlining so drop it on the floor instead of ICEing in
>>>>>>>  dwarf2out.c.  */
>>>>>>>   stream

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Richard Guenther

On Wed, Sep 12, 2012 at 6:39 PM, Xinliang David Li  wrote:
> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther
>  wrote:
>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen  wrote:
>>> Now I think we are facing a more complex problem. The data structure
>>> we use to store the location_adhoc_data are file-static in linemap.c
>>> in libcpp. These data structures are not guarded by GTY(()).
>>> Meanwhile, as we have removed the block data structure from
>>> gimple.gsbase as well as tree.exp (encoding them into an location_t).
>>> This could cause block being GCed and the LOCATION_BLOCK becoming
>>> dangling pointers.
>>
>> Uh.  Note that it is quite important that we are able to garbage-collect 
>> unused
>> BLOCKs, this is the whole point of removing unused BLOCK scopes in
>> remove_unused_locals.  So this indeed becomes much more complicated ...
>> What would be desired is that the garbage collector can NULL an entry in
>> the mapping table when it is not referenced in any other way (that other
>> reference would be the BLOCK tree as stored in a FUNCTION_DECLs 
>> DECL_INITIAL).
>
> It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS
> are created for a large C++ program. This patch saves memory by
> shrinking tree size, is it a net win or loss without GC those BLOCKS?

Memory usage issues pop up with C++ code using expression templates
(try BOOST MPL or tramp3d or some larger spirit testcases).  Inlining
creates tons of "empty" BLOCK trees that just wrap others.  It is important
to be able to GC those.  Now, it might be that no expression / location
which references the BLOCK survives, and if the line-table is not scanned
by GC then we will just end up with never re-usable entries (the BLOCK address
may get re-used - can we get false sharing here?)
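
One way to picture a mapping table whose entries go away together with the BLOCK, rather than lingering as stale slots, is the following analogy. It is only a sketch under loud assumptions: GCC's collector is mark-and-sweep, not reference counted, and Block and LocationTable are made-up names; std::weak_ptr merely plays the role that something like if_marked would play for a GC-aware table.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Block {
  int id;
};

// Table mapping an index (as it might be encoded in a location) to a block,
// without keeping the block alive on its own.
class LocationTable {
 public:
  std::size_t add(const std::shared_ptr<Block> &b) {
    entries_.push_back(b);  // stored as a weak reference
    return entries_.size() - 1;
  }

  // Returns the block, or a null pointer once the block has been released.
  std::shared_ptr<Block> lookup(std::size_t idx) const {
    return entries_.at(idx).lock();
  }

 private:
  std::vector<std::weak_ptr<Block>> entries_;
};

inline void check_table() {
  LocationTable table;
  std::shared_ptr<Block> b = std::make_shared<Block>(Block{7});
  std::size_t idx = table.add(b);
  assert(table.lookup(idx)->id == 7);
  b.reset();  // last real owner goes away, like an unreachable BLOCK
  assert(table.lookup(idx) == nullptr);
}
```

Because a lookup of a dead entry yields null instead of a dangling pointer, a later block that happens to land at the same address cannot be confused with the old one in this model.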

Richard.

> thanks,
>
> David
>
>
>>
>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from
>>> gimple.gsbase.location. However, neither nested_ptr nor mark_hook can
>>> help me.
>>>
>>> Another approach would be guard the location_adhoc_data and related
>>> data structures in GTY(()). However, this is non-trivial because tree
>>> is not visible in libcpp. At the same time, my implementation heavily
>>> relies on hashtable to make the code efficient, thus it's quite tricky
>>> to make "param_is" and "use_params" work.
>>>
>>> The final approach, which I'll try tomorrow, would be move all my
>>> implementation from libcpp to gcc, and guard them with GTY(()). I
>>> still haven't thought of any potential problem of this approach. Any
>>> comments?
>>
>> I think moving the mapping to GC in a lazy manner as I described above
>> would be the way to go.  For hashtables GC already supports if_marked,
>> not sure if similar support is available for arrays/vecs.
>>
>> Richard.
>>
>>> Thanks,
>>> Dehao
>>>
>>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen  wrote:
>>>> I saw comments in tree-streamer-out.c:
>>>>
>>>>   /* Do not stream BLOCK_SOURCE_LOCATION.  We cannot handle debug
>>>> information
>>>>  for early inlining so drop it on the floor instead of ICEing in
>>>>  dwarf2out.c.  */
>>>>   streamer_write_chain (ob, BLOCK_VARS (expr), ref_p);
>>>>
>>>> However, what the code is doing seemed contradictory with the comment.
>>>> Or am I missing something?
>>>>
>>>>
>>>>
>>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz  wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, 11 Sep 2012, Dehao Chen wrote:
>>>>>
>>>>>> Looks like we have two choices:
>>>>>>
>>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t)
>>>>>
>>>>> This will actually not work correctly in some cases.  The problem is, if
>>>>> the prevailing decl is already part of another chain (say in another
>>>>> block_var list) you would break the current chain.  Hence block vars need
>>>>> special handling in the lto streamer (another reason why tree_chain is not
>>>>> the most clever thing to use for this chain).  This problem area needs to
>>>>> be solved somehow if block info is to be preserved correctly.
>>>>>
>>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL
>>>>>> (TREE_CHAIN (t)).
>>>>>
>>>>> That's also a large hammer as it basically will mean no debug info after
>>>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve
>>>>> quite some work, perhaps your hack is good enough for the time being,
>>>>> though I hate it :)
>>>>
>>>> I got it. Then I'll keep the patch as it is (remove the
>>>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and
>>>> then we should be good to check in?
>>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>>>
>>>>>
>>>>> Ciao,
>>>>> Michael.

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Richard Guenther

On Wed, Sep 12, 2012 at 7:20 PM, Dehao Chen  wrote:
> There is another bug in the patch (not covered by unittests,
> discovered through spec benchmarks).
>
> When we remove unused locals, we do not mark the block as used for
> debug stmt, but gimple-streamer-out will still stream out blocks for
> debug stmt. There can be 2 fixes:

Because doing so would create code generation differences -g vs. -g0.

> 1.
> --- a/gcc/gimple-streamer-out.c
> +++ b/gcc/gimple-streamer-out.c
> @@ -77,7 +77,8 @@ output_gimple_stmt (struct output_block *ob, gimple stmt)
>lto_output_location (ob, LOCATION_LOCUS (gimple_location (stmt)));
>
>/* Emit the lexical block holding STMT.  */
> -  stream_write_tree (ob, gimple_block (stmt), true);
> +  if (!is_gimple_debug (stmt))
> +stream_write_tree (ob, gimple_block (stmt), true);
>
>/* Emit the operands.  */
>switch (gimple_code (stmt))
>
> 2.
> --- a/gcc/tree-ssa-live.c
> +++ b/gcc/tree-ssa-live.c
> @@ -726,9 +726,6 @@ remove_unused_locals (void)
>   gimple stmt = gsi_stmt (gsi);
>   tree b = gimple_block (stmt);
>
> - if (is_gimple_debug (stmt))
> -   continue;
> -
>   if (gimple_clobber_p (stmt))
> {
>   have_local_clobbers = true;
>
> Either fix could work. Any suggests which one should we go?

The 2nd one will not work and is not acceptable.  The 1st one - well ...
what happens on trunk right now?  The debug stmt points to a
BLOCK that is possibly removed from the BLOCK tree?  In this case
I think the fix is 3. make sure remove_unused_scope_block_p will
clear BLOCKs from all stmts / expressions that have been removed.

Richard.

> Thanks,
> Dehao
>
> On Wed, Sep 12, 2012 at 10:05 AM, Dehao Chen  wrote:
>> There are two parts that needs memory management:
>>
>> 1. The BLOCK structure. This is managed by GC. I originally thought
>> that removing blocks from tree.gsbase would paralyze GC. This turned
>> out not to be a concern because DECL_INITIAL will still mark those
>> used tree nodes. This patch may decrease the memory consumption by
>> removing blocks from tree/gimple. However, as it makes more blocks
>> become used, they also increase the memory consumption.
>> 2. The data structure in libcpp that maintains the hashtable for the
>> location->block mapping. This is relatively minor because for the
>> largest source I've seen, it only maintains less than 100K entries in
>> the array (less than 1M total memory consumption). However, as it is a
>> global data structure, it may make LTO unhappy. Honza is helping
>> testing the memory consumption on LTO (but we first need to make this
>> patch work for LTO). If the LTO result turns out ok, we probably don't
>> want to put these under GC because: 1. it'll make things much more
>> complicated. 2. using self managed memory is more efficient (as this
>> is frequently used in many passes). 3. not using GC actually saves
>> memory because even though the block is in the map, it can still be
>> GCed as soon as it's not reachable from DECL_INITIAL.
>>
>> I've tested this on some very large C++ files (each one takes more
>> than 10s to build), the memory consumption does not see noticeable
>> increase/decrease.
>>
>> Thanks,
>> Dehao
>>
>> On Wed, Sep 12, 2012 at 9:39 AM, Xinliang David Li  
>> wrote:
>>> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther
>>>  wrote:
>>>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen  wrote:
>>>>> Now I think we are facing a more complex problem. The data structure
>>>>> we use to store the location_adhoc_data are file-static in linemap.c
>>>>> in libcpp. These data structures are not guarded by GTY(()).
>>>>> Meanwhile, as we have removed the block data structure from
>>>>> gimple.gsbase as well as tree.exp (encoding them into an location_t).
>>>>> This could cause block being GCed and the LOCATION_BLOCK becoming
>>>>> dangling pointers.
>>>>
>>>> Uh.  Note that it is quite important that we are able to garbage-collect
>>>> unused
>>>> BLOCKs, this is the whole point of removing unused BLOCK scopes in
>>>> remove_unused_locals.  So this indeed becomes much more complicated ...
>>>> What would be desired is that the garbage collector can NULL an entry in
>>>> the mapping table when it is not referenced in any other way (that other
>>>> reference would be the BLOCK tree as stored in a FUNCTION_DECLs
>>>> DECL_INITIAL).
>>>
>>> It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS
>>> are created for a large C++ program. This patch saves memory by
>>> shrinking tree size, is it a net win or loss without GC those BLOCKS?
>>>
>>> thanks,
>>>
>>> David
>>>
>>>

>>>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from
>>>>> gimple.gsbase.location. However, neither nested_ptr nor mark_hook can
>>>>> help me.
>>>>>
>>>>> Another approach would be to guard the location_adhoc_data and related
>>>>> data structures in GTY(()). However, this is non-trivial because tree
>>>>> is not visible in libcpp. A
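
The behaviour asked for above (a mapping table whose entries get NULLed once the BLOCK is no longer reachable from DECL_INITIAL) might look roughly like the toy sketch below. Every name in it (toy_block, weak_table, mark_roots, sweep_weak_table) is made up for illustration; this is not GGC's API, only an assumption about the general shape of a weak side table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a BLOCK.  MARKED is set by the mark phase when the
   block is reachable from a strong root (think DECL_INITIAL).  */
struct toy_block {
  int id;
  bool marked;
};

#define WEAK_TABLE_SIZE 8

/* Side table indexed by a location-like key.  It holds weak references:
   it is deliberately not treated as a GC root.  */
struct weak_table {
  struct toy_block *slot[WEAK_TABLE_SIZE];
};

/* Mark phase: only strong roots mark blocks.  */
static void mark_roots(struct toy_block **roots, size_t n)
{
  assert(roots != NULL);
  for (size_t i = 0; i < n; i++)
    if (roots[i] != NULL)
      roots[i]->marked = true;
}

/* After marking, clear every weak entry whose target was not marked.
   Returns the number of entries cleared.  */
static size_t sweep_weak_table(struct weak_table *t)
{
  assert(t != NULL);
  size_t cleared = 0;
  for (size_t i = 0; i < WEAK_TABLE_SIZE; i++)
    if (t->slot[i] != NULL && !t->slot[i]->marked)
      {
        t->slot[i] = NULL;
        cleared++;
      }
  return cleared;
}
```

The point of the sketch is only that the table never keeps a block alive by itself; a lookup after collection sees NULL instead of a dangling pointer.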

Re: [SH] Add simple_return pattern

2012-09-13 Thread Christian Bruel

Hi Kaz,

The failure turned out to be caused by issues with the profile count and
the handling of region partitioning, so I prefer to handle those separately.
For now, I disable shrink-wrap when partitioning, even if the problem
seems to have disappeared with the more constrained heuristics. This is
probably also latent on other targets, BTW.

I added a sh_can_use_simple_return_p function that makes the heuristic
refinements more convenient. For instance, I measured that shrink-wrap is
generally not good when optimizing for size because we might introduce
new return instructions or split blocks to avoid the epilogue, which is
still in the code somewhere anyway.

Cycle-accurate benchmarks show a few very small improvements (here and
there, about 2% at most; accordingly, the prologue is rarely on the critical
path...) but no regressions. Manual inspection of the CSiBE assembly shows
that the transformations are decent.

Checked with all assertions enabled this time. Candidate for trunk.

Many thanks

Christian


On 09/11/2012 03:05 AM, Kaz Kojima wrote:
> Christian Bruel  wrote:
>> This patch implements the simple_return pattern to enable -fshrink-wrap
>> on SH. It also cleans up some redundancies for expand_epilogue (called
>> twice from the "return" and "epilogue" patterns) and the
>> sh_expand_prologue parameter type.
>>
>> No regressions with sh-superh-elf and sh4-linux gcc testsuites.
> 
> With the patch + revision 191106, I've got a new failure:
> 
> FAIL: gcc.dg/tree-prof/bb-reorg.c compilation,  -fprofile-use -D_PROFILE_USE 
> (internal compiler error)
> 
> for sh4-unknown-linux-gnu.  My testsuite/gcc/gcc.log says
> 
> /exp/ldroot/dodes/xsh-gcc/gcc/xgcc -B/exp/ldroot/dodes/xsh-gcc/gcc/ 
> /exp/ldroot/dodes/LOCAL/trunk/gcc/testsuite/gcc.dg/tree-prof/bb-reorg.c 
> -fno-diagnostics-show-caret -O2 -freorder-blocks-and-partition -fprofile-use 
> -D_PROFILE_USE -lm -o /exp/ldroot/dodes/xsh-gcc/gcc/testsuite/gcc/bb-reorg.x02
> /exp/ldroot/dodes/LOCAL/trunk/gcc/testsuite/gcc.dg/tree-prof/bb-reorg.c: In 
> function 'main':
> /exp/ldroot/dodes/LOCAL/trunk/gcc/testsuite/gcc.dg/tree-prof/bb-reorg.c:38:1: 
> error: EDGE_CROSSING missing across section boundary
> /exp/ldroot/dodes/LOCAL/trunk/gcc/testsuite/gcc.dg/tree-prof/bb-reorg.c:38:1: 
> internal compiler error: verify_flow_info failed
> Please submit a full bug report,
> 
> Regards,
>   kaz
> 
2012-09-12  Christian Bruel  

	PR target/54546
	* config/sh/sh-protos.h (sh_need_epilogue): Delete.
	(sh_can_use_simple_return_p): Declare.
	* config/sh/sh.c (sh_can_use_simple_return_p): Define.
	(sh_need_epilogue, sh_need_epilogue_known): Delete.
	(sh_output_function_epilogue): Remove sh_need_epilogue_known.
	* config/sh/sh.md (simple_return, return): Define.
	(epilogue): Use inline return rtl.
	(sh_expand_epilogue): Cleanup parameters boolean type.
	* config/sh/iterators.md (any_return): New iterator.

Index: config/sh/sh-protos.h
===
--- config/sh/sh-protos.h	(revision 191129)
+++ config/sh/sh-protos.h	(working copy)
@@ -117,7 +117,6 @@ extern rtx get_fpscr_rtx (void);
 extern int sh_media_register_for_return (void);
 extern void sh_expand_prologue (void);
 extern void sh_expand_epilogue (bool);
-extern bool sh_need_epilogue (void);
 extern void sh_set_return_address (rtx, rtx);
 extern int initial_elimination_offset (int, int);
 extern bool fldi_ok (void);
@@ -155,4 +154,5 @@ extern int sh2a_get_function_vector_number (rtx);
 extern bool sh2a_is_function_vector_call (rtx);
 extern void sh_fix_range (const char *);
 extern bool sh_hard_regno_mode_ok (unsigned int, enum machine_mode);
+extern bool sh_can_use_simple_return_p (void);
 #endif /* ! GCC_SH_PROTOS_H */
Index: config/sh/sh.c
===
--- config/sh/sh.c	(revision 191129)
+++ config/sh/sh.c	(working copy)
@@ -7899,24 +7899,6 @@ sh_expand_epilogue (bool sibcall_p)
 emit_use (gen_rtx_REG (SImode, PR_REG));
 }
 
-static int sh_need_epilogue_known = 0;
-
-bool
-sh_need_epilogue (void)
-{
-  if (! sh_need_epilogue_known)
-{
-  rtx epilogue;
-
-  start_sequence ();
-  sh_expand_epilogue (0);
-  epilogue = get_insns ();
-  end_sequence ();
-  sh_need_epilogue_known = (epilogue == NULL ? -1 : 1);
-}
-  return sh_need_epilogue_known > 0;
-}
-
 /* Emit code to change the current function's return address to RA.
TEMP is available as a scratch register, if needed.  */
 
@@ -7996,7 +7978,6 @@ static void
 sh_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
 			 HOST_WIDE_INT size ATTRIBUTE_UNUSED)
 {
-  sh_need_epilogue_known = 0;
 }
 
 static rtx
@@ -12959,4 +12940,34 @@ sh_init_sync_libfuncs (void)
   init_sync_libfuncs (UNITS_PER_WORD);
 }
 
+/* Return true if it is appropriate to emit `ret' instructions in the
+   body of a function.  */
+
+bool
+sh_can_use_simple_return_p (void)
+{
+  HARD_REG_SET live_regs_mask;
+  int d;
+
+  if (! reload_completed || frame_pointer_needed)
+

Re: [PING^2] C++ conversion - pull in cstdlib

2012-09-13 Thread Richard Guenther

On Thu, Sep 13, 2012 at 12:10 AM, Oleg Endo  wrote:
> Hello,
>
> On Sun, 2012-09-02 at 01:19 +0200, Oleg Endo wrote:
>> On Sat, 2012-09-01 at 18:25 +0200, Oleg Endo wrote:
>> > On Sat, 2012-09-01 at 16:17 +, Joseph S. Myers wrote:
>> > > On Sat, 1 Sep 2012, Oleg Endo wrote:
>> > >
>> > > > Ping!
>> > > >
>> > > > This allows one to include e.g.  in GCC source files.
>> > > > Since the switch to C++ has been made, this should be OK to do now, I
>> > > > guess.
>> > >
>> > > This is not a review, but have you tested building the Ada front end with
>> > > this patch applied?  Given recent issues relating to how Ada uses
>> > > system.h, I think any such changes need testing for Ada.
>> > >
>> >
>> > No I haven't. C and C++ only. Good to know, thanks.  Will try.
>> >
>>
>> OK, now I have. ada, c, c++, fortran, go, java, objc, obj-c++ do build
>> here.
>>
>
> Would it be OK to install the patch originally posted here:
> http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01761.html ?
>
> If not OK, it's also fine.  Just let me know.  It seems the issue can be
> worked around in individual source files by including  before
> "system.h" (as it is already done in config/sh/sh.c).

In general, system headers should be included exclusively from system.h.
As C++ standard headers may pull in system headers, that includes them.
This is to allow various workarounds for host compiler / OS issues in a
central place, as well as to avoid affecting those headers with the
#poisonings we do at the end of system.h.

Richard.

> Cheers,
> Oleg
>
>
>

Re: Loop stride inline hint

2012-09-13 Thread Richard Guenther

On Wed, Sep 12, 2012 at 11:53 PM, Jan Hubicka  wrote:
> Hi,
> this patch makes the inliner realize that it is a good idea to inline when
> a loop stride becomes constant. This is mostly to help Fortran testcases
> where it is important to inline to get array descriptors.

I think the same applies to upper/lower bounds.

Richard.
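
To illustrate the situation the hint is aimed at, here is a small hedged C sketch (the function names are made up, and the real motivation is Fortran array descriptors rather than C): the callee's loop stride is a parameter, so it only becomes a known constant once the call is inlined with a constant argument.

```c
#include <assert.h>
#include <stddef.h>

/* Callee: STRIDE is a run-time value here, so the loop cannot be
   specialised for any particular access pattern.  */
static double sum_strided(const double *a, size_t n, size_t stride)
{
  assert(a != NULL && stride > 0);
  double s = 0.0;
  for (size_t i = 0; i < n; i += stride)
    s += a[i];
  return s;
}

/* Caller: passes a constant stride.  If sum_strided is inlined here, the
   stride in the loop becomes the constant 1.  */
static double sum_all(const double *a, size_t n)
{
  return sum_strided(a, n, 1);
}
```

Richard's remark about upper/lower bounds corresponds to `n` (or a start index) being constant at the call site in the same way.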

> Bootstrapped/regtested x86_64-linux, committed.
>
> Honza
>
> * ipa-inline-analysis.c (dump_inline_hints): Dump loop stride.
> (set_hint_predicate): New function.
> (reset_inline_summary): Reset loop stride.
> (remap_predicate_after_duplication): New function.
> (remap_hint_predicate_after_duplication): New function.
> (inline_node_duplication_hook): Update.
> (dump_inline_summary): Dump stride summaries.
> (estimate_function_body_sizes): Compute strides.
> (remap_hint_predicate): New function.
> (inline_merge_summary): Use it.
> (inline_read_section): Read stride.
> (inline_write_summary): Write stride.
> * ipa-inline.c (want_inline_small_function_p): Handle strides.
> (edge_badness): Likewise.
> * ipa-inline.h (inline_hints_vals): Add stride hint.
> (inline_summary): Update stride.
>
> * gcc.dg/ipa/inlinehint-2.c: New testcase.
> Index: ipa-inline.c
> ===
> *** ipa-inline.c(revision 191228)
> --- ipa-inline.c(working copy)
> *** want_inline_small_function_p (struct cgr
> *** 481,487 
> else if (DECL_DECLARED_INLINE_P (callee->symbol.decl)
>&& growth >= MAX_INLINE_INSNS_SINGLE
>&& !(hints & (INLINE_HINT_indirect_call
> !| INLINE_HINT_loop_iterations)))
> {
> e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
>   want_inline = false;
> --- 481,488 
> else if (DECL_DECLARED_INLINE_P (callee->symbol.decl)
>&& growth >= MAX_INLINE_INSNS_SINGLE
>&& !(hints & (INLINE_HINT_indirect_call
> !| INLINE_HINT_loop_iterations
> !| INLINE_HINT_loop_stride)))
> {
> e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
>   want_inline = false;
> *** want_inline_small_function_p (struct cgr
> *** 533,539 
>  inlining given function is very profitable.  */
> else if (!DECL_DECLARED_INLINE_P (callee->symbol.decl)
>&& growth >= ((hints & (INLINE_HINT_indirect_call
> !  | INLINE_HINT_loop_iterations))
>  ? MAX (MAX_INLINE_INSNS_AUTO,
> MAX_INLINE_INSNS_SINGLE)
>  : MAX_INLINE_INSNS_AUTO))
> --- 534,541 
>  inlining given function is very profitable.  */
> else if (!DECL_DECLARED_INLINE_P (callee->symbol.decl)
>&& growth >= ((hints & (INLINE_HINT_indirect_call
> !  | INLINE_HINT_loop_iterations
> !  | INLINE_HINT_loop_stride))
>  ? MAX (MAX_INLINE_INSNS_AUTO,
> MAX_INLINE_INSNS_SINGLE)
>  : MAX_INLINE_INSNS_AUTO))
> *** edge_badness (struct cgraph_edge *edge,
> *** 866,872 
> fprintf (dump_file, "Badness overflow\n");
> }
> if (hints & (INLINE_HINT_indirect_call
> !  | INLINE_HINT_loop_iterations))
> badness /= 8;
> if (dump)
> {
> --- 868,875 
> fprintf (dump_file, "Badness overflow\n");
> }
> if (hints & (INLINE_HINT_indirect_call
> !  | INLINE_HINT_loop_iterations
> !  | INLINE_HINT_loop_stride))
> badness /= 8;
> if (dump)
> {
> Index: ipa-inline.h
> ===
> *** ipa-inline.h(revision 191228)
> --- ipa-inline.h(working copy)
> *** typedef struct GTY(()) condition
> *** 46,52 
>  They are represtented as bitmap of the following values.  */
>   enum inline_hints_vals {
> INLINE_HINT_indirect_call = 1,
> !   INLINE_HINT_loop_iterations = 2
>   };
>   typedef int inline_hints;
>
> --- 46,53 
>  They are represtented as bitmap of the following values.  */
>   enum inline_hints_vals {
> INLINE_HINT_indirect_call = 1,
> !   INLINE_HINT_loop_iterations = 2,
> !   INLINE_HINT_loop_stride = 4
>   };
>   typedef int inline_hints;
>
> *** struct GTY(()) inline_summary
> *** 120,128 
> conditions conds;
> VEC(size_time_entry,gc) *entry;
>
> !   /* Predicate on when some loop in the function sbecomes to have known
>bounds.   */
> struct predicate * GTY((skip)) loop_iterations;
>   }
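
The way the quoted hunks treat the new hint can be condensed into the following sketch. The enum values are copied from the ipa-inline.h hunk; the two helper functions are hypothetical and only mirror the `hints & (...)` tests and the `badness /= 8` adjustment, not the surrounding GCC logic.

```c
#include <assert.h>

/* Hint bits, with the values from the quoted ipa-inline.h hunk.  */
enum inline_hints_vals {
  INLINE_HINT_indirect_call = 1,
  INLINE_HINT_loop_iterations = 2,
  INLINE_HINT_loop_stride = 4
};
typedef int inline_hints;

/* Hypothetical helper: nonzero if any hint that argues for inlining
   is set.  */
static int has_profitable_hint(inline_hints hints)
{
  assert((hints & ~7) == 0);  /* only the three known bits */
  return (hints & (INLINE_HINT_indirect_call
                   | INLINE_HINT_loop_iterations
                   | INLINE_HINT_loop_stride)) != 0;
}

/* Hypothetical helper: scale badness the way the edge_badness hunk
   does when a hint is present.  */
static int adjust_badness(int badness, inline_hints hints)
{
  if (has_profitable_hint(hints))
    badness /= 8;
  return badness;
}
```

Because the hints are independent bits, adding INLINE_HINT_loop_stride = 4 only requires extending each mask, which is exactly what the diff does.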

Re: [patch] IPA cleanups and assorted cleanups

2012-09-13 Thread Richard Guenther

On Wed, Sep 12, 2012 at 8:27 PM, Steven Bosscher  wrote:
> Hello,
>
> This patch cleans up some things that annoyed me in ipa-reference.c
> and ipa-pure-const.c. These two passes are very important but they
> show all the signs of being developed when GCC's IPA infrastructure
> was still (or at least even more than today) in its infancy: Walking
> cycles of collapsed nodes is clumsy, handling of multiple functions
> uses langhooks, and there are many, many style issues, most of which
> can be traced back to change upon change in the SVN repository as the
> IPA infrastructure developed.
>
> The first block of the ChangeLog entry is a simple cleanup in
> ipa-pure-const.c, which doesn't and shouldn't have to use langhooks to
> print a function name.
>
> The biggest cleanup is in ipa-reference.c, the second block in the
> ChangeLog entry:
> 1. A new function "ipa_get_nodes_in_cycle" returns a VEC of cgraph
> nodes (much like get_loop_body) that can be walked in sequence.
> 2. The management of sets is clarified using functions with intuitive
> names that also help reduce the cost of the transitive closure by
> detecting when a set union(X,Y) is the maximum set (reducing the
> memory footprint from 3 GB to just ~120MB for a large C++ test case)
> 3. Add helper functions to dump and copy sets
>
> The third block of the ChangeLog entry is some prep work for
> structural analysis code that Kenny wrote 7 years ago, and that I'm
> trying to bring into a shape where I can do something useful with it. I
> know of a few places where the empty_bb_p hook can be used, and I'll
> submit patches for that later. The other hook is for splitting a block
> to create a region header. It has no uses in the trunk but it would
> help if it can go in now already anyway :-)
>
> Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu. OK for trunk?

Yay!  First reference!

+static bool
+union_static_var_sets (bitmap &x, bitmap y)

;)

+/* Compute X &= Y, taking into account the possibility that
+   X may become the maximum set.  */

Hmm, how can X become the maximum set if it was not the maximum set
before?  Thus, shouldn't this simply be

  if (y == all_module_statics)
    /* do nothing */;
  else
    ...

?

Otherwise ok (the patch could have been split though).
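
A toy version of the two set operations, under the assumption that "the maximum set" is represented by one shared sentinel object, may make the point clearer. The names var_set, union_sets and intersect_sets are invented for this sketch and are not the functions from the patch; C pointers stand in for the reference parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy set of static variables: one bit per variable.  */
struct var_set { unsigned bits; };

/* Shared sentinel standing for "every static in the module".  A set
   equal to the maximum is represented by a pointer to this object.  */
static struct var_set all_module_statics = { 0xFu };

/* *X |= *Y.  Collapses *X to the sentinel as soon as it becomes the
   maximum set.  Returns true if *X changed (including a collapse).  */
static bool union_sets(struct var_set **x, struct var_set *y)
{
  assert(x != NULL && *x != NULL && y != NULL);
  if (*x == &all_module_statics)
    return false;
  if (y == &all_module_statics)
    {
      *x = &all_module_statics;
      return true;
    }
  unsigned old = (*x)->bits;
  (*x)->bits |= y->bits;
  if ((*x)->bits == all_module_statics.bits)
    {
      *x = &all_module_statics;
      return true;
    }
  return (*x)->bits != old;
}

/* *X &= *Y.  STORAGE receives a private copy of *Y when *X is the
   shared sentinel, which must never be modified in place.  */
static void intersect_sets(struct var_set **x, struct var_set *y,
                           struct var_set *storage)
{
  assert(x != NULL && *x != NULL && y != NULL && storage != NULL);
  if (y == &all_module_statics)
    return;                     /* X & max == X: nothing to do.  */
  if (*x == &all_module_statics)
    {
      *storage = *y;            /* max & Y == Y: take a copy.  */
      *x = storage;
    }
  else
    (*x)->bits &= y->bits;
}
```

In this model an intersection can only shrink X, so it never needs the "did X just become the maximum set" check that the union needs.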

Thanks,
Richard.




> Ciao!
> Steven
>
> * ipa-pure-const.c (state_from_flags, local_pure_const): Use
> current_function_name instead of lang_hooks.decl_printable_name.
>
> * function.h (fndecl_name): New prototype.
> * function.c (fndecl_name): New function.
> * vecir.h (cgraph_node_p): New standard IR VEC type.
> * trans-mem.c (cgraph_node_p): No need anymore to define it here.
> * ipa-utils.h (ipa_get_nodes_in_cycle): New prototype.
> * ipa-utils.c (ipa_get_nodes_in_cycle): New function.
> * ipa-reference.c: Don't include langhooks.h, and certainly not twice.
> Fix many formatting issues (long lines, short lines, spacing, etc.).
> (get_static_name): Use fndecl_name.
> (dump_static_vars_set_to_file): New function split out from propagate.
> (union_static_var_sets): New function, union two sets and collapse
> to all_module_statics as quickly as possible.
> (intersect_static_var_sets): New function, similar to above.
> (copy_static_var_set): Renamed from copy_global_bitmap and rewritten
> to allocate a copy on the same bitmap_obstack as the source set.
> (propagate_bits): Simplify, and clarify by using 
> union_static_var_sets.
> (generate_summary): Remove bm_temp.  Print UID of promotable globals.
> (read_write_all_from_decl): Use pass-by-reference, bless C++.
> (get_read_write_all_from_node): New function, split out from 
> propagate.
> (propagate): Simplify and clarify with helper functions.  Use
> ipa_get_nodes_in_cycle to walk all nodes in a reduced node.
> (ipa_reference_read_optimization_summary): Use fndecl_name instead of
> lang_hooks.decl_printable_name.
>
> * rtl.h (print_rtl_single_with_indent): New prototype.
> * print-rtl.c (print_rtl_single_with_indent): New function.
> * cfghooks.h (empty_block_p, split_block_before_cond_jump): New hooks.
> * cfghooks.c (empty_block_p, split_block_before_cond_jump): Implement.
> * cfgrtl.c (rtl_block_empty_p, rtl_split_block_before_cond_jump):
> Implement RTL specific hooks.
> (rtl_cfg_hooks, cfg_layout_rtl_cfg_hooks): Register the new hooks.
> * tree-cfg.c (gimple_empty_block_p,
> gimple_split_block_before_cond_jump): Implement GIMPLE specific hooks.
> (gimple_cfg_hooks): Register the new hooks.
> * tree-ssa-phiopt.c (empty_block_p): Remove in favor of new hook.

Re: Finish up PR rtl-optimization/44194

2012-09-13 Thread Richard Guenther

On Wed, Sep 12, 2012 at 5:37 PM, Eric Botcazou  wrote:
> This is the PR about the useless spilling to memory of structures that are
> returned in registers.  It was essentially addressed last year by Easwaran
> with an enhancement of the RTL DSE pass, but Easwaran also noted that we
> still spill to memory in the simplest cases, e.g. gcc.dg/pr44194-1.c, because
> expand_call creates a temporary on the stack to store the value returned in
> registers...
>
> The attached patch solves this problem by copying the value into pseudos
> instead by means of emit_group_move_into_temps.  This is sufficient to get rid
> of the remaining memory accesses for gcc.dg/pr44194-1.c on x86-64 for example,
> but not on strict-alignment platforms like SPARC64.
>
> The problem is that, on strict-alignment platforms, emit_group_store will use
> bitfield techniques (store_bit_field) to store the returned value, and the
> bitfield routines (store_bit_field and extract_bit_field) have these lines:
>
>   /* We may be accessing data outside the field, which means
>  we can alias adjacent data.  */
>   if (MEM_P (op0))
> {
>   op0 = shallow_copy_rtx (op0);
>   set_mem_alias_set (op0, 0);
>   set_mem_expr (op0, 0);
> }
>
> Now the enhancement implemented in the RTL DSE pass by Easwaran is precisely
> based on the MEM_EXPR of MEM objects.
>
> The patch solves this problem by implementing a variant of adjust_address
> along the lines of the comment at the end of adjust_address_1:
>
>   /* At some point, we should validate that this offset is within the object,
>  if all the appropriate values are known.  */
>   return new_rtx;
>
> i.e. adjust_bitfield_address will drop the underlying object of the MEM if it
> cannot prove that the adjusted memory access is still within its bounds.
> The bitfield manipulation routines in expmed.c are then changed to invoke
> adjust_bitfield_address instead of adjust_address and the above special lines
> in store_bit_field and extract_bit_field are eliminated.
>
> While I was at it, I also fixed a probable oversight in extract_bit_field_1
> that has bothered me for a while: in the multi-word case, extract_bit_field_1
> recurses on extract_bit_field instead of itself (unlike store_bit_field_1),
> which short-circuits the FALLBACK_P parameter.
>
> Tested on x86-64/Linux and SPARC64/Solaris.  Comments?

Sounds like a good cleanup to me.

Richard.
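
The rule described in the quoted mail (keep the underlying object only when the adjusted access provably stays within it) can be sketched with a toy structure. This is an assumption-laden illustration: toy_mem and toy_adjust_address are invented names, and the real adjust_address_1 works on RTL MEMs and their attributes, not on a struct like this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy memory reference: EXPR names the underlying object (NULL if
   unknown), OFFSET and SIZE describe the access within it, and
   OBJ_SIZE is the object's size (meaningful only if EXPR != NULL).  */
struct toy_mem {
  const char *expr;
  size_t offset;
  size_t size;
  size_t obj_size;
};

/* Return a copy of M moved by DELTA bytes with access size NEW_SIZE.
   If ADJUST_OBJECT is nonzero and the new access cannot be shown to
   lie within the object, forget the object rather than keep a
   possibly wrong attribution.  */
static struct toy_mem
toy_adjust_address(struct toy_mem m, size_t delta, size_t new_size,
                   int adjust_object)
{
  assert(new_size > 0);
  m.offset += delta;
  m.size = new_size;
  if (adjust_object && m.expr != NULL
      && (m.offset > m.obj_size || new_size > m.obj_size - m.offset))
    m.expr = NULL;
  return m;
}
```

A pass that keys on the underlying object, as the DSE enhancement is said to do with MEM_EXPR, then keeps its information for in-bounds bitfield accesses and loses it only where it would be unsafe.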

>
> 2012-09-12  Eric Botcazou  
>
> PR rtl-optimization/44194
> * calls.c (expand_call): In the PARALLEL case, copy the return value
> into pseudos instead of spilling it onto the stack.
> * emit-rtl.c (adjust_address_1): Rename ADJUST into ADJUST_ADDRESS and
> add new ADJUST_OBJECT parameter.
> If ADJUST_OBJECT is set, drop the underlying object if it cannot be
> proved that the adjusted memory access is still within its bounds.
> (adjust_automodify_address_1): Adjust call to adjust_address_1.
> (widen_memory_access): Likewise.
> * expmed.c (store_bit_field_1): Call adjust_bitfield_address instead
> of adjust_address.  Do not drop the underlying object of a MEM.
> (store_fixed_bit_field): Likewise.
> (extract_bit_field_1): Likewise.  Fix oversight in recursion.
> (extract_fixed_bit_field): Likewise.
> * expr.h (adjust_address_1): Adjust prototype.
> (adjust_address): Adjust call to adjust_address_1.
> (adjust_address_nv): Likewise.
> (adjust_bitfield_address): New macro.
> (adjust_bitfield_address_nv): Likewise.
> * expr.c (expand_assignment): Handle a PARALLEL in more cases.
> (store_expr): Likewise.
> (store_field): Likewise.
>
> * dse.c: Fix typos in the head comment.
>
>
> --
> Eric Botcazou

Re: [PATCH] Add option for dumping to stderr (issue6190057)

2012-09-13 Thread Richard Guenther

On Wed, Sep 12, 2012 at 6:46 PM, Xinliang David Li  wrote:
> On Wed, Sep 12, 2012 at 3:30 AM, Richard Guenther
>  wrote:
>> On Wed, Sep 12, 2012 at 10:12 AM, Sharad Singhai  wrote:
>>> Thanks for your comments. Please see my responses inline.
>>>
>>> On Tue, Sep 11, 2012 at 1:16 PM, Xinliang David Li wrote:
>>>> Can you resend your patch in text form (also need to resolve the
>>>> latest conflicts) so that it can be commented inline?
>>>
>>> I tried to include inline patch earlier but my message was bounced
>>> back from patches mailing list. I am trying it again.
>>>
>>>> Please also provide as summary a more up-to-date description of
>>>> 1) Command line option syntax and semantics
>>>
>>> I added some documentation in the patch. Here are the relevant bits
>>> from invoke.texi.
>>>
>>> `-fdump-tree-SWITCH-OPTIONS=FILENAME'
>>>  Control the dumping at various stages of processing the
>>>  intermediate language tree to a file.  The file name is generated
>>>  by appending a switch-specific suffix to the source file name, and
>>>  the file is created in the same directory as the output file. In
>>>  case of `=FILENAME' option, the dump is output on the given file
>>>  instead of the auto named dump files.
>>>  ...
>>>
>>> `=FILENAME'
>>>   Instead of an auto named dump file, output into the given file
>>>   name. The file names `stdout' and `stderr' are treated
>>>   specially and are considered already open standard streams.
>>>   For example,
>>>
>>>gcc -O2 -ftree-vectorize -fdump-tree-vect-details=foo.dump
>>> -fdump-tree-pre=stderr file.c
>>>
>>>   outputs vectorizer dump into `foo.dump', while the PRE dump
>>>   is output on to `stderr'. If two conflicting dump filenames
>>>   are given for the same pass, then the latter option
>>>   overrides the earlier one.
>>>
>>> `-fopt-info-PASS'
>>> `-fopt-info-PASS-OPTIONS'
>>> `-fopt-info-PASS-OPTIONS=FILENAME'
>>>  Controls optimization dumps from various passes. If the `-OPTIONS'
>>>  form is used, OPTIONS is a list of `-' separated options which
>>>  controls the details of the dump.  If OPTIONS is not specified, it
>>>  defaults to `optimized'. If the FILENAME is not specified, it
>>>  defaults to `stderr'. Note that the output FILENAME will be
>>>  overwritten in case of multiple translation units. If a combined
>>>  output from multiple translation units is desired, `stderr'
>>>  should be used instead.
>>>
>>>  The PASS could be one of the tree or rtl passes. The following
>>>  options are available
>>
>> I don't like that we have -PASS here.  That makes it awfully similar
>> to -fdump-PASS-OPTIONS=FILENAME.  Are we merely having
>> -fopt-info because OPTIONS are "different"?
>
>
> Having PASS is useful for filtering. But as you said, the option
> design here is very much oriented towards developers, not the end users
> that -fopt-info is also intended for.

Just to add a comment here: -fopt-info is _only_ targeted at end users.
Developers can use -fdump-tree-XXX=stderr now (which, with the correct
pass / flags, should produce output identical to -fopt-info).  At least that
was the whole point of the re-design of the dump API: to make it
possible to implement -fopt-info in a way that it simply provides a nice
interface for end users to our existing dumping information.

If it doesn't work like that right now we should make it work this way.

Richard.
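
For readers following the option syntax quoted above, the `=FILENAME` handling might be sketched as below. Both helpers are hypothetical and are not taken from the patch or from GCC's dumpfile code; they only model the documented behaviour that `stdout' and `stderr' name the already-open standard streams while any other name is opened as a file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the FILENAME part of "option=FILENAME",
   or NULL if the option text contains no '='.  */
static const char *dump_filename_part(const char *opt)
{
  assert(opt != NULL);
  const char *eq = strchr(opt, '=');
  return eq != NULL ? eq + 1 : NULL;
}

/* Hypothetical helper: map FILENAME to a stream.  "stdout" and
   "stderr" are treated as already-open standard streams; anything
   else is opened for writing.  *OWNED is set to 1 when the caller is
   responsible for closing the returned stream.  */
static FILE *open_dump_stream(const char *filename, int *owned)
{
  assert(filename != NULL && owned != NULL);
  *owned = 0;
  if (strcmp(filename, "stderr") == 0)
    return stderr;
  if (strcmp(filename, "stdout") == 0)
    return stdout;
  *owned = 1;
  return fopen(filename, "w");
}
```

Opening with "w" also reflects the documented caveat that a named output file is overwritten for each translation unit, which is why `stderr' is suggested for combined output.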

Re: [PATCH] Combine location with block using block_locations

2012-09-13 Thread Jan Hubicka

> Hi,
> 
> On Wed, Sep 12, 2012 at 04:17:45PM +0200, Michael Matz wrote:
> > Hi,
> > 
> > On Wed, 12 Sep 2012, Michael Matz wrote:
> > 
> > > > Hm, but we shouldn't end up streaming any BLOCKs at this point (nor 
> > > > local TYPE_DECLs).  Those are supposed to be in the local function 
> > > > sections only where no fixup for prevailing decls happens.
> > > 
> > > That's true, something is fishy with the patch, will try to investigate.
> > 
> > ipa-prop creates the problem.  Its tree mapping can contain expressions, 
> > expressions can have locations, locations now have blocks.  The tree maps 
> > are stored as part of jump functions, and hence as part of node summaries.  
> > Node summaries are global, hence blocks, and therefore block vars can be 
> > placed in the global blob.
> > 
> > That's not supposed to happen.  The patch below fixes this instance of the 
> > problem and makes the testcase work with Dehao's patch with the 
> > LTO_NO_PREVAIL call added back in.
> > 
> > 
> > Ciao,
> > Michael.
> > 
> > Index: lto-cgraph.c
> > ===
> > --- lto-cgraph.c(revision 190803)
> > +++ lto-cgraph.c(working copy)
> > @@ -1373,6 +1373,7 @@ output_node_opt_summary (struct output_b
> >   mechanism to store function local declarations into summaries.  */
> >gcc_assert (parm);
> >streamer_write_uhwi (ob, parm_num);
> > +  gcc_assert (IS_UNKNOWN_LOCATION (EXPR_LOCATION (map->new_tree)));
> >stream_write_tree (ob, map->new_tree, true);
> >bp = bitpack_create (ob->main_stream);
> >bp_pack_value (&bp, map->replace_p, 1);
> > Index: ipa-prop.c
> > ===
> > --- ipa-prop.c  (revision 190803)
> > +++ ipa-prop.c  (working copy)
> > @@ -1378,7 +1378,11 @@ ipa_compute_jump_functions_for_edge (str
> >tree arg = gimple_call_arg (call, n);
> >  
> >if (is_gimple_ip_invariant (arg))
> > -   ipa_set_jf_constant (jfunc, arg);
> > +   {
> > + arg = unshare_expr (arg);
> > + SET_EXPR_LOCATION (arg, UNKNOWN_LOCATION);
> > + ipa_set_jf_constant (jfunc, arg);
> > +   }
> >else if (!is_gimple_reg_type (TREE_TYPE (arg))
> >&& TREE_CODE (arg) == PARM_DECL)
> 
> Perhaps it would be better if ipa_set_jf_constant did that, just in
> case we ever add another caller?  Note that arithmetic functions also
> have their second operand tree stored in them and so perhaps
> ipa_set_jf_arith_pass_through should do the same.
> 
> And it is also necessary to do the same thing at the end of
> determine_known_aggregate_parts, i.e. before assignment to
> item->value.  I can post a separate patch if necessary.

Yes, this seems a reasonable thing to do. Patches for this are preapproved.
> 
> I wasn't following this thread but I hope that streaming types does
> not cause this problem.  If they do, there are quite a few in various
> jump functions and indirect call graph edges.

We probably ought to ban streaming in BLOCK_DECL and other beasts that are
not expected at WPA stage.

Concerning Richi's suggestion to move jump functions completely away from
trees, I am really not 100% sure how good an idea it is in the long term. As
IPA analyses get more accurate, we will stream more and more expressions.  I
am not convinced it makes sense to reinvent a way of representing them rather
than fixing what we have.

Honza
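
The suggestion quoted above, doing the unshare-and-clear-location step inside the setter so that every caller gets it, can be shown with a toy model. The types and the function toy_set_jf_constant are invented for this sketch; the real code operates on trees via unshare_expr and SET_EXPR_LOCATION as in Michael's hunk.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNKNOWN_LOCATION 0u

/* Toy expression: a constant plus the source location it came from.  */
struct toy_expr {
  long value;
  unsigned location;
};

/* Toy jump function that stores its constant by value.  */
struct toy_jump_func {
  struct toy_expr constant;
};

/* Store CST in JFUNC.  The setter itself takes a private copy and
   clears the location, so no caller can leak a function-local
   location (and whatever hangs off it) into a global summary.  */
static void toy_set_jf_constant(struct toy_jump_func *jfunc,
                                const struct toy_expr *cst)
{
  assert(jfunc != NULL && cst != NULL);
  jfunc->constant = *cst;       /* "unshare": copy, do not alias.  */
  jfunc->constant.location = TOY_UNKNOWN_LOCATION;
}
```

With the clearing centralised like this, an assertion at streaming time of the kind added to lto-cgraph.c documents the invariant instead of being the only line of defence.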

[Patch, i386 committed] RE: [PATCH,i386] Enable prefetchw in processor alias table for AMD targets

2012-09-13 Thread Kumar, Venkataramanan

Hi Uros,

Thank you for the review comments.

Committed to trunk at http://gcc.gnu.org/viewcvs?view=revision&revision=191245

Regards,
Venkat.

-Original Message-
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Wednesday, September 12, 2012 9:14 PM
To: Kumar, Venkataramanan
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH,i386] Enable prefetchw in processor alias table for AMD 
targets

On Tue, Sep 11, 2012 at 11:03 AM,   wrote:
> Hi Maintainers,
>
> This patch enables "prefetchw" ISA in the processor alias table for targets 
> amdfam10,barcelona and bdver1,2 and btver1,2.
>
> GCC regression test passes with the patch.
>
> Ok for trunk?
>
> Change log:
>
> 2012-09-11  Venkataramanan Kumar  
>
> * config/i386/i386.c (processor_alias_table): Enable PTA_PRFCHW
> for targets amdfam10, barcelona, bdver1, bdver2, btver1 and btver2.

Please note that amdfam10 and barcelona are already generating prefetchw due to
the PTA_3DNOW flag, so these targets can be removed from the patch.

The patch is OK for mainline with that change. Please commit the patch.

Thanks,
Uros.

Re: [Patch ARM] big-endian support for Neon vext tests

2012-09-13 Thread Christophe Lyon

Ping?
http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00330.html

Christophe.


On 6 September 2012 00:14, Christophe Lyon  wrote:
> Hello,
>
> Although the recent optimization I have committed to use the Neon vext
> instruction for suitable builtin_shuffle calls does not support
> big-endian yet, I have written a patch to the existing testcases so
> that they now support big-endian mode.
>
> I think it's worth improving these tests since writing the right masks
> for big-endian (such that the program computes the same results as in
> little-endian) is not always straightforward.
>
> In particular:
> * I have added some comments in a few tests where it took me a while to
> find the right mask.
> * In the case of the test which is executed, I had to force the
> noinline attribute on the helper functions, otherwise the computed
> results are wrong in big-endian. It is probably an overkill workaround
> but it works :-)
>   I am going to file a bugzilla for this problem.
>
> I have checked that replacing calls to builtin_shuffle by the expected
> Neon vext variant produces the expected results in big-endian mode,
> and I arranged the big-endian masks to get the same results.
>
> Christophe.

Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-09-13 Thread Paolo Bonzini

On 13/09/2012 10:46, Jakub Jelinek wrote:
>> > # Remove the -O2: for historical reasons, unless bootstrapping we prefer
>> > # optimizations to be activated explicitly by the toplevel.
>> > case "$CC" in
>> >   */prev-gcc/xgcc*) ;;
>> >   *) CFLAGS=`echo $CFLAGS | sed "s/-O[[s0-9]]* *//" ` ;;
>> > esac
>> > AC_SUBST(CFLAGS)
>> > 
>> > in configure.ac does this.  I think if CXXFLAGS is also so done, we'd gain 
>> > parity.
> Can we get this change in?  The current state is terribly annoying.

Yes, please go ahead.

Paolo
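
For reference, what the quoted sed substitution does to a flags string can be modelled in a few lines of C. This is only an illustration, and it assumes the doubled brackets in configure.ac are m4 quoting, so that the effective pattern is `-O[s0-9]* *` with only the first match removed; strip_first_opt_flag is a made-up name.

```c
#include <assert.h>
#include <string.h>

/* Remove the first match of the pattern "-O[s0-9]* *" from FLAGS in
   place, like a sed substitution without the g flag.  */
static void strip_first_opt_flag(char *flags)
{
  assert(flags != NULL);
  char *start = strstr(flags, "-O");
  if (start == NULL)
    return;
  char *end = start + 2;
  /* Consume the optional level: any run of 's' or digits.  */
  while (*end == 's' || (*end >= '0' && *end <= '9'))
    end++;
  /* Consume trailing spaces so no double space is left behind.  */
  while (*end == ' ')
    end++;
  memmove(start, end, strlen(end) + 1);
}
```

Applying the same treatment to CXXFLAGS, as proposed in the quoted text, would amount to running this transformation on both variables.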
