[arm-embedded] Backport mainline r179330

2011-09-28 Thread Ye Joey
Backport mainline r179330 to ARM/embedded-4_6-branch

Committed.

2011-09-29  Joey Ye  

Backport r179330 from mainline
2011-09-29  Jiangning Liu  

* gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c: Explicitly turn on
loop unroll and set max unroll times to 8.
* gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c: Likewise.
* gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c: Likewise.
* gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c: Likewise.
* gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c: Likewise.
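For reference, the adjustment to each test presumably amounts to a dg-options change along these lines (the exact flags of r179330 are not shown in this message, so treat this as an educated sketch):

```c
/* { dg-do compile } */
/* Hypothetical sketch: force unrolling on and cap the unroll factor so
   predictive commoning sees a stable loop shape regardless of the
   target's default unrolling heuristics.  */
/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fdump-tree-pcom-details" } */
```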


Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
> Sorry what I meant is that it would be bad if -mtune=corei7(-avx)? was
> slower than generic.
For now, -mtune=corei7 triggers use of the generic cost table (I'm
not sure about corei7-avx, but I assume the same), so it won't be
slower.

> Adding new tables shouldn't be very difficult, even if they are the same
> as generic.
Yes, that should be quite easy, but it isn't a must-have for the
current patch (which is already very large). It could be done
independently of the work on inlining the memory functions.


Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
> Michael,
>Did you bootstrap with --enable-checking=yes? I am seeing the bootstrap
> failure...
I checked bootstrap, specs and 'make check' with the complete patch.
The separate patches for the middle end (ME) and back end (BE) were
only tested for build (no bootstrap) and 'make check'. I think it's
better to apply the complete patch, but to review the separate patches
(to make that easier).

> ps There also seems to be common sections in the memfunc-mid.patch and 
> memfunc-be.patch patches.
That's true: some new middle-end routines are used in the back-end
changes. I couldn't separate the patches any other way without
changing them significantly.


On 29 September 2011 01:51, Jack Howarth  wrote:
> On Wed, Sep 28, 2011 at 05:33:23PM +0400, Michael Zolotukhin wrote:
>> >   It appears that part 1 of the patch wasn't really attached.
>> Thanks, resending.
>
> Michael,
>    Did you bootstrap with --enable-checking=yes? I am seeing the bootstrap
> failure...
>
> /sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/./prev-gcc/g++ 
> -B/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/./prev-gcc/ 
> -B/sw/lib/gcc4.7/x86_64-apple-darwin11.2.0/bin/ -nostdinc++ 
> -B/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/src/.libs
>  
> -B/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/libsupc++/.libs
>  
> -I/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/include/x86_64-apple-darwin11.2.0
>  
> -I/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/include
>  -I/sw/src/fink.build/gcc47-4.7.0-1/gcc-4.7-20110927/libstdc++-v3/libsupc++ 
> -L/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/src/.libs
>  
> -L/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/libsupc++/.libs
>  -c   -g -O2 -mdynamic-no-pic -gtoggle -DIN_GCC   -W -Wall -Wwrite-strings 
> -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long 
> -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common  
> -DHAVE_CONFIG_H -I. -I. -I../../gcc-4.7-20110927/gcc 
> -I../../gcc-4.7-20110927/gcc/. -I../../gcc-4.7-20110927/gcc/../include 
> -I../../gcc-4.7-20110927/gcc/../libcpp/include -I/sw/include -I/sw/include  
> -I../../gcc-4.7-20110927/gcc/../libdecnumber 
> -I../../gcc-4.7-20110927/gcc/../libdecnumber/dpd -I../libdecnumber 
> -I/sw/include  -I/sw/include -DCLOOG_INT_GMP -DCLOOG_ORG -I/sw/include 
> ../../gcc-4.7-20110927/gcc/emit-rtl.c -o emit-rtl.o
> ../../gcc-4.7-20110927/gcc/emit-rtl.c: In function ‘rtx_def* 
> adjust_address_1(rtx, machine_mode, long int, int, int)’:
> ../../gcc-4.7-20110927/gcc/emit-rtl.c:2060:26: error: unused variable 
> ‘max_align’ [-Werror=unused-variable]
> cc1plus: all warnings being treated as errors
>
> on x86_64-apple-darwin11 with your patches.
>          Jack
> ps There also seems to be common sections in the memfunc-mid.patch and 
> memfunc-be.patch patches.
>



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


[PATCH] Respin sparc pixel-compare patterns using iterators.

2011-09-28 Thread David Miller

This is heavily inspired by a proof-of-concept patch David
Bremner sent to me the other week.

Committed to trunk.

gcc/

* config/sparc/sparc.md (UNSPEC_FCMPLE, UNSPEC_FCMPNE,
UNSPEC_FCMPGT, UNSPEC_FCMPEQ): Delete and reduce to...
(UNSPEC_FCMP): New unspec.
(gcond): New code iterator.
(gcond_name): New code attr.
(GCM): New mode iterator.
(gcm_name): New mode attr.
(fcmp{le,ne,gt,eq}{16,32}_vis): Reimplement using iterators.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@179329 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog |   11 +
 gcc/config/sparc/sparc.md |   90 +
 2 files changed, 21 insertions(+), 80 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2aae5aa..e853b15 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2011-09-28  David S. Miller  
+
+   * config/sparc/sparc.md (UNSPEC_FCMPLE, UNSPEC_FCMPNE,
+   UNSPEC_FCMPGT, UNSPEC_FCMPEQ): Delete and reduce to...
+   (UNSPEC_FCMP): New unspec.
+   (gcond): New code iterator.
+   (gcond_name): New code attr.
+   (GCM): New mode iterator.
+   (gcm_name): New mode attr.
+   (fcmp{le,ne,gt,eq}{16,32}_vis): Reimplement using iterators.
+
 2011-09-28  Oleg Endo  
 
PR target/49486
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 6e38298..dfc5559 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -58,7 +58,7 @@
(UNSPEC_MUL8UL  46)
(UNSPEC_MULDUL  47)
(UNSPEC_ALIGNDATA   48)
-
+   (UNSPEC_FCMP    49)
(UNSPEC_PDIST   50)
(UNSPEC_EDGE8   51)
(UNSPEC_EDGE8L  52)
@@ -69,11 +69,6 @@
 
(UNSPEC_SP_SET  60)
(UNSPEC_SP_TEST 61)
-
-   (UNSPEC_FCMPLE  70)
-   (UNSPEC_FCMPNE  71)
-   (UNSPEC_FCMPGT  72)
-   (UNSPEC_FCMPEQ  73)
   ])
 
 (define_constants
@@ -8149,83 +8144,18 @@
   "edge32l\t%r1, %r2, %0"
   [(set_attr "type" "edge")])
 
-(define_insn "fcmple16_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-   (unspec:P [(match_operand:V4HI 1 "register_operand" "e")
-  (match_operand:V4HI 2 "register_operand" "e")]
-UNSPEC_FCMPLE))]
-  "TARGET_VIS"
-  "fcmple16\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmple32_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-   (unspec:P [(match_operand:V2SI 1 "register_operand" "e")
-  (match_operand:V2SI 2 "register_operand" "e")]
-UNSPEC_FCMPLE))]
-  "TARGET_VIS"
-  "fcmple32\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpne16_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-   (unspec:P [(match_operand:V4HI 1 "register_operand" "e")
-  (match_operand:V4HI 2 "register_operand" "e")]
-UNSPEC_FCMPNE))]
-  "TARGET_VIS"
-  "fcmpne16\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpne32_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-   (unspec:P [(match_operand:V2SI 1 "register_operand" "e")
-  (match_operand:V2SI 2 "register_operand" "e")]
-UNSPEC_FCMPNE))]
-  "TARGET_VIS"
-  "fcmpne32\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpgt16_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-   (unspec:P [(match_operand:V4HI 1 "register_operand" "e")
-  (match_operand:V4HI 2 "register_operand" "e")]
-UNSPEC_FCMPGT))]
-  "TARGET_VIS"
-  "fcmpgt16\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpgt32_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-   (unspec:P [(match_operand:V2SI 1 "register_operand" "e")
-  (match_operand:V2SI 2 "register_operand" "e")]
-UNSPEC_FCMPGT))]
-  "TARGET_VIS"
-  "fcmpgt32\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpeq16_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-   (unspec:P [(match_operand:V4HI 1 "register_operand" "e")
-  (match_operand:V4HI 2 "register_operand" "e")]
-UNSPEC_FCMPEQ))]
-  "TARGET_VIS"
-  "fcmpeq16\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
+(define_code_iterator gcond [le ne gt eq])
+(define_code_attr gcond_name [(le "le") (ne "ne") (gt "gt") (eq "eq")])
+(define_mode_iterator GCM [V4HI V2SI])
+(define_mode_attr gcm_name [(V4HI "16") (V2SI "32")])
 
-(define_insn "fcmpeq32_vis"
+(define_insn "fcmp<gcond_name><gcm_name>_vis"
   [(set (match_operand:P 0 "register_operand" "=r")
-   (unspec:P [(match_operand:V2SI 1 "register_operand" "e")
-   (match_operand

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-28 Thread Jiangning Liu


> -Original Message-
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: Wednesday, September 28, 2011 5:56 PM
> To: Jiangning Liu
> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
> 
> On Wed, Sep 28, 2011 at 11:40 AM, Jiangning Liu 
> wrote:
> >
> >
> >> -Original Message-
> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> >> Sent: Wednesday, September 28, 2011 5:20 PM
> >> To: Jiangning Liu
> >> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
> >>
> >> On Wed, Sep 28, 2011 at 11:10 AM, Jiangning Liu
> 
> >> wrote:
> >> >
> >> >
> >> >> -Original Message-
> >> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> >> >> Sent: Wednesday, September 28, 2011 4:39 PM
> >> >> To: Jiangning Liu
> >> >> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
> >> >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
> >> >>
> >> >> On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu
> >> 
> >> >> wrote:
> >> >> >
> >> >> >
> >> >> >> -Original Message-
> >> >> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> >> >> >> Sent: Tuesday, September 27, 2011 3:41 PM
> >> >> >> To: Jiangning Liu
> >> >> >> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
> >> >> >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
> >> >> >>
> >> >> >> On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu
> >> >> 
> >> >> >> wrote:
> >> >> >> >> Think of it this way.  What the IR says is there is no
> >> >> >> >> barrier between those moves.  You either have an implicit
> >> >> >> >> barrier (which is what you are proposing) or you have it
> >> >> >> >> explicitly.  I think we all rather have more things explicit
> >> >> >> >> rather than implicit in the IR.  And that has been the
> >> >> >> >> overall feeling for a few years now.
> >> >> >> >>
> >> >> >> >
> >> >> >> > Sorry, I'm afraid I can't agree with you. Instead, I think
> >> >> >> > using a barrier to describe this kind of dependence is itself
> >> >> >> > an implicit method. Please note that this is not a usual data
> >> >> >> > dependence issue: the stack pointer change has no dependence
> >> >> >> > on memory accesses at all.
> >> >> >>
> >> >> >> It is similar to atomic instructions that require being an
> >> >> >> optimization/memory barrier.  Sure it is not a usual data
> >> >> >> dependence (otherwise we would handle it already), so the
> >> >> >> targets have to explicitly express the dependence somehow, for
> >> >> >> which we only have barriers right now.
> >> >> >>
> >> >> >
> >> >> > Richard,
> >> >> >
> >> >> > Thanks for your explanation. It's explicit to the back-end, but
> >> >> > implicit to the scheduler in the middle end: a barrier decides
> >> >> > dependence in the scheduler, yet barriers can be generated from
> >> >> > several different scenarios. It's unsafe and prone to bugs if
> >> >> > any one of the barrier-generating scenarios is missed in a
> >> >> > back-end.
> >> >> >
> >> >> > Between the middle end and the back end, we should have
> >> >> > interfaces that are easy for a back-end to implement. After all,
> >> >> > the middle end alone doesn't constitute a compiler, and vice
> >> >> > versa. A back-end needs the middle end's help to keep its
> >> >> > implementation simple and to reduce the possibility of
> >> >> > introducing bugs.
> >> >> >
> >> >> > Without an explicit hook like the one I'm proposing, back-end
> >> >> > implementers have to know every barrier-generating scenario very
> >> >> > clearly, because there are no hints or comments in the middle
> >> >> > end telling the back-end what the list of those scenarios is.
> >> >> >
> >> >> > Yes, in theory the barrier is a perfect interface for the
> >> >> > scheduler. But from an engineering point of view, I think it's
> >> >> > better to explicitly define an interface that describes the
> >> >> > stack red zone and informs the back-end, or vice versa. Unlike
> >> >> > computers, people easily make mistakes if you don't tell them.
> >> >> > On this bug, the fact is that over the years different back-ends
> >> >> > have made similar mistakes.
> >> >> >
> >> >> > GCC is really a great platform for building new ports, and I
> >> >> > have seen a lot of new back-ends. The current middle end is
> >> >> > unsafe if a port doesn't support a stack red zone and its
> >> >> > back-end doesn't generate a barrier for it. Why can't we state
> >> >> > this explicitly in the compiler code shared between the middle
> >> >> > end and the back end? What if another back-end (new or old) that
> >> >> > doesn't support a stack red zone exposes a similar bug again?
> >> >>
> >> >> There are gazillion things you have to explicitly get right in

[arm-embedded][PR38644] Fix stack red zone.

2011-09-28 Thread Terry Guo
Committed to ARM/embedded-4_6-branch.

2011-09-28  Jiangning Liu  

   PR rtl-optimization/38644
   * config/i386/i386.c (ix86_stack_using_red_zone): Change inline
   to be extern.
   (TARGET_STACK_USING_RED_ZONE): New.
   * config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New.
   (TARGET_STACK_USING_RED_ZONE): New.
   (offset_below_red_zone_p): Change to use new hook
   TARGET_STACK_USING_RED_ZONE.
   * doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New.
   * doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New.
   * sched-deps.c (sched_analyze_1): If the stack pointer is being
   modified and stack red zone is not supported for ports, flush out
   all memory references as they may become invalid if moved across
   the stack adjustment.
   * target.def (stack_using_red_zone): New.
   * testsuite/gcc.target/arm/stack-red-zone.c: New.


Re: [PATCH 3/7] Emit macro expansion related diagnostics

2011-09-28 Thread Jason Merrill

On 09/27/2011 01:52 PM, Dodji Seketeli wrote:

+ Remember we are at the expansion point of MACRO.  Each xI is the
+ location of the Ith token of the replacement-list. Now it gets
+ confusing. The xI is the location of the Ith token of the
+ replacement-list at the macro *definition* point. Not at the
+ macro replacement point. Okay, let's try to explain this below.


This should be yI.


+ The token '+' has two locations, so to speak. One in the context
+ of the macro *expansion* of PLUS in #2 and one in the context of
+ the macro *definition* of PLUS in #1. These two locations are
+ encoded in the latter context, somehow in the xI we are
+ talking about.


The location of '+' in #2 is not encoded in xI or yI; we reach it 
through the expansion point location of the macro.  The location in #1 
is yI (and xI, in this case).



+ xI is roughly the index of the token inside the replacement-list
+ at the expansion point. So for '+', it's index would then be 1


"its"


+ [The index of token '1' would be 0 and the index of token 2 would
+ be 1]. So if '+' is our current xI, it is actualy an x1.


Are we still talking about #1 here?  It looks to me like the index of 
"1" would be 2, the index of "+" would be 4, and the index of token "2" 
would be 6.  I bet PLUS used to just be "A + B", and this section of 
comment didn't get updated when it changed.


Keep changing xI to yI.


+ Now what's the y1 then? Remember, we said macro_locations is an
+ array of pairs (xI,yI). We now know what the xI is, now let's
+ look at the yI.


xI allows us to find where the token was actually written.  If the 
current macro context is also the spelling location of the token (e.g. 
#1 for "+"), then xI is the same as yI, i.e. the source location of that 
token.


If the current macro context is not the spelling location of the token 
(e.g. #0 or #1 for "1"), then xI is the location outside the current 
macro context.  For "1" in #0, this is the location of "1" in #1, which 
is a virtual location.  For "1" in #1, this is the location of "1" in 
#2, which is a source location.



+   * If LRK is set to LRK_MACRO_EXPANSION_POINT
+   ---
+
+   The virtual location is resolved to the locus of the expansion
+   point of the macro.


The first macro expansion point that led to this macro expansion.


+   * If LRK is set to LRK_MACRO_DEFINITION_LOCATION
+   --


The virtual location is resolved to the locus of the token in the 
context of the macro definition.



+   If LOC is the locus of a token that is an argument of a
+   function-like macro [replacing a parameter in the replacement list
+   of the macro] the virtual location is resolved to the locus of the
+   parameter that is replaced, in the context of the definition of the
+   macro.
+
+   If LOC is the locus of a token that is not an argument of a
+   function-like macro, then the function behaves as if LRK was set to
+   LRK_SPELLING_LOCATION.


(and then keep these two paragraphs)


+   Finally, if SPELLING_LOC is not NULL, *RESULTING_LOC is set to the
+   location to which LOC was resolved


SPELLING_LOC doesn't exist anymore.


+   ORIG_LOC is the original location of the token at the definition
+   point of the macro. If you read the extensive comments of struct
+   line_map_macro in line-map.h, this is the xI.
+
+   If the token is part of a macro argument, ORIG_PARM_REPLACEMENT_LOC
+   is the location of the point at which the token (the argument)
+   replaces the macro parameter in the context of the relevant macro
+   definition. If you read the comments of struct line_map_macro in
+   line-map.h, this is the yI.  */
+
+source_location
+linemap_add_macro_token (const struct line_map *map,


ORIG_LOC is the location of the token outside this macro expansion.  If 
the token comes originally from the macro definition, it is the locus 
in the macro definition; otherwise it is a location in the caller of 
this macro expansion (a virtual location if the caller is itself a 
macro expansion, a source location if not).


MACRO_DEFINITION_LOC is the location in the macro definition, either of 
the token itself or of a macro parameter that it replaces.



+/* If LOCATION is the locus of a token that is an argument of a
+   function-like macro M and appears in the expansion of M, return the
+   locus of that argument in the context of the caller of M.  Note
+   that the caller of M is necessarily another macro.


Why is the caller of M necessarily another macro?  In the PLUS example 
above, if we have the location for "1" in #1, won't it give us the 
location of "1" in #2?



  The context of M is a macro definition.


What does this mean?


+/* Return the source line number corresponding to source location
+   LOCATION.  SET is the line map set LOCATION comes from.  If
+   LOCATION is the source location of token that is part

[C++ testcase, committed as obvious] PR 40145

2011-09-28 Thread Paolo Carlini

Hi,

tested x86_64-linux, committed.

Paolo.

//
2011-09-28  Paolo Carlini  

PR c++/40145
* g++.dg/ext/visibility/warn5.C: New.

Index: g++.dg/ext/visibility/warn5.C
===
--- g++.dg/ext/visibility/warn5.C   (revision 0)
+++ g++.dg/ext/visibility/warn5.C   (revision 0)
@@ -0,0 +1,11 @@
+// PR c++/40145
+// { dg-do compile }
+// { dg-require-visibility "" }
+// { dg-options "-fvisibility=hidden" }
+
+struct EditorInternalCommand { };
+
+static void createCommandMap()
+{
+struct CommandEntry { EditorInternalCommand command; };
+}


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Jakub Jelinek
On Wed, Sep 28, 2011 at 06:43:04PM -0400, David Miller wrote:
> From: Mike Stump 
> Date: Wed, 28 Sep 2011 15:19:10 -0700
> 
> > If Android is safe in this respect, then, they can just turn it on,
> > and then force anyone porting software to their platform to `fix'
> > their code.
> 
> They'd have to then know to turn this option off when building the kernel,
> which does use such constructs extensively.
> 
> I think this whole idea has too many downsides to be considered seriously.
> 
> People write programs like this, lots of people.  It's a pervasive
> technique to encode boolean state into the low bits of a pointer or to
> represent special "token" pointers using integers such as "-1".

Or "1".  Just do
grep '\*)[[:blank:]]*1' *.[chS]
to see how often an integer value is stored into a pointer.
And it is not just void * pointers, it is struct cgraph_node *
or struct varpool_node * too and the pointed types there certainly
have higher alignment than 1.
Now repeat the same with google codesearch.

Jakub


Re: [pph] Prepare for mutation detection [2/3] (issue5142049)

2011-09-28 Thread Gabriel Charette
On Wed, Sep 28, 2011 at 5:31 PM, Diego Novillo  wrote:
> On Wed, Sep 28, 2011 at 17:23, Gabriel Charette  wrote:
>> More comments to come on [3/3], for now just a single comment below on
>> this specific patch:
>>
>>> diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
>>> index 0bd4d64..b267833 100644
>>> --- a/gcc/cp/pph-streamer-in.c
>>> +++ b/gcc/cp/pph-streamer-in.c
>>> @@ -439,7 +439,10 @@ pph_in_cxx_binding_1 (pph_stream *stream)
>>>   if (marker == PPH_RECORD_END)
>>>     return NULL;
>>>   else if (pph_is_reference_marker (marker))
>>> -    return (cxx_binding *) pph_cache_get (&stream->cache, image_ix, ix, 
>>> marker);
>>> +    {
>>> +      pph_cache *cache = pph_cache_select (stream, marker, image_ix);
>>> +      return (cxx_binding *) pph_cache_get (cache, ix);
>>> +    }
>>
>> Seems like you replaced the pph_cache_get one liners with these
>> two-liners. Wouldn't a simple inline function be nicer for this?
>
> I call them separately.  Or do you mean a single call that combines
> them for the common case?
>

Yes, that's what I mean: a single call that combines both, since that's
the common usage. I feel there should be as little cache code as
possible at the top of every pph_* function; in particular, every time
a new pph streaming function is added, all that code needs to be
"duplicated", so the less code the better, imo.


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread David Miller
From: Mike Stump 
Date: Wed, 28 Sep 2011 15:19:10 -0700

> If Android is safe in this respect, then, they can just turn it on,
> and then force anyone porting software to their platform to `fix'
> their code.

They'd have to then know to turn this option off when building the kernel,
which does use such constructs extensively.

I think this whole idea has too many downsides to be considered seriously.

People write programs like this, lots of people.  It's a pervasive
technique to encode boolean state into the low bits of a pointer or to
represent special "token" pointers using integers such as "-1".


Re: [PATCH] Add explicit VIS intrinsics for addition and subtraction.

2011-09-28 Thread Eric Botcazou
[Vlad, if you have a few minutes, would you mind having a look at the couple of 
questions at the end of the message?  Thanks in advance].

> No problem.

Here are the results of the investigation.  Pseudo 116 needs to be assigned a 
hard register.  It is used mostly in vector instructions so we would like it 
to be assigned a FP reg, but it is initialized in insn 2:

(insn 2 5 3 2 (set (reg/v:V4HI 116 [ a ])
(reg:V4HI 24 %i0 [ a ])) combined-1.c:7 93 {*movdf_insn_sp32_v9}
 (expr_list:REG_DEAD (reg:V4HI 24 %i0 [ a ])
(nil)))

so it ends up being assigned the (integer) argument register %i0 instead.  It 
used to be assigned a FP reg as expected with the GCC 4.6.x series.


The register class preference discovery is OK:

r116: preferred EXTRA_FP_REGS, alternative GENERAL_OR_EXTRA_FP_REGS, 
allocno GENERAL_OR_EXTRA_FP_REGS
a2 (r116,l0) best EXTRA_FP_REGS, allocno GENERAL_OR_EXTRA_FP_REGS

i.e. EXTRA_FP_REGS is "preferred"/"best".  Then it seems that this preference 
is dropped and only the class of the allocno, GENERAL_OR_EXTRA_FP_REGS, is 
handed down to the coloring stage.  By contrast, in the GCC 4.6 series, the 
cover_class of the allocno is EXTRA_FP_REGS.

The initial cost for %i0 is twice as high (24000) as the cost of FP regs.  But 
then it is reduced by 12000 when process_bb_node_for_hard_reg_moves sees insn 
2 above and then again by 12000 when process_regs_for_copy sees the same insn.
So, in the end, %i0 is given cost 0 and thus beats every other register.  This 
doesn't happen in the GCC 4.6 series because %i0 isn't in the cover_class.

This is at -O1.  At -O2, there is an extra pass at the discovery stage and it 
sets the class of the allocno to EXTRA_FP_REGS, like with the GCC 4.6 series, 
so a simple workaround is

Index: gcc.target/sparc/combined-1.c
===
--- gcc.target/sparc/combined-1.c   (revision 179316)
+++ gcc.target/sparc/combined-1.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -mcpu=ultrasparc -mvis" } */
+/* { dg-options "-O2 -mcpu=ultrasparc -mvis" } */
 typedef short vec16 __attribute__((vector_size(8)));
 typedef int vec32 __attribute__((vector_size(8)));


Finally the couple of questions:

 1. Is it expected that the register class preference be dropped at -O1?

 2. Is it expected that a single insn be processed by 2 different mechanisms 
that independently halve the initial cost of a hard register?


-- 
Eric Botcazou


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Tom de Vries
On 09/24/2011 11:31 AM, Richard Guenther wrote:
> On Tue, Sep 20, 2011 at 1:13 PM, Tom de Vries  wrote:
>> Hi Richard,
>>
>> I have a patch for PR43814. It introduces an option that assumes that 
>> function
>> arguments of pointer type are aligned, and uses that information in
>> tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined.
>>
>> I tested the patch successfully on-by-default on x86_64 and i686 (both gcc 
>> only
>> builds).
>>
>> I also tested the patch on-by-default for ARM (gcc/glibc build). The patch
>> generated wrong code for uselocale.c:
>> ...
>> glibc/locale/locale.h:
>> ...
>> /* This value can be passed to `uselocale' and may be returned by
>>   it. Passing this value to any other function has undefined behavior.  */
>> # define LC_GLOBAL_LOCALE   ((__locale_t) -1L)
>> ...
>> glibc/locale/uselocale.c:
>> ...
>> locale_t
>> __uselocale (locale_t newloc)
>> {
>>  locale_t oldloc = _NL_CURRENT_LOCALE;
>>
>>  if (newloc != NULL)
>>{
>>  const locale_t locobj
>>= newloc == LC_GLOBAL_LOCALE ? &_nl_global_locale : newloc;
>>
>> ...
>> The assumption that function arguments of pointer type are aligned, allowed 
>> the
>> test 'newloc == LC_GLOBAL_LOCALE' to evaluate to false.
>> But the usage of ((__locale_t) -1L) as function argument in uselocale 
>> violates
>> that assumption.
>>
>> Fixing the definition of LC_GLOBAL_LOCALE allowed the gcc tests to run 
>> without
>> regressions for ARM.
>>
>> Furthermore, the patch fixes ipa-sra-2.c and ipa-sra-6.c regressions on ARM,
>> discussed here:
>> - http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00930.html
>> - http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00459.html
>>
>> But, since glibc uses this construct currently, the option is off-by-default 
>> for
>> now.
>>
>> OK for trunk?
> 
> No, I don't like to have an option to control this.  And given the issue
> you spotted it doesn't look like the best idea either.  This area in GCC
> is simply too fragile right now :/
> 
> It would be nice if we could extend IPA-CP to propagate alignment
> information though.
> 
> And at some point devise a reliable way for frontends to communicate
> alignment constraints the middle-end can rely on (well, yes, you could
> argue that's what TYPE_ALIGN is about, and I thought that maybe we
> can unconditionally rely on TYPE_ALIGN for pointer PARM_DECLs
> at least - your example shows we can't).
> 
> In the end I'd probably say the patch is ok without the option (thus
> turned on by default), but if LC_GLOBAL_LOCALE is part of the
> glibc ABI then we clearly can't do this.
> 

I removed the flag; the optimization is now enabled via the aligned
attribute.

Bootstrapped and tested on x86_64 (gcc-only build), and built and
reg-tested on ARM (gcc + glibc build); no issues found.

OK for trunk?

I intend to send a follow-up patch that introduces a target hook
function_pointers_aligned, false by default but enabled by default for
-mandroid. I have asked Google to test their Android codebase with the
optimization enabled by default. Would such a target hook be
acceptable?

> Richard.
> 

Thanks,
- Tom

2011-09-29  Tom de Vries 

PR target/43814
* tree-ssa-ccp.c (get_align_value): New function, factored out of
get_value_from_alignment.
(get_value_from_alignment): Use get_align_value.
(get_value_for_expr): Use get_align_value to handle alignment of
function argument pointers with TYPE_USER_ALIGN.

* gcc/testsuite/gcc.dg/pr43814.c: New test.
* gcc/testsuite/gcc.target/arm/pr43814-2.c: New test.
Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c	(revision 173703)
+++ gcc/tree-cfgcleanup.c	(working copy)
@@ -641,6 +641,552 @@ cleanup_omp_return (basic_block bb)
   return true;
 }
 
+/* Returns true if S contains (I1, I2).  */
+
+static bool
+int_int_splay_lookup (splay_tree s, unsigned int i1, unsigned int i2)
+{
+  splay_tree_node node;
+
+  if (s == NULL)
+return false;
+
+  node = splay_tree_lookup (s, i1);
+  return node && node->value == i2;
+}
+
+/* Attempts to insert (I1, I2) into *S.  Returns true if successful.
+   Allocates *S if necessary.  */
+
+static bool
+int_int_splay_insert (splay_tree *s, unsigned int i1, unsigned int i2)
+{
+  if (*s != NULL)
+{
+  /* Check for existing element, which would otherwise be silently
+	 overwritten by splay_tree_insert.  */
+  if (splay_tree_lookup (*s, i1))
+	return false;
+}
+  else
+*s = splay_tree_new (splay_tree_compare_ints, 0, 0);
+
+  splay_tree_insert (*s, i1, i2);
+  return true;
+}
+
+/* Returns 0 if (NODE->key, NODE->value) is an element of S.  Otherwise,
+   returns 1.  */
+
+static int
+int_int_splay_node_contained_in (splay_tree_node node, void *s)
+{
+  splay_tree_node snode = splay_tree_lookup ((splay_tree)s, node->key);
+  return (!snode || node->value != snode->value) ? 1 : 0;
+}
+
+/* Returns true if all elements of S1 are als

Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Mike Stump
On Sep 20, 2011, at 4:13 AM, Tom de Vries wrote:
> I have a patch for PR43814. It introduces an option that assumes that function
> arguments of pointer type are aligned, and uses that information in
> tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined.

I'm not a huge fan of an option that is very hard to use, and I think
this option would be hard to use.  I'd rather have a port just assert
that no code weird in this way will be compiled; then the front end can
check for weird constant int-to-pointer conversions and complain about
the code if anyone tries it.  If Android is safe in this respect, they
can just turn it on, and then force anyone porting software to their
platform to `fix' their code.  For systems that are clean, and for new
systems, we can recommend setting the option on.  Legacy systems that
don't want to change, or that just want to compile legacy software, can
opt out.


[PING] Re: [PATCH] Fix PR50183

2011-09-28 Thread William J. Schmidt
Hi there,

Ping.  I'm seeking approval for this fix on trunk and 4_6-branch.
Thanks!

Bill

On Tue, 2011-09-13 at 17:55 -0500, William J. Schmidt wrote:
> Greetings,
> 
> The code to build scops (static control parts) for graphite first
> rewrites loops into canonical loop-closed SSA form.  PR50183 identifies
> a scenario where the results do not fulfill all required invariants of
> this form.  In particular, a value defined inside a loop and used
> outside that loop must reach exactly one definition, which must be a
> single-argument PHI node called a close-phi.  When nested loops exist,
> it is possible that, following the rewrite, a definition may reach two
> close-phis.  This patch corrects that problem.
> 
> The problem arises because loops are processed from outside in.  While
> processing a loop, duplicate close-phis are eliminated.  However,
> eliminating duplicate close-phis for an inner loop can sometimes create
> duplicate close-phis for an already-processed outer loop.  This patch
> detects when this may have occurred and repeats the removal of duplicate
> close-phis as necessary.
> 
> The problem was noted on ibm/4_6-branch and 4_6-branch; it is apparently
> latent on trunk.  The same patch can be applied to all three branches.
> 
> Bootstrapped and regression-tested on powerpc64-linux.  OK to commit to
> these three branches?
> 
> Thanks,
> Bill
> 
> 
> 2011-09-13  Bill Schmidt  
> 
>   * graphite-scop-detection.c (make_close_phi_nodes_unique):  New
>   forward declaration.
>   (remove_duplicate_close_phi): Detect and repair creation of
>   duplicate close-phis for a containing loop.
> 
> 
> Index: gcc/graphite-scop-detection.c
> ===
> --- gcc/graphite-scop-detection.c (revision 178829)
> +++ gcc/graphite-scop-detection.c (working copy)
> @@ -30,6 +30,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-pass.h"
>  #include "sese.h"
> 
> +/* Forward declarations.  */
> +static void make_close_phi_nodes_unique (basic_block);
> +
>  #ifdef HAVE_cloog
>  #include "ppl_c.h"
>  #include "graphite-ppl.h"
> @@ -1231,6 +1234,13 @@ remove_duplicate_close_phi (gimple phi, gimple_stm
>   SET_USE (use_p, res);
> 
>update_stmt (use_stmt);
> +  
> +  /* It is possible that we just created a duplicate close-phi
> +  for an already-processed containing loop.  Check for this
> +  case and clean it up.  */
> +  if (gimple_code (use_stmt) == GIMPLE_PHI
> +   && gimple_phi_num_args (use_stmt) == 1)
> + make_close_phi_nodes_unique (gimple_bb (use_stmt));
>  }
> 
>remove_phi_node (gsi, true);
> 
> 




Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Jack Howarth
On Wed, Sep 28, 2011 at 05:33:23PM +0400, Michael Zolotukhin wrote:
> >   It appears that part 1 of the patch wasn't really attached.
> Thanks, resending.

Michael,
Did you bootstrap with --enable-checking=yes? I am seeing the bootstrap
failure...

/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/./prev-gcc/g++ 
-B/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/./prev-gcc/ 
-B/sw/lib/gcc4.7/x86_64-apple-darwin11.2.0/bin/ -nostdinc++ 
-B/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/src/.libs
 
-B/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/libsupc++/.libs
 
-I/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/include/x86_64-apple-darwin11.2.0
 
-I/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/include
 -I/sw/src/fink.build/gcc47-4.7.0-1/gcc-4.7-20110927/libstdc++-v3/libsupc++ 
-L/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/src/.libs
 
-L/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/libsupc++/.libs
 -c   -g -O2 -mdynamic-no-pic -gtoggle -DIN_GCC   -W -Wall -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common  
-DHAVE_CONFIG_H -I. -I. -I../../gcc-4.7-20110927/gcc 
-I../../gcc-4.7-20110927/gcc/. -I../../gcc-4.7-20110927/gcc/../include 
-I../../gcc-4.7-20110927/gcc/../libcpp/include -I/sw/include -I/sw/include  
-I../../gcc-4.7-20110927/gcc/../libdecnumber 
-I../../gcc-4.7-20110927/gcc/../libdecnumber/dpd -I../libdecnumber 
-I/sw/include  -I/sw/include -DCLOOG_INT_GMP -DCLOOG_ORG -I/sw/include 
../../gcc-4.7-20110927/gcc/emit-rtl.c -o emit-rtl.o
../../gcc-4.7-20110927/gcc/emit-rtl.c: In function ‘rtx_def* 
adjust_address_1(rtx, machine_mode, long int, int, int)’:
../../gcc-4.7-20110927/gcc/emit-rtl.c:2060:26: error: unused variable 
‘max_align’ [-Werror=unused-variable]
cc1plus: all warnings being treated as errors

on x86_64-apple-darwin11 with your patches.
  Jack
ps There also seems to be common sections in the memfunc-mid.patch and 
memfunc-be.patch patches.


Re: [C++ Patch] PR 45278

2011-09-28 Thread Jason Merrill

OK.

Jason


Re: [SH] PR 49468 - Integer SI abs

2011-09-28 Thread Kaz Kojima
Oleg Endo  wrote:
> The attached patch and ChangeLog below should fix it.
> I have also added a test case for SI mode abs.

Thanks!  I've applied your patch as revision 179320.

Regards,
kaz


Re: [pph] Prepare for mutation detection [2/3] (issue5142049)

2011-09-28 Thread Diego Novillo
On Wed, Sep 28, 2011 at 17:23, Gabriel Charette  wrote:
> More comments to come on [3/3], for now just a single comment below on
> this specific patch:
>
>> diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
>> index 0bd4d64..b267833 100644
>> --- a/gcc/cp/pph-streamer-in.c
>> +++ b/gcc/cp/pph-streamer-in.c
>> @@ -439,7 +439,10 @@ pph_in_cxx_binding_1 (pph_stream *stream)
>>   if (marker == PPH_RECORD_END)
>>     return NULL;
>>   else if (pph_is_reference_marker (marker))
>> -    return (cxx_binding *) pph_cache_get (&stream->cache, image_ix, ix, 
>> marker);
>> +    {
>> +      pph_cache *cache = pph_cache_select (stream, marker, image_ix);
>> +      return (cxx_binding *) pph_cache_get (cache, ix);
>> +    }
>
> Seems like you replaced the pph_cache_get one liners with these
> two-liners. Wouldn't a simple inline function be nicer for this?

I call them separately.  Or do you mean a single call that combines
them for the common case?


Re: [pph] Prepare for mutation detection [3/3] (issue5139052)

2011-09-28 Thread Gabriel Charette
Very nice!

I really like where this is heading :)!

I think it would be great to instrument this to know how many times we
need to use a PPH_RECORD_MREF, to avoid trashing the cache (i.e.
potentially there are specific cases where only a small field's value
changes and pickling the entire tree again is sub-optimal; if instead
we could detect those cases and simply output the required change
which could then be applied on the reader side it would potentially be
better... it is possible that we haven't been affected by these
specific cases up until now, but that they would all massively result
in PPH_RECORD_MREFs now (the instrumentation would also potentially
help us find some of those tricky mutations, if there are any caused by
things other than overloads, which we aren't aware of yet... just a
thought).

(or maybe even assert that those mutations are indeed overloads, if we
think that's the only time they occur??)

Cheers,
Gab

On Tue, Sep 27, 2011 at 1:03 PM, Diego Novillo  wrote:
>
> This finishes removing constants and builtins out of the cache.
> This time in a slightly more elegant way.
>
> The patch introduces a new version of pph_out_start_record exclusively
> for trees (pph_out_start_tree_record).  If the tree is not cacheable
> then it emits a PPH_RECORD_START_NO_CACHE record to indicate that the
> reader should not bother adding the tree to its cache.
>
> It also changes the semantics of pph_out_start_record.  It now returns
> true when the caller should do nothing else.
>
> We are now signing every DECL and TYPE tree we see, but doing nothing
> with this signature.  In the final patch this will change so we only
> sign DECLs/TYPEs after reading, if we are also generating a PPH image
> (pure readers do not need to care about mutations).
>
> Tested on x86_64.  Committed to branch.
>
>
>        * pph-streamer-in.c (pph_read_namespace_tree): Do not insert
>        constants in the cache.
>        * pph-streamer-out.c (pph_cache_should_handle): New.
>        (pph_out_start_ref_record): Factor out of ...
>        (pph_out_start_record): ... here.
>        Change return value meaning; true means 'all done'; false
>        means 'caller should write data out'.  Update all callers.
>        (pph_out_start_tree_record):  New.
>        (pph_write_builtin): Remove.  Update all users.
>        (pph_write_namespace_tree): Call pph_out_start_tree_record.
>        * pph-streamer.c (pph_cache_insert_at): Assert that we are
>        not trying to insert the same data more than once.
>        * pph-streamer.h (enum pph_record_marker): Add new value
>        PPH_RECORD_START_NO_CACHE.
>        (pph_in_record_marker): Handle it.
>
> diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
> index b267833..8e7c772 100644
> --- a/gcc/cp/pph-streamer-in.c
> +++ b/gcc/cp/pph-streamer-in.c
> @@ -2137,7 +2137,6 @@ pph_read_namespace_tree (pph_stream *stream, tree 
> enclosing_namespace)
>       /* For integer constants we only need the type and its hi/low
>         words.  */
>       expr = streamer_read_integer_cst (ib, data_in);
> -      pph_cache_insert_at (&stream->cache, expr, ix);
>     }
>   else
>     {
> diff --git a/gcc/cp/pph-streamer-out.c b/gcc/cp/pph-streamer-out.c
> index 7157275..2f9fcae 100644
> --- a/gcc/cp/pph-streamer-out.c
> +++ b/gcc/cp/pph-streamer-out.c
> @@ -182,14 +182,56 @@ pph_flush_buffers (pph_stream *stream)
>  }
>
>
> -/* Start a new record in STREAM for data in DATA.  If DATA is NULL
> -   write an end-of-record marker and return false.  If DATA is not NULL
> -   and did not exist in the pickle cache, add it, write a
> -   start-of-record marker and return true.  If DATA existed in the
> -   cache, write a shared-record marker and return false.  */
> +/* Return true if tree node T should be added to the pickle cache.  */
>
>  static inline bool
> -pph_out_start_record (pph_stream *stream, void *data)
> +pph_cache_should_handle (tree t)
> +{
> +  if (t)
> +    {
> +      if (TREE_CODE (t) == INTEGER_CST)
> +        {
> +          /* With the exception of some special constants that are
> +            pointer-compared everywhere (e.g., integer_zero_node), we
> +            do not want constants in the cache.  Their physical
> +            representation is smaller than a cache reference record
> +            and they can also trick the cache in similar ways to
> +            builtins (read below).  */
> +          return false;
> +        }
> +      else if (streamer_handle_as_builtin_p (t))
> +        {
> +          /* We do not want builtins in the cache for two reasons:
> +
> +            - They never need pickling.  Much like pre-loaded tree
> +              nodes, builtins are pre-built by the compiler and
> +              accessible with their class and code.
> +
> +            - They can trick the cache.  When possible, user-provided
> +              functions are generally replaced by their builtin
> +              counterparts (e.g., strcmp, malloc, etc).  When this
> +       

Re: [pph] Prepare for mutation detection [2/3] (issue5142049)

2011-09-28 Thread Gabriel Charette
More comments to come on [3/3], for now just a single comment below on
this specific patch:

> diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
> index 0bd4d64..b267833 100644
> --- a/gcc/cp/pph-streamer-in.c
> +++ b/gcc/cp/pph-streamer-in.c
> @@ -439,7 +439,10 @@ pph_in_cxx_binding_1 (pph_stream *stream)
>   if (marker == PPH_RECORD_END)
>     return NULL;
>   else if (pph_is_reference_marker (marker))
> -    return (cxx_binding *) pph_cache_get (&stream->cache, image_ix, ix, 
> marker);
> +    {
> +      pph_cache *cache = pph_cache_select (stream, marker, image_ix);
> +      return (cxx_binding *) pph_cache_get (cache, ix);
> +    }

Seems like you replaced the pph_cache_get one liners with these
two-liners. Wouldn't a simple inline function be nicer for this?

Gab


Re: [C++ Patch] PR 45278

2011-09-28 Thread Paolo Carlini

On 09/28/2011 11:02 PM, Paolo Carlini wrote:

Hi,

here submitter remarks that, inconsistently with the documentation, 
with -Wextra the C++ front-end doesn't warn for ordered comparison of 
pointer with integer zero. Thus I simply adapted the warning in the C 
front-end. Is the patch Ok?

Better attaching the patch ;)

Paolo.
/cp
2011-09-28  Paolo Carlini  

PR c++/45278
* typeck.c (cp_build_binary_op): With -Wextra, warn for ordered
comparison of pointer with zero.

/testsuite
2011-09-28  Paolo Carlini  

PR c++/45278
* g++.dg/warn/Wextra-3.C: New.

Index: testsuite/g++.dg/warn/Wextra-3.C
===
--- testsuite/g++.dg/warn/Wextra-3.C(revision 0)
+++ testsuite/g++.dg/warn/Wextra-3.C(revision 0)
@@ -0,0 +1,9 @@
+// PR c++/45278
+// { dg-options "-Wextra" } 
+
+extern void* p;
+
+int f1() { return ( p <  0 ? 1 : 0 ); } // { dg-warning "ordered comparison" }
+int f2() { return ( p <= 0 ? 1 : 0 ); } // { dg-warning "ordered comparison" }
+int f3() { return ( p >  0 ? 1 : 0 ); } // { dg-warning "ordered comparison" }
+int f4() { return ( p >= 0 ? 1 : 0 ); } // { dg-warning "ordered comparison" }
Index: cp/typeck.c
===
--- cp/typeck.c (revision 179319)
+++ cp/typeck.c (working copy)
@@ -4189,9 +4189,19 @@ cp_build_binary_op (location_t location,
result_type = composite_pointer_type (type0, type1, op0, op1,
  CPO_COMPARISON, complain);
   else if (code0 == POINTER_TYPE && null_ptr_cst_p (op1))
-   result_type = type0;
+   {
+ result_type = type0;
+ if (extra_warnings && (complain & tf_warning))
+   warning (OPT_Wextra,
+"ordered comparison of pointer with integer zero");
+   }
   else if (code1 == POINTER_TYPE && null_ptr_cst_p (op0))
-   result_type = type1;
+   {
+ result_type = type1;
+ if (extra_warnings && (complain & tf_warning))
+   warning (OPT_Wextra,
+"ordered comparison of pointer with integer zero");
+   }
   else if (null_ptr_cst_p (op0) && null_ptr_cst_p (op1))
/* One of the operands must be of nullptr_t type.  */
 result_type = TREE_TYPE (nullptr_node);


Re: [PATCH 2/3] Use urandom to get random seed

2011-09-28 Thread Andi Kleen
> That depends upon the signal. If we got SIGCHLD or SIGWINCH, the call to read 
> probably
> gets EINTR, but the signal is ignored unless explicitly handled.

ignored signals do not cause EINTR no.

And I don't think either gcc.c nor toplev.c can get SIGCHLD at this point.

-Andi


[C++ Patch] PR 45278

2011-09-28 Thread Paolo Carlini

Hi,

here submitter remarks that, inconsistently with the documentation, with 
-Wextra the C++ front-end doesn't warn for ordered comparison of pointer 
with integer zero. Thus I simply adapted the warning in the C front-end. 
Is the patch Ok?


Tested x86_64-linux.

Paolo.

/


Re: [PATCH 2/7] Generate virtual locations for tokens

2011-09-28 Thread Jason Merrill

On 09/28/2011 03:15 PM, Dodji Seketeli wrote:

+set_arg_token (macro_arg *arg, const cpp_token *token,
+  source_location location, size_t index,
+  enum macro_arg_token_kind kind,
+  bool track_macro_exp_p)
+{
+  const cpp_token **token_ptr;
+  source_location *loc = NULL;
+
+  token_ptr =
+arg_token_ptr_at (arg, index, kind,
+ track_macro_exp_p ? &loc : NULL);

...

+  if (virt_location)
+{
+  if (kind == MACRO_ARG_TOKEN_NORMAL)
+   *virt_location = &arg->virt_locs[index];
+  else if (kind == MACRO_ARG_TOKEN_EXPANDED)
+   *virt_location = &arg->expanded_virt_locs[index];
+  else if (kind == MACRO_ARG_TOKEN_STRINGIFIED)
+   *virt_location =
+ (source_location *) &tokens_ptr[index]->src_loc;
+}


If we make this block conditional on arg->virt_locs being set, then we 
can pass &loc in unconditionally and don't need the track_macro_exp_p 
flag to set_arg_token.



Note that the gotos were put there also because we needed to get out
of the for (;;) loop, similarly to what the previous return statements
were doing; so doing this doesn't get rid of the gotos.


Can't you use break instead?

Jason


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Jakub Jelinek
On Wed, Sep 28, 2011 at 09:56:27PM +0200, Richard Guenther wrote:
> There is nothing like "very likely aligned" ;)  Note that what is new is

On non-strict aligned targets there is no reason not to have something like
"very likely aligned".  You would expand stores/loads as if it was aligned
in that case, and if it isn't, all you'd need to ensure from that is that
you don't derive properties about the pointer value from the "likely
aligned" info, only from alignment.

"Very likely aligned" is interesting to the vectorizer too, if it is very
likely something is sufficiently aligned, the vectorizer could decide to
assume the alignment in the vectorized loop and add the check for the
alignment to the loop guards.  In the likely case the vectorized loop would
be used (if other guards were true too), in the unlikely case it is
unaligned it would just use a slower loop.

> that we now no longer assume alignment by default (we did in the past)
> and that we derive properties about the pointer _value_ from alignment.
> 
> I think we can derive pointer values when we see dereferences, the
> code wouldn't be portable to strict-alignment targets otherwise.  We

But any references?  If you have
int foo (int *p)
{
  memcpy (p, "a", 1);
  return ((uintptr_t) p & 3) == 0;
}
then even if p isn't aligned, this could work even on strict aligned
targets.

Anyway, the arbitrary value in a pointer thing is much more important then
the rest, so having the dominating dereference test is very important.

Jakub
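
Jakub's foo example above can be exercised directly. The sketch below assumes a typical non-strict-alignment host where int is 4-byte aligned; note that forming a misaligned int * is formally undefined in ISO C, but the byte-granular memcpy makes the call work in practice, which is exactly the point — an optimizer that assumed 4-byte alignment from the int * parameter would wrongly fold the low-bits test to 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A byte-granular memcpy through an int * followed by an alignment
   test on the pointer's value.  Deriving alignment from the parameter
   type alone would let the compiler fold the test, which is visibly
   wrong for a misaligned argument.  */
static int foo (int *p)
{
  memcpy (p, "a", 1);
  return ((uintptr_t) p & 3) == 0;
}
```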


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Richard Guenther
On Wed, Sep 28, 2011 at 9:13 PM, Jakub Jelinek  wrote:
> On Wed, Sep 28, 2011 at 03:00:09PM -0400, David Miller wrote:
>> From: Maxim Kuvyrkov 
>> Date: Thu, 29 Sep 2011 07:58:01 +1300
>>
>> > To summarize, your opinion seems to be to not enable the
>> > optimization by default, correct?
>>
>> Yes, and I don't think we ever could do so.
>>
>> BTW, another common paradigm is using the "always clear" bits of a
>> pointer to encode state bits.
>
> Yeah, I think it is a bad optimization that is going to break lots of code
> and the glibc patch is definitely unacceptable (and not doing what you think
> it is doing, your change didn't do anything at all, as the aligned attribute
> applies to the pointer type and not to the pointed type).
>
> The alignment of the argument pointer can be IMHO only hard assumed
> if there is a load or store using that type dominating the spot where you
> are checking it, and the target is strict alignment target.
> Then we can both assume that alignment, optimize away tests on the low bits
> etc.
> Otherwise, what you could do is just use the pointed type's alignment as
> a hint, likely alignment, e.g. on non-strict alignment target you could
> expand code optimistically assuming that it is very likely aligned, but
> still shouldn't optimize away low bits tests.

There is nothing like "very likely aligned" ;)  Note that what is new is
that we now no longer assume alignment by default (we did in the past)
and that we derive properties about the pointer _value_ from alignment.

I think we can derive pointer values when we see dereferences, the
code wouldn't be portable to strict-alignment targets otherwise.  We
can offer a -fno-strict-alignment option to disable deriving alignment
from types.

Richard.

>        Jakub
>


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Jakub Jelinek
On Wed, Sep 28, 2011 at 03:00:09PM -0400, David Miller wrote:
> From: Maxim Kuvyrkov 
> Date: Thu, 29 Sep 2011 07:58:01 +1300
> 
> > To summarize, your opinion seems to be to not enable the
> > optimization by default, correct?
> 
> Yes, and I don't think we ever could do so.
> 
> BTW, another common paradigm is using the "always clear" bits of a
> pointer to encode state bits.

Yeah, I think it is a bad optimization that is going to break lots of code
and the glibc patch is definitely unacceptable (and not doing what you think
it is doing, your change didn't do anything at all, as the aligned attribute
applies to the pointer type and not to the pointed type).

The alignment of the argument pointer can be IMHO only hard assumed
if there is a load or store using that type dominating the spot where you
are checking it, and the target is strict alignment target.
Then we can both assume that alignment, optimize away tests on the low bits
etc.
Otherwise, what you could do is just use the pointed type's alignment as
a hint, likely alignment, e.g. on non-strict alignment target you could
expand code optimistically assuming that it is very likely aligned, but
still shouldn't optimize away low bits tests.

Jakub


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Jeff Law

On 09/28/11 13:00, David Miller wrote:
> From: Maxim Kuvyrkov  Date: Thu, 29 Sep
> 2011 07:58:01 +1300
> 
>> To summarize, your opinion seems to be to not enable the 
>> optimization by default, correct?
> 
> Yes, and I don't think we ever could do so.
> 
> BTW, another common paradigm is using the "always clear" bits of a 
> pointer to encode state bits.
The PA (and IA64 IIRC) store tidbits of info in the low order (always
zero) bits of function addresses.  May not be relevant for the
discussion at hand, but figured I'd mention it.

jeff


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread David Miller
From: Maxim Kuvyrkov 
Date: Thu, 29 Sep 2011 07:58:01 +1300

> To summarize, your opinion seems to be to not enable the
> optimization by default, correct?

Yes, and I don't think we ever could do so.

BTW, another common paradigm is using the "always clear" bits of a
pointer to encode state bits.
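
The "always clear" bits paradigm can be sketched minimally. This assumes pointers to objects aligned to at least 2 bytes, so the low bit is free to carry a flag — precisely the kind of value an alignment-deriving optimizer must not treat as a plain address.

```c
#include <assert.h>
#include <stdint.h>

/* Stash a flag in the low bit of a pointer to a >=2-byte-aligned
   object.  The names here are illustrative, not from any real API.  */
static void *tag_set (void *p, int flag)
{
  return (void *) ((uintptr_t) p | (uintptr_t) (flag & 1));
}

static int tag_get (void *p)
{
  return (int) ((uintptr_t) p & 1);
}

static void *tag_strip (void *p)
{
  return (void *) ((uintptr_t) p & ~(uintptr_t) 1);
}
```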



Re: Scalar vector binary operation

2011-09-28 Thread Artem Shinkarov
Hi

In the attachment there is a patch with the changes. I don't have
rights to commit to the cvs.

If we write about vector-related stuff, maybe it would make sense to
mention that from now on we can do vector shifts and we can index
vectors. These are changes from 4.6 (I guess), but they were never
mentioned in any changes.


Thanks,
Artem.


On Wed, Sep 28, 2011 at 3:46 PM, Ian Lance Taylor  wrote:
> Artem Shinkarov  writes:
>
>> I can try to put a description in the document. I am not sure that I
>> have rights to commit to the svn, but at least I can try to write the
>> text.
>>
>> There are also pending patches for vector-comparison (almost
>> submitted) and vector shuffling (still under discussion), but I hope
>> to finish both of them until the new release is coming.
>>
>> Could you point out where the cnhanges.html is located in svn?
>
> It's actually still in CVS, at gcc.gnu.org:/cvs/gcc.
>
> Ian
>
Index: htdocs/gcc-4.7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.38
diff -u -r1.38 changes.html
--- htdocs/gcc-4.7/changes.html 27 Sep 2011 18:42:56 -  1.38
+++ htdocs/gcc-4.7/changes.html 28 Sep 2011 18:52:33 -
@@ -150,6 +150,18 @@
   through which the compiler can be hinted about pointer alignment
   and can use it to improve generated code.
   
+  When a binary operation is performed on vector types and one of the
+  operands is a uniform vector, the vector can be replaced with its
+  generating element. For example:
+  
+typedef int v4si __attribute__ ((vector_size (16)));
+v4si res, a = {1,2,3,4};
+int x;
+
+res = 2 + a;  /* means {2,2,2,2} + a  */
+res = a - x;  /* means a - {x,x,x,x}  */
+  
+  
 
 
 C++
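
The changes.html example above compiles as-is with the GCC vector extensions (clang accepts the same syntax); here it is as a small self-contained sketch:

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Scalar-vector binary operations: the scalar operand is implicitly
   widened to a uniform vector (a GCC vector extension).  */
static v4si add_two (v4si a)
{
  return 2 + a;              /* means {2,2,2,2} + a */
}

static v4si sub_x (v4si a, int x)
{
  return a - x;              /* means a - {x,x,x,x} */
}
```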


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Maxim Kuvyrkov
David,

To summarize, your opinion seems to be to not enable the optimization by 
default, correct?

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics


On 29/09/2011, at 7:48 AM, David Miller wrote:

> From: Maxim Kuvyrkov 
> Date: Thu, 29 Sep 2011 07:45:17 +1300
> 
>> OK.  Do you have an alternative suggestion that would fix non-portable use 
>> of locale_t?
> 
> Don't optimize something that is invalidated by a quite common practice?
> 
> What about people who encode invalid pointers with "0xdeadbeef", do we need
> to audit every source tree that does this too?  This invalidates the
> optimization's preconditions as well.
> 
> You're not going to eradicate all the code in the world which uses unaligned
> constants to encode pointers to make them have special meanings in certain
> situations.
> 
> We use the "-1" thing in the Linux kernel too I believe.  I'd go so far as
> to say this kind of thing is pervasive.
> 
> 



Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread David Miller
From: Maxim Kuvyrkov 
Date: Thu, 29 Sep 2011 07:45:17 +1300

> OK.  Do you have an alternative suggestion that would fix non-portable use of 
> locale_t?

Don't optimize something that is invalidated by a quite common practice?

What about people who encode invalid pointers with "0xdeadbeef", do we need
to audit every source tree that does this too?  This invalidates the
optimization's preconditions as well.

You're not going to eradicate all the code in the world which uses unaligned
constants to encode pointers to make them have special meanings in certain
situations.

We use the "-1" thing in the Linux kernel too I believe.  I'd go so far as
to say this kind of thing is pervasive.




Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Maxim Kuvyrkov
On 29/09/2011, at 7:41 AM, David Miller wrote:

> From: Maxim Kuvyrkov 
> Date: Thu, 29 Sep 2011 07:40:55 +1300
> 
>> On 29/09/2011, at 7:35 AM, David Miller wrote:
>> 
>>> From: Maxim Kuvyrkov 
>>> Date: Thu, 29 Sep 2011 07:29:12 +1300
>>> 
 GLIBC patch to fix locale_t definition is attached.
>>> 
>>> Isn't this going to result in byte loads being used to dereference
>>> all locale_t pointers on targets like sparc and mips?
>> 
>> Yes, that's the price for binary compatibility for comparing a pointer to 
>> -1.  I just hope that no common program has locale functions on its critical 
>> path.
> 
> I personally don't think this is acceptable, critical path or not.

OK.  Do you have an alternative suggestion that would fix non-portable use of 
locale_t?

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread David Miller
From: Maxim Kuvyrkov 
Date: Thu, 29 Sep 2011 07:40:55 +1300

> On 29/09/2011, at 7:35 AM, David Miller wrote:
> 
>> From: Maxim Kuvyrkov 
>> Date: Thu, 29 Sep 2011 07:29:12 +1300
>> 
>>> GLIBC patch to fix locale_t definition is attached.
>> 
>> Isn't this going to result in byte loads being used to dereference
>> all locale_t pointers on targets like sparc and mips?
> 
> Yes, that's the price for binary compatibility for comparing a pointer to -1. 
>  I just hope that no common program has locale functions on its critical path.

I personally don't think this is acceptable, critical path or not.


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Maxim Kuvyrkov
On 29/09/2011, at 7:35 AM, David Miller wrote:

> From: Maxim Kuvyrkov 
> Date: Thu, 29 Sep 2011 07:29:12 +1300
> 
>> GLIBC patch to fix locale_t definition is attached.
> 
> Isn't this going to result in byte loads being used to dereference
> all locale_t pointers on targets like sparc and mips?

Yes, that's the price for binary compatibility for comparing a pointer to -1.  
I just hope that no common program has locale functions on its critical path.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics




Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread David Miller
From: Maxim Kuvyrkov 
Date: Thu, 29 Sep 2011 07:29:12 +1300

> GLIBC patch to fix locale_t definition is attached.

Isn't this going to result in byte loads being used to dereference
all locale_t pointers on targets like sparc and mips?


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Maxim Kuvyrkov
On 24/09/2011, at 9:40 PM, Jakub Jelinek wrote:

> On Sat, Sep 24, 2011 at 11:31:25AM +0200, Richard Guenther wrote:
>> In the end I'd probably say the patch is ok without the option (thus
>> turned on by default), but if LC_GLOBAL_LOCALE is part of the
>> glibc ABI then we clearly can't do this.
> 
> Yes, LC_GLOBAL_LOCALE is part of glibc ABI.  I guess we could only assume
> the alignment if the pointer is actually dereferenced on the statement
> that checks the ABI or in some stmt that dominates the spot where you want
> to check the alignment.  It is IMHO quite common to pass arbitrary values
> in pointer types, then cast them back or just compare.

I can relate to camps of both GCC developers and GLIBC developers, and I 
understand the benefits and liabilities of Tom's optimization.  Ultimately, the 
problem we need to solve is to find a way to manage non-conforming code with a 
compiler that tries to squeeze out as much performance from valid code as 
possible.

I think Tom's suggestion to have an option (either enabled or disabled by 
default) is a very good solution given the circumstances.  I think we should 
document the option as a transitional measure designed to give time to GLIBC 
and others to catch up, and obsolete it in the next release.  GLIBC patch to 
fix locale_t definition is attached.

In this submission Tom is being punished for his thoroughness in testing the 
optimization.  Let this be a lesson to all of us to steer clear of GLIBC.  
[Kidding.]

We had similar discussions several times already, and it seems the guiding 
principle for whether to enable a new optimization that may break undefined 
code patterns is to balance the expected performance benefits against how 
frequently the offending code pattern is used.  Returning to the case at 
hand: Is there code comparing a pointer to an integer?  Yes.  Is it common?  
Comparing to zero, absolutely; to a non-zero value... err, probably not that much.

The performance benefits from the optimization are quite significant.  Pointer 
alignment has tremendous effect on expanding __builtin_{mem,str}* functions.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



fsf-glibc-locale_t-align.patch
Description: Binary data
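
The idiom under discussion is that glibc defines LC_GLOBAL_LOCALE as a -1 "pointer" value, deliberately unaligned. A self-contained sketch of the pattern (all names here are illustrative stand-ins, not glibc's):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for glibc's locale_t / LC_GLOBAL_LOCALE pattern: a sentinel
   pointer value of -1.  An optimizer that assumes such a parameter is
   aligned could fold the comparison away and break this code.  */
struct fake_locale_data;
typedef struct fake_locale_data *fake_locale_t;
#define FAKE_GLOBAL_LOCALE ((fake_locale_t) -1L)

static int is_global_locale (fake_locale_t loc)
{
  return loc == FAKE_GLOBAL_LOCALE;
}
```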


Re: [RFA] Add libiberty/argv.c:countargv

2011-09-28 Thread Ian Lance Taylor
On Tue, Sep 20, 2011 at 11:23 AM, Doug Evans  wrote:
>
> 2011-09-20  Doug Evans  
>
>        include/
>        * libiberty.h (countargv): Declare.
>
>        libiberty/
>        * argv.c (countargv): New function.


> +  for (argc = 0; argv[argc] != NULL; argc++);

Please write the semicolon on the next line.  I think it is too easy
to get confused when written that way.

  for (argc = 0; argv[argc] != NULL; argc++)
;

This is OK with that change.

Thanks.

Ian


Re: [RFA] timeval-utils.c for libiberty

2011-09-28 Thread Ian Lance Taylor
On Mon, Sep 19, 2011 at 5:52 PM, Doug Evans  wrote:
>
> 2011-09-19  Doug Evans  
>
>        include/
>        * timeval-utils.h: New file.
>
>        libiberty/
>        * timeval-utils.c: New file.
>        * Makefile.in (CFILES): Add it.
>        (REQUIRED_OFILES): Add timeval-utils.$(objext).
>        (INSTALLED_HEADERS): Add timeval-utils.h.
>        (timeval-utils.$(objext)): Add rule.

This is OK.

Thanks.

Ian


Re: [PATCH 2/3] Use urandom to get random seed

2011-09-28 Thread Basile Starynkevitch
On Wed, 28 Sep 2011 16:56:32 +0200
Andi Kleen  wrote:

> > I suppose we might get interrupted before anything is read and
> > read can return with -1 (I suppose partial reads are quite unlikely
> > though)?  Thus, don't we need the usual EINTR loop?
> 
> When we get interrupted gcc will die. I don't think gcc handles any
> asynchronous signals, right?


That depends upon the signal. If we got SIGCHLD or SIGWINCH, the call to read 
probably
gets EINTR, but the signal is ignored unless explicitly handled.

So I would suggest testing for EINTR.

Besides, perhaps some plugins could install their signal handlers

Regards.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***


Re: [RFA/arm] Fix gcc.target/arm/pr42835.c testcase

2011-09-28 Thread Ramana Radhakrishnan
> gcc/testsuite/ChangeLog:
>
> 2011-09-28  Matthew Gretton-Dann  
>
>        * gcc.target/arm/pr42835.c: Add -fno-tree-tail-merge.

This is OK.

Ramana


[RFA/arm] Fix gcc.target/arm/pr42835.c testcase

2011-09-28 Thread Matthew Gretton-Dann

All,

gcc.target/arm/pr42835.c started failing as a result of the following
change, which added tree-tail merging:
  http://gcc.gnu.org/viewcvs?view=revision&revision=179275

The behaviour of the testcase with tree-tail merging is correct, but not 
what the test expects.


The attached patch adds -fno-tree-tail-merge to the test options.

Tested arm-none-eabi.

Can someone review and approve please?

Thanks,

Matt

gcc/testsuite/ChangeLog:

2011-09-28  Matthew Gretton-Dann  

* gcc.target/arm/pr42835.c: Add -fno-tree-tail-merge.

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd

diff --git a/gcc/testsuite/gcc.target/arm/pr42835.c 
b/gcc/testsuite/gcc.target/arm/pr42835.c
index 71c51eb..867dd02 100644
--- a/gcc/testsuite/gcc.target/arm/pr42835.c
+++ b/gcc/testsuite/gcc.target/arm/pr42835.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mthumb -Os" }  */
+/* { dg-options "-mthumb -Os -fno-tree-tail-merge" }  */
 /* { dg-require-effective-target arm_thumb2_ok } */
 
 int foo(int *p, int i)

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Andi Kleen
On Wed, Sep 28, 2011 at 06:27:11PM +0200, Andi Kleen wrote:
> > There is no separate cost-table for Nehalem or SandyBridge - however,
> > I tuned generic32 and generic64 tables, that should improve
> > performance on modern processors. In old version REP-MOV was used - it
> 
> The recommended heuristics have changed in Nehalem and Sandy-Bridge
> over earlier Intel CPUs.

Sorry what I meant is that it would be bad if -mtune=corei7(-avx)? was
slower than generic.

Adding new tables shouldn't be very difficult, even if they are the same
as generic.

-Andi


Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Andi Kleen
> There is no separate cost-table for Nehalem or SandyBridge - however,
> I tuned generic32 and generic64 tables, that should improve
> performance on modern processors. In old version REP-MOV was used - it

The recommended heuristics have changed in Nehalem and Sandy-Bridge
over earlier Intel CPUs.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Tom de Vries
On 09/28/2011 03:57 PM, Carrot Wei wrote:
> Hi Tom
> 
> What's the behavior of your patch to the following case
> 
> typedef int int_unaligned __attribute__((aligned(1)));
> int foo (int_unaligned *p)
> {
>   return *p;
> }
> 

I modified the example slightly:

test.c:
...
typedef int __attribute__((aligned(2))) int_unaligned;
int foo (int_unaligned *p)
{
  return *(p+1);
}
...

test.c.023t.ccp1:
...
  # PT = anything
  # ALIGN = 2, MISALIGN = 0
  D.2723_2 = pD.1604_1(D) + 4;
...

Thanks,
- Tom

> thanks
> Carrot
> 
> On Tue, Sep 20, 2011 at 7:13 PM, Tom de Vries  wrote:
>> Hi Richard,
>>
>> I have a patch for PR43814. It introduces an option that assumes that 
>> function
>> arguments of pointer type are aligned, and uses that information in
>> tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined.
>>
>> I tested the patch successfully on-by-default on x86_64 and i686 (both gcc 
>> only
>> builds).
>>
>> I also tested the patch on-by-default for ARM (gcc/glibc build). The patch
>> generated wrong code for uselocale.c:
>> ...
>> glibc/locale/locale.h:
>> ...
>> /* This value can be passed to `uselocale' and may be returned by
>>   it. Passing this value to any other function has undefined behavior.  */
>> # define LC_GLOBAL_LOCALE   ((__locale_t) -1L)
>> ...
>> glibc/locale/uselocale.c:
>> ...
>> locale_t
>> __uselocale (locale_t newloc)
>> {
>>  locale_t oldloc = _NL_CURRENT_LOCALE;
>>
>>  if (newloc != NULL)
>>{
>>  const locale_t locobj
>>= newloc == LC_GLOBAL_LOCALE ? &_nl_global_locale : newloc;
>>
>> ...
>> The assumption that function arguments of pointer type are aligned, allowed 
>> the
>> test 'newloc == LC_GLOBAL_LOCALE' to evaluate to false.
>> But the usage of ((__locale_t) -1L) as function argument in uselocale 
>> violates
>> that assumption.
>>
>> Fixing the definition of LC_GLOBAL_LOCALE allowed the gcc tests to run 
>> without
>> regressions for ARM.
>>
>> Furthermore, the patch fixes ipa-sra-2.c and ipa-sra-6.c regressions on ARM,
>> discussed here:
>> - http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00930.html
>> - http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00459.html
>>
>> But, since glibc uses this construct currently, the option is off-by-default 
>> for
>> now.
>>
>> OK for trunk?
>>
>> Thanks,
>> - Tom
>>
>> 2011-09-20  Tom de Vries 
>>
>>PR target/43814
>>* tree-ssa-ccp.c (get_align_value): New function, factored out of
>>get_value_from_alignment.
>>(get_value_from_alignment): Use get_align_value.
>>(get_value_for_expr): Use get_align_value to handle alignment of
>>function argument pointers.
>>* common.opt (faligned-pointer-argument): New option.
>>* doc/invoke.texi (Optimization Options): Add
>>-faligned-pointer-argument.
>>(-faligned-pointer-argument): New item.
>>
>>* gcc/testsuite/gcc.dg/pr43814.c: New test.
>>* gcc/testsuite/gcc.target/arm/pr43814-2.c: New test.
>>



[RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-09-28 Thread Sameera Deshpande
Hi!

This patch generates Thumb2 epilogues in RTL form.

The work involves defining new functions, predicates and patterns along with
a few changes in existing code:
* The load_multiple_operation predicate was found to be too restrictive for
integer loads as it required consecutive destination regs, so this
restriction was lifted.
* Variations of load_multiple_operation were required to handle cases 
   - where SP must be the base register 
   - where FP values were being loaded (which do require consecutive
destination registers)
   - where PC can be in register-list (which requires return pattern along
with register loads).
  Hence, the common code was factored out into a new function in arm.c and
parameterised to show 
   - whether consecutive destination regs are needed
   - the data type being loaded 
   - whether the base register has to be SP
   - whether PC is in register-list

The patch is tested with arm-eabi with no regressions.

ChangeLog:

2011-09-28  Ian Bolton 
Sameera Deshpande  
   
   * config/arm/arm-protos.h (load_multiple_operation_p): New
declaration.
 (thumb2_expand_epilogue): Likewise.
 (thumb2_output_return): Likewise.
 (thumb2_expand_return): Likewise.
 (thumb_unexpanded_epilogue): Rename to... 
 (thumb1_unexpanded_epilogue): ...this.
   * config/arm/arm.c (load_multiple_operation_p): New function. 
 (thumb2_emit_multi_reg_pop): Likewise.
 (thumb2_emit_vfp_multi_reg_pop): Likewise.
 (thumb2_expand_return): Likewise. 
 (thumb2_expand_epilogue): Likewise. 
 (thumb2_output_return): Likewise.
 (thumb_unexpanded_epilogue): Rename to...
 (thumb1_unexpanded_epilogue): ...this.
   * config/arm/arm.md (pop_multiple_with_stack_update): New pattern. 
 (pop_multiple_with_stack_update_and_return): Likewise.
 (thumb2_ldr_with_return): Likewise.
 (floating_point_pop_multiple_with_stack_update): Likewise.
 (return): Update condition and code for pattern.
 (arm_return): Likewise.
 (epilogue_insns): Likewise.
   * config/arm/predicates.md (load_multiple_operation): Update
predicate.
 (load_multiple_operation_stack_and_return): New predicate. 
 (load_multiple_operation_stack): Likewise.
 (load_multiple_operation_stack_fp): Likewise.
   * config/arm/thumb2.md (thumb2_return): Remove.
 (thumb2_rtl_epilogue_return): New pattern.


- Thanks and regards,
  Sameera D.

thumb2_rtl_epilogue_complete-27Sept.patch
Description: Binary data


Re: PowerPC shrink-wrap support 3 of 3

2011-09-28 Thread Alan Modra
This supersedes http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01004.html
and http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01593.html, fixing
the two regressions introduced by those patches.  The first patch is
unchanged except to leave all the out-of-line restore functions using
"return" rather than "simple_return".  We don't want these being
confused with a plain "simple_return" and perhaps used by the shrink-
wrapping to return from pre-prologue code.

The second of these two patches was way too simplistic.  It was a real
pain getting the cfa_restores correct.  A lot were missing, or emitted
at the wrong place (due to bug in rs6000_emit_stack_reset).  I also
had the real restore insn move past the cfa_restores ("mtlr 0" insn
scheduled over loads from stack).

* config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded
declaration.
(rs6000_emit_stack_reset): Only return insn emitted when it adjusts sp.
(rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx.  Use
simple_return in pattern, emit instruction, and set jump_label.
(rs6000_emit_prologue): Update for rs6000_emit_savres_rtx.  Use
simple_return rather than return.
(emit_cfa_restores): New function.
(rs6000_emit_epilogue): Emit cfa_restores when flag_shrink_wrap.
Add missing cfa_restores for SAVE_WORLD.  Add missing LR cfa_restore
when using out-of-line gpr restore.  Add missing LR and FP regs
cfa_restores for out-of-line fpr restore.  Consolidate code setting
up cfa_restores.  Formatting.  Use LR_REGNO define.
(rs6000_output_mi_thunk): Use simple_return rather than return.
* config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise.
(return_internal*): Likewise.
(any_return, return_pred, return_str): New iterators.
(return, conditional return insns): Provide both return and
simple_return variants.
* gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define.
(REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13.
Move r11 and r0 later to suit shrink-wrapping.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 178876)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -899,8 +900,6 @@ static const char *rs6000_mangle_type (c
 static void rs6000_set_default_type_attributes (tree);
 static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool);
 static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool);
-static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int,
-  enum machine_mode, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
@@ -19704,8 +19728,10 @@ rs6000_emit_stack_reset (rs6000_stack_t 
   if (sp_offset != 0)
{
  rtx dest_reg = savres ? gen_rtx_REG (Pmode, 11) : sp_reg_rtx;
- return emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx,
-  GEN_INT (sp_offset)));
+ rtx insn = emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx,
+  GEN_INT (sp_offset)));
+ if (!savres)
+   return insn;
}
   else if (!savres)
return emit_move_insn (sp_reg_rtx, frame_reg_rtx);
@@ -19729,10 +19755,11 @@ rs6000_emit_stack_reset (rs6000_stack_t 
 }
 
 /* Construct a parallel rtx describing the effect of a call to an
-   out-of-line register save/restore routine.  */
+   out-of-line register save/restore routine, and emit the insn
+   or jump_insn as appropriate.  */
 
 static rtx
-rs6000_make_savres_rtx (rs6000_stack_t *info,
+rs6000_emit_savres_rtx (rs6000_stack_t *info,
rtx frame_reg_rtx, int save_area_offset,
enum machine_mode reg_mode,
bool savep, bool gpr, bool lr)
@@ -19742,6 +19769,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   int reg_size = GET_MODE_SIZE (reg_mode);
   rtx sym;
   rtvec p;
+  rtx par, insn;
 
   offset = 0;
   start_reg = (gpr
@@ -19755,7 +19783,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
 RTVEC_ELT (p, offset++) = ret_rtx;
 
   RTVEC_ELT (p, offset++)
-= gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65));
+= gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNO));
 
   sym = rs6000_savres_routine_sym (info, savep, gpr, lr);
   RTVEC_ELT (p, offset++) = gen_rtx_USE (VOIDmode, sym);
@@ -19788,7 +19816,16 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode, mem, reg);
 }
 
-  return gen_rtx_PARALLEL (VOIDmode, p);
+  par = gen_rtx_PARALLEL (VOIDmode, p);
+
+  if (!savep && lr)
+{
+  insn = emit_jump_insn (par);
+  JUMP_LABEL (insn) = ret_

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
> You could add a check to configure and generate based on that?
Do you mean check if glibc is newer than 2.13?
I think that when new glibc version is released, the tables should be
re-examined anyway - we shouldn't just stop inlining, or stop
generating libcalls.

> BTW I know that the tables need tuning for Nehalem and Sandy Bridge too.
> Michael are you planning to do that?
There is no separate cost-table for Nehalem or SandyBridge - however,
I tuned generic32 and generic64 tables, that should improve
performance on modern processors. In old version REP-MOV was used - it
turns out to be slower than SSE-moves or libcalls (in my version the
fastest from these options is used).


[Patch,AVR]: Better log output with -mdeb

2011-09-28 Thread Georg-Johann Lay
This is a tentative patch for better support of logging information for avr BE
developers.

There are situations where it is more convenient to let the compiler produce
information than to debug into the compiler. One example is -da dumps.

This patch proposes better support for printing information by means of a
printf-like function via %-codes.

The current debug output with avr-gcc's option -mdeb produces a flood of
information that is very hard to read because
 - there is much output
 - there is no information on current_function_decl
 - there is no information on current pass name/number
 - there is no print-like function so the trees, rtxes so that it is
   tedious to get formatted output.

For example, the following call to avr_edump

static int
avr_OS_main_function_p (tree func)
{
  avr_edump ("%?: %t\n", func);
  return avr_lookup_function_attribute1 (func, "OS_main");
}

prints additional information in a convenient way (foo is the function to be 
compiled):

avr_OS_main_function_p[foo:pro_and_epilogue(202)]:

Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h	(revision 179257)
+++ config/avr/avr-protos.h	(working copy)
@@ -111,3 +111,9 @@ extern rtx avr_incoming_return_addr_rtx
 #ifdef REAL_VALUE_TYPE
 extern void asm_output_float (FILE *file, REAL_VALUE_TYPE n);
 #endif
+
+#define avr_edump (avr__set_caller_e (__FUNCTION__))
+#define avr_fdump (avr__set_caller_f (__FUNCTION__))
+
+extern int (*avr__set_caller_e (const char*))(const char*, ...);
+extern int (*avr__set_caller_f (const char*))(FILE*, const char*, ...);
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 179257)
+++ config/avr/avr.c	(working copy)
@@ -7810,5 +7810,233 @@ avr_expand_builtin (tree exp, rtx target
   gcc_unreachable ();
 }
 
+
+/***
+ ** Logging, for BE developers only
+ ***/
+
+#include "tree-pass.h"
+
+static const char* avr__caller = "?";
+static FILE* avr__stream;
+
+static void avr__vadump (FILE*, const char*, va_list);
+
+static int
+avr__fdump_f (FILE *stream, const char* fmt, ...)
+{
+  va_list ap;
+
+  va_start (ap, fmt);
+  if (stream)
+avr__vadump (stream, fmt, ap);
+  va_end (ap);
+
+  return 1;
+}
+
+int (*
+avr__set_caller_f (const char* caller)
+ )(FILE*, const char*, ...)
+{
+  avr__caller = caller;
+
+  return avr__fdump_f;
+}
+
+static int
+avr__fdump_e (const char* fmt, ...)
+{
+  va_list ap;
+
+  va_start (ap, fmt);
+  avr__vadump (stderr, fmt, ap);
+  va_end (ap);
+
+  return 1;
+}
+
+int (*
+avr__set_caller_e (const char* caller)
+ )(const char*, ...)
+{
+  avr__caller = caller;
+  avr__stream = stderr;
+  
+  return avr__fdump_e;
+}
+
+/*
+  -- known %-codes --
+  
+  r: rtx
+  t: tree
+  T: tree (brief)
+  C: enum rtx_code
+  m: enum machine_mode
+  R: enum reg_class
+  L: insn list
+  H: location_t
+
+  -- no args --
+  A: call abort()
+  f: current_function_name()
+  F: caller (via __FUNCTION__)
+  P: Pass name and number
+  ?: Print caller, current function and pass info
+
+  -- same as printf --
+  %: %
+  c: char
+  s: string
+  d: int (decimal)
+  x: int (hex)
+*/
+
+static void
+avr__vadump (FILE *file, const char *fmt, va_list ap)
+{
+  char bs[3] = {'\\', '?', '\0'};
+
+  while (*fmt)
+{
+  switch (*fmt++)
+{
+default:
+  fputc (*(fmt-1), file);
+  break;
+  
+case '\\':
+  bs[1] = *fmt++;
+  fputs (bs, file);
+  break;
+  
+case '%':
+  switch (*fmt++)
+{
+case '%':
+  fputc ('%', file);
+  break;
+
+case 't':
+  {
+tree t = va_arg (ap, tree);
+if (NULL_TREE == t)
+  fprintf (file, "");
+else
+  {
+if (stderr == file)
+  debug_tree (t);
+  }
+break;
+  }
+
+case 'T':
+  print_node_brief (file, "", va_arg (ap, tree), 3);
+  break;
+
+case 'd':
+  fprintf (file, "%d", va_arg (ap, int));
+  break;
+
+case 'x':
+  fprintf (file, "%x", va_arg (ap, int));
+  break;
+
+case 'c':
+  fputc (va_arg (ap, int), file);
+  break;
+
+case 'r':
+  print_inline_rtx (file, va_arg (ap, rtx), 0);
+  break;
+
+case 'L':
+  {
+rtx insn = va_arg (ap, rtx);
+
+while (insn)
+  {
+print_inline_rtx (file, insn, 0);
+   

Re: [PATCH 2/3] Use urandom to get random seed

2011-09-28 Thread Andi Kleen
> I suppose we might get interrupted before anything is read and
> read can return with -1 (I suppose partial reads are quite unlikely
> though)?  Thus, don't we need the usual EINTR loop?

When we get interrupted gcc will die. I don't think gcc handles any
asynchronous signals, right?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Andi Kleen
On Wed, Sep 28, 2011 at 02:54:34PM +0200, Jan Hubicka wrote:
> > > Do you know glibc version numbers when
> > > the optimized string functions was introduced?
> > 
> > Afaik, it's 2.13.
> > I also compared my implementation to 2.13.
> 
> I wonder if we can assume that most of GCC 4.7 based systems will be glibc 
> 2.13
> based, too.  I would tend to say that yes and thus would suggest to tame down
> inlining that is no longer profitable on newer glibcs with a note in
> changes.html...

You could add a check to configure and generate based on that?

BTW I know that the tables need tuning for Nehalem and Sandy Bridge too.
Michael are you planning to do that?

-Andi


Re: [3/4] SMS: Record moves in the partial schedule

2011-09-28 Thread Richard Sandiford
Ayal Zaks  writes:
>>> Only request is to document that the register moves are
>>> placed/assigned-id's in a specific order.
>>
>>I suppose this is the downside of splitting the patches up, sorry,
>>but the ids are only ordered for the throw-away loop:
>>
>> FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
>>   add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
>>
>>and for the prologue/epilogue code.  Both are replaced in patch 4
>>with code that doesn't rely on the ordering of ids.
> Ok then, very well. I was mostly referring to the following in
> prologue/epiloque code, which merely relies on assigning all regmoves
> of a node consecutive id's:

FWIW, the 4/4 that I just posted did finally get rid of the first_reg_move
& nreg_moves fields.

Here's a slightly updated patch in line with your 4/4 comments.
The move->def is now always the id of the predecessor, rather than
the id of the original ddg node producer.  I've adapted the throw-away
loop accordingly, but there are no other changes.

Tested on powerpc64-linux-gnu and with ARM benchmarks.

Richard

gcc/
* modulo-sched.c (ps_insn): Adjust comment.
(ps_reg_move_info): New structure.
(partial_schedule): Add reg_moves field.
(SCHED_PARAMS): Use node_sched_param_vec instead of node_sched_params.
(node_sched_params): Turn first_reg_move into an identifier.
(ps_reg_move): New function.
(ps_rtl_insn): Cope with register moves.
(ps_first_note): Adjust comment and assert that the instruction
isn't a register move.
(node_sched_params): Replace with...
(node_sched_param_vec): ...this vector.
(set_node_sched_params): Adjust accordingly.
(print_node_sched_params): Take a partial schedule instead of a ddg.
Use ps_rtl_insn and ps_reg_move.
(generate_reg_moves): Rename to...
(schedule_reg_moves): ...this.  Remove rescan parameter.  Record each
move in the partial schedule, but don't emit it here.  Don't perform
register substitutions here either.
(apply_reg_moves): New function.
(duplicate_insns_of_cycles): Use register indices directly,
rather than finding instructions using PREV_INSN.  Use ps_reg_move.
(sms_schedule): Call schedule_reg_moves before committing to
a partial schedule.   Try the next ii if the schedule fails.
Use apply_reg_moves instead of generate_reg_moves.  Adjust
call to print_node_sched_params.  Free node_sched_param_vec
instead of node_sched_params.
(create_partial_schedule): Initialize reg_moves.
(free_partial_schedule): Free reg_moves.

Index: gcc/modulo-sched.c
===
--- gcc/modulo-sched.c  2011-09-28 09:03:10.334301485 +0100
+++ gcc/modulo-sched.c  2011-09-28 11:24:26.530300781 +0100
@@ -124,7 +124,9 @@ #define PS_STAGE_COUNT(ps) (((partial_sc
 /* A single instruction in the partial schedule.  */
 struct ps_insn
 {
-  /* The number of the ddg node whose instruction is being scheduled.  */
+  /* Identifies the instruction to be scheduled.  Values smaller than
+ the ddg's num_nodes refer directly to ddg nodes.  A value of
+ X - num_nodes refers to register move X.  */
   int id;
 
   /* The (absolute) cycle in which the PS instruction is scheduled.
@@ -137,6 +139,30 @@ struct ps_insn
 
 };
 
+/* Information about a register move that has been added to a partial
+   schedule.  */
+struct ps_reg_move_info
+{
+  /* The source of the move is defined by the ps_insn with id DEF.
+ The destination is used by the ps_insns with the ids in USES.  */
+  int def;
+  sbitmap uses;
+
+  /* The original form of USES' instructions used OLD_REG, but they
+ should now use NEW_REG.  */
+  rtx old_reg;
+  rtx new_reg;
+
+  /* An instruction that sets NEW_REG to the correct value.  The first
+ move associated with DEF will have an rhs of OLD_REG; later moves
+ use the result of the previous move.  */
+  rtx insn;
+};
+
+typedef struct ps_reg_move_info ps_reg_move_info;
+DEF_VEC_O (ps_reg_move_info);
+DEF_VEC_ALLOC_O (ps_reg_move_info, heap);
+
 /* Holds the partial schedule as an array of II rows.  Each entry of the
array points to a linked list of PS_INSNs, which represents the
instructions that are scheduled for that row.  */
@@ -148,6 +174,10 @@ struct partial_schedule
   /* rows[i] points to linked list of insns scheduled in row i (0<=i<ii).  */
   ps_insn_ptr *rows;
 
+  /* All the moves added for this partial schedule.  Index X has
+     ID X + g->num_nodes.  */
+  VEC (ps_reg_move_info, heap) *reg_moves;
+
   /*  rows_length[i] holds the number of instructions in the row.
   It is used only (as an optimization) to back off quickly from
   trying to schedule a node in a full row; that is, to avoid running
@@ -201,7 +231,7 @@ static void remove_node_from_ps (partial
 
 #define NODE_ASAP(node) ((node)->aux.count)
 
-#define SCHED_PARAMS(x) (&node_sched_params[x])
+#define SCHED_PARAMS(x) VEC_index (node_sched_params, no

Re: [4/4] Make SMS schedule register moves

2011-09-28 Thread Richard Sandiford
Ayal Zaks  writes:
>> >> +  /* The cyclic lifetime of move->new_reg starts and ends at move->def
>> >> +     (the instruction that defines move->old_reg).
>> >
>> > So instruction I_REG_MOVE (new_reg=reg) must be scheduled before the
>> > next I_MUST_FOLLOW move/original-def (due to anti-dependence: it
>> > overwrites reg), but after the previous instance of I_MUST_FOLLOW (due
>> > to true dependence; i.e. account for latency also). Why do moves,
>> > except for the one closest to move->def (which is directly dependent
>> > upon it, i.e. for whom move->def == I_MUST_FOLLOW), have to worry
>> > about move->def at all? (Or have their cyclic lifetimes start and end
>> > there?)
>>
>> Because the uses of new_reg belong to the same move->def based cycle.
>
>
> the "cycle" (overloaded term; rather "iteration" in this context) to
> which the uses belong, is inferred from the "cycle" (absolute schedule
> time) in which they are scheduled, regardless of move->def.

Just to prove your point about "cycle" being an overloaded term: I wasn't
actually meaning it in the sense of "(loop) iteration".  I meant a "circular
window based on move->def".

>> So (I think this is the uncontroversial bit): [M1] must be scheduled
>> "cyclically before" [B] and "cyclically after" [C], with the cycle
>> based at [B]:
>>
>>    row 3 after [B]:  empty
>>    row 4:            [C]
>>    row 5:            [D]
>>    row 0:            empty
>>    row 1:            empty
>>    row 2:            [A]
>>    row 3 before [B]: empty
>>
>> [M1] could therefore go in row 1.  This part is OK.
>
>
> Here's how I see it:
> [M1] feeds [C] which is scheduled at cycle 10, so it must be scheduled
> before cycle 10-M_latency and after cycle 10-ii. [M1] uses the result
> of [B] which is scheduled at cycle 3, so must be scheduled after cycle
> 3+B_latency and before cycle 3+ii. Taking all latencies to be 1 and
> ii=6, this yields a scheduling window of cycles [4,9]\cap[4,9]=[4,9];
> if scheduled at cycle 4 it must_follow [C], if scheduled at cycle 9 it
> must_precede [B]. This is identical to the logic behind the
> sched_window of any instruction, based on its dependencies (as you've
> updated not too long ago..), if we do not allow reg_moves (and
> arguably, one should not allow reg_moves when scheduling
> reg_moves...).
>
> To address the potential erroneous scenario of Loop 2, suppose [A] is
> scheduled as in the beginning in cycle 20, and that [M1] is scheduled
> in cycle 7 (\in[4,9]). Then
> [M2] feeds [D] and [A] which are scheduled at cycles 17 and 20, so it
> must be scheduled before cycle 17-1 and after cycle 20-6. [M2] uses
> the result of [M1], so must be scheduled after cycle 7+1 and before
> cycle 7+6. This yields the desired [14,16]\cap[8,13]=\emptyset.

I agree it's natural to schedule moves for intra-iteration dependencies
in the normal get_sched_window way.  But suppose we have a dependency:

   A --(T,N,1)--> B

that requires two moves M1 and M2.  If we think in terms of cycles
(in the SCHED_TIME sense), then this effectively becomes:

   A --(T,N1,1)--> M1 -->(T,N2,0)--> M2 -->(T,N3,0)--> B

because it is now M1 that is fed by both the loop and the incoming edge.
But if there is a second dependency:

   A --(T,M,0)--> C

that also requires two moves, we end up with:

   A --(T,N1,1)--> M1 -->(T,N2,0)--> M2 -->(T,N3,0)--> B
-->(T,M3,-1)--> B

and dependence distances of -1 feel a bit weird. :-)  Of course,
what we really have are two parallel dependencies:

   A --(T,N1,1)--> M1 -->(T,N2,0)--> M2 -->(T,N3,0)--> B

   A --(T,M1,0)--> M1' -->(T,M2,0)--> M2' -->(T,N3,0)--> B

where M1' and M2' occupy the same position as M1 and M2 in the schedule,
but are one stage further along.  But we only schedule them once,
so if we take the cycle/SCHED_TIME route, we have to introduce
dependencies of distance -1.

Another reason why it seemed a little confusing to think in terms of
cycles (in the SCHED_TIME sense) was that, by this stage, we were
already thinking in terms of rows and columns to some extent:

/* If dest precedes src in the schedule of the kernel, then dest
   will read before src writes and we can save one reg_copy.  */
if (SCHED_ROW (e->dest->cuid) == SCHED_ROW (e->src->cuid)
&& SCHED_COLUMN (e->dest->cuid) < SCHED_COLUMN (e->src->cuid))
  nreg_moves4e--;

However...

> Also note that if moves are assigned absolute cycles, it becomes clear
> to which stage each belongs (just like any other instruction),
> regulating their generation in prolog and epilog.

...I have to concede that it makes the stage calculation easier,
and that that tips the balance.  (Although again, a move can belong
to two stages simultaneously.)

Anyway, here's an updated patch that uses cycle times.  I've also
dropped the code that tried to allow windows to start and end on
the same row (and therefore schedule "either side" of that row).
I thought it might help wit

Re: Scalar vector binary operation

2011-09-28 Thread Ian Lance Taylor
Artem Shinkarov  writes:

> I can try to put a description in the document. I am not sure that I
> have rights to commit to the svn, but at least I can try to write the
> text.
>
> There are also pending patches for vector-comparison (almost
> submitted) and vector shuffling (still under discussion), but I hope
> to finish both of them before the new release.
>
> Could you point out where the changes.html is located in svn?

It's actually still in CVS, at gcc.gnu.org:/cvs/gcc.

Ian


Re: Vector shuffling

2011-09-28 Thread Richard Henderson
On 09/28/2011 05:59 AM, Artem Shinkarov wrote:
> I don't really understand this. As far as I know, expand_normal
> "converts" tree to rtx. All my computations are happening at the level
> of rtx and force_reg is needed just to bring an rtx expression to the
> register of the correct mode. If I am missing something, could you
> give an example how can I use expand_normal instead of force_reg in
> this particular code.

Sorry, I meant expand_(simple_)binop.

>> Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1?
> 
> My personal feeling is that it may be the case with v0 != v1, that it
> would be more efficient to perform piecewise shuffling rather than
> bitwise dances around the masks.

Maybe for V2DI and V2DFmode, but probably not otherwise.

We can perform the double-word shuffle in 12 insns; 10 for SSE 4.1.
Example assembly attached.

>> It's certainly possible to handle it, though it takes a few more steps,
>> and might well be more efficient as a libgcc function rather than inline.
> 
> I don't really understand why it could be more efficient. I thought
> that inline gives more chances to the final RTL optimisation.

We'll not be able to optimize this at the rtl level.  There are too many
UNSPEC instructions in the way.  In any case, even if that weren't so we'd
only be able to do useful optimization for a constant permutation.  And
we should have been able to prove that at the gimple level.


r~
.data
.align 16
vec3:   .long   3,3,3,3
vec4:   .long   4,4,4,4
dup4:   .byte   0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12
ofs4:   .byte   0,1,2,3, 0,1,2,3, 0,1,2,3, 0,1,2,3

.text
shuffle2:

// Convert the low bits of the mask to a shuffle
movdqa  %xmm2, %xmm3
pand    vec3, %xmm3
pmulld  vec4, %xmm3
pshufb  dup4, %xmm3
paddb   ofs4, %xmm3

// Shuffle both inputs
pshufb  %xmm3, %xmm0
pshufb  %xmm3, %xmm1

// Select and merge the inputs
// Use ix86_expand_int_vcond for use of pblendvb for SSE4_1.
pand    vec4, %xmm2
pcmpeqd vec4, %xmm2
pand    %xmm2, %xmm1
pandn   %xmm2, %xmm0
por %xmm1, %xmm0

ret


Commit: RX: Add support for MIN and MAX instructions in QI and HI modes

2011-09-28 Thread Nick Clifton
Hi Guys,

  I am going to apply the patch below to the RX backend to add support
  for generating MIN and MAX instructions for HI and QI modes.

Cheers
  Nick

gcc/ChangeLog
2011-09-28  Nick Clifton  

* config/rx/predicates.md (rx_minmaxex_operand): New predicate.
Accepts immediates and a restricted subset of MEMs.
* config/rx/rx.md (int_modes): New iterator.
(smaxsi3, sminsi3): Delete and replace with...
(smax<int_modes:mode>3, smin<int_modes:mode>3): New patterns.
(umax<small_int_modes:mode>3_u, umin<small_int_modes:mode>3_ur,
umax<small_int_modes:mode>3_ur, umax<small_int_modes:mode>3,
umin<small_int_modes:mode>3): New patterns.

Index: gcc/config/rx/predicates.md
===
--- gcc/config/rx/predicates.md (revision 179307)
+++ gcc/config/rx/predicates.md (working copy)
@@ -72,6 +72,16 @@
(match_operand 0 "rx_restricted_mem_operand"))
 )
 
+;; Check that the operand is suitable as the source operand
+;; for a min/max instruction.  This is the same as
+;; rx_source_operand except that CONST_INTs are allowed but
+;; REGs and SUBREGs are not.
+
+(define_predicate "rx_minmaxex_operand"
+  (ior (match_operand 0 "immediate_operand")
+   (match_operand 0 "rx_restricted_mem_operand"))
+)
+
 ;; Return true if OP is a store multiple operation.  This looks like:
 ;;
 ;;   [(set (SP) (MINUS (SP) (INT)))
Index: gcc/config/rx/rx.md
===
--- gcc/config/rx/rx.md (revision 179307)
+++ gcc/config/rx/rx.md (working copy)
@@ -22,6 +22,9 @@
 ;; This code iterator is used for sign- and zero- extensions.
 (define_mode_iterator small_int_modes [(HI "") (QI "")])
 
+;; This code iterator is used for max and min operations.
+(define_mode_iterator int_modes [(SI "") (HI "") (QI "")])
+
 ;; We do not handle DFmode here because it is either
 ;; the same as SFmode, or if -m64bit-doubles is active
 ;; then all operations on doubles have to be handled by
@@ -1160,28 +1163,109 @@
(set_attr "timings" "22,44")]
 )
 
-(define_insn "smaxsi3"
-  [(set (match_operand:SI  0 "register_operand" "=r,r,r,r,r,r")
-   (smax:SI (match_operand:SI 1 "register_operand" "%0,0,0,0,0,0")
-(match_operand:SI 2 "rx_source_operand"
-  "r,Sint08,Sint16,Sint24,i,Q")))]
+(define_insn "smax<int_modes:mode>3"
+  [(set (match_operand:int_modes          0 "register_operand" "=r,r,r,r,r,r")
+	(smax:int_modes (match_operand:int_modes 1 "register_operand" "%0,0,0,0,0,0")
+			(match_operand:int_modes 2 "rx_source_operand"
+						 "r,Sint08,Sint16,Sint24,i,Q")))]
   ""
   "max\t%Q2, %0"
   [(set_attr "length" "3,4,5,6,7,6")
(set_attr "timings" "11,11,11,11,11,33")]
 )
 
-(define_insn "sminsi3"
-  [(set (match_operand:SI  0 "register_operand" "=r,r,r,r,r,r")
-   (smin:SI (match_operand:SI 1 "register_operand" "%0,0,0,0,0,0")
-(match_operand:SI 2 "rx_source_operand"
-  "r,Sint08,Sint16,Sint24,i,Q")))]
+(define_insn "smin<int_modes:mode>3"
+  [(set (match_operand:int_modes          0 "register_operand" "=r,r,r,r,r,r")
+	(smin:int_modes (match_operand:int_modes 1 "register_operand" "%0,0,0,0,0,0")
+			(match_operand:int_modes 2 "rx_source_operand"
+						 "r,Sint08,Sint16,Sint24,i,Q")))]
   ""
   "min\t%Q2, %0"
   [(set_attr "length"  "3,4,5,6,7,6")
(set_attr "timings" "11,11,11,11,11,33")]
 )
 
+(define_insn "umax<small_int_modes:mode>3_u"
+  [(set (match_operand:SI          0 "register_operand" "=r,r,r,r,r,r")
+	(smax:SI (match_operand:SI 1 "register_operand" "%0,0,0,0,0,0")
+		 (zero_extend:SI (match_operand:small_int_modes 2 "rx_minmaxex_operand"
+								"r,Sint08,Sint16,Sint24,i,Q"))))]
+  ""
+  "max\t%R2, %0"
+  [(set_attr "length"  "3,4,5,6,7,6")
+   (set_attr "timings" "11,11,11,11,11,33")]
+)
+
+(define_insn "umin<small_int_modes:mode>3_ur"
+  [(set (match_operand:SI          0 "register_operand" "=r,r,r,r,r,r")
+	(smin:SI (zero_extend:SI (match_operand:small_int_modes 2 "rx_minmaxex_operand"
+								 "r,Sint08,Sint16,Sint24,i,Q"))
+		 (match_operand:SI 1 "register_operand" "%0,0,0,0,0,0")))]
+  ""
+  "min\t%R2, %0"
+  [(set_attr "length"  "3,4,5,6,7,6")
+   (set_attr "timings" "11,11,11,11,11,33")]
+)
+
+(define_insn "umax<small_int_modes:mode>3_ur"
+  [(set (match_operand:SI          0 "register_operand" "=r,r,r,r,r,r")
+	(smax:SI (zero_extend:SI (match_operand:small_int_modes 2 "rx_minmaxex_operand"
+								 "r,Sint08,Sint16,Sint24,i,Q"))
+		 (match_operand:SI 1 "register_operand" "%0,0,0,0,0,0")))]
+  ""
+  "max\t%R2, %0"
+  [(set_attr "length"  "3,4,5,6,7,6")
+   (set_attr "timings" "11,11,11,11,11,33")]
+)
+
+(define_expand "umax3"
+  [(set (match_dup 4)
+   (zero_extend:SI (match_operand:small_int_modes 1 "regis

[Patch, Fortran] Add c_float128{,_complex} as GNU extension to ISO_C_BINDING

2011-09-28 Thread Tobias Burnus
This patch makes the GCC extension __float128 (_Complex) available in 
the C bindings via C_FLOAT128 and C_FLOAT128_COMPLEX.


Additionally, I have improved the diagnostic for explicitly
use-associating symbols that require a different -std= setting. And I
have finally added the iso*.def files to the makefile dependencies.


As usual, with -std=f2008, the GNU extensions are not loaded. I have 
also updated the documentation.


OK for the trunk?

Tobias

PS: If you think that C_FLOAT128/C_FLOAT128_COMPLEX are bad names for 
C's __float128, please speak up before gfortran (and other compilers) 
implement it. (At least one vendor is implementing __float128 support 
and plans to modify ISO_C_BINDING.) The proper name would be 
C___FLOAT128, but that looks awkward!
2011-09-28  Tobias Burnus  

	* Make-lang.in (F95_PARSER_OBJS, GFORTRAN_TRANS_DEPS): Add
	dependency on iso-c-binding.def and iso-fortran-env.def.
	* module.c (import_iso_c_binding_module): Add error when
	explicitly importing a nonstandard symbol; extend standard-
	depending loading.
	* iso-c-binding.def: Add c_float128 and c_float128_complex
	integer parameters (for -std=gnu).
	* intrinsic.texi (ISO_C_Binding): Document them.
	* symbol.c (generate_isocbinding_symbol): Change macros
	to ignore GFC_STD_* data.
	* trans-types.c (gfc_init_c_interop_kinds): Ditto; make
	nonstatic and renamed from "init_c_interop_kinds".
	(gfc_init_kinds): Don't call it
	* trans-types.h (gfc_init_c_interop_kinds): Add prototype.
	* f95-lang.c (gfc_init_decl_processing): Call it.

2011-09-28  Tobias Burnus  

	* gfortran.dg/iso_c_binding_param_1.f90: New.
	* gfortran.dg/iso_c_binding_param_2.f90: New.
	* gfortran.dg/c_sizeof_2.f90: Update dg-error.

diff --git a/gcc/fortran/Make-lang.in b/gcc/fortran/Make-lang.in
index 0816458..b766da6 100644
--- a/gcc/fortran/Make-lang.in
+++ b/gcc/fortran/Make-lang.in
@@ -329,14 +329,16 @@ $(F95_PARSER_OBJS): fortran/gfortran.h fortran/libgfortran.h \
 		fortran/parse.h fortran/arith.h fortran/target-memory.h \
 		$(CONFIG_H) $(SYSTEM_H) $(TM_H) $(TM_P_H) coretypes.h \
 		$(RTL_H) $(TREE_H) $(TREE_DUMP_H) $(GGC_H) $(EXPR_H) \
-		$(FLAGS_H) output.h $(DIAGNOSTIC_H) errors.h $(FUNCTION_H) 
+		$(FLAGS_H) output.h $(DIAGNOSTIC_H) errors.h $(FUNCTION_H) \
+		fortran/iso-c-binding.def fortran/iso-fortran-env.def
 fortran/openmp.o: pointer-set.h $(TARGET_H) toplev.h
 
 GFORTRAN_TRANS_DEPS = fortran/gfortran.h fortran/libgfortran.h \
 fortran/intrinsic.h fortran/trans-array.h \
 fortran/trans-const.h fortran/trans-const.h fortran/trans.h \
 fortran/trans-stmt.h fortran/trans-types.h \
-$(CONFIG_H) $(SYSTEM_H) $(TREE_H) $(TM_H) coretypes.h $(GGC_H)
+$(CONFIG_H) $(SYSTEM_H) $(TREE_H) $(TM_H) coretypes.h $(GGC_H) \
+fortran/iso-c-binding.def fortran/iso-fortran-env.def
 
 fortran/f95-lang.o: $(GFORTRAN_TRANS_DEPS) fortran/mathbuiltins.def \
   gt-fortran-f95-lang.h gtype-fortran.h $(CGRAPH_H) $(TARGET_H) fortran/cpp.h \
diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 648831f..8f8dd7d 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -595,6 +595,7 @@ gfc_init_decl_processing (void)
   /* Set up F95 type nodes.  */
   gfc_init_kinds ();
   gfc_init_types ();
+  gfc_init_c_interop_kinds ();
 }
 
 
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 54e0b20..1bd5ec3 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -610,8 +610,8 @@ iso_fortran_env_symbol;
 #undef NAMED_DERIVED_TYPE
 
 #define NAMED_INTCST(a,b,c,d) a,
-#define NAMED_REALCST(a,b,c) a,
-#define NAMED_CMPXCST(a,b,c) a,
+#define NAMED_REALCST(a,b,c,d) a,
+#define NAMED_CMPXCST(a,b,c,d) a,
 #define NAMED_LOGCST(a,b,c) a,
 #define NAMED_CHARKNDCST(a,b,c) a,
 #define NAMED_CHARCST(a,b,c) a,
diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index 9adeeab..a093bec 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -13006,7 +13006,9 @@ type default integer, which can be used as KIND type parameters.
 In addition to the integer named constants required by the Fortran 2003 
 standard, GNU Fortran provides as an extension named constants for the 
 128-bit integer types supported by the C compiler: @code{C_INT128_T, 
-C_INT_LEAST128_T, C_INT_FAST128_T}.
+C_INT_LEAST128_T, C_INT_FAST128_T}. Furthermore, if @code{__float128} is
+supported in C, the named constants @code{C_FLOAT128, C_FLOAT128_COMPLEX}
+are defined.
 
 @multitable @columnfractions .15 .35 .35 .35
 @item Fortran Type  @tab Named constant @tab C type@tab Extension
@@ -13036,9 +13038,11 @@ C_INT_LEAST128_T, C_INT_FAST128_T}.
 @item @code{REAL}   @tab @code{C_FLOAT} @tab @code{float}
 @item @code{REAL}   @tab @code{C_DOUBLE}@tab @code{double}
 @item @code{REAL}   @tab @code{C_LONG_DOUBLE}   @tab @code{long double}
+@item @code{REAL}   @tab @code{C_FLOAT128}  @tab @code{__float128}@tab Ext.
 @item @code{COMPLEX}@tab @code{C_FLOAT_COMPLEX} @tab @code{

Re: Scalar vector binary operation

2011-09-28 Thread Artem Shinkarov
Ian

I can try to put a description in the document. I am not sure that I
have rights to commit to the svn, but at least I can try to write the
text.

There are also pending patches for vector comparison (almost
submitted) and vector shuffling (still under discussion), but I hope
to finish both of them before the new release.

Could you point out where the changes.html file is located in svn?


Thanks,
Artem.


Re: Vector Comparison patch

2011-09-28 Thread Richard Guenther
On Mon, Sep 26, 2011 at 5:43 PM, Richard Guenther
 wrote:
> On Mon, Sep 26, 2011 at 4:25 PM, Richard Guenther
>  wrote:
>> On Wed, Sep 7, 2011 at 5:06 PM, Joseph S. Myers  
>> wrote:
>>> This looks like it has the same issue with maybe needing to use
>>> TYPE_MAIN_VARIANT in type comparisons as the shuffle patch.
>>
>> I don't think so, we move qualifiers to the vector type from the element type
>> in make_vector_type and the tests only look at the component type.
>>
>> I am re-testing the patch currently and will commit it if that succeeds.
>
> Unfortunately gcc.c-torture/execute/vector-compare-1.c fails with -m32
> for
>
>    vector (2, double) d0;
>    vector (2, double) d1;
>    vector (2, long) idres;
>
>    d0 = (vector (2, double)){(double)argc,  10.};
>    d1 = (vector (2, double)){0., (double)-23};
>    idres = (d0 > d1);
>
> as apparently the type we chose to assign to (d0 > d1) is different
> from that of idres:
>
> /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
> error: incompatible types when assigning to type '__vector(2) long
> int' from type '__vector(2) long long int'^M
>
> Adjusting it to vector (2, long long) otoh yields, for -m64:
>
> /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
> error: incompatible types when assigning to type '__vector(2) long
> long int' from type '__vector(2) long int'
>
> But those two types are at least compatible from their modes.  Joseph,
> should we accept mode-compatible types in assignments or maybe
> transparently convert them?

Looks like we have a more suitable solution for these automatically
generated vector types - mark them with TYPE_VECTOR_OPAQUE.

I'm testing the following incremental patch.

Richard.

Index: gcc/c-typeck.c
===
--- gcc/c-typeck.c.orig 2011-09-28 16:22:10.0 +0200
+++ gcc/c-typeck.c  2011-09-28 16:18:39.0 +0200
@@ -9928,8 +9928,10 @@ build_binary_op (location_t location, en
 }

   /* Always construct signed integer vector type.  */
-  intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
-  result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+  intt = c_common_type_for_size (GET_MODE_BITSIZE
+  (TYPE_MODE (TREE_TYPE (type0))), 0);
+  result_type = build_opaque_vector_type (intt,
+ TYPE_VECTOR_SUBPARTS (type0));
   converted = 1;
   break;
 }
@@ -10063,8 +10065,10 @@ build_binary_op (location_t location, en
 }

   /* Always construct signed integer vector type.  */
-  intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
-  result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+  intt = c_common_type_for_size (GET_MODE_BITSIZE
+  (TYPE_MODE (TREE_TYPE (type0))), 0);
+  result_type = build_opaque_vector_type (intt,
+ TYPE_VECTOR_SUBPARTS (type0));
   converted = 1;
   break;
 }


Re: [PATCH 2/7] Generate virtual locations for tokens

2011-09-28 Thread Jason Merrill

On 09/27/2011 08:03 PM, Dodji Seketeli wrote:

Jason Merrill  writes:


Yes, I was suggesting that you change that, since it's only used by
_get_location, which can get the location from the pointer instead.


Done.


And then you should be able to drop the track_macro_exp_p field from 
macro_arg_token_iter, and instead check whether location_ptr is set.


Jason


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Carrot Wei
Hi Tom

What is the behavior of your patch on the following case?

typedef int int_unaligned __attribute__((aligned(1)));
int foo (int_unaligned *p)
{
  return *p;
}

thanks
Carrot

On Tue, Sep 20, 2011 at 7:13 PM, Tom de Vries  wrote:
> Hi Richard,
>
> I have a patch for PR43814. It introduces an option that assumes that function
> arguments of pointer type are aligned, and uses that information in
> tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined.
>
> I tested the patch successfully on-by-default on x86_64 and i686 (both gcc 
> only
> builds).
>
> I also tested the patch on-by-default for ARM (gcc/glibc build). The patch
> generated wrong code for uselocale.c:
> ...
> glibc/locale/locale.h:
> ...
> /* This value can be passed to `uselocale' and may be returned by
>   it. Passing this value to any other function has undefined behavior.  */
> # define LC_GLOBAL_LOCALE       ((__locale_t) -1L)
> ...
> glibc/locale/uselocale.c:
> ...
> locale_t
> __uselocale (locale_t newloc)
> {
>  locale_t oldloc = _NL_CURRENT_LOCALE;
>
>  if (newloc != NULL)
>    {
>      const locale_t locobj
>        = newloc == LC_GLOBAL_LOCALE ? &_nl_global_locale : newloc;
>
> ...
> The assumption that function arguments of pointer type are aligned, allowed 
> the
> test 'newloc == LC_GLOBAL_LOCALE' to evaluate to false.
> But the usage of ((__locale_t) -1L) as function argument in uselocale violates
> that assumption.
>
> Fixing the definition of LC_GLOBAL_LOCALE allowed the gcc tests to run without
> regressions for ARM.
>
> Furthermore, the patch fixes ipa-sra-2.c and ipa-sra-6.c regressions on ARM,
> discussed here:
> - http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00930.html
> - http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00459.html
>
> But, since glibc uses this construct currently, the option is off-by-default 
> for
> now.
>
> OK for trunk?
>
> Thanks,
> - Tom
>
> 2011-09-20  Tom de Vries 
>
>        PR target/43814
>        * tree-ssa-ccp.c (get_align_value): New function, factored out of
>        get_value_from_alignment.
>        (get_value_from_alignment): Use get_align_value.
>        (get_value_for_expr): Use get_align_value to handle alignment of
>        function argument pointers.
>        * common.opt (faligned-pointer-argument): New option.
>        * doc/invoke.texi (Optimization Options): Add
>        -faligned-pointer-argument.
>        (-faligned-pointer-argument): New item.
>
>        * gcc/testsuite/gcc.dg/pr43814.c: New test.
>        * gcc/testsuite/gcc.target/arm/pr43814-2.c: New test.
>


Commit: RX: Fix problems building libgcc

2011-09-28 Thread Nick Clifton
Hi Guys,

  I am applying the patch below to fix a couple of problems building
  libgcc for the RX target.  The first is that when 32-bit doubles are
  enabled we need to make sure that we never try to construct a 64-bit
  double type.  This is done in rx-lib.h, but it was only being enabled
  when constructing the float-type functions.  The patch enables the fix
  for when double-type functions are being constructed as well.

  The second fix is to prevent an extraneous renaming of the floatsisf
  and floatunsisf functions when building libgcc with 32-bit doubles
  enabled.

Cheers
  Nick

libgcc/ChangeLog
2011-09-28  Nick Clifton  

* config/rx/rx-lib.h: Always restrict doubles to the SF type when
64-bit doubles are not enabled.
* config/rx/rx-abi.h: Fix extraneous renaming of the floatsisf
and floatunsisf functions.

Index: libgcc/config/rx/rx-lib.h
===
--- libgcc/config/rx/rx-lib.h   (revision 179307)
+++ libgcc/config/rx/rx-lib.h   (working copy)
@@ -1,6 +1,5 @@
-#ifdef FLOAT
 #ifndef __RX_64BIT_DOUBLES__
 #define DF SF
 #define FLOAT_ONLY
 #endif
-#endif
+
Index: libgcc/config/rx/rx-abi.h
===
--- libgcc/config/rx/rx-abi.h   (revision 179307)
+++ libgcc/config/rx/rx-abi.h   (working copy)
@@ -80,16 +80,7 @@
 #endif
 
 
-#ifdef L_si_to_sf
-#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatsisf, CONV32sf)
-#endif
 
-#ifdef L_usi_to_sf
-#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatunsisf, CONV32uf)
-#endif
-
-
-
 #ifdef __RX_64BIT_DOUBLES__
 
 /* Float (32-bit) aliases...  */
@@ -176,6 +167,14 @@
 #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (negdf2, NEGd)
 #endif
 
+#ifdef L_si_to_sf
+#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatsisf, CONV32sf)
+#endif
+
+#ifdef L_usi_to_sf
+#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatunsisf, CONV32uf)
+#endif
+
 /* The 64-bit comparison functions do not have aliases because libgcc2
does not provide them.  Instead they have to be supplied in
rx-abi-functions.c.  */


Re: Scalar vector binary operation

2011-09-28 Thread Ian Lance Taylor
On Wed, Aug 10, 2011 at 7:44 AM, Richard Guenther
 wrote:
>
> On Tue, Aug 9, 2011 at 10:23 PM, Artem Shinkarov
>  wrote:
> > Sorry, I didn't attach the patch itself.
> > Here we go, in the attachment.
>
> I have committed the patch after re-bootstrapping and testing it
> on x86_64-unknown-linux-gnu with {,-m32}.

This is new functionality for gcc's vector extension, and I think it
should be mentioned at http://gcc.gnu.org/gcc-4.7/changes.html.  Does
somebody want to take care of that?

Ian


[PATCH] Fix PR50460

2011-09-28 Thread Richard Guenther

This extends try_move_mult_to_index folding to be less picky about
the outermost array reference and handles a component-ref of an
array the same as if that were wrapped with an array-ref with
minimum index.

This consolidates array-ref reconstruction to the frontend-closest
position we have right now (eventually that function should move
to some c-family helper I guess).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2011-09-28  Richard Guenther  

PR middle-end/50460
* fold-const.c (try_move_mult_to_index): Handle &a.array the
same as &a.array[0].

Index: gcc/fold-const.c
===
*** gcc/fold-const.c(revision 179308)
--- gcc/fold-const.c(working copy)
*** try_move_mult_to_index (location_t loc,
*** 6870,6875 
--- 6870,6929 
  
  break;
}
+   else if (TREE_CODE (ref) == COMPONENT_REF
+  && TREE_CODE (TREE_TYPE (ref)) == ARRAY_TYPE)
+   {
+ tree domain;
+ 
+ /* Remember if this was a multi-dimensional array.  */
+ if (TREE_CODE (TREE_OPERAND (ref, 0)) == ARRAY_REF)
+   mdim = true;
+ 
+ domain = TYPE_DOMAIN (TREE_TYPE (ref));
+ if (! domain)
+   continue;
+ itype = TREE_TYPE (domain);
+ 
+ step = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (ref)));
+ if (TREE_CODE (step) != INTEGER_CST)
+   continue;
+ 
+ if (s)
+   {
+ if (! tree_int_cst_equal (step, s))
+ continue;
+   }
+ else
+   {
+ /* Try if delta is a multiple of step.  */
+ tree tmp = div_if_zero_remainder (EXACT_DIV_EXPR, op1, step);
+ if (! tmp)
+   continue;
+ delta = tmp;
+   }
+ 
+ /* Only fold here if we can verify we do not overflow one
+dimension of a multi-dimensional array.  */
+ if (mdim)
+   {
+ tree tmp;
+ 
+ if (!TYPE_MIN_VALUE (domain)
+ || !TYPE_MAX_VALUE (domain)
+ || TREE_CODE (TYPE_MAX_VALUE (domain)) != INTEGER_CST)
+   continue;
+ 
+ tmp = fold_binary_loc (loc, PLUS_EXPR, itype,
+fold_convert_loc (loc, itype,
+  TYPE_MIN_VALUE (domain)),
+fold_convert_loc (loc, itype, delta));
+ if (TREE_CODE (tmp) != INTEGER_CST
+ || tree_int_cst_lt (TYPE_MAX_VALUE (domain), tmp))
+   continue;
+   }
+ 
+ break;
+   }
else
mdim = false;
  
*** try_move_mult_to_index (location_t loc,
*** 6892,6903 
pos = TREE_OPERAND (pos, 0);
  }
  
!   TREE_OPERAND (pos, 1) = fold_build2_loc (loc, PLUS_EXPR, itype,
!  fold_convert_loc (loc, itype,
!TREE_OPERAND (pos, 1)),
!  fold_convert_loc (loc, itype, delta));
! 
!   return fold_build1_loc (loc, ADDR_EXPR, TREE_TYPE (addr), ret);
  }
  
  
--- 6946,6974 
pos = TREE_OPERAND (pos, 0);
  }
  
!   if (TREE_CODE (ref) == ARRAY_REF)
! {
!   TREE_OPERAND (pos, 1)
!   = fold_build2_loc (loc, PLUS_EXPR, itype,
!  fold_convert_loc (loc, itype, TREE_OPERAND (pos, 1)),
!  fold_convert_loc (loc, itype, delta));
!   return fold_build1_loc (loc, ADDR_EXPR, TREE_TYPE (addr), ret);
! }
!   else if (TREE_CODE (ref) == COMPONENT_REF)
! {
!   gcc_assert (ret == pos);
!   ret = build4_loc (loc, ARRAY_REF, TREE_TYPE (TREE_TYPE (ref)), ret,
!   fold_build2_loc
! (loc, PLUS_EXPR, itype,
!  fold_convert_loc (loc, itype,
!TYPE_MIN_VALUE
!  (TYPE_DOMAIN (TREE_TYPE (ref,
!  fold_convert_loc (loc, itype, delta)),
!   NULL_TREE, NULL_TREE);
!   return build_fold_addr_expr_loc (loc, ret);
! }
!   else
! gcc_unreachable ();
  }
  
  


Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
>   It appears that part 1 of the patch wasn't really attached.
Thanks, resending.


memfunc-mid.patch
Description: Binary data


Re: [PATCH, ARM] Unaligned accesses for builtin memcpy [2/2]

2011-09-28 Thread Ramana Radhakrishnan
On 6 May 2011 14:13, Julian Brown  wrote:
> Hi,
>
> This is the second of two patches to add unaligned-access support to
> the ARM backend. It builds on the first patch to provide support for
> unaligned accesses when expanding block moves (i.e. for builtin memcpy
> operations). It makes some effort to use load/store multiple
> instructions where appropriate (when accessing sufficiently-aligned
> source or destination addresses), and also makes some effort to
> generate fast code (for -O1/2/3) or small code (for -Os), though some
of the heuristics may still need tweaking.

Sorry it's taken me a while to get around to this one. Do you know
what difference this makes to performance on some standard benchmarks
on, let's say, an A9 and an M4? I see that this gets triggered only
when we have fewer than 64 bytes to copy.

Please add a few testcases from the examples that you've shown here to
be sure that ldm's are being generated in the right cases.

cheers
Ramana


Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Jack Howarth
On Wed, Sep 28, 2011 at 02:56:30PM +0400, Michael Zolotukhin wrote:
> Attached is a part 1 of patch that enables use of vector-instructions
> in memset and memcopy (middle-end part).
> The main part of the changes is in functions
> move_by_pieces/set_by_pieces. In new version algorithm of move-mode
> selection was changed – now it checks if alignment is known at compile
> time and uses cost-models to choose between aligned and unaligned
> vector or not-vector move-modes.
> 

Michael,
   It appears that part 1 of the patch wasn't really attached.
   Jack

> 
> 
> 
> -- 
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.


Re: Intrinsics for N2965: Type traits and base classes

2011-09-28 Thread Mike Spertus
OK. Here are some simple benchmarks. I simulated heavy use of reflection 
with 1000 classes that each had about a thousand base classes. I also 
created a super-simple typelist class


template <typename... T> struct typelist {}; // Variadic templates rock

If bases returns a typelist, the program takes about 4 sec.
If bases returns a tuple, the program takes about 4 min.

If I make the program any bigger, the tuple case fails to compile with 
spurious error messages, while the typelist version stays quick.


Given that metaprograms typically create large class hierarchies (look 
at Alexandrescu's GenScatterHierarchy, which he uses to implement 
factory in the Modern C++ design book) and that compile times are an 
enormous obstacle to metaprogramming, I don't think these tests are at 
all ridiculous.


I think this shows we need to return a typelist instead of a tuple.

As I mentioned earlier, I could just return the typelist, or hide it by 
returning an unspecified type (which would actually be a typelist) that 
you would apply a first<> and a rest<> template to walk through. This 
would give us more flexibility for the future (e.g., if a standard 
typelist type is adopted. Likewise, we would be covered if wanted to 
change bases implementation in the future to return an associative 
container. For example, if using size::type>>::value to 
count the number of occurrences of A as a base class of E turns out to 
be useful).


Thanks,

Mike

On 9/28/2011 6:54 AM, Mike Spertus wrote:
Don't worry, I'm not suggesting including boost::mpl at all, just 
leaving the return type of the bases trait unspecified. IMO, your 
example illustrates my point that without performance tuning, 
compiling metaprograms can be prohibitively expensive, so I want to 
avoid running the tuple metaprogram that creates the fields when we 
never need to instantiate the type. Benchmarks soon.


Mike

On 9/28/2011 2:53 AM, Jonathan Wakely wrote:

On 28 September 2011 04:22, Michael Spertus wrote:

Benjamin,
I think tuple is wrong both for performance reasons (I believe these 
are likely to be serious enough to depress use due to inordinately 
long compiles) and because it prematurely locks us into a rigid 
choice of how our typelists are implemented.


My inclination is to make it type-independent by returning an 
unspecified type that can have a sequence of types extracted from it 
(this is the approach taken by boost::mpl and has loads of 
experience that shows it is a good approach to metaprogramming). In 
other words, first<bases<A>>::type would be the first base of A, etc.

Citing Boost MPL as a good way to avoid inordinately long compiles ...
interesting!  Have you ever tried to reduce a GCC bug report from 20k
lines to 20, because most Boost libs include every MPL header?!

I hope we can get a simple typelist _without_ needing everything else
in MPL, such as the apply and lambda metafunctions (and maybe a lot of
that could be massively simplified using variadic templates anyway.)








Re: Go patch committed: Update libgo to Go release r60

2011-09-28 Thread Ian Lance Taylor
Rainer Orth  writes:

> Thanks, I'd missed that.  It turned out that IRIX 6 needs one more
> change to return to bootstrap land:  only defines
> TIOCNOTTY if !_XOPEN_SOURCE, which we need for other stuff
> (cf. configure.ac).  I've cheated and use  instead, which
> doesn't have this check.  With this patch, a Go-only IRIX 6.5 bootstrap
> completed successfully.

Thanks.  Committed to mainline.

Ian


Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
> (I worry about the tables in i386.c deciding what strategy to use for block of
> given size. This is more or less unrelated to the actual patch)
Yep, the threshold values I mentioned above are the values in these
tables. Even with a fast glibc there are some cases when inlining is
profitable (e.g. if alignment is known at compile time).

On 28 September 2011 16:54, Jan Hubicka  wrote:
>> > Do you know glibc version numbers when
>> > the optimized string functions was introduced?
>>
>> Afaik, it's 2.13.
>> I also compared my implementation to 2.13.
>
> I wonder if we can assume that most of GCC 4.7 based systems will be glibc 
> 2.13
> based, too.  I would tend to say that yes and thus would suggest to tamn down
> inlining that is no longer profitable on newer glibcs with a note in
> changes.html...
>
> (I worry about the tables in i386.c deciding what strategy to use for block of
> given size. This is more or less unrelated to the actual patch)
>
> Honza
>



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


Re: Vector shuffling

2011-09-28 Thread Artem Shinkarov
On Thu, Sep 15, 2011 at 8:05 PM, Richard Henderson  wrote:
>> +The elements of the input vectors are numbered from left to right across
>> +one or both of the vectors. Each element in the mask specifies a number
>> +of element from the input vector(s). Consider the following example.
>
> It would be more preferable to talk about the memory ordering of the
> elements rather than "left" and "right" which are ambiguous at best.
>
>> +  if (TREE_CODE (mask) == VECTOR_CST)
>> +    {
>> +      tree m_type, call;
>> +      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), 
>> &m_type);
>> +      /*rtx t;*/
>
> Leftover crap.

Fixed.

>> +
>> +      if (!fn)
>> +     goto vshuffle;
>> +
>> +      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
>> +     {
>> +       int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
>> +       tree cvt = build_vector_type (m_type, units);
>> +       mask = fold_convert (cvt, mask);
>> +     }
>> +
>> +      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), 
>> fn);
>> +      call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
>> +
>> +      return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, 
>> NULL);
>> +    }
>> +
>> +vshuffle:
>> +  gcc_assert (operand_equal_p (v0, v1, 0));
>
> Why can't a non-constant shuffle have different V0 and V1?  That seems
> like a direct violation of the documentation, and any sort of usefulness.

Ok, I agree. The reason this assert is here is that no one in the
middle-end generates code that does not meet it. In principle we
definitely want to support it in the upcoming patches, but it would be
nice to start with a simple thing.

> Also, while I'm ok with the use of builtin_vec_perm here in the short
> term, I think that in the long term we should simply force the named
> pattern to handle constants.  Then the vectorizer can simply use the
> predicate and the tree code and we can drop the large redundancy of
> builtins with different argument types.
>
> Indeed, once this patch is applied, I think that ought to be the very
> next task in this domain.
>
>> +/* Vector shuffle expression.  A = VEC_SHUFFLE_EXPR
>
> Typo in "mask".
>
>> +   foreach i in length (mask):
>> +     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
>
> Surely it's v1[mask[i] - length].
>
>> +      if (TREE_CODE (vect) == VECTOR_CST)
>> +        {
>> +            unsigned i;
>
> Indentation is off all through this function.

Fixed.

>> +  mask = gen_rtx_AND (maskmode, mask, mm);
>> +
>> +  /* Convert mask to vector of chars.  */
>> +  mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0);
>> +  mask = force_reg (V16QImode, mask);
>
> Why are you using force_reg to do all the dirty work?  Seems to
> me this should be using expand_normal.  All throughout this
> function.  That would also avoid the need for all of the extra
> force_reg stuff that ought not be there for -O0.

I don't really understand this. As far as I know, expand_normal
"converts" a tree to rtx. All my computations happen at the level
of rtx, and force_reg is needed just to bring an rtx expression into a
register of the correct mode. If I am missing something, could you
give an example of how I can use expand_normal instead of force_reg in
this particular code?

> I also see that you're not even attempting to use xop_pperm.

As I said, I am happy to experiment with the cases v0 != v1 in the
upcoming patches. Let's just start with a simple thing and see what
kind of issues/problems it would bring.

> Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1?

My personal feeling is that in the v0 != v1 case it may be more
efficient to perform piecewise shuffling rather than bitwise dances
around the masks.

> And give the vshuffle named pattern the wrong number of arguments?

Ok, if I make vshuffle accept only two arguments -- vector and
mask -- would that be ok?

> It's certainly possible to handle it, though it takes a few more steps,
> and might well be more efficient as a libgcc function rather than inline.

I don't really understand why it could be more efficient. I thought
that inlining gives more chances to the final RTL optimisation.
>
>
> r~
>
>
>

Thanks,
Artem.


Re: [PATCH 2/3] Use urandom to get random seed

2011-09-28 Thread Richard Guenther
On Wed, Sep 28, 2011 at 2:49 PM, Andi Kleen  wrote:
> From: Andi Kleen 
>
> When available, use /dev/urandom to get the random seed. This will lower
> the probability
> of collisions.
>
> On other systems it will fallback to the old methods.
>
> Passes bootstrap + testsuite on x86_64. Ok?
>
> gcc/:
>
> 2011-09-26   Andi Kleen 
>
>        * toplev.c (init_local_tick): Try reading random seed from /dev/urandom
> ---
>  gcc/toplev.c |   12 +++-
>  1 files changed, 11 insertions(+), 1 deletions(-)
>
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 78583fc..ab6b5a4 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -262,7 +262,17 @@ init_local_tick (void)
>  {
>   if (!flag_random_seed)
>     {
> -      /* Get some more or less random data.  */
> +      /* Try urandom first. Time of day is too likely to collide.
> +        In case of any error we just use the local tick. */
> +
> +      int fd = open ("/dev/urandom", O_RDONLY);
> +      if (fd >= 0)
> +        {
> +          read (fd, &random_seed, sizeof (random_seed));

I suppose we might get interrupted before anything is read and
read can return with -1 (I suppose partial reads are quite unlikely
though)?  Thus, don't we need the usual EINTR loop?

If not, the patch is ok.

Thanks,
Richard.

> +          close (fd);
> +        }
> +
> +      /* Now get the tick anyways  */
>  #ifdef HAVE_GETTIMEOFDAY
>       {
>        struct timeval tv;
> --
> 1.7.5.4
>
>


Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Jan Hubicka
> > Do you know glibc version numbers when
> > the optimized string functions were introduced?
> 
> Afaik, it's 2.13.
> I also compared my implementation to 2.13.

I wonder if we can assume that most GCC 4.7 based systems will be glibc 2.13
based, too.  I would tend to say yes, and thus would suggest taming down
inlining that is no longer profitable on newer glibcs, with a note in
changes.html...

(I worry about the tables in i386.c deciding what strategy to use for block of
given size. This is more or less unrelated to the actual patch)

Honza


Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
> Do you know glibc version numbers when
> the optimized string functions were introduced?

Afaik, it's 2.13.
I also compared my implementation to 2.13.


[PATCH 1/3] Change random seeds to 64bit and drop re-crcing

2011-09-28 Thread Andi Kleen
From: Andi Kleen 

I had some trouble with random build failures in a large LTO project
and it turned out to be random seed collisions in a highly parallel build
(thanks to Honza for suggesting that).

There were multiple problems:
- The way to generate the random seed is not very random (milliseconds time 
plus pid)
and prone to collisions on highly parallel builds
- It's only 32bit
- Several users take the existing ASCII seed and CRC32 it again, which
doesn't exactly improve it.

This patch changes that to:
- Always use 64bit seeds as numbers (no re-crcing)
- Change all users to use HOST_WIDE_INT
- When the user specifies a random seed it's still crc32ed, but only in
this case.

Passes bootstrap + testsuite on x86_64-linux.

gcc/cp:

2011-09-26   Andi Kleen 

* repo.c (finish_repo): Use HOST_WIDE_INT_PRINT_HEX_PURE.

gcc/:

2011-09-26   Andi Kleen 

* hwint.h (HOST_WIDE_INT_PRINT_HEX_PURE): Add.
* lto-streamer.c (lto_get_section_name): Remove crc32_string.
Handle numerical random seed.
* lto-streamer.h (lto_file_decl_data): Change id to unsigned 
HOST_WIDE_INT.
* toplev.c (random_seed): Add.
(init_random_seed): Change for numerical random seed.
(get_random_seed): Return as HOST_WIDE_INT.
(set_random_seed): Crc32 existing string.
* toplev.h (get_random_seed): Change to numerical return.
* tree.c (get_file_function_name): Remove CRC. Handle numerical random 
seed.

gcc/lto/:

2011-09-26   Andi Kleen 

* lto.c (lto_resolution_read): Remove id dumping.
(lto_section_with_id): Turn id into unsigned HOST_WIDE_INT.
(create_subid_section_table): Ditto.
---
 gcc/cp/repo.c  |3 ++-
 gcc/hwint.h|1 +
 gcc/lto-streamer.c |8 
 gcc/lto-streamer.h |2 +-
 gcc/lto/lto.c  |   11 ---
 gcc/toplev.c   |   27 +--
 gcc/toplev.h   |2 +-
 gcc/tree.c |6 +++---
 8 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/gcc/cp/repo.c b/gcc/cp/repo.c
index 16a192e..ca971b6 100644
--- a/gcc/cp/repo.c
+++ b/gcc/cp/repo.c
@@ -263,7 +263,8 @@ finish_repo (void)
 anonymous namespaces will get the same mangling when this
 file is recompiled.  */
   if (!strstr (args, "'-frandom-seed="))
-   fprintf (repo_file, " '-frandom-seed=%s'", get_random_seed (false));
+   fprintf (repo_file, " '-frandom-seed=" HOST_WIDE_INT_PRINT_HEX_PURE 
"'", 
+get_random_seed (false));
   fprintf (repo_file, "\n");
 }
 
diff --git a/gcc/hwint.h b/gcc/hwint.h
index 2643aee..f5e0bee 100644
--- a/gcc/hwint.h
+++ b/gcc/hwint.h
@@ -102,6 +102,7 @@ extern char sizeof_long_long_must_be_8[sizeof(long long) == 
8 ? 1 : -1];
 #define HOST_WIDE_INT_PRINT_DEC_C HOST_WIDE_INT_PRINT_DEC HOST_WIDE_INT_PRINT_C
 #define HOST_WIDE_INT_PRINT_UNSIGNED "%" HOST_WIDE_INT_PRINT "u"
 #define HOST_WIDE_INT_PRINT_HEX "%#" HOST_WIDE_INT_PRINT "x"
+#define HOST_WIDE_INT_PRINT_HEX_PURE "%" HOST_WIDE_INT_PRINT "x"
 
 /* Set HOST_WIDEST_INT.  This is a 64-bit type unless the compiler
in use has no 64-bit type at all; in that case it's 32 bits.  */
diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index 633c3ce..e3ccb79 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -166,13 +166,13 @@ lto_get_section_name (int section_type, const char *name, 
struct lto_file_decl_d
  doesn't confuse the reader with merged sections.
 
  For options don't add a ID, the option reader cannot deal with them
- and merging should be ok here.
-
- XXX: use crc64 to minimize collisions? */
+ and merging should be ok here. */
   if (section_type == LTO_section_opts)
 strcpy (post, "");
+  else if (f != NULL) 
+sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
   else
-sprintf (post, ".%x", f ? f->id : crc32_string(0, get_random_seed 
(false)));
+sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
   return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL);
 }
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 190d6a3..2564bd2 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -552,7 +552,7 @@ struct GTY(()) lto_file_decl_data
   struct lto_file_decl_data *next;
 
   /* Sub ID for merged objects. */
-  unsigned id;
+  unsigned HOST_WIDE_INT id;
 
   /* Symbol resolutions for this file */
   VEC(ld_plugin_symbol_resolution_t,heap) * GTY((skip)) resolutions;
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 0b1dcb9..77eb1a1 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -976,9 +976,6 @@ lto_resolution_read (splay_tree file_ids, FILE *resolution, 
lto_file *file)
}
 
   file_data = (struct lto_file_decl_data *)nd->value;
-  if (cgraph_dump_file)
-   fprintf (cgraph_dump_file, "Adding resolution %u %u to id %x\n",
-index, r, file_data->id);
   VEC_safe_grow_cleared (ld_plugin_symbol_resolution_t, heap, 

[PATCH 3/3] Use urandom in gcc.c too

2011-09-28 Thread Andi Kleen
From: Andi Kleen 

gcc also generates a random number in some special circumstances,
so teach it about /dev/urandom too.

gcc/:

2011-09-27   Andi Kleen 

* gcc.c (get_local_tick): Rename to get_random_number.  Read from 
/dev/urandom.
Add getpid call.
(compare_debug_dump_opt_spec_function): Drop getpid call.
---
 gcc/gcc.c |   22 --
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index ddec8db..3bfdf77 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -8062,12 +8062,22 @@ print_asm_header_spec_function (int arg 
ATTRIBUTE_UNUSED,
   return NULL;
 }
 
-/* Compute a timestamp to initialize flag_random_seed.  */
+/* Get a random number for -frandom-seed */
 
-static unsigned
-get_local_tick (void)
+static unsigned HOST_WIDE_INT
+get_random_number (void)
 {
-  unsigned ret = 0;
+  unsigned HOST_WIDE_INT ret = 0;
+  int fd; 
+
+  fd = open ("/dev/urandom", O_RDONLY); 
+  if (fd >= 0)
+{
+  read (fd, &ret, sizeof (HOST_WIDE_INT));
+  close (fd);
+  if (ret)
+return ret;
+}
 
   /* Get some more or less random data.  */
 #ifdef HAVE_GETTIMEOFDAY
@@ -8086,7 +8096,7 @@ get_local_tick (void)
   }
 #endif
 
-  return ret;
+  return ret ^ getpid();
 }
 
 /* %:compare-debug-dump-opt spec function.  Save the last argument,
@@ -8145,7 +8155,7 @@ compare_debug_dump_opt_spec_function (int arg,
 
   if (!which)
 {
-  unsigned HOST_WIDE_INT value = get_local_tick () ^ getpid ();
+  unsigned HOST_WIDE_INT value = get_random_number ();
 
   sprintf (random_seed, HOST_WIDE_INT_PRINT_HEX, value);
 }
-- 
1.7.5.4



[PATCH 2/3] Use urandom to get random seed

2011-09-28 Thread Andi Kleen
From: Andi Kleen 

When available, use /dev/urandom to get the random seed. This will lower
the probability
of collisions.

On other systems it will fallback to the old methods.

Passes bootstrap + testsuite on x86_64. Ok?

gcc/:

2011-09-26   Andi Kleen 

* toplev.c (init_local_tick): Try reading random seed from /dev/urandom
---
 gcc/toplev.c |   12 +++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/gcc/toplev.c b/gcc/toplev.c
index 78583fc..ab6b5a4 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -262,7 +262,17 @@ init_local_tick (void)
 {
   if (!flag_random_seed)
 {
-  /* Get some more or less random data.  */
+  /* Try urandom first. Time of day is too likely to collide. 
+In case of any error we just use the local tick. */
+
+  int fd = open ("/dev/urandom", O_RDONLY);
+  if (fd >= 0)
+{
+  read (fd, &random_seed, sizeof (random_seed));
+  close (fd);
+}
+
+  /* Now get the tick anyways  */
 #ifdef HAVE_GETTIMEOFDAY
   {
struct timeval tv;
-- 
1.7.5.4



Updated random seed patchkit

2011-09-28 Thread Andi Kleen
I addressed all review comments, clarified some code.
The random seed generation in gcc.c is now fixed too and toplev.c
detects this case and doesn't rerun the CRC.

Repasses bootstrap and testsuite on x86_64-linux.

Since the previous version was approved I will commit in 24h,
unless there are new objections.

-Andi



Re: Support for V2 plugin API

2011-09-28 Thread Richard Guenther
On Wed, 28 Sep 2011, Jan Hubicka wrote:

> Hi,
> this patch adds support for V2 plugin API (thanks, Cary) that adds
> LDPR_PREVAILING_DEF_IRONLY_EXP.
> The resolution is like LDPR_PREVAILING_DEF_IRONLY but the symbol is exported
> out of DSO.  It is up to the compiler to optimize it out or keep it based on
> the knowledge whether the symbol can be optimized out at all (i.e. most 
> COMDATs
> can, other types can't).
> 
> This solve quite few problems with building C++ APPS, see the PR log.
> 
> I was originally wrong about gold implementation being buggy. The problem 
> turned
> out to be subtle lto-symtab bug that was mostly latent because of the COMDAT 
> hack
> we use. lto_symtab_resolve_symbols is supposed to honor plugin decision when 
> it is
> available but it doesn't when resolution of very first entry in the list is 
> UNKNOWN.
> This can happen because we add into symtab also declarations that are not in
> varpool (i.e. they are neither defined or used by the object file), but they 
> are
> used otherwise, i.e. referred from stuff used for debug info or TB 
> devirtualization.
> 
> To ensure backward compatibility I am keeping the COMDAT hack in place.  It 
> won't help
> letting compiler know the plugin API version, since we decide on that at a 
> time
> we output object files and thus we are not called from plugin.  I suppose we 
> could
> keep the hack in place for next release and remove it afterwards penalizing 
> builds
> with old binutils? Or perhaps even in GCC 4.7 if GNU LD gets updated in time.
> 
> Bootstrapped/regtested x86_64-linux, built Mozilla and lto-bootstrap in 
> progress.
> OK if it passes?

Ok.

Any idea when GNU ld will catch up?

Thanks,
Richard.

> Honza
> 
>   PR lto/47247
>   * lto-plugin.c (get_symbols_v2): New variable.
>   (write_resolution): Use V2 API when available.
>   (onload): Handle LDPT_GET_SYMBOLS_V2.
> 
>   * lto-symtab.c (lto_symtab_resolve_symbols): Do not resolve
>   when resolution is already available from the plugin.
>   (lto_symtab_merge_decls_1): Handle LDPR_PREVAILING_DEF_IRONLY_EXP.
>   * cgraph.c (ld_plugin_symbol_resolution): Add prevailing_def_ironly_exp.
>   * lto-cgraph.c (LDPR_NUM_KNOWN): Update.
>   * ipa.c (varpool_externally_visible_p): IRONLY variables are never
>   externally visible.
>   * varasm.c (resolution_to_local_definition_p): Add
>   LDPR_PREVAILING_DEF_IRONLY_EXP.
>   (resolution_local_p): Likewise.
> 
>   * common.c (lto_resolution_str): Add new resolution.
>   * common.h (lto_resolution_str): Likewise.
> Index: lto-plugin/lto-plugin.c
> ===
> *** lto-plugin/lto-plugin.c   (revision 179274)
> --- lto-plugin/lto-plugin.c   (working copy)
> *** enum symbol_style
> *** 129,135 
>   static char *arguments_file_name;
>   static ld_plugin_register_claim_file register_claim_file;
>   static ld_plugin_register_all_symbols_read register_all_symbols_read;
> ! static ld_plugin_get_symbols get_symbols;
>   static ld_plugin_register_cleanup register_cleanup;
>   static ld_plugin_add_input_file add_input_file;
>   static ld_plugin_add_input_library add_input_library;
> --- 129,135 
>   static char *arguments_file_name;
>   static ld_plugin_register_claim_file register_claim_file;
>   static ld_plugin_register_all_symbols_read register_all_symbols_read;
> ! static ld_plugin_get_symbols get_symbols, get_symbols_v2;
>   static ld_plugin_register_cleanup register_cleanup;
>   static ld_plugin_add_input_file add_input_file;
>   static ld_plugin_add_input_library add_input_library;
> *** write_resolution (void)
> *** 441,447 
> struct plugin_symtab *symtab = &info->symtab;
> struct ld_plugin_symbol *syms = symtab->syms;
>   
> !   get_symbols (info->handle, symtab->nsyms, syms);
>   
> finish_conflict_resolution (symtab, &info->conflicts);
>   
> --- 441,452 
> struct plugin_symtab *symtab = &info->symtab;
> struct ld_plugin_symbol *syms = symtab->syms;
>   
> !   /* Version 2 of API supports IRONLY_EXP resolution that is
> !  accepted by GCC-4.7 and newer.  */
> !   if (get_symbols_v2)
> ! get_symbols_v2 (info->handle, symtab->nsyms, syms);
> !   else
> ! get_symbols (info->handle, symtab->nsyms, syms);
>   
> finish_conflict_resolution (symtab, &info->conflicts);
>   
> *** onload (struct ld_plugin_tv *tv)
> *** 986,991 
> --- 991,999 
>   case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK:
> register_all_symbols_read = p->tv_u.tv_register_all_symbols_read;
> break;
> + case LDPT_GET_SYMBOLS_V2:
> +   get_symbols_v2 = p->tv_u.tv_get_symbols;
> +   break;
>   case LDPT_GET_SYMBOLS:
> get_symbols = p->tv_u.tv_get_symbols;
> break;
> Index: gcc/lto-symtab.c
> ===
> *** gcc/lt

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
This expansion only works on relatively small sizes (up to 4k), where the
overhead of a library call can be quite significant. In some cases the new
implementation gives 5x acceleration (especially on small sizes - less
than ~256 bytes). On almost all sizes from 16 to 4096 bytes there is
some gain; on average it's 20-30% on 64-bit and 40-50% on 32-bit (on
Atom).
This inlining implementation isn't intended to replace glibc; it's
intended to replace the old implementation, which is sometimes quite slow.

If a glibc call turns out to be faster than this expansion, a libcall is
generated (special experiments were carried out to find the threshold
values in the cost models).

If the size is not known at compile time, this inlining doesn't apply (i.e. glibc is called).

On 28 September 2011 15:55, Jakub Jelinek  wrote:
> On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote:
>> Michael Zolotukhin  writes:
>> >
>> > Build and 'make check' was tested.
>>
>> Could you expand a bit on the performance benefits?  Where does it help?
>
> Especially when glibc these days has very well optimized implementations
> tuned for various CPUs and it is very unlikely beneficial to inline
> memcpy/memset if they aren't really short or have unknown number of
> iterations.
>
>        Jakub
>



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Jan Hubicka
> On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote:
> > Michael Zolotukhin  writes:
> > >
> > > Build and 'make check' was tested.
> > 
> > Could you expand a bit on the performance benefits?  Where does it help?
> 
> Especially when glibc these days has very well optimized implementations
> tuned for various CPUs and it is very unlikely beneficial to inline
> memcpy/memset if they aren't really short or have unknown number of
> iterations.

I guess we should update the expansion tables so we produce function calls more 
often.
I will look at how things behave on my setup.  Do you know the glibc
version in which
the optimized string functions were introduced?

Concerning inline SSE, I think it makes a lot of sense when we know size &
alignment so we can output just few SSE moves instead of more integer moves.
We definitely need some numbers for the loop variants.

Honza


Support for V2 plugin API

2011-09-28 Thread Jan Hubicka
Hi,
this patch adds support for V2 plugin API (thanks, Cary) that adds
LDPR_PREVAILING_DEF_IRONLY_EXP.
The resolution is like LDPR_PREVAILING_DEF_IRONLY but the symbol is exported
out of DSO.  It is up to the compiler to optimize it out or keep it based on
the knowledge whether the symbol can be optimized out at all (i.e. most COMDATs
can, other types can't).

This solve quite few problems with building C++ APPS, see the PR log.

I was originally wrong about the gold implementation being buggy. The problem
turned out to be a subtle lto-symtab bug that was mostly latent because of the
COMDAT hack we use. lto_symtab_resolve_symbols is supposed to honor the
plugin decision when it is available, but it doesn't when the resolution of
the very first entry in the list is UNKNOWN.
This can happen because we add into symtab also declarations that are not in
varpool (i.e. they are neither defined nor used by the object file), but they are
used otherwise, i.e. referred from stuff used for debug info or TB 
devirtualization.

To ensure backward compatibility I am keeping the COMDAT hack in place.  It
won't help to let the compiler know the plugin API version, since we decide
on that at the time we output object files, and thus we are not called from
the plugin.  I suppose we could keep the hack in place for the next release
and remove it afterwards, penalizing builds with old binutils? Or perhaps
even in GCC 4.7, if GNU LD gets updated in time.

Bootstrapped/regtested x86_64-linux, built Mozilla and lto-bootstrap in 
progress.
OK if it passes?

Honza

PR lto/47247
* lto-plugin.c (get_symbols_v2): New variable.
(write_resolution): Use V2 API when available.
(onload): Handle LDPT_GET_SYMBOLS_V2.

* lto-symtab.c (lto_symtab_resolve_symbols): Do not resolve
when resolution is already available from the plugin.
(lto_symtab_merge_decls_1): Handle LDPR_PREVAILING_DEF_IRONLY_EXP.
* cgraph.c (ld_plugin_symbol_resolution): Add prevailing_def_ironly_exp.
* lto-cgraph.c (LDPR_NUM_KNOWN): Update.
* ipa.c (varpool_externally_visible_p): IRONLY variables are never
externally visible.
* varasm.c (resolution_to_local_definition_p): Add
LDPR_PREVAILING_DEF_IRONLY_EXP.
(resolution_local_p): Likewise.

* common.c (lto_resolution_str): Add new resolution.
* common.h (lto_resolution_str): Likewise.
Index: lto-plugin/lto-plugin.c
===
*** lto-plugin/lto-plugin.c (revision 179274)
--- lto-plugin/lto-plugin.c (working copy)
*** enum symbol_style
*** 129,135 
  static char *arguments_file_name;
  static ld_plugin_register_claim_file register_claim_file;
  static ld_plugin_register_all_symbols_read register_all_symbols_read;
! static ld_plugin_get_symbols get_symbols;
  static ld_plugin_register_cleanup register_cleanup;
  static ld_plugin_add_input_file add_input_file;
  static ld_plugin_add_input_library add_input_library;
--- 129,135 
  static char *arguments_file_name;
  static ld_plugin_register_claim_file register_claim_file;
  static ld_plugin_register_all_symbols_read register_all_symbols_read;
! static ld_plugin_get_symbols get_symbols, get_symbols_v2;
  static ld_plugin_register_cleanup register_cleanup;
  static ld_plugin_add_input_file add_input_file;
  static ld_plugin_add_input_library add_input_library;
*** write_resolution (void)
*** 441,447 
struct plugin_symtab *symtab = &info->symtab;
struct ld_plugin_symbol *syms = symtab->syms;
  
!   get_symbols (info->handle, symtab->nsyms, syms);
  
finish_conflict_resolution (symtab, &info->conflicts);
  
--- 441,452 
struct plugin_symtab *symtab = &info->symtab;
struct ld_plugin_symbol *syms = symtab->syms;
  
!   /* Version 2 of API supports IRONLY_EXP resolution that is
!  accepted by GCC-4.7 and newer.  */
!   if (get_symbols_v2)
! get_symbols_v2 (info->handle, symtab->nsyms, syms);
!   else
! get_symbols (info->handle, symtab->nsyms, syms);
  
finish_conflict_resolution (symtab, &info->conflicts);
  
*** onload (struct ld_plugin_tv *tv)
*** 986,991 
--- 991,999 
case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK:
  register_all_symbols_read = p->tv_u.tv_register_all_symbols_read;
  break;
+   case LDPT_GET_SYMBOLS_V2:
+ get_symbols_v2 = p->tv_u.tv_get_symbols;
+ break;
case LDPT_GET_SYMBOLS:
  get_symbols = p->tv_u.tv_get_symbols;
  break;
Index: gcc/lto-symtab.c
===
*** gcc/lto-symtab.c(revision 179274)
--- gcc/lto-symtab.c(working copy)
*** lto_symtab_resolve_symbols (void **slot)
*** 441,452 
e->node = cgraph_get_node (e->decl);
else if (TREE_CODE (e->decl) == VAR_DECL)
e->vnode = varpool_get_node (e->dec

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Jakub Jelinek
On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote:
> Michael Zolotukhin  writes:
> >
> > Build and 'make check' was tested.
> 
> Could you expand a bit on the performance benefits?  Where does it help?

Especially when glibc these days has very well optimized implementations
tuned for various CPUs and it is very unlikely beneficial to inline
memcpy/memset if they aren't really short or have unknown number of
iterations.

Jakub


Re: Intrinsics for N2965: Type traits and base classes

2011-09-28 Thread Mike Spertus
Don't worry, I'm not suggesting including boost::mpl at all, just 
leaving the return type of the bases trait unspecified. IMO, your 
example illustrates my point that without performance tuning, compiling 
metaprograms can be prohibitively expensive, so I want to avoid running 
the tuple metaprogram that creates the fields when we never need to 
instantiate the type. Benchmarks soon.


Mike

On 9/28/2011 2:53 AM, Jonathan Wakely wrote:

On 28 September 2011 04:22, Michael Spertus wrote:
   

Benjamin,
I think tuple is wrong both for performance reasons (I believe these are likely 
to be serious enough to depress use due to inordinately long compiles) and 
because it prematurely locks us into a rigid choice of how our typelists are 
implemented.

My inclination is to make it type-independent by returning an unspecified type that can 
have a sequence of types extracted from it (this is the approach taken by boost::mpl and 
has loads of experience that shows it is a good approach to metaprogramming). In other 
words, first<bases<A>>::type would be the first base of A, etc.
 

Citing Boost MPL as a good way to avoid inordinately long compiles ...
interesting!  Have you ever tried to reduce a GCC bug report from 20k
lines to 20, because most Boost libs include every MPL header?!

I hope we can get a simple typelist _without_ needing everything else
in MPL, such as the apply and lambda metafunctions (and maybe a lot of
that could be massively simplified using variadic templates anyway.)





Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Andi Kleen
Michael Zolotukhin  writes:
>
> Build and 'make check' was tested.

Could you expand a bit on the performance benefits?  Where does it help?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [ARM] Optimise handling of neon_vget_high/low

2011-09-28 Thread Ramana Radhakrishnan
>
> Tested on arm-linux-gnueabi, and on benchmarks which should (and did)
> benefit from it.  OK to install?

OK.

Ramana


Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
Attached is part 1 of the patch, which enables use of vector instructions
in memset and memcopy (the middle-end part).
The main part of the changes is in the functions
move_by_pieces/set_by_pieces. In the new version the move-mode selection
algorithm was changed – now it checks whether alignment is known at compile
time and uses cost models to choose between aligned and unaligned,
vector or non-vector move modes.

Build and 'make check' were tested - in 'make check' there is one failure,
which will be cured when the complete patch is applied.

On 27 September 2011 18:44, Michael Zolotukhin
 wrote:
> I divided the patch into three smaller ones:
>
> 1) Patch with target-independent changes (see attached file 
> memfunc-mid.patch).
> The main part of the changes is in functions
> move_by_pieces/set_by_pieces. In new version algorithm of move-mode
> selection was changed – now it checks if alignment is known at compile
> time and uses cost-models to choose between aligned and unaligned
> vector or not-vector move-modes.
>
> 2) Patch with target-dependent changes (memfunc-be.patch).
> The main part of the changes is in functions
> ix86_expand_setmem/ix86_expand_movmem. The other changes are only
> needed to support it.
> The changes mostly touched unrolled_loop strategy – now vector move
> modes could be used here. That resulted in large epilogues and
> prologues, so their generation also was modified.
> This patch contains some changes in middle-end (to make build
> possible) - but all these changes are present in the first patch, so
> there is no need to review them here.
>
> 3) Patch with all new tests (memfunc-tests.patch).
> This patch contains a lot of small tests for different memset and memcopy 
> cases.
>
> Separately from each other, these patches won't give performance gain.
> The positive effect will be noticeable only if they are applied
> together (I attach the complete patch also - see file
> memfunc-complete.patch).
>
>
> If you have any questions regarding these changes, please don't
> hesitate to ask them.
>
>
> On 18 July 2011 15:00, Michael Zolotukhin
>  wrote:
>> Here is a summary - probably, it doesn't cover every single piece in
>> the patch, but I tried to describe the major changes. I hope this will
>> help you a bit - and of course I'll answer your further questions if
>> they appear.
>>
>> The changes could be logically divided into two parts (though, these
>> parts have something in common).
>> The first part is changes in target-independent part, in functions
>> move_by_pieces() and store_by_pieces() - mostly located in expr.c.
>> The second part touches ix86_expand_movmem() and ix86_expand_setmem()
>> - mostly located in config/i386/i386.c.
>>
>> Changes in i386.c (target-dependent part):
>> 1) Strategies for cases with known and unknown alignment are separated
>> from each other.
>> When alignment is known at compile time, we could generate optimized
>> code without libcalls.
>> When it's unknown, we sometimes could create runtime-checks to reach
>> desired alignment, but not always.
>> Strategies for atom and generic_32, generic_64 were chosen according
>> to set of experiments, strategies in other
>> cost models are unchanged (strategies for unknown alignment are copied
>> from existing strategies).
>> 2) unrolled_loop algorithm was modified - now it uses SSE move-modes,
>> if they're available.
>> 3) As size of data, moved in one iteration, greatly increased, and
>> epilogues became bigger - so some changes were needed in epilogue
>> generation. In some cases a special loop (not unrolled) is generated
>> in epilogue to avoid slow copying by bytes (changes in
>> expand_set_or_movmem_via_loop() and introducing of
>> expand_set_or_movmem_via_loop_with_iter() is made for these cases).
>> 4) As bigger alignment might be needed than previously, prologue
>> generation was also modified.
>>
>> Changes in expr.c (target-independent part):
>> There are two possible strategies now: use of aligned and unaligned
>> moves. For each of them a cost model was implemented and the choice is
>> made according to the cost of each option. Move-mode choice is made by
>> functions widest_mode_for_unaligned_mov() and
>> widest_mode_for_aligned_mov().
>> Cost estimation is implemented in functions compute_aligned_cost() and
>> compute_unaligned_cost().
>> Choice between these two strategies and the generation of moves
>> themselves are in function move_by_pieces().
>>
>> Function store_by_pieces() calls set_by_pieces_1() instead of
>> store_by_pieces_1(), if this is memset-case (I needed to introduce
>> set_by_pieces_1 to separate memset-case from others -
>> store_by_pieces_1 is sometimes called for strcpy and some other
>> functions, not only for memset).
>>
>> Set_by_pieces_1() estimates costs of aligned and unaligned strategies
>> (as in move_by_pieces() ) and generates moves for memset. Single move
>> is generated via
>> generate_move_with_mode(). If it's called first time, a promoted value
>> (register, filled with one-byte value

Re: [RFC] Context sensitive inline analysis

2011-09-28 Thread Richard Sandiford
Jan Hubicka  writes:
> the problem is sign overflow in time computation. Time should be
> capped by MAX_TIME and we compute MAX_TIME * INLINE_SIZE_SCALE *
> 2. This happens to be >2^31 & <2^32 so we overflow here because of use
> of signed arithmetics.
>
> Index: ipa-inline-analysis.c
> ===
> --- ipa-inline-analysis.c (revision 179266)
> +++ ipa-inline-analysis.c (working copy)
> @@ -92,7 +92,7 @@ along with GCC; see the file COPYING3.
>  /* Estimate runtime of function can easilly run into huge numbers with many
> nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in integer.
> For anything larger we use gcov_type.  */
> -#define MAX_TIME 100
> +#define MAX_TIME 50
>  
>  /* Number of bits in integer, but we really want to be stable across 
> different
> hosts.  */

Could you update the comment too?  ("time * INLINE_SIZE_SCALE * 2")

Richard


Re: Go patch committed: Update libgo to Go release r60

2011-09-28 Thread Rainer Orth
Ian Lance Taylor  writes:

> Rainer Orth  writes:
>
>> Solaris 8 and 9 suffer from the same problem.  The following patch
>> allowed the bootstrap to complete.  An IRIX bootstrap is currently
>> running, but will take some time to complete.
>>
>>  Rainer
>>
>>
>> 2011-09-23  Rainer Orth  
>>
>>  * mksysinfo.sh: Provide TIOCSCTTY if missing.
>
> Thanks.  I committed this patch, though I moved the new lines farther
> down in the script next to the handling of the ioctl constants.

Thanks, I'd missed that.  It turned out that IRIX 6 needs one more
change to return to bootstrap land:  only defines
TIOCNOTTY if !_XOPEN_SOURCE, which we need for other stuff
(cf. configure.ac).  I've cheated and use  instead, which
doesn't have this check.  With this patch, a Go-only IRIX 6.5 bootstrap
completed successfully.

Rainer


2011-09-28  Rainer Orth  

* mksysinfo.sh [__sgi__]: Include .

# HG changeset patch
# Parent 4530aeaf12a2b1576a7bf67de9cf5569719107c6
Provide TIOCNOTTY on IRIX 6

diff --git a/libgo/mksysinfo.sh b/libgo/mksysinfo.sh
--- a/libgo/mksysinfo.sh
+++ b/libgo/mksysinfo.sh
@@ -36,9 +36,12 @@ cat > sysinfo.c <
 /*  needs u_char/u_short, but  is only
included by  if _SGIAPI (i.e. _SGI_SOURCE
-   && !_XOPEN_SOURCE.  */
+   && !_XOPEN_SOURCE.
+only defines TIOCNOTTY if !_XOPEN_SOURCE, while
+does so unconditionally.  */
 #ifdef __sgi__
 #include 
+#include 
 #endif
 #include 
 #include 

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix stack red zone bug (PR38644)

2011-09-28 Thread Richard Guenther
On Wed, Sep 28, 2011 at 11:40 AM, Jiangning Liu  wrote:
>
>
>> -Original Message-
>> From: Richard Guenther [mailto:richard.guent...@gmail.com]
>> Sent: Wednesday, September 28, 2011 5:20 PM
>> To: Jiangning Liu
>> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
>>
>> On Wed, Sep 28, 2011 at 11:10 AM, Jiangning Liu 
>> wrote:
>> >
>> >
>> >> -Original Message-
>> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
>> >> Sent: Wednesday, September 28, 2011 4:39 PM
>> >> To: Jiangning Liu
>> >> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
>> >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
>> >>
>> >> On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu
>> 
>> >> wrote:
>> >> >
>> >> >
>> >> >> -Original Message-
>> >> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
>> >> >> Sent: Tuesday, September 27, 2011 3:41 PM
>> >> >> To: Jiangning Liu
>> >> >> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
>> >> >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
>> >> >>
>> >> >> On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu
>> >> 
>> >> >> wrote:
>> >> >> >> Think of it this way.  What the IR says is there is no barrier
>> >> >> between
>> >> >> >> those moves.  You either have an implicit barrier (which is
>> what
>> >> you
>> >> >> >> are proposing) or you have it explicitly.  I think we all
>> rather
>> >> >> have
>> >> >> >> more things explicit rather than implicit in the IR.  And that
>> >> has
>> >> >> >> been the overall feeling for a few years now.
>> >> >> >>
>> >> >> >
>> >> >> > Sorry, I'm afraid I can't agree with you. Instead, I think
>> using
>> >> >> barrier to describe this kind of dependence is a kind of implicit
>> >> >> method. Please note that this is not an usual data dependence
>> issue.
>> >> >> The stack pointer change doesn't have any dependence with memory
>> >> access
>> >> >> at all.
>> >> >>
>> >> >> It is similar to atomic instructions that require being an
>> >> >> optimization/memory barrier.  Sure it is not a usual data
>> dependence
>> >> >> (otherwise we would handle
>> >> >> it already), so the targets have to explicitly express the
>> >> dependence
>> >> >> somehow, for which we only have barriers right now.
>> >> >>
>> >> >
>> >> > Richard,
>> >> >
>> >> > Thanks for your explanation. It's explicit to back-end, while it's
>> >> implicit
>> >> > to scheduler in middle end, because barrier can decide dependence
>> in
>> >> > scheduler but barrier can be generated from several different
>> >> scenarios.
>> >> > It's unsafe and prone to introduce bug if any one of the scenarios
>> >> requiring
>> >> > generating barriers is being missed in back-end.
>> >> >
>> >> > Between middle-end and back-end, we should have interfaces that is
>> >> easy to
>> >> > be implemented by back-end. After all, middle-end itself can't
>> >> consist of a
>> >> > compiler, and vice versa. Back-end needs middle-end's help to make
>> >> sure
>> >> > back-end is easy to be implemented and reduce the possibility of
>> >> introducing
>> >> > bugs.
>> >> >
>> >> > Without an explicit hook as I'm proposing, back-end implementers
>> have
>> >> to
>> >> > clearly know all scenarios of generating barrier very clearly,
>> >> because there
>> >> > isn't any code tips and comments in middle end telling back-end
>> the
>> >> list of
>> >> > all scenarios on generating barriers.
>> >> >
>> >> > Yes, barrier is a perfect interface for scheduler in theory. But
>> from
>> >> > engineering point of view, I think it's better to explicitly
>> define
>> >> an
>> >> > interface to describe stack red zone and inform back-end, or vice
>> >> versa. Not
>> >> > like computer, people is easy to make mistake if you don't tell
>> them.
>> >> On
>> >> > this bug, the fact is over the years different back-ends made
>> similar
>> >> bugs.
>> >> >
>> >> > GCC is really a perfect platform on building new ports, and I saw
>> a
>> >> lot of
>> >> > new back-ends. The current middle end is unsafe, if port doesn't
>> >> support
>> >> > stack red zone and back-ends doesn't generate barrier for it. Why
>> >> can't we
>> >> > explicitly clarify this in compiler code between middle end and
>> back
>> >> end?
>> >> > What if any other back-end (new or old) NOT supporting stack red
>> zone
>> >> > exposing the similar bug again?
>> >>
>> >> There are gazillion things you have to explicitly get right in your
>> >> backends,
>> >> so I don't see why exposing proper scheduling barriers should be
>> >> special,
>> >> and there, why red-zones should be special (as opposed to other
>> >> occasions
>> >> where you need to emit barriers from the backend for the scheduler).
>> >>
>> >
>> > Richard,
>> >
>> > This is because,
>> >
>> > 1) Current scheduler is unsafe if back-end doesn't generate barrier
>> for a
>> > port which doesn't support stack red zone at all.
>> > 2) Implementing barrier in back-end is a burden to new back-end
>> > implementation for ports not supporting stack red zone at all.

Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.

2011-09-28 Thread Richard Guenther
On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries  wrote:
> Richard,
>
> I got a patch for PR50527.
>
> The patch prevents the alignment of vla-related allocas to be set to
> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
> the alloca.
>
> Bootstrapped and regtested on x86_64.
>
> OK for trunk?

Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
the vectorizer then will no longer see that the arrays are properly aligned.

I'm not sure what the best thing to do is here, other than trying to record
the alignment requirement of the VLA somewhere.

Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
has the issue that it will force stack-realignment which isn't free (and the
point was to make the decl cheaper than the alloca).  But that might
possibly be the better choice.

Any other thoughts?

Thanks,
Richard.

> Thanks,
> - Tom
>
> 2011-09-27  Tom de Vries  
>
>        * tree-ssa-ccp.c (evaluate_stmt): Don't assume alignment for 
> vla-related
>        allocas.
>
>        * gcc.dg/pr50527.c: New test.
>


Re: [patch, testsuite, ARM] Skip architecture option in pr42575.c

2011-09-28 Thread Ramana Radhakrishnan
On 28 September 2011 09:48, Joey Ye  wrote:
> 2011-09-28  Joey Ye  
>
>        * gcc.target/arm/pr42575.c: Remove architecture option.

What happens if this test is run with a multilib of march=armv5te ?

Ramana
>
> Index: gcc/testsuite/gcc.target/arm/pr42575.c
> ===
> --- gcc/testsuite/gcc.target/arm/pr42575.c      (revision 179308)
> +++ gcc/testsuite/gcc.target/arm/pr42575.c      (working copy)
> @@ -1,4 +1,4 @@
> -/* { dg-options "-O2 -march=armv7-a" }  */
> +/* { dg-options "-O2" }  */
>  /* Make sure RA does good job allocating registers and avoids
>    unnecessary moves.  */
>  /* { dg-final { scan-assembler-not "mov" } } */
>
>
>
>


RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-28 Thread Jiangning Liu


> -Original Message-
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: Wednesday, September 28, 2011 5:20 PM
> To: Jiangning Liu
> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
> 
> On Wed, Sep 28, 2011 at 11:10 AM, Jiangning Liu 
> wrote:
> >
> >
> >> -Original Message-
> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> >> Sent: Wednesday, September 28, 2011 4:39 PM
> >> To: Jiangning Liu
> >> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
> >>
> >> On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu
> 
> >> wrote:
> >> >
> >> >
> >> >> -Original Message-
> >> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> >> >> Sent: Tuesday, September 27, 2011 3:41 PM
> >> >> To: Jiangning Liu
> >> >> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
> >> >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
> >> >>
> >> >> On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu
> >> 
> >> >> wrote:
> >> >> >> Think of it this way.  What the IR says is there is no barrier
> >> >> between
> >> >> >> those moves.  You either have an implicit barrier (which is
> what
> >> you
> >> >> >> are proposing) or you have it explicitly.  I think we all
> rather
> >> >> have
> >> >> >> more things explicit rather than implicit in the IR.  And that
> >> has
> >> >> >> been the overall feeling for a few years now.
> >> >> >>
> >> >> >
> >> >> > Sorry, I'm afraid I can't agree with you. Instead, I think
> using
> >> >> barrier to describe this kind of dependence is a kind of implicit
> >> >> method. Please note that this is not an usual data dependence
> issue.
> >> >> The stack pointer change doesn't have any dependence with memory
> >> access
> >> >> at all.
> >> >>
> >> >> It is similar to atomic instructions that require being an
> >> >> optimization/memory barrier.  Sure it is not a usual data
> dependence
> >> >> (otherwise we would handle
> >> >> it already), so the targets have to explicitly express the
> >> dependence
> >> >> somehow, for which we only have barriers right now.
> >> >>
> >> >
> >> > Richard,
> >> >
> >> > Thanks for your explanation. It's explicit to back-end, while it's
> >> implicit
> >> > to scheduler in middle end, because barrier can decide dependence
> in
> >> > scheduler but barrier can be generated from several different
> >> scenarios.
> >> > It's unsafe and prone to introduce bug if any one of the scenarios
> >> requiring
> >> > generating barriers is being missed in back-end.
> >> >
> >> > Between middle-end and back-end, we should have interfaces that is
> >> easy to
> >> > be implemented by back-end. After all, middle-end itself can't
> >> consist of a
> >> > compiler, and vice versa. Back-end needs middle-end's help to make
> >> sure
> >> > back-end is easy to be implemented and reduce the possibility of
> >> introducing
> >> > bugs.
> >> >
> >> > Without an explicit hook as I'm proposing, back-end implementers
> have
> >> to
> >> > clearly know all scenarios of generating barrier very clearly,
> >> because there
> >> > isn't any code tips and comments in middle end telling back-end
> the
> >> list of
> >> > all scenarios on generating barriers.
> >> >
> >> > Yes, barrier is a perfect interface for scheduler in theory. But
> from
> >> > engineering point of view, I think it's better to explicitly
> define
> >> an
> >> > interface to describe stack red zone and inform back-end, or vice
> >> versa. Not
> >> > like computer, people is easy to make mistake if you don't tell
> them.
> >> On
> >> > this bug, the fact is over the years different back-ends made
> similar
> >> bugs.
> >> >
> >> > GCC is really a perfect platform on building new ports, and I saw
> a
> >> lot of
> >> > new back-ends. The current middle end is unsafe, if port doesn't
> >> support
> >> > stack red zone and back-ends doesn't generate barrier for it. Why
> >> can't we
> >> > explicitly clarify this in compiler code between middle end and
> back
> >> end?
> >> > What if any other back-end (new or old) NOT supporting stack red
> zone
> >> > exposing the similar bug again?
> >>
> >> There are gazillion things you have to explicitly get right in your
> >> backends,
> >> so I don't see why exposing proper scheduling barriers should be
> >> special,
> >> and there, why red-zones should be special (as opposed to other
> >> occasions
> >> where you need to emit barriers from the backend for the scheduler).
> >>
> >
> > Richard,
> >
> > This is because,
> >
> > 1) Current scheduler is unsafe if back-end doesn't generate barrier
> for a
> > port which doesn't support stack red zone at all.
> > 2) Implementing barrier in back-end is a burden to new back-end
> > implementation for ports not supporting stack red zone at all.
> > 3) There are many other ports not reporting bugs around this. I doubt
> there
> > isn't bug for them.
> > 4) There are over 300 TARGET HOOKS being defined in target.def. I don't
> > think adding this interface is hurting GCC.

[PATCH, PR50527] Don't assume alignment of vla-related allocas.

2011-09-28 Thread Tom de Vries
Richard,

I got a patch for PR50527.

The patch prevents the alignment of vla-related allocas to be set to
BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
the alloca.

Bootstrapped and regtested on x86_64.

OK for trunk?

Thanks,
- Tom

2011-09-27  Tom de Vries  

* tree-ssa-ccp.c (evaluate_stmt): Don't assume alignment for vla-related
allocas.

* gcc.dg/pr50527.c: New test.
Index: gcc/tree-ssa-ccp.c
===
--- gcc/tree-ssa-ccp.c (revision 179210)
+++ gcc/tree-ssa-ccp.c (working copy)
@@ -1632,6 +1632,8 @@ evaluate_stmt (gimple stmt)
 	  break;
 
 	case BUILT_IN_ALLOCA:
+	  if (gimple_call_alloca_for_var_p (stmt))
+		break;
 	  val.lattice_val = CONSTANT;
 	  val.value = build_int_cst (TREE_TYPE (gimple_get_lhs (stmt)), 0);
 	  val.mask = shwi_to_double_int
Index: gcc/testsuite/gcc.dg/pr50527.c
===
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr50527.c (revision 0)
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-Os --param large-stack-frame=30" } */
+
+extern void abort (void);
+
+void __attribute__((noinline)) 
+bar (char *a)
+{
+}
+
+void __attribute__((noinline)) 
+foo (char *a, int b)
+{
+}
+
+void __attribute__((noinline))
+test_align (char *p, int aligned, unsigned int mask)
+{
+  int p_aligned = ((unsigned long int)p & mask) == 0;
+  if (aligned != p_aligned)
+    abort ();
+}
+
+int
+main ()
+{
+  const int kIterations = 4;
+  char results[kIterations];
+  int i;
+  unsigned int mask;
+
+  mask = 0xf;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x7;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x3;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x1;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+
+  bar (results);
+  for (i = 0; i < kIterations; i++)
+    foo ("%d ", results[i]);
+
+  return 0;
+}


Re: [PATCH] Fix stack red zone bug (PR38644)

2011-09-28 Thread Richard Guenther
On Wed, Sep 28, 2011 at 11:10 AM, Jiangning Liu  wrote:
>
>
>> -Original Message-
>> From: Richard Guenther [mailto:richard.guent...@gmail.com]
>> Sent: Wednesday, September 28, 2011 4:39 PM
>> To: Jiangning Liu
>> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
>>
>> On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu 
>> wrote:
>> >
>> >
>> >> -Original Message-
>> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
>> >> Sent: Tuesday, September 27, 2011 3:41 PM
>> >> To: Jiangning Liu
>> >> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
>> >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
>> >>
>> >> On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu
>> 
>> >> wrote:
>> >> >> Think of it this way.  What the IR says is there is no barrier
>> >> between
>> >> >> those moves.  You either have an implicit barrier (which is what
>> you
>> >> >> are proposing) or you have it explicitly.  I think we all rather
>> >> have
>> >> >> more things explicit rather than implicit in the IR.  And that
>> has
>> >> >> been the overall feeling for a few years now.
>> >> >>
>> >> >
>> >> > Sorry, I'm afraid I can't agree with you. Instead, I think using
>> >> barrier to describe this kind of dependence is a kind of implicit
>> >> method. Please note that this is not an usual data dependence issue.
>> >> The stack pointer change doesn't have any dependence with memory
>> access
>> >> at all.
>> >>
>> >> It is similar to atomic instructions that require being an
>> >> optimization/memory barrier.  Sure it is not a usual data dependence
>> >> (otherwise we would handle
>> >> it already), so the targets have to explicitly express the
>> dependence
>> >> somehow, for which we only have barriers right now.
>> >>
>> >
>> > Richard,
>> >
>> > Thanks for your explanation. It's explicit to back-end, while it's
>> implicit
>> > to scheduler in middle end, because barrier can decide dependence in
>> > scheduler but barrier can be generated from several different
>> scenarios.
>> > It's unsafe and prone to introduce bug if any one of the scenarios
>> requiring
>> > generating barriers is being missed in back-end.
>> >
>> > Between middle-end and back-end, we should have interfaces that is
>> easy to
>> > be implemented by back-end. After all, middle-end itself can't
>> consist of a
>> > compiler, and vice versa. Back-end needs middle-end's help to make
>> sure
>> > back-end is easy to be implemented and reduce the possibility of
>> introducing
>> > bugs.
>> >
>> > Without an explicit hook as I'm proposing, back-end implementers have
>> to
>> > clearly know all scenarios of generating barrier very clearly,
>> because there
>> > isn't any code tips and comments in middle end telling back-end the
>> list of
>> > all scenarios on generating barriers.
>> >
>> > Yes, barrier is a perfect interface for scheduler in theory. But from
>> > engineering point of view, I think it's better to explicitly define
>> an
>> > interface to describe stack red zone and inform back-end, or vice
>> versa. Not
>> > like computer, people is easy to make mistake if you don't tell them.
>> On
>> > this bug, the fact is over the years different back-ends made similar
>> bugs.
>> >
>> > GCC is really a perfect platform on building new ports, and I saw a
>> lot of
>> > new back-ends. The current middle end is unsafe, if port doesn't
>> support
>> > stack red zone and back-ends doesn't generate barrier for it. Why
>> can't we
>> > explicitly clarify this in compiler code between middle end and back
>> end?
>> > What if any other back-end (new or old) NOT supporting stack red zone
>> > exposing the similar bug again?
>>
>> There are gazillion things you have to explicitly get right in your
>> backends,
>> so I don't see why exposing proper scheduling barriers should be
>> special,
>> and there, why red-zones should be special (as opposed to other
>> occasions
>> where you need to emit barriers from the backend for the scheduler).
>>
>
> Richard,
>
> This is because,
>
> 1) Current scheduler is unsafe if back-end doesn't generate barrier for a
> port which doesn't support stack red zone at all.
> 2) Implementing barrier in back-end is a burden to new back-end
> implementation for ports not supporting stack red zone at all.
> 3) There are many other ports not reporting bugs around this. I doubt there
> isn't bug for them.
> 4) There are over 300 TARGET HOOKS being defined in target.def. I don't
> think adding this interface is hurting GCC.

I don't argue that your solution is not acceptable, just your reasoning
is bogus IMHO. 1) is a target bug, 2) huh, I doubt that this is the
biggest issue one faces when implementing a new target, 3) I'm sure
there are very many latent backend bugs not related to red-zones.

The middle-end isn't "safe by default" either if you have bogus
instruction patterns in your .md file, or you generate bogus IL
from the va-arg gimplification hook.  A target bug is a target bug.

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-28 Thread Jiangning Liu


> -Original Message-
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: Wednesday, September 28, 2011 4:39 PM
> To: Jiangning Liu
> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
> 
> On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu 
> wrote:
> >
> >
> >> -Original Message-
> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> >> Sent: Tuesday, September 27, 2011 3:41 PM
> >> To: Jiangning Liu
> >> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
> >>
> >> On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu
> 
> >> wrote:
> >> >> Think of it this way.  What the IR says is there is no barrier
> >> between
> >> >> those moves.  You either have an implicit barrier (which is what
> you
> >> >> are proposing) or you have it explicitly.  I think we all rather
> >> have
> >> >> more things explicit rather than implicit in the IR.  And that
> has
> >> >> been the overall feeling for a few years now.
> >> >>
> >> >
> >> > Sorry, I'm afraid I can't agree with you. Instead, I think using
> >> barrier to describe this kind of dependence is a kind of implicit
> >> method. Please note that this is not an usual data dependence issue.
> >> The stack pointer change doesn't have any dependence with memory
> access
> >> at all.
> >>
> >> It is similar to atomic instructions that require being an
> >> optimization/memory barrier.  Sure it is not a usual data dependence
> >> (otherwise we would handle
> >> it already), so the targets have to explicitly express the
> dependence
> >> somehow, for which we only have barriers right now.
> >>
> >
> > Richard,
> >
> > Thanks for your explanation. It's explicit to back-end, while it's
> implicit
> > to scheduler in middle end, because barrier can decide dependence in
> > scheduler but barrier can be generated from several different
> scenarios.
> > It's unsafe and prone to introduce bug if any one of the scenarios
> requiring
> > generating barriers is being missed in back-end.
> >
> > Between middle-end and back-end, we should have interfaces that is
> easy to
> > be implemented by back-end. After all, middle-end itself can't
> consist of a
> > compiler, and vice versa. Back-end needs middle-end's help to make
> sure
> > back-end is easy to be implemented and reduce the possibility of
> introducing
> > bugs.
> >
> > Without an explicit hook as I'm proposing, back-end implementers have
> to
> > clearly know all scenarios of generating barrier very clearly,
> because there
> > isn't any code tips and comments in middle end telling back-end the
> list of
> > all scenarios on generating barriers.
> >
> > Yes, barrier is a perfect interface for scheduler in theory. But from
> > engineering point of view, I think it's better to explicitly define
> an
> > interface to describe stack red zone and inform back-end, or vice
> versa. Not
> > like computer, people is easy to make mistake if you don't tell them.
> On
> > this bug, the fact is over the years different back-ends made similar
> bugs.
> >
> > GCC is really a perfect platform on building new ports, and I saw a
> lot of
> > new back-ends. The current middle end is unsafe, if port doesn't
> support
> > stack red zone and back-ends doesn't generate barrier for it. Why
> can't we
> > explicitly clarify this in compiler code between middle end and back
> end?
> > What if any other back-end (new or old) NOT supporting stack red zone
> > exposing the similar bug again?
> 
> There are gazillion things you have to explicitly get right in your
> backends,
> so I don't see why exposing proper scheduling barriers should be
> special,
> and there, why red-zones should be special (as opposed to other
> occasions
> where you need to emit barriers from the backend for the scheduler).
> 

Richard,

This is because,

1) The current scheduler is unsafe if a back-end doesn't generate a barrier
for a port which doesn't support a stack red zone at all.
2) Implementing the barrier in the back-end is a burden for every new
back-end implementation on a port not supporting a stack red zone at all.
3) There are many other ports not reporting bugs around this. I doubt
there isn't a bug for them.
4) There are over 300 TARGET HOOKS defined in target.def. I don't think
adding this interface hurts GCC.

BTW, really appreciate your close attention to this specific issue.

Thanks,
-Jiangning

> Richard.
> 
> > Thanks,
> > -Jiangning
> >
> >> Richard.
> >>
> >> > No matter what solution itself is, the problem itself is a quite a
> >> common one on ISA level, so we should solve it in middle-end,
> because
> >> middle-end is shared for all ports.
> >> >
> >> > My proposal avoids problems in future. Any new ports and new back-
> end
> >> implementations needn't explicitly define this hook in a very safe
> way.
> >> But if one port wants to "unsafely" introduce red zone, we can
>> explicitly define this hook in back-end. This way, we would avoid the
>> bug in the earliest time.

[PATCH, PR50485, committed] Initialize variable in sse4_1-blendps{,-2}.c with random floats

2011-09-28 Thread Tom de Vries
Committed as obvious.

Thanks,
- Tom

2011-09-28  Tom de Vries  

PR testsuite/50485
* gcc.target/i386/sse4_1-blendps.c: Include .
(TEST): Initialize src3 with random floats.
* gcc.target/i386/sse4_1-blendps-2.c (sse4_1_test): Remove field i from
union src3.  Initialize src3 with random floats.
Index: gcc/testsuite/gcc.target/i386/sse4_1-blendps.c
===
--- gcc/testsuite/gcc.target/i386/sse4_1-blendps.c (revision 179043)
+++ gcc/testsuite/gcc.target/i386/sse4_1-blendps.c (working copy)
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 
 #define NUM 20
 
@@ -66,6 +67,9 @@ TEST (void)
 
   init_blendps (src1.f, src2.f);
 
+  for (i = 0; i < 4; i++)
+    src3.f[i] = (int) random ();
+
   /* Check blendps imm8, m128, xmm */
   for (i = 0; i < NUM; i++)
 {
Index: gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c
===
--- gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c (revision 179043)
+++ gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c (working copy)
@@ -53,14 +53,13 @@ sse4_1_test (void)
 {
   __m128 x;
   float f[4];
-  int i[4];
 } src3;
   int i;
 
   init_blendps (src1.f, src2.f);
 
   for (i = 0; i < 4; i++)
-    src3.i[i] = (int) random ();
+    src3.f[i] = (int) random ();
 
   /* Check blendps imm8, m128, xmm */
   for (i = 0; i < NUM; i++)


[patch, testsuite, ARM] Skip architecture option in pr42575.c

2011-09-28 Thread Joey Ye
2011-09-28  Joey Ye  

* gcc.target/arm/pr42575.c: Remove architecture option.

Index: gcc/testsuite/gcc.target/arm/pr42575.c
===
--- gcc/testsuite/gcc.target/arm/pr42575.c  (revision 179308)
+++ gcc/testsuite/gcc.target/arm/pr42575.c  (working copy)
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -march=armv7-a" }  */
+/* { dg-options "-O2" }  */
 /* Make sure RA does good job allocating registers and avoids
unnecessary moves.  */
 /* { dg-final { scan-assembler-not "mov" } } */





Re: [PATCH] Fix stack red zone bug (PR38644)

2011-09-28 Thread Richard Guenther
On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu  wrote:
>
>
>> -Original Message-
>> From: Richard Guenther [mailto:richard.guent...@gmail.com]
>> Sent: Tuesday, September 27, 2011 3:41 PM
>> To: Jiangning Liu
>> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
>>
>> On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu 
>> wrote:
>> >> Think of it this way.  What the IR says is there is no barrier
>> between
>> >> those moves.  You either have an implicit barrier (which is what you
>> >> are proposing) or you have it explicitly.  I think we all rather
>> have
>> >> more things explicit rather than implicit in the IR.  And that has
>> >> been the overall feeling for a few years now.
>> >>
>> >
>> > Sorry, I'm afraid I can't agree with you. Instead, I think using
>> barrier to describe this kind of dependence is a kind of implicit
>> method. Please note that this is not an usual data dependence issue.
>> The stack pointer change doesn't have any dependence with memory access
>> at all.
>>
>> It is similar to atomic instructions that require being an
>> optimization/memory barrier.  Sure it is not a usual data dependence
>> (otherwise we would handle
>> it already), so the targets have to explicitly express the dependence
>> somehow, for which we only have barriers right now.
>>
>
> Richard,
>
> Thanks for your explanation. It's explicit to back-end, while it's implicit
> to scheduler in middle end, because barrier can decide dependence in
> scheduler but barrier can be generated from several different scenarios.
> It's unsafe and prone to introduce bug if any one of the scenarios requiring
> generating barriers is being missed in back-end.
>
> Between middle-end and back-end, we should have interfaces that is easy to
> be implemented by back-end. After all, middle-end itself can't consist of a
> compiler, and vice versa. Back-end needs middle-end's help to make sure
> back-end is easy to be implemented and reduce the possibility of introducing
> bugs.
>
> Without an explicit hook as I'm proposing, back-end implementers have to
> clearly know all scenarios of generating barrier very clearly, because there
> isn't any code tips and comments in middle end telling back-end the list of
> all scenarios on generating barriers.
>
> Yes, barrier is a perfect interface for scheduler in theory. But from
> engineering point of view, I think it's better to explicitly define an
> interface to describe stack red zone and inform back-end, or vice versa. Not
> like computer, people is easy to make mistake if you don't tell them. On
> this bug, the fact is over the years different back-ends made similar bugs.
>
> GCC is really a perfect platform on building new ports, and I saw a lot of
> new back-ends. The current middle end is unsafe, if port doesn't support
> stack red zone and back-ends doesn't generate barrier for it. Why can't we
> explicitly clarify this in compiler code between middle end and back end?
> What if any other back-end (new or old) NOT supporting stack red zone
> exposing the similar bug again?

There are gazillion things you have to explicitly get right in your backends,
so I don't see why exposing proper scheduling barriers should be special,
and there, why red-zones should be special (as opposed to other occasions
where you need to emit barriers from the backend for the scheduler).

Richard.

> Thanks,
> -Jiangning
>
>> Richard.
>>
>> > No matter what solution itself is, the problem itself is a quite a
>> common one on ISA level, so we should solve it in middle-end, because
>> middle-end is shared for all ports.
>> >
>> > My proposal avoids problems in future. Any new ports and new back-end
>> implementations needn't explicitly define this hook in a very safe way.
>> But if one port wants to "unsafely" introduce red zone, we can
>> explicitly define this hook in back-end. This way, we would avoid the
>> bug in the earliest time. Do you really want to hide this problem in
>> back-end silently? And wait others to report bug and then waste time to
>> get it fixed again?
>> >
>> > The facts I see is over the years, for different ports reported the
>> similar issue around this, and somebody tried to fix them. Actually,
>> this kind of problem should be fixed in design level. If the first
>> people solve this bug can give solution in middle end, we would not
>> need to waste time on filing those bugs in bug zilla and fixing them
>> around this problem.
>> >
>> > Thanks,
>> > -Jiangning
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>
>
>
>
>

