[PING] Allow functions calling mcount before prologue to be leaf functions

2013-05-23 Thread Andreas Krebbel
Hi,

any comments regarding this patch?
http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00993.html

Bye,

-Andreas-



[PATCH] Doc: Add documentation for the mnemonic attribute

2013-05-23 Thread Andreas Krebbel
Hi,

add missing documentation for the mnemonic attribute.

Ok?

Bye,

-Andreas-


2013-05-24  Andreas Krebbel  

* doc/md.texi: Document the mnemonic attribute.

---
 gcc/doc/md.texi |   67 +++!
 1 file changed, 47 insertions(+), 20 modifications(!)

Index: gcc/doc/md.texi
===
*** gcc/doc/md.texi.orig
--- gcc/doc/md.texi
*** to track the condition codes.
*** 7609,7614 
--- 7609,7615 
  * Attr Example::An example of assigning attributes.
  * Insn Lengths::Computing the length of insns.
  * Constant Attributes:: Defining attributes that are constant.
+ * Mnemonic Attribute::  Obtain the instruction mnemonic as attribute value.
  * Delay Slots:: Defining delay slots required for a machine.
  * Processor pipeline description:: Specifying information for insn scheduling.
  @end menu
*** by the target machine.  It looks like:
*** 7628,7642 
  (define_attr @var{name} @var{list-of-values} @var{default})
  @end smallexample
  
! @var{name} is a string specifying the name of the attribute being defined.
! Some attributes are used in a special way by the rest of the compiler. The
! @code{enabled} attribute can be used to conditionally enable or disable
! insn alternatives (@pxref{Disable Insn Alternatives}). The @code{predicable}
! attribute, together with a suitable @code{define_cond_exec}
! (@pxref{Conditional Execution}), can be used to automatically generate
! conditional variants of instruction patterns. The compiler internally uses
! the names @code{ce_enabled} and @code{nonce_enabled}, so they should not be
! used elsewhere as alternative names.
  
  @var{list-of-values} is either a string that specifies a comma-separated
  list of values that can be assigned to the attribute, or a null string to
--- 7629,7645 
  (define_attr @var{name} @var{list-of-values} @var{default})
  @end smallexample
  
! @var{name} is a string specifying the name of the attribute being
! defined.  Some attributes are used in a special way by the rest of the
! compiler. The @code{enabled} attribute can be used to conditionally
! enable or disable insn alternatives (@pxref{Disable Insn
! Alternatives}). The @code{predicable} attribute, together with a
! suitable @code{define_cond_exec} (@pxref{Conditional Execution}), can
! be used to automatically generate conditional variants of instruction
! patterns. The @code{mnemonic} attribute can be used to check for the
! instruction mnemonic (@pxref{Mnemonic Attribute}).  The compiler
! internally uses the names @code{ce_enabled} and @code{nonce_enabled},
! so they should not be used elsewhere as alternative names.
  
  @var{list-of-values} is either a string that specifies a comma-separated
  list of values that can be assigned to the attribute, or a null string to
*** distances. @xref{Insn Lengths}.
*** 7700,7705 
--- 7703,7713 
  The @code{enabled} attribute can be defined to prevent certain
  alternatives of an insn definition from being used during code
  generation. @xref{Disable Insn Alternatives}.
+ 
+ @item mnemonic
+ The @code{mnemonic} attribute can be defined to implement instruction
+ specific checks in e.g. the pipeline description.
+ @xref{Mnemonic Attribute}.
  @end table
  
  For each of these special attributes, the corresponding
*** forms involving insn attributes.
*** 8252,8257 
--- 8260,8306 
  
  @end ifset
  @ifset INTERNALS
+ @node Mnemonic Attribute
+ @subsection Mnemonic Attribute
+ @cindex mnemonic attribute
+ 
+ The @code{mnemonic} attribute is a string type attribute holding the
+ instruction mnemonic for an insn alternative.  The attribute values
+ will automatically be generated by the machine description parser if
+ there is an attribute definition in the md file:
+ 
+ @smallexample
+ (define_attr "mnemonic" "unknown" (const_string "unknown"))
+ @end smallexample
+ 
+ The default value can be freely chosen as long as it does not collide
+ with any of the instruction mnemonics.  This value will be used
+ whenever the machine description parser is not able to determine the
+ mnemonic string.  This might be the case for output templates
+ containing more than a single instruction as in
+ @code{"mvcle\t%0,%1,0\;jo\t.-4"}.
+ 
+ The @code{mnemonic} attribute set is not generated automatically if the
+ instruction string is generated via C code.
+ 
+ An existing @code{mnemonic} attribute set in an insn definition will not
+ be overridden by the md file parser.  That way it is possible to
+ manually set the instruction mnemonics for the cases where the md file
+ parser fails to determine it automatically.
+ 
+ The @code{mnemonic} attribute is useful for dealing with instruction
+ specific properties in the pipeline description without defining
+ additional insn attributes.
+ 
+ @smallexample
+ (define_attr "ooo_expanded" ""
+   (cond [(eq_attr "

Re: C++ PATCH for c++/56930 (wrong -Wconversion warning with sizeof)

2013-05-23 Thread David Edelsohn
This patch breaks bootstrap on AIX when building libstdc++.  I now
receive the following error message:

/tmp/20130524/gcc/include-fixed/math.h: In function 'long double
fmal(long double, long double, long double)':
/tmp/20130524/gcc/include-fixed/math.h:879:135: internal compiler
error: unexpected expression '#'fma_expr' not supported by
dump_expr#' of kind fma_expr
 inline long double fmal(long double __x, long double __y, long double
__z) { return fma((double) (__x), (double) (__y), (double) (__z)); }

When not using long double 128, AIX math.h provides a number of
definitions, including

inline long double fmal(long double __x, long double __y, long double __z)
{ return fma((double) (__x), (double) (__y), (double) (__z)); }


which was not a problem before the patch.

Also, you are not updating testsuite/ChangeLog.

Thanks, David


[PATCH, ARM, iWMMXT] Check IWMMXT_GR_REGS in the SECONDARY_RELOAD MACRO

2013-05-23 Thread Xinyu Qi
Hi,

  For this simple case, compiled with option -march=iwmmxt -O,
#define N 64
signed int b[N];
signed long long j[N], d[N];
void foo (void)
{
  int i;
  for (i = 0; i < N; i++)
    j[i] = d[i] << b[i];
}
An internal compiler error occurs,
error: insn does not satisfy its constraints:
(insn 112 74 75 3 (set (reg:SI 96 wcgr0)
(mem/c:SI (plus:SI (reg:SI 3 r3 [orig:174 ivtmp.19 ] [174])
(reg/f:SI 0 r0 [183])) [0 MEM[symbol: b, index: ivtmp.19_14, 
offset: 0B]+0 S4 A32])) {*iwmmxt_movsi_insn}
 (nil))

A load into an iWMMXt wcgr register cannot use a register-offset address,
and the reload pass fails to fix this up, which causes the error above.
The issue can be solved by also checking for the IWMMXT_GR_REGS class in the
SECONDARY_RELOAD macros when TARGET_IWMMXT is set. The current code only
checks the IWMMXT_REGS class.
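
The class test described above can be sketched roughly as follows. This is a toy model and not the arm.h macro: every toy_ name, the boolean mem_has_reg_offset flag, and the choice of a general register as the intermediate are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register classes; the real ones live in config/arm/arm.h.  */
enum toy_reg_class
{
  TOY_NO_REGS,
  TOY_GENERAL_REGS,
  TOY_IWMMXT_REGS,
  TOY_IWMMXT_GR_REGS
};

/* True for both iWMMXt classes.  Testing only TOY_IWMMXT_REGS here would
   model the behaviour the message describes as the current code.  */
bool
toy_is_iwmmxt_class (enum toy_reg_class rclass)
{
  return rclass == TOY_IWMMXT_REGS || rclass == TOY_IWMMXT_GR_REGS;
}

/* Return the class of the intermediate register needed to move a memory
   operand into RCLASS, or TOY_NO_REGS if the move can be done directly.
   Assumption: a register-offset address is the only problematic case.  */
enum toy_reg_class
toy_secondary_reload_class (bool target_iwmmxt, enum toy_reg_class rclass,
                            bool mem_has_reg_offset)
{
  assert (rclass <= TOY_IWMMXT_GR_REGS);

  if (target_iwmmxt && toy_is_iwmmxt_class (rclass) && mem_has_reg_offset)
    return TOY_GENERAL_REGS;
  return TOY_NO_REGS;
}
```

In this model a wcgr destination with a register-offset address asks for a general-register intermediate, which is the case the failing insn above falls into.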
Patch attached with a new test.
Pass full dejagnu test. No regression.

Is this fix proper?
OK for trunk?

ChangeLog
gcc/
2013-05-24  Xinyu Qi  

* config/arm/arm.h (SECONDARY_OUTPUT_RELOAD_CLASS): Check 
IWMMXT_GR_REGS.
(SECONDARY_INPUT_RELOAD_CLASS): Likewise.

testsuite/
2013-05-24  Xinyu Qi  

* gcc.target/arm/mmx-3.c: New test.


gr_secondary_reload.diff
Description: gr_secondary_reload.diff


Re: [patch, powerpc] allow --with-cpu=native when configuring gcc

2013-05-23 Thread David Edelsohn
This is another orphan PowerPC patch from our backlog.

On native PowerPC, GCC supports -mcpu=native, to generate code for the
same processor flavor that GCC itself is running on. This patch makes
it also possible to configure GCC to default to that option. Tested by
building GCC with this configure option and build=i686-pc-linux-gnu,
host=target=powerpc-linux-gnu, and verifying that the output of
running the resulting gcc with --version --verbose looked sane.

OK to commit?

-Sandra


2013-05-23  Nathan Sidwell  
   Sandra Loosemore  

gcc/
* config.gcc (powerpc-*): Allow native for with-cpu.

Okay.

What is happening with the other alignment patch?

Also, please explicitly include me in the list of email recipients for
PowerPC patches in the future.

Thanks, David


SUBREGs in move2add_note_store (Was: Re: RFA: fix rtl-optimization/56833)

2013-05-23 Thread Joern Rennecke

Quoting Joern Rennecke :


Looking into sharing the code with sites that perform essentially the same
function but look somewhat different, I see there's a problem with using
only reg_set_luid to indicate the consistency of a multi-hard-reg-value
in these other contexts.
For values that use a base register, the reg_set_luid is the same as
for the base register; for constants, it is the same for all constants
set since the last label.

Say we have reg size 8 bit, base r0, and then
(set reg:HI 2 (plus:SI (reg:HI 0) (const_int 500))
...
(set reg:HI 3 (plus:SI (reg:HI 0) (const_int 500))

Now how do we tell that the value in r2 is no longer valid?

As the example shows, trying to replicate the recorded value across all hard
regs is pointless, as we still need to make sure that we have a still-valid
start register.
OTOH, this ties in nicely with setting the mode of subsequent registers
to VOIDmode.  We can verify the mode to make sure there was no more recent
set of any constituent register.  The check of the extra luids thus becomes
superfluous, as becomes the set.


I've found it is good to have also one mode to invalidate a register for
all uses; it seems natural to use VOIDmode for that, and then we can use
BLKmode for all but the first hard register of a multi-hard-reg register.
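
As a rough illustration of the scheme described above, here is a toy model that assumes 8-bit hard registers, with an HImode value spanning two of them. All toy_ names are hypothetical; the attached patch works on GCC's own reg_mode array and machine modes, and also consults reg_set_luid, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy modes: TOY_VOID marks a register as invalid for all uses, TOY_BLK
   marks the trailing registers of a multi-register value.  */
enum toy_mode { TOY_VOID, TOY_BLK, TOY_QI, TOY_HI };
enum { TOY_NUM_REGS = 8 };

/* Zero-initialized, so every register starts out as TOY_VOID (invalid).  */
enum toy_mode toy_reg_mode[TOY_NUM_REGS];

/* Number of 8-bit hard registers a value of MODE occupies.  */
int
toy_nregs (enum toy_mode mode)
{
  return mode == TOY_HI ? 2 : 1;
}

/* Record that REGNO is being set to a value of MODE.  */
void
toy_record_mode (int regno, enum toy_mode mode)
{
  int nregs = toy_nregs (mode);
  int i;

  assert (mode == TOY_QI || mode == TOY_HI);
  assert (regno >= 0 && regno + nregs <= TOY_NUM_REGS);
  for (i = nregs - 1; i > 0; i--)
    toy_reg_mode[regno + i] = TOY_BLK;
  toy_reg_mode[regno] = mode;
}

/* Return true if the value recorded for REGNO in MODE is still intact,
   i.e. no constituent register has been set more recently.  */
bool
toy_valid_value_p (int regno, enum toy_mode mode)
{
  int nregs = toy_nregs (mode);
  int i;

  assert (regno >= 0 && regno + nregs <= TOY_NUM_REGS);
  if (mode == TOY_VOID || mode == TOY_BLK || toy_reg_mode[regno] != mode)
    return false;
  for (i = nregs - 1; i > 0; i--)
    if (toy_reg_mode[regno + i] != TOY_BLK)
      return false;
  return true;
}

/* Invalidate REGNO for all uses.  */
void
toy_invalidate (int regno)
{
  assert (regno >= 0 && regno < TOY_NUM_REGS);
  toy_reg_mode[regno] = TOY_VOID;
}
```

In this model the r2/r3 example works out: recording an HImode value in r2 and then one in r3 overwrites the TOY_BLK marker on r3, so the validity check for r2 fails without any extra luid bookkeeping.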


However, digging into the code that can use some factoring out of mode setting
and validity testing, I stumbled over the destination handling in  
move2add_note_store in the case of SUBREGs.

The original code was just coded for minimal effort to find the actual regno
of a REG or SUBREG (this only made sense with the old word-based SUBREG
scheme).
Alexandre, AFAICT, you have misunderstood this bit to make the function
pretend that the inner part of the subreg is the actual destination.
Or is there some obscure reason to have the mode checks worry about the
inside of the SUBREG?

I have attached the patch that makes the reload_cse_move2add sub-pass work
the way I think it should; it'll take some time to test this properly,
though.
2013-05-24  Joern Rennecke 

* postreload.c (move2add_record_mode): New function.
(move2add_record_sym_value, move2add_valid_value_p): Likewise.
(move2add_use_add2_insn): Use move2add_record_sym_value.
(move2add_use_add3_insn): Likewise.
(reload_cse_move2add): Use move2add_valid_value_p and
move2add_record_mode.  Invalidate call-clobbered regs
by setting reg_mode to VOIDmode.
(move2add_note_store): Don't pretend the inside of a SUBREG is
the actual destination.  Invalidate PRE_INC / POST_INC
registers by setting reg_mode to VOIDmode.
Use move2add_record_sym_value, move2add_valid_value_p and
move2add_record_mode.

Index: postreload.c
===
--- postreload.c(revision 199190)
+++ postreload.c(working copy)
@@ -1645,14 +1645,22 @@ reload_combine_note_use (rtx *xp, rtx in
later disable any optimization that would cross it.
reg_offset[n] / reg_base_reg[n] / reg_symbol_ref[n] / reg_mode[n]
are only valid if reg_set_luid[n] is greater than
-   move2add_last_label_luid.  */
+   move2add_last_label_luid.
+   For a set that established a new (potential) base register with
+   non-constant value, we use move2add_luid from the place where the
+   setting insn is encountered; registers based off that base then
+   get the same reg_set_luid.  Constants all get
+   move2add_last_label_luid + 1 as their reg_set_luid.  */
 static int reg_set_luid[FIRST_PSEUDO_REGISTER];
 
 /* If reg_base_reg[n] is negative, register n has been set to
reg_offset[n] or reg_symbol_ref[n] + reg_offset[n] in mode reg_mode[n].
If reg_base_reg[n] is non-negative, register n has been set to the
sum of reg_offset[n] and the value of register reg_base_reg[n]
-   before reg_set_luid[n], calculated in mode reg_mode[n] .  */
+   before reg_set_luid[n], calculated in mode reg_mode[n] .
+   For multi-hard-register registers, all but the first one are
+   recorded as BLKmode in reg_mode.  Setting reg_mode to VOIDmode
+   marks it as invalid.  */
 static HOST_WIDE_INT reg_offset[FIRST_PSEUDO_REGISTER];
 static int reg_base_reg[FIRST_PSEUDO_REGISTER];
 static rtx reg_symbol_ref[FIRST_PSEUDO_REGISTER];
@@ -1674,6 +1682,63 @@ #define MODES_OK_FOR_MOVE2ADD(OUTMODE, I
|| (GET_MODE_SIZE (OUTMODE) <= GET_MODE_SIZE (INMODE) \
&& TRULY_NOOP_TRUNCATION_MODES_P (OUTMODE, INMODE)))
 
+/* Record that REG is being set to a value with the mode of REG.  */
+
+static void
+move2add_record_mode (rtx reg)
+{
+  int regno, nregs;
+  enum machine_mode mode = GET_MODE (reg);
+  int i;
+
+  if (GET_CODE (reg) == SUBREG)
+{
+  regno = subreg_regno (reg);
+  nregs = subreg_nregs (reg);
+}
+  else if (REG_P (reg))
+{
+  regno = REGNO (reg);
+  nregs = hard_regno_nregs[regno][mode];
+}
+  else
+gcc_unreachable ();
+  for (i = nregs; 

RE: [PING]RE: [patch] cilkplus: Array notation for C patch

2013-05-23 Thread Iyer, Balaji V
[I included Jeff Law also in this conversation]

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez
> Sent: Thursday, May 23, 2013 3:23 PM
> To: Jakub Jelinek
> Cc: Iyer, Balaji V; Richard Henderson; 'Joseph S. Myers'; 'gcc-patches'
> Subject: Re: [PING]RE: [patch] cilkplus: Array notation for C patch
> 
> On 05/23/13 14:03, Jakub Jelinek wrote:
> > On Thu, May 23, 2013 at 06:27:04PM +, Iyer, Balaji V wrote:
> >> gcc/testsuite/ChangeLog
> >> 2013-05-23  Balaji V. Iyer  
> >>
> >>  * gcc.dg/cilk-plus/array_notation/compile/array_test2.c: New test.
> >
> > I have concerns about the test locations, to me this looks way too
> > deep tree, whether something is a compile test, or compile test
> > expecting errors or runtime test is easily determined by { dg-do
> > compile } vs. { dg-do run } and presence or lack of { dg-error ... }
> > comments.  So IMHO that level should be left out, plus I'd say the
> > array_notation/ level is unnecessary as well, just put everything into
> > c-c++-common/cilk-plus/an-*.c (except for tests that aren't going to
> > be usable for C++, those can stay in gcc.dg/cilk-plus/an-*.c).  Then
> > gcc.dg/cilk-plus/*.exp would just ensure that tests from that
> > directory are run and also from c-c++-common/ and later on the same
> > would happen in g++.dg/cilk-plus/.  In the future when you will need
> > to link against runtime library cilk-plus.exp would just arrange for that
> > to be added to LD_LIBRARY_PATH, -L.../ etc.
> 
> For the record, I agree.

I got all your responses. If I remove the compile, execute and errors
directories but keep cilk-plus and array notation, and maybe abbreviate array
notation to "an" (and, in future, cilk keywords to "ck", pragma simd to "ps"
and elemental function to "ef"), will that be OK with everyone? That removes
one directory and merges 3 scripts into 1, and with abbreviated subdirectories
the ChangeLog lines won't be long. I would really like to put all the cilk-plus
tests in one location for the C compiler so that if anyone wants to get them I
can point to one location.

Thanks,

Balaji V. Iyer.


[patch, powerpc] allow --with-cpu=native when configuring gcc

2013-05-23 Thread Sandra Loosemore

This is another orphan PowerPC patch from our backlog.

On native PowerPC, GCC supports -mcpu=native, to generate code for the 
same processor flavor that GCC itself is running on.  This patch makes 
it also possible to configure GCC to default to that option.


Tested by building GCC with this configure option and 
build=i686-pc-linux-gnu, host=target=powerpc-linux-gnu, and verifying 
that the output of running the resulting gcc with --version --verbose 
looked sane.


OK to commit?

-Sandra


2013-05-23  Nathan Sidwell  
Sandra Loosemore  

gcc/
* config.gcc (powerpc-*): Allow native for with-cpu.

Index: gcc/config.gcc
===
--- gcc/config.gcc	(revision 199270)
+++ gcc/config.gcc	(working copy)
@@ -3535,7 +3535,7 @@ case "${target}" in
 tm_defines="${tm_defines} CONFIG_PPC405CR"
 eval "with_$which=405"
 ;;
-			"" | common \
+			"" | common | native \
 			| power | power[2345678] | power6x | powerpc | powerpc64 \
 			| rios | rios1 | rios2 | rsc | rsc1 | rs64a \
 			| 401 | 403 | 405 | 405fp | 440 | 440fp | 464 | 464fp \


Re: Unordered container insertion hints

2013-05-23 Thread Paolo Carlini

On 05/23/2013 10:01 PM, François Dumont wrote:

Some feedback regarding this patch ?
Two quick ones: what if the hint is wrong? I suppose the insertion 
succeeds anyway, it's only a little waste of time, right? Is it possible 
that for instance something throws in that case and would not now (when 
the hint is simply ignored)? In that case, check and re-check we are still 
conforming.


In any case, I think it's quite easy to notice if an implementation is 
using the hint in this way or a similar one based on some simple 
benchmarks, without looking of course at the actual implementation code. 
Do we have any idea what other implementations are doing? Like, eg, they 
invented something for unordered_set and map too? Or a better way to 
exploit the hint for the multi variants? Eventually I suppose we want to 
add a performance testcase to our testsuite.


Thanks!
Paolo.


Re: [PING]RE: [patch] cilkplus: Array notation for C patch

2013-05-23 Thread Mike Stump
On May 23, 2013, at 2:08 PM, "Iyer, Balaji V"  wrote:
> If I put things in c-c++-common, how do I test the code with different flags 
> (I didn't see any .exp file there)? For example, how can I test if it works 
> with "-O2" and then have another test for "-O2 -g" etc.? Do I just use 
> multiple "dg-options" in my code? The way I have it right now, it uses 
> several flags, and tries them in different combinations. I am sorry if this is a 
> trivial question, I am not very familiar with DejaGNU framework and I went 
> through GCC and DejaGNU manual a while back and I couldn't find an answer for 
> this.

If you have 30+ tests, I'd do a .exp file and a new subdirectory.  If you only 
plan to have a few, you can

test-1.c:
// option -O2

test-1g.c
// option -O2 -g
#include "test-1.c"

I'm not worried about a new subdirectory as much as others are; indeed, I think 
I prefer it.  I kinda like the idea that one can run cilkplus_AN_c_compile.exp 
and feature test the compiler.

[tree-ssa] fix for PR57385

2013-05-23 Thread Alexander Ivchenko
Hi,

The following patch fixes PR57385
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57385)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 51e7b9e..cca61e7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2013-05-24  Alexander Ivchenko  
+
+   PR tree-ssa/57385
+   * tree-ssa-sccvn.c (fully_constant_vn_reference_p): Check
+   that index is not negative.
+
 2013-05-23  Richard Henderson  

PR target/56742
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index b1895fb..730e62f 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2013-05-24  Alexander Ivchenko  
+
+   PR tree-ssa/57385
+   * gcc.dg/tree-ssa/pr57385.c: New test.
+
 2013-05-23  Christian Bruel  

PR debug/57351
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr57385.c
b/gcc/testsuite/gcc.dg/tree-ssa/pr57385.c
new file mode 100644
index 000..a04a998
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr57385.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+int c;
+
+void foo(int f)
+{
+  int wbi=-1;
+  c = (f ? "012346":"01345:6008")[wbi];
+}
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 49d61b0..0e7a74c 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -1294,6 +1294,7 @@ fully_constant_vn_reference_p (vn_reference_t ref)
  == TYPE_MODE (TREE_TYPE (TREE_TYPE (arg0->op0
  && GET_MODE_CLASS (TYPE_MODE (op->type)) == MODE_INT
  && GET_MODE_SIZE (TYPE_MODE (op->type)) == 1
+ && tree_int_cst_sgn (op->op0) >= 0
  && compare_tree_int (op->op0, TREE_STRING_LENGTH (arg0->op0)) < 0)
return build_int_cst_type (op->type,
   (TREE_STRING_POINTER (arg0->op0)



Tested and bootstrapped on x86-64 Linux.
Is it ok for trunk? If yes, should we backport it to 4.8?
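
For readers following along, the effect of the added check can be sketched outside of GCC as below. This is a simplified stand-in and not the code in tree-ssa-sccvn.c: fold_string_index and its parameters are hypothetical, and the sketch assumes, as the patch suggests, that the existing upper-bound comparison alone did not reject a negative index such as wbi = -1 in the test case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the folding step: try to read byte INDEX of
   a string constant of LEN bytes (LEN counts the trailing NUL, as
   TREE_STRING_LENGTH does).  Returns 1 and stores the byte in *OUT when
   the index is in range, 0 when the access must not be folded.  */
int
fold_string_index (const char *str, size_t len, long index, char *out)
{
  assert (str != NULL && out != NULL);

  /* The lower-bound test models the check the patch adds; without it a
     negative index would reach the read below.  */
  if (index < 0)
    return 0;
  if ((size_t) index >= len)
    return 0;

  *out = str[index];
  return 1;
}
```

With the lower-bound test in place, an index of -1 is left unfolded instead of being used to read before the start of the string constant.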


thanks,
Alexander


Re: [patch] Default to --enable-libstdcxx-time=auto

2013-05-23 Thread Rainer Orth
Rainer Orth  writes:

> Jakub Jelinek  writes:
>
>> On Thu, May 23, 2013 at 10:56:25PM +0200, Rainer Orth wrote:
>>> >> I think std::chrono::steady_clock::now() needs to be protected with
>>> >> !(__sun__ && __svr4__) in GLIBCXX_3.4.17 since it only became available
>>> >> by default with Jonathan's patch.
>>> >
>>> > Ah, I see, gnu.ver has some
>>> > #ifdef HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT
>>> > #endif
>>> > guards there, does it work if you add it around the
>>> > GLIBCXX_3.4.17 std::chrono::steady_clock::now() definition?
>>> 
>>> If I do so, I would probably get an abi_check failure: with your patch,
>>> std::chrono::steady_clock::now() ends up in GLIBCXX_3.4.19 while it
>>> should appear in GLIBCXX_3.4.20 since this is new in GCC 4.9.
>>
>> In that case, either the --enable-libstdcxx-time=auto patch needs to be
>> backported to 4.8.1, or at least a small portion of it (do that auto thing
>> on Solaris only)?
>> Have steady_clock::now() as @GLIBCXX_3.4.17 + @@GLIBCXX_3.4.19 on Linux,
>> @@GLIBCXX_3.4.20 on Solaris, something else on other OSes would be quite
>> confusing.
>
> Agreed, that seems the best course of action if that's an option.

I just remembered that we aren't there yet even on mainline:

* This snippet

  http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01255.html

  is necessary to avoid bootstrap failure on Solaris 9.

* We'll need to link every C++ program with -lrt on Solaris, as
  mentioned in the same message.  I suppose the best way to do this is
  along the lines of libgfortran.spec, rather than duplicate the
  necessary configury between g++ and libstdc++.  This might prove
  pretty invasive for the testsuite, though, and delay the 4.8.1 release
  quite a bit.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Remove global state from gcc/tracer.c

2013-05-23 Thread Jeff Law

On 05/23/2013 02:59 PM, Jakub Jelinek wrote:

On Thu, May 23, 2013 at 02:44:48PM -0600, Jeff Law wrote:

On 05/23/2013 02:31 PM, Richard Henderson wrote:

I think we need more weigh in from other maintainers on this, rather than
iterating a 5th time today...

This seems like an awful lot of pain.

I don't think we should be looking to generate different code for
library vs executable GCC.

I think we should look for *clean* first, then we can look at what
we could change if the compile-time performance isn't what we want.

Lots of C++ code manages to pass around the implicit this pointer
and use it appropriately without that being a significant source of
performance concerns.  I suspect GCC would be the same in that
regard. The cost of passing around & using that pointer is dwarfed
by all the other lameness we have.


I'm afraid we don't have the luxury of slowing the compiler too much.
Nor do we have the luxury of continuing to not deal with these long term 
issues.  Nor do we have the luxury of creating something so (*&@#$ ugly 
that nobody is willing to work on it again in the future.   Particularly 
when there's no evidence it's going to be measurably slower.





Anyway, I don't see how a single this would help with all the global state,
because there are various levels of global state.  The tracer changes show
just the easiest one, non-GTY pass that is internal to the file,
starts living at the start of the pass and can be forgotten at the end of
the pass.
Agreed, but we need to start somewhere and passes of this nature seem 
like a reasonable place to me



But, often pass has some of its global state, and calls functions
from other files that access different global state (cfun, crtl,
current_function_decl, lots of other things), some files have global state
preserved across multiple passes, etc.  Before we start changing anything,
we need a firm plan for everything, otherwise we end up with a useless
partial transformation.  Some global state data is accessed only
occasionally, but other is accessed all the time (cfun being a very good
example here).
I don't disagree in principle WRT partial transformations.  But I also 
believe that there is value in cleaning up the simple stuff, even in 
isolation.


To take your specific examples:

  * Global state is global state and needs to be available via
  a global context pointer of some kind regardless of how we
  handle pass-local data.

  * pass-local state that spans passes.  I'd like to see an
  example, but my gut feeling is such things are probably
  going to either become global state or die.  They'll have
  to be handled on a case by case basis.

In both cases, David's changes don't make those problems any easier or 
any harder.  He merely encapsulates pass-local data where it's easy to 
do so right now.  It's just a cleaner design/implementation and I'd like 
to see it pursued.
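
To make the kind of encapsulation being discussed concrete, here is a minimal sketch of pass-local data gathered into a structure that lives only for the duration of one pass run. It is not taken from David's patch: the toy_pass_state type, its fields, and the functions are hypothetical and only show the shape of the change from file-scope statics to a state pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pass-local state.  Before such a cleanup these fields
   would be file-scope static variables.  */
struct toy_pass_state
{
  int probability_cutoff;   /* Threshold chosen at pass start.  */
  int blocks_seen;          /* Blocks examined so far.  */
  int blocks_duplicated;    /* Blocks that passed the threshold.  */
};

/* Initialize the state at the start of the pass.  */
void
toy_pass_begin (struct toy_pass_state *st, int cutoff)
{
  assert (st != NULL);
  st->probability_cutoff = cutoff;
  st->blocks_seen = 0;
  st->blocks_duplicated = 0;
}

/* Examine one block; return 1 if it would be duplicated.  The state is
   reached through ST rather than through globals.  */
int
toy_pass_visit_block (struct toy_pass_state *st, int probability)
{
  assert (st != NULL);
  st->blocks_seen++;
  if (probability >= st->probability_cutoff)
    {
      st->blocks_duplicated++;
      return 1;
    }
  return 0;
}

/* Run the whole pass.  The state starts living here and is forgotten
   when the function returns.  */
int
toy_pass_run (const int *probabilities, size_t n, int cutoff)
{
  struct toy_pass_state st;
  size_t i;

  toy_pass_begin (&st, cutoff);
  for (i = 0; i < n; i++)
    toy_pass_visit_block (&st, probabilities[i]);
  return st.blocks_duplicated;
}
```

Nothing here touches state shared with other files, which is why this category is the easy one; the global and cross-pass cases listed above would still need their own answer.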


Jeff


Re: GCC does not support *mmintrin.h with function specific opts

2013-05-23 Thread Sriraman Tallam
Ping, for review of ipa-inline.c change.

Sri

On Mon, May 20, 2013 at 11:04 AM, Sriraman Tallam  wrote:
> On Fri, May 17, 2013 at 11:21 PM, Jakub Jelinek  wrote:
>> On Fri, May 17, 2013 at 09:00:21PM -0700, Sriraman Tallam wrote:
>>> --- ipa-inline.c  (revision 198950)
>>> +++ ipa-inline.c  (working copy)
>>> @@ -374,7 +374,33 @@ can_early_inline_edge_p (struct cgraph_edge *e)
>>>return false;
>>>  }
>>>if (!can_inline_edge_p (e, true))
>>> -return false;
>>> +{
>>> +  enum availability avail;
>>> +  struct cgraph_node *callee
>>> += cgraph_function_or_thunk_node (e->callee, &avail);
>>> +  /* Flag an error when the inlining cannot happen because of target option
>>> +  mismatch but the callee is marked as "always_inline".  In -O0 mode
>>> +  this will go undetected because the error flagged in
>>> +  "expand_call_inline" in tree-inline.c might not execute and the
>>> +  inlining will not happen.  Then, the linker could complain about a
>>> +  missing body for the callee if it turned out that the callee was
>>> +  also marked "gnu_inline" with extern inline keyword as bodies of such
>>> +  functions are not generated.  */
>>> +  if ((!optimize
>>> +|| flag_no_inline)
>>
>> This should be if ((!optimize || flag_no_inline) on one line.
>>
>> I'd prefer also the testcase for the ICEs, something like:
>>
>> /* Test case to check if AVX intrinsics and function specific target
>>optimizations work together.  Check by including x86intrin.h  */
>>
>> /* { dg-do compile } */
>> /* { dg-options "-O2 -mno-sse -mno-avx" } */
>>
>> #include <x86intrin.h>
>>
>> __m256 a, b, c;
>> void __attribute__((target ("avx")))
>> foo (void)
>> {
>>   a = _mm256_and_ps (b, c);
>> }
>>
>> and another testcase that does:
>>
>> /* { dg-do compile } */
>> #pragma GCC target ("mavx") /* { dg-error "whatever" } */
>>
>> Otherwise it looks good to me, but I'd prefer the i?86 maintainers to review
>> it too (and Honza for ipa-inline.c?).
>
> Honza, could you please take a look at the ipa-inline.c fix? I will
> split the patches and submit after Honza's review. I will also make
> the changes mentioned.
>
> Thanks
> Sri
>
>
>>
>> Jakub


Re: fix memory spaces and references for C

2013-05-23 Thread Joseph S. Myers
On Thu, 23 May 2013, Mike Stump wrote:

> This is a smaller patch than maybe it should be.  Arguably not recursing 
> is a better approach, but then we need to split into two functions, so 
> that I can add the REFERENCE_TYPE back to the top.  Let me know if you 
> prefer it split.
> 
> A user actually hit this in rather trivial code with memory spaces.
> 
> Tested on two platforms, one with memory spaces and one without.
> 
> Ok?

This patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [patch] Default to --enable-libstdcxx-time=auto

2013-05-23 Thread Rainer Orth
Jakub Jelinek  writes:

> On Thu, May 23, 2013 at 10:56:25PM +0200, Rainer Orth wrote:
>> >> I think std::chrono::steady_clock::now() needs to be protected with
>> >> !(__sun__ && __svr4__) in GLIBCXX_3.4.17 since it only became available
>> >> by default with Jonathan's patch.
>> >
>> > Ah, I see, gnu.ver has some
>> > #ifdef HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT
>> > #endif
>> > guards there, does it work if you add it around the
>> > GLIBCXX_3.4.17 std::chrono::steady_clock::now() definition?
>> 
>> If I do so, I would probably get an abi_check failure: with your patch,
>> std::chrono::steady_clock::now() ends up in GLIBCXX_3.4.19 while it
>> should appear in GLIBCXX_3.4.20 since this is new in GCC 4.9.
>
> In that case, either the --enable-libstdcxx-time=auto patch needs to be
> backported to 4.8.1, or at least a small portion of it (do that auto thing
> on Solaris only)?
> Have steady_clock::now() as @GLIBCXX_3.4.17 + @@GLIBCXX_3.4.19 on Linux,
> @@GLIBCXX_3.4.20 on Solaris, something else on other OSes would be quite
> confusing.

Agreed, that seems the best course of action if that's an option.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


RE: [PING]RE: [patch] cilkplus: Array notation for C patch

2013-05-23 Thread Iyer, Balaji V


> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Thursday, May 23, 2013 4:52 PM
> To: Iyer, Balaji V
> Cc: Jakub Jelinek; Richard Henderson; 'Joseph S. Myers'; 'Aldy Hernandez';
> 'gcc-patches'
> Subject: Re: [PING]RE: [patch] cilkplus: Array notation for C patch
> 
> On 05/23/2013 02:38 PM, Iyer, Balaji V wrote:
> >
> > Hi Jakub & Aldy, There are a couple reasons why I picked this
> > hierarchy. I looked at gcc-c-torture directory and it has compile,
> > execute etc. This is why I had execute, compile and errors directory.
> > Also, we are planning to have some hybrid tests that will add array
> > notation + cilk keywords, array notation + pragma simd, etc. Yes, I
> > can see the deeply buried issue, but I once had long file names
> > (~25-30 characters) that tells what kind of tests (when we first
> > opened the branch) they are and someone in the mailing list complained
> > that the file names were long and suggested that I use directories
> > instead. If it is OK with you both I would like to keep this hierarchy
> c-torture is the oldest of our testing frameworks -- it goes back to separate
> c-torture testing releases from Tege.  IIRC those were originally just shell
> scripts which were invoked on every file in the directory.  Thus every file
> in a particular directory had to have the same characteristics (ie, it must
> compile, compile & run, not compile).
> 
> I'm guessing Aldy & Jakub want this stuff done in the dg-torture framework
> which would flatten out one of the directories.
> 
> As someone (rth?) mentioned elsewhere, we have some tests that can and
> should be shared between the C & C++ front-end.  Most if not all of
> these seem to fall into that category.   I'd separate them into
> common to c/c++ (in the c-c++-common directory), c specific and c++ specific
> which would go into the gcc.dg and g++.dg directories.
> 
> I'd squash out the cilk-plus directory.  While this is currently an
> extension, this may ultimately end up being part of ISO-C rather than
> being an extension.

If I put things in c-c++-common, how do I test the code with different flags (I 
didn't see any .exp file there)? For example, how can I test if it works with 
"-O2" and then have another test for "-O2 -g" etc.? Do I just use multiple 
"dg-options" in my code? The way I have it right now, it uses several flags, 
and tries them in different combinations. I am sorry if this is a trivial question, I 
am not very familiar with DejaGNU framework and I went through GCC and DejaGNU 
manual a while back and I couldn't find an answer for this.

Thanks,

Balaji V. Iyer.


> 
> Jeff



Re: [patch] Default to --enable-libstdcxx-time=auto

2013-05-23 Thread Jakub Jelinek
On Thu, May 23, 2013 at 10:56:25PM +0200, Rainer Orth wrote:
> >> I think std::chrono::steady_clock::now() needs to be protected with
> >> !(__sun__ && __svr4__) in GLIBCXX_3.4.17 since it only became available
> >> by default with Jonathan's patch.
> >
> > Ah, I see, gnu.ver has some
> > #ifdef HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT
> > #endif
> > guards there, does it work if you add it around the
> > GLIBCXX_3.4.17 std::chrono::steady_clock::now() definition?
> 
> If I do so, I would probably get an abi_check failure: with your patch,
> std::chrono::steady_clock::now() ends up in GLIBCXX_3.4.19 while it
> should appear in GLIBCXX_3.4.20 since this is new in GCC 4.9.

In that case, either the --enable-libstdcxx-time=auto patch needs to be
backported to 4.8.1, or at least a small portion of it (do that auto thing
on Solaris only)?
Have steady_clock::now() as @GLIBCXX_3.4.17 + @@GLIBCXX_3.4.19 on Linux,
@@GLIBCXX_3.4.20 on Solaris, something else on other OSes would be quite
confusing.

> > Seems the solaris baseline_symbols.txt files don't mention
> > this symbol, thus it wasn't exported before and thus the
> > compat definition there isn't really needed for Solaris.
> 
> Right, it only got enabled by defaulting --enable-libstdcxx-time to auto.

Jakub


Re: [PING]RE: [patch] cilkplus: Array notation for C patch

2013-05-23 Thread Joseph S. Myers
On Wed, 22 May 2013, Jeff Law wrote:

> On 05/22/2013 03:58 PM, Joseph S. Myers wrote:
> > 
> > Regarding commonality between OpenMP and Cilk, note also the new C
> > Parallel Language Extensions WG14 study group chaired by Clark Nelson and
> > aiming to propose a standard set of C extensions for parallel programming,
> > announced yesterday on the WG14 reflector.  I don't know if anyone here is
> > intending to be involved in that.
> > 
> > http://www.open-std.org/mailman/listinfo/cplex
> > 
> > I'm OK with the array notation patch as it stands now, although certainly
> > as discussed today there is still some scope for further improvements.
> I'm aware of efforts to standardize the Cilk+ style extensions for ISO C.  Is
> the new study group effectively that effort or something different?

"This proposal is expected to combine the best ideas from Cilk and OpenMP, 
two of the most widely-used and well-established parallel language 
extensions for the C language family.".  See page 10 of the draft Delft
minutes for more discussion.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Remove global state from gcc/tracer.c

2013-05-23 Thread Jakub Jelinek
On Thu, May 23, 2013 at 02:44:48PM -0600, Jeff Law wrote:
> On 05/23/2013 02:31 PM, Richard Henderson wrote:
> >I think we need more weigh in from other maintainers on this, rather than
> >iterating a 5th time today...
> This seems like an awful lot of pain.
> 
> I don't think we should be looking to generate different code for
> library vs executable GCC.
> 
> I think we should look for *clean* first, then we can look at what
> we could change if the compile-time performance isn't what we want.
> 
> Lots of C++ code manages to pass around the implicit this pointer
> and use it appropriately without that being a significant source of
> performance concerns.  I suspect GCC would be the same in that
> regard. The cost of passing around & using that pointer is dwarfed
> by all the other lameness we have.

I'm afraid we don't have the luxury of slowing the compiler too much.

Anyway, I don't see how a single this would help with all the global state,
because there are various levels of global state.  The tracer changes show
just the easiest one: non-GTY state that is internal to the file,
starts living at the start of the pass and can be forgotten at the end of
the pass.  But often a pass has some global state of its own and calls
functions from other files that access different global state (cfun, crtl,
current_function_decl, lots of other things), some files have global state
preserved across multiple passes, etc.  Before we start changing anything,
we need a firm plan for everything, otherwise we end up with a useless
partial transformation.  Some global state data is accessed only
occasionally, but other data is accessed all the time (cfun being a very
good example here).

Jakub


Re: [patch] Default to --enable-libstdcxx-time=auto

2013-05-23 Thread Rainer Orth
Jakub Jelinek  writes:

> On Thu, May 23, 2013 at 10:45:32PM +0200, Rainer Orth wrote:
>> Jakub Jelinek  writes:
>> 
>> > On Thu, May 23, 2013 at 04:02:18PM +0200, Jakub Jelinek wrote:
>> >> So, here is an untested 4.8 branch patch.  The @GLIBCXX_3.4.17 +
>> >> @@GLIBCXX_3.4.19 stuff gets ugly, I admit, but don't have other solution.
>> >> Tested just that it compiles/links, abi list looks good and abi.exp
>> >> testing, haven't actually tried to test it more than that.
>> >
>> > Now fully bootstrapped/regtested on x86_64-linux and i686-linux.
>> 
>> This patch breaks Solaris bootstrap:
>> 
>> ld: fatal: libstdc++-symbols.ver-sun: 4423: symbol 
>> 'std::chrono::steady_clock::now()' is already defined in file: 
>> libstdc++-symbols.ver-sun: symbol version conflict
>> collect2: error: ld returned 1 exit status
>> make[6]: *** [libstdc++.la] Error 1
>> 
>> I think std::chrono::steady_clock::now() needs to be protected with
>> !(__sun__ && __svr4__) in GLIBCXX_3.4.17 since it only became available
>> by default with Jonathan's patch.
>
> Ah, I see, gnu.ver has some
> #ifdef HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT
> #endif
> guards there, does it work if you add it around the
> GLIBCXX_3.4.17 std::chrono::steady_clock::now() definition?

If I do so, I would probably get an abi_check failure: with your patch,
std::chrono::steady_clock::now() ends up in GLIBCXX_3.4.19 while it
should appear in GLIBCXX_3.4.20 since this is new in GCC 4.9.

> Seems the solaris baseline_symbols.txt files don't mention
> this symbol, thus it wasn't exported before and thus the
> compat definition there isn't really needed for Solaris.

Right, it only got enabled by defaulting --enable-libstdcxx-time to auto.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [patch] Fix sched-deps DEP_POSTPONED, ds_t documentation

2013-05-23 Thread Steven Bosscher
Ping**2

On Fri, May 17, 2013 at 9:01 PM, Steven Bosscher  wrote:
> Ping
>
> On Sun, May 12, 2013 at 5:45 PM, Steven Bosscher wrote:
>> Hello,
>>
>> While working on a sched-deps based delay slot scheduler, I've come to
>> the conclusion that the dependencies themselves must indicate whether
>> the dependent ref is delayed. So I started hacking sched-deps and ran
>> into trouble... It turns out there is a problem introduced along with
>> DEP_POSTPONED last year, but the real problem is the complicated ds_t
>> representation and the unclear documentation. *6* bits on a ds_t were
>> reserved: four for the dependence type and two more for HARD_DEP and
>> DEP_CANCELLED:
>>
>> -/* First '6' stands for 4 dep type bits and the HARD_DEP and DEP_CANCELLED
>> -   bits.
>> -   Second '4' stands for BEGIN_{DATA, CONTROL}, BE_IN_{DATA, CONTROL}
>> -   dep weakness.  */
>> -#define BITS_PER_DEP_WEAK ((BITS_PER_DEP_STATUS - 6) / 4)
>>
>> But DEP_POSTPONED adds another bit:
>>
>> /* Instruction has non-speculative dependence.  This bit represents the
>>    property of an instruction - not the one of a dependence.
>>    Therefore, it can appear only in the TODO_SPEC field of an instruction.  */
>> #define HARD_DEP (DEP_CONTROL << 1)
>>
>> /* Set in the TODO_SPEC field of an instruction for which new_ready
>>    has decided not to schedule it speculatively.  */
>> #define DEP_POSTPONED (HARD_DEP << 1)
>>
>> /* Set if a dependency is cancelled via speculation.  */
>> #define DEP_CANCELLED (DEP_POSTPONED << 1)
>>
>> I wanted to add another flag, DEP_DELAYED, and optimistically just
>> added another bit, the compiler started warning, etc.
>>
>>
>> So far we seem to've gotten away with this because the sign bit on a
>> ds_t was unused:
>>
>> /* We exclude sign bit.  */
>> #define BITS_PER_DEP_STATUS (HOST_BITS_PER_INT - 1)
>>
>>
>> The attached patch extends the ds_t documentation to clarify in a
>> comment how all the bits are used. I've made ds_t and dw_t unsigned
>> int, because ds_t is just a collection of bits, and dw_t is unsigned.
>> The target hooks I had to update are all only used by ia64.
>>
>> I opportunistically reserved the one left-over bit for my own purposes ;-)
>>
>> Bootstrapped&tested on ia64-unknown-linux-gnu and on
>> powerpc64-unknown-linux-gnu unix-{,-m32}.
>> OK for trunk?
>>
>> Ciao!
>> Steven
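
The bit accounting in the quoted message can be checked with a few lines of
arithmetic.  What follows is only a sketch, not code from sched-int.h: it
assumes a host where int is 32 bits wide, and the lower-case names are made
up for the illustration.  If I read the quoted macros right, the integer
division in the old BITS_PER_DEP_WEAK formula left one bit of slack, which
DEP_POSTPONED then consumed.

```cpp
// Bit budget of the old ds_t layout, reconstructed from the quoted macros.
// Assumption: int is 32 bits wide on the host.
constexpr int host_bits_per_int = 32;

// "We exclude sign bit."
constexpr int bits_per_dep_status = host_bits_per_int - 1;

// Old formula: reserve 6 bits, split the rest over the 4 weakness fields.
// Integer division rounds 25 / 4 down to 6, leaving one bit of slack.
constexpr int bits_per_dep_weak = (bits_per_dep_status - 6) / 4;
constexpr int weakness_bits = 4 * bits_per_dep_weak;

// Four dependence type bits.
constexpr int dep_type_bits = 4;
// HARD_DEP, DEP_POSTPONED and DEP_CANCELLED.
constexpr int flag_bits = 3;

constexpr int used_bits = weakness_bits + dep_type_bits + flag_bits;

static_assert (bits_per_dep_weak == 6, "six bits per weakness field");
static_assert (used_bits == bits_per_dep_status,
               "all 31 non-sign bits are taken");

// Making ds_t unsigned frees the former sign bit.
constexpr int spare_bits_if_unsigned = host_bits_per_int - used_bits;
static_assert (spare_bits_if_unsigned == 1, "one left-over bit");
```

Under these assumptions a fourth flag would have needed bit 31 of a signed
value, which matches the warnings mentioned in the quoted message.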


Re: Remove global state from gcc/tracer.c

2013-05-23 Thread Richard Henderson
On 05/23/2013 12:06 PM, Richard Henderson wrote:
> Another thing I should mention while you're doing all of these static function
> to class member conversions is that as written we're losing target-specific
> optimizations that can be done on purely local functions.  This is trivially
> fixed by placing these new classes in an anonymous namespace.

At which point it occurs to me that we don't need to distinguish between static
and normal member methods, nor play with sub-state structures.  All we need to
require is that IPA constant propagation sees that the initial "this" argument
is passed a constant value, and let it propagate and eliminate.

E.g.

namespace {

class pass_state
{
  private:
    int x, y, z;

  public:
    constexpr pass_state()
      : x(0), y(0), z(0)
    { }

    void doit();

  private:
    void a();
    void b();
    void c();
};

// ...

} // anon namespace

#ifdef GLOBAL_STATE
static pass_state ps;
#endif

void toplev()
{
#ifndef GLOBAL_STATE
  pass_state ps;
#endif
  ps.doit();
}

With an example that's probably too small, I verified that gcc will eliminate
all of the this parameters with GLOBAL_STATE defined.  It's certainly something
worth investigating further, as this scheme has the least amount of boilerplate
of any so far advanced.
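
For anyone who wants to try this, here is a compilable variant of the
skeleton, as a minimal sketch.  The bodies of a(), b() and c() and the int
return values are invented purely so that there is something to observe;
only the overall shape (class in an anonymous namespace, one instance that
is either a static object or a local) comes from the example.

```cpp
namespace {

class pass_state
{
  private:
    int x, y, z;

  public:
    constexpr pass_state ()
      : x (0), y (0), z (0)
    { }

    int doit ();

  private:
    // Placeholder bodies, invented for this sketch.
    void a () { x += 1; }
    void b () { y += x + 1; }
    void c () { z += x + y; }
};

int
pass_state::doit ()
{
  a ();
  b ();
  c ();
  return z;
}

} // anon namespace

#ifdef GLOBAL_STATE
// One instance at a constant address.
static pass_state ps;
#endif

int
toplev ()
{
#ifndef GLOBAL_STATE
  // Fresh state on the stack for every invocation.
  pass_state ps;
#endif
  return ps.doit ();
}
```

Comparing the assembly built with and without -DGLOBAL_STATE should show
whether the this parameters are eliminated; the result will depend on the
compiler version and the optimization level.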


r~


Re: [PING]RE: [patch] cilkplus: Array notation for C patch

2013-05-23 Thread Jeff Law

On 05/23/2013 02:38 PM, Iyer, Balaji V wrote:


> Hi Jakub & Aldy, There are a couple reasons why I picked this
> hierarchy. I looked at gcc-c-torture directory and it has compile,
> execute etc. This is why I had execute, compile and errors directory.
> Also, we are planning to have some hybrid tests that will add array
> notation + cilk keywords, array notation + pragma simd, etc. Yes, I
> can see the deeply buried issue, but I once had long file names
> (~25-30 characters) that tells what kind of tests (when we first
> opened the branch) they are and someone in the mailing list
> complained that the file names were long and suggested that I use
> directories instead. If it is OK with you both I would like to keep
> this hierarchy

c-torture is the oldest of our testing frameworks -- it goes back to 
separate c-torture testing releases from Tege.  IIRC those were 
originally just shell scripts which were invoked on every file in the 
directory.  Thus every file in a particular directory had to have the 
same characteristics (ie, it must compile, compile & run, not compile).


I'm guessing Aldy & Jakub want this stuff done in the dg-torture 
framework which would flatten out one of the directories.


As someone (rth?) mentioned elsewhere, we have some tests that can and 
should be shared between the C & C++ front-end.  Most if not all of 
these seem to fall into that category.   I'd separate them into
common to c/c++ (in the c-c++-common directory), c specific and c++ 
specific which would go into the gcc.dg and g++.dg directories.


I'd squash out the cilk-plus directory.  While this is currently an 
extension, this may ultimately end up being part of ISO-C rather than 
being an extension.


Jeff



Re: Remove global state from gcc/tracer.c

2013-05-23 Thread David Malcolm
On Thu, 2013-05-23 at 13:31 -0700, Richard Henderson wrote:
> >  /* The Ith entry is the number of objects on a page or order I.  */
> >  
> > -static unsigned objects_per_page_table[NUM_ORDERS];
> > +DEFINE_STATIC_STATE_ARRAY(unsigned, objects_per_page_table, NUM_ORDERS)
> >  
> >  /* The Ith entry is the size of an object on a page of order I.  */
> >  
> > -static size_t object_size_table[NUM_ORDERS];
> > +DEFINE_STATIC_STATE_ARRAY(size_t, object_size_table, NUM_ORDERS)
> >  
> >  /* The Ith entry is a pair of numbers (mult, shift) such that
> >     ((k * mult) >> shift) mod 2^32 == (k / OBJECT_SIZE(I)) mod 2^32,
> >     for all k evenly divisible by OBJECT_SIZE(I).  */
> >  
> > -static struct
> > +struct inverse_table_def
> >  {
> >    size_t mult;
> >    unsigned int shift;
> > -}
> > -inverse_table[NUM_ORDERS];
> > +};
> > +DEFINE_STATIC_STATE_ARRAY(inverse_table_def, inverse_table, NUM_ORDERS)
> >  
> >  /* A page_entry records the status of an allocation page.  This
> > structure is dynamically sized to fit the bitmap in_use_p.  */
> > @@ -343,7 +346,7 @@ struct free_object
> >  #endif
> >  
> >  /* The rest of the global variables.  */
> > -static struct globals
> > +struct globals
> >  {
> >/* The Nth element in this array is a page with objects of size 2^N.
> >   If there are any pages with free objects, they will be at the
> > @@ -457,7 +460,11 @@ static struct globals
> >  /* The overhead for each of the allocation orders.  */
> >  unsigned long long total_overhead_per_order[NUM_ORDERS];
> >} stats;
> > -} G;
> > +};
> > +
> > +DEFINE_STATIC_STATE(globals, G)
> 
> Be careful here.  Note that all of the arrays that you convert are notionally
> static constants defining the bucket sizes for the algorithm, except we don't
> actually compute them at compile time.  They would be true global data shared
> by all threads.

I know - but when you have a big hammer, sometimes it's easier just to
hit all of the nails, rather than just the ones you need to :)

[...]



C++ PATCH for c++/57388 (ICE with ref-qualifier)

2013-05-23 Thread Jason Merrill

This is a simple oversight in the ref-qualifier code.

Tested x86_64-pc-linux-gnu, applying to trunk.  Jakub, is this OK for 4.8.1?
commit 0914d39b7335966f5d828c1b4225beb2e5448755
Author: Jason Merrill 
Date:   Thu May 23 14:01:27 2013 -0400

	PR c++/57388
	* tree.c (build_ref_qualified_type): Clear
	FUNCTION_RVALUE_QUALIFIED for lvalue ref-qualifier.

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index e4967c1..a756634 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -1754,8 +1754,10 @@ build_ref_qualified_type (tree type, cp_ref_qualifier rqual)
 {
 case REF_QUAL_RVALUE:
   FUNCTION_RVALUE_QUALIFIED (t) = 1;
-  /* Intentional fall through */
+  FUNCTION_REF_QUALIFIED (t) = 1;
+  break;
 case REF_QUAL_LVALUE:
+  FUNCTION_RVALUE_QUALIFIED (t) = 0;
   FUNCTION_REF_QUALIFIED (t) = 1;
   break;
 default:
diff --git a/gcc/testsuite/g++.dg/cpp0x/ref-qual13.C b/gcc/testsuite/g++.dg/cpp0x/ref-qual13.C
new file mode 100644
index 000..84d3b0f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/ref-qual13.C
@@ -0,0 +1,29 @@
+// PR c++/57388
+// { dg-require-effective-target c++11 }
+
+template struct A
+{
+  static constexpr bool value = false;
+};
+
+template
+struct A
+{
+  static constexpr bool value = true;
+};
+
+template
+struct A
+{
+  static constexpr bool value = true;
+};
+
+template
+struct A
+{
+  static constexpr bool value = true;
+};
+
+static_assert(A::value, "Ouch");
+static_assert(A::value, ""); // #1
+static_assert(A::value, ""); // #2
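
For readers unfamiliar with ref-qualified function types, here is a small
standalone sketch of the language feature involved.  It is not the testcase
from the patch; the trait name is_function_type is made up for the
illustration.  It shows that a function type with an lvalue ref-qualifier
and one with an rvalue ref-qualifier are distinct types that can each be
matched by a partial specialization.

```cpp
#include <type_traits>

// Primary template: anything that is not matched below.
template<class T>
struct is_function_type : std::false_type { };

// Unqualified function type.
template<class R, class... Args>
struct is_function_type<R(Args...)> : std::true_type { };

// Function type with an lvalue ref-qualifier.
template<class R, class... Args>
struct is_function_type<R(Args...) &> : std::true_type { };

// Function type with an rvalue ref-qualifier.
template<class R, class... Args>
struct is_function_type<R(Args...) &&> : std::true_type { };

static_assert (is_function_type<void()>::value, "plain");
static_assert (is_function_type<void() &>::value, "lvalue ref-qualified");
static_assert (is_function_type<void() &&>::value, "rvalue ref-qualified");
static_assert (!is_function_type<int>::value, "not a function type");

// The two ref-qualified forms must not be confused with each other.
static_assert (!std::is_same<void() &, void() &&>::value, "distinct types");
```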


Re: [patch] Default to --enable-libstdcxx-time=auto

2013-05-23 Thread Jakub Jelinek
On Thu, May 23, 2013 at 10:45:32PM +0200, Rainer Orth wrote:
> Jakub Jelinek  writes:
> 
> > On Thu, May 23, 2013 at 04:02:18PM +0200, Jakub Jelinek wrote:
> >> So, here is an untested 4.8 branch patch.  The @GLIBCXX_3.4.17 +
> >> @@GLIBCXX_3.4.19 stuff gets ugly, I admit, but don't have other solution.
> >> Tested just that it compiles/links, abi list looks good and abi.exp
> >> testing, haven't actually tried to test it more than that.
> >
> > Now fully bootstrapped/regtested on x86_64-linux and i686-linux.
> 
> This patch breaks Solaris bootstrap:
> 
> ld: fatal: libstdc++-symbols.ver-sun: 4423: symbol 
> 'std::chrono::steady_clock::now()' is already defined in file: 
> libstdc++-symbols.ver-sun: symbol version conflict
> collect2: error: ld returned 1 exit status
> make[6]: *** [libstdc++.la] Error 1
> 
> I think std::chrono::steady_clock::now() needs to be protected with
> !(__sun__ && __svr4__) in GLIBCXX_3.4.17 since it only became available
> by default with Jonathan's patch.

Ah, I see, gnu.ver has some
#ifdef HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT
#endif
guards there, does it work if you add it around the
GLIBCXX_3.4.17 std::chrono::steady_clock::now() definition?
Seems the solaris baseline_symbols.txt files don't mention
this symbol, thus it wasn't exported before and thus the
compat definition there isn't really needed for Solaris.

Jakub


Re: [patch] Default to --enable-libstdcxx-time=auto

2013-05-23 Thread Rainer Orth
Jakub Jelinek  writes:

> On Thu, May 23, 2013 at 04:02:18PM +0200, Jakub Jelinek wrote:
>> So, here is an untested 4.8 branch patch.  The @GLIBCXX_3.4.17 +
>> @@GLIBCXX_3.4.19 stuff gets ugly, I admit, but don't have other solution.
>> Tested just that it compiles/links, abi list looks good and abi.exp testing,
>> haven't actually tried to test it more than that.
>
> Now fully bootstrapped/regtested on x86_64-linux and i686-linux.

This patch breaks Solaris bootstrap:

ld: fatal: libstdc++-symbols.ver-sun: 4423: symbol 
'std::chrono::steady_clock::now()' is already defined in file: 
libstdc++-symbols.ver-sun: symbol version conflict
collect2: error: ld returned 1 exit status
make[6]: *** [libstdc++.la] Error 1

I think std::chrono::steady_clock::now() needs to be protected with
!(__sun__ && __svr4__) in GLIBCXX_3.4.17 since it only became available
by default with Jonathan's patch.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Remove global state from gcc/tracer.c

2013-05-23 Thread Jeff Law

On 05/23/2013 02:31 PM, Richard Henderson wrote:



> I think we need more weigh in from other maintainers on this, rather than
> iterating a 5th time today...

This seems like an awful lot of pain.

I don't think we should be looking to generate different code for 
library vs executable GCC.


I think we should look for *clean* first, then we can look at what we 
could change if the compile-time performance isn't what we want.


Lots of C++ code manages to pass around the implicit this pointer and 
use it appropriately without that being a significant source of 
performance concerns.  I suspect GCC would be the same in that regard. 
The cost of passing around & using that pointer is dwarfed by all the 
other lameness we have.



Jeff


RE: [PING]RE: [patch] cilkplus: Array notation for C patch

2013-05-23 Thread Iyer, Balaji V


> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Thursday, May 23, 2013 3:04 PM
> To: Iyer, Balaji V
> Cc: Richard Henderson; 'Joseph S. Myers'; 'Aldy Hernandez'; 'gcc-patches'
> Subject: Re: [PING]RE: [patch] cilkplus: Array notation for C patch
> 
> On Thu, May 23, 2013 at 06:27:04PM +, Iyer, Balaji V wrote:
> > gcc/testsuite/ChangeLog
> > 2013-05-23  Balaji V. Iyer  
> >
> > * gcc.dg/cilk-plus/array_notation/compile/array_test2.c: New test.
> 
> I have concerns about the test locations, to me this looks way too deep tree,
> whether something is a compile test, or compile test expecting errors or
> runtime test is easily determined by { dg-do compile } vs. { dg-do run } and
> presence or lack of { dg-error ... } comments.  So IMHO that level should be
> left out, plus I'd say the array_notation/ level is unnecessary as well, just
> put everything into c-c++-common/cilk-plus/an-*.c (except for tests that
> aren't going to be usable for C++, those can stay in
> gcc.dg/cilk-plus/an-*.c).  Then gcc.dg/cilk-plus/*.exp would just ensure that
> tests from that directory are run and also from c-c++-common/ and later on
> the same would happen in g++.dg/cilk-plus/.  In the future when you will need
> to link against runtime library cilk-plus.exp would just arrange for that to
> be added to LD_LIBRARY_PATH, -L.../ etc.
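
To make the point about directives concrete, here is a minimal sketch of a
test whose kind is stated in the file itself rather than by its directory.
The function and its contents are hypothetical and do not use array
notation; only the dg-do and dg-options comment syntax is the usual DejaGnu
form.

```cpp
/* Hypothetical test that is valid as both C and C++.  */
/* { dg-do compile } */
/* { dg-options "-O2" } */

/* Sum the first N elements of A.  */
int
sum_first (const int *a, int n)
{
  int total = 0;
  int i;

  for (i = 0; i < n; i++)
    total += a[i];
  return total;
}

/* A runtime test would say { dg-do run } instead and provide main; a
   test expecting a diagnostic would put { dg-error "..." } on the
   offending line.  */
```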

Hi Jakub & Aldy,
There are a couple of reasons why I picked this hierarchy. I looked at the
gcc-c-torture directory and it has compile, execute etc. This is why I had
execute, compile and errors directories. Also, we are planning to have some
hybrid tests that will add array notation + cilk keywords, array notation +
pragma simd, etc. Yes, I can see the deeply buried issue, but I once had long
file names (~25-30 characters) that told what kind of tests they were (when we
first opened the branch), and someone on the mailing list complained that the
file names were long and suggested that I use directories instead. If it is OK
with you both I would like to keep this hierarchy.

Thanks,

Balaji V. Iyer.

> 
> > * gcc.dg/cilk-plus/array_notation/compile/array_test1.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/compile/array_test_ND.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/compile/builtin_func_double.c: 
> > Ditto.
> > * gcc.dg/cilk-plus/array_notation/compile/builtin_func_double2.c: 
> > Ditto.
> > * gcc.dg/cilk-plus/array_notation/compile/gather_scatter.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/compile/if_test.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/compile/sec_implicit_ex.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/decl-ptr-colon.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/dimensionless-arrays.c: 
> > Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/fn_ptr.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/fp_triplet_values.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/gather-scatter.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/misc.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/parser_errors.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/parser_errors2.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/parser_errors3.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/parser_errors4.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/rank_mismatch.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/rank_mismatch2.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/rank_mismatch3.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/sec_implicit.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/sec_implicit2.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/sec_reduce_max_min_ind.c:
> > Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/tst_lngth.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/errors/vla.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/execute/an-if.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/execute/array_test1.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/execute/array_test2.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/execute/array_test_ND.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/execute/builtin_fn_custom.c: 
> > Ditto.
> > * gcc.dg/cilk-plus/array_notation/execute/builtin_fn_mutating.c: 
> > Ditto.
> > * gcc.dg/cilk-plus/array_notation/execute/builtin_func_double.c: 
> > Ditto.
> > * gcc.dg/cilk-plus/array_notation/execute/builtin_func_double2.c: 
> > Ditto.
> > * gcc.dg/cilk-plus/array_notation/execute/comma_exp.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/execute/conditional.c: Ditto.
> > * gcc.dg/cilk-plus/array_notation/execute/exec-once.c: Ditto.
> > * gcc.dg/cilk-plus/array

Re: RFA: fix rtl-optimization/56833

2013-05-23 Thread Joern Rennecke

Quoting Eric Botcazou :


> The patch is OK on principle, but can you factor out the common code?  The
> endings of move2add_use_add2_insn and move2add_use_add3_insn are identical so
> it would be nice to have e.g. a record_reg_value helper function and call it
> from there.  Similarly, the 3 new checks look strictly identical.


Looking into sharing the code with sites that perform essentially the same
function but look somewhat different, I see there's a problem with using
only reg_set_luid to indicate the consistency of a multi-hard-reg-value
in these other contexts.
For values that use a base register, the reg_set_luid is the same as
for the base register; for constants, it is the same for all constants
set since the last label.

Say we have reg size 8 bit, base r0, and then
(set (reg:HI 2) (plus:HI (reg:HI 0) (const_int 500)))
...
(set (reg:HI 3) (plus:HI (reg:HI 0) (const_int 500)))

Now how do we tell that the value in r2 is no longer valid?

As the example shows, trying to replicate the recorded value across all hard
regs is pointless, as we still need to make sure that we have a still-valid
start register.
OTOH, this ties in nicely with setting the mode of subsequent registers
to VOIDmode.  We can verify the mode to make sure there was no more recent
set of any constituent register.  The check of the extra luids thus becomes
superfluous, as does setting them.
This logic relies on multi-hard-register regs being allocated contiguously...
But if we wanted to change that, there'd be a lot more code that would
need changing.


Re: Remove global state from gcc/tracer.c

2013-05-23 Thread Richard Henderson
>  /* The Ith entry is the number of objects on a page or order I.  */
>  
> -static unsigned objects_per_page_table[NUM_ORDERS];
> +DEFINE_STATIC_STATE_ARRAY(unsigned, objects_per_page_table, NUM_ORDERS)
>  
>  /* The Ith entry is the size of an object on a page of order I.  */
>  
> -static size_t object_size_table[NUM_ORDERS];
> +DEFINE_STATIC_STATE_ARRAY(size_t, object_size_table, NUM_ORDERS)
>  
>  /* The Ith entry is a pair of numbers (mult, shift) such that
>     ((k * mult) >> shift) mod 2^32 == (k / OBJECT_SIZE(I)) mod 2^32,
>     for all k evenly divisible by OBJECT_SIZE(I).  */
>  
> -static struct
> +struct inverse_table_def
>  {
>    size_t mult;
>    unsigned int shift;
> -}
> -inverse_table[NUM_ORDERS];
> +};
> +DEFINE_STATIC_STATE_ARRAY(inverse_table_def, inverse_table, NUM_ORDERS)
>  
>  /* A page_entry records the status of an allocation page.  This
> structure is dynamically sized to fit the bitmap in_use_p.  */
> @@ -343,7 +346,7 @@ struct free_object
>  #endif
>  
>  /* The rest of the global variables.  */
> -static struct globals
> +struct globals
>  {
>/* The Nth element in this array is a page with objects of size 2^N.
>   If there are any pages with free objects, they will be at the
> @@ -457,7 +460,11 @@ static struct globals
>  /* The overhead for each of the allocation orders.  */
>  unsigned long long total_overhead_per_order[NUM_ORDERS];
>} stats;
> -} G;
> +};
> +
> +DEFINE_STATIC_STATE(globals, G)

Be careful here.  Note that all of the arrays that you convert are notionally
static constants defining the bucket sizes for the algorithm, except we don't
actually compute them at compile time.  They would be true global data shared
by all threads.

The G structure contains all of the data that you'd actually want to put into
the "universe".

That said, I'd really like to avoid ultra-macroization as with

> +#define probability_cutoff \
> +   universe.tracer_c->x_probability_cutoff
> +#define branch_ratio_cutoff \
> +   universe.tracer_c->x_branch_ratio_cutoff
> +#define bb_seen \
> +   universe.tracer_c->x_bb_seen

if we can avoid it.

I think we need more weigh in from other maintainers on this, rather than
iterating a 5th time today...


r~


Re: [libgfortran, build] Use -z ignore instead of --as-needed on Solaris

2013-05-23 Thread Tobias Burnus

Hi Rainer,

Rainer Orth wrote:
> how should we proceed with this patch now, given the questions above?
> Install as is, although it doesn't seem really beneficial, or drop it?


I would install it. Actually, did you get a libquadmath dependence on 
Solaris or not?


On Linux - or to be more precise: with binutils newer than 2013-03-18 - 
you also shouldn't get a dependence. See 
http://sourceware.org/bugzilla/show_bug.cgi?id=12549 for details.


Tobias


Re: Unordered container insertion hints

2013-05-23 Thread François Dumont

Any feedback regarding this patch?

Thanks


On 05/15/2013 09:49 PM, François Dumont wrote:

> Hi
>
> Here is a patch to take into account the hint that users can give to
> improve insertion performance. As you can see I only use it for
> unordered_multi* containers to potentially avoid a search within the
> bucket nodes.
>
> Note that I have used a call to _M_equals to avoid a hash code
> computation when we end up inserting after the hint. It is an
> optimization because I consider that _M_equals will always be faster
> than a hash code computation. I think that I will submit another
> patch later to generalize this when possible to limit the small
> performance loss we noticed when adopting the new data model (unless
> performance tests show me that it is worse).
>
> I tried to document it. If you accept this patch tell me if it is
> with or without the documentation because I know that my English is not
> good enough. I didn't find out how I can fix the doc URLs regarding
> usage of hints in the std::unordered_* Doxygen comments.
>
> 2013-05-20  François Dumont
>
> * include/bits/hashtable_policy.h (_Insert_base): Consider hint in
> insert methods.
> * include/bits/hashtable.h: Likewise.
> * testsuite/23_containers/unordered_multimap/insert/hint.cc: New.
> * doc/xml/manual/containers.xml: Document hinting in unordered
> containers.
>
> François
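
For context, this is what a hinted insertion looks like from the user's
side, as a minimal sketch using only the standard std::unordered_multimap
interface.  The function name build_with_hint is made up, and the sketch
says nothing about how the patch uses the hint internally.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Insert two elements with equivalent keys, passing the position of the
// first one as the hint for the second.
std::unordered_multimap<int, std::string>
build_with_hint ()
{
  std::unordered_multimap<int, std::string> m;

  // For multi containers insert returns an iterator to the new element.
  auto hint = m.insert (std::make_pair (1, std::string ("a")));

  // The hint is only a suggestion; the result is the same without it.
  m.insert (hint, std::make_pair (1, std::string ("b")));
  return m;
}
```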





Re: Remove global state from gcc/tracer.c

2013-05-23 Thread David Malcolm
On Thu, 2013-05-23 at 11:59 -0400, David Malcolm wrote:
> On Thu, 2013-05-23 at 06:56 -0400, David Malcolm wrote:
> > On Thu, 2013-05-23 at 07:14 +0200, Jakub Jelinek wrote:
> > > On Wed, May 22, 2013 at 08:45:45PM -0400, David Malcolm wrote:
> > > > I'm attempting to eliminate global state from the insides of gcc.
> > > > 
> > > > gcc/tracer.c has various global variables, which are only used during
> > > > the lifetime of the execute callback of that pass, and cleaned up at the
> > > > end of each invocation of the pass.
> > > > 
> > > > The attached patch introduces a class to hold the state of the pass
> > > > ("tracer_state"), eliminating these globals.  An instance of the state
> > > > is created on the stack, and all of the various "static" functions in
> > > > tracer.c that used the globals become member functions of the state.
> > > > Hence the state is passed around by the implicit "this" of the
> > > > tracer_state, avoiding the need to patch each individual use of a field
> > > > within the state, minimizing the diff.
> > > 
> > > > But do we want to handle the global state this way?  This adds overhead
> > > > to (almost?) every single function (now method) in the file (because it
> > > > gets an extra argument).  While that might be fine for rarely executed
> > > > functions, if it involves also hot functions called many times, where
> > > > especially on register starved hosts it could increase register
> > > > pressure, plus the overhead of passing the this argument everywhere,
> > > > this could start to be noticeable.  Sure, if you plan to do that just in
> > > > one pass (but, why then?), it might be tiny slowdown, but after you
> > > > convert the hundreds of passes in gcc that contain global state it might
> > > > become significant.
> > > 
> > > There are alternative approaches that should be considered.
> > 
> > I thought of a possible way of doing this, attached is a
> > proof-of-concept attempt.
> > 
> > The idea is to use (and then not use) C++'s "static" syntax for class
> > methods and fields.  By making that optional with a big configure-time
> > switch, it gives us a way of making state be either global vs on-stack,
> > with minimal syntax changes.  In one configuration (for building gcc as
> > a library) there would be implicit this-> throughout, but in the other
> > (for speedy binaries) it would all compile away to global state, as per
> > the status quo.
> > 
> > This assumes that doing:
> > 
> >tracer_state state;
> >changed = state.tail_duplicate ();
> > 
> > is legitimate; when using global state, "state" is empty, and the call
> > to
> >   state.tail_duplicate ()
> > becomes effectively:
> >   state::tail_duplicate ()
> > since it's static in that configuration.
> > 
> > > E.g. global state of a pass can be moved into a per-pass structure,
> > > and have some way how to aggregate those per pass structures together from
> > > all the passes in the whole compiler (that can be either manual process,
> > > say each pass providing its own *-passstate.h and one big header including
> > > all that together), or automatic ones (say gengstate or a new tool could
> > > create those for us from special markings in the source, say new option on
> > > GTY or something) and have some magic macro how to access the global state
> > > within the pass (thispass->fieldname ?).  Then e.g. depending on how the
> > > compiler would be configured and built, thispass could be just address of 
> > > a
> > > pass struct var (i.e. essentially keep the global state as is, for
> > > performance reasons), or when trying to build compiler as a library (with
> > > -fpic overhead we probably don't want for cc1/cc1plus - we can build all 
> > > the
> > > *.o files twice, like libtool does) thispass could expand to __thread
> > > pointer var dereference plus a field inside of the global compiler state
> > > structure it points to for the current pass.  Thus, the library version
> > > of the compiler would be somewhat slower (both -fpic overhead and TLS
> > > overhead), and would need either a few of the entrypoints tweaked to 
> > > adjust
> > > the TLS pointer to the global state, or we could require users to just 
> > > call
> > > a special function to make the global state current in the current thread
> > > before calling compiler internals.
> > 
> > Thanks.   Though I thought we were trying to move away from relying on
> > GTY parsing?   (Sorry not to be able to answer more fully yet, need to
> > get family ready for school...)
> 
> I've warmed to your idea of having tooling to support state, and
> creating a generic framework for this.  For example, consider the
> (long-term) use-case of embedding GCC's code as a library inside a
> multithreaded app, where each thread could be JIT-compiling say OpenGL
> shader code to machine code (perhaps with some extra non-standard
> passes).  To get there, I'd need to isolate *all* of GCC's state, and
> when I look at, say, the garbage-collector, I shudder.

Use unsigned(-1) for lshift

2013-05-23 Thread Marc Glisse

Hello,

this is a simple patch to reduce a bit the noise in PR57324 (undefined 
behavior flagged by clang). I only handled some of the most obvious ones.

Passes bootstrap+testsuite on x86_64-linux-gnu.
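
For context: left-shifting a negative signed value is undefined behavior in C99 and C++11, whereas the same shift in an unsigned type is well defined. A minimal sketch of the pattern follows; hwi, uhwi, high_mask and keep_low_bits are hypothetical names standing in for HOST_WIDE_INT and the masks built in the patch, not GCC code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for HOST_WIDE_INT and its unsigned variant.
typedef std::int64_t hwi;
typedef std::uint64_t uhwi;

// All bits at position BITS and above set, lower bits clear.
// Computed in the unsigned type, so the shift is well defined
// for any BITS below the type width.
inline uhwi high_mask (unsigned bits)
{
  assert (bits < 64);
  return ~uhwi (0) << bits;
}

// Keep only the low BITS bits of V, similar to the masking done
// in rshift_double and lshift_double.
inline hwi keep_low_bits (hwi v, unsigned bits)
{
  return (hwi) ((uhwi) v & ~high_mask (bits));
}
```

The cast to the unsigned type happens before the shift; casting the result of a signed shift afterwards would not help.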


2013-05-24  Marc Glisse  

PR other/57324
* expmed.c (expand_smod_pow2): Use an unsigned -1 for lshift.
* fold-const.c (fold_unary_loc): Likewise.
* double-int.c (rshift_double, lshift_double): Likewise.
* cse.c (cse_insn): Likewise.
* tree.c (integer_pow2p, tree_log2, tree_floor_log2): Likewise.
* tree-ssa-structalias.c (UNKNOWN_OFFSET): Shift 1 instead of -1.

--
Marc Glisse

Index: gcc/double-int.c
===
--- gcc/double-int.c(revision 199256)
+++ gcc/double-int.c(working copy)
@@ -264,21 +264,22 @@ rshift_double (unsigned HOST_WIDE_INT l1
 
   if (count >= prec)
 {
   *hv = signmask;
   *lv = signmask;
 }
   else if ((prec - count) >= HOST_BITS_PER_DOUBLE_INT)
 ;
   else if ((prec - count) >= HOST_BITS_PER_WIDE_INT)
 {
-  *hv &= ~((HOST_WIDE_INT) (-1) << (prec - count - HOST_BITS_PER_WIDE_INT));
+  *hv &= ~((unsigned HOST_WIDE_INT) (-1)
+  << (prec - count - HOST_BITS_PER_WIDE_INT));
   *hv |= signmask << (prec - count - HOST_BITS_PER_WIDE_INT);
 }
   else
 {
   *hv = signmask;
   *lv &= ~((unsigned HOST_WIDE_INT) (-1) << (prec - count));
   *lv |= signmask << (prec - count);
 }
 }
 
@@ -321,21 +322,21 @@ lshift_double (unsigned HOST_WIDE_INT l1
 
   signmask = -((prec > HOST_BITS_PER_WIDE_INT
? ((unsigned HOST_WIDE_INT) *hv
   >> (prec - HOST_BITS_PER_WIDE_INT - 1))
: (*lv >> (prec - 1))) & 1);
 
   if (prec >= HOST_BITS_PER_DOUBLE_INT)
 ;
   else if (prec >= HOST_BITS_PER_WIDE_INT)
 {
-  *hv &= ~((HOST_WIDE_INT) (-1) << (prec - HOST_BITS_PER_WIDE_INT));
+  *hv &= ~((unsigned HOST_WIDE_INT) -1 << (prec - HOST_BITS_PER_WIDE_INT));
   *hv |= signmask << (prec - HOST_BITS_PER_WIDE_INT);
 }
   else
 {
   *hv = signmask;
   *lv &= ~((unsigned HOST_WIDE_INT) (-1) << prec);
   *lv |= signmask << prec;
 }
 }
 
Index: gcc/tree-ssa-structalias.c
===
--- gcc/tree-ssa-structalias.c  (revision 199256)
+++ gcc/tree-ssa-structalias.c  (working copy)
@@ -475,21 +475,21 @@ struct constraint_expr
 
   /* Offset, in bits, of this constraint from the beginning of
  variables it ends up referring to.
 
  IOW, in a deref constraint, we would deref, get the result set,
  then add OFFSET to each member.   */
   HOST_WIDE_INT offset;
 };
 
 /* Use 0x8000... as special unknown offset.  */
-#define UNKNOWN_OFFSET ((HOST_WIDE_INT)-1 << (HOST_BITS_PER_WIDE_INT-1))
+#define UNKNOWN_OFFSET ((HOST_WIDE_INT) 1 << (HOST_BITS_PER_WIDE_INT-1))
 
 typedef struct constraint_expr ce_s;
 static void get_constraint_for_1 (tree, vec *, bool, bool);
 static void get_constraint_for (tree, vec *);
 static void get_constraint_for_rhs (tree, vec *);
 static void do_deref (vec *);
 
 /* Our set constraints are made up of two constraint expressions, one
LHS, and one RHS.
 
Index: gcc/cse.c
===
--- gcc/cse.c   (revision 199256)
+++ gcc/cse.c   (working copy)
@@ -5374,21 +5374,21 @@ cse_insn (rtx insn)
 may not equal what was stored, due to truncation.  */
 
   if (GET_CODE (SET_DEST (sets[i].rtl)) == ZERO_EXTRACT)
{
  rtx width = XEXP (SET_DEST (sets[i].rtl), 1);
 
  if (src_const != 0 && CONST_INT_P (src_const)
  && CONST_INT_P (width)
  && INTVAL (width) < HOST_BITS_PER_WIDE_INT
  && ! (INTVAL (src_const)
-   & ((HOST_WIDE_INT) (-1) << INTVAL (width))))
+   & ((unsigned HOST_WIDE_INT) (-1) << INTVAL (width))))
/* Exception: if the value is constant,
   and it won't be truncated, record it.  */
;
  else
{
  /* This is chosen so that the destination will be invalidated
 but no new value will be recorded.
 We must invalidate because sometimes constant
 values can be recorded for bitfields.  */
  sets[i].src_elt = 0;
Index: gcc/expmed.c
===
--- gcc/expmed.c(revision 199256)
+++ gcc/expmed.c(working copy)
@@ -3688,39 +3688,39 @@ expand_smod_pow2 (enum machine_mode mode
 }
 
   /* Mask contains the mode's signbit and the significant bits of the
  modulus.  By including the signbit in the operation, many targets
  can avoid an explicit compare operation in the following comparison
  against zero.  */
 
   masklow = ((HOST_WIDE_INT) 1 << logd) - 1;
   if (GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT)

Fix pr56742 - mingw64 seh unwinding error

2013-05-23 Thread Richard Henderson
Thanks to Kai for tracking down the root cause (detailed in the block comment),
and double-checking the testing.

Tested on x86_64-w64-mingw32 and a sanity build for werror on x86_64-linux.

Committed to mainline; I'll wait til 4.8.1 is out for application to that 
branch.
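
The fix-up described in the patch's block comment can be sketched outside GCC roughly as follows. This is a simplified model under stated assumptions: the Kind enum and fixup_eh_fallthru are hypothetical, and a flat vector stands in for the RTL insn chain of one exit predecessor block.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction kinds; GCC's real stream is RTL.
enum class Kind { Insn, ThrowingCall, DebugNote, EpilogueBegin, Nop };

// Insert a Nop between the last insn that can throw and the start of
// the epilogue, keeping debug notes attached to the call they follow.
// Returns true if a nop was inserted.
bool fixup_eh_fallthru (std::vector<Kind> &insns)
{
  // Find the beginning of the epilogue, scanning backwards.
  std::size_t epi = insns.size ();
  for (std::size_t i = insns.size (); i-- > 0;)
    if (insns[i] == Kind::EpilogueBegin)
      {
        epi = i;
        break;
      }
  if (epi == insns.size ())
    return false;

  // Find the previous active insn, skipping notes.
  std::size_t i = epi;
  bool found = false;
  while (i > 0)
    {
      --i;
      if (insns[i] != Kind::DebugNote && insns[i] != Kind::EpilogueBegin)
        {
          found = true;
          break;
        }
    }
  // We only care about a preceding insn that can throw.
  if (!found || insns[i] != Kind::ThrowingCall)
    return false;

  // Do not separate the call from the debug notes that follow it.
  std::size_t last = i;
  while (last + 1 < insns.size () && insns[last + 1] == Kind::DebugNote)
    ++last;

  insns.insert (insns.begin () + last + 1, Kind::Nop);
  return true;
}
```

Running the function a second time on the same sequence inserts nothing, because the nop is then the last active insn before the epilogue.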


r~
PR target/56742
* config/i386/i386.c (ix86_seh_fixup_eh_fallthru): New.
(ix86_reorg): Call it.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3470fef..20163b1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -35564,6 +35564,46 @@ ix86_pad_short_function (void)
 }
 }
 
+/* Fix up a Windows system unwinder issue.  If an EH region falls thru into
+   the epilogue, the Windows system unwinder will apply epilogue logic and
+   produce incorrect offsets.  This can be avoided by adding a nop between
+   the last insn that can throw and the first insn of the epilogue.  */
+
+static void
+ix86_seh_fixup_eh_fallthru (void)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
+{
+  rtx insn, next;
+
+  /* Find the beginning of the epilogue.  */
+  for (insn = BB_END (e->src); insn != NULL; insn = PREV_INSN (insn))
+   if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_EPILOGUE_BEG)
+ break;
+  if (insn == NULL)
+   continue;
+
+  /* We only care about preceding insns that can throw.  */
+  insn = prev_active_insn (insn);
+  if (insn == NULL || !can_throw_internal (insn))
+   continue;
+
+  /* Do not separate calls from their debug information.  */
+  for (next = NEXT_INSN (insn); next != NULL; next = NEXT_INSN (next))
+   if (NOTE_P (next)
+&& (NOTE_KIND (next) == NOTE_INSN_VAR_LOCATION
+|| NOTE_KIND (next) == NOTE_INSN_CALL_ARG_LOCATION))
+ insn = next;
+   else
+ break;
+
+  emit_insn_after (gen_nops (const1_rtx), insn);
+}
+}
+
 /* Implement machine specific optimizations.  We implement padding of returns
for K8 CPUs and pass to avoid 4 jumps in the single 16 byte window.  */
 static void
@@ -35573,6 +35613,9 @@ ix86_reorg (void)
  with old MDEP_REORGS that are not CFG based.  Recompute it now.  */
   compute_bb_for_insn ();
 
+  if (TARGET_SEH && current_function_has_exception_handlers ())
+ix86_seh_fixup_eh_fallthru ();
+
   if (optimize && optimize_function_for_speed_p (cfun))
 {
   if (TARGET_PAD_SHORT_FUNCTION)


Re: [PING]RE: [patch] cilkplus: Array notation for C patch

2013-05-23 Thread Richard Henderson
On 05/23/2013 11:27 AM, Iyer, Balaji V wrote:
> Hello Richard et al.,
>   Attached, please find a fixed patch. I have done the following changes:
> 
> 1. Used the c_finish_loop (...) function instead of building the loop myself
> 2. Got rid of ARRAY_NOTATION_TYPE and just used TREE_TYPE ().
> 
> It is passing all the regression tests and not breaking/passing any other
> tests that were not already breaking/passing in the trunk.

Good to know A_N_T wasn't needed.  You failed to remove it completely though:

> +/* Array Notation expression.
> +   Operand 0 is the array.
> +   Operand 1 is the starting array index.
> +   Operand 2 contains the number of elements you need to access.
> +   Operand 3 is the stride.
> +   Operand 4 is the array notation's type.  */
> +DEFTREECODE (ARRAY_NOTATION_REF, "array_notation_ref", tcc_reference, 5) 

> +@item ARRAY_NOTATION_REF
> +These nodes represent array notation expressions that are part of the
> +Cilk Plus language extensions (enabled by the @option{-fcilkplus}
> +flag).  The first operand is the array.  The second, third, and fourth
> +operands are the start-index, number of elements accessed (also called
> +length) and the stride, respectively.  The fifth operand holds the
> +array type.  Near the end of the parsing stage, these array notations
> +are broken up into array references (@code{ARRAY_REF}) enclosed inside
> +a loop iterating from 0 to the number of elements accessed.
> +

This really shouldn't go in generic.texi, because it's not a generic tree code.
 AFAIK, we have no texi documentation for the front-end specific stuff.

Otherwise I see nothing wrong with the patch vs compiler proper.


r~


Re: Remove global state from gcc/tracer.c

2013-05-23 Thread Richard Henderson
On 05/23/2013 12:20 PM, Jakub Jelinek wrote:
> On Thu, May 23, 2013 at 12:06:01PM -0700, Richard Henderson wrote:
>>> struct foo_c_state
>>> {
>>>   some_type bar;
>>> };
>>> # define bar   ctxt->x_foo_c_bar;
>>> /* there's a "context *ctxt;" variable somewhere, possibly
>>>    using TLS */
>>
>> I've an idea that this will perform very badly.  With ctxt being a global (if
>> tls) variable, it's call clobbered.  Thus we'll wind up reloading it all the
>> time, which for pic involves another call to the __tls_get_addr runtime
>> function.
> 
> If all of gcc just has one __thread pointer in it, then we can just use
> tls_model ("initial-exec") for it and lower that overhead down a lot.

Not available on emutls systems.  But even with initial-exec the overhead is a
lot more than having the value stored within the local stack frame.

TLS is great for passing around rarely used state, which might be needed 10
call frames down but not in between.  E.g. any state for the presumed
per-thread memory allocator.

But it's not nearly so good for pass-specific data that potentially every
function within the pass might need.
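
To make the two access styles concrete, here is a minimal sketch. The names pass_state, ctxt, bump_via_this and bump_via_tls are hypothetical and chosen only for illustration; nothing here is taken from the patch.

```cpp
#include <cassert>

// Hypothetical pass state; not a GCC type.
struct pass_state
{
  int counter;

  pass_state () : counter (0) { }

  // State reached through the implicit "this": the pointer arrives as
  // an argument, so it is invariant for the duration of the function.
  void bump_via_this () { ++counter; }
};

// Alternative: one thread-local context pointer.  Because it is a
// global, the compiler generally has to assume that any call may have
// changed it and reload it afterwards.
thread_local pass_state *ctxt = nullptr;

void bump_via_tls ()
{
  assert (ctxt != nullptr);
  ++ctxt->counter;
}
```

Both functions do the same work; the difference under discussion is only in how the address of the state is obtained on each access.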


r~


[Patch wwwdocs] gcc-4.9 changes: address sanitizer on ARM

2013-05-23 Thread Christophe Lyon
Hello,

This patch mentions Address Sanitizer on ARM in the gcc-4.9/changes.html pages.
(and re-enables the "General Optimizer Improvements" section)

Is it OK to commit?

Thanks,

Christophe.

Index: gcc-4.9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.14
diff -r1.14 changes.html
35d34
< 
38a37,40
>   
> AddressSanitizer, a fast memory error detector, is now available on 
> ARM.
> 
>   


Re: [PING]RE: [patch] cilkplus: Array notation for C patch

2013-05-23 Thread Aldy Hernandez

On 05/23/13 14:03, Jakub Jelinek wrote:
> On Thu, May 23, 2013 at 06:27:04PM +, Iyer, Balaji V wrote:
>> gcc/testsuite/ChangeLog
>> 2013-05-23  Balaji V. Iyer  
>>
>>  * gcc.dg/cilk-plus/array_notation/compile/array_test2.c: New test.
>
> I have concerns about the test locations, to me this looks way too deep
> tree, whether something is a compile test, or compile test expecting errors
> or runtime test is easily determined by { dg-do compile } vs. { dg-do run }
> and presence or lack of { dg-error ... } comments.  So IMHO that level
> should be left out, plus I'd say the array_notation/ level is unnecessary as
> well, just put everything into c-c++-common/cilk-plus/an-*.c
> (except for tests that aren't going to be usable for C++, those can stay in
> gcc.dg/cilk-plus/an-*.c).  Then gcc.dg/cilk-plus/*.exp would just ensure
> that tests from that directory are run and also from c-c++-common/ and later
> on the same would happen in g++.dg/cilk-plus/.  In the future when you will
> need to link against runtime library cilk-plus.exp would just arrange for
> that to be added to LD_LIBRARY_PATH, -L.../ etc.


For the record, I agree.


Re: Remove global state from gcc/tracer.c

2013-05-23 Thread Jakub Jelinek
On Thu, May 23, 2013 at 12:06:01PM -0700, Richard Henderson wrote:
> > struct foo_c_state
> > {
> >   some_type bar;
> > };
> > # define bar   ctxt->x_foo_c_bar;
> > /* there's a "context *ctxt;" variable somewhere, possibly
> >    using TLS */
> 
> I've an idea that this will perform very badly.  With ctxt being a global (if
> tls) variable, it's call clobbered.  Thus we'll wind up reloading it all the
> time, which for pic involves another call to the __tls_get_addr runtime
> function.

If all of gcc just has one __thread pointer in it, then we can just use
tls_model ("initial-exec") for it and lower that overhead down a lot.

Jakub


Re: Remove global state from gcc/tracer.c

2013-05-23 Thread Richard Henderson
On 05/23/2013 03:56 AM, David Malcolm wrote:
> The idea is to use (and then not use) C++'s "static" syntax for class
> methods and fields.  By making that optional with a big configure-time
> switch, it gives us a way of making state be either global vs on-stack,
> with minimal syntax changes.

Plausible.

Another thing I should mention while you're doing all of these static function
to class member conversions is that as written we're losing target-specific
optimizations that can be done on purely local functions.  This is trivially
fixed by placing these new classes in an anonymous namespace.
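
As a rough sketch of that suggestion (tracer_state_sketch, above_cutoff_p and run_tracer_sketch are hypothetical names, not the ones in the patch), the class simply moves inside an unnamed namespace, which gives its member functions internal linkage just as "static" did for the old free functions:

```cpp
namespace {

// Everything in here is local to this translation unit.
class tracer_state_sketch
{
public:
  explicit tracer_state_sketch (int cutoff) : probability_cutoff (cutoff) { }

  // Hypothetical helper standing in for one of the former static functions.
  bool above_cutoff_p (int probability) const
  {
    return probability >= probability_cutoff;
  }

private:
  int probability_cutoff;
};

} // anonymous namespace

// The only externally visible entry point; the state lives on the stack.
bool run_tracer_sketch (int probability)
{
  tracer_state_sketch state (50);
  return state.above_cutoff_p (probability);
}
```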

> +private:
> +
> +  /* Minimal outgoing edge probability considered for superblock
> + formation.  */
> +  STATEFUL int probability_cutoff;
> +  STATEFUL int branch_ratio_cutoff;
> +
> +  /* A bit BB->index is set if BB has already been seen, i.e. it is
> + connected to some trace already.  */
> +  STATEFUL sbitmap bb_seen;
...
> +#if GLOBAL_STATE
> +/* Global definitions of static data.  */
> +int tracer_state::probability_cutoff;
> +int tracer_state::branch_ratio_cutoff;
> +sbitmap tracer_state::bb_seen;
> +#endif
...
> +tracer_state::tracer_state()
> +#if !GLOBAL_STATE
> +  : probability_cutoff(0),
> +branch_ratio_cutoff(0),
> +bb_seen()
> +#endif
> +{
> +}
> +

What I don't like about this arrangement is that it requires three repetitions
of the state variables and their initializations.  I wonder if we can do better
with

class tracer_state
{
  private:
    struct data
    {
      int probability_cutoff;
      int branch_ratio_cutoff;
      sbitmap bb_seen;

      data()
        : probability_cutoff(0),
          branch_ratio_cutoff(0),
          bb_seen()
      { }
    };

    STATEFUL data d;

  public:
    tracer_state() { }
  ...
};

#if GLOBAL_STATE
tracer_state::data tracer_state::d;
#endif

which does require accesses as "d.foo" instead of just foo, but at least we've
cut down on the boilerplate.
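
A self-contained version of this arrangement might look like the sketch below. It is only an illustration: the definition of STATEFUL, the default for GLOBAL_STATE, the class name tracer_state_demo and the two accessors are assumptions, and the sbitmap field is left out so the example compiles outside GCC.

```cpp
// Assumed configuration switch; the patch would set this at configure time.
#ifndef GLOBAL_STATE
# define GLOBAL_STATE 1
#endif

// Assumed definition: "static" when state is global, nothing otherwise.
#if GLOBAL_STATE
# define STATEFUL static
#else
# define STATEFUL
#endif

class tracer_state_demo
{
  private:
    struct data
    {
      int probability_cutoff;
      int branch_ratio_cutoff;

      data ()
        : probability_cutoff (0),
          branch_ratio_cutoff (0)
      { }
    };

    // One declaration covers every state variable.
    STATEFUL data d;

  public:
    tracer_state_demo () { }

    // Hypothetical accessors; static in the global configuration,
    // ordinary member functions otherwise.
    STATEFUL void set_cutoff (int v) { d.probability_cutoff = v; }
    STATEFUL int cutoff () { return d.probability_cutoff; }
};

#if GLOBAL_STATE
// Single out-of-class definition for the static data.
tracer_state_demo::data tracer_state_demo::d;
#endif
```

Callers write state.set_cutoff (...) in both configurations; only the storage of d changes.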

Though with this I wonder if we need a CONSTEXPR define to markup
tracer_state::data::data so that the global variable doesn't wind up running a
constructor at runtime?  (I.e. performs correctly if inefficiently for stage1,
but stage[23] use g++ and thus can have the c++11 extension?)
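
One way such a define could look is sketched here. The macro body, the __cplusplus test and the names state_data and global_state_data are assumptions made for illustration; only the name CONSTEXPR comes from the paragraph above.

```cpp
// Assumed definition: expands to constexpr only when the compiler is in
// C++11 mode or later, and to nothing for an older stage1 compiler.
#if __cplusplus >= 201103L
# define CONSTEXPR constexpr
#else
# define CONSTEXPR
#endif

struct state_data
{
  int probability_cutoff;
  int branch_ratio_cutoff;

  CONSTEXPR state_data ()
    : probability_cutoff (0),
      branch_ratio_cutoff (0)
  { }
};

// With a constexpr constructor this object can be initialized statically;
// without it the constructor may run during dynamic initialization.
state_data global_state_data;
```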

> #if SUPPORT_MULTIPLE_STATES
> struct foo_c_state
> {
>   some_type bar;
> };
> # define bar   ctxt->x_foo_c_bar;
> /* there's a "context *ctxt;" variable somewhere, possibly
>    using TLS */

I've an idea that this will perform very badly.  With ctxt being a global (if
tls) variable, it's call clobbered.  Thus we'll wind up reloading it all the
time, which for pic involves another call to the __tls_get_addr runtime function.

For very heavily used pointers, we're almost certainly better off having the
data be referenced via "this", where at least the starting point is known to be
function invariant.


r~


Re: [PING]RE: [patch] cilkplus: Array notation for C patch

2013-05-23 Thread Jakub Jelinek
On Thu, May 23, 2013 at 06:27:04PM +, Iyer, Balaji V wrote:
> gcc/testsuite/ChangeLog
> 2013-05-23  Balaji V. Iyer  
> 
> * gcc.dg/cilk-plus/array_notation/compile/array_test2.c: New test.

I have concerns about the test locations, to me this looks way too deep
tree, whether something is a compile test, or compile test expecting errors
or runtime test is easily determined by { dg-do compile } vs. { dg-do run }
and presence or lack of { dg-error ... } comments.  So IMHO that level
should be left out, plus I'd say the array_notation/ level is unnecessary as
well, just put everything into c-c++-common/cilk-plus/an-*.c
(except for tests that aren't going to be usable for C++, those can stay in
gcc.dg/cilk-plus/an-*.c).  Then gcc.dg/cilk-plus/*.exp would just ensure
that tests from that directory are run and also from c-c++-common/ and later
on the same would happen in g++.dg/cilk-plus/.  In the future when you will
need to link against runtime library cilk-plus.exp would just arrange for
that to be added to LD_LIBRARY_PATH, -L.../ etc.

> * gcc.dg/cilk-plus/array_notation/compile/array_test1.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/compile/array_test_ND.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/compile/builtin_func_double.c: 
> Ditto.
> * gcc.dg/cilk-plus/array_notation/compile/builtin_func_double2.c: 
> Ditto.
> * gcc.dg/cilk-plus/array_notation/compile/gather_scatter.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/compile/if_test.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/compile/sec_implicit_ex.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/decl-ptr-colon.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/dimensionless-arrays.c: 
> Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/fn_ptr.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/fp_triplet_values.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/gather-scatter.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/misc.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/parser_errors.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/parser_errors2.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/parser_errors3.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/parser_errors4.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/rank_mismatch.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/rank_mismatch2.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/rank_mismatch3.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/sec_implicit.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/sec_implicit2.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/sec_reduce_max_min_ind.c:
> Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/tst_lngth.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/vla.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/an-if.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/array_test1.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/array_test2.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/array_test_ND.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/builtin_fn_custom.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/builtin_fn_mutating.c: 
> Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/builtin_func_double.c: 
> Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/builtin_func_double2.c: 
> Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/comma_exp.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/conditional.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/exec-once.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/exec-once2.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/gather_scatter.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/if_test.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/n-ptr-test.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/sec_implicit_ex.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/side-effects-1.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/test_builtin_return.c: 
> Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/test_sec_limits.c: Ditto.
> * gcc.dg/cilk-plus/array_notation/execute/cilkplus_AN_c_execute.exp:
> New script.
> * gcc.dg/cilk-plus/array_notation/compile/cilkplus_AN_c_compile.exp:
> Ditto.
> * gcc.dg/cilk-plus/array_notation/errors/cilkplus_AN_c_errors.exp:
> Ditto.

Jakub


RE: [PING]RE: [patch] cilkplus: Array notation for C patch

2013-05-23 Thread Iyer, Balaji V
Hello Richard et al.,
Attached, please find a fixed patch. I have done the following changes:

1. Used the c_finish_loop (...) function instead of building the loop myself
2. Got rid of ARRAY_NOTATION_TYPE and just used TREE_TYPE ().

It is passing all the regression tests and not breaking/passing any other tests 
that were not already breaking/passing in the trunk.

Here are the ChangeLog Entries:

gcc/ChangeLog
2013-05-23  Balaji V. Iyer  

* doc/extend.texi (C Extensions): Added documentation about Cilk Plus
array notation built-in reduction functions.
* doc/passes.texi (Passes): Added documentation about changes done
for Cilk Plus.
* doc/invoke.texi (C Dialect Options): Added documentation about
the -fcilkplus flag.
* doc/generic.texi (Storage References): Added documentation for
ARRAY_NOTATION_REF storage.
* Makefile.in (C_COMMON_OBJS): Added c-family/array-notation-common.o.
(BUILTINS_DEF): Depend on cilkplus.def.
* builtins.def: Include cilkplus.def.  Define DEF_CILKPLUS_BUILTIN.
* builtin-types.def: Define BT_FN_INT_PTR_PTR_PTR.
* cilkplus.def: New file.

gcc/c-family/ChangeLog
2013-05-23  Balaji V. Iyer  

* c-common.c (c_define_builtins): When cilkplus is enabled, the
function array_notation_init_builtins is called.
(c_common_init_ts): Added ARRAY_NOTATION_REF as typed.
* c-common.def (ARRAY_NOTATION_REF): New tree.
* c-common.h (build_array_notation_expr): New function declaration.
(build_array_notation_ref): Likewise.
(extract_sec_implicit_index_arg): New extern declaration.
(is_sec_implicit_index_fn): Likewise.
(ARRAY_NOTATION_CHECK): New define.
(ARRAY_NOTATION_ARRAY): Likewise.
(ARRAY_NOTATION_START): Likewise.
(ARRAY_NOTATION_LENGTH): Likewise.
(ARRAY_NOTATION_STRIDE): Likewise.
* c-pretty-print.c (pp_c_postfix_expression): Added a new case for
ARRAY_NOTATION_REF.
(pp_c_expression): Likewise.
* c.opt (flag_enable_cilkplus): New flag.
* array-notation-common.c: New file.

gcc/c/ChangeLog
2013-05-23  Balaji V. Iyer  

* c-typeck.c (build_array_ref): Added a check to see if array's
index is greater than one.  If true, then emit an error.
(build_function_call_vec): Exclude error reporting and checking
for builtin array-notation functions.
(convert_arguments): Likewise.
(c_finish_return): Added a check for array notations as a return
expression.  If true, then emit an error.
(c_finish_loop): Added a check for array notations in a loop
condition.  If true then emit an error.
(lvalue_p): Added a ARRAY_NOTATION_REF case.
(build_binary_op): Added a check for array notation expr inside
op1 and op0.  If present, we call another function to find correct
type.
* Make-lang.in (C_AND_OBJC_OBJS): Added c-array-notation.o.
* c-parser.c (c_parser_compound_statement): Check if array
notation code is used in tree, if so, then transform them into
appropriate C code.
(c_parser_expr_no_commas): Check if array notation is used in LHS
or RHS, if so, then build array notation expression instead of
regular modify.
(c_parser_postfix_expression_after_primary): Added a check for
colon(s) after square braces, if so then handle it like an array
notation.  Also, break up array notations in unary op if found.
(c_parser_direct_declarator_inner): Added a check for array
notation.
(c_parser_compound_statement): Added a check for array notation in
a stmt.  If one is present, then expand array notation expr.
(c_parser_if_statement): Likewise.
(c_parser_switch_statement): Added a check for array notations in
a switch statement's condition.  If true, then output an error.
(c_parser_while_statement): Similarly, but for a while.
(c_parser_do_statement): Similarly, but for a do-while.
(c_parser_for_statement): Similarly, but for a for-loop.
(c_parser_unary_expression): Check if array notation is used in a
pre-increment or pre-decrement expression.  If true, then expand
them.
(c_parser_array_notation): New function.
* c-array-notation.c: New file.
* c-tree.h (is_cilkplus_reduce_builtin): Protoize.

gcc/testsuite/ChangeLog
2013-05-23  Balaji V. Iyer  

* gcc.dg/cilk-plus/array_notation/compile/array_test2.c: New test.
* gcc.dg/cilk-plus/array_notation/compile/array_test1.c: Ditto.
* gcc.dg/cilk-plus/array_notation/compile/array_test_ND.c: Ditto.
* gcc.dg/cilk-plus/array_notation/compile/builtin_func_double.c: Ditto.
* gcc.dg/cilk-plus/array_notation/compile/builtin_func_double2.c: Ditto.
* gcc.dg/cilk-plus/array_notation/compile/ga

Re: [patch, powerpc] increase array alignment for Altivec

2013-05-23 Thread Mike Stump
On May 23, 2013, at 9:17 AM, David Edelsohn  wrote:
> I want to have a version of the patch committed. The only question now
> is how much of the patch can be committed without exposing potential
> incompatibilities between different object files.

The hard part is knowing for certain when the data will be used.  As long as
we know for certain that the data will be used, we can raise the alignment of
the data (not the type).  No link-once, no comdat, no weak would be a start.
The question in my mind is whether that is sufficient.  Some linkers will
maximize the alignments of all instances of data, some won't.  I was thinking
darwin would, but GNU ld would not.

  ! TREE_PUBLIC (decl) || (! DECL_WEAK (decl) && ! DECL_COMDAT (decl) && ! DECL_COMDAT_GROUP (decl) && DECL_COMMON (decl) …)

Now, I say that, but I am aware of the predicate routines in varasm.c:
decl_replaceable_p, decl_binds_to_current_def_p, binds_local_p,
resolution_to_local_definition_p…  Certainly one of them should fit nicely,
maybe.

Re: [PATCH][gensupport] Add optional attributes field to define_cond_exec

2013-05-23 Thread Michael Zolotukhin
> - What about define_insn_and_split? Currently, we can define "predicable"
> for a define_insn_and_split,
Yes, you're right. Currently define_subst cannot be applied to
define_insn_and_split. That's not implemented yet because I haven't seen
any real uses of define_subst with these (though I'm not saying
nobody uses it); in the absence of use cases I wasn't able to design a
proper syntax for it. If you have any ideas of how that could be done
in a pretty way, please let me know :)

As for your second concern:
> - What about predication on a per-alternative basis (set_attr "predicable"
> "yes,no,yes")?
Currently, cond_exec actually could be a more compact way of doing
this. But in general, define_subst is able to substitute it (as
Richard said, it's a superset of define_cond_exec). Here is an example
of how that could be achieved (maybe, it's not optimal in terms of
lines of code):
Suppose we have:
(define_insn "arm_load_exclusive"
  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
(zero_extend:SI
  (unspec_volatile:NARROW
[(match_operand:NARROW 1 "mem_noofs_operand" "Ua,m")]
VUNSPEC_LL)))]
  "TARGET_HAVE_LDREXBH"
  "ldrex%?\t%0, %C1"
  [(set_attr "predicable" "yes,no")])
(I added a second alternative to the define_insn from your example)

We could add several subst-attributes, as follows:
(define_subst_attr "constraint_operand1" "ds_predicable" "=r,r" "=r")
(define_subst_attr "constraint_operand2" "ds_predicable" "=Ua,m" "Ua")
And then rewrite the original pattern:
(define_insn "arm_load_exclusive"
  [(set (match_operand:SI 0 "s_register_operand" "<constraint_operand1>")
(zero_extend:SI
  (unspec_volatile:NARROW
[(match_operand:NARROW 1 "mem_noofs_operand"
"<constraint_operand2>")]
VUNSPEC_LL)))]
  "TARGET_HAVE_LDREXBH"
  "ldrex%?\t%0, %C1"
  []) ;; We don't need this attr for define_subst

Define_subst (I didn't copy it here) will expand this to the next two patterns:
;; First pattern (exactly like original), define_subst not-applied
(define_insn "arm_load_exclusive"
  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
(zero_extend:SI
  (unspec_volatile:NARROW
[(match_operand:NARROW 1 "mem_noofs_operand" "Ua,m")]
VUNSPEC_LL)))]
  "TARGET_HAVE_LDREXBH"
  "ldrex%?\t%0, %C1"
  [])
;; Second pattern, with applied define_subst
(define_insn "arm_load_exclusive"
  [(cond_exec
  (match_operator 1 "arm_comparison_operator"
 [(match_operand 2 "cc_register" "")
  (const_int 0)])
  (set
(match_operand:SI 0 "s_register_operand" "=r")
(zero_extend:SI
  (unspec_volatile:NARROW
[(match_operand:NARROW 1 "mem_noofs_operand" "m")]
VUNSPEC_LL))))]
  "TARGET_HAVE_LDREXBH"
  "ldrex%?\t%0, %C1"
  [])

So, the main idea here is to control how many and which alternatives
each pattern has, and define_subst allows us to do that.

The only problem here is that we might need many subst-attributes.
But I think that problem could also be solved if, as Richard
suggested, define_cond_exec would be expanded in gensupport - we might
generate the needed attributes there. Thus, define_cond_exec would be a
kind of 'syntactic sugar' for such cases.

Thanks, Michael

On 23 May 2013 20:53, Kyrylo Tkachov  wrote:
> Hi Michael,
>
>> Hi Kyrylo, Richard,
>>
>> > What would be the function of (set_attr "ds_predicable" "yes") ?
>> > Doesn't the use of  already trigger the
>> substitution?
>> To use define subst one doesn't need to write (set_attr
>> "ds_predicable" "yes") - it's triggered by mentioning any of connected
>> subst-attributes in the pattern.
>>
>> > ... But I'd like to keep using the
>> > "predicable" attribute
>> > the way it's used now to mark patterns for cond_exec'ednes.
>> If you decide to move to define_subst from cond_exec, then I'd suggest
>> to avoid using 'predicable' attribute - it could involve cond_exec
>> after or before define_subst and that's definitely not what you might
>> want to get.
>
> I'm reluctant to replace define_cond_exec with define_subst for a couple of
> reasons:
>
> - What about define_insn_and_split? Currently, we can define "predicable"
> for a define_insn_and_split,
> but the define_subst documentation says it can only be applied to
> define_insn and define_expand.
>
> - What about predication on a per-alternative basis (set_attr "predicable"
> "yes,no,yes")?
> If the presence of a subst_attr in a pattern triggers substitution (and
> hence predication),
> how do we specify that a particular alternative cannot be predicable? the
> define_cond_exec
> machinery does some implicit tricks with ce_enabled and nonce_enabled that
> allow this
> (http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00094.html).
> define_subst doesn't seem like a good substitute (no pun intended ;) ) at
> this point.
>
>>
>> If you describe in more details, which patterns you're trying to get
>> from which, I'll try to help with define_subst.
>
> An example of what I'm trying to ac

[Solaris] Catch FP exceptions

2013-05-23 Thread Eric Botcazou
Hi,

this isn't really valid Ada semantics, but some people enable
traps-on-fp-exceptions in the FPU on Solaris and expect the Ada exception
to be caught.  There is a glitch with the x87 and the SPARC FPUs: the SIGFPE
is delivered after the faulting instruction by Solaris, so the unwinder is
fooled and miscomputes the faulting address.

Fixed thusly, tested on x86/Solaris and SPARC/Solaris, OK for mainline?


2013-05-23  Eric Botcazou  

libgcc/
* config/sparc/sol2-unwind.h (MD_FALLBACK_FRAME_STATE_FOR): Do not set
fs->signal_frame for SIGFPE raised for IEEE-754 exceptions.
* config/i386/sol2-unwind.h (x86_fallback_frame_state): Likewise.


2013-05-23  Eric Botcazou  

* gnat.dg/fp_exception.adb: New test.


-- 
Eric Botcazou
Index: config/sparc/sol2-unwind.h
===
--- config/sparc/sol2-unwind.h	(revision 199191)
+++ config/sparc/sol2-unwind.h	(working copy)
@@ -403,7 +403,12 @@ MD_FALLBACK_FRAME_STATE_FOR (struct _Unw
   fs->retaddr_column = 0;
   fs->regs.reg[0].how = REG_SAVED_OFFSET;
   fs->regs.reg[0].loc.offset = (long)shifted_ra_location - new_cfa;
-  fs->signal_frame = 1;
+
+  /* SIGFPE for IEEE-754 exceptions is delivered after the faulting insn
+ rather than before it, so don't set fs->signal_frame in that case.
+ We test whether the cexc field of the FSR is zero.  */
+  if ((mctx->fpregs.fpu_fsr & 0x1f) == 0)
+fs->signal_frame = 1;
 
   return _URC_NO_REASON;
 }
Index: config/i386/sol2-unwind.h
===
--- config/i386/sol2-unwind.h	(revision 199190)
+++ config/i386/sol2-unwind.h	(working copy)
@@ -249,7 +249,12 @@ x86_fallback_frame_state (struct _Unwind
   fs->regs.reg[8].how = REG_SAVED_OFFSET;
   fs->regs.reg[8].loc.offset = (long)&mctx->gregs[EIP] - new_cfa;
   fs->retaddr_column = 8;
-  fs->signal_frame = 1;
+
+  /* SIGFPE for IEEE-754 exceptions is delivered after the faulting insn
+ rather than before it, so don't set fs->signal_frame in that case.
+ We test whether the ES field of the Status Register is zero.  */
+  if ((mctx->fpregs.fp_reg_set.fpchip_state.status & 0x80) == 0)
+fs->signal_frame = 1;
 
   return _URC_NO_REASON;
 }
-- { dg-do run { target *-*-solaris2.* } }

procedure FP_Exception is

  type my_fixed is digits 15;
  for my_fixed'size use 64;
  fixed1 : my_fixed := 1.0;  
  fixed2 : my_fixed := -0.0;
  mask_all : constant integer := 16#1F#;

  procedure fpsetmask(mask : in integer);
  pragma IMPORT (C, fpsetmask, "fpsetmask");

begin 

  -- Mask all floating point exceptions so they can be trapped
  fpsetmask (mask_all);

  fixed1 := fixed1 / fixed2;

exception
  when others => null;
end;
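
Both hunks of the patch come down to a single bit test on a saved FPU status word. A minimal sketch of those tests, assuming the raw register values have already been read out of the signal context; the helper names are hypothetical and are not part of libgcc:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of libgcc.

// SPARC: the low five bits of the FSR are the cexc (current exception)
// field.  A non-zero value means an IEEE-754 exception is flagged.
constexpr bool sparc_ieee_exception_pending(std::uint64_t fsr)
{
  return (fsr & 0x1f) != 0;
}

// x87: bit 7 of the status word is ES (exception summary).
constexpr bool x87_exception_summary_set(std::uint32_t status)
{
  return (status & 0x80) != 0;
}

// Mirrors the patch: the frame is marked as a signal frame only when
// no IEEE-754 exception is flagged.
constexpr bool mark_as_signal_frame(bool ieee_exception_flagged)
{
  return !ieee_exception_flagged;
}
```

In the patch itself the same masks are applied directly to mctx->fpregs.fpu_fsr (SPARC) and mctx->fpregs.fp_reg_set.fpchip_state.status (x86).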

PR tree-optimization/57337

2013-05-23 Thread Easwaran Raman
This addresses the case where the UID alone is not sufficient to figure
out which statement appears earlier in a BB.  Bootstrapped with no test
regressions on x86_64 Linux.  Ok for trunk?

Thanks,
Easwaran


2013-05-23  Easwaran Raman  

PR tree-optimization/57337
* tree-ssa-reassoc.c (appears_later_in_bb): New function.
(find_insert_point): Correctly identify the insertion point
when two statements with the same UID are compared.

Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c  (revision 199211)
+++ gcc/tree-ssa-reassoc.c  (working copy)
@@ -2866,6 +2866,31 @@ not_dominated_by (gimple a, gimple b)

 }

+/* Among STMT1 and STMT2, return the statement that appears later. Both
+   statements are in same BB and have the same UID.  */
+
+static gimple
+appears_later_in_bb (gimple stmt1, gimple stmt2)
+{
+  unsigned uid = gimple_uid (stmt1);
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt1);
+  gsi_next (&gsi);
+  if (gsi_end_p (gsi))
+return stmt1;
+  for (; !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple stmt = gsi_stmt (gsi);
+
+  /* If STMT has a different UID than STMT1 and we haven't seen
+ STMT2 during traversal, we know STMT1 appears later.  */
+  if (gimple_uid (stmt) != uid)
+return stmt1;
+  else if (stmt == stmt2)
+return stmt2;
+}
+  gcc_unreachable ();
+}
+
 /* Find the statement after which STMT must be moved so that the
dependency from DEP_STMT to STMT is maintained.  */

@@ -2875,7 +2900,11 @@ find_insert_point (gimple stmt, gimple dep_stmt)
   gimple insert_stmt = stmt;
   if (dep_stmt == NULL)
 return stmt;
-  if (not_dominated_by (insert_stmt, dep_stmt))
+  if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt)
+  && gimple_bb (insert_stmt) == gimple_bb (dep_stmt)
+  && insert_stmt != dep_stmt)
+insert_stmt = appears_later_in_bb (insert_stmt, dep_stmt);
+  else if (not_dominated_by (insert_stmt, dep_stmt))
 insert_stmt = dep_stmt;
   return insert_stmt;
 }
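
The forward scan in appears_later_in_bb can be modelled outside GCC. A minimal sketch, assuming a basic block is just a vector of statements and that statements sharing a UID are contiguous, as the patch assumes. Stmt and appears_later are hypothetical stand-ins, not GCC types or functions. Unlike the patch, which ends in gcc_unreachable (), the sketch returns the first statement if the block ends before the UID changes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a GIMPLE statement: only the UID matters here.
struct Stmt { unsigned uid; };

// Return the index of whichever of FIRST and SECOND sits later in BB.
// Both indices must name distinct statements that share a UID.
// Walk forward from FIRST: if SECOND is reached before the UID changes,
// SECOND is the later one; otherwise FIRST is.
std::size_t appears_later(const std::vector<Stmt>& bb,
                          std::size_t first, std::size_t second)
{
  const unsigned uid = bb[first].uid;
  for (std::size_t i = first + 1; i < bb.size(); ++i)
    {
      if (bb[i].uid != uid)
        return first;
      if (i == second)
        return second;
    }
  // End of block reached without seeing SECOND: FIRST is later.
  return first;
}
```

The scan only ever visits the run of statements sharing one UID, so its cost is bounded by the length of that run rather than by the size of the block.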


fix memory spaces and references for C

2013-05-23 Thread Mike Stump
So, memory spaces and references are interacting badly in C.  The standard 
allows conversions during assignment that can change qualifiers.  The good 
news, all that code is already written and appears to work just fine.  The sad 
part, we don't use it.  The code that needs fixing is in convert_for_assignment:

  /* A type converts to a reference to it.
     This code doesn't fully support references, it's just for the
     special case of va_start and va_copy.  */
  if (codel == REFERENCE_TYPE
      && comptypes (TREE_TYPE (type), TREE_TYPE (rhs)) == 1)

This doesn't work, as the memory space qualifiers disqualify the two types from 
being compatible (this is correct and matches the standard).  Instead, we 
expand the conditional to include all cases we are prepared to handle:

if (codel == REFERENCE_TYPE && coder != REFERENCE_TYPE)

and then, in the body, we use convert_for_assignment to handle the conversion, 
as it already does everything.

This is a smaller patch than maybe it should be.  Arguably not recursing is a 
better approach, but then we need to split into two functions, so that I can 
add the REFERENCE_TYPE back to the top.  Let me know if you prefer it split.

A user actually hit this in rather trivial code with memory spaces.

Tested on two platforms, one with memory spaces and one without.

Ok?

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index fe6d1f6..40ccf58 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -5294,11 +5294,9 @@ convert_for_assignment (location_t location, tree type, tree rhs,
   rhs = require_complete_type (rhs);
   if (rhs == error_mark_node)
 return error_mark_node;
-  /* A type converts to a reference to it.
- This code doesn't fully support references, it's just for the
- special case of va_start and va_copy.  */
-  if (codel == REFERENCE_TYPE
-  && comptypes (TREE_TYPE (type), TREE_TYPE (rhs)) == 1)
+  /* A non-reference type can convert to a reference.  This handles
+ va_start, va_copy and possibly port built-ins.  */
+  if (codel == REFERENCE_TYPE && coder != REFERENCE_TYPE)
 {
   if (!lvalue_p (rhs))
{
@@ -5310,16 +5308,11 @@ convert_for_assignment (location_t location, tree type, tree rhs,
   rhs = build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (rhs)), rhs);
   SET_EXPR_LOCATION (rhs, location);
 
-  /* We already know that these two types are compatible, but they
-may not be exactly identical.  In fact, `TREE_TYPE (type)' is
-likely to be __builtin_va_list and `TREE_TYPE (rhs)' is
-likely to be va_list, a typedef to __builtin_va_list, which
-is different enough that it will cause problems later.  */
-  if (TREE_TYPE (TREE_TYPE (rhs)) != TREE_TYPE (type))
-   {
- rhs = build1 (NOP_EXPR, build_pointer_type (TREE_TYPE (type)), rhs);
- SET_EXPR_LOCATION (rhs, location);
-   }
+  rhs = convert_for_assignment (location, build_pointer_type (TREE_TYPE (type)),
+   rhs, origtype, errtype, null_pointer_constant,
+   fundecl, function, parmnum);
+  if (rhs == error_mark_node)
+   return error_mark_node;
 
   rhs = build1 (NOP_EXPR, type, rhs);
   SET_EXPR_LOCATION (rhs, location);
--


Re: [RFA PATCH, alpha]: Fix PR 57379, segfault in invalidate_any_buried_refs

2013-05-23 Thread Richard Henderson
On 05/23/2013 12:38 AM, Uros Bizjak wrote:
> 2013-05-23  Uros Bizjak  
> 
> * config/alpha/alpha.md (unspec): Add UNSPEC_XFLT_COMPARE.
> * config/alpha/alpha.c (alpha_emit_xfloating_compare): Construct
> REG_EQUAL note as UNSPEC_XFLT_COMPARE unspec.
> 
> Patch was bootstrapped and regression tested on alphaev68-linux-gnu.
> 
> OK for mainline and release branches?
> 
> [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57379

Ok.


r~


Re: [google gcc-4_7,gcc-4_8,integration] Add bounds checks to vector

2013-05-23 Thread Paul Pluzhnikov
On Thu, May 23, 2013 at 9:14 AM, Jonathan Wakely  wrote:

> I was wondering the other day whether we should put these checks on
> trunk and enable them automatically when !defined(__OPTIMIZE__)

FWIW, we keep this under a separate macro so we can turn it on or off
independent of other build options.

Our current code looks like this:

#if !defined(__google_stl_debug_vector)
# if !defined(NDEBUG)
#  define __google_stl_debug_vector 1
# endif
#endif


Keying off NDEBUG rather than __OPTIMIZE__ seems like a more
consistent approach -- if you want assert()s, then you probably also
want these checks.
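
A minimal sketch of how such a macro can gate a bounds check, assuming a free-standing helper rather than the real std::vector members. DEMO_DEBUG_VECTOR and checked_at are hypothetical names used only here, and the explicit 0 fallback is an addition the snippet above does not have:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// DEMO_DEBUG_VECTOR is a hypothetical macro name for this sketch.
// It defaults to on unless NDEBUG is defined, and can be overridden
// on the command line independently of other build options.
#if !defined(DEMO_DEBUG_VECTOR)
# if !defined(NDEBUG)
#  define DEMO_DEBUG_VECTOR 1
# else
#  define DEMO_DEBUG_VECTOR 0
# endif
#endif

// Element access that checks the index only when the macro is enabled.
template <typename T>
const T& checked_at(const std::vector<T>& v, std::size_t i)
{
#if DEMO_DEBUG_VECTOR
  if (i >= v.size())
    throw std::out_of_range("checked_at: index out of range");
#endif
  return v[i];
}
```

Building with -DDEMO_DEBUG_VECTOR=0 or -DDEMO_DEBUG_VECTOR=1 forces the choice regardless of NDEBUG, which is the independence described above.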

-- 
Paul Pluzhnikov


Re: [AArch64] Support for CLZ

2013-05-23 Thread Vidya Praveen

On 23/05/13 14:40, Marcus Shawcroft wrote:

On 22 May 2013 12:47, Vidya Praveen  wrote:

Hello,

This patch adds support to AdvSIMD CLZ instruction and adds tests for the
same.
Regression test done for aarch64-none-elf with no issues.

OK?

Regards
VP

---

gcc/ChangeLog

2013-05-22  Vidya Praveen 

 * config/aarch64/aarch64-simd.md (clzv4si2): Support for CLZ
   instruction (AdvSIMD).
 * config/aarch64/aarch64-builtins.c
   (aarch64_builtin_vectorized_function): Handler for BUILT_IN_CLZ.
 * config/aarch64/aarch64-simd-builtins.def: Entry for CLZ.
 * testsuite/gcc.target/aarch64/vect-clz.c: New file.


I committed this for you, and moved the testsuite ChangeLog entry over
to gcc/testsuite/ChangeLog.


Thanks Marcus! :-)

Regards
VP





RE: [PATCH][gensupport] Add optional attributes field to define_cond_exec

2013-05-23 Thread Kyrylo Tkachov
Hi Michael,

> Hi Kyrylo, Richard,
> 
> > What would be the function of (set_attr "ds_predicable" "yes") ?
> > Doesn't the use of  already trigger the
> substitution?
> To use define subst one doesn't need to write (set_attr
> "ds_predicable" "yes") - it's triggered by mentioning any of connected
> subst-attributes in the pattern.
> 
> > ... But I'd like to keep using the
> > "predicable" attribute
> > the way it's used now to mark patterns for cond_exec'ednes.
> If you decide to move to define_subst from cond_exec, then I'd suggest
> to avoid using 'predicable' attribute - it could involve cond_exec
> after or before define_subst and that's definitely not what you might
> want to get.

I'm reluctant to replace define_cond_exec with define_subst for a couple of
reasons:

- What about define_insn_and_split? Currently, we can define "predicable"
for a define_insn_and_split, but the define_subst documentation says it can
only be applied to define_insn and define_expand.

- What about predication on a per-alternative basis (set_attr "predicable"
"yes,no,yes")? If the presence of a subst_attr in a pattern triggers
substitution (and hence predication), how do we specify that a particular
alternative cannot be predicable? The define_cond_exec machinery does some
implicit tricks with ce_enabled and nonce_enabled that allow this
(http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00094.html).
define_subst doesn't seem like a good substitute (no pun intended ;) ) at
this point.

> 
> If you describe in more details, which patterns you're trying to get
> from which, I'll try to help with define_subst.

An example of what I'm trying to achieve is here:
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01139.html

So, define_cond_exec is used in the arm backend as follows:
(define_cond_exec
  [(match_operator 0 "arm_comparison_operator"
[(match_operand 1 "cc_register" "")
 (const_int 0)])]
  "TARGET_32BIT"
  ""
[(set_attr "predicated" "yes")]
)

If I were to replace it with a define_subst, as per Richard's suggestion,
would it look like this?

(define_subst "ds_predicable"
  [(match_operand 0)]
  "TARGET_32BIT"
  [(cond_exec (match_operator 1 "arm_comparison_operator"
[(match_operand 2 "cc_register" "")
 (const_int 0)])
  (match_dup 0))])

(define_subst_attr "ds_predicable_enabled" "ds_predicable" "no" "yes")

Then, a pattern like:

(define_insn "arm_load_exclusive"
  [(set (match_operand:SI 0 "s_register_operand" "=r")
(zero_extend:SI
  (unspec_volatile:NARROW
[(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
VUNSPEC_LL)))]
  "TARGET_HAVE_LDREXBH"
  "ldrex%?\t%0, %C1"
  [(set_attr "predicable" "yes")])

would be rewritten like this:

(define_insn "arm_load_exclusive"
  [(set (match_operand:SI 0 "s_register_operand" "=r")
(zero_extend:SI
  (unspec_volatile:NARROW
[(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
VUNSPEC_LL)))]
  "TARGET_HAVE_LDREXBH"
  "ldrex%?\t%0, %C1"
  [(set_attr "predicated" "")])

The substitution is triggered implicitly when  is
encountered.
The "predicable" attribute is gone.
But what if there was a second alternative in this define_insn that was
originally non-predicable (i.e. (set_attr "predicable" "yes, no"))? How would
we ensure that the cond_exec version of that alternative is not produced?

It seems to me that, as things stand currently, adding the capability to add
set_attr to define_cond_exec (what my patch does) is the cleaner solution from
a backend perspective, requiring fewer rewrites/workarounds for dealing with
cond_exec's.


Thanks,
Kyrill
> 
> Thanks, Michael
> 
> On 23 May 2013 12:56, Kyrylo Tkachov  wrote:
> > Hi Richard,
> >
> >> No, define_subst works across patterns, keyed by attributes.
> Exactly
> >> like
> >> cond_exec, really.
> >>
> >> But what you ought to be able to do right now is
> >>
> >> (define_subst "ds_predicable"
> >>   [(match_operand 0)]
> >>   ""
> >>   [(cond_exec (blah) (match_dup 0))])
> >>
> >> (define_subst_attr "ds_predicable_enabled" "ds_predicable" "no"
> "yes"0
> >>
> >> (define_insn "blah"
> >>   [(blah)]
> >>   ""
> >>   "@
> >>blah
> >>blah"
> >>   [(set_attr "ds_predicable" "yes")
> >>(set_attr "ds_predicated" "")])
> >
> > What would be the function of (set_attr "ds_predicable" "yes") ?
> > Doesn't the use of  already trigger the
> substitution?
> >
> >>
> >> At which point you can define "enabled" in terms of ds_predicated
> plus
> >> whatever.
> >>
> >> With a small bit of work we ought to be able to move that
> ds_predicated
> >> attribute to the define_subst itself, so that you don't have to
> >> replicate that
> >> set_attr line N times.
> >
> > That would be nice. So we would have to use define_subst instead of
> > define_cond_exec
> > to generate the cond_exec patterns. But I'd like to keep using the
> > "predicable" attribute
> > the way it's used now to mark patterns for cond_exec'ednes.
> >
> > So you'd 

Re: Partial fix for PR opt/55177

2013-05-23 Thread Jeff Law

On 05/23/2013 10:05 AM, Eric Botcazou wrote:

The PR is about missed simplifications for __builtin_bswap.  IIUC Andrew has
patches for them at the Tree level, but I think having basic simplifications
at the RTL level for BSWAP is also worthwhile, hence the attached patch.

Tested on x86_64-suse-linux.  Comments?


2013-05-23  Eric Botcazou  

PR opt/55177
* simplify-rtx.c (simplify_unary_operation_1) : Deal with BSWAP.
(simplify_byte_swapping_operation): New.
(simplify_binary_operation_1): Call it for AND, IOR and XOR.
(simplify_relational_operation_1): Deal with BSWAP.


2013-05-23  Eric Botcazou  

* gcc.dg/builtin-bswap-6.c: New test.
* gcc.dg/builtin-bswap-7.c: Likewise.
* gcc.dg/builtin-bswap-8.c: Likewise.
* gcc.dg/builtin-bswap-9.c: Likewise.

Seems reasonable.  And no matter how hard we try, just about everything
we want to catch at the tree level will by some path be exposable at the
RTL level as well.


I'd like to see the gimple equivalents moving forward as well, but 
realize you may not be in a position to move Andrew's code forward.


Jeff




Re: [PATCH, rs6000] power8 patches, patch #3, add V2DI vector support

2013-05-23 Thread David Edelsohn
On Tue, May 21, 2013 at 11:42 AM, Michael Meissner
 wrote:
> This is patch #3 of our power8 changes.  It adds support for vectorizing 
> 64-bit
> integer types (V2DI) for plus, subtract, absolute value, minimum, maximum,
> shift, rotate, and comparison.  Like the other patches, I have bootstraped
> these patches, and had no regressions.  The test gcc.dg/vect/vect-96.c now
> passes (it had failed on trunk, for compilers built with --with-cpu=power7).
> Are the patches ok to commit to the tree.
>
> Due to size issues, I will submit the tests for the testsuite either as part 
> of
> patch #4 or #5.
>
> 2013-05-20  Michael Meissner  
> Pat Haugen 
> Peter Bergner 
>
> * config/rs6000/vector.md (VEC_I): Add support for new power8 V2DI
> instructions.
> (VEC_A): Likewise.
> (VEC_C): Likewise.
> (vrotl3): Likewise.
> (vashl3): Likewise.
> (vlshr3): Likewise.
> (vashr3): Likewise.
>
> * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
> support for power8 V2DI builtins.
>
> * config/rs6000/rs6000-builtin.def (abs_v2di): Add support for
> power8 V2DI builtins.
> (vupkhsw): Likewise.
> (vupklsw): Likewise.
> (vaddudm): Likewise.
> (vminsd): Likewise.
> (vmaxsd): Likewise.
> (vminud): Likewise.
> (vmaxud): Likewise.
> (vpkudum): Likewise.
> (vpksdss): Likewise.
> (vpkudus): Likewise.
> (vpksdus): Likewise.
> (vrld): Likewise.
> (vsld): Likewise.
> (vsrd): Likewise.
> (vsrad): Likewise.
> (vsubudm): Likewise.
> (vcmpequd): Likewise.
> (vcmpgtsd): Likewise.
> (vcmpgtud): Likewise.
> (vcmpequd_p): Likewise.
> (vcmpgtsd_p): Likewise.
> (vcmpgtud_p): Likewise.
> (vupkhsw): Likewise.
> (vupklsw): Likewise.
> (vaddudm): Likewise.
> (vmaxsd): Likewise.
> (vmaxud): Likewise.
> (vminsd): Likewise.
> (vminud): Likewise.
> (vpksdss): Likewise.
> (vpksdus): Likewise.
> (vpkudum): Likewise.
> (vpkudus): Likewise.
> (vrld): Likewise.
> (vsld): Likewise.
> (vsrad): Likewise.
> (vsrd): Likewise.
> (vsubudm): Likewise.
>
> * config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Add
> support for power8 V2DI instructions.
>
> * config/rs6000/altivec.md (UNSPEC_VPKUHUM): Add support for
> power8 V2DI instructions.  Combine pack and unpack insns to use an
> iterator for each mode.  Check whether a particular mode supports
> Altivec instructions instead of just checking TARGET_ALTIVEC.
> (UNSPEC_VPKUWUM): Likewise.
> (UNSPEC_VPKSHSS): Likewise.
> (UNSPEC_VPKSWSS): Likewise.
> (UNSPEC_VPKUHUS): Likewise.
> (UNSPEC_VPKSHUS): Likewise.
> (UNSPEC_VPKUWUS): Likewise.
> (UNSPEC_VPKSWUS): Likewise.
> (UNSPEC_VPACK_SIGN_SIGN_SAT): Likewise.
> (UNSPEC_VPACK_SIGN_UNS_SAT): Likewise.
> (UNSPEC_VPACK_UNS_UNS_SAT): Likewise.
> (UNSPEC_VPACK_UNS_UNS_MOD): Likewise.
> (UNSPEC_VUPKHSB): Likewise.
> (UNSPEC_VUNPACK_HI_SIGN): Likewise.
> (UNSPEC_VUNPACK_LO_SIGN): Likewise.
> (UNSPEC_VUPKHSH): Likewise.
> (UNSPEC_VUPKLSB): Likewise.
> (UNSPEC_VUPKLSH): Likewise.
> (VI2): Likewise.
> (VI_char): Likewise.
> (VI_scalar): Likewise.
> (VI_unit): Likewise.
> (VP): Likewise.
> (VP_small): Likewise.
> (VP_small_lc): Likewise.
> (VU_char): Likewise.
> (add3): Likewise.
> (altivec_vaddcuw): Likewise.
> (altivec_vaddus): Likewise.
> (altivec_vaddss): Likewise.
> (sub3): Likewise.
> (altivec_vsubcuw): Likewise.
> (altivec_vsubus): Likewise.
> (altivec_vsubss): Likewise.
> (altivec_vavgs): Likewise.
> (altivec_vcmpbfp): Likewise.
> (altivec_eq): Likewise.
> (altivec_gt): Likewise.
> (altivec_gtu): Likewise.
> (umax3): Likewise.
> (smax3): Likewise.
> (umin3): Likewise.
> (smin3): Likewise.
> (altivec_vpkuhum): Likewise.
> (altivec_vpkuwum): Likewise.
> (altivec_vpkshss): Likewise.
> (altivec_vpkswss): Likewise.
> (altivec_vpkuhus): Likewise.
> (altivec_vpkshus): Likewise.
> (altivec_vpkuwus): Likewise.
> (altivec_vpkswus): Likewise.
> (altivec_vpksss): Likewise.
> (altivec_vpksus): Likewise.
> (altivec_vpkuus): Likewise.
> (altivec_vpkuum): Likewise.
> (altivec_vrl): Likewise.
> (altivec_vsl): Likewise.
> (altivec_vsr): Likewise.
> (altivec_vsra): Likewise.
> (altivec_vsldoi_): Likewise.
> (altivec_vupkhsb): Likewise.
>   

Re: [patch] Default to --enable-libstdcxx-time=auto

2013-05-23 Thread Jakub Jelinek
On Thu, May 23, 2013 at 04:02:18PM +0200, Jakub Jelinek wrote:
> So, here is an untested 4.8 branch patch.  The @GLIBCXX_3.4.17 +
> @@GLIBCXX_3.4.19 stuff gets ugly, I admit, but don't have other solution.
> Tested just that it compiles/links, abi list looks good and abi.exp testing,
> haven't actually tried to test it more than that.

Now fully bootstrapped/regtested on x86_64-linux and i686-linux.

Here is the corresponding trunk patch (untested at this point though):

2013-05-23  Jakub Jelinek  

* src/c++11/chrono.cc: If _GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL,
include unistd.h and sys/syscall.h.  If _GLIBCXX_COMPATIBILITY_CXX0X,
don't define system_clock::is_steady, system_clock::now() and
steady_clock::is_steady.
(std::chrono::system_clock::now()): If
_GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL, call
syscall (SYS_clock_gettime, ...) instead of clock_gettime (...).
(std::chrono::steady_clock::now()): Likewise.  Add weak attribute
if _GLIBCXX_COMPATIBILITY_CXX0X and compatibility-chrono.cc will
be non-empty.
* src/Makefile.am (cxx11_sources): Add compatibility-chrono.cc.
(compatibility-chrono.lo, compatibility-chrono.o): New goals.
* src/c++11/compatibility-chrono.cc: New file.
* acinclude.m4 (GLIBCXX_ENABLE_LIBSTDCXX_TIME): On linux*, check for
syscall (SYS_clock_gettime, CLOCK_MONOTONIC, &tp).
* testsuite/util/testsuite_abi.cc (check_version): Add
GLIBCXX_3.4.20 version and make it the latest.
* config/abi/pre/gnu.ver (_ZNSt6chrono12steady_clock3nowEv): Export
also @@GLIBCXX_3.4.19.  Move all symbols so far added for GCC 4.9 to
@@GLIBCXX_3.4.20 instead.
* config/abi/post/i386-linux-gnu/baseline_symbols.txt: Regenerated.
* config/abi/post/s390x-linux-gnu/baseline_symbols.txt: Regenerated.
* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt:
Regenerated.
* config/abi/post/powerpc64-linux-gnu/32/baseline_symbols.txt:
Regenerated.
* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Regenerated.
* config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt:
Regenerated.
* config/abi/post/powerpc-linux-gnu/baseline_symbols.txt: Regenerated.
* config/abi/post/s390-linux-gnu/baseline_symbols.txt: Regenerated.
* config/abi/post/i486-linux-gnu/baseline_symbols.txt: Regenerated.
* config.h.in: Regenerated.
* src/Makefile.in: Regenerated.
* configure: Regenerated.
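
The core of the chrono.cc change is reading the clock through the raw system call instead of the clock_gettime () library function. A minimal sketch, assuming a Linux host where SYS_clock_gettime fills a struct timespec (as on x86_64); monotonic_now and to_nanoseconds are hypothetical helper names and do not come from libstdc++:

```cpp
#include <cassert>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

// Read CLOCK_MONOTONIC through the raw system call rather than the
// clock_gettime() library wrapper.  Linux-specific.
// Returns true on success.
inline bool monotonic_now(timespec& tp)
{
  return syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &tp) == 0;
}

// Flatten a timespec into a single nanosecond count.
inline long long to_nanoseconds(const timespec& tp)
{
  return static_cast<long long>(tp.tv_sec) * 1000000000LL + tp.tv_nsec;
}
```

Two successive readings should never go backwards, which is the property steady_clock relies on.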

--- libstdc++-v3/src/c++11/chrono.cc.jj 2013-02-04 18:15:15.078395533 +0100
+++ libstdc++-v3/src/c++11/chrono.cc 2013-05-23 18:06:40.562825017 +0200
@@ -32,13 +32,18 @@
  defined(_GLIBCXX_USE_GETTIMEOFDAY)
 #include 
 #endif
+#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL
+#include <unistd.h>
+#include <sys/syscall.h>
+#endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
   namespace chrono
   {
   _GLIBCXX_BEGIN_NAMESPACE_VERSION
- 
+
+#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
 constexpr bool system_clock::is_steady;
 
 system_clock::time_point
@@ -47,7 +52,11 @@ namespace std _GLIBCXX_VISIBILITY(defaul
 #ifdef _GLIBCXX_USE_CLOCK_REALTIME
   timespec tp;
   // -EINVAL, -EFAULT
+#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL
+  syscall(SYS_clock_gettime, CLOCK_REALTIME, &tp);
+#else
   clock_gettime(CLOCK_REALTIME, &tp);
+#endif
   return time_point(duration(chrono::seconds(tp.tv_sec)
 + chrono::nanoseconds(tp.tv_nsec)));
 #elif defined(_GLIBCXX_USE_GETTIMEOFDAY)
@@ -61,16 +70,29 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   return system_clock::from_time_t(__sec);
 #endif
 }
+#endif
 
 #ifdef _GLIBCXX_USE_CLOCK_MONOTONIC
+#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
 constexpr bool steady_clock::is_steady;
+#endif
 
+#if defined(_GLIBCXX_SYMVER_GNU) && defined(_GLIBCXX_SHARED) \
+&& defined(_GLIBCXX_HAVE_AS_SYMVER_DIRECTIVE) \
+&& defined(_GLIBCXX_HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT) \
+&& !defined(_GLIBCXX_COMPATIBILITY_CXX0X)
+__attribute__((__weak__))
+#endif
 steady_clock::time_point
 steady_clock::now() noexcept
 {
   timespec tp;
   // -EINVAL, -EFAULT
+#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL
+  syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &tp);
+#else
   clock_gettime(CLOCK_MONOTONIC, &tp);
+#endif
   return time_point(duration(chrono::seconds(tp.tv_sec)
 + chrono::nanoseconds(tp.tv_nsec)));
 }
--- libstdc++-v3/src/Makefile.am.jj 2013-02-14 20:11:41.491575233 +0100
+++ libstdc++-v3/src/Makefile.am 2013-05-23 18:06:40.578824061 +0200
@@ -50,7 +50,8 @@ cxx98_sources = \
 cxx11_sources = \
compatibility-c++0x.cc \
compatibility-atomic-c++0x.cc \
-   compatibility-thread-c++0x.cc
+   compatibility-thread-c++0x.cc \
+   compatibility-chrono.cc
 
 libstdc___la_SOURCES = $(cxx98_sources) $(cxx11_sources

Re: [patch, powerpc] increase array alignment for Altivec

2013-05-23 Thread David Edelsohn
On Thu, May 23, 2013 at 11:18 AM, Bill Schmidt
 wrote:
> On Thu, 2013-05-23 at 08:54 -0600, Sandra Loosemore wrote:
>> On 05/23/2013 06:29 AM, Bill Schmidt wrote:
>> >
>> > Sandra and David,
>> >
>> > The array-alignment patch is performance-neutral with respect to
>> > CPU2006.  All variations were in the noise range.
>>
>> Well, that settles it; I don't see any reason to pursue the patch any
>> further if it's not a performance win after all.  It probably helped on
>> some specific program or benchmark our original customer was interested
>> in but that was in an older version of GCC, etc.
>>
>> Bill, thanks very much for helping with this.
>
> I'm not sure that's the right message to take away here -- this was just
> verifying that we didn't see a benchmarking problem with the patch.  It
> seems likely that the patch does have benefits; they just aren't exposed
> in the particular benchmarks in SPEC CPU2006.
>
> I think the patch is worth pursuing, since there aren't any negative
> consequences in reporting benchmarks.

Sandra,

I completely agree with Bill. The intention of the benchmark run was a
quick sniff test that the patch did not cause any significant
performance degradation, not that the limited set of benchmarks showed
improvement.

I want to have a version of the patch committed. The only question now
is how much of the patch can be committed without exposing potential
incompatibilities between different object files.

Thanks, David


Re: [google gcc-4_7,gcc-4_8,integration] Add bounds checks to vector

2013-05-23 Thread Jonathan Wakely
On 23 May 2013 16:56, Paul Pluzhnikov wrote:
>>
>> This patch adds (relatively) cheap bounds and dangling checks to
>> vector, similar to the checks I added to vector in r195373,
>> r195356, etc.

I was wondering the other day whether we should put these checks on
trunk and enable them automatically when !defined(__OPTIMIZE__).

Now that we have -Og you could use that to disable the checks without
sacrificing debuggability.


Partial fix for PR opt/55177

2013-05-23 Thread Eric Botcazou
The PR is about missed simplifications for __builtin_bswap.  IIUC Andrew has
patches for them at the Tree level, but I think having basic simplifications
at the RTL level for BSWAP is also worthwhile, hence the attached patch.

Tested on x86_64-suse-linux.  Comments?


2013-05-23  Eric Botcazou  

PR opt/55177
* simplify-rtx.c (simplify_unary_operation_1) : Deal with BSWAP.
(simplify_byte_swapping_operation): New.
(simplify_binary_operation_1): Call it for AND, IOR and XOR.
(simplify_relational_operation_1): Deal with BSWAP.


2013-05-23  Eric Botcazou  

* gcc.dg/builtin-bswap-6.c: New test.
* gcc.dg/builtin-bswap-7.c: Likewise.
* gcc.dg/builtin-bswap-8.c: Likewise.
* gcc.dg/builtin-bswap-9.c: Likewise.


-- 
Eric Botcazou
Index: simplify-rtx.c
===
--- simplify-rtx.c	(revision 199091)
+++ simplify-rtx.c	(working copy)
@@ -858,7 +858,6 @@ simplify_unary_operation_1 (enum rtx_cod
   /* (not (ashiftrt foo C)) where C is the number of bits in FOO
 	 minus 1 is (ge foo (const_int 0)) if STORE_FLAG_VALUE is -1,
 	 so we can perform the above simplification.  */
-
   if (STORE_FLAG_VALUE == -1
 	  && GET_CODE (op) == ASHIFTRT
 	  && GET_CODE (XEXP (op, 1))
@@ -890,7 +889,6 @@ simplify_unary_operation_1 (enum rtx_cod
 	 with negating logical insns (and-not, nand, etc.).  If result has
 	 only one NOT, put it first, since that is how the patterns are
 	 coded.  */
-
   if (GET_CODE (op) == IOR || GET_CODE (op) == AND)
 	{
 	  rtx in1 = XEXP (op, 0), in2 = XEXP (op, 1);
@@ -913,6 +911,13 @@ simplify_unary_operation_1 (enum rtx_cod
 	  return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR,
  mode, in1, in2);
 	}
+
+  /* (not (bswap x)) -> (bswap (not x)).  */
+  if (GET_CODE (op) == BSWAP)
+	{
+	  rtx x = simplify_gen_unary (NOT, mode, XEXP (op, 0), mode);
+	  return simplify_gen_unary (BSWAP, mode, x, mode);
+	}
   break;
 
 case NEG:
@@ -2050,6 +2055,36 @@ simplify_const_unary_operation (enum rtx
   return NULL_RTX;
 }
 
+/* Subroutine of simplify_binary_operation to simplify a binary operation
+   CODE that can commute with byte swapping, with result mode MODE and
+   operating on OP0 and OP1.  CODE is currently one of AND, IOR or XOR.
+   Return zero if no simplification or canonicalization is possible.  */
+
+static rtx
+simplify_byte_swapping_operation (enum rtx_code code, enum machine_mode mode,
+  rtx op0, rtx op1)
+{
+  rtx tem;
+
+  /* (op (bswap x) C1)) -> (bswap (op x C2)) with C2 swapped.  */
+  if (GET_CODE (op0) == BSWAP
+  && (CONST_INT_P (op1) || CONST_DOUBLE_AS_INT_P (op1)))
+{
+  tem = simplify_gen_binary (code, mode, XEXP (op0, 0),
+ simplify_gen_unary (BSWAP, mode, op1, mode));
+  return simplify_gen_unary (BSWAP, mode, tem, mode);
+}
+
+  /* (op (bswap x) (bswap y)) -> (bswap (op x y)).  */
+  if (GET_CODE (op0) == BSWAP && GET_CODE (op1) == BSWAP)
+{
+  tem = simplify_gen_binary (code, mode, XEXP (op0, 0), XEXP (op1, 0));
+  return simplify_gen_unary (BSWAP, mode, tem, mode);
+}
+
+  return NULL_RTX;
+}
+
 /* Subroutine of simplify_binary_operation to simplify a commutative,
associative binary operation CODE with result mode MODE, operating
on OP0 and OP1.  CODE is currently one of PLUS, MULT, AND, IOR, XOR,
@@ -2791,6 +2826,10 @@ simplify_binary_operation_1 (enum rtx_co
 	XEXP (op0, 1));
 }
 
+  tem = simplify_byte_swapping_operation (code, mode, op0, op1);
+  if (tem)
+	return tem;
+
   tem = simplify_associative_operation (code, mode, op0, op1);
   if (tem)
 	return tem;
@@ -2934,6 +2973,10 @@ simplify_binary_operation_1 (enum rtx_co
 	  && (reversed = reversed_comparison (op0, mode)))
 	return reversed;
 
+  tem = simplify_byte_swapping_operation (code, mode, op0, op1);
+  if (tem)
+	return tem;
+
   tem = simplify_associative_operation (code, mode, op0, op1);
   if (tem)
 	return tem;
@@ -3116,6 +3159,10 @@ simplify_binary_operation_1 (enum rtx_co
 	  && op1 == XEXP (XEXP (op0, 0), 0))
 	return simplify_gen_binary (AND, mode, op1, XEXP (op0, 1));
 
+  tem = simplify_byte_swapping_operation (code, mode, op0, op1);
+  if (tem)
+	return tem;
+
   tem = simplify_associative_operation (code, mode, op0, op1);
   if (tem)
 	return tem;
@@ -4764,6 +4811,21 @@ simplify_relational_operation_1 (enum rt
 simplify_gen_binary (XOR, cmp_mode,
 			 XEXP (op0, 1), op1));
 
+  /* (eq/ne (bswap x) C1) simplifies to (eq/ne x C2) with C2 swapped.  */
+  if ((code == EQ || code == NE)
+  && GET_CODE (op0) == BSWAP
+  && (CONST_INT_P (op1) || CONST_DOUBLE_AS_INT_P (op1)))
+return simplify_gen_relational (code, mode, cmp_mode, XEXP (op0, 0),
+simplify_gen_unary (BSWAP, cmp_mode,
+			op1, cmp_mode));
+
+  /* (eq/ne (bswap x) (bswap y)) simplifies to (eq/ne x y).  */
+  if ((code == EQ || code == NE)

Re: Remove global state from gcc/tracer.c

2013-05-23 Thread David Malcolm
On Thu, 2013-05-23 at 06:56 -0400, David Malcolm wrote:
> On Thu, 2013-05-23 at 07:14 +0200, Jakub Jelinek wrote:
> > On Wed, May 22, 2013 at 08:45:45PM -0400, David Malcolm wrote:
> > > I'm attempting to eliminate global state from the insides of gcc.
> > > 
> > > gcc/tracer.c has various global variables, which are only used during
> > > the lifetime of the execute callback of that pass, and cleaned up at the
> > > end of each invocation of the pass.
> > > 
> > > The attached patch introduces a class to hold the state of the pass
> > > ("tracer_state"), eliminating these globals.  An instance of the state
> > > is created on the stack, and all of the various "static" functions in
> > > tracer.c that used the globals become member functions of the state.
> > > Hence the state is passed around by the implicit "this" of the
> > > tracer_state, avoiding the need to patch each individual use of a field
> > > within the state, minimizing the diff.
> > 
> > But do we want to handle the global state this way?  This adds overhead
> > to (almost?) every single function (now method) in the file (because it gets
> > an extra argument).  While that might be fine for rarely executed functions,
> > if it involves also hot functions called many times, where especially on
> > register starved hosts it could increase register pressure, plus the
> > overhead of passing the this argument everywhere, this could start to be
> > noticeable.  Sure, if you plan to do that just in one pass (but, why then?),
> > it might be tiny slowdown, but after you convert the hundreds of passes in
> > gcc that contain global state it might become significant.
> > 
> > There are alternative approaches that should be considered.
> 
> I thought of a possible way of doing this, attached is a
> proof-of-concept attempt.
> 
> The idea is to use (and then not use) C++'s "static" syntax for class
> methods and fields.  By making that optional with a big configure-time
> switch, it gives us a way of making state be either global vs on-stack,
> with minimal syntax changes.  In one configuration (for building gcc as
> a library) there would be implicit this-> throughout, but in the other
> (for speedy binaries) it would all compile away to global state, as per
> the status quo.
> 
> This assumes that doing:
> 
>tracer_state state;
>changed = state.tail_duplicate ();
> 
> is legitimate; when using global state, "state" is empty, and the call
> to
>   state.tail_duplicate ()
> becomes effectively:
>   state::tail_duplicate ()
> since it's static in that configuration.
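
A minimal sketch of the optional-"static" idea described above, assuming a hypothetical TRACER_GLOBAL_STATE switch and MAYBE_STATIC macro (neither name comes from the proof-of-concept patch), and a made-up traces_found field:

```cpp
#include <cassert>

// Hypothetical configure-time switch: define TRACER_GLOBAL_STATE to
// get the "speedy binary" configuration with plain global state.
#ifdef TRACER_GLOBAL_STATE
# define MAYBE_STATIC static
#else
# define MAYBE_STATIC
#endif

class tracer_state
{
public:
  tracer_state () { traces_found = 0; }

  // Static member function in one configuration, ordinary member
  // function (with an implicit "this") in the other.
  MAYBE_STATIC bool tail_duplicate ();

  // Illustrative field; the real pass state has different members.
  MAYBE_STATIC int traces_found;
};

#ifdef TRACER_GLOBAL_STATE
// A static data member needs exactly one out-of-class definition.
int tracer_state::traces_found;
#endif

bool
tracer_state::tail_duplicate ()
{
  // Unqualified member access compiles in both configurations.
  ++traces_found;
  return traces_found > 0;
}

bool
run_tracer_pass ()
{
  tracer_state state;   // effectively empty when the members are static
  return state.tail_duplicate ();
}
```

Calling a static member function through an object expression, as run_tracer_pass does, is valid C++, so the call site needs no conditional code.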
> 
> > E.g. global state of a pass can be moved into a per-pass structure,
> > and have some way how to aggregate those per pass structures together from
> > all the passes in the whole compiler (that can be either manual process,
> > say each pass providing its own *-passstate.h and one big header including
> > all that together), or automatic ones (say gengstate or a new tool could
> > create those for us from special markings in the source, say new option on
> > GTY or something) and have some magic macro how to access the global state
> > within the pass (thispass->fieldname ?).  Then e.g. depending on how the
> > compiler would be configured and built, thispass could be just address of a
> > pass struct var (i.e. essentially keep the global state as is, for
> > performance reasons), or when trying to build compiler as a library (with
> > -fpic overhead we probably don't want for cc1/cc1plus - we can build all the
> > *.o files twice, like libtool does) thispass could expand to __thread
> > pointer var dereference plus a field inside of the global compiler state
> > structure it points to for the current pass.  Thus, the library version
> > of the compiler would be somewhat slower (both -fpic overhead and TLS
> > overhead), and would need either a few of the entrypoints tweaked to adjust
> > the TLS pointer to the global state, or we could require users to just call
> > a special function to make the global state current in the current thread
> > before calling compiler internals.
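
The "thispass" idea above could look roughly like the following sketch. All names here (GCC_AS_LIBRARY, compiler_state, tracer_pass_state, make_state_current) are assumptions made for illustration, and standard thread_local is used in place of the __thread extension mentioned in the mail.

```cpp
#include <cassert>

// Per-pass state, gathered by hand or by a generator tool.
struct tracer_pass_state
{
  int traces_found;
};

// One big structure aggregating the state of every pass.
struct compiler_state
{
  tracer_pass_state tracer;
  // ... state of the other passes would follow here ...
};

#ifdef GCC_AS_LIBRARY
// Library build: each thread points at its own compiler state.
thread_local compiler_state *current_compiler_state;
# define thispass (&current_compiler_state->tracer)
void make_state_current (compiler_state *s) { current_compiler_state = s; }
#else
// Ordinary build: the state stays a plain global, as it is today.
static tracer_pass_state tracer_globals;
# define thispass (&tracer_globals)
void make_state_current (compiler_state *) { /* nothing to switch */ }
#endif

// Pass code is written once against thispass->field.
int
note_trace ()
{
  return ++thispass->traces_found;
}
```

In the library configuration a caller would invoke make_state_current before entering compiler internals; in the default configuration the macro costs nothing over a direct global access.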
> 
> Thanks.   Though I thought we were trying to move away from relying on
> GTY parsing?   (Sorry not to be able to answer more fully yet, need to
> get family ready for school...)

I've warmed to your idea of having tooling to support state, and
creating a generic framework for this.  For example, consider the
(long-term) use-case of embedding GCC's code as a library inside a
multithreaded app, where each thread could be JIT-compiling say OpenGL
shader code to machine code (perhaps with some extra non-standard
passes).  To get there, I'd need to isolate *all* of GCC's state, and
when I look at, say, the garbage-collector, I shudder.

So I'm interested in writing some compile-time tooling for generic
state-management in GCC, to sidestep the conflict between speed in the
status quo single state case vs support for multiple states.

Here's a possible way of doing it:

Given e.g. gcc/foo.c containing som

Re: [google gcc-4_7,gcc-4_8,integration] Add bounds checks to vector

2013-05-23 Thread Paul Pluzhnikov
+cc libstd...@gcc.gnu.org

On Thu, May 23, 2013 at 8:51 AM, Paul Pluzhnikov  wrote:
> Greetings,
>
> This patch adds (relatively) cheap bounds and dangling checks to
> vector<bool>, similar to the checks I added to vector in r195373,
> r195356, etc.
>
> Ok for google branches (gcc-4_7, gcc-4_8, integration) ?
>
> Thanks,

-- 
Paul Pluzhnikov
Index: libstdc++-v3/include/bits/stl_bvector.h
===
--- libstdc++-v3/include/bits/stl_bvector.h (revision 199261)
+++ libstdc++-v3/include/bits/stl_bvector.h (working copy)
@@ -438,11 +438,31 @@
 #endif
 
   ~_Bvector_base()
-  { this->_M_deallocate(); }
+  {
+this->_M_deallocate();
+#if __google_stl_debug_bvector
+__builtin_memset(this, 0xcd, sizeof(*this));
+#endif
+  }
 
 protected:
   _Bvector_impl _M_impl;
 
+#if __google_stl_debug_bvector
+  bool _M_is_valid() const
+  {
+   return (this->_M_impl._M_start._M_p == 0
+   && this->_M_impl._M_finish._M_p == 0
+   && this->_M_impl._M_end_of_storage == 0)
+ || (this->_M_impl._M_start._M_p <= this->_M_impl._M_finish._M_p
+ && this->_M_impl._M_finish._M_p <= this->_M_impl._M_end_of_storage
+ && (this->_M_impl._M_start._M_p < this->_M_impl._M_end_of_storage
+  || (this->_M_impl._M_start._M_p == this->_M_impl._M_end_of_storage
+  && this->_M_impl._M_start._M_offset == 0
+  && this->_M_impl._M_finish._M_offset == 0)));
+  }
+#endif
+
   _Bit_type*
   _M_allocate(size_t __n)
   { return _M_impl.allocate(_S_nword(__n)); }
@@ -571,6 +591,10 @@
 vector&
 operator=(const vector& __x)
 {
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("op=() on corrupt (dangling?) vector");
+#endif
   if (&__x == this)
return *this;
   if (__x.size() > capacity())
@@ -587,6 +611,10 @@
 vector&
 operator=(vector&& __x)
 {
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("op=() on corrupt (dangling?) vector");
+#endif
   // NB: DR 1204.
   // NB: DR 675.
   this->clear();
@@ -608,12 +636,22 @@
 // or not the type is an integer.
 void
 assign(size_type __n, const bool& __x)
-{ _M_fill_assign(__n, __x); }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("assign on corrupt (dangling?) vector");
+#endif
+  _M_fill_assign(__n, __x);
+}
 
 template<typename _InputIterator>
   void
   assign(_InputIterator __first, _InputIterator __last)
   {
+#if __google_stl_debug_bvector
+   if (!this->_M_is_valid())
+ __throw_logic_error("assign() on corrupt (dangling?) vector");
+#endif
typedef typename std::__is_integer<_InputIterator>::__type _Integral;
_M_assign_dispatch(__first, __last, _Integral());
   }
@@ -626,19 +664,43 @@
 
 iterator
 begin() _GLIBCXX_NOEXCEPT
-{ return this->_M_impl._M_start; }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("begin() on corrupt (dangling?) vector");
+#endif
+  return this->_M_impl._M_start;
+}
 
 const_iterator
 begin() const _GLIBCXX_NOEXCEPT
-{ return this->_M_impl._M_start; }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("begin() on corrupt (dangling?) vector");
+#endif
+  return this->_M_impl._M_start;
+}
 
 iterator
 end() _GLIBCXX_NOEXCEPT
-{ return this->_M_impl._M_finish; }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("end() on corrupt (dangling?) vector");
+#endif
+  return this->_M_impl._M_finish;
+}
 
 const_iterator
 end() const _GLIBCXX_NOEXCEPT
-{ return this->_M_impl._M_finish; }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("end() on corrupt (dangling?) vector");
+#endif
+  return this->_M_impl._M_finish;
+}
 
 reverse_iterator
 rbegin() _GLIBCXX_NOEXCEPT
@@ -659,11 +721,23 @@
 #ifdef __GXX_EXPERIMENTAL_CXX0X__
 const_iterator
 cbegin() const noexcept
-{ return this->_M_impl._M_start; }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("cbegin() on corrupt (dangling?) vector");
+#endif
+  return this->_M_impl._M_start;
+}
 
 const_iterator
 cend() const noexcept
-{ return this->_M_impl._M_finish; }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("cend() on corrupt (dangling?) vector");
+#endif
+  return this->_M_impl._M_finish;
+}
 
 const_reverse_iterator
 crbegin() const noexcept
@@ -681,6 +755,10 @@
 size_type
 max_size() const _GLIBCXX_NOEXCEPT
 {
+#if __google_stl_debug_b

[google gcc-4_7,gcc-4_8,integration] Add bounds checks to vector

2013-05-23 Thread Paul Pluzhnikov
Greetings,

This patch adds (relatively) cheap bounds and dangling checks to
vector<bool>, similar to the checks I added to vector in r195373,
r195356, etc.

Ok for google branches (gcc-4_7, gcc-4_8, integration) ?

Thanks,

--


Index: libstdc++-v3/include/bits/stl_bvector.h
===
--- libstdc++-v3/include/bits/stl_bvector.h (revision 199261)
+++ libstdc++-v3/include/bits/stl_bvector.h (working copy)
@@ -438,11 +438,31 @@
 #endif
 
   ~_Bvector_base()
-  { this->_M_deallocate(); }
+  {
+this->_M_deallocate();
+#if __google_stl_debug_bvector
+__builtin_memset(this, 0xcd, sizeof(*this));
+#endif
+  }
 
 protected:
   _Bvector_impl _M_impl;
 
+#if __google_stl_debug_bvector
+  bool _M_is_valid() const
+  {
+   return (this->_M_impl._M_start._M_p == 0
+   && this->_M_impl._M_finish._M_p == 0
+   && this->_M_impl._M_end_of_storage == 0)
+ || (this->_M_impl._M_start._M_p <= this->_M_impl._M_finish._M_p
+ && this->_M_impl._M_finish._M_p <= this->_M_impl._M_end_of_storage
+ && (this->_M_impl._M_start._M_p < this->_M_impl._M_end_of_storage
+  || (this->_M_impl._M_start._M_p == this->_M_impl._M_end_of_storage
+  && this->_M_impl._M_start._M_offset == 0
+  && this->_M_impl._M_finish._M_offset == 0)));
+  }
+#endif
+
   _Bit_type*
   _M_allocate(size_t __n)
   { return _M_impl.allocate(_S_nword(__n)); }
@@ -571,6 +591,10 @@
 vector&
 operator=(const vector& __x)
 {
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("op=() on corrupt (dangling?) vector");
+#endif
   if (&__x == this)
return *this;
   if (__x.size() > capacity())
@@ -587,6 +611,10 @@
 vector&
 operator=(vector&& __x)
 {
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("op=() on corrupt (dangling?) vector");
+#endif
   // NB: DR 1204.
   // NB: DR 675.
   this->clear();
@@ -608,12 +636,22 @@
 // or not the type is an integer.
 void
 assign(size_type __n, const bool& __x)
-{ _M_fill_assign(__n, __x); }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("assign on corrupt (dangling?) vector");
+#endif
+  _M_fill_assign(__n, __x);
+}
 
 template<typename _InputIterator>
   void
   assign(_InputIterator __first, _InputIterator __last)
   {
+#if __google_stl_debug_bvector
+   if (!this->_M_is_valid())
+ __throw_logic_error("assign() on corrupt (dangling?) vector");
+#endif
typedef typename std::__is_integer<_InputIterator>::__type _Integral;
_M_assign_dispatch(__first, __last, _Integral());
   }
@@ -626,19 +664,43 @@
 
 iterator
 begin() _GLIBCXX_NOEXCEPT
-{ return this->_M_impl._M_start; }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("begin() on corrupt (dangling?) vector");
+#endif
+  return this->_M_impl._M_start;
+}
 
 const_iterator
 begin() const _GLIBCXX_NOEXCEPT
-{ return this->_M_impl._M_start; }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("begin() on corrupt (dangling?) vector");
+#endif
+  return this->_M_impl._M_start;
+}
 
 iterator
 end() _GLIBCXX_NOEXCEPT
-{ return this->_M_impl._M_finish; }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("end() on corrupt (dangling?) vector");
+#endif
+  return this->_M_impl._M_finish;
+}
 
 const_iterator
 end() const _GLIBCXX_NOEXCEPT
-{ return this->_M_impl._M_finish; }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("end() on corrupt (dangling?) vector");
+#endif
+  return this->_M_impl._M_finish;
+}
 
 reverse_iterator
 rbegin() _GLIBCXX_NOEXCEPT
@@ -659,11 +721,23 @@
 #ifdef __GXX_EXPERIMENTAL_CXX0X__
 const_iterator
 cbegin() const noexcept
-{ return this->_M_impl._M_start; }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("cbegin() on corrupt (dangling?) vector");
+#endif
+  return this->_M_impl._M_start;
+}
 
 const_iterator
 cend() const noexcept
-{ return this->_M_impl._M_finish; }
+{
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("cend() on corrupt (dangling?) vector");
+#endif
+  return this->_M_impl._M_finish;
+}
 
 const_reverse_iterator
 crbegin() const noexcept
@@ -681,6 +755,10 @@
 size_type
 max_size() const _GLIBCXX_NOEXCEPT
 {
+#if __google_stl_debug_bvector
+  if (!this->_M_is_valid())
+   __throw_logic_error("max_size() on corrupt (dangling?) vector");
+
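
The interplay between the 0xcd poisoning in the destructor and the _M_is_valid() test can be shown with a simplified standalone model. The sketch below is not the libstdc++ code: the struct and function names are invented, and pointers are modelled as integer addresses so that the poisoned values can be compared portably.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Simplified model of a bit iterator: word address plus bit offset.
struct bit_pos
{
  std::uintptr_t p;
  unsigned offset;
};

// Simplified model of _Bvector_impl.
struct bvec_impl
{
  bit_pos start;
  bit_pos finish;
  std::uintptr_t end_of_storage;
};

// Same shape as the check in the patch: either everything is null, or
// start <= finish <= end_of_storage, and a zero-capacity vector must
// have both bit offsets equal to zero.
bool
is_valid (const bvec_impl &v)
{
  const bool empty = v.start.p == 0 && v.finish.p == 0
		     && v.end_of_storage == 0;
  const bool ordered = v.start.p <= v.finish.p
		       && v.finish.p <= v.end_of_storage;
  const bool has_room = v.start.p < v.end_of_storage
			|| (v.start.p == v.end_of_storage
			    && v.start.offset == 0
			    && v.finish.offset == 0);
  return empty || (ordered && has_room);
}

// What the debug destructor does after deallocating.
void
poison (bvec_impl &v)
{
  std::memset (&v, 0xcd, sizeof v);
}
```

After poisoning, all three addresses are equal and non-null while the offsets are non-zero, so the last clause rejects the object; that is how a call on a destroyed vector gets reported.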

Re: [patch] Preserve the CFG until after final

2013-05-23 Thread Eric Botcazou
> Sadly no. Most of these (the *agu* ones) are also reached from final.
> For example:
> 
> movdi_internal -> ix86_use_lea_for_mov -> ix86_lea_outperforms ->
> distance_non_agu_define -> distance_non_agu_define_in_bb
> 
> Likewise for movsi_internal, and zero_extendsidi2. For the
> mov?i_internal define_insns, it's been like that since at least
> r181077 (November 2011).

OK, I double-checked, but obviously not sufficiently.

> I must admit I was surprised by that, too. It may have been
> coincidence that it worked when this patch was (IMHO wrongfully)
> accepted. Someone got away with it because i386 calls
> compute_bb_for_insn in its machine-reorg, and does *not* call
> free_bb_for_insn, leaving the BLOCK_FOR_INSN pointers in place all the
> way through final. There are no passes between machine-reorg and final
> that run for i386 and damage the CFG because split5 doesn't run on
> i386 (because of STACK_REGS) and the other passes, like
> shorten_branches, don't modify the insns chain.

I see, thanks for the analysis.

> I think you're taking a too dbr_schedule-ports point of view on this.
> There are already targets that never really destroy the CFG at all,
> all the way through final. Few ports that do destroy it, destroy it as
> badly as dbr_schedule. Most only have innocent "damage" that are
> really just deficiencies of verify_flow_info.

I cannot deny that I care about the architectures with delay slots and think 
that starting to put them aside is the beginning of a slippery slope.  Unlike 
in other compilers, a few of them are first-class architectures in GCC and I 
think that this should be preserved, even if they are less sexy these days.

> It is feasible in the short term -- as in: right now -- for some
> targets. Is it possible in the short term for all targets? No. But
> you've got to start somewhere. I firmly believe that port maintainers
> will not find it hard to make it work for their ports, dbr_schedule
> ports aside and that's a problem I'm trying to solve (while at it:
> ping**2 for http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00595.html
> that I need to move ahead).

Your efforts are much appreciated but replacing dbr_schedule is a huge 
undertaking which may take longer than expected and I don't think that
we have any guarantee on its outcome.

> I've considered that path, too, but I opted against it because we end
> up with pass_free_cfg being called from the majority of targets, and
> verify_flow_info calls in targets that really maintain a proper CFG.
> It also results in unnecessary damage from split5 for targets that
> have a valid CFG up to that point. But what worried me the most is
> that this approach made it more difficult to see what ports actually
> are CFG-safe. I chose the new target hook approach because you can
> just grep for the CFG-safe target hook to see what ports are already
> OK and which ones are still TODO.

Sure, but IMO one very probable future is that the CFG-safe target hook will 
be quickly enabled for x86, PowerPC and ARM, at which point people will start 
to add CFG-based enhancements to the late generic passes (or entire new CFG-
based late generic passes), leaving the architectures with delay slots dead in 
the water.

> But if I still haven't convinced you, I'll prepare a patch along those lines.

I honestly cannot approve a patch that segregates the architectures with delay 
slots from the others.  Now, if another maintainer thinks this is the right 
call to make here, I won't oppose.

-- 
Eric Botcazou


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures

2013-05-23 Thread Jeff Law

On 05/22/2013 11:20 PM, Steven Bosscher wrote:
> On Thu, May 23, 2013 at 4:07 AM, Jeff Law wrote:
>> On 05/22/2013 04:07 PM, Steven Bosscher wrote:
>>>
>>> The problem here is two things:
>>>
>>> 1. Many GCC developers still don't fully grasp the difference between
>>> cfglayout mode and the older cfgrtl mode.
>>
>> Absolutely true.  I'd actually love it if someone (you?) could write up the
>> basics of cfglayout mode.  Why does it exist, what things should a developer
>> need to know about it, etc.
>
> I did that already, way back in 2006, based on a presentation I gave
> at the Moscow Gelato meeting (see attachment). I first posted it as a
> patch for doc/cfg.texi (twice) but it never got reviewed, so I put it
> on the GCC Wiki instead: http://gcc.gnu.org/wiki/cfglayout_mode
>
> The change-over to MoinMoin mangled the page markup so it's a bit
> messy right now. It's also 7 year old text that hasn't been updated to
> reflect the current state of the compiler (with almost all pre
> register allocator passes working in cfglayout mode).
>
> I will clean it up again and create a new diff for cfg.texi.

Thanks.  I wasn't aware of that wiki page.  I'll be reading it today :-)

jeff


Re: [PATCH][gensupport] Add optional attributes field to define_cond_exec

2013-05-23 Thread Michael Zolotukhin
Hi Kyrylo, Richard,

> What would be the function of (set_attr "ds_predicable" "yes") ?
> Doesn't the use of  already trigger the substitution?
To use define_subst, one doesn't need to write (set_attr
"ds_predicable" "yes") - it's triggered by mentioning any of the connected
subst-attributes in the pattern.

> ... But I'd like to keep using the
> "predicable" attribute
> the way it's used now to mark patterns for cond_exec'ednes.
If you decide to move to define_subst from cond_exec, then I'd suggest
to avoid using 'predicable' attribute - it could involve cond_exec
after or before define_subst and that's definitely not what you might
want to get.

If you describe in more detail which patterns you're trying to get
from which, I'll try to help with define_subst.

Thanks, Michael

On 23 May 2013 12:56, Kyrylo Tkachov  wrote:
> Hi Richard,
>
>> No, define_subst works across patterns, keyed by attributes.  Exactly
>> like
>> cond_exec, really.
>>
>> But what you ought to be able to do right now is
>>
>> (define_subst "ds_predicable"
>>   [(match_operand 0)]
>>   ""
>>   [(cond_exec (blah) (match_dup 0))])
>>
>> (define_subst_attr "ds_predicable_enabled" "ds_predicable" "no" "yes")
>>
>> (define_insn "blah"
>>   [(blah)]
>>   ""
>>   "@
>>blah
>>blah"
>>   [(set_attr "ds_predicable" "yes")
>>(set_attr "ds_predicated" "")])
>
> What would be the function of (set_attr "ds_predicable" "yes") ?
> Doesn't the use of  already trigger the substitution?
>
>>
>> At which point you can define "enabled" in terms of ds_predicated plus
>> whatever.
>>
>> With a small bit of work we ought to be able to move that ds_predicated
>> attribute to the define_subst itself, so that you don't have to
>> replicate that
>> set_attr line N times.
>
> That would be nice. So we would have to use define_subst instead of
> define_cond_exec
> to generate the cond_exec patterns. But I'd like to keep using the
> "predicable" attribute
> the way it's used now to mark patterns for cond_exec'ednes.
>
> So you'd recommend changing the define_subst machinery to handle that
> ds_predicated attribute?
>
>
>   I think that's more or less what you were
>> suggesting
>> with your cond_exec extension, yes?
>
> Pretty much, yes. Thanks for the explanation.
>
>>
>>
>>
>> r~
>

--
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


Re: [patch, powerpc] increase array alignment for Altivec

2013-05-23 Thread Bill Schmidt
On Thu, 2013-05-23 at 08:54 -0600, Sandra Loosemore wrote:
> On 05/23/2013 06:29 AM, Bill Schmidt wrote:
> >
> > Sandra and David,
> >
> > The array-alignment patch is performance-neutral with respect to
> > CPU2006.  All variations were in the noise range.
> 
> Well, that settles it; I don't see any reason to pursue the patch any 
> further if it's not a performance win after all.  It probably helped on 
> some specific program or benchmark our original customer was interested 
> in but that was in an older version of GCC, etc.
> 
> Bill, thanks very much for helping with this.

I'm not sure that's the right message to take away here -- this was just
verifying that we didn't see a benchmarking problem with the patch.  It
seems likely that the patch does have benefits; they just aren't exposed
in the particular benchmarks in SPEC CPU2006.

I think the patch is worth pursuing, since there aren't any negative
consequences in reporting benchmarks.

Thanks,
Bill

> 
> -Sandra
> 



Re: RFA: fix rtl-optimization/56833

2013-05-23 Thread Eric Botcazou
> But I can see that there could be a problem with an earlier value
> that used to be valid in a multi-hard-register sub register to be
> considered to be still valid.
> Setting the mode of every constituent register but the first one
> (which has the new value recorded) to VOIDmode at the same time
> as updating reg_set_luid should be sufficient to address this issue.

Agreed.

> The pass was originally written with word_mode == Pmode targets like
> the SH in mind, where multi-hard-register values are uninteresting.
> 
> But for targets like the avr, most or all of the interesting values
> will be in multi-hard-register registers.

The patch is OK on principle, but can you factor out the common code?  The 
endings of move2add_use_add2_insn and move2add_use_add3_insn are identical so 
it would be nice to have e.g. a record_reg_value helper function and call it 
from there.  Similarly, the 3 new checks look strictly identical.
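
As a rough illustration of the suggested helper, here is a toy model. It is not the regmove/postreload code: the arrays, the toy_mode enumeration and the signature of record_reg_value are all assumptions. The helper records a value for a register group in one place and marks every constituent register but the first as VOIDmode, as proposed earlier in the thread.

```cpp
#include <cassert>

// Toy stand-ins for machine modes and the per-register tables.
enum toy_mode { TOY_VOIDmode, TOY_SImode, TOY_DImode };

const int N_TOY_REGS = 8;

struct toy_reg_info
{
  toy_mode mode;    // mode of the value last recorded, or VOIDmode
  int set_luid;     // when the register was last set
  long offset;      // known constant offset of the recorded value
};

static toy_reg_info toy_regs[N_TOY_REGS];

// Hypothetical shared ending for the two "use add" routines: record
// the new value in the first hard register and invalidate the rest of
// the group so a stale narrower value cannot be reused.
void
record_reg_value (int regno, int nregs, toy_mode mode, long offset, int luid)
{
  assert (regno >= 0 && nregs >= 1 && regno + nregs <= N_TOY_REGS);

  toy_regs[regno].mode = mode;
  toy_regs[regno].set_luid = luid;
  toy_regs[regno].offset = offset;

  for (int i = 1; i < nregs; ++i)
    {
      toy_regs[regno + i].mode = TOY_VOIDmode;
      toy_regs[regno + i].set_luid = luid;
    }
}
```

With both callers going through one function, the invalidation rule lives in a single place and the repeated checks can share the same lookup.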

-- 
Eric Botcazou


Re: [patch, powerpc] increase array alignment for Altivec

2013-05-23 Thread Sandra Loosemore

On 05/23/2013 06:29 AM, Bill Schmidt wrote:
>
> Sandra and David,
>
> The array-alignment patch is performance-neutral with respect to
> CPU2006.  All variations were in the noise range.

Well, that settles it; I don't see any reason to pursue the patch any 
further if it's not a performance win after all.  It probably helped on 
some specific program or benchmark our original customer was interested 
in but that was in an older version of GCC, etc.


Bill, thanks very much for helping with this.

-Sandra




[PATCH AArch64] Remove Usa constraint.

2013-05-23 Thread Marcus Shawcroft
Hi,  This patch switches the only use of the "Usa" constraint to use "S" 
instead and removes the definition and documentation for "Usa". 
Regression tested on aarch64-none-elf. Applied.


/Marcus

2013-05-23  Chris Schlumberger-Socha 
Marcus Shawcroft  

* config/aarch64/aarch64.md (*movdi_aarch64): Replace Usa with S.
* config/aarch64/constraints.md (Usa): Remove.
* doc/md.texi (AArch64 Usa): Remove.

Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi	(revision 199258)
+++ gcc/doc/md.texi	(working copy)
@@ -1711,9 +1711,6 @@
 @item Z
 Integer constant zero
 
-@item Usa
-An absolute symbolic address
-
 @item Ush
 The high part (bits 12 and upwards) of the pc-relative address of a symbol
 within 4GB of the instruction
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md	(revision 199258)
+++ gcc/config/aarch64/aarch64.md	(working copy)
@@ -829,8 +829,8 @@
 )
 
 (define_insn "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,*w,m,  m,r,  r,  *w, r,*w,w")
-	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,m, m,rZ,*w,Usa,Ush,rZ,*w,*w,Dd"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,*w,m,  m,r,r,  *w, r,*w,w")
+	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,m, m,rZ,*w,S,Ush,rZ,*w,*w,Dd"))]
   "(register_operand (operands[0], DImode)
 || aarch64_reg_or_zero (operands[1], DImode))"
   "@
Index: gcc/config/aarch64/constraints.md
===
--- gcc/config/aarch64/constraints.md	(revision 199258)
+++ gcc/config/aarch64/constraints.md	(working copy)
@@ -75,11 +75,6 @@
   "Integer constant zero."
   (match_test "op == const0_rtx"))
 
-(define_constraint "Usa"
-  "A constraint that matches an absolute symbolic address."
-  (and (match_code "const,symbol_ref")
-   (match_test "aarch64_symbolic_address_p (op)")))
-
 (define_constraint "Ush"
   "A constraint that matches an absolute symbolic address high part."
   (and (match_code "high")

[PATCH AArch64] Refactor aarch64_mov_operand predicate.

2013-05-23 Thread Marcus Shawcroft
This patch refactors the current implementation of the 
aarch64_mov_operand predicate in preparation for the addition of further 
memory models.  Regression tested on aarch64-none-elf. Applied.


/Marcus

2013-05-23  Chris Schlumberger-Socha 
Marcus Shawcroft  

* config/aarch64/aarch64-protos.h (aarch64_mov_operand_p): Define.
* config/aarch64/aarch64.c (aarch64_mov_operand_p): Define.
* config/aarch64/predicates.md (aarch64_const_address): Remove.
(aarch64_mov_operand): Use aarch64_mov_operand_p.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 001842e..91fcde8 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -146,6 +146,8 @@ bool aarch64_is_long_call_p (rtx);
 bool aarch64_label_mentioned_p (rtx);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, enum machine_mode);
+bool aarch64_mov_operand_p (rtx, enum aarch64_symbol_context,
+			enum machine_mode);
 bool aarch64_pad_arg_upward (enum machine_mode, const_tree);
 bool aarch64_pad_reg_upward (enum machine_mode, const_tree, bool);
 bool aarch64_regno_ok_for_base_p (int, bool);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a0aff58..12a7055 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6455,6 +6455,25 @@ aarch64_simd_imm_scalar_p (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED)
   return true;
 }
 
+bool
+aarch64_mov_operand_p (rtx x,
+		   enum aarch64_symbol_context context ATTRIBUTE_UNUSED,
+		   enum machine_mode mode)
+{
+
+  if (GET_CODE (x) == HIGH
+  && aarch64_valid_symref (XEXP (x, 0), GET_MODE (XEXP (x, 0))))
+return true;
+
+  if (CONST_INT_P (x) && aarch64_move_imm (INTVAL (x), mode))
+return true;
+
+  if (GET_CODE (x) == SYMBOL_REF && mode == DImode && CONSTANT_ADDRESS_P (x))
+return true;
+
+  return false;
+}
+
 /* Return a const_int vector of VAL.  */
 rtx
 aarch64_simd_gen_const_vector_dup (enum machine_mode mode, int val)
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 8514e8f..16c4385 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -115,10 +115,6 @@
(match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), PARALLEL,
 	   0)")))
 
-(define_predicate "aarch64_const_address"
-  (and (match_code "symbol_ref")
-   (match_test "mode == DImode && CONSTANT_ADDRESS_P (op)")))
-
 (define_predicate "aarch64_valid_symref"
   (match_code "const, symbol_ref, label_ref")
 {
@@ -173,12 +169,7 @@
   (and (match_code "reg,subreg,mem,const_int,symbol_ref,high")
(ior (match_operand 0 "register_operand")
 	(ior (match_operand 0 "memory_operand")
-		 (ior (match_test "GET_CODE (op) == HIGH
-   && aarch64_valid_symref (XEXP (op, 0),
-			GET_MODE (XEXP (op, 0)))")
-		  (ior (match_test "CONST_INT_P (op)
-	&& aarch64_move_imm (INTVAL (op), mode)")
-			   (match_test "aarch64_const_address (op, mode)")))))))
+		 (match_test "aarch64_mov_operand_p (op, SYMBOL_CONTEXT_ADR, mode)")))))
 
 (define_predicate "aarch64_movti_operand"
   (and (match_code "reg,subreg,mem,const_int")

Re: [patch] Default to --enable-libstdcxx-time=auto

2013-05-23 Thread Jakub Jelinek
On Thu, May 23, 2013 at 01:04:36PM +0100, Jonathan Wakely wrote:
> > I mean something like completely untested following patch, then it would
> > be pretty much enabled for all non-prehistoric Linux builds (there is a risk
> > of it returning garbage on 2.4.x and earlier kernels, if you compile it on
> > something that defines __NR_clock_gettime in their headers, but the exact
> > same risk is if you do the same with --enable-libstdcxx-time=rt
> > (clock_gettime wrapper in glibc will return -1/ENOSYS in that case, so will
> > the syscall, but chrono.cc seems to ignore that return value)).
> > 2.6+ kernels (2004-ish and later or so) should support CLOCK_MONOTONIC just
> 
> This looks great to me, thanks.
> 
> > fine.  Of course, there is a possibility of fallback, at least for the
> > clock_gettime/syscall CLOCK_REALTIME or gettimeofday, if they fail, fall
> > through into the time case, and for CLOCK_MONOTONIC perhaps just lie and
> > return time as well, shouldn't really affect almost anybody.
> 
> We should consider doing that yes, but it's less urgent.
> 
> > Still, the ABI question is there, would we want to apply to 4.8.1 (can we
> > get agreement on that RSN, this is pretty much the only blocker for 4.8.1
> > rc2 right now) and, would we export that symbol as @@GLIBCXX_3.4.18 (with
> > all trunk @@GLIBCXX_3.4.18 symbols moved to 3.4.19) and add @GLIBCXX_3.4.17
> > alias for backwards compatibility with those that configured with
> > --enable-libstdcxx-time=rt ?
> 
> I like that plan.

So, here is an untested 4.8 branch patch.  The @GLIBCXX_3.4.17 +
@@GLIBCXX_3.4.19 stuff gets ugly, I admit, but I don't have another solution.
Tested just that it compiles/links, abi list looks good and abi.exp testing,
haven't actually tried to test it more than that.

For the trunk the patch will need some adjustments (basically, rename
everything GLIBCXX_3.4.19 to GLIBCXX_3.4.20 first).
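
For readers unfamiliar with the approach, here is a minimal standalone sketch of calling clock_gettime through syscall() so that no librt dependency is needed, with the fallback to time() that was floated above. It is not the chrono.cc change itself; monotonic_now is an invented name, and the raw syscall is assumed to use the same struct timespec layout as the C library, which holds on common 64-bit Linux targets.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <time.h>
#include <unistd.h>
#ifdef __linux__
# include <sys/syscall.h>
#endif

// Return a monotonic timestamp as a duration since an unspecified
// epoch.  If the raw syscall is unavailable or fails (e.g. ENOSYS on
// a very old kernel), fall back to wall-clock seconds from time(),
// i.e. "lie" about monotonicity as discussed in the thread.
std::chrono::nanoseconds
monotonic_now ()
{
#if defined(__linux__) && defined(SYS_clock_gettime)
  struct timespec tp;
  if (syscall (SYS_clock_gettime, CLOCK_MONOTONIC, &tp) == 0)
    return std::chrono::seconds (tp.tv_sec)
	   + std::chrono::nanoseconds (tp.tv_nsec);
#endif
  return std::chrono::seconds (std::time (nullptr));
}
```

Unlike the code in the patch, this sketch checks the return value of the syscall, which is what makes the fallback path reachable.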

2013-05-23  Jakub Jelinek  

* src/c++11/chrono.cc: If _GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL,
include unistd.h and sys/syscall.h.  If _GLIBCXX_COMPATIBILITY_CXX0X,
don't define system_clock::is_steady, system_clock::now() and
steady_clock::is_steady.
(std::chrono::system_clock::now()): If
_GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL, call
syscall (SYS_clock_gettime, ...) instead of clock_gettime (...).
(std::chrono::steady_clock::now()): Likewise.  Add weak attribute
if _GLIBCXX_COMPATIBILITY_CXX0X and compatibility-chrono.cc will
be non-empty.
* src/Makefile.am (cxx11_sources): Add compatibility-chrono.cc.
(compatibility-chrono.lo, compatibility-chrono.o): New goals.
* src/c++11/compatibility-chrono.cc: New file.
* acinclude.m4 (GLIBCXX_ENABLE_LIBSTDCXX_TIME): On linux*, check for
syscall (SYS_clock_gettime, CLOCK_MONOTONIC, &tp).
* testsuite/util/testsuite_abi.cc (check_version): Add
GLIBCXX_3.4.19 version and make it the latest.
* config/abi/pre/gnu.ver (_ZNSt6chrono12steady_clock3nowEv): Export
also @@GLIBCXX_3.4.19.
* config/abi/post/i386-linux-gnu/baseline_symbols.txt: Regenerated.
* config/abi/post/s390x-linux-gnu/baseline_symbols.txt: Regenerated.
* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt:
Regenerated.
* config/abi/post/powerpc64-linux-gnu/32/baseline_symbols.txt:
Regenerated.
* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Regenerated.
* config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt:
Regenerated.
* config/abi/post/powerpc-linux-gnu/baseline_symbols.txt: Regenerated.
* config/abi/post/s390-linux-gnu/baseline_symbols.txt: Regenerated.
* config/abi/post/i486-linux-gnu/baseline_symbols.txt: Regenerated.
* config.h.in: Regenerated.
* src/Makefile.in: Regenerated.
* configure: Regenerated.

--- libstdc++-v3/src/c++11/chrono.cc.jj 2013-03-16 08:07:57.0 +0100
+++ libstdc++-v3/src/c++11/chrono.cc 2013-05-23 15:33:07.238690149 +0200
@@ -32,13 +32,18 @@
  defined(_GLIBCXX_USE_GETTIMEOFDAY)
 #include 
 #endif
+#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL
+#include <unistd.h>
+#include <sys/syscall.h>
+#endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
   namespace chrono
   {
   _GLIBCXX_BEGIN_NAMESPACE_VERSION
- 
+
+#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
 constexpr bool system_clock::is_steady;
 
 system_clock::time_point
@@ -47,7 +52,11 @@ namespace std _GLIBCXX_VISIBILITY(defaul
 #ifdef _GLIBCXX_USE_CLOCK_REALTIME
   timespec tp;
   // -EINVAL, -EFAULT
+#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL
+  syscall(SYS_clock_gettime, CLOCK_REALTIME, &tp);
+#else
   clock_gettime(CLOCK_REALTIME, &tp);
+#endif
   return time_point(duration(chrono::seconds(tp.tv_sec)
 + chrono::nanoseconds(tp.tv_nsec)));
 #elif defined(_GLIBCXX_USE_GETTIMEOFDAY)
@@ -61,16 +70,29 @@ namespace std _GLIBCXX_VISIBILITY(defau

Re: [PATCH] PR57377: Fix mnemonic attribute

2013-05-23 Thread Steven Bosscher
On Thu, May 23, 2013 at 3:52 PM, Andreas Krebbel wrote:
> Hi,
>
> when looking for user defined "mnemonic" attribute definitions the
> code so far only handles (set_attr ...).
>
> Fixed with the attached patch.

Can you also please document this attribute? It's missing from the
gccint manual, AFAICT.

Ciao!
Steven


[PATCH] PR57377: Fix mnemonic attribute

2013-05-23 Thread Andreas Krebbel
Hi,

when looking for user defined "mnemonic" attribute definitions the
code so far only handles (set_attr ...).

Fixed with the attached patch.
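
For readers without gensupport.c at hand, the check can be sketched roughly as
follows.  This is only a simplified model and not the code from the patch:
AttrForm, AttrSpec and has_user_mnemonic are made-up stand-ins for the rtx
codes and accessors, and the attribute name "mnemonic" is written out instead
of using MNEMONIC_ATTR_NAME.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the RTL forms an insn's attribute vector may
// contain; these are NOT GCC's real rtx codes.
enum class AttrForm { SetAttr, SetAttrAlternative, Set, Other };

// Hypothetical, flattened view of one attribute specification.
struct AttrSpec
{
  AttrForm form;
  std::string attr_name;  // name operand, or the name inside (attr ...) for Set
  bool dest_is_attr;      // only meaningful for Set: destination is (attr ...)
};

// Return true if any specification already assigns the "mnemonic" attribute,
// whichever of the three forms it is written in.
bool has_user_mnemonic (const std::vector<AttrSpec> &specs)
{
  for (const AttrSpec &s : specs)
    switch (s.form)
      {
      case AttrForm::SetAttr:
      case AttrForm::SetAttrAlternative:
        // (set_attr "mnemonic" ...) and (set_attr_alternative "mnemonic" ...)
        if (s.attr_name == "mnemonic")
          return true;
        break;
      case AttrForm::Set:
        // (set (attr "mnemonic") ...)
        if (s.dest_is_attr && s.attr_name == "mnemonic")
          return true;
        break;
      default:
        break;
      }
  return false;
}
```

Before the fix only the first of the three forms was recognised, so a pattern
using one of the other two would still get an automatically generated
mnemonic attribute on top of the user-defined one.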

Tested on s390x.

Ok?

Bye,

-Andreas-

2013-05-23  Andreas Krebbel  

PR target/57377
* gensupport.c (gen_mnemonic_attr): Handle (set (attr x) y) and
(set_attr_alternative x ...) when searching for user defined
mnemonic attribute.

---
 gcc/gensupport.c |   27 !!!
 1 file changed, 27 modifications(!)

Index: gcc/gensupport.c
===
*** gcc/gensupport.c.orig
--- gcc/gensupport.c
*** gen_mnemonic_attr (void)
*** 2430,2443 
bool found = false;
  
/* Check if the insn definition already has
!(set_attr "mnemonic" ...).  */
if (XVEC (insn, 4))
for (i = 0; i < XVECLEN (insn, 4); i++)
! if (strcmp (XSTR (XVECEXP (insn, 4, i), 0), MNEMONIC_ATTR_NAME) == 0)
!   {
! found = true;
! break;
!   }
  
if (!found)
gen_mnemonic_setattr (mnemonic_htab, insn);
--- 2430,2458 
bool found = false;
  
/* Check if the insn definition already has
!(set_attr "mnemonic" ...) or (set (attr "mnemonic") ...).  */
if (XVEC (insn, 4))
for (i = 0; i < XVECLEN (insn, 4); i++)
! {
!   rtx set_attr = XVECEXP (insn, 4, i);
! 
!   switch (GET_CODE (set_attr))
! {
! case SET_ATTR:
! case SET_ATTR_ALTERNATIVE:
!   if (strcmp (XSTR (set_attr, 0), MNEMONIC_ATTR_NAME) == 0)
! found = true;
!   break;
! case SET:
!   if (GET_CODE (SET_DEST (set_attr)) == ATTR
!   && strcmp (XSTR (SET_DEST (set_attr), 0),
!  MNEMONIC_ATTR_NAME) == 0)
! found = true;
!   break;
! default:
!   break;
! }
! }
  
if (!found)
gen_mnemonic_setattr (mnemonic_htab, insn);



Re: [AArch64] Support for CLZ

2013-05-23 Thread Marcus Shawcroft
On 22 May 2013 12:47, Vidya Praveen  wrote:
> Hello,
>
> This patch adds support for the AdvSIMD CLZ instruction and adds tests for
> the same.
> Regression test done for aarch64-none-elf with no issues.
>
> OK?
>
> Regards
> VP
>
> ---
>
> gcc/ChangeLog
>
> 2013-05-22  Vidya Praveen 
>
> * config/aarch64/aarch64-simd.md (clzv4si2): Support for CLZ
>   instruction (AdvSIMD).
> * config/aarch64/aarch64-builtins.c
>   (aarch64_builtin_vectorized_function): Handler for BUILT_IN_CLZ.
> * config/aarch64/aarch64-simd-builtins.def: Entry for CLZ.
> * testsuite/gcc.target/aarch64/vect-clz.c: New file.

I committed this for you, and moved the testsuite ChangeLog entry over
to gcc/testsuite/ChangeLog.

Cheers
/Marcus


Re: [libgfortran, build] Use -z ignore instead of --as-needed on Solaris

2013-05-23 Thread Rainer Orth
Hi Tobias,

>> Rainer Orth wrote:
>>> As requested by Tobias, this patch supports -z ignore with Solaris ld
>>> instead of GNU ld's --as-needed.
>>
>> For reference, my request was motivated by
>> http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00425.html
>> (The patch has been approved, but it does not seem to be in, yet.)
>
> the patch went in shortly after Paolo's approval, followed recently by
> another one to fix major fallout.  The latter prompted me to wait until
> I tackle this one...
>
>>> i386-pc-solaris2.10 and x86_64-unknown-linux-gnu bootstraps are still
>>> running.  In both cases, the correct options were detected and written
>>> into libgfortran.spec.  AFAICS the -static-libgfortran option isn't
>>> exercised anywhere in the testsuite, so I've both relinked one of the
>>> gfortran.dg testcases and a trivial FORTRAN hello world program with
>>> -static-libgfortran.  -z ignore/--as-needed is passed correctly in both
>>> cases, but while libgfortran is now linked statically, libquadmath.so is
>>> still dragged in due to references to at least quadmath_snprintf.  I
>>> thus can't tell if this --as-needed/-z ignore stuff ever does any good.
>>
>> Well, it kind of works - but seemingly not fully. If I use:
>>print *, "Hello World"; end
>> with -static-libgfortran, I get ("nm a.out"):
>>  w quadmath_snprintf@@QUADMATH_1.0
>>
>> While using a quad-precision variable, e.g.,
>>print *, 123.4_16; end
>> gives
>>  U quadmath_snprintf@@QUADMATH_1.0
>
> Still the effect is the same: both binaries depend on libquadmath.so.
> TBH, I don't know why the use of --as-needed/-z ignore should depend on
> -static-libgfortran at all.
>
>> I don't know whether one could do better.
>
> If there are no scenarios where this machinery avoids the libquadmath.so
> dependency completely, I don't really see a good use for it.
>
>> +  # Test for native Solaris options first.
>>
>> Is there a reason that you first test the Solaris's options?
>
> Yes: Solaris ld from Solaris 11 onwards (sometimes backported to Solaris
> 10) has gained support for gld options for compatibility.  --as-needed
> isn't among the supported ones yet, but there's an open bug for that.
>
> I think it's better to stay with the native options if possible, so I
> prefer the Solaris ones over the GNU compatibility ones.
>
>> +  # No whitespace after -z to pass it through -Wl.
>>
>> (By the way, you can use "-Wl,-z,ignore" if you want to have the space. For
>> the purpose of this patch, the space doesn't matter.)
>
> I know.
>
>>> Ok for mainline if testing passes?
>>
>> Looks fine to me - I don't know whether a build maintainer has still a
>> comment.
>
> Testing has passed successfully now on both targets.

how should we proceed with this patch now, given the questions above?
Install as is, although it doesn't seem really beneficial, or drop it?

Thanks.
Rainer


>>> 2013-05-14  Rainer Orth  
>>>
>>> * acinclude.m4 (libgfor_cv_have_as_needed): Check for -z ignore, too.
>>> * configure: Regenerate.

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures

2013-05-23 Thread Teresa Johnson
On Wed, May 22, 2013 at 2:05 PM, Teresa Johnson  wrote:
> Revised patch included below. The spacing of my pasted in patch text
> looks funky again, let me know if you want the patch as an attachment
> instead.
>
> I addressed all of Steven's comments, except for the suggestion to use
> gcc_assert instead of error() in verify_hot_cold_block_grouping() to keep
> this consistent with the rest of the verify_flow_info subroutines (let me
> know if this is ok).

I fixed this issue too, which was actually in
insert_section_boundary_note(), so that it gcc_asserts more
efficiently as suggested. Retested, latest patch below.

Honza, would you be able to review the patch?

Thanks!
Teresa

>
> The other main changes:
> (1) Added several test cases (cloned from the torture subdirectories,
> where I manually built/ran with FDO and -freorder-blocks-and-partition
> with both the current trunk and my fixed trunk compiler, and was able
> to expose some failures I fixed).
> (2) Changed existing tree-prof tests that used
> -freorder-blocks-and-partition to be built with -O2 instead of -O, so
> that partitioning actually kicks in.
> (3) Fixed a couple of failures in the new
> verify_hot_cold_block_grouping() checks exposed by the torture tests I
> ran manually with splitting (2 of the tests cloned to tree-prof in this
> patch). One was in computed goto where we were too aggressive about
> cloning crossing edges, and the other was in rtl_split_edge called from
> the "stack" pass which was not correctly inserting the new bb in the
> correct partition since bb layout is complete at that point.
>
> Re-tested on x86_64-unknown-linux-gnu with bootstrap and profiledbootstrap
> builds and regression testing. Re-built/ran cpu2006int with profile
> feedback and -freorder-blocks-and-partition enabled.
>
> Ok for trunk?
>
> Thanks!
> Teresa

2013-05-23  Teresa Johnson  

* ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
as this is now done by redirect_edge_and_branch_force.
* function.c (thread_prologue_and_epilogue_insns): Insert new bb after
barriers, and fix interaction with splitting.
* emit-rtl.c (try_split): Copy REG_CROSSING_JUMP notes.
* cfgcleanup.c (try_forward_edges): Fix early return value to properly
reflect changes made in the routine.
* bb-reorder.c (emit_barrier_after_bb): Move to cfgrtl.c.
(fix_up_fall_thru_edges): Remove incorrect check for bb layout order
since this is called in cfglayout mode, and replace partition fixup
with assert as that is now done by force_nonfallthru_and_redirect.
(add_reg_crossing_jump_notes): Handle the fact that some jumps may
already be marked with region crossing note.
(insert_section_boundary_note): Make non-static, gate on flag
has_bb_partition, rewrite to also check for multiple partitions.
(rest_of_handle_reorder_blocks): Remove call to
insert_section_boundary_note, now done later during free_cfg.
(duplicate_computed_gotos): Don't duplicate partition crossing edge.
* bb-reorder.h (insert_section_boundary_note): Declare.
* Makefile.in (cfgrtl.o): Depend on bb-reorder.h.
* cfgrtl.c (rest_of_pass_free_cfg): If partitions exist
invoke insert_section_boundary_note.
(try_redirect_by_replacing_jump): Remove unnecessary
check for region crossing note.
(fixup_partition_crossing): New function.
(rtl_redirect_edge_and_branch): Fixup partition boundaries.
(emit_barrier_after_bb): Move here from bb-reorder.c, handle insertion
in non-cfglayout mode.
(force_nonfallthru_and_redirect): Fixup partition boundaries,
remove old code that tried to do this. Emit barrier correctly
when we are in cfglayout mode.
(last_bb_in_partition): New function.
(rtl_split_edge): Correctly fixup partition boundaries.
(commit_one_edge_insertion): Remove old code that tried to
fixup region crossing edge since this is now handled in
split_block, and set up insertion point correctly since
block may now end in a jump.
(verify_hot_cold_block_grouping): Guard against checking when not in
linearized RTL mode.
(rtl_verify_edges): Add checks for incorrect/missing REG_CROSSING_JUMP
notes.
(rtl_verify_flow_info_1): Move verify_hot_cold_block_grouping to
rtl_verify_flow_info, so not called in cfglayout mode.
(rtl_verify_flow_info): Move verify_hot_cold_block_grouping here.
(fixup_reorder_chain): Remove old code that attempted to fixup region
crossing note as this is now handled in force_nonfallthru_and_redirect.
(duplicate_insn_chain): Don't duplicate switch section notes.
(rtl_can_remove_branch_p): Remove unnecessary check for region crossing
note.
* basic-block.h (emit_barrier_after_bb): Declare.
* testsuite/gcc.dg/tree-prof/va-arg-pack-1.c: Cloned from c-torture, made
into -freorder-blocks-and-partition test.
* testsuite/gcc.dg/tree-prof/comp-goto-1.c: Ditto.
* testsuite/gcc.dg/tree-prof/20041218-1

Re: [GOOGLE] Fix bad merge of libstdc++-v3/libsupc++/Makefile.am

2013-05-23 Thread Diego Novillo

On 2013-05-23 08:43 , Evgeniy Stepanov wrote:

Hi,

r194664 in google/gcc-4_7 lost one line in
libstdc++-v3/libsupc++/Makefile.am and did not regenerate Makefile.in
(it seems to have been edited manually).

Now re-running automake in libstdc++ results in a non-trivial diff.

The attached patch updates Makefile.am. With this patch the difference
in generated files becomes trivial (line breaks only).


OK.


Diego.


Re: [testsuite] Disabling gcc.dg/cpp/trad/include.c for Android

2013-05-23 Thread Kirill Yukhin
Checked into trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-05/msg00770.html

Thanks, K

On Tue, Apr 30, 2013 at 10:24 AM, Alexander Ivchenko  wrote:
> 2013/4/29 Mike Stump :
>> On Jan 9, 2013, at 7:14 AM, Alexander Ivchenko  wrote:
>>>  We have test fail for gcc.dg/cpp/trad/include.c on Android. The
>>> reason for that is that
>>> -ftraditional-cpp is not expected to work on Android due to variadic
>>> macro (like #define __builtin_warning(x, y...))
>>> in standard headers and traditional preprocessor cannot handle them.
>>>  The attached patch disables that test.
>>
>> Be sure to ask, Ok? in your patch submittals.
>>
>> Ok.
>
> thank you! I thought I did ask..
>
>> ...
>> in standard headers and traditional preprocessor cannot handle them."
>>
>> is it ok for trunk?
>>
>
> could someone commit that patch please? I don't have commit access.
>
> thanks,
> Alexander


[GOOGLE] Fix bad merge of libstdc++-v3/libsupc++/Makefile.am

2013-05-23 Thread Evgeniy Stepanov
Hi,

r194664 in google/gcc-4_7 lost one line in
libstdc++-v3/libsupc++/Makefile.am and did not regenerate Makefile.in
(it seems to have been edited manually).

Now re-running automake in libstdc++ results in a non-trivial diff.

The attached patch updates Makefile.am. With this patch the difference
in generated files becomes trivial (line breaks only).


1.patch
Description: Binary data


Re: C++ PATCH for c++/56930 (wrong -Wconversion warning with sizeof)

2013-05-23 Thread Jason Merrill

On 05/23/2013 01:26 AM, Jakub Jelinek wrote:

Is this sufficient though?


No, but it handles the most common case and is safer than the version on 
the trunk, which already required me to fix a couple of holes in the 
constexpr code.  If no more holes turn up, we could move the trunk 
version to the branch later.


Jason



Re: [patch, powerpc] increase array alignment for Altivec

2013-05-23 Thread Bill Schmidt
On Tue, 2013-05-21 at 21:57 -0400, David Edelsohn wrote:
> On Tue, May 21, 2013 at 7:13 PM, Sandra Loosemore
>  wrote:
> > On 05/21/2013 04:04 PM, David Edelsohn wrote:
> >>
> >>
> >> There are three issues here:
> >>
> >> 1) Someone in the LTC toolchain team needs to benchmark this patch on
> >> POWER7.
> >
> >
> > That would be great if somebody else could help with that.
> >
> >
> >> 2) We need to clarify how the patch affects the ABI because it cannot
> >> break the ABI.
> >
> >
> > I understand this.
> >
> >
> >> 3) Please stop saying that you cannot justify trying to get the patch
> >> in mainline.  Other developers have pointed out how the patch may be
> >> incorrect. Do you want to deliver a broken compiler to CodeSourcery's
> >> customers? The comment sets a bad tone for engaging with the GCC
> >> community.
> >
> >
> > I think you've misunderstood my position, here.  Delivering a broken
> > compiler is just what I want to avoid!  We've had the original
> > local-arrays-only patch in our local tree for a couple of years now, but we
> > no longer have a customer for it.  I thought the comments from the previous
> > review would be straightforward to address and it would be worth making one
> > more attempt to revise and resubmit the patch, but if the feedback we get
> > from the community is that this is still broken in other ways and is going
> > to need a lot more work before it's acceptable, we're going to give up on it
> > and revert the previous version of the patch locally too.  We have a lot of
> > higher-priority patches in our local tree that we'd like to get on mainline,
> > and limited resources for working on it, so we need to pick our battles.
> > That's all.  :-)
> 
> I think the local arrays patch makes sense, if it does not hurt
> performance. We had another recent case where increasing GCC's
> knowledge about the alignment of memory returned by malloc allowed
> additional vectorization opportunities, but hurt performance because
> of bad spilling choices by GCC RA.  This alignment patch may expose
> similar RA problems.  We may need to apply the patch with the
> optimization disabled until the RA spilling problem is fixed.
> 
> Increasing the alignment of arrays within structs and unions would be
> nice, but that probably will change the ABI. I think that the best we
> may be able to do is increase the alignment if the array is the first
> element of the struct or union, see ROUND_TYPE_ALIGN for AIX.
> Although this might be more trouble than it is worth.
> 
> Pat or Bill, can you test the performance of the array alignment patch?

Sandra and David,

The array-alignment patch is performance-neutral with respect to
CPU2006.  All variations were in the noise range.

Thanks,
Bill

> 
> Thanks, David
> 



Re: [patch] Default to --enable-libstdcxx-time=auto

2013-05-23 Thread Jonathan Wakely
On 23 May 2013 11:28, Jakub Jelinek wrote:
> On Wed, May 22, 2013 at 02:35:40PM +0200, Jakub Jelinek wrote:
>> non-steady clock instead.  Or, have you also considered just using
>> for this routine
>> #if _GLIBCXX_HAS_SYS_SYSCALL_H
>> #include <sys/syscall.h>
>> #endif
>>
>> #if defined (SYS_clock_gettime) && defined (CLOCK_MONOTONIC)
>> syscall (SYS_clock_gettime, CLOCK_MONOTONIC, &tp);
>> #endif
>> if clock_gettime isn't available, at least on Linux?
>> The implementation seems to ignore ENOSYS from clock_gettime, so ignoring it
>> even here wouldn't make it worse.
>
> I mean something like completely untested following patch, then it would
> be pretty much enabled for all non-prehistoric Linux builds (there is a risk
> of it returning garbage on 2.4.x and earlier kernels, if you compile it on
> something that defines __NR_clock_gettime in their headers, but the exact
> same risk is if you do the same with --enable-libstdcxx-time=rt
> (clock_gettime wrapper in glibc will return -1/ENOSYS in that case, so will
> the syscall, but chrono.cc seems to ignore that return value)).
> 2.6+ kernels (2004-ish and later or so) should support CLOCK_MONOTONIC just

This looks great to me, thanks.

> fine.  Of course, there is a possibility of fallback, at least for the
> clock_gettime/syscall CLOCK_REALTIME or gettimeofday, if they fail, fall
> through into the time case, and for CLOCK_MONOTONIC perhaps just lie and
> return time as well, shouldn't really affect almost anybody.

We should consider doing that yes, but it's less urgent.

> Still, the ABI question is there, would we want to apply to 4.8.1 (can we
> get agreement on that RSN, this is pretty much the only blocker for 4.8.1
> rc2 right now) and, would we export that symbol as @@GLIBCXX_3.4.18 (with
> all trunk @@GLIBCXX_3.4.18 symbols moved to 3.4.19) and add @GLIBCXX_3.4.17
> alias for backwards compatibility with those that configured with
> --enable-libstdcxx-time=rt ?

I like that plan.


Re: [PATCH,i386] FP Reassociation for AMD bdver1 and bdver2

2013-05-23 Thread Uros Bizjak
On Thu, May 23, 2013 at 1:11 PM, Gopalasubramanian, Ganesh
 wrote:

> The patch enables the FP Reassociation pass for AMD bdver1 and bdver2
> architectures.  We note a performance uplift of around 8% on calculix.
>
> "make -k check" passes.
>
> Is it OK for upstream?

OK.

Thanks,
Uros.


[PATCH,i386] FP Reassociation for AMD bdver1 and bdver2

2013-05-23 Thread Gopalasubramanian, Ganesh
Hi 

The patch enables the FP Reassociation pass for AMD bdver1 and bdver2
architectures.  We note a performance uplift of around 8% on calculix.

"make -k check" passes.

Is it OK for upstream?

Regards
Ganesh

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 199133)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2013-05-23  Ganesh Gopalasubramanian  
+
+* config/i386/i386.c (initial_ix86_tune_features): Enable
+FP Reassociation for AMD bdver1 and bdver2.
+
 2013-05-21  Christian Bruel  

 * dwarf2out.c (multiple_reg_loc_descriptor): Use dbx_reg_number for
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 199133)
+++ gcc/config/i386/i386.c  (working copy)
@@ -2026,7 +2026,7 @@

   /* X86_TUNE_REASSOC_FP_TO_PARALLEL: Try to produce parallel computations
  during reassociation of fp computation.  */
-  m_ATOM | m_HASWELL,
+  m_ATOM | m_HASWELL | m_BDVER1 | m_BDVER2,

   /* X86_TUNE_GENERAL_REGS_SSE_SPILL: Try to spill general regs to SSE
  regs instead of memory.  */



-Original Message-
From: Gopalasubramanian, Ganesh 
Sent: Monday, May 13, 2013 5:24 PM
To: gcc-patches@gcc.gnu.org
Cc: Uros Bizjak (ubiz...@gmail.com)
Subject: [PATCH,i386] FSGSBASE for AMD bdver3 

Hi 

The patch enables FSGSBASE instruction generation for AMD bdver3 architectures.

"make -k check" passes.

Is it OK for upstream?

Regards
Ganesh

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 198821)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2013-05-13  Ganesh Gopalasubramanian  
+
+* config/i386/i386.c (processor_alias_table): Add instruction
+FSGSBASE for AMD bdver3 architecture.
+
 2013-05-13  Martin Jambor  

PR middle-end/42371
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 198821)
+++ gcc/config/i386/i386.c  (working copy)
@@ -3000,7 +3000,7 @@
| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE
-   | PTA_XSAVEOPT},
+   | PTA_XSAVEOPT | PTA_FSGSBASE},
   {"btver1", PROCESSOR_BTVER1, CPU_GENERIC64,
PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_PRFCHW



Re: Remove global state from gcc/tracer.c

2013-05-23 Thread David Malcolm
On Thu, 2013-05-23 at 07:14 +0200, Jakub Jelinek wrote:
> On Wed, May 22, 2013 at 08:45:45PM -0400, David Malcolm wrote:
> > I'm attempting to eliminate global state from the insides of gcc.
> > 
> > gcc/tracer.c has various global variables, which are only used during
> > the lifetime of the execute callback of that pass, and cleaned up at the
> > end of each invocation of the pass.
> > 
> > The attached patch introduces a class to hold the state of the pass
> > ("tracer_state"), eliminating these globals.  An instance of the state
> > is created on the stack, and all of the various "static" functions in
> > tracer.c that used the globals become member functions of the state.
> > Hence the state is passed around by the implicit "this" of the
> > tracer_state, avoiding the need to patch each individual use of a field
> > within the state, minimizing the diff.
> 
> But do we want to handle the global state this way?  This adds overhead
> to (almost?) every single function (now method) in the file (because it gets
> an extra argument).  While that might be fine for rarely executed functions,
> if it involves also hot functions called many times, where especially on
> register starved hosts it could increase register pressure, plus the
> overhead of passing the this argument everywhere, this could start to be
> noticeable.  Sure, if you plan to do that just in one pass (but, why then?),
> it might be tiny slowdown, but after you convert the hundreds of passes in
> gcc that contain global state it might become significant.
> 
> There are alternative approaches that should be considered.

I thought of a possible way of doing this, attached is a
proof-of-concept attempt.

The idea is to use (and then not use) C++'s "static" syntax for class
methods and fields.  By making that optional with a big configure-time
switch, it gives us a way of making state be either global vs on-stack,
with minimal syntax changes.  In one configuration (for building gcc as
a library) there would be implicit this-> throughout, but in the other
(for speedy binaries) it would all compile away to global state, as per
the status quo.

This assumes that doing:

   tracer_state state;
   changed = state.tail_duplicate ();

is legitimate; when using global state, "state" is empty, and the call
to
  state.tail_duplicate ()
becomes effectively:
  tracer_state::tail_duplicate ()
since it's static in that configuration.
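
To make the idea concrete, here is a stripped-down sketch of the trick.  It
is not the attached proof-of-concept: MAYBE_STATIC and tracer_state_sketch
are names invented for this illustration, and the single field and the
trivial body of tail_duplicate are placeholders.

```cpp
#include <cassert>

// Hypothetical build-wide switch: 1 = global state (standalone compiler),
// 0 = on-stack state reached through an implicit this-> (library build).
#ifndef GLOBAL_STATE
#define GLOBAL_STATE 1
#endif

#if GLOBAL_STATE
#define MAYBE_STATIC static
#else
#define MAYBE_STATIC
#endif

class tracer_state_sketch
{
public:
  // Static member function in one configuration, ordinary method in the other.
  MAYBE_STATIC bool tail_duplicate ();

private:
  // Static data member (i.e. a global) or per-instance field.
  MAYBE_STATIC int probability_cutoff;
};

#if GLOBAL_STATE
// Static data members need an out-of-class definition.
int tracer_state_sketch::probability_cutoff;
#endif

// The body is written once; field accesses resolve to either the global or
// this->probability_cutoff depending on the configuration.
bool tracer_state_sketch::tail_duplicate ()
{
  probability_cutoff = 50;
  return probability_cutoff > 0;
}
```

In both configurations the caller writes the same thing, e.g.
"tracer_state_sketch state; changed = state.tail_duplicate ();", since C++
allows a static member function to be called through an object.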

> E.g. global state of a pass can be moved into a per-pass structure,
> and have some way how to aggregate those per pass structures together from
> all the passes in the whole compiler (that can be either manual process,
> say each pass providing its own *-passstate.h and one big header including
> all that together), or automatic ones (say gengstate or a new tool could
> create those for us from special markings in the source, say new option on
> GTY or something) and have some magic macro how to access the global state
> within the pass (thispass->fieldname ?).  Then e.g. depending on how the
> compiler would be configured and built, thispass could be just address of a
> pass struct var (i.e. essentially keep the global state as is, for
> performance reasons), or when trying to build compiler as a library (with
> -fpic overhead we probably don't want for cc1/cc1plus - we can build all the
> *.o files twice, like libtool does) thispass could expand to __thread
> pointer var dereference plus a field inside of the global compiler state
> structure it points to for the current pass.  Thus, the library version
> of the compiler would be somewhat slower (both -fpic overhead and TLS
> overhead), and would need either a few of the entrypoints tweaked to adjust
> the TLS pointer to the global state, or we could require users to just call
> a special function to make the global state current in the current thread
> before calling compiler internals.
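
For comparison, the suggestion quoted above could look roughly like the
following.  This is only my reading of it, as a sketch: tracer_pass_state,
compiler_state, LIBRARY_BUILD and make_state_current are hypothetical names,
and C++11 thread_local stands in for the __thread variable mentioned above.

```cpp
#include <cassert>

// Hypothetical per-pass structure holding what used to be file-scope globals.
struct tracer_pass_state
{
  int probability_cutoff;
  int branch_ratio_cutoff;
};

// Hypothetical aggregate of all per-pass structures in the compiler.
struct compiler_state
{
  tracer_pass_state tracer;
};

// Hypothetical configure-time switch for the library build.
#ifndef LIBRARY_BUILD
#define LIBRARY_BUILD 0
#endif

#if LIBRARY_BUILD
// Library build: thispass goes through a thread-local pointer.
static thread_local compiler_state *current_state;
#define thispass (&current_state->tracer)
inline void make_state_current (compiler_state *s) { current_state = s; }
#else
// Standalone build: thispass is the address of a plain global, as today.
static compiler_state the_state;
#define thispass (&the_state.tracer)
inline void make_state_current (compiler_state *) {}
#endif

// Pass code is written once in terms of thispass->field.
int set_and_read_cutoff (int value)
{
  thispass->probability_cutoff = value;
  return thispass->probability_cutoff;
}
```

The functions keep their current signatures here, so no extra argument is
passed around; the cost moves into the TLS access in the library build.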

Thanks.   Though I thought we were trying to move away from relying on
GTY parsing?   (Sorry not to be able to answer more fully yet, need to
get family ready for school...)

Dave
diff --git a/gcc/tracer.c b/gcc/tracer.c
index 975cadb..f83ac0b 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -53,20 +53,74 @@
 static int count_insns (basic_block);
 static bool ignore_bb_p (const_basic_block);
 static bool better_p (const_edge, const_edge);
-static edge find_best_successor (basic_block);
-static edge find_best_predecessor (basic_block);
-static int find_trace (basic_block, basic_block *);
 
-/* Minimal outgoing edge probability considered for superblock formation.  */
-static int probability_cutoff;
-static int branch_ratio_cutoff;
+/* Crude testing hack for switching between:
+ global state
+   vs
+ (on-stack state plus implicit this->)
+   This would be a config option controlling the whole build, so that
+   you'd use the former for a standalone build of gcc, and the latter
+   when building the code for use as a dynamic library.  */
+#define GLOBAL_STATE 1
+
+#if GLOBAL_STATE
+/* When us

[PATCH] Fix PR57380

2013-05-23 Thread Richard Biener

This fixes a missed vectorization (missed MAX_EXPR detection
actually).  At some point I made the phiprop pass not transform
a load if it wasn't obvious that the loads would be "direct"
at the end.  But this pessimizes the case in question as it's
not easy to verify if forwprop will later combine the dereference
and a non-invariant address.

So the patch removes that restriction again but arranges for
phiprop to run right before forwprop so that after that and
FRE, phiopt has a chance to optimize to the MAX_EXPR.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2013-05-23  Richard Biener  

PR tree-optimization/57380
* tree-ssa-phiprop.c (propagate_with_phi): Do not require at
least one invariant or re-used load.
* passes.c (init_optimization_passes): Move pass_phiprop before
pass_forwprop.

* g++.dg/tree-ssa/pr57380.C: New testcase.

Index: gcc/tree-ssa-phiprop.c
===
*** gcc/tree-ssa-phiprop.c  (revision 199199)
--- gcc/tree-ssa-phiprop.c  (working copy)
*** propagate_with_phi (basic_block bb, gimp
*** 247,253 
ssa_op_iter i;
bool phi_inserted;
tree type = NULL_TREE;
-   bool one_invariant = false;
  
if (!POINTER_TYPE_P (TREE_TYPE (ptr))
|| !is_gimple_reg_type (TREE_TYPE (TREE_TYPE (ptr))))
--- 247,252 
*** propagate_with_phi (basic_block bb, gimp
*** 282,298 
if (!type
  && TREE_CODE (arg) == SSA_NAME)
type = TREE_TYPE (phivn[SSA_NAME_VERSION (arg)].value);
-   if (TREE_CODE (arg) == ADDR_EXPR
- && is_gimple_min_invariant (arg))
-   one_invariant = true;
  }
  
-   /* If we neither have an address of a decl nor can reuse a previously
-  inserted load, do not hoist anything.  */
-   if (!one_invariant
-   && !type)
- return false;
- 
/* Find a dereferencing use.  First follow (single use) ssa
   copy chains for ptr.  */
while (single_imm_use (ptr, &use, &use_stmt)
--- 281,288 
Index: gcc/passes.c
===
*** gcc/passes.c(revision 199199)
--- gcc/passes.c(working copy)
*** init_optimization_passes (void)
*** 1402,1413 
NEXT_PASS (pass_ccp);
/* After CCP we rewrite no longer addressed locals into SSA
 form if possible.  */
NEXT_PASS (pass_forwprop);
/* pass_build_alias is a dummy pass that ensures that we
 execute TODO_rebuild_alias at this point.  */
NEXT_PASS (pass_build_alias);
NEXT_PASS (pass_return_slot);
-   NEXT_PASS (pass_phiprop);
NEXT_PASS (pass_fre);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_merge_phi);
--- 1402,1413 
NEXT_PASS (pass_ccp);
/* After CCP we rewrite no longer addressed locals into SSA
 form if possible.  */
+   NEXT_PASS (pass_phiprop);
NEXT_PASS (pass_forwprop);
/* pass_build_alias is a dummy pass that ensures that we
 execute TODO_rebuild_alias at this point.  */
NEXT_PASS (pass_build_alias);
NEXT_PASS (pass_return_slot);
NEXT_PASS (pass_fre);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_merge_phi);
Index: gcc/testsuite/g++.dg/tree-ssa/pr57380.C
===
*** gcc/testsuite/g++.dg/tree-ssa/pr57380.C (revision 0)
--- gcc/testsuite/g++.dg/tree-ssa/pr57380.C (working copy)
***
*** 0 
--- 1,21 
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-phiopt1" } */
+ 
+ struct my_array {
+ int data[4];
+ };
+ 
+ const int& my_max(const int& a, const int& b) {
+ return a < b ? b : a;
+ }
+ 
+ int f(my_array a, my_array b) {
+ int res = 0;
+ for (int i = 0; i < 4; ++i) {
+   res += my_max(a.data[i], b.data[i]);
+ }
+ return res;
+ }
+ 
+ /* { dg-final { scan-tree-dump "MAX_EXPR" "phiopt1" } } */
+ /* { dg-final { cleanup-tree-dump "phiopt1" } } */


Re: [patch] Default to --enable-libstdcxx-time=auto

2013-05-23 Thread Jakub Jelinek
On Wed, May 22, 2013 at 02:35:40PM +0200, Jakub Jelinek wrote:
> non-steady clock instead.  Or, have you also considered just using
> for this routine
> #if _GLIBCXX_HAS_SYS_SYSCALL_H
> #include <sys/syscall.h>
> #endif
> 
> #if defined (SYS_clock_gettime) && defined (CLOCK_MONOTONIC)
> syscall (SYS_clock_gettime, CLOCK_MONOTONIC, &tp);
> #endif
> if clock_gettime isn't available, at least on Linux?
> The implementation seems to ignore ENOSYS from clock_gettime, so ignoring it
> even here wouldn't make it worse.

I mean something like the following completely untested patch; with it, this
would be enabled for pretty much all non-prehistoric Linux builds (there is a
risk of it returning garbage on 2.4.x and earlier kernels, if you compile it on
something that defines __NR_clock_gettime in its headers, but the exact
same risk exists if you configure with --enable-libstdcxx-time=rt
(the clock_gettime wrapper in glibc will return -1/ENOSYS in that case, and so
will the syscall, but chrono.cc seems to ignore that return value)).
2.6+ kernels (2004-ish and later or so) should support CLOCK_MONOTONIC just
fine.  Of course, a fallback is possible: for the clock_gettime/syscall
CLOCK_REALTIME or gettimeofday cases, if they fail, fall through into the
time case, and for CLOCK_MONOTONIC perhaps just lie and return time as well;
that shouldn't really affect almost anybody.
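
The fallback order described above could look roughly like the following C
sketch.  It is an illustration only, not the chrono.cc code: the helper name
monotonic_now and its return-code convention are made up here, and it assumes
a Linux-like system whose <sys/syscall.h> may define SYS_clock_gettime.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not a libstdc++ function).  Fills *TP and returns
   0 if the raw clock_gettime syscall worked, 1 if gettimeofday was used,
   2 if only time () was available.  */
static int
monotonic_now (struct timespec *tp)
{
  struct timeval tv;

  assert (tp != NULL);
#if defined (SYS_clock_gettime) && defined (CLOCK_MONOTONIC)
  /* Ask the kernel directly, so no clock_gettime wrapper from librt is
     needed.  Unlike chrono.cc, check the result (e.g. ENOSYS).  */
  if (syscall (SYS_clock_gettime, CLOCK_MONOTONIC, tp) == 0)
    return 0;
#endif
  /* "Lie": wall-clock time is not monotonic, but it beats garbage.  */
  if (gettimeofday (&tv, NULL) == 0)
    {
      tp->tv_sec = tv.tv_sec;
      tp->tv_nsec = tv.tv_usec * 1000L;
      return 1;
    }
  /* Last resort: one-second resolution.  */
  tp->tv_sec = time (NULL);
  tp->tv_nsec = 0;
  return 2;
}
```

A caller that cares about monotonicity can inspect the return value; a caller
that does not can ignore it, which matches the "just lie" option.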

Still, the ABI question is there, would we want to apply to 4.8.1 (can we
get agreement on that RSN, this is pretty much the only blocker for 4.8.1
rc2 right now) and, would we export that symbol as @@GLIBCXX_3.4.18 (with
all trunk @@GLIBCXX_3.4.18 symbols moved to 3.4.19) and add @GLIBCXX_3.4.17
alias for backwards compatibility with those that configured with
--enable-libstdcxx-time=rt ?

--- libstdc++-v3/src/c++11/chrono.cc.jj 2013-03-16 08:07:57.0 +0100
+++ libstdc++-v3/src/c++11/chrono.cc2013-05-23 12:08:04.165686015 +0200
@@ -32,6 +32,9 @@
  defined(_GLIBCXX_USE_GETTIMEOFDAY)
 #include 
 #endif
+#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL
+#include 
+#endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -47,7 +50,11 @@ namespace std _GLIBCXX_VISIBILITY(defaul
 #ifdef _GLIBCXX_USE_CLOCK_REALTIME
   timespec tp;
   // -EINVAL, -EFAULT
+#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL
+  syscall(SYS_clock_gettime, CLOCK_REALTIME, &tp);
+#else
   clock_gettime(CLOCK_REALTIME, &tp);
+#endif
   return time_point(duration(chrono::seconds(tp.tv_sec)
 + chrono::nanoseconds(tp.tv_nsec)));
 #elif defined(_GLIBCXX_USE_GETTIMEOFDAY)
@@ -70,7 +77,11 @@ namespace std _GLIBCXX_VISIBILITY(defaul
 {
   timespec tp;
   // -EINVAL, -EFAULT
+#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL
+  syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &tp);
+#else
   clock_gettime(CLOCK_MONOTONIC, &tp);
+#endif
   return time_point(duration(chrono::seconds(tp.tv_sec)
 + chrono::nanoseconds(tp.tv_nsec)));
 }
--- libstdc++-v3/acinclude.m4.jj2013-04-10 08:32:08.0 +0200
+++ libstdc++-v3/acinclude.m4   2013-05-23 12:03:29.626014601 +0200
@@ -1274,6 +1274,28 @@ AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_TIME]
 fi
   fi
 
+  if test x"$ac_has_clock_monotonic" != x"yes"; then
+AC_MSG_CHECKING([for clock_gettime syscall])
+AC_TRY_COMPILE(
+  [#include 
+   #include 
+   #include 
+  ],
+  [#if _POSIX_TIMERS > 0 && defined(_POSIX_MONOTONIC_CLOCK)
+   timespec tp;
+   #endif
+   syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &tp);
+   syscall(SYS_clock_gettime, CLOCK_REALTIME, &tp);
+  ], [ac_has_clock_monotonic_syscall=yes], [ac_has_clock_monotonic_syscall=no])
+AC_MSG_RESULT($ac_has_clock_monotonic_syscall)
+if test x"$ac_has_clock_monotonic_syscall" = x"yes"; then
+  AC_DEFINE(_GLIBCXX_USE_CLOCK_MONOTONIC_SYSCALL, 1,
+  [ Defined if clock_gettime syscall has monotonic clock support. ])
+  ac_has_clock_monotonic=yes
+  ac_has_clock_realtime=yes
+fi
+  fi
+
   if test x"$ac_has_clock_monotonic" = x"yes"; then
 AC_DEFINE(_GLIBCXX_USE_CLOCK_MONOTONIC, 1,
   [ Defined if clock_gettime has monotonic clock support. ])


Jakub


Re: [patch] Default to --enable-libstdcxx-time=auto

2013-05-23 Thread Rainer Orth
Rainer Orth  writes:

> Jonathan Wakely  writes:
>
>> This alters the configure script to enable C++11 thread library
>> features based on targets that are known to support the features,
>> rather than based on link tests which are disabled by default.  With
>> Glibc 2.17 this enables a nanosecond resolution std::system_clock in
>> the default configuration, yay!
>>
>> I've tested this on two versions of Fedora and Debian, but would be
>> grateful for test results on Solaris, Cygwin and BSD targets, and for
>> cross-compilers to any of those targets.
>
> Apart from the abi_check failure already reported, I get the following
> testsuite regressions on Solaris 10/x86:
>
> FAIL: 30_threads/async/54297.cc (test for excess errors)
> WARNING: 30_threads/async/54297.cc compilation failed to produce executable
> FAIL: 30_threads/condition_variable_any/53830.cc (test for excess errors)
> WARNING: 30_threads/condition_variable_any/53830.cc compilation failed to produce executable
> FAIL: 30_threads/this_thread/3.cc (test for excess errors)
> WARNING: 30_threads/this_thread/3.cc compilation failed to produce executable
> FAIL: 30_threads/this_thread/4.cc (test for excess errors)
> WARNING: 30_threads/this_thread/4.cc compilation failed to produce executable
> FAIL: 30_threads/thread/native_handle/cancel.cc (test for excess errors)
> WARNING: 30_threads/thread/native_handle/cancel.cc compilation failed to produce executable
>
> All of them have the same root cause:
>
> Excess errors:
> Undefined   first referenced
>  symbol in file
> nanosleep   /var/tmp//ccQhmiwd.o  (symbol belongs to implicit dependency /lib/librt.so.1)
> ld: fatal: symbol referencing errors. No output written to ./54297.exe
> collect2: error: ld returned 1 exit status
>
> It seems that now every single C++ program needs to be linked with -lrt,
> not only libstdc++.so.  This will also happen on Solaris 9 (bootstrap
> still running), while on Solaris 11 nanosleep and the others were
> integrated into libc.so.1.
>
> Speaking of Solaris 9, there's another caveat: unlike Solaris 10 and up,
> CLOCK_MONOTONIC isn't defined, while the equivalent non-standard
> CLOCK_HIGHRES is.  Instead of handling this in
> libstdc++-v3/src/c++11/chrono.cc directly, I've chosen the following
> route which allows libstdc++ to build on Solaris 9:
>
> 2013-05-22  Rainer Orth  
>
>   * config/os/solaris/solaris2.9/os_defines.h [!CLOCK_MONOTONIC]
>   (CLOCK_MONOTONIC): Define.

The Solaris 9 (i386-pc-solaris2.9) bootstrap has now completed
successfully, so the patch above seems to be sound.  Ok for mainline?
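
The ChangeLog entry quoted above amounts to something like the following
fragment.  This is a sketch inferred from the description (CLOCK_HIGHRES
being the Solaris 9 equivalent of CLOCK_MONOTONIC), not the text of the
actual os_defines.h change.

```c
#include <time.h>

/* Solaris 9 lacks CLOCK_MONOTONIC but has the non-standard CLOCK_HIGHRES;
   map one onto the other only when the standard name is missing.  On
   systems that already define CLOCK_MONOTONIC this is a no-op.  */
#ifndef CLOCK_MONOTONIC
#define CLOCK_MONOTONIC CLOCK_HIGHRES
#endif
```

Doing this in the OS-specific header keeps chrono.cc itself free of
Solaris-only conditionals.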

Astonishingly, the only failure I see is abi_check.  The nanosleep error
above seems to be linker version dependent.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, PR 57347] Do not create aggregate jump functions for bit-fields

2013-05-23 Thread Richard Biener
On Wed, May 22, 2013 at 6:23 PM, Martin Jambor  wrote:
> Hi,
>
> I have not intended aggregate jump functions to work with bit-fields
> but apparently forgot to include the test to ignore them.  PR 57347
> testcase gives a good example why they need to be avoided.  If we ever
> decide to optimize for them too (and not just in IPA land), they
> should be lowered earlier and jump functions can then be built for the
> stores to the representatives.
>
> The following patch disables their generation.  It passes bootstrap
> and testing on x86_64-linux on trunk, the same for the 4.8 branch is
> currently underway.  OK for trunk and for the branch if it passes?

Ok.

Thanks,
Richard.

> Thanks,
>
> Martin
>
>
> 2013-05-21  Martin Jambor  
>
> PR middle-end/57347
> * tree.h (contains_bitfld_component_ref_p): Declare.
> * tree-sra.c (contains_bitfld_comp_ref_p): Move...
> * tree.c (contains_bitfld_component_ref_p): ...here.  Adjust its 
> caller.
> * ipa-prop.c (determine_known_aggregate_parts): Check that LHS does
> not access a bit-field.  Assert all final offsets are byte-aligned.
>
> testsuite/
> * gcc.dg/ipa/pr57347.c: New test.
>
> Index: src/gcc/ipa-prop.c
> ===
> --- src.orig/gcc/ipa-prop.c
> +++ src/gcc/ipa-prop.c
> @@ -1327,7 +1327,9 @@ determine_known_aggregate_parts (gimple
>
>lhs = gimple_assign_lhs (stmt);
>rhs = gimple_assign_rhs1 (stmt);
> -  if (!is_gimple_reg_type (rhs))
> +  if (!is_gimple_reg_type (rhs)
> + || TREE_CODE (lhs) == BIT_FIELD_REF
> + || contains_bitfld_component_ref_p (lhs))
> break;
>
>lhs_base = get_ref_base_and_extent (lhs, &lhs_offset, &lhs_size,
> @@ -1418,6 +1420,7 @@ determine_known_aggregate_parts (gimple
> {
>   struct ipa_agg_jf_item item;
>   item.offset = list->offset - arg_offset;
> + gcc_assert ((item.offset % BITS_PER_UNIT) == 0);
>   item.value = unshare_expr_without_location (list->constant);
>   jfunc->agg.items->quick_push (item);
> }
> Index: src/gcc/testsuite/gcc.dg/ipa/pr57347.c
> ===
> --- /dev/null
> +++ src/gcc/testsuite/gcc.dg/ipa/pr57347.c
> @@ -0,0 +1,27 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3" } */
> +
> +struct S1 { int f0; int f1 : 10; int f2 : 13; };
> +int i;
> +int *j = &i;
> +
> +static void
> +foo (struct S1 s)
> +{
> +  int *p;
> +  int l[88];
> +  int **pp = &p;
> +  *pp = &l[1];
> +  l[0] = 1;
> +  *j = 1 && s.f2;
> +}
> +
> +int
> +main ()
> +{
> +  struct S1 s = { 0, 0, 1 };
> +  foo (s);
> +  if (i != 1)
> +__builtin_abort ();
> +  return 0;
> +}
> Index: src/gcc/tree-sra.c
> ===
> --- src.orig/gcc/tree-sra.c
> +++ src/gcc/tree-sra.c
> @@ -2998,23 +2998,6 @@ get_repl_default_def_ssa_name (struct ac
>return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
>  }
>
> -/* Return true if REF has a COMPONENT_REF with a bit-field field declaration
> -   somewhere in it.  */
> -
> -static inline bool
> -contains_bitfld_comp_ref_p (const_tree ref)
> -{
> -  while (handled_component_p (ref))
> -{
> -  if (TREE_CODE (ref) == COMPONENT_REF
> -  && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
> -return true;
> -  ref = TREE_OPERAND (ref, 0);
> -}
> -
> -  return false;
> -}
> -
>  /* Return true if REF has an VIEW_CONVERT_EXPR or a COMPONENT_REF with a
> bit-field field declaration somewhere in it.  */
>
> @@ -3110,7 +3093,7 @@ sra_modify_assign (gimple *stmt, gimple_
>  ???  This should move to fold_stmt which we simply should
>  call after building a VIEW_CONVERT_EXPR here.  */
>   if (AGGREGATE_TYPE_P (TREE_TYPE (lhs))
> - && !contains_bitfld_comp_ref_p (lhs))
> + && !contains_bitfld_component_ref_p (lhs))
> {
>   lhs = build_ref_for_model (loc, lhs, 0, racc, gsi, false);
>   gimple_assign_set_lhs (*stmt, lhs);
> Index: src/gcc/tree.c
> ===
> --- src.orig/gcc/tree.c
> +++ src/gcc/tree.c
> @@ -11785,4 +11785,21 @@ warn_deprecated_use (tree node, tree att
>  }
>  }
>
> +/* Return true if REF has a COMPONENT_REF with a bit-field field declaration
> +   somewhere in it.  */
> +
> +bool
> +contains_bitfld_component_ref_p (const_tree ref)
> +{
> +  while (handled_component_p (ref))
> +{
> +  if (TREE_CODE (ref) == COMPONENT_REF
> +  && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
> +return true;
> +  ref = TREE_OPERAND (ref, 0);
> +}
> +
> +  return false;
> +}
> +
>  #include "gt-tree.h"
> Index: src/gcc/tree.h
> ===
> --- src.orig/gcc/tree.h
> +++ src/gcc/tree.h
> @@ -59

Re: [AArch64] Fix possible wrong code generation when comparing DImode values.

2013-05-23 Thread Richard Earnshaw

On 23/05/13 09:17, James Greenhalgh wrote:


Hi,

With the aarch64_cmdi patterns a bug was introduced. While the
unsplit versions of these patterns, which operate in the
SIMD register set, do not clobber CC_REGNUM, the split versions, which
operate in the general purpose register set, do clobber CC_REGNUM.

This causes a problem if scheduling rearranges the unsplit version
of these instructions. For example, if we have:

   aarch64_cmeqdi_unsplit
   set_cc_flags
   jump

Then we could schedule as

   set_cc_flags
   aarch64_cmeqdi_unsplit
   jump

Because the unsplit version does not clobber CC_REGNUM.

If we now decide to split we get:

   set_cc_flags
   aarch64_cmeqdi_set_cc_flags
   aarch64_cmeqdi_use_cc_flags
   jump

And the jump uses the wrong value for cc_flags.

We fix this problem by adding the clobber of CC_REGNUM to
the aarch64_cmdi patterns. This may restrict the
scheduling opportunities available, but should prevent
incorrect code generation.

Tested on aarch64-none-linux-gnu, aarch64-none-elf with no
regressions. The bug manifests itself in the libstdc++ testsuite,
so I've double-checked there to ensure that the bug has cleared.

Thanks,
James

---
gcc/

2013-05-17  James Greenhalgh  

* config/aarch64/aarch64-simd.md
(aarch64_cmdi): Add clobber of CC_REGNUM to unsplit pattern.




OK.

R.




Re: [PATCH] Do not allow non-top-level BIT_FIELD_REFs, IMAGPART_EXPRs or REALPART_EXPRs

2013-05-23 Thread Richard Biener
On Thu, 23 May 2013, Eric Botcazou wrote:

> > earlier this week I asked on IRC whether we could have non-top-level
> > BIT_FIELD_REFs and Richi said that we could.  However, when I later
> > looked at SRA code, quite apparently it is not designed to handle
> > non-top-level BIT_FIELD_REFs, IMAGPART_EXPRs or REALPART_EXPRs.  So in
> > order to test whether that assumption is OK, I added the following
> > into the gimple verifier and ran bootstrap and testsuite of all
> > languages including Ada and ObjC++ on x86_64.  It survived, which
> > makes me wonder whether we do not want it in trunk.
> 
> This looks plausible to me, but I think that you ought to verify the real 
> assumption instead, which is that the type of the 3 nodes is always scalar.
> The non-toplevelness of the nodes is merely a consequence of this property.

Yeah.  But please put the verification into tree-cfg.c:verify_expr
instead.

Thanks,
Richard.


Re: [PATCH] Do not allow non-top-level BIT_FIELD_REFs, IMAGPART_EXPRs or REALPART_EXPRs

2013-05-23 Thread Eric Botcazou
> earlier this week I asked on IRC whether we could have non-top-level
> BIT_FIELD_REFs and Richi said that we could.  However, when I later
> looked at SRA code, quite apparently it is not designed to handle
> non-top-level BIT_FIELD_REFs, IMAGPART_EXPRs or REALPART_EXPRs.  So in
> order to test whether that assumption is OK, I added the following
> into the gimple verifier and ran bootstrap and testsuite of all
> languages including Ada and ObjC++ on x86_64.  It survived, which
> makes me wonder whether we do not want it in trunk.

This looks plausible to me, but I think that you ought to verify the real 
assumption instead, which is that the type of the 3 nodes is always scalar.
The non-toplevelness of the nodes is merely a consequence of this property.

-- 
Eric Botcazou


RE: [PATCH][gensupport] Add optional attributes field to define_cond_exec

2013-05-23 Thread Kyrylo Tkachov
Hi Richard,

> No, define_subst works across patterns, keyed by attributes.  Exactly like
> cond_exec, really.
> 
> But what you ought to be able to do right now is
> 
> (define_subst "ds_predicable"
>   [(match_operand 0)]
>   ""
>   [(cond_exec (blah) (match_dup 0))])
> 
> (define_subst_attr "ds_predicable_enabled" "ds_predicable" "no" "yes")
> 
> (define_insn "blah"
>   [(blah)]
>   ""
>   "@
>blah
>blah"
>   [(set_attr "ds_predicable" "yes")
>(set_attr "ds_predicated" "")])

What would be the function of (set_attr "ds_predicable" "yes") ?
Doesn't the use of  already trigger the substitution?

> 
> At which point you can define "enabled" in terms of ds_predicated plus
> whatever.
> 
> With a small bit of work we ought to be able to move that ds_predicated
> attribute to the define_subst itself, so that you don't have to replicate
> that set_attr line N times.

That would be nice. So we would have to use define_subst instead of
define_cond_exec to generate the cond_exec patterns. But I'd like to keep
using the "predicable" attribute the way it's used now to mark patterns for
cond_exec'edness.

So you'd recommend changing the define_subst machinery to handle that
ds_predicated attribute?


> I think that's more or less what you were suggesting
> with your cond_exec extension, yes?

Pretty much, yes. Thanks for the explanation.

> 
> 
> 
> r~






Re: [PATCH] Fix store_split_bit_field (PR middle-end/57344)

2013-05-23 Thread Richard Biener
On Tue, 21 May 2013, Jakub Jelinek wrote:

> Hi!
> 
> My PR52979 patch introduced the following regression in store_split_bit_field.
> If op0 is a REG or SUBREG, then the code was assuming that unit is still
> BITS_PER_WORD, which isn't the case after PR52979.  This patch changes
> those spots to no longer assume that (second and third hunks).
> The first hunk is just an optimization: I don't see how we could have
> data races on (reg) or (subreg (reg) ...), so by allowing whole words to be
> changed on those, rather than limiting them because of a too-small
> bitregion_end, we can generate smaller RTL out of the expansion (usually
> combine will optimize those back, but e.g. at -O0 it wouldn't).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8/4.7?
> 
> 2013-05-21  Jakub Jelinek  
> 
>   PR middle-end/57344
>   * expmed.c (store_split_bit_field): If op0 is a REG or
>   SUBREG of a REG, don't lower unit.  Handle unit not being
>   always BITS_PER_WORD.
> 
>   * gcc.c-torture/execute/pr57344-1.c: New test.
>   * gcc.c-torture/execute/pr57344-2.c: New test.
>   * gcc.c-torture/execute/pr57344-3.c: New test.
>   * gcc.c-torture/execute/pr57344-4.c: New test.
> 
> --- gcc/expmed.c.jj   2013-05-14 10:54:58.0 +0200
> +++ gcc/expmed.c  2013-05-21 11:59:00.275839242 +0200
> @@ -1094,10 +1094,14 @@ store_split_bit_field (rtx op0, unsigned
>thispos = (bitpos + bitsdone) % unit;
>  
>/* When region of bytes we can touch is restricted, decrease
> -  UNIT close to the end of the region as needed.  */
> +  UNIT close to the end of the region as needed.  If op0 is a REG
> +  or SUBREG of REG, don't do this, as there can't be data races
> +  on a register and we can expand shorter code in some cases.  */
>if (bitregion_end
> && unit > BITS_PER_UNIT
> -   && bitpos + bitsdone - thispos + unit > bitregion_end + 1)
> +   && bitpos + bitsdone - thispos + unit > bitregion_end + 1
> +   && !REG_P (op0)
> +   && (GET_CODE (op0) != SUBREG || !REG_P (SUBREG_REG (op0))))

I wonder if !REG_P (SUBREG_REG (op0)) can happen - but I guess better
be safe than sorry.  The rest of the patch looks ok to me.

Thus, ok.

Thanks,
Richard.

>   {
> unit = unit / 2;
> continue;
> @@ -1147,14 +1151,15 @@ store_split_bit_field (rtx op0, unsigned
>the current word starting from the base register.  */
>if (GET_CODE (op0) == SUBREG)
>   {
> -   int word_offset = (SUBREG_BYTE (op0) / UNITS_PER_WORD) + offset;
> +   int word_offset = (SUBREG_BYTE (op0) / UNITS_PER_WORD)
> + + (offset * unit / BITS_PER_WORD);
> enum machine_mode sub_mode = GET_MODE (SUBREG_REG (op0));
> if (sub_mode != BLKmode && GET_MODE_SIZE (sub_mode) < UNITS_PER_WORD)
>   word = word_offset ? const0_rtx : op0;
> else
>   word = operand_subword_force (SUBREG_REG (op0), word_offset,
> GET_MODE (SUBREG_REG (op0)));
> -   offset = 0;
> +   offset &= BITS_PER_WORD / unit - 1;
>   }
>else if (REG_P (op0))
>   {
> @@ -1162,8 +1167,9 @@ store_split_bit_field (rtx op0, unsigned
> if (op0_mode != BLKmode && GET_MODE_SIZE (op0_mode) < UNITS_PER_WORD)
>   word = offset ? const0_rtx : op0;
> else
> - word = operand_subword_force (op0, offset, GET_MODE (op0));
> -   offset = 0;
> + word = operand_subword_force (op0, offset * unit / BITS_PER_WORD,
> +   GET_MODE (op0));
> +   offset &= BITS_PER_WORD / unit - 1;
>   }
>else
>   word = op0;
> --- gcc/testsuite/gcc.c-torture/execute/pr57344-1.c.jj	2013-05-21 11:38:07.828956569 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr57344-1.c	2013-05-21 11:58:21.242061844 +0200
> @@ -0,0 +1,32 @@
> +/* PR middle-end/57344 */
> +
> +struct __attribute__((packed)) S
> +{
> +  int a : 11;
> +#if __SIZEOF_INT__ * __CHAR_BIT__ >= 32
> +  int b : 22;
> +#else
> +  int b : 13;
> +#endif
> +  char c;
> +  int : 0;
> +} s[2];
> +int i;
> +
> +__attribute__((noinline, noclone)) void
> +foo (int x)
> +{
> +  if (x != -3161)
> +__builtin_abort ();
> +  asm volatile ("" : : : "memory");
> +}
> +
> +int
> +main ()
> +{
> +  struct S t = { 0, -3161L };
> +  s[1] = t;
> +  for (; i < 1; i++)
> +foo (s[1].b);
> +  return 0;
> +}
> --- gcc/testsuite/gcc.c-torture/execute/pr57344-2.c.jj	2013-05-21 11:38:13.536922710 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr57344-2.c	2013-05-21 11:58:36.119977546 +0200
> @@ -0,0 +1,32 @@
> +/* PR middle-end/57344 */
> +
> +struct __attribute__((packed)) S
> +{
> +  int a : 27;
> +#if __SIZEOF_INT__ * __CHAR_BIT__ >= 32
> +  int b : 22;
> +#else
> +  int b : 13;
> +#endif
> +  char c;
> +  int : 0;
> +} s[2];
> +int i;
> +
> +__attribute__((noinline, noclone)) void
> +foo (int x)
> +{
> +  if (x != -3161

Re: [PATCH] Fix PR57381

2013-05-23 Thread Bin.Cheng
On Thu, May 23, 2013 at 4:29 PM, Richard Biener  wrote:
>
> This is another case of ADDR_EXPRs not comparing equal from
> operand_equal_p if they contain volatile field references.
> The issue is that we should compare the FIELD_DECLs with
> retaining OEP_CONSTANT_ADDRESS_OF (or maybe not set TREE_SIDE_EFFECTS
> on them - but that's a bigger change).
>
> Bootstrap / regtest pending on x86_64-unknown-linux-gnu.
>
> Richard.
>
> 2013-05-23  Richard Biener  
>
> PR middle-end/57380
Typo?

> * fold-const.c (operand_equal_p): Compare FIELD_DECLs with
> OEP_CONSTANT_ADDRESS_OF retained.
>
> * gcc.dg/torture/pr57381.c: New testcase.
>
> Index: gcc/fold-const.c
> ===
> *** gcc/fold-const.c(revision 199199)
> --- gcc/fold-const.c(working copy)
> *** operand_equal_p (const_tree arg0, const_
> *** 2664,2673 
> case COMPONENT_REF:
>   /* Handle operand 2 the same as for ARRAY_REF.  Operand 0
>  may be NULL when we're called to compare MEM_EXPRs.  */
> ! if (!OP_SAME_WITH_NULL (0))
> return 0;
>   flags &= ~OEP_CONSTANT_ADDRESS_OF;
> ! return OP_SAME (1) && OP_SAME_WITH_NULL (2);
>
> case BIT_FIELD_REF:
>   if (!OP_SAME (0))
> --- 2664,2673 
> case COMPONENT_REF:
>   /* Handle operand 2 the same as for ARRAY_REF.  Operand 0
>  may be NULL when we're called to compare MEM_EXPRs.  */
> ! if (!OP_SAME_WITH_NULL (0) || !OP_SAME (1))
> return 0;
>   flags &= ~OEP_CONSTANT_ADDRESS_OF;
> ! return OP_SAME_WITH_NULL (2);
>
> case BIT_FIELD_REF:
>   if (!OP_SAME (0))
> Index: gcc/testsuite/gcc.dg/torture/pr57381.c
> ===
> *** gcc/testsuite/gcc.dg/torture/pr57381.c  (revision 0)
> --- gcc/testsuite/gcc.dg/torture/pr57381.c  (working copy)
> ***
> *** 0 
> --- 1,24 
> + /* { dg-do compile } */
> +
> + struct S0 { int  f0, f1, f2; };
> +
> + struct S1 {
> + int  f0;
> + volatile struct S0 f2;
> + };
> +
> + static struct S1 s = {0x47BED265,{0x06D4EB3E,5,0U}};
> +
> + int foo(struct S0 p)
> + {
> +   for (s.f2.f2 = 0; (s.f2.f2 <= 12); s.f2.f2++)
> + {
> +   volatile int *l_61[5][2][2] = 
> {{{&s.f2.f0,&s.f2.f0},{&s.f2.f0,&s.f2.f0}},{{&s.f2.f0,&s.f2.f0},{&s.f2.f0,&s.f2.f0}},{{&s.f2.f0,(void*)0},{&s.f2.f0,&s.f2.f0}},{{&s.f2.f0,&s.f2.f0},{&s.f2.f0,&s.f2.f0}},{{&s.f2.f0,&s.f2.f0},{(void*)0,&s.f2.f0}}};
> +
> +   volatile int **l_68 = &l_61[0][0][1];
> +   volatile int *l_76 = &s.f2.f0;
> +   (*l_68) = l_61[0][0][0];
> +   if ((*l_76 = (p.f2 % 5))) ;
> + }
> +   return p.f0;
> + }



--
Best Regards.


[PATCH] Fix store_split_bit_field (PR middle-end/57344)

2013-05-23 Thread Jakub Jelinek
Hi!

My PR52979 patch introduced the following regression in store_split_bit_field.
If op0 is a REG or SUBREG, then the code was assuming that unit is still
BITS_PER_WORD, which isn't the case after PR52979.  This patch changes
those spots to no longer assume that (second and third hunks).
The first hunk is just an optimization: I don't see how we could have
data races on (reg) or (subreg (reg) ...), so by allowing whole words to be
changed on those, rather than limiting them because of a too-small
bitregion_end, we can generate smaller RTL out of the expansion (usually
combine will optimize those back, but e.g. at -O0 it wouldn't).
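
To make the arithmetic in the second and third hunks concrete, here is a
small stand-alone C model of it.  The names chunk_pos and locate_chunk are
invented for this illustration and do not exist in expmed.c; the sketch
assumes unit is a power of two that divides the word size, and that offset
counts unit-sized chunks.

```c
#include <assert.h>

/* Toy model: where does chunk number OFFSET land when chunks are UNIT
   bits wide and words are BITS_PER_WORD bits wide?  */
struct chunk_pos
{
  unsigned word;          /* index of the word holding the chunk */
  unsigned unit_in_word;  /* index of the chunk inside that word */
};

static struct chunk_pos
locate_chunk (unsigned offset, unsigned unit, unsigned bits_per_word)
{
  struct chunk_pos p;

  /* Assumption of this sketch: UNIT is a power of two dividing the word.  */
  assert (unit != 0 && (unit & (unit - 1)) == 0);
  assert (bits_per_word % unit == 0);

  /* Mirrors "offset * unit / BITS_PER_WORD" from the patch.  */
  p.word = offset * unit / bits_per_word;
  /* Mirrors "offset &= BITS_PER_WORD / unit - 1" from the patch.  */
  p.unit_in_word = offset & (bits_per_word / unit - 1);
  return p;
}
```

With 64-bit words and unit == 8, chunk 9 lands in word 1 at position 1.  When
unit equals the word size the mask is zero, which reproduces the old
"offset = 0" behaviour.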

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8/4.7?

2013-05-21  Jakub Jelinek  

PR middle-end/57344
* expmed.c (store_split_bit_field): If op0 is a REG or
SUBREG of a REG, don't lower unit.  Handle unit not being
always BITS_PER_WORD.

* gcc.c-torture/execute/pr57344-1.c: New test.
* gcc.c-torture/execute/pr57344-2.c: New test.
* gcc.c-torture/execute/pr57344-3.c: New test.
* gcc.c-torture/execute/pr57344-4.c: New test.

--- gcc/expmed.c.jj 2013-05-14 10:54:58.0 +0200
+++ gcc/expmed.c2013-05-21 11:59:00.275839242 +0200
@@ -1094,10 +1094,14 @@ store_split_bit_field (rtx op0, unsigned
   thispos = (bitpos + bitsdone) % unit;
 
   /* When region of bytes we can touch is restricted, decrease
-UNIT close to the end of the region as needed.  */
+UNIT close to the end of the region as needed.  If op0 is a REG
+or SUBREG of REG, don't do this, as there can't be data races
+on a register and we can expand shorter code in some cases.  */
   if (bitregion_end
  && unit > BITS_PER_UNIT
- && bitpos + bitsdone - thispos + unit > bitregion_end + 1)
+ && bitpos + bitsdone - thispos + unit > bitregion_end + 1
+ && !REG_P (op0)
+ && (GET_CODE (op0) != SUBREG || !REG_P (SUBREG_REG (op0))))
{
  unit = unit / 2;
  continue;
@@ -1147,14 +1151,15 @@ store_split_bit_field (rtx op0, unsigned
 the current word starting from the base register.  */
   if (GET_CODE (op0) == SUBREG)
{
- int word_offset = (SUBREG_BYTE (op0) / UNITS_PER_WORD) + offset;
+ int word_offset = (SUBREG_BYTE (op0) / UNITS_PER_WORD)
+   + (offset * unit / BITS_PER_WORD);
  enum machine_mode sub_mode = GET_MODE (SUBREG_REG (op0));
  if (sub_mode != BLKmode && GET_MODE_SIZE (sub_mode) < UNITS_PER_WORD)
word = word_offset ? const0_rtx : op0;
  else
word = operand_subword_force (SUBREG_REG (op0), word_offset,
  GET_MODE (SUBREG_REG (op0)));
- offset = 0;
+ offset &= BITS_PER_WORD / unit - 1;
}
   else if (REG_P (op0))
{
@@ -1162,8 +1167,9 @@ store_split_bit_field (rtx op0, unsigned
  if (op0_mode != BLKmode && GET_MODE_SIZE (op0_mode) < UNITS_PER_WORD)
word = offset ? const0_rtx : op0;
  else
-   word = operand_subword_force (op0, offset, GET_MODE (op0));
- offset = 0;
+   word = operand_subword_force (op0, offset * unit / BITS_PER_WORD,
+ GET_MODE (op0));
+ offset &= BITS_PER_WORD / unit - 1;
}
   else
word = op0;
--- gcc/testsuite/gcc.c-torture/execute/pr57344-1.c.jj	2013-05-21 11:38:07.828956569 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57344-1.c	2013-05-21 11:58:21.242061844 +0200
@@ -0,0 +1,32 @@
+/* PR middle-end/57344 */
+
+struct __attribute__((packed)) S
+{
+  int a : 11;
+#if __SIZEOF_INT__ * __CHAR_BIT__ >= 32
+  int b : 22;
+#else
+  int b : 13;
+#endif
+  char c;
+  int : 0;
+} s[2];
+int i;
+
+__attribute__((noinline, noclone)) void
+foo (int x)
+{
+  if (x != -3161)
+__builtin_abort ();
+  asm volatile ("" : : : "memory");
+}
+
+int
+main ()
+{
+  struct S t = { 0, -3161L };
+  s[1] = t;
+  for (; i < 1; i++)
+foo (s[1].b);
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr57344-2.c.jj	2013-05-21 11:38:13.536922710 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57344-2.c	2013-05-21 11:58:36.119977546 +0200
@@ -0,0 +1,32 @@
+/* PR middle-end/57344 */
+
+struct __attribute__((packed)) S
+{
+  int a : 27;
+#if __SIZEOF_INT__ * __CHAR_BIT__ >= 32
+  int b : 22;
+#else
+  int b : 13;
+#endif
+  char c;
+  int : 0;
+} s[2];
+int i;
+
+__attribute__((noinline, noclone)) void
+foo (int x)
+{
+  if (x != -3161)
+__builtin_abort ();
+  asm volatile ("" : : : "memory");
+}
+
+int
+main ()
+{
+  struct S t = { 0, -3161L };
+  s[1] = t;
+  for (; i < 1; i++)
+foo (s[1].b);
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr57344-3.c.jj	2013-05-21 11:43:11.157231567 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57344-3.c	2013-05-21 11:55:33.220012554 +0200
@@ -0,0 +1,28 @@

[PATCH] Fix PR57381

2013-05-23 Thread Richard Biener

This is another case of ADDR_EXPRs not comparing equal from
operand_equal_p if they contain volatile field references.
The issue is that we should compare the FIELD_DECLs with
retaining OEP_CONSTANT_ADDRESS_OF (or maybe not set TREE_SIDE_EFFECTS
on them - but that's a bigger change).
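
The ordering issue can be shown with a toy model in plain C.  Nothing below
is GCC code: the toy_* names are invented, TOY_CONSTANT_ADDRESS_OF merely
stands in for OEP_CONSTANT_ADDRESS_OF, and the rule that a field with side
effects only compares equal while that flag is set is a simplification of
what operand_equal_p does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_CONSTANT_ADDRESS_OF = 1 };

struct toy_field
{
  int id;             /* identity of the field */
  bool side_effects;  /* e.g. the field is volatile */
};

/* Simplified rule: fields with side effects compare equal only while the
   address-of flag is still present.  */
static bool
toy_field_equal (const struct toy_field *a, const struct toy_field *b,
                 unsigned flags)
{
  assert (a != NULL && b != NULL);
  if ((a->side_effects || b->side_effects)
      && !(flags & TOY_CONSTANT_ADDRESS_OF))
    return false;
  return a->id == b->id;
}

/* Old order: the flag is dropped before the field is compared.  */
static bool
toy_component_ref_equal_old (const struct toy_field *a,
                             const struct toy_field *b, unsigned flags)
{
  flags &= ~TOY_CONSTANT_ADDRESS_OF;
  return toy_field_equal (a, b, flags);
}

/* New order: compare the field first, then drop the flag for whatever
   would be compared afterwards.  */
static bool
toy_component_ref_equal_new (const struct toy_field *a,
                             const struct toy_field *b, unsigned flags)
{
  if (!toy_field_equal (a, b, flags))
    return false;
  flags &= ~TOY_CONSTANT_ADDRESS_OF;
  (void) flags;  /* remaining operands would use the reduced flags */
  return true;
}
```

In this model, two references to the same volatile field compare unequal
with the old order and equal with the new one, which is the behaviour change
the patch below is after.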

Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

Richard.

2013-05-23  Richard Biener  

PR middle-end/57380
* fold-const.c (operand_equal_p): Compare FIELD_DECLs with
OEP_CONSTANT_ADDRESS_OF retained.

* gcc.dg/torture/pr57381.c: New testcase.

Index: gcc/fold-const.c
===
*** gcc/fold-const.c(revision 199199)
--- gcc/fold-const.c(working copy)
*** operand_equal_p (const_tree arg0, const_
*** 2664,2673 
case COMPONENT_REF:
  /* Handle operand 2 the same as for ARRAY_REF.  Operand 0
 may be NULL when we're called to compare MEM_EXPRs.  */
! if (!OP_SAME_WITH_NULL (0))
return 0;
  flags &= ~OEP_CONSTANT_ADDRESS_OF;
! return OP_SAME (1) && OP_SAME_WITH_NULL (2);
  
case BIT_FIELD_REF:
  if (!OP_SAME (0))
--- 2664,2673 
case COMPONENT_REF:
  /* Handle operand 2 the same as for ARRAY_REF.  Operand 0
 may be NULL when we're called to compare MEM_EXPRs.  */
! if (!OP_SAME_WITH_NULL (0) || !OP_SAME (1))
return 0;
  flags &= ~OEP_CONSTANT_ADDRESS_OF;
! return OP_SAME_WITH_NULL (2);
  
case BIT_FIELD_REF:
  if (!OP_SAME (0))
Index: gcc/testsuite/gcc.dg/torture/pr57381.c
===
*** gcc/testsuite/gcc.dg/torture/pr57381.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr57381.c  (working copy)
***
*** 0 
--- 1,24 
+ /* { dg-do compile } */
+ 
+ struct S0 { int  f0, f1, f2; };
+ 
+ struct S1 {
+ int  f0;
+ volatile struct S0 f2;
+ };
+ 
+ static struct S1 s = {0x47BED265,{0x06D4EB3E,5,0U}};
+ 
+ int foo(struct S0 p)
+ {
+   for (s.f2.f2 = 0; (s.f2.f2 <= 12); s.f2.f2++)
+ {
+   volatile int *l_61[5][2][2] = 
{{{&s.f2.f0,&s.f2.f0},{&s.f2.f0,&s.f2.f0}},{{&s.f2.f0,&s.f2.f0},{&s.f2.f0,&s.f2.f0}},{{&s.f2.f0,(void*)0},{&s.f2.f0,&s.f2.f0}},{{&s.f2.f0,&s.f2.f0},{&s.f2.f0,&s.f2.f0}},{{&s.f2.f0,&s.f2.f0},{(void*)0,&s.f2.f0}}};
+ 
+   volatile int **l_68 = &l_61[0][0][1];
+   volatile int *l_76 = &s.f2.f0;
+   (*l_68) = l_61[0][0][0];
+   if ((*l_76 = (p.f2 % 5))) ;
+ }
+   return p.f0;
+ }


Re: [PATCH] Fix PR57341

2013-05-23 Thread Jakub Jelinek
On Thu, May 23, 2013 at 09:58:50AM +0200, Richard Biener wrote:
> 2013-05-23  Richard Biener  
> 
>   PR rtl-optimization/57341
>   * ira.c (validate_equiv_mem_from_store): Use anti_dependence
>   instead of true_dependence.
> 
>   * gcc.dg/torture/pr57341.c: New testcase.

Ok, thanks.

Jakub


[AArch64] Fix possible wrong code generation when comparing DImode values.

2013-05-23 Thread James Greenhalgh

Hi,

With the aarch64_cmdi patterns a bug was introduced. While the
unsplit versions of these patterns, which operate in the
SIMD register set, do not clobber CC_REGNUM, the split versions, which
operate in the general purpose register set, do clobber CC_REGNUM.

This causes a problem if scheduling rearranges the unsplit version
of these instructions. For example, if we have:

  aarch64_cmeqdi_unsplit
  set_cc_flags
  jump

Then we could schedule as

  set_cc_flags
  aarch64_cmeqdi_unsplit
  jump

Because the unsplit version does not clobber CC_REGNUM.

If we now decide to split we get:

  set_cc_flags
  aarch64_cmeqdi_set_cc_flags
  aarch64_cmeqdi_use_cc_flags
  jump

And the jump uses the wrong value for cc_flags.

We fix this problem by adding the clobber of CC_REGNUM to
the aarch64_cmdi patterns. This may restrict the
scheduling opportunities available, but should prevent
incorrect code generation.
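
Why the clobber stops the reordering can be illustrated with a toy
dependence check in C.  This is not how the GCC scheduler is implemented;
toy_insn, toy_can_swap and the TOY_* register names are invented for the
illustration, and each instruction is reduced to a set of registers read and
a set of registers written (clobbers count as writes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum
{
  TOY_CC = 1 << 0,  /* condition-code register */
  TOY_R0 = 1 << 1,
  TOY_R1 = 1 << 2,
  TOY_R2 = 1 << 3,
  TOY_R3 = 1 << 4
};

struct toy_insn
{
  unsigned uses;  /* registers read */
  unsigned defs;  /* registers written, including clobbers */
};

/* Two adjacent insns may be exchanged only if neither one writes a
   register that the other one reads or writes.  */
static bool
toy_can_swap (const struct toy_insn *a, const struct toy_insn *b)
{
  assert (a != NULL && b != NULL);
  return !(a->defs & b->uses)
         && !(a->uses & b->defs)
         && !(a->defs & b->defs);
}
```

Modelling the unsplit compare as reading R1/R2 and writing R0, it can be
swapped with an insn that writes CC; once CC is added to its written set, the
swap is rejected because both insns now write CC.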

Tested on aarch64-none-linux-gnu, aarch64-none-elf with no
regressions. The bug manifests itself in the libstdc++ testsuite,
so I've double-checked there to ensure that the bug has cleared.

Thanks,
James

---
gcc/

2013-05-17  James Greenhalgh  

* config/aarch64/aarch64-simd.md
(aarch64_cmdi): Add clobber of CC_REGNUM to unsplit pattern.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9069a73..f91cf81 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3280,7 +3280,8 @@
 	  (COMPARISONS:DI
 	(match_operand:DI 1 "register_operand" "w,w,r")
 	(match_operand:DI 2 "aarch64_simd_reg_or_zero" "w,ZDz,r")
-	  )))]
+	  )))
+ (clobber (reg:CC CC_REGNUM))]
   "TARGET_SIMD"
   "@
   cm\t%d0, %d, %d
@@ -3291,15 +3292,7 @@
   happening in the 'w' constraint cases.  */
&& GP_REGNUM_P (REGNO (operands[0]))
&& GP_REGNUM_P (REGNO (operands[1]))"
-  [(set (reg:CC CC_REGNUM)
-(compare:CC
-  (match_dup 1)
-  (match_dup 2)))
-  (set (match_dup 0)
-(neg:DI
-  (COMPARISONS:DI
-	(match_operand 3 "cc_register" "")
-	(const_int 0))))]
+  [(const_int 0)]
   {
 enum machine_mode mode = SELECT_CC_MODE (, operands[1], operands[2]);
 rtx cc_reg = aarch64_gen_compare_reg (, operands[1], operands[2]);
@@ -3332,7 +3325,8 @@
 	  (UCOMPARISONS:DI
 	(match_operand:DI 1 "register_operand" "w,r")
 	(match_operand:DI 2 "aarch64_simd_reg_or_zero" "w,r")
-	  )))]
+	  )))
+(clobber (reg:CC CC_REGNUM))]
   "TARGET_SIMD"
   "@
   cm\t%d0, %d, %d
@@ -3342,17 +3336,9 @@
   happening in the 'w' constraint cases.  */
&& GP_REGNUM_P (REGNO (operands[0]))
&& GP_REGNUM_P (REGNO (operands[1]))"
-  [(set (reg:CC CC_REGNUM)
-(compare:CC
-  (match_dup 1)
-  (match_dup 2)))
-  (set (match_dup 0)
-(neg:DI
-  (UCOMPARISONS:DI
-	(match_operand 3 "cc_register" "")
-	(const_int 0))))]
+  [(const_int 0)]
   {
-enum machine_mode mode = SELECT_CC_MODE (, operands[1], operands[2]);
+enum machine_mode mode = CCmode;
 rtx cc_reg = aarch64_gen_compare_reg (, operands[1], operands[2]);
 rtx comparison = gen_rtx_ (mode, operands[1], operands[2]);
 emit_insn (gen_cstoredi_neg (operands[0], comparison, cc_reg));
@@ -3385,7 +3371,8 @@
 	(and:DI
 	  (match_operand:DI 1 "register_operand" "w,r")
 	  (match_operand:DI 2 "register_operand" "w,r"))
-	(const_int 0))))]
+	(const_int 0))))
+(clobber (reg:CC CC_REGNUM))]
   "TARGET_SIMD"
   "@
   cmtst\t%d0, %d1, %d2
@@ -3395,16 +3382,7 @@
   happening in the 'w' constraint cases.  */
&& GP_REGNUM_P (REGNO (operands[0]))
&& GP_REGNUM_P (REGNO (operands[1]))"
-   [(set (reg:CC_NZ CC_REGNUM)
-	(compare:CC_NZ
-	 (and:DI (match_dup 1)
-		  (match_dup 2))
-	 (const_int 0)))
-  (set (match_dup 0)
-(neg:DI
-  (ne:DI
-	(match_operand 3 "cc_register" "")
-	(const_int 0))))]
+  [(const_int 0)]
   {
 rtx and_tree = gen_rtx_AND (DImode, operands[1], operands[2]);
 enum machine_mode mode = SELECT_CC_MODE (NE, and_tree, const0_rtx);
