date:20131015

[PATCH,i386] Enable FMA4 for AMD bdver3

2013-10-15 Thread Gopalasubramanian, Ganesh

Hi 

The below patch enables FMA4 for AMD bdver3 architectures.

"make -k check" passes.

Is it OK for upstream?

Regards
Ganesh


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fb5b267..cbb5311 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2013-10-16 Ganesh Gopalasubramanian  
+
+   * config/i386/i386.c (ix86_option_override_internal): Enable FMA4
+   for AMD bdver3.
+
 2013-10-16  Hans-Peter Nilsson  

* config/cris/t-elfmulti (MULTILIB_OPTIONS, MULTILIB_DIRNAMES)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b5796db..c24ce36 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3104,7 +3104,7 @@ ix86_option_override_internal (bool main_args_p,
   {"bdver3", PROCESSOR_BDVER3, CPU_BDVER3,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
-   | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
+   | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE
| PTA_XSAVEOPT | PTA_FSGSBASE},

Re: RFC: Add of type-demotion pass

2013-10-15 Thread Jeff Law


On 07/08/13 11:45, Kai Tietz wrote:

Hello,

These passes are implementing type-demotion (type sinking into
statements) for some gimple statements.  I limitted inital
implementation to the set of multiply, addition, substraction, and
binary-and/or/xor.  Additional this pass adds some rules to sink
type-cast sequences - eg. (int) (short) x; with char as type of x.
This special handing in this pass of such type-sequence
simplification is necessary to avoid too complex cast-sequences by
type-unsigned conversion used by this pass to avoid undefined
overflow behaviour.

I will sent separate patch with some test-cases to demonstrate and
verify operation of this new optimization.  Just one sample I will
cite here to demonstrate operation of type-demotion pass.
So I think we want to come back to this patch and make some decisions 
about how to go forward.


I've been reviewing all the discussions back through last year and I 
think the single biggest thing we need to realize is that there's 
nothing in this patch that *couldn't* be handled by tree-ssa-forwprop 
with a suitable amount of following the use-def chains through casts in 
various transformations and determining when those casts can be safely 
ignored.  However, I don't think extending tree-ssa-forwprop in this way 
is wise.


As I see it, the value in this patch is it allows us to avoid that 
nonsense by essentially reformulating this a problem of moving and 
reformulating the casts in the IL.


I see two primary effects of type sinking.  First and probably the most 
important in my mind is by sinking a cast through its uses the various 
transformations we already perform are more likely to apply *without* 
needing to handle optimizing through typecasts explicitly.


The second primary effect is, given two casts where the first indirectly 
feeds the second (ie, the first feeds some statement, which then feeds 
the second cast), if we're able to sink the first cast, we end up with 
the first cast directly feeding the second cast.  When this occurs one 
of the two casts can often be eliminated.   Sadly, I didn't keep any of 
those test files, but I regularly saw them in GCC bootstraps.


Kai, you mentioned you had more tests, but I never saw those posted.  I 
pulled together a handful of tests from the PRs & discussion threads. 
Some may be duplicates of yours (mine are attached).  If you could post 
yours as well, it'd be helpful.  Not all of mine are helped by this 
patch, but at least two are.


There was some question about what to do with PROMOTE_MODE based 
targets.  On those targets there's a lot of value in getting something 
into word mode and doing everything in word mode.  I think that 
can/should be a separate issue -- ISTM that ought to be handled fairly 
late in the tree optimizers.  I don't see a strong argument for gating 
this patch on catering to PROMOTE_MODE.


Similarly, I know there's a type hoisting patch that's also queued up. 
I think it should be handled separately as well.




--end--

You will notice that by typedemote2 pass the 's += t + 0x12345600;'
expression gets simplified to 's += t + 0;'.

I have added an additional early pass "typedemote1" to this patch for
simple cases types can be easily sunken into statement without
special unsigned-cast for overflow-case.   Jakub asked for it. Tests
have shown that this pass does optimizations in pretty few cases.  As
example in testsuite see for example pr46867.c testcase. The second
pass (I put it behind first vrp pass to avoid testcase-conflicts)
uses 'unsigned'-type casts to avoid undefined overflow behavior. This
pass has much more hits in standard code, but seems to trigger some
regressions in vect pass:

List of regi

FAIL: gcc.dg/vect/slp-cond-3.c scan-tree-dump-times vect "vectorizing
stmts using SLP" 1 FAIL: gcc.dg/vect/vect-reduc-dot-s8b.c -flto
scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected"
1 FAIL: gcc.dg/vect/slp-cond-3.c -flto  scan-tree-dump-times vect
"vectorizing stmts using SLP" 1 FAIL: gcc.target/i386/rotate-1.c
scan-assembler ro[lr]b
Have you analyzed these failures?  Are they a result of the casts ending 
up in inconvenient locations or is there something more subtle going on 
here?  We certainly need to understand precisely what's going on with 
these regressions before we can move forward.


As for the patch itself.  Obviously it needs updates as a result of 
David's work on the pass manager class, and other changes on the trunk 
since July.



tree-ssa-te.c needs a file block comment which describes what the patch 
is trying to accomplish.


Are all the includes necessary?  cfgloop.h hash-table.h, others?  We're 
trying to clean things up, new files ought to be relatively clean as 
they go in rather than making things worse :-)



build_and_add_sum is poorly named.  It handles far more operations than 
its name suggests.  Similarly its embedded comments still refer to 
"addition statement" and clearly need updating.  It may need u

Re: [PATCH][i386]Fix PR 57756

2013-10-15 Thread Alan Modra

On Tue, Oct 15, 2013 at 02:45:23PM -0700, Sriraman Tallam wrote:
> I committed this patch after making the above change.

/src/gcc-virgin/gcc/config/rs6000/rs6000.c: At global scope:
/src/gcc-virgin/gcc/config/rs6000/rs6000.c:31122:29: error: invalid conversion 
from ‘void (*)(cl_target_option*)’ to ‘void (*)(cl_target_option*, 
gcc_options*)’ [-fpermissive]
/src/gcc-virgin/gcc/config/rs6000/rs6000.c:31122:29: error: invalid conversion 
from ‘void (*)(cl_target_option*)’ to ‘void (*)(gcc_options*, 
cl_target_option*)’ [-fpermissive]


-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH] Invalid unpoisoning of stack redzones on ARM

2013-10-15 Thread Yury Gribov

>>> I've recently submitted a bug report regarding invalid unpoisoning 
of stack frame redzones 
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone take 
a look at proposed patch (a simple one-liner) and check whether it's ok 
for commit?

>>
>> Can you please be more verbose
>
> Do you think the proposed patch is ok or should I provide some 
additional info?


Jakub, Dodji,

Any updates on this one? Note that this bug is a huge blocker for using 
AddressSanitizer on ARM platforms.


-Y

Re: [PATCH, PR 53001] Re: Patch to split out new warning flag for floating point conversion

2013-10-15 Thread Joshua J Cogliati

Attached is a patch that addresses most of Dodji Seketeli's comments.
Explanations for the rest are inline.

On 10/14/2013 03:04 AM, Dodji Seketeli wrote:
> Thank you Joshua for following up on this.  Please find below some
> comments of mine that mostly belong to the nitpicking department.

You are welcome, and thank you for looking at it and commenting on it.

> Joshua J Cogliati  writes:
> 
> [...]
> 
>> A split out patch to make the floating conversions explicit is attached
>> at the PR 53001 page, but is not included in this patch.
> 
> It'd be nice to sent that split out patch here too, as it's a
> pre-requisite for this to pass bootstrap w/o warning.  Otherwise I am
> afraid that patch isn't likely to get reviewed.

It is not a prerequisite of the attached patch
warn_float_patch_simple_trunk.diff.  It becomes a pre-requisite if
-Wfloat-conversion is enabled with -Wextra. I did attached
fixup_float_conversions.diff that makes the changes needed in gcc.

> [...]
> 
>> == Testcases ==
>>
>> This patch has passes the existing -Wconversion testcases.  It modifies
>> Wconversion-real.c, Wconversion-real-integer.c and pr35635.c to be more
>> specific
> 
> If the patch passes existing tests, I'd be inclined to leave them
> tests alone and add new ones that are specific to what this patch
> changes.  You can even start from a copy of the tests you've modified
> above and trim them down so that only exercise the new code paths
> you've added; and of course add more cases top of these, if you can.
> But I won't fight hard for this if the other maintainers think it's OK
> this way.  :-)

Hm.  For gcc/testsuite/c-c++-common/Wconversion-real.c all the tests in
it are float-conversion by the patch's definition.  For
gcc/testsuite/gcc.dg/Wconversion-real-integer.c it already had a mixture
of conversion and overflow, so it is now a mixture conversion,
float-conversion and overflow.  Maybe gcc/testsuite/gcc.dg/pr35635.c
should stay the same.  I don't think new tests are needed since the
existing ones can just be switched; otherwise the testcases would be
redundant.  I certainly can change the patch if needed.

> [...]
> 
>> == Changelog ==
> 
> [...]
> 
>> * gcc/c-family/c-common.c Switching unsafe_conversion_p to
>> return an enumeration with more detail, and conversion_warning
>> to use this information.
> 
> The correct format for this ChangeLog entry would be:
> 
> * gcc/c-family/c-common.c (unsafe_conversion_p): 
> .
> 
> Please modify the rest of the ChangeLog entries to comply with this.
> Sorry if this seems nit-picky but I guess we need to keep all the
> ChangeLog entries coherent.

Here we go again:

Splitting out a -Wfloat-conversion from -Wconversion for
conversions that lower floating point number precision
or conversion from floating point numbers to integers
* gcc/c-family/c-common.c Switching unsafe_conversion_p to
return an enumeration with more detail, and conversion_warning
to use this information.
* gcc/c-family/c-common.h Adding conversion_safety enumeration
and switching return type of unsafe_conversion_p
* gcc/c-family/c.opt Adding new warning float-conversion and
enabling it -Wconversion
* gcc/doc/invoke.texi Adding documentation about
-Wfloat-conversion
* gcc/testsuite/c-c++-common/Wconversion-real.c Switching tests
to use float-conversion
* gcc/testsuite/gcc.dg/Wconversion-real-integer.c Switching
tests to use float-conversion
* gcc/testsuite/gcc.dg/pr35635.c Switching tests to use
float-conversion

> 
> [...]
> 
>> +++ gcc/c-family/c-common.c  (working copy)
>> @@ -2517,10 +2517,10 @@ shorten_binary_op (tree result_type, tre
>> Function allows conversions between types of different signedness and
>> does not return true in that case.  Function can produce signedness
>> warnings if PRODUCE_WARNS is true.  */
>> -bool
>> +enum conversion_safety
>>  unsafe_conversion_p (tree type, tree expr, bool produce_warns)
>>  {
> 
> As you are modifying the return type of this function, I think you
> should update the comment of the function too, especially the part
> that talks about the return values.

Changed.


> 
> You don't need the curly brackets here, so please move them.
> 

Changed.


> 
>> +++ gcc/c-family/c-common.h  (working copy)
> 
> [...]
> 
> 
>> +/* These variables are possible types of unsafe conversions. 
> 
> I would say "These enumerators", rather than "These variables".

Changed.


> 
> Other than these nits, the patch looks nice to me.
> 
> Thank you for working on this.
> 

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 203112)
+++ gcc/c-family/c-common.c	(working copy)
@@ -2507,7 +2507,7 @@ shorten_binary_op (tree result_type, tre
 }
 
 /* Checks if expression EXPR of real/integer type cannot be converted 
-   to

Re: [patch] omp-low.h

2013-10-15 Thread Andrew MacLeod


On 10/11/2013 04:12 AM, Richard Biener wrote:

On Fri, Oct 11, 2013 at 5:31 AM, Andrew MacLeod  wrote:

Missed a bit in tree-flow.h..  I mistakenly assumed omp_region belonged
there :-P

Anyway by moving struct omp_region into omp_low.h, along with the prototypes
from tree-flow.h, gimple.h and tree.h.  Everything works great with just a
few files actually requiring omp-low.h.

AS an extra bonus, omp-low.c was *exporting*  "struct omp_region
*root_omp_region".   tree-cfg.c was checking it for non-null and calling
free_omp_regions().   Well,  free_omp_regions works just fine will a NULL
root_omp_region (basically does nothing and returns), so exporting it just
for that one check seems nonsensical.  Its now static.

Bootstraps (will really-all languages) on x86_64-unknown-linux-gnu with no
new regressions.  also stage 1 cross builds on rs6000 and mips.  No more of
that crap :-)

OK?



Here's the new reworked version after Jakub's merge.

Couple of extra things.  well, 3 :-)

1 - The entire omp_region structure was being exported for tree-cfg to 
use, and it was used in a  single routine.. make_edges.  I split out the 
GIMPLE_OMP clauses from that switch, and put them into a function in 
omp-low.c for make_edges to call.  Now the structure contents are 
private to omp-low.c which I think is better.
2 - A few extra front end files are now using find_omp_clause() and thus 
need omp-low.h.  A couple of them do not understand gimple (nor should 
they).  They didn't even know what "enum gimple_code" was and failed to 
compile both struct omp_region (which has an enum gimple_code), as well 
as one or two of the prototypes.  Item 1) solved the struct issue, but I 
still needed to avoid 'enum gimple_code' in the prototypes, which is why 
make_gimple_omp_edges doesn't pass in the code, rather it picks it up 
from the last gimple_stmt of the basic block. pretty easy.  I folowed 
the naming convention used for make_gimple_asm_edges().
3 - omp-low was also exporting copy_var_decl with tree-cfg.c as the only 
current consumer.   It is actually a very generic tree routine,  but 
since its only used in these gimple contexts, I moved it to gimple.c so 
I wont have to duplicate it later when I do the wrappers. It will likely 
become a method within the gimple_var_decl object, so it seemed 
reasonable to put it in gimple.c for now. omp-low.c doesn't seem like 
the right place to leave it.


Bootstraps on 86_64-unknown-linux-gnu and no new regressions.  OK?

Andrew

	* tree-flow.h (struct omp_region): Move to omp-low.c
	Remove omp_ prototypes and variables.
	* gimple.h (omp_reduction_init): Move prototype to omp-low.h.
	(copy_var_decl): Relocate prototype from tree-flow.h.
	* gimple.c (copy_var_decl): Relocate from omp-low.c.
	* tree.h: Move prototype to omp-low.h.
	* omp-low.h: New File. Relocate prototypes here.
	* omp-low.c (struct omp_region): Make local here.
	(root_omp_region): Make static.
	(copy_var_decl) Move to gimple.c.
	(new_omp_region): Make static.
	(make_gimple_omp_edges): New.  Refactored from tree-cfg.c make_edges.
	* tree-cfg.c: Include omp-low.h.
	(make_edges): Factor out OMP specific bits to make_gimple_omp_edges.
	* gimplify.c: Include omp-low.h.
	* tree-parloops.c: Include omp-low.h.

	c
	* c-parser.c: Include omp-low.h.
	* c-typeck.c: Include omp-low.h.

	cp
	* parser.c: Include omp-low.h.
	* semantics.c: Include omp-low.h.

	fortran
	* trans-openmp.c: Include omp-low.h.

Index: tree-flow.h
===
*** tree-flow.h	(revision 203625)
--- tree-flow.h	(working copy)
*** along with GCC; see the file COPYING3.
*** 37,92 
  #include "tree-into-ssa.h"
  #include "tree-ssa-loop.h"
  
- /*---
- 			  OpenMP Region Tree
- ---*/
- 
- /* Parallel region information.  Every parallel and workshare
-directive is enclosed between two markers, the OMP_* directive
-and a corresponding OMP_RETURN statement.  */
- 
- struct omp_region
- {
-   /* The enclosing region.  */
-   struct omp_region *outer;
- 
-   /* First child region.  */
-   struct omp_region *inner;
- 
-   /* Next peer region.  */
-   struct omp_region *next;
- 
-   /* Block containing the omp directive as its last stmt.  */
-   basic_block entry;
- 
-   /* Block containing the OMP_RETURN as its last stmt.  */
-   basic_block exit;
- 
-   /* Block containing the OMP_CONTINUE as its last stmt.  */
-   basic_block cont;
- 
-   /* If this is a combined parallel+workshare region, this is a list
-  of additional arguments needed by the combined parallel+workshare
-  library call.  */
-   vec *ws_args;
- 
-   /* The code for the omp directive of this region.  */
-   enum gimple_code type;
- 
-   /* Schedule kind, only used for OMP_FOR type regions.  */
-   enum omp_clause_schedule_kind sched_kind;
- 
-   /* True if this is a combine

Re: [SKETCH] Refactor implicit function template implementation and fix 58534, 58536, 58548, 58549 and 58637.

2013-10-15 Thread Adam Butcher

On 2013-10-15 22:21, Adam Butcher wrote:

On Wed, 25 Sep 2013 11:01:26 -0500, Jason Merrill wrote:
>
> 2) If we see 'auto', scan ahead (possibly via tentative parsing) to 
see if there's a ...

>
My current preferred option.  The problem with it is that, ideally, I
only want to look ahead for '...' in the parm decl if an 'auto' is
seen. But I'm not sure how to do this in the case where the first
'auto' is nested in a template parameter (or other complex type).
E.g.

   auto f(pair, auto>... v)
^
From the 'auto' I'd need to unwind to the fn parm scope and then try
to tentatively parse the ellipsis.  To unwind and look ahead for 
'...'
needs the full parser mechanics but without any side-effects.  I 
don't

think I can do it with the lexer alone as I think there may be
ambiguity with scanning <, <<, >, >> tokens.

Look-ahead seems like the right way to go (unless there's a way to
defer type hashing) but I'm not sure how to achieve it.

I've got a [potential] [partial] solution to this.  It doesn't handle
edge cases but I believe it will handle the majority of cases.

By maintaining a nesting counter of calls to
cp_parser_enclosed_template_argument_list we can determine, for a
particular parm, what level of template-id were are parsing when we see
an 'auto'.  A tentative skip parse handling < << ( [ ] ) >> > can be
done until reaching an ellipsis or, if <> nesting count drops to zero,
a closing paren or comma terminating the parm.  Mismatches would also
be considered as terminating the parm.

In the ellipsis case we assume some sort of pack.  If the <> nesting
count is zero when an ellipsis is found, we assume a function parameter
pack and flag that all 'auto's in the parm should be type packs and
don't need to do the look-ahead again for this parm.  If the <> nesting
count is non-zero when an ellipsis is found, then only the 'auto's
within that level of template argument list are considered type packs
and look-ahead will be done for 'auto' found in subsequent template
argument lists within the parm.

The latter case implements the implicit type pack extension supporting,
for example, the following:

   f(tuple,tuple> t)

This doesn't handle cases where < << >> > are used as operators within
the parm but I think that's an edge case and it would only affect
whether the 'auto' was considered an implicit type pack or not.

I haven't gone all the way with this theory yet, but it could be a
plausible solution.

Let me know what you think.

Cheers,
Adam

Committed: CRIS: new multilib for v8, libgcc improvements and move to soft-fp.

2013-10-15 Thread Hans-Peter Nilsson

There's a page-full or two of numbers reported here with the
background, but maintainers of software-floating-point ports
used with a microcontroller may find that of use, if they're on
a cycle or size budget and consider fp-bit vs. soft-fp.

For an on-chip controller subsystem with a CRIS CPU, there was a
control-loop with floating-point numbers involved, with numbers
having a range that appeared to not lend itself to easy conversion
to our fixed-point-library (shameless plug here seems in order;
it's LGPL: )
i.e. there was no suitable "fixed point" position within 32
bits.  Worse, this particular CRIS CPU does not have fast
multiplication; it has the "CRIS v8" ISA.

The golden model for the control loop used "double", but a raw
cycle count using that type showed 117836 cycles (average over a
data set of 1000 iterations, using the CRIS simulator
co-habiting with the gdb project).  The budget was 25000.
Thankfully, 24 bits precision proved sufficient.  Switching to
"float" make the number of cycles shrink to 46974.  Still a long
way to go.  Switching to a separate multilib made sense, as this
CRIS version has a leading-zeros-count insn which did not exist
in the base version, and libgcc makes use of that, both for the
older fp-bit.c and for the newer soft-fp floating-point
libraries.  CRIS used the older fp-bit library only because I've
never found reason to try the newer soft-fp before, but I
recalled promises of big performance improvements.  The multilib
did help some, but not much; just taking the number down to
45741 cycles (some 2%).  A special umulsidi3 function and an
improved longlong.h helped more, down to 43266 cycles; about 5%.
Then I tried soft-fp and...  Bam!  The number of cycles went
down to 23318, a 46% improvement.  Arguably, there was a
downsize: the size of the program went up, from 10901 to 13741
bytes.  More tweaks included in the patch (arit.c, mulsi3.S)
showed small improvements, down to 23236 cycles.

If you're in this situation and using newlib, you'll also find
that using __ieee754_sqrtf instead of sqrtf helps and also to
use -ffast-math if your formulas allows that.  Though, those
changes just took the number of cycles down to 22951 (1%).  Use
of the internal sqrt-function makes more sense from a size
perspective, as the wrapper function pulls in conversion support
from and to double (obvious from the code).  The difference is
13713 to 11035 bytes (static data and program), which matters
more when you know that the size of the associated internal RAM
is 16 KiB and that those previous numbers exclude startup code,
stack and heap.

Checking afterwards whether the separate multilib still made
sense, it seems that without it, the number of cycles would be
25158 (with all other improvements in), so yes.  Apparently
soft-fp makes more use of leading-zeros-count than fp-bit.

A belated bottom-line advice: port maintainers who have not
already done so, switch to soft-fp from fp-bit if floating-point
performance ever may matter to your port.  If you're worried
it'll bloat the code, that's your decision, but you can actually
get something like twice the speed *at the application level*
for a 1/4 size increase.  If you're worried the library doesn't
have the knobs you want for your port, it does have them; a lot
more than fp-bit.  Have a look: are several different core code
choices for each of multiplication, division and addition.  And
as seen in the patch, I didn't even have to turn those knobs.
(Also, the goal of the target is met and I'm out of time.
Later.)

Tested to not regress for cris-elf (all multilibs except v32)
and crisv32-elf for r203561.  Also built crisv32-linux-gnu for
sanity-checking, but that was for a local import done long ago,
4.7-era.

gcc:
* config/cris/t-elfmulti (MULTILIB_OPTIONS, MULTILIB_DIRNAMES)
(MULTILIB_MATCHES): Add multilib for -march=v8.

libgcc:
For CRIS ports, switch to soft-fp.  Improve arit.c and longlong.h.
* config.host (cpu_type) : Add entry for
crisv32-*-*.
(tmake_file) 
: Adjust.
* longlong.h: Wrap the whole CRIS section in a single
defined(__CRIS__) conditional.  Add comment about add_ss
and sub_ddmmss.
(COUNT_LEADING_ZEROS_0): Define when count_leading_zeros is
defined.
[__CRIS__] (__umulsidi3): Define.
[__CRIS__] (umul_ppmm): Define in terms of __umulsidi3.
* config/cris/sfp-machine.h: New file.
* config/cris/umulsidi3.S: New file.
* config/cris/t-elfmulti (LIB2ADD_ST): Add umulsidi3.S.
* config/cris/arit.c (SIGNMULT): New macro.
(__Div, __Mod): Use SIGNMULT instead of naked multiplication.
* config/cris/mulsi3.S: Tweak to avoid redundant register-copying;
saving 3 out of originally 33 cycles from the fastest
path, 3 out of 54 from the medium path and one from the longest
path.  Improve comments.

diff --git a/gc

[PATCH, rs6000] Fix vsx_concat_ insns for little endian

2013-10-15 Thread Bill Schmidt

Simple patch to reverse the order of the input operands when
concatenating for little endian code generation.  Bootstrapped and
tested on powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu
with no regressions.  Fixes two tests in the testsuite for the latter.
Ok for trunk?

Thanks,
Bill


2013-10-15  Bill Schmidt  

* config/rs6000/vsx.md (vsx_concat_): Adjust output for LE.
(vsx_concat_v2sf): Likewise.


Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 203508)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -1194,7 +1194,12 @@
 (match_operand: 1 "vsx_register_operand" "ws,wa")
 (match_operand: 2 "vsx_register_operand" "ws,wa")))]
   "VECTOR_MEM_VSX_P (mode)"
-  "xxpermdi %x0,%x1,%x2,0"
+{
+  if (BYTES_BIG_ENDIAN)
+return "xxpermdi %x0,%x1,%x2,0";
+  else
+return "xxpermdi %x0,%x2,%x1,0";
+}
   [(set_attr "type" "vecperm")])
 
 ;; Special purpose concat using xxpermdi to glue two single precision values
@@ -1207,7 +1212,12 @@
  (match_operand:SF 2 "vsx_register_operand" "f,f")]
 UNSPEC_VSX_CONCAT))]
   "VECTOR_MEM_VSX_P (V2DFmode)"
-  "xxpermdi %x0,%x1,%x2,0"
+{
+  if (BYTES_BIG_ENDIAN)
+return "xxpermdi %x0,%x1,%x2,0";
+  else
+return "xxpermdi %x0,%x2,%x1,0";
+}
   [(set_attr "type" "vecperm")])
 
 ;; xxpermdi for little endian loads and stores.  We need several of

[jit] Improvements to documentation.

2013-10-15 Thread David Malcolm

Committed to dmalcolm/jit branch.

gcc/jit/
* libgccjit.h (gcc_jit_location): Rewrite comment to reflect
that this part of the API is now implemented.
("Functions for use within the code factory."): Add notes on
memory-management and lifetimes.
* notes.txt: Update diagram to show handle_locations.
---
 gcc/jit/ChangeLog.jit |  8 
 gcc/jit/libgccjit.h   | 26 +-
 gcc/jit/notes.txt |  5 +
 3 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index 18101f1..da3e75a 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,5 +1,13 @@
 2013-10-15  David Malcolm  
 
+   * libgccjit.h (gcc_jit_location): Rewrite comment to reflect
+   that this part of the API is now implemented.
+   ("Functions for use within the code factory."): Add notes on
+   memory-management and lifetimes.
+   * notes.txt: Update diagram to show handle_locations.
+
+2013-10-15  David Malcolm  
+
* TODO.rst: Update.
 
 2013-10-14  David Malcolm  
diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h
index a17ab09..8985b8f 100644
--- a/gcc/jit/libgccjit.h
+++ b/gcc/jit/libgccjit.h
@@ -30,17 +30,17 @@ typedef struct gcc_jit_context gcc_jit_context;
 /* A gcc_jit_result encapsulates the result of a compilation.  */
 typedef struct gcc_jit_result gcc_jit_result;
 
-/* A gcc_jit_location will encapsulate a source code location, so that
-   you can associated locations in your language with statements in the
-   JIT-compiled code, allowing the debugger to single-step through
-   your language.
+/* A gcc_jit_location encapsulates a source code location, so that
+   you can (optionally) associate locations in your language with
+   statements in the JIT-compiled code, allowing the debugger to
+   single-step through your language.
 
-   This part of the API is a placeholder to allow future expansion
-   without breaking ABI: there currently is no way of creating a
-   gcc_jit_location.
+   Note that to do so, you also need to enable
+ GCC_JIT_BOOL_OPTION_DEBUGINFO
+   on the gcc_jit_context.
 
-   For now you must pass NULL into parameters expecting a
-   (gcc_jit_location *).  */
+   gcc_jit_location instances are optional; you can always pass
+   NULL.  */
 typedef struct gcc_jit_location gcc_jit_location;
 
 /* A gcc_jit_type encapsulates a type e.g. "int" or a "struct foo*".  */
@@ -235,6 +235,14 @@ gcc_jit_result_release (gcc_jit_result *result);
 
 /**
  Functions for use within the code factory.
+
+ All objects created by these functions are cleaned up for you, after
+ your code-creation hook returns.  Note that this means you can't hold
+ references to them for use after you return from your callback.
+
+ All (const char *) string arguments passed to these functions are
+ copied, so you don't need to keep them around.  Note that this *isn't*
+ the case for other parts of the API.
  **/
 /* Creating source code locations for use by the debugger.
Line and column numbers are 1-based.  */
diff --git a/gcc/jit/notes.txt b/gcc/jit/notes.txt
index 6d1288f..2c225ce 100644
--- a/gcc/jit/notes.txt
+++ b/gcc/jit/notes.txt
@@ -38,6 +38,11 @@ Client Code   . Generated .libgccjit.so
│etc   .   .  .   .
│  .   .  .   .
──>   .
+Return from client callback  .│  .
+  .   .  .│  .
+  .   .  .│(handle_locations: add locations to
+  .   .  .│ linemap and associate them with 
trees)
+  .   .  .│  .
   .   .  .│  .   No GC in here
 ..│..A...
   .   .  .│ for each function
-- 
1.7.11.7

Re: [PING^2][PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2013-10-15 Thread Kugan

Thanks Richard for the review.

On 15/10/13 23:55, Richard Biener wrote:
> On Tue, 15 Oct 2013, Kugan wrote:
> 
>> Hi Eric,
>>
>> Can you please help to review this patch?
>> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00452.html
> 
> I think that gimple_assign_is_zero_sign_ext_redundant and its
> description is somewhat confused.  You seem to have two cases
> here, one being NOP_EXPR or CONVERT_EXPR that are in the IL
> because they are required for type correctness.  So,
> 

I have changed the name and the comments to make it more clear.

>long = (long) int
> 
> which usually expands to
> 
>(set:DI (sext:DI reg:SI))
> 
> you want to expand as
> 
>(set:DI (subreg:DI reg:SI))
> 
> where you have to be careful for modes smaller than word_mode.
> You don't seem to implement this optimization though (but
> the gimple_assign_is_zero_sign_ext_redundant talks about it).
> 


I am actually handling only the cases smaller than word_mode. If there
is any sign change, I dont do any changes. In the place RTL expansion is
done, I have added these condition as it is done for convert_move and
others.

For example, when an expression is evaluated and it's value is assigned
to variable of type short, the generated RTL would look something like
the following.

(set (reg:SI 110)
(zero_extend:SI (subreg:HI (reg:SI 117) 0)))

However, if during value range propagation, if we can say for certain
that the value of the expression which is present in register 117 is
within the limits of short and there is no sign conversion, we do not
need to perform the subreg and zero_extend; instead we can generate the
following RTl.

(set (reg:SI 110)
(reg:SI 117)))


> Second are promotions required by the target (PROMOTE_MODE)
> that do arithmetic on wider registers like for
> 
>   char = char + char
> 
> where they cannot do
> 
>   (set:QI (plus:QI reg:QI reg:QI))
> 
> but instead have, for example reg:SI only and you get
> 
>   (set:SI (plus:SI reg:SI reg:SI))
>   (set:SI ([sz]ext:SI (subreg:QI reg:SI)))
> 
> that you try to address with the cfgexpand hunk but I believe
> it doesn't work the way you do it.  That is because on GIMPLE
> you do not see SImode temporaries and thus no SImode value-ranges.
> Consider
> 
>   tem = (char) 255 + (char) 1;
> 
> which has a value-range of [0,0] but clearly when computed in
> SImode the value-range is [256, 256].  That is, VRP computes
> value-ranges in the expression type, not in some arbitrary
> larger type.
> 
> So what you'd have to do is take the value-ranges of the
> two operands of the plus and see whether the plus can overflow
> QImode when computed in SImode (for the example).
>

Yes. Instead of calculating the value ranges of the two operand in
SImode, What I am doing in this case is to look at the value range of
tem and if it is within [CHAR_MIN + 1, CHAR_MAX -1]. As you have
explained earlier, we cant rely on being within the [CHAR_MIN, CHAR_MAX]
as the range could have been modified to fit the LHS type. This ofcourse
will miss some of the cases where we can remove extensions but
simplifies the logic.

> [exposing the effect of PROMOTE_MODE earlier than at RTL expansion
> time may make this less awkward]

Please find the modified patch attached.


+2013-10-16  Kugan Vivekanandarajah  
+
+   * dojump.c (do_compare_and_jump): Generate rtl without
+   zero/sign extension if redundant.
+   * cfgexpand.c (expand_gimple_stmt_1): Likewise.
+   * gimple.c (gimple_is_rhs_value_fits_in_assign) : New
+   function.
+   * gimple.h (gimple_is_rhs_value_fits_in_assign) : Declare.
+

Thanks,
Kugan
> 
> Thanks,
> Richard.
> 
>> Thanks,
>> Kugan
>>
>>> +2013-09-25  Kugan Vivekanandarajah  
>>> +
>>> +   * dojump.c (do_compare_and_jump): Generate rtl without
>>> +   zero/sign extension if redundant.
>>> +   * cfgexpand.c (expand_gimple_stmt_1): Likewise.
>>> +   * gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New
>>> +   function.
>>> +   * gimple.h (gimple_assign_is_zero_sign_ext_redundant) : Declare.
>>> +
>>>
>>>

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 88e48c2..60869ce 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2311,6 +2311,20 @@ expand_gimple_stmt_1 (gimple stmt)
 
if (temp == target)
  ;
+   /* If the value in SUBREG of temp fits that SUBREG (does not
+  overflow) and is assigned to target SUBREG of the same mode
+  without sign conversion, we can skip the SUBREG
+  and extension.  */
+   else if (promoted
+&& gimple_is_rhs_value_fits_in_assign (stmt)
+&& (GET_CODE (temp) == SUBREG)
+&& (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (temp)))
+>= GET_MODE_PRECISION (GET_MODE (target)))
+&& (GET_MODE (SUBREG_REG (target))
+== GET_MODE (SUBREG_REG (temp
+ {
+   emit_move_insn (SUBREG_REG (target), SUBREG_

[jit] Update TODO for jit.

2013-10-15 Thread David Malcolm

Committed to dmalcolm/jit branch

---
 gcc/jit/ChangeLog.jit |  4 
 gcc/jit/TODO.rst  | 11 ++-
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index d998134..18101f1 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,3 +1,7 @@
+2013-10-15  David Malcolm  
+
+   * TODO.rst: Update.
+
 2013-10-14  David Malcolm  
 
* libgccjit.map: Alphabetize the exported symbols.
diff --git a/gcc/jit/TODO.rst b/gcc/jit/TODO.rst
index 65d4dc8..feb4d2b 100644
--- a/gcc/jit/TODO.rst
+++ b/gcc/jit/TODO.rst
@@ -13,13 +13,6 @@ TODOs:
 * unions
 * function ptrs
 
-* demo of JIT-compilation of a bytecode interpreter:
-https://github.com/davidmalcolm/jittest
-  (move this into tree, as an example)
-
-* wiring up testing into a DejaGNU testsuite, rather than testing it
-  by hand.
-
 * docs
 
 * fixing all the state issues
@@ -28,8 +21,8 @@ TODOs:
 
 * try porting llvmpipe to gcc
 
-* source locations: line-numbering etc
-
 * pkgconfig .pc file
 
+* add a SONAME to the library (and potentially version the symbols?)
+
 etc etc
-- 
1.7.11.7

Re: [Patch] Handle profile counts truncated to 0 in coldness test

2013-10-15 Thread Teresa Johnson

Ping on this one since the new param would be useful in the other
profile related patch I'm working on (the one to handle the dropped
profile counts in tree-profile).

Thanks,
Teresa

On Thu, Oct 10, 2013 at 2:35 PM, Teresa Johnson  wrote:
> On Mon, Oct 7, 2013 at 10:45 PM, Teresa Johnson  wrote:
>> On Sun, Oct 6, 2013 at 6:43 AM, Jan Hubicka  wrote:
 2013-10-03  Teresa Johnson  

 * params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.
>>>
>>> I would not mention HOT in the parameter name.  Blocks are 
>>> hot/normal/unlikely
>>> and we HOT_BB_FREQUENCY_FRACTION.
>>>
>>> So perhaps UNLIKELY_BB_COUNT_FRACTION with an explanation that it is 
>>> relative
>>> to number of train runs?
>>
>> Ok.
>>
 +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
 +"min-hot-run-ratio",
 + "The minimum ratio of profile runs to basic block execution 
 count "
 + "for the block to be considered hot",
>>> Perhaps as:
 + "The maximum ratio of profile runs to basic block execution 
 count "
 + "for the block to be considered unlikely",
>>>
 +4, 0, 1)
>>>
>>> lower bound should be 1 or we divide by 0.
>>
>> Yep, will fix.
>>
 Index: predict.c
 ===
 --- predict.c   (revision 203152)
 +++ predict.c   (working copy)
 @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
gcc_checking_assert (fun);
if (profile_status_for_function (fun) == PROFILE_READ)
  {
 -  if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
 +  int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
 +  if (RDIV (count * min_run_ratio, profile_info->runs) > 0)
>>>
>>> This way if count is 1, runs needs to be n_run_ratio * 2
>>> So perhaps if (count * min_run_ratio >= profile_info->runs) instead of 
>>> division.
>>
>> Ok.
>>
 -  if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < 
 REG_BR_PROB_BASE)
 +  if (ENTRY_BLOCK_PTR->count)
 {
 - return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
 -   ENTRY_BLOCK_PTR->frequency)
 - < REG_BR_PROB_BASE / 4);
 +  gcov_type scaled_count
 +  = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio;
 +  gcov_type computed_count;
 +  /* Check for overflow, in which case ENTRY_BLOCK_PTR->count 
 should
 + be large enough to do the division first without losing much
 + precision.  */
 +  if (scaled_count/frequency/min_run_ratio != 
 ENTRY_BLOCK_PTR->count)
>>>
>>> I do not like the undefined behaviour triggered before the check (sure 
>>> unsigned
>>> arithmetic would fix that, but it seems all strange).  What about guarding 
>>> the
>>> whole code.
>>>
>>> if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE)
>>> For large counts the roudoff problems in first test should not be problem.
>>
>> Sure, that sounds reasonable.
>
> Actually, we don't care about count at this point, just 
> ENTRY_BLOCK_PTR->count.
>
>>
 +{
 +  computed_count = RDIV (ENTRY_BLOCK_PTR->count,
 + ENTRY_BLOCK_PTR->frequency);
 +  computed_count *= frequency * min_run_ratio;
 +}
 +  else
 +computed_count = RDIV (scaled_count, 
 ENTRY_BLOCK_PTR->frequency);
 +  if (RDIV (computed_count * min_run_ratio, profile_info->runs) > 
 0)
>>>
>>> So at nonoverflow path you compute
>>>
>>>((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency) * 
>>> min_run_ratio / runs > 0.5
>>
>> Yes, there is an extra min_run_ratio factor in there that I need to remove!
>>>
>>> I think you want
>>>
>>>((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency >= 
>>> runs
>>
>> Right, will change that too.
>
> Note that the above change essentially decreases the min_run_ratio in
> half (since we are comparing >= runs instead of >= 0.5 runs due to the
> rounding divide).
>
>>
>>>
>>> If entry_bb_frequency is 0, you can just return true iff frequency is > 
>>> min_run_ratio.
>>
>> We already return false conservatively if entry_bb_frequency is 0. Do
>> we really want to change this - I'm not sure it makes sense to compare
>> the frequency to the run ratio directly.
>>
>>>
>>> I think safe way to compute this is
>>>
>>> if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
>>>   scaled_count = ((frequency * entry_bb_count * min_run_ratio) / 
>>> entry_bb_frequency;
>>> else
>>>   scaled_count = ((frequency * entry_bb_count) / entry_bb_frequency) * 
>>> min_run_ratio
>>>
>>> In first path, we have at most 1*1*1*1 that still fits in 
>>> 64bit
>>> value.
>>>
>>> In the second path the base of fixed point arithmetic is large enough so the
>>> divisi

Re: [PATCH v4 04/20] add configury

2013-10-15 Thread Gerald Pfeifer

On Tue, 15 Oct 2013, Tom Tromey wrote:
>> Is nobody else seeing this?  Or is everyone just lucky enough to have
>> a recent version of GCC as the default compiler?
> How exactly are you configuring and invoking "make"?
> I will try to reproduce it.

Super, thanks Tom!

I've tried to reproduce this manually and had some mixed results, but 
basically my script does this:

  % mkdir /scratch2/tmp/gerald/OBJ-1014-1920
  % cd /scratch2/tmp/gerald/OBJ-1014-1920
  % $GCC_SOURCE/configure --prefix=/home/gerald/something
  % gmake bootstrap-lean
  % gmake install

Wait a minute!  

It _looks_ like I can get this to succeed using "gmake bootstrap" 
and have the very same script fail using "gmake bootstrap-lean".

Can you reproduce that?

Gerald

Re: [PATCH] decide edge's hotness when there is profile info

2013-10-15 Thread Teresa Johnson

On Tue, Oct 15, 2013 at 2:57 PM, Jan Hubicka  wrote:
>> On Tue, Oct 15, 2013 at 2:03 PM, Jan Hubicka  wrote:
>> >> Yes, that would work. So let's discard this patch because the fix for
>> >> comdat can also fix this problem.
>> >
>> > Unforutnately ipa-profile-estimate is an IPA pass and as such it does
>> > not have access to profile_status_to_function.
>>
>> I see. I was coding that up today but hadn't tested it yet.
>>
>> > You can probably just factor this out into a function that can be called
>> > and for normal FDO we call it where the loop stis now and for auto-FDO we 
>> > can
>> > probably have another invocation from before early passes where auto-FDO 
>> > is collected.
>>
>> Ok, let's go with that approach for now. It won't address the 0 count
>> COMDAT calling another 0 count COMDAT problem, but I will probably
>> just find a way to deal with this when inlining.
>
> You can still propagate, since tree-profile is an simple-ipa pass.

Ok, I think I will leave that for the second patch, since I need to do
some testing on the effects - i.e. the propagation I had in mind would
make any 0-count routine called by a routine that was dropped to
guessed to also be dropped to guessed (and no longer unlikely). It may
be too aggressive, I need to check.

>>
>> >> >>> +  if (node->count)
>> >> >>> + continue;
>> > Also here we should sum the counts and consider function non unlikely 
>> > executed
>> > in the same way as probably_never_executed does.
>>
>> I assume you mean by doing the same comparison to the number of
>> profile->runs. Yes, this makes sense.
>
> Yes.
>>
>> >
>> > I can prepare updated patch, but i am currently travelling, so i would not
>> > be disapointed if you beat me ;)
>>
>> I'm working on it, and I think based on Dehao's needs I am going to
>> split up the patch into two phases, the one that does just the part
>> you had sent a patch for (making sure 0 count routines with non-zero
>> calls are marked guessed and have their node frequency set
>> appropriately), and a subsequent one to do the count application when
>> we inline a 0-count routine into a non-zero callsite. I'll shoot for
>> getting this ready by tomorrow.
>>
>> BTW, in your original patch you are checking for both e->count or
>> cgraph_maybe_hot_edge_p(e), but AFAICT the call to
>> cgraph_maybe_hot_edge_p will never return true when e->count is zero.
>> When there is a profile it will return false via maybe_hot_count_p
>> since e->count == 0. When there is no profile it will return false
>> when the callee has NODE_FREQUENCY_UNLIKELY_EXECUTED. So I think just
>> checking for e->count >0 is sufficient here.
>
> I think I was checking caller count here (that is read) and the code
> was supposed to make functoin with hot caller to be hot...

The code in your patch was just checking the edge count, not the
caller count. cgraph_maybe_hot_edge_p also doesn't check the caller
count, just the edge count. Do we want to make all functions with hot
callers also be hot, even when the edge count is not hot? This may be
to aggressive. I was simply going to make the code check e->count.

Teresa

>
> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

[Patch, Fortran] PR58652 - accept CLASS() as argument to CLASS()

2013-10-15 Thread Tobias Burnus


As the test case (see also PR) showed, gfortran was rejecting:

 subroutine list_move_alloc(self,item)
   class(list_node),intent(inout) :: self
   class(*),intent(inout),allocatable :: item
...
 class(*), allocatable :: expr
...
   call ast%move_alloc(expr)

with the bogus message:

call ast%move_alloc(expr)
1
Error: Actual argument to 'item' at (1) must have the same declared type


The attached patch now also accepts passing CLASS(*) to CLASS(*).

Built and currently regtesting on x86-64-gnu-linux (when successful:)
OK for the trunk?

Tobias
2013-10-16  Tobias Burnus  

	PR fortran/58652
	* interface.c (compare_parameter): Accept passing CLASS(*)
	to CLASS(*).

2013-10-16  Tobias Burnus  

	PR fortran/58652
	* gfortran.dg/unlimited_polymorphic_12.f90: New.

diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index b878644..b3ddf5f 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -1990,8 +1990,9 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual,
   if (!gfc_expr_attr (actual).class_ok)
 	return 0;
 
-  if (!gfc_compare_derived_types (CLASS_DATA (actual)->ts.u.derived,
-  CLASS_DATA (formal)->ts.u.derived))
+  if ((!UNLIMITED_POLY (formal) || !UNLIMITED_POLY(actual))
+	  && !gfc_compare_derived_types (CLASS_DATA (actual)->ts.u.derived,
+	 CLASS_DATA (formal)->ts.u.derived))
 	{
 	  if (where)
 	gfc_error ("Actual argument to '%s' at %L must have the same "
diff --git a/gcc/testsuite/gfortran.dg/unlimited_polymorphic_12.f90 b/gcc/testsuite/gfortran.dg/unlimited_polymorphic_12.f90
new file mode 100644
index 000..c583c6b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/unlimited_polymorphic_12.f90
@@ -0,0 +1,44 @@
+! { dg-do compile }
+!
+! PR fortran/58652
+!
+! Contributed by Vladimir Fuka
+!
+! The passing of a CLASS(*) to a CLASS(*) was reject before
+!
+module gen_lists
+  type list_node
+class(*),allocatable :: item
+contains
+  procedure :: move_alloc => list_move_alloc
+  end type
+
+  contains
+
+subroutine list_move_alloc(self,item)
+  class(list_node),intent(inout) :: self
+  class(*),intent(inout),allocatable :: item
+
+  call move_alloc(item, self%item)
+end subroutine
+end module
+
+module lists
+  use gen_lists, only: node => list_node
+end module lists
+
+
+module sexp
+  use lists
+contains
+ subroutine parse(ast)
+class(*), allocatable, intent(out) :: ast
+class(*), allocatable :: expr
+integer :: ierr
+allocate(node::ast)
+select type (ast)
+  type is (node)
+call ast%move_alloc(expr)
+end select
+  end subroutine
+end module

Re: libgomp.texi: Update for OpenMP v4.0

2013-10-15 Thread Tobias Burnus


Am 15.10.2013 22:35, schrieb Jakub Jelinek:
Thanks. Just a few nits: 


Thanks for the review - I have now committed the attached revised 
version (Rev. 203635).


Regarding your comments:

non-negative, 0 is a valid target device number.


I thought I read somewhere that for some variable 0 is the host number, 
but as I cannot find it anymore, I must misremember it.



+If set to @code{TRUE}, the cancellation is activated.  If set to @code{FALSE} 
or
+if unset, cancellation is disabled and the @code{cancel} construct is ignored.

Perhaps mention that the default is @code{FALSE}?


Well, I did: "... if unset, cancellation is disabled"

Tobias
gcc/fortran/
2013-10-15  Tobias Burnus  

	* intrinsic.texi (OpenMP Modules): Update to OpenMPv4.
	Document omp_proc_bind_kind.

libgomp/
2013-10-15  Tobias Burnus  

	* libgomp.texi: (Runtime Library Routines): Update references for
	OpenMP 4.0. Add omp_get_cancellation, omp_get_default_device,
	omp_get_num_devices, omp_get_num_teams, omp_get_proc_bind,
	omp_get_team_num, omp_is_initial_device, omp_set_default_device.
	(Environment Variables): Update references for OpenMP 4.0. Add
	OMP_CANCELLATION, OMP_DEFAULT_DEVICE, OMP_PLACES.
	Move OMP_DISPLAY_ENV and OMP_PROC_BIND up to be in alphabetical
	order.

diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index afe9671..18bd4d2 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -13177,7 +13177,7 @@ Both are equivalent to the value @code{NULL} in C.
 @section OpenMP Modules @code{OMP_LIB} and @code{OMP_LIB_KINDS}
 @table @asis
 @item @emph{Standard}:
-OpenMP Application Program Interface v3.1
+OpenMP Application Program Interface v4.0
 @end table
 
 
@@ -13190,8 +13190,8 @@ the named constants defined in the modules are listed
 below.
 
 For details refer to the actual
-@uref{http://www.openmp.org/mp-documents/spec31.pdf,
-OpenMP Application Program Interface v3.1}.
+@uref{http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf,
+OpenMP Application Program Interface v4.0}.
 
 @code{OMP_LIB_KINDS} provides the following scalar default-integer
 named constants:
@@ -13199,15 +13199,17 @@ named constants:
 @table @asis
 @item @code{omp_lock_kind}
 @item @code{omp_nest_lock_kind}
+@item @code{omp_proc_bind_kind}
 @item @code{omp_sched_kind}
 @end table
 
 @code{OMP_LIB} provides the scalar default-integer
 named constant @code{openmp_version} with a value of the form
 @var{mm}, where @code{} is the year and @var{mm} the month
-of the OpenMP version; for OpenMP v3.1 the value is @code{201107}.
+of the OpenMP version; for OpenMP v3.1 the value is @code{201107}
+and for OpenMP v4.0 the value is @code{201307}.
 
-And the following scalar integer named constants of the
+The following scalar integer named constants of the
 kind @code{omp_sched_kind}:
 
 @table @asis
@@ -13216,3 +13218,14 @@ kind @code{omp_sched_kind}:
 @item @code{omp_sched_guided}
 @item @code{omp_sched_auto}
 @end table
+
+And the following scalar integer named constants of the 
+kind @code{omp_proc_bind_kind}:
+
+@table @asis
+@item @code{omp_proc_bind_false}
+@item @code{omp_proc_bind_true}
+@item @code{omp_proc_bind_master}
+@item @code{omp_proc_bind_close}
+@item @code{omp_proc_bind_spread}
+@end table
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 5e55716..0033beb 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -106,17 +106,17 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 @chapter Enabling OpenMP
 
 To activate the OpenMP extensions for C/C++ and Fortran, the compile-time 
-flag @command{-fopenmp} must be specified. This enables the OpenMP directive
+flag @command{-fopenmp} must be specified.  This enables the OpenMP directive
 @code{#pragma omp} in C/C++ and @code{!$omp} directives in free form, 
 @code{c$omp}, @code{*$omp} and @code{!$omp} directives in fixed form, 
 @code{!$} conditional compilation sentinels in free form and @code{c$},
-@code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also
+@code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
 arranges for automatic linking of the OpenMP runtime library 
 (@ref{Runtime Library Routines}).
 
 A complete description of all OpenMP directives accepted may be found in 
 the @uref{http://www.openmp.org, OpenMP Application Program Interface} manual,
-version 3.1.
+version 4.0.
 
 
 @c -
@@ -126,30 +126,37 @@ version 3.1.
 @node Runtime Library Routines
 @chapter Runtime Library Routines
 
-The runtime routines described here are defined by section 3 of the OpenMP 
-specifications in version 3.1. The routines are structured in following
+The runtime routines described here are defined by Section 3 of the OpenMP
+specification in version 4.0.  The routines are structured in following
 three parts:
 
-Control threads, processors and the parallel environment.
-
-They have C linkage, and don'

Re: [PATCH] decide edge's hotness when there is profile info

2013-10-15 Thread Jan Hubicka

> On Tue, Oct 15, 2013 at 2:03 PM, Jan Hubicka  wrote:
> >> Yes, that would work. So let's discard this patch because the fix for
> >> comdat can also fix this problem.
> >
> > Unforutnately ipa-profile-estimate is an IPA pass and as such it does
> > not have access to profile_status_to_function.
> 
> I see. I was coding that up today but hadn't tested it yet.
> 
> > You can probably just factor this out into a function that can be called
> > and for normal FDO we call it where the loop stis now and for auto-FDO we 
> > can
> > probably have another invocation from before early passes where auto-FDO is 
> > collected.
> 
> Ok, let's go with that approach for now. It won't address the 0 count
> COMDAT calling another 0 count COMDAT problem, but I will probably
> just find a way to deal with this when inlining.

You can still propagate, since tree-profile is an simple-ipa pass.
> 
> >> >>> +  if (node->count)
> >> >>> + continue;
> > Also here we should sum the counts and consider function non unlikely 
> > executed
> > in the same way as probably_never_executed does.
> 
> I assume you mean by doing the same comparison to the number of
> profile->runs. Yes, this makes sense.

Yes.
> 
> >
> > I can prepare updated patch, but i am currently travelling, so i would not
> > be disapointed if you beat me ;)
> 
> I'm working on it, and I think based on Dehao's needs I am going to
> split up the patch into two phases, the one that does just the part
> you had sent a patch for (making sure 0 count routines with non-zero
> calls are marked guessed and have their node frequency set
> appropriately), and a subsequent one to do the count application when
> we inline a 0-count routine into a non-zero callsite. I'll shoot for
> getting this ready by tomorrow.
> 
> BTW, in your original patch you are checking for both e->count or
> cgraph_maybe_hot_edge_p(e), but AFAICT the call to
> cgraph_maybe_hot_edge_p will never return true when e->count is zero.
> When there is a profile it will return false via maybe_hot_count_p
> since e->count == 0. When there is no profile it will return false
> when the callee has NODE_FREQUENCY_UNLIKELY_EXECUTED. So I think just
> checking for e->count >0 is sufficient here.

I think I was checking caller count here (that is read) and the code
was supposed to make functoin with hot caller to be hot...

Honza
> 
> Thanks,
> Teresa
> 
> >
> > Honza
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH][i386]Fix PR 57756

2013-10-15 Thread Sriraman Tallam

On Sat, Oct 12, 2013 at 12:10 AM, Uros Bizjak  wrote:
> On Sat, Oct 12, 2013 at 2:16 AM, Sriraman Tallam  wrote:
>
>> Ping.
>
> This looks nice.  I suppose you grepped for effected targets and rs6000
> was the only one besides i386.
>
> This is ok given target maintainers do not object within 24h and the test
> successfully bootstrapped and passed regression tests.

 I changed this patch  quite a bit after I realized I missed many other
 places where global_options field was accessed. The two specific
 changes are:

 * Particularly, ix86_function_specific_save and
 ix86_function_specific_restore had to be changed to copy to a specific
 gcc_options structure than to global_options.
 * Function ix86_option_override_internal accesses global_options via
 macros, like TARGET_64BIT etc. This had to be changed. So, I created
 new macros to accept a parameter and named them like TARGET_64BIT_P
 and replaced all the accesses in i386.c

 I also marked as TODO a copy in function ix86_function_specific_save.
 Here, the cl_target_option structure has a ix86_target_flags_explicit
 field whereas the global_options structure does not have one.
 Previously, this used to get saved to the global_options target_flags
 field but this did not make sense to me. I have commented this line
 out. Please let me know if a new field needs to be added.

 This patch passes bootstrap and all tests. Please take another look.
>
> AFAICS, the patch was approved by Richard on 23. September. None of
> the target maintainers have had any further objections to it.
>
> Regarding the TODO in i386.c: some options depend on and enable other
> options, for example -msse3 enables SSE2, etc, although user didn't
> explicitly add -msse2 to the options. This is not the case with global
> options. Since this field is used extensively through i386.c, I'd say
> to play safe and save it.
>
> So, x86 part is OK with the above change. Middle end is already
> approved and rs6000 maintainer didn't object the approval, so please
> go ahead and commit the patch.

I committed this patch after making the above change.

Thanks
Sri

>
> Thanks,
> Uros.

Re: [PATCH v2] Fix libgfortran cross compile configury w.r.t newlib

2013-10-15 Thread Mike Stump

On Oct 15, 2013, at 1:17 PM, Tobias Burnus  wrote:
> Marcus Shawcroft wrote:>> 2013-10-01  Marcus Shawcroft 
> 
> >>
> >>  * configure.ac (AC_CHECK_FUNCS_ONCE): Add for exit() then make
> >>  existing AC_CHECK_FUNCS_ONCE dependent on outcome.
> >
> > Ping^2
> 
> For configure patches, I am never quite sure whether they should be reviewed 
> by a build maintainer or by the library maintainer. In any case, the change 
> looks reasonable to me and as no other newlib user has complained, it is also 
> okay from my side.

Would be nice for a build/config person to weigh in or to upgrade and make 
bullet proof the system against such failures.  My take, by default, the 
compile line should do something useful, and that should be enough for autoconf 
style tests to smell the library.  Now, we can observe that Steve's 
mips-mti-elf newlib port apparently violates that, but it is lost on me why 
that is and why that is a good thing.  Steve?  Is there some reason why by 
default, a suitable linker script can't be added by the compiler, or, if none 
suitable, a stub one that isn't suitable in general, but, is complete enough to 
allow autoconf to function normally?

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-10-15 Thread Wei Mi

Thanks for the comments. One question inlined. Preparing another patch
addressing the comments.

Regards,
Wei Mi.

On Tue, Oct 15, 2013 at 1:35 PM, Jeff Law  wrote:
> On 10/03/13 12:24, Wei Mi wrote:
>>
>> Thanks,
>> Wei Mi.
>>
>> 2013-10-03  Wei Mi  
>>
>>  * gcc/config/i386/i386.c (memory_address_length): Extract a part
>>  of code to rip_relative_addr_p.
>>  (rip_relative_addr_p): New Function.
>>  (ix86_macro_fusion_p): Ditto.
>>  (ix86_macro_fusion_pair_p): Ditto.
>>  * gcc/config/i386/i386.h: Add new tune features about
>> macro-fusion.
>>  * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.
>>  * gcc/doc/tm.texi: Generated.
>>  * gcc/doc/tm.texi.in: Ditto.
>>  * gcc/haifa-sched.c (try_group_insn): New Function.
>>  (group_insns_for_macro_fusion): Ditto.
>>  (sched_init): Call group_insns_for_macro_fusion.
>>  * gcc/sched-rgn.c (add_branch_dependences): Keep insns in
>>  a SCHED_GROUP at the end of BB to remain their location.
>>  * gcc/target.def: Add two hooks: macro_fusion_p and
>>  macro_fusion_pair_p.
>
> I'm not going to comment on the x86 specific stuff -- I'll defer to the port
> maintainers for that.
>
>
>
>> index 61eaaef..d6726a9 100644
>> --- a/gcc/haifa-sched.c
>> +++ b/gcc/haifa-sched.c
>> @@ -6519,6 +6519,44 @@ setup_sched_dump (void)
>>  ? stderr : dump_file);
>>   }
>>
>> +static void
>> +try_group_insn (rtx insn)
>
> You need a comment for this function.
>

Ok, will add comment for it.

>
>
>> +{
>> +  unsigned int condreg1, condreg2;
>> +  rtx cc_reg_1;
>> +  rtx prev;
>> +
>> +  targetm.fixed_condition_code_regs (&condreg1, &condreg2);
>> +  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
>> +  prev = prev_nonnote_nondebug_insn (insn);
>> +  if (!any_condjump_p (insn)
>> +  || !reg_referenced_p (cc_reg_1, PATTERN (insn))
>> +  || !prev
>> +  || !modified_in_p (cc_reg_1, prev))
>> +return;
>
> I'd test !any_condjump_p at the start of this function before calling the
> target hook.  If insn isn't a conditional jump, then all the other work is
> totally useless.

Ok. will fix it.

>
> Aren't you just trying to see if we have a comparison feeding the
> conditional jump and if they're already adjacent?  Do you actually need to
> get the condition code regs to do that test?
>

Yes, I am trying to see if we have a comparison feeding the
conditional jump and if they're already adjacent. Do you have more
easier way to do that test?

>
>> +
>> +  /* Different microarchitectures support macro fusions for different
>> + combinations of insn pairs.  */
>> +  if (!targetm.sched.macro_fusion_pair_p
>> +  || !targetm.sched.macro_fusion_pair_p (prev, insn))
>> +return;
>> +
>> +  SCHED_GROUP_P (insn) = 1;
>
> I'm surprised that SCHED_GROUP_P worked -- I've tried to do similar stuff in
> the past and ran into numerous problems trying to hijack SCHED_GROUP_P for
> this kind of purpose.
>
>
>
>>
>>   static void haifa_init_only_bb (basic_block, basic_block);
>> diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
>> index e1a2dce..156359e 100644
>> --- a/gcc/sched-rgn.c
>> +++ b/gcc/sched-rgn.c
>> @@ -2443,6 +2443,8 @@ add_branch_dependences (rtx head, rtx tail)
>>cc0 setters remain at the end because they can't be moved away from
>>their cc0 user.
>>
>> + Predecessors of SCHED_GROUP_P instructions at the end remain at the
>> end.
>> +
>>COND_EXEC insns cannot be moved past a branch (see e.g. PR17808).
>>
>>Insns setting TARGET_CLASS_LIKELY_SPILLED_P registers (usually
>> return
>> @@ -2465,7 +2467,8 @@ add_branch_dependences (rtx head, rtx tail)
>>   #endif
>>   || (!reload_completed
>>   && sets_likely_spilled (PATTERN (insn)
>> -|| NOTE_P (insn))
>> +|| NOTE_P (insn)
>> +|| (last != 0 && SCHED_GROUP_P (last)))
>>   {
>> if (!NOTE_P (insn))
>>  {
>
> This looks like a straighforward bugfix and probably should go forward
> independent of this enhancement.

Ok, I will separate it into another patch.

>
> Jeff

Add predefined macros for library use in defining __STDC_IEC_559*

2013-10-15 Thread Joseph S. Myers

ISO C99 and C11 specify various predefined macros for implementation
properties that describe library features, or features of the
combination of the compiler and library implementations.  In such
cases, GCC does not predefine the macros but provides a mechanism for
the library implementation to provide a stdc-predef.h header with
definitions of those macros, and implicitly preincludes that header in
compilations to meet the standard requirement for the macro
definitions to be constant throughout the translation unit.

Two such macros are __STDC_IEC_559__ and __STDC_IEC_559_COMPLEX__,
relating to IEEE 754 floating-point arithmetic support.  Now, for
neither of those are the relevant features fully implemented in GCC
(or most other C99 compilers), but as with __STDC_VERSION__ it's
appropriate to treat these macros as indicating intent rather than
completeness, given that the GCC documentation specifically disclaims
completeness of C99 or C11 support.

Intent can be unclear in some cases, but it's reasonably clear that if
the user used -ffast-math, for example, that has specified an intent
to enable optimizations that conflict with the C99/C11 Annex F
requirements.  And if the compiler configuration uses software
floating point without exceptions and rounding modes support (and
making software floating point support exceptions and rounding modes
depends on compiler and library cooperation, and likely isn't desired
for many configurations that use software floating point), that is
also prevents proper Annex F support.

 suggested
a glibc sysdeps mechanism for controlling whether these macros are
defined, but that would only handle architecture-specific conditionals
for software floating point.  There are various GCC options that
indicate intent contrary to Annex F, many of which do not predefine
macros that could be used to control the __STDC_* definitions
appropriately.

This patch adds new predefined macros __GCC_IEC_559 and
__GCC_IEC_559_COMPLEX, whose values specify the intent of the compiler
configuration and command-line options and so can be used by
stdc-predef.h to determine whether to define the __STDC_* macros.  The
detailed semantics are given in the documentation changes;
distinguishing whether IEEE 754-2008 is supported is intended to allow
use in future to control the definition of __STDC_IEC_60559_BFP__ if
support for TS 18661-1 (latest draft: WG14 N1756) were to be added to
glibc (again, there are compiler features involved, quite complicated
to implement in the case of the static rounding direction pragmas, but
the macros are treated as being about intent rather than
completeness).

(It is quite likely there is *also* use for predefined macros for
various individual floating-point options that don't currently have
them, to facilitate optimization of library calls by indicating to the
library what the requirements are for accuracy, rounding mode
handling, exception handling etc. of library functions being called
from the current translatin unit, and so allow calls to be directed to
appropriate optimized function variants.  But that's independent of
the new macros in support of __STDC_*; I don't think __STDC_* can
readily be computed from macros for individual options, given the
possibility of new options for new optimizations that may also
conflict with Annex F requirements.)

The default macro definitions are based on whether there is an adddf3
insn pattern - except on powerpc*-*-linux*, where a separate target
hook implementation is used because of software floating point support
in glibc handling exceptions and rounding modes - and on various
command-line options.  (-ftrapping-math and -frounding-math are not
considered because in C99/C11 terms those relate to the default state
of the FENV_ACCESS pragma, which is permitted to be "off".)

There is a "???" comment regarding -ffp-contract, where logically
-ffp-contract=fast should be considered to conflict with Annex F but
options such as -std=c99 don't do anything to change the default to
-ffp-contract=on (= -ffp-contract=off at present).  I looked back at
some of the discussion (Oct/Nov 2010) from when -ffp-contract was
added, but didn't find anything giving a reason why conformance
options don't do this, as I suggested in
 that they
should.  So in the absence of someone coming up with such a reason I'd
be inclined to make such options enable -ffp-contract=on, just as they
enable -fexcess-precision=standard on 32-bit x86, and add a
corresponding check when setting the new macros.

Portable testcases for these macros are difficult, given the inherent
system-dependency of their values (which is part of the point of
adding them), but tests are added that various options force the
values of the macros to 0, along with a test enabled for i?86-*-linux*
x86_64-*-linux* powerpc*-*-linux* that the values are 2 on those
targets with -st

Re: [SKETCH] Refactor implicit function template implementation and fix 58534, 58536, 58548, 58549 and 58637.

2013-10-15 Thread Adam Butcher

On Wed, 25 Sep 2013 11:01:26 -0500, Jason Merrill wrote:

On 09/24/2013 02:05 AM, Adam Butcher wrote:
> On the subject of on-the-fly synthesis: I haven't started yet but 
I'm

> thinking to trigger in 'cp_parser_simple_type_specifier' when
> 'current_binding_level->kind == sk_function_parms'.
>
This is working out quite well.  And I've cleaned up and optimized 
synthesize_implicit_template_parm to update current_template_parms 
directly.  Implicitly packs are still a problem though.

> I can foresee a
> potential issue with packs in that, upon reading the 'auto' 
(potentially
> nested in some other type), a plain template parm will be 
synthesized;

> but it may need to be a pack parm type if '...' is found later.

Hmm, good point.

I think there are two options:

1) Build up the type as normal and use tsubst to replace the non-pack 
template parameter with a pack if needed.

The problem I've hit with this (and other hacks I've tried that involve 
finishing the type first and rewriting afterward) is that, with parm 
types such as "pair...", the specialization "pair" 
is indexed in pt.c by hash_specialization into type_specializations 
before the '...' is seen.  In an explicit template, the template type 
parms are already marked as packs so are hashed as such and are distinct 
from non-pack template type parms.

The issue occurs in cases such as:

  auto a = [] (auto, pair v) { return sizeof (v); };
  auto b = [] (auto, pair... v) { return sizeof... (v); };

  a(1, pair());
  b(2, pair(), pair());

If the declarations of a and b are reversed, it works as expected.

I cannot find a way to prevent picking up the non-pack 
"pair" type on the way through parsing "pair...".  
I've tried a few hacks to prevent hashing if an 'auto' is present in the 
current type but ended up with canonical type issues.  I also tried 
rehashing after pack replacement but still hit the same issue.  My 
conclusion is that the 'auto' needs to synthesize a pack type from the 
start for parameter packs.  Which leads to:

2) If we see 'auto', scan ahead (possibly via tentative parsing) to 
see if there's a ...

My current preferred option.  The problem with it is that, ideally, I 
only want to look ahead for '...' in the parm decl if an 'auto' is seen. 
But I'm not sure how to do this in the case where the first 'auto' is 
nested in a template parameter (or other complex type).  E.g.

   auto f(pair, auto>... v)
^
From the 'auto' I'd need to unwind to the fn parm scope and then try to 
tentatively parse the ellipsis.  To unwind and look ahead for '...' 
needs the full parser mechanics but without any side-effects.  I don't 
think I can do it with the lexer alone as I think there may be ambiguity 
with scanning <, <<, >, >> tokens.

Look-ahead seems like the right way to go (unless there's a way to 
defer type hashing) but I'm not sure how to achieve it.

Any suggestions?

Cheers,
Adam

Re: [PATCH] decide edge's hotness when there is profile info

2013-10-15 Thread Teresa Johnson

On Tue, Oct 15, 2013 at 2:03 PM, Jan Hubicka  wrote:
>> Yes, that would work. So let's discard this patch because the fix for
>> comdat can also fix this problem.
>
> Unforutnately ipa-profile-estimate is an IPA pass and as such it does
> not have access to profile_status_to_function.

I see. I was coding that up today but hadn't tested it yet.

> You can probably just factor this out into a function that can be called
> and for normal FDO we call it where the loop stis now and for auto-FDO we can
> probably have another invocation from before early passes where auto-FDO is 
> collected.

Ok, let's go with that approach for now. It won't address the 0 count
COMDAT calling another 0 count COMDAT problem, but I will probably
just find a way to deal with this when inlining.

>> >>> +  if (node->count)
>> >>> + continue;
> Also here we should sum the counts and consider function non unlikely executed
> in the same way as probably_never_executed does.

I assume you mean by doing the same comparison to the number of
profile->runs. Yes, this makes sense.

>
> I can prepare updated patch, but i am currently travelling, so i would not
> be disapointed if you beat me ;)

I'm working on it, and I think based on Dehao's needs I am going to
split up the patch into two phases, the one that does just the part
you had sent a patch for (making sure 0 count routines with non-zero
calls are marked guessed and have their node frequency set
appropriately), and a subsequent one to do the count application when
we inline a 0-count routine into a non-zero callsite. I'll shoot for
getting this ready by tomorrow.

BTW, in your original patch you are checking for both e->count or
cgraph_maybe_hot_edge_p(e), but AFAICT the call to
cgraph_maybe_hot_edge_p will never return true when e->count is zero.
When there is a profile it will return false via maybe_hot_count_p
since e->count == 0. When there is no profile it will return false
when the callee has NODE_FREQUENCY_UNLIKELY_EXECUTED. So I think just
checking for e->count >0 is sufficient here.

Thanks,
Teresa

>
> Honza

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: PING Re: [Patch, Fortran] PR58658 - add missing pointer-assign check for CLASS(*)

2013-10-15 Thread Paul Richard Thomas

Then, the patch is OK for trunk :-)

Thanks for putting this right - it's obviously my cock-up!

Cheers

Paul



On 15 October 2013 22:20, Tobias Burnus  wrote:
> Hi Paul,
>
>
> Paul Richard Thomas wrote:
>>
>> Have you checked that:
>>
>> subroutine sub(a)
>>class(*),pointer :: a
>>a => null()
>> end subroutine
>>
>> does not give an error?  I think that it is why the check was introduced.
>
>
> I haven't checked it in particular, but was relying that some test in the
> library would test for it. Additionally, gfc_expr_attr() takes care to set
> for BT_CLASS "pointer" only for class_pointer - which I also checked before
> submittal.
>
> In any case, your test case compiles and "grep '=>'" finds the following
> NULL initializations:
>
> unlimited_polymorphic_1.f03:  u2 => NULL()
> unlimited_polymorphic_1.f03:  u2 => NULL(aptr)
>
> Tobias



-- 
The knack of flying is learning how to throw yourself at the ground and miss.
   --Hitchhikers Guide to the Galaxy

Re: [PATCH] Fix profile count updates during tail merging

2013-10-15 Thread Jan Hubicka

> This patch fixes a profile count insanity introduced by ssa tail
> merging. When replacing bb1 with bb2, which has the same successors,
> the bb counts were being merged, but the successor edge weights
> were not.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
> 
> Thanks,
> Teresa
> 
> 2013-10-15  Teresa Johnson  
> 
> * tree-ssa-tail-merge.c (replace_block_by): Update edge
> weights during merging.
> 
> Index: tree-ssa-tail-merge.c
> ===
> --- tree-ssa-tail-merge.c   (revision 203389)
> +++ tree-ssa-tail-merge.c   (working copy)
> @@ -1462,6 +1462,8 @@ static void
>  replace_block_by (basic_block bb1, basic_block bb2)
>  {
>edge pred_edge;
> +  edge e1;
> +  edge_iterator ei;
>unsigned int i;
>gimple bb2_phi;
> 
> @@ -1488,6 +1490,15 @@ replace_block_by (basic_block bb1, basic_block bb2
>pred_edge, UNKNOWN_LOCATION);
>  }
> 
> +  /* Merge the outgoing edge counts from bb1 onto bb2.  */
> +  FOR_EACH_EDGE (e1, ei, bb1->succs)
> +{
> +  edge e2;
> +  e2 = find_edge (bb2, e1->dest);
> +  gcc_assert (e2);
> +  e2->count += e1->count;

Don't you need to redistribute the counts via edge probabilities?

Honza
> +}
> +
>bb2->frequency += bb1->frequency;
>if (bb2->frequency > BB_FREQ_MAX)
>  bb2->frequency = BB_FREQ_MAX;
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH] decide edge's hotness when there is profile info

2013-10-15 Thread Jan Hubicka

> Yes, that would work. So let's discard this patch because the fix for
> comdat can also fix this problem.

Unforutnately ipa-profile-estimate is an IPA pass and as such it does
not have access to profile_status_to_function.
You can probably just factor this out into a function that can be called
and for normal FDO we call it where the loop stis now and for auto-FDO we can
probably have another invocation from before early passes where auto-FDO is 
collected.
> >>> +  if (node->count)
> >>> + continue;
Also here we should sum the counts and consider function non unlikely executed
in the same way as probably_never_executed does.

I can prepare updated patch, but i am currently travelling, so i would not
be disapointed if you beat me ;)

Honza

Re: [PATCH] Hoist loop invariant statements containing data refs with zero-step during loop-versioning in vectorization.

2013-10-15 Thread Cong Hou

Thank you for your reminder, Jeff! I just noticed Richard's comment. I
have modified the patch according to that.

The new patch is attached.


thanks,
Cong


On Tue, Oct 15, 2013 at 12:33 PM, Jeff Law  wrote:
> On 10/14/13 17:31, Cong Hou wrote:
>>
>> Any comment on this patch?
>
> Richi replied in the BZ you opened.
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58508
>
> Essentially he said emit the load on the edge rather than in the block
> itself.
> jeff
>
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8a38316..2637309 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2013-10-15  Cong Hou  
+
+   * tree-vect-loop-manip.c (vect_loop_versioning): Hoist loop invariant
+   statement that contains data refs with zero-step.
+
 2013-10-14  David Malcolm  
 
* dumpfile.h (gcc::dump_manager): New class, to hold state
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 075d071..9d0f4a5 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2013-10-15  Cong Hou  
+
+   * gcc.dg/vect/pr58508.c: New test.
+
 2013-10-14  Tobias Burnus  
 
PR fortran/58658
diff --git a/gcc/testsuite/gcc.dg/vect/pr58508.c 
b/gcc/testsuite/gcc.dg/vect/pr58508.c
new file mode 100644
index 000..cb22b50
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr58508.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+
+
+/* The GCC vectorizer generates loop versioning for the following loop
+   since there may exist aliasing between A and B.  The predicate checks
+   if A may alias with B across all iterations.  Then for the loop in
+   the true body, we can assert that *B is a loop invariant so that
+   we can hoist the load of *B before the loop body.  */
+
+void foo (int* a, int* b)
+{
+  int i;
+  for (i = 0; i < 10; ++i)
+a[i] = *b + 1;
+}
+
+
+/* { dg-final { scan-tree-dump-times "hoist" 2 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 574446a..f4fdec2 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2477,6 +2477,92 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
   adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
 }
 
+
+  /* Extract load and store statements on pointers with zero-stride
+ accesses.  */
+  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
+{
+  /* In the loop body, we iterate each statement to check if it is a load
+or store.  Then we check the DR_STEP of the data reference.  If
+DR_STEP is zero, then we will hoist the load statement to the loop
+preheader, and move the store statement to the loop exit.  */
+
+  for (gimple_stmt_iterator si = gsi_start_bb (loop->header);
+  !gsi_end_p (si);)
+   {
+ gimple stmt = gsi_stmt (si);
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+
+ if (dr && integer_zerop (DR_STEP (dr)))
+   {
+ if (DR_IS_READ (dr))
+   {
+ if (dump_enabled_p ())
+   {
+ dump_printf_loc
+ (MSG_NOTE, vect_location,
+  "hoist the statement to outside of the loop ");
+ dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+ dump_printf (MSG_NOTE, "\n");
+   }
+
+ gsi_remove (&si, false);
+ gsi_insert_on_edge_immediate (loop_preheader_edge (loop), 
stmt);
+   }
+ /* TODO: We also consider vectorizing loops containing zero-step
+data refs as writes.  For example:
+
+int a[N], *s;
+for (i = 0; i < N; i++)
+  *s += a[i];
+
+In this case the write to *s can be also moved after the
+loop.  */
+
+ continue;
+   }
+ else if (!dr)
+ {
+   bool hoist = true;
+   for (size_t i = 0; i < gimple_num_ops (stmt); i++)
+ {
+   tree op = gimple_op (stmt, i);
+   if (TREE_CODE (op) == INTEGER_CST
+   || TREE_CODE (op) == REAL_CST)
+ continue;
+   if (TREE_CODE (op) == SSA_NAME)
+ {
+   gimple def = SSA_NAME_DEF_STMT (op);
+   if (def == stmt
+   || gimple_nop_p (def)
+   || !flow_bb_inside_loop_p (loop, gimple_bb (def)))
+ continue;
+ }
+   hoist = false;
+   break;
+ }
+
+   if (hoist)
+ {
+   gsi_remove (&si, false);
+   gsi_insert_on_edge_immediate (loop_preheader_edge (loop), stmt);
+
+   if (dump_enabl

[PATCH] Another ChangeLog entry in the wrong place

2013-10-15 Thread Jeff Law



I noticed a testsuite entry went into gcc/ChangeLog rather than 
gcc/testsuite/ChangeLog while looking to see if Paulo's recent patch had 
already been reviewed & installed.


Fixed in the obvious way.

commit c7b05d9a220e67fdaaab74aa51c81d14284ed99f
Author: Jeff Law 
Date:   Tue Oct 15 14:43:27 2013 -0600

Move Paulo's 9/27/2013 ChangeLog entry to the right file

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 60e76c1..c0602b5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -3128,11 +3128,6 @@
 
 2013-09-27  Paulo Matos  
 
-   PR middle-end/58463
-   * gcc.dg/pr58463.c: New test.
-
-2013-09-27  Paulo Matos  
-
* cfgloop.h (number_of_loops): Fix typo in check for null.
 
 2013-09-27  Jakub Jelinek  
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 11099e1..1e60e53 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -479,6 +479,11 @@
* gcc.target/powerpc/p8vector-ldst.c: New test for -mupper-regs-sf
and -mupper-regs-df.
 
+2013-09-27  Paulo Matos  
+
+   PR middle-end/58463
+   * gcc.dg/pr58463.c: New test.
+
 2013-09-27  Jakub Jelinek  
 
PR middle-end/58551

Re: [PATCH] Fix PR54702 and LTO bootstrap (for me)

2013-10-15 Thread Mike Stump

On Oct 15, 2013, at 12:45 AM, Richard Biener  wrote:
> On Mon, 14 Oct 2013, Mike Stump wrote:
>> On Sep 25, 2012, at 8:00 AM, Richard Guenther  wrote:
>>> 2012-09-25  Richard Guenther  
>>> 
>>> PR lto/54625
>>> * lto-symtab.c (lto_symtab_merge_cgraph_nodes_1): Do not merge
>>> cgraph nodes for builtins.
>>> 
>>> * gcc.dg/lto/pr54702_0.c: New testcase.
>>> * gcc.dg/lto/pr54702_1.c: Likewise.
>>> * gcc.dg/lto/pr54625-1_0.c: Likewise.
>>> * gcc.dg/lto/pr54625-1_1.C: Likewise.
>>> * gcc.dg/lto/pr54625-2_0.c: Likewise.
>>> * gcc.dg/lto/pr54625-2_1.C: Likewise.
>> 
>> 
>> xgcc: error: /home/mrs/wg/gcc/gcc/testsuite/gcc.dg/lto/pr54625-2_1.C: C++ 
>> compiler not installed on this system
>> 
>> FAIL: gcc.dg/lto/pr54625-2 c_lto_pr54625-2_1.o assemble, -O2 -flto 
>> -fuse-linker-plugin
>> 
>> shouldn't C++ testcases go into the C++ testsuite?
> 
> Well ... this is a mixed C / C++ testcase (see the other file 
> participating, gcc.dg/lto/pr54625-2_0.c).  The C testsuite is the only
> one where the driver switches languages based on the file ending.

That is an interesting inquiry, but, doesn't lead to the error.  Not sure 
exactly why you bring it up:

From a random C++ test suite run:

spawn /home/mrs/work2/gcc-port/gcc/xgcc -B/home/mrs/work2/gcc-port/gcc/ 
-fno-diagnostics-show-caret -O0 -flto -flto-partition=none -fuse-linker-plugin 
-DNO_TRAMPOLINES -c -DNO_TRAMPOLINES -o cp_lto_20100603-1_1.o 
/home/mrs/work2/gcc/gcc/testsuite/g++.dg/lto/20100603-1_1.c

here, we see that xgcc is used to build a C part of a mix language testcase in 
the C++ test suite.

If you follow that style, is there some reason why it won't just work as you 
want?

> Hmm, it seems at least the fortran frontend knows how to mix fortran
> and C.

And the fortran/c test cases are in the fortran test suite.

> So if you really did not build the C++ compiler then we need,

Yup, no C++ compiler.

> at least for lto.exp, a way to say dg-require-c++-frontend in the main file
> (thats the _0 one).

Why do that?

> Btw, can we end up with the C frontend not built?

I'v never contemplated that idea…  but no, c is automatically added in order I 
suppose, to build the language independent runtime (libgcc) with.

Re: [PATCH] Add --enable-host-shared configuration option

2013-10-15 Thread David Malcolm

On Tue, 2013-10-15 at 10:29 -0600, Jeff Law wrote:
> On 10/09/13 18:25, David Malcolm wrote:
[...]

> Presumably other host libraries we depend on such as gmp, mpc, etc are 
> available in shared (or at least PIC) form as well?  Obviously these are 
> out of our source tree and largely out of our control, but I'm a bit 
> curious if there's other host libraries that may cause you headaches in 
> the future.

Yes, on this Fedora box:
$ file /usr/lib64/libgmp.so.10.0.2
/usr/lib64/libgmp.so.10.0.2: ELF 64-bit LSB shared object, x86-64,
version 1 (SYSV), dynamically linked,
BuildID[sha1]=0x6fb87c7b6e9aa56e4a09a3fbd496a9ee2dadbdc2, stripped

$ file /usr/lib64/libmpc.so.2.0.0 
/usr/lib64/libmpc.so.2.0.0: ELF 64-bit LSB shared object, x86-64,
version 1 (SYSV), dynamically linked,
BuildID[sha1]=0x7fb16b27ba6412301c4efab4b9c0c3a09bb3c95d, stripped

> This is fine with the install.texi changes Joseph wanted.

Thanks; I've committed the combined patch to trunk as r203632.

> zlib/
>   * configure.ac: Add --enable-host-shared, setting up new
> > PICFLAG variable.
> > * Makefile.am: Add PICFLAG to libz_a_CFLAGS.
> > * Makefile.in: Regenerate.
> > * configure: Regenerate.
> 

Incidentally, on committing this I noticed that zlib/ChangeLog is from
upstream zlib.  The downstream changes to gcc appear to be being
recorded into zlib/ChangeLog.gcj (I assume due to historical accident)
so I added the zlib changelog entries here.  I hope that's OK.

Dave

Re: libgomp.texi: Update for OpenMP v4.0

2013-10-15 Thread Jakub Jelinek

On Tue, Oct 15, 2013 at 10:13:32PM +0200, Tobias Burnus wrote:
> I used texinfo 4.14a to create the PDF and info file, which didn't
> show any warning. Also the output looks okay. (However, I believe
> some newer texinfo is picker.)
> OK for the trunk?

Thanks.  Just a few nits:

> +@node omp_set_default_device
> +@section @code{omp_set_default_device} -- Set the default device for target 
> regions
> +@table @asis
> +@item @emph{Description}:
> +Set the default device for target regions without device clause.  The 
> argument
> +shall be a positive device number.

non-negative, 0 is a valid target device number.

> +@node OMP_CANCELLATION
> +@section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
> +@cindex Environment Variable
> +@table @asis
> +@item @emph{Description}:
> +If set to @code{TRUE}, the cancellation is activated.  If set to 
> @code{FALSE} or
> +if unset, cancellation is disabled and the @code{cancel} construct is 
> ignored.

Perhaps mention that the default is @code{FALSE}?

> +@node OMP_DEFAULT_DEVICE
> +@section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
> +@cindex Environment Variable
> +@table @asis
> +@item @emph{Description}:
> +Set to choose the device which is used in a @code{target} region, unless the
> +value is overridden by @code{omp_set_default_device} or by a @code{device}
> +clause.  The value shall be the positive device number or @code{0} to execute

non-negative; 0 doesn't force execution on host (well, right now everything
is executed on the host, but hopefully that will change in the future), only
numbers equal or above omp_get_num_devices () do.

> +on the host.  If unset, code in a @code{target} region is executed on the 
> host.

Nope, the default will be probably 0, first target device if present.

> +Alternatively, the placement can be specified explicitly as comma-separated
> +list of places.  A place is specified by set of nonnegative numbers in curly
> +braces, denoting the denoting the hardware threads.  The hardware threads
> +belonging to a place can either be specified as comma-separated list of
> +nonnegative thread numbers or using an interval.  Multiple places can also be
> +either specified by a comma-separated list of places or by an interval.  To
> +specify an interval, a colon with the number of threads which shall be in the
> +interval is placed after after the hardware thread number or the place.
> +Optionally, the length can be followed by a colon and the stride number --
> +otherwise a unit stride is assumed.  For instance, the following specifies 
> the
> +same places list: @code{"@{0,1,2@}, @{3,4,6@}, @{7,8,9@}, @{10,11,12@}"};
> +@code{"@{0:2@}, @{3:6@}, @{7:9@}, @{10:12@}"}; and @code{"@{0:2@}:4:3"}.

The number after : is count, not end, so you want here
{0:3}, {3:3}, {7:3}, {10:3} resp. {0:3}:4:3.

Jakub

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-10-15 Thread Jeff Law


On 10/03/13 12:24, Wei Mi wrote:

Thanks,
Wei Mi.

2013-10-03  Wei Mi  

 * gcc/config/i386/i386.c (memory_address_length): Extract a part
 of code to rip_relative_addr_p.
 (rip_relative_addr_p): New Function.
 (ix86_macro_fusion_p): Ditto.
 (ix86_macro_fusion_pair_p): Ditto.
 * gcc/config/i386/i386.h: Add new tune features about macro-fusion.
 * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.
 * gcc/doc/tm.texi: Generated.
 * gcc/doc/tm.texi.in: Ditto.
 * gcc/haifa-sched.c (try_group_insn): New Function.
 (group_insns_for_macro_fusion): Ditto.
 (sched_init): Call group_insns_for_macro_fusion.
 * gcc/sched-rgn.c (add_branch_dependences): Keep insns in
 a SCHED_GROUP at the end of BB to remain their location.
 * gcc/target.def: Add two hooks: macro_fusion_p and
 macro_fusion_pair_p.
I'm not going to comment on the x86 specific stuff -- I'll defer to the 
port maintainers for that.




index 61eaaef..d6726a9 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -6519,6 +6519,44 @@ setup_sched_dump (void)
 ? stderr : dump_file);
  }

+static void
+try_group_insn (rtx insn)

You need a comment for this function.



+{
+  unsigned int condreg1, condreg2;
+  rtx cc_reg_1;
+  rtx prev;
+
+  targetm.fixed_condition_code_regs (&condreg1, &condreg2);
+  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
+  prev = prev_nonnote_nondebug_insn (insn);
+  if (!any_condjump_p (insn)
+  || !reg_referenced_p (cc_reg_1, PATTERN (insn))
+  || !prev
+  || !modified_in_p (cc_reg_1, prev))
+return;
I'd test !any_condjump_p at the start of this function before calling 
the target hook.  If insn isn't a conditional jump, then all the other 
work is totally useless.


Aren't you just trying to see if we have a comparison feeding the 
conditional jump and if they're already adjacent?  Do you actually need 
to get the condition code regs to do that test?



+
+  /* Different microarchitectures support macro fusions for different
+ combinations of insn pairs.  */
+  if (!targetm.sched.macro_fusion_pair_p
+  || !targetm.sched.macro_fusion_pair_p (prev, insn))
+return;
+
+  SCHED_GROUP_P (insn) = 1;
I'm surprised that SCHED_GROUP_P worked -- I've tried to do similar 
stuff in the past and ran into numerous problems trying to hijack 
SCHED_GROUP_P for this kind of purpose.





  static void haifa_init_only_bb (basic_block, basic_block);
diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
index e1a2dce..156359e 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -2443,6 +2443,8 @@ add_branch_dependences (rtx head, rtx tail)
   cc0 setters remain at the end because they can't be moved away from
   their cc0 user.

+ Predecessors of SCHED_GROUP_P instructions at the end remain at the end.
+
   COND_EXEC insns cannot be moved past a branch (see e.g. PR17808).

   Insns setting TARGET_CLASS_LIKELY_SPILLED_P registers (usually return
@@ -2465,7 +2467,8 @@ add_branch_dependences (rtx head, rtx tail)
  #endif
  || (!reload_completed
  && sets_likely_spilled (PATTERN (insn)
-|| NOTE_P (insn))
+|| NOTE_P (insn)
+|| (last != 0 && SCHED_GROUP_P (last)))
  {
if (!NOTE_P (insn))
 {
This looks like a straighforward bugfix and probably should go forward 
independent of this enhancement.


Jeff

libgomp.texi: Update for OpenMP v4.0

2013-10-15 Thread Tobias Burnus


Hi Jakub, hi all,

the attached patch updated the references in libgomp to OpenMP 4.0 and 
documents the new OpenMPv4 library functions and environment variables. 
It does *not* update the section about the libgomp API.


Additionally, I fixes some bugs (-> omp_get_schedule, omp_set_schedule), 
added the lacking second space after the end-of-sentence fullstops. 
Furthermore, I have put the "They have C linkage, and don't throw 
exceptions." in the previous paragraph as the output looked wrong in PDF 
mode. (There you get four lines without the @menu and those four lines 
are introduced by "structured in following three parts:").


As the library part for Fortran matches OpenMPv4, I have also updated 
the section about parameters (constants) in fortran/intrinsic.texi. As 
OpenMPv4 is not yet fully supported, I mention both the OpenMPv3.1 and 
4.0 value for "openmp_version".
(When the full support has been added, the 3.1 version number can be 
removed and gfortran.texi be updated as well.)


I used texinfo 4.14a to create the PDF and info file, which didn't show 
any warning. Also the output looks okay. (However, I believe some newer 
texinfo is picker.)

OK for the trunk?

Tobias
gcc/fortran/
2013-10-15  Tobias Burnus  

	* intrinsic.texi (OpenMP Modules): Update to OpenMPv4.
	Document omp_proc_bind_kind.

libgomp/
2013-10-15  Tobias Burnus  

	* libgomp.texi: (Runtime Library Routines): Update references for
	OpenMP 4.0. Add omp_get_cancellation, omp_get_default_device,
	omp_get_num_devices, omp_get_num_teams, omp_get_proc_bind,
	omp_get_team_num, omp_is_initial_device, omp_set_default_device.
	(Environment Variables): Update references for OpenMP 4.0. Add
	OMP_CANCELLATION, OMP_DEFAULT_DEVICE, OMP_PLACES.
	Move OMP_DISPLAY_ENV and OMP_PROC_BIND up to be in alphabetical
	order.

diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index afe9671..18bd4d2 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -13177,7 +13177,7 @@ Both are equivalent to the value @code{NULL} in C.
 @section OpenMP Modules @code{OMP_LIB} and @code{OMP_LIB_KINDS}
 @table @asis
 @item @emph{Standard}:
-OpenMP Application Program Interface v3.1
+OpenMP Application Program Interface v4.0
 @end table
 
 
@@ -13190,8 +13190,8 @@ the named constants defined in the modules are listed
 below.
 
 For details refer to the actual
-@uref{http://www.openmp.org/mp-documents/spec31.pdf,
-OpenMP Application Program Interface v3.1}.
+@uref{http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf,
+OpenMP Application Program Interface v4.0}.
 
 @code{OMP_LIB_KINDS} provides the following scalar default-integer
 named constants:
@@ -13199,15 +13199,17 @@ named constants:
 @table @asis
 @item @code{omp_lock_kind}
 @item @code{omp_nest_lock_kind}
+@item @code{omp_proc_bind_kind}
 @item @code{omp_sched_kind}
 @end table
 
 @code{OMP_LIB} provides the scalar default-integer
 named constant @code{openmp_version} with a value of the form
 @var{mm}, where @code{} is the year and @var{mm} the month
-of the OpenMP version; for OpenMP v3.1 the value is @code{201107}.
+of the OpenMP version; for OpenMP v3.1 the value is @code{201107}
+and for OpenMP v4.0 the value is @code{201307}.
 
-And the following scalar integer named constants of the
+The following scalar integer named constants of the
 kind @code{omp_sched_kind}:
 
 @table @asis
@@ -13216,3 +13218,14 @@ kind @code{omp_sched_kind}:
 @item @code{omp_sched_guided}
 @item @code{omp_sched_auto}
 @end table
+
+And the following scalar integer named constants of the 
+kind @code{omp_proc_bind_kind}:
+
+@table @asis
+@item @code{omp_proc_bind_false}
+@item @code{omp_proc_bind_true}
+@item @code{omp_proc_bind_master}
+@item @code{omp_proc_bind_close}
+@item @code{omp_proc_bind_spread}
+@end table
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 5e55716..0033beb 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -106,17 +106,17 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 @chapter Enabling OpenMP
 
 To activate the OpenMP extensions for C/C++ and Fortran, the compile-time 
-flag @command{-fopenmp} must be specified. This enables the OpenMP directive
+flag @command{-fopenmp} must be specified.  This enables the OpenMP directive
 @code{#pragma omp} in C/C++ and @code{!$omp} directives in free form, 
 @code{c$omp}, @code{*$omp} and @code{!$omp} directives in fixed form, 
 @code{!$} conditional compilation sentinels in free form and @code{c$},
-@code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also
+@code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
 arranges for automatic linking of the OpenMP runtime library 
 (@ref{Runtime Library Routines}).
 
 A complete description of all OpenMP directives accepted may be found in 
 the @uref{http://www.openmp.org, OpenMP Application Program Interface} manual,
-version 3.1.
+version 4.0.
 
 
 @c -

Re: PING Re: [Patch, Fortran] PR58658 - add missing pointer-assign check for CLASS(*)

2013-10-15 Thread Tobias Burnus


Hi Paul,

Paul Richard Thomas wrote:

Have you checked that:

subroutine sub(a)
   class(*),pointer :: a
   a => null()
end subroutine

does not give an error?  I think that it is why the check was introduced.


I haven't checked it in particular, but was relying that some test in 
the library would test for it. Additionally, gfc_expr_attr() takes care 
to set for BT_CLASS "pointer" only for class_pointer - which I also 
checked before submittal.


In any case, your test case compiles and "grep '=>'" finds the following 
NULL initializations:


unlimited_polymorphic_1.f03:  u2 => NULL()
unlimited_polymorphic_1.f03:  u2 => NULL(aptr)

Tobias

Re: [PATCH v2] Fix libgfortran cross compile configury w.r.t newlib

2013-10-15 Thread Tobias Burnus

Marcus Shawcroft wrote:>> 2013-10-01  Marcus Shawcroft 

>>
>>  * configure.ac (AC_CHECK_FUNCS_ONCE): Add for exit() then make
>>  existing AC_CHECK_FUNCS_ONCE dependent on outcome.
>
> Ping^2

For configure patches, I am never quite sure whether they should be 
reviewed by a build maintainer or by the library maintainer. In any 
case, the change looks reasonable to me and as no other newlib user has 
complained, it is also okay from my side.

Tobias

Re: [PATCH] Hoist loop invariant statements containing data refs with zero-step during loop-versioning in vectorization.

2013-10-15 Thread Jeff Law


On 10/14/13 17:31, Cong Hou wrote:

Any comment on this patch?

Richi replied in the BZ you opened.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58508

Essentially he said emit the load on the edge rather than in the block 
itself.

jeff

Re: Apply attribute returns_nonnull in libiberty

2013-10-15 Thread Marc Glisse


On Tue, 15 Oct 2013, Jeff Law wrote:


On 10/11/13 17:21, Marc Glisse wrote:

The gcc-specific part now. Bootstrap+testsuite on
x86_64-unknown-linux-gnu together with the libiberty patch (well,
libgomp.graphite/force-parallel-4.c failed, but that randomly happens).

2013-10-12  Marc Glisse  

 PR tree-optimization/58689
 * system.h (concat, reconcat, choose_temp_base, xmalloc, xrealloc,
 xcalloc, xstrdup, xstrndup, xmemdup, pex_init): Mark with attribute
 returns_nonnull.

OK.
jeff


Thanks.

I am going to wait a bit though. I sent an email to all libiberty users, 
and if none of them complains, I'd rather go with the original patch:


http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00817.html

--
Marc Glisse

Re: [patch] combine ICE fix

2013-10-15 Thread Jeff Law


On 10/10/13 10:25, Jakub Jelinek wrote:

On Thu, Oct 10, 2013 at 07:26:43AM -0700, Cesar Philippidis wrote:

This patch addresses an ICE when building qemu for a Mips target in
Yocto. Both gcc-trunk, gcc-4.8 and all of the targets are potentially
affected. The problem occurs because the instruction combine phase uses
two data structures to keep track of registers, reg_stat and
regstat_n_sets_and_refs, but they potentially end up out of sync; when
combine inserts a new register into reg_stat it does not update
regstat_n_sets_and_refs. Failing to update the latter results in an
occasional segmentation fault.

Is this OK for trunk and gcc-4.8? If so, please check it in. I tested it
on Mips and x86-64 and no regressions showed up.


That looks broken.  You leave everything from the last size till the current
one uninitialized, so it would only work if combine_split_insns
always increments max_reg_num () by at most one.
I don't think that assumption is safe.  Consider a parallel with a bunch 
of (clobber (match_scratch)) expressions.




Furthermore, there are macros which should be used to access
the fields, and, if the vector is ever going to be resized, supposedly
it should be vec.h vector rather than just array.
Or perhaps take into account:
/* If a pass need to change these values in some magical way or the
pass needs to have accurate values for these and is not using
incremental df scanning, then it should use REG_N_SETS and
REG_N_USES.  If the pass is doing incremental scanning then it
should be getting the info from DF_REG_DEF_COUNT and
DF_REG_USE_COUNT.  */
and not use REG_N_SETS etc. but instead the df stuff.
Which begs the question, how exactly is combine utilizing the 
regstat_n_* structures and is that use even valid for combine?


Jeff

Re: Apply attribute returns_nonnull in libiberty

2013-10-15 Thread Jeff Law


On 10/11/13 17:21, Marc Glisse wrote:

The gcc-specific part now. Bootstrap+testsuite on
x86_64-unknown-linux-gnu together with the libiberty patch (well,
libgomp.graphite/force-parallel-4.c failed, but that randomly happens).

2013-10-12  Marc Glisse  

 PR tree-optimization/58689
 * system.h (concat, reconcat, choose_temp_base, xmalloc, xrealloc,
 xcalloc, xstrdup, xstrndup, xmemdup, pex_init): Mark with attribute
 returns_nonnull.

OK.
jeff

[PATCH v2 3/4] Handle simple inheritance in gengtype.

2013-10-15 Thread Jeff Law


Treat GTY structs that have a "desc" as being the root of an inheritance
hierarchy.  Generate a switch on desc within the marking function with
cases for each subclass, visiting all fields of the type (including
inherited ones).

Don't create marking functions for subclasses, instead using the base
class marking functions.  Use walk_type on them within walk_subclasses
to generate the case within the switch for handling the tag, directly
walking all fields of the type.

* gengtype-parse.c (opts_have): Drop "static" so that
we can use this from gengtype.c.
* gengtype.c (set_gc_used_type): Mark any base class as used;
update field traversal to visit inherited fields.
(output_mangled_typename):  Convert references to classes within
an inheritance hierarchy to reference the ultimate base class,
since only it will have gt_ functions.
(get_string_option): New.
(walk_subclasses): New.
(walk_type): Treat GTY structs that have a "desc" as being the
root of an inheritance hierarchy.  Generate a switch on it
within the marking function which walks all subclasses, adding
cases for them via walk_subclasses.  For subclasses, visit all
fields of the type (including inherited ones).
(write_func_for_structure): Don't write fns for subclasses, only
for the ultimate base class within an inheritance hierarchy.
Subclasses-marking will be handled by the base class marking
functions.
(write_types): Likewise.
(write_local_func_for_structure): Likewise.
(USED_BY_TYPED_GC_P): Emit allocators for subclasses that have
a "tag" option (and are thus concrete subclasses).
(write_root): Use the marker function for the ultimate base class.
* gengtype.h (FOR_ALL_INHERITED_FIELDS): New.
(opts_have): Add declaration.
This is OK.

Jeff

Re: [PATCH v2 4/4] Add documentation about gengtype and inheritance

2013-10-15 Thread Jeff Law


On 09/24/13 11:49, David Malcolm wrote:

gcc/
* doc/gty.texi (GTY Options): Add note about inheritance to
description of desc and tag.
(Inheritance and GTY): New.
So what happens if I have a class hierarchy without a gty-user marker 
which violates the assumptions made by your new code?


I'm not worried about existing code with gty-user markers as I can 
easily see how those are avoided.  I'm thinking more of new code where 
the author isn't as familiar with the gty machinery as they should be or 
in the future if we're using this stuf extensively and someone mucks 
around with the class hierarchy and breaks one of the assumptions.  I 
just want to make sure those two cases fail-safe.


As for this patch, it is OK once the prerequisites go in.

Jeff

Re: [PATCH v2 2/4] Parse base classes for GTY-marked types

2013-10-15 Thread Jeff Law


On 09/24/13 11:49, David Malcolm wrote:

Extend gengtype (and gtype.state reading/writing) so that it is able to
parse base classes in simple cases, and only attempt to do it for
GTY-marked types.

* gengtype-parse.c (require_without_advance): New.
(type): For GTY-marked types that are not GTY((user)), parse any
base classes, requiring them to be single-inheritance, and not
be templates.  For non-GTY-marked types and GTY((user)),
continue to skip over any C++ inheritance specification.
* gengtype-state.c (state_writer::write_state_struct_type):
Write base class of type (if any).
(read_state_struct_type): Read base class of type (if any).
* gengtype.c (new_structure): Add a "base_class" parameter.
(create_optional_field_): Update for new parameter to
new_structure.
(adjust_field_rtx_def): Likewise.
(adjust_field_tree_exp): Likewise.
* gengtype.h (struct type): Add "base_class" field to the s
union field.
(new_structure): Add "base" parameter.

This is OK.

jeff

Re: [PATCH v2 1/4] Ignore access-control keywords when parsing fields.

2013-10-15 Thread Jeff Law


On 09/24/13 11:49, David Malcolm wrote:

Classes containing access-control keywords such as "public:"
confuse struct_field_seq, leading it to call consume_until_eos
i.e. ignore text until after the next semicolon.

This leads to the first field after an access-control keyword
being ignored by gengtype.  This can be seen in:
   http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01532.html
where the autogenerated marking function erroneously omitted the
traversal of the "callees" field of cgraph_node *.

This patch fixes up struct_field_seq to gracefully ignore such
keywords, thus fixing gengtype so that it does not erroneouly omit
fields of such a class.

* gengtype-parse.c (struct_field_seq): Ignore access-control
keywords ("public:" etc).

OK.
Jeff

Re: patch to canonize unsigned tree-csts

2013-10-15 Thread Richard Sandiford

Richard Sandiford  writes:
>   if (small_prec)
> ;
>   else if (precision == xprecision)
> while (len >= 0 && val[len - 1] == -1)
>   len--;

Err, len > 0 obviously.

Re: patch to canonize unsigned tree-csts

2013-10-15 Thread Richard Sandiford

Thanks for doing this.

Kenneth Zadeck  writes:
> @@ -1204,11 +1204,11 @@ wide_int_to_tree (tree type, const wide_
>  }
>  
>wide_int cst = wide_int::from (pcst, prec, sgn);
> -  int len = int (cst.get_len ());
> -  int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
> +  unsigned int len = int (cst.get_len ());

The cast to int is redundant now (get_len () is unsigned int).

> @@ -5172,39 +5173,48 @@ wi::int_traits ::get_precisi
>return TYPE_PRECISION (TREE_TYPE (tcst));
>  }
>  
> -/* Convert the tree_cst X into a wide_int.  */
> +/* Convert the tree_cst X into a wide_int of PRECISION.  */
>  inline wi::storage_ref
>  wi::int_traits ::decompose (HOST_WIDE_INT *scratch,
>   unsigned int precision, const_tree x)
>  {
> -  unsigned int xprecision = get_precision (x);
>unsigned int len = TREE_INT_CST_NUNITS (x);
>const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 
> 0);
>unsigned int max_len = ((precision + HOST_BITS_PER_WIDE_INT - 1)
> / HOST_BITS_PER_WIDE_INT);
> -  /* Truncate the constant if necessary.  */
> -  if (len > max_len)
> -return wi::storage_ref (val, max_len, precision);
> +  unsigned int xprecision = get_precision (x);
> +
> +  gcc_assert (precision >= xprecision);
>  
> -  if (precision <= xprecision)
> +  /* Got to be careful of precision 0 values.  */
> +  if (precision)
> +len = MIN (len, max_len);
> +  if (TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED)
>  {
> -  if (precision < HOST_BITS_PER_WIDE_INT 
> -   && TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED)
> +  unsigned int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
> +  if (small_prec)
> + {
> +   /* We have to futz with this because the canonization for
> +  short unsigned numbers in wide-int is different from the
> +  canonized short unsigned numbers in the tree-cst.  */
> +   if (len == max_len) 
> + {
> +   for (unsigned int i = 0; i < len - 1; i++)
> + scratch[i] = val[i];
> +   scratch[len - 1] = sext_hwi (val[len - 1], precision);
> +   return wi::storage_ref (scratch, len, precision);
> + }
> + }
> +
> +  if (precision < xprecision + HOST_BITS_PER_WIDE_INT)
>   {
> -   /* The rep of wide-int is signed, so if the value comes from
> -  an unsigned int_cst, we have to sign extend it to make it
> -  correct.  */
> -   scratch[0] = sext_hwi (val[0], precision);
> -   return wi::storage_ref (scratch, 1, precision);
> +   len = wi::force_to_size (scratch, val, len, xprecision, precision, 
> UNSIGNED);
> +   return wi::storage_ref (scratch, len, precision);
>   }

AFAICT, with the assert, this last case is only needed when
!small_prec && precision == xprecision, and the only situation it
handles is where the top bit is set.  Is that right?  If so,
we can use the original val as-is and just trim the length.  I.e.

  if (small_prec)
;
  else if (precision == xprecision)
while (len >= 0 && val[len - 1] == -1)
  len--;

Sorry if I've got that wrong -- it's taking a while to page the
wide-int stuff back in.

Thanks,
Richard

[GOOGLE] Add flag to enalbe AutoFDO accurate mode

2013-10-15 Thread Dehao Chen

This patch add a new flag to let user to tell compiler that the
AutoFDO profile is accurate. So the compiler will assume function
without any sample is UNLIKELY_EXECUTED. This could save 10%~20% text
section size.

Bootstrapped and passed regression test.

OK for google-4_8 branch?

Thanks,
Dehao

Index: gcc/common.opt
===
--- gcc/common.opt (revision 203546)
+++ gcc/common.opt (working copy)
@@ -942,6 +942,10 @@ Common Joined RejectNegative Var(auto_profile_file
 Use sample profile information for call graph node weights. The profile
 file is specified in the argument.

+fauto-profile-accurate
+Common Report Var(flag_auto_profile_accurate) Optimization
+Whether to assume the sample profile is accurate.
+
 ; -fcheck-bounds causes gcc to generate array bounds checks.
 ; For C, C++ and ObjC: defaults off.
 ; For Java: defaults to on.
Index: gcc/predict.c
===
--- gcc/predict.c (revision 203546)
+++ gcc/predict.c (working copy)
@@ -2866,7 +2866,9 @@ compute_function_frequency (void)
   || (flag_auto_profile && profile_status == PROFILE_GUESSED))
 {
   int flags = flags_from_decl_or_type (current_function_decl);
-  if (lookup_attribute ("cold", DECL_ATTRIBUTES (current_function_decl))
+  if (profile_info && flag_auto_profile_accurate)
+ node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED;
+  else if (lookup_attribute ("cold", DECL_ATTRIBUTES
(current_function_decl))
   != NULL)
 node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED;
   else if (lookup_attribute ("hot", DECL_ATTRIBUTES
(current_function_decl))

Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0

2013-10-15 Thread Jeff Law


On 10/15/13 02:12, Jakub Jelinek wrote:

On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote:

Is it OK?


Ok, except two comments apparently still need updating.

+/* Optimize X == CST1 || X == CST2
+   if popcount (CST1 ^ CST2) == 1 into
+   (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)).
+   Similarly for ranges.  E.g.
+   X != 2 && X != 3 && X != 10 && X != 11
+   will be transformed by the previous optimization into
+   (X - 2U) <= 1U && (X - 10U) <= 1U
+   and this loop can transform that into
+   ((X & ~8) - 2U) <= 1U.  */

Here the example is using != and &&, so it is transformed into:
!((X - 2U) <= 1U || (X - 10U) <= 1U)
(everything is negated at the end, and note || instead of &&,
with && it could fold into !(0))
and finally into:
!(((X & ~8) - 2U) <= 1U)
Or alternatively change the first expression into:
   X == 2 || X == 3 || X == 10 || X == 11
and the second to:
   (X - 2U) <= 1U || (X - 10U) <= 1U
and the third keep as is.

+/* Optimize X == CST1 || X == CST2
+   if popcount (CST2 - CST1) == 1 into
+   ((X - CST1) & ~(CST2 - CST1)) == 0.
+   Similarly for ranges.  E.g.
+   X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46
+   || X == 75 || X == 45
+   will be transformed by the previous optimization into
+   (X - 43U) <= 3U && (X - 75U) <= 3U
+   and this loop can transform that into
+   ((X - 43U) & ~(75U - 43U)) <= 3U.  */

Here the example is using == and ||, so the only wrong thing in there
is that the second expression should be
(X - 43U) <= 3U || (X - 75U) <= 3U

Ok with those changes.
I fixed up the comments & ChangeLog entry and committed on Zhenqiang's 
behalf.


jeff

Re: RFA: Remove alias usage from libgcc/sync.c

2013-10-15 Thread Ian Lance Taylor

On Tue, Oct 15, 2013 at 10:18 AM, Richard Sandiford
 wrote:
> Ian Lance Taylor  writes:
>> On Tue, Oct 15, 2013 at 2:18 AM, Richard Biener
>>  wrote:
>>>
>>> Ok for the tailcall parts and the testcase - I'd prefer someone else to
>>> ack the libgcc change.  CCing maintainer.
>>
>> The libgcc patch is missing the updates to the comments.  This code is
>> confusing enough as it is, having incorrect comments wouldn't help.
>
> How does this look?
>
> Thanks,
> Richard
>
>
> gcc/
> 2013-10-15  Richard Biener  
>
> * tree-tailcall.c (find_tail_calls): Don't use tail-call recursion
> for built-in functions.
>
> gcc/testsuite/
> * gcc.dg/torture/builtin-self.c: New file.
>
> libgcc/
> * sync.c: Remove static aliases and define each function directly
> under its real name.


This one is OK.

Thanks.

Ian

Re: [PATCH] Relax the requirement of reduction pattern in GCC vectorizer.

2013-10-15 Thread Cong Hou

I have corrected the ChangeLog format, and committed this patch.

Thank you!


Cong


On Tue, Oct 15, 2013 at 6:38 AM, Richard Biener
 wrote:
> On Sat, Sep 28, 2013 at 3:28 AM, Cong Hou  wrote:
>> The current GCC vectorizer requires the following pattern as a simple
>> reduction computation:
>>
>>loop_header:
>>  a1 = phi < a0, a2 >
>>  a3 = ...
>>  a2 = operation (a3, a1)
>>
>> But a3 can also be defined outside of the loop. For example, the
>> following loop can benefit from vectorization but the GCC vectorizer
>> fails to vectorize it:
>>
>>
>> int foo(int v)
>> {
>>   int s = 1;
>>   ++v;
>>   for (int i = 0; i < 10; ++i)
>> s *= v;
>>   return s;
>> }
>>
>>
>> This patch relaxes the original requirement by also considering the
>> following pattern:
>>
>>
>>a3 = ...
>>loop_header:
>>  a1 = phi < a0, a2 >
>>  a2 = operation (a3, a1)
>>
>>
>> A test case is also added. The patch is tested on x86-64.
>>
>>
>> thanks,
>> Cong
>>
>> 
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 39c786e..45c1667 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,9 @@
>> +2013-09-27  Cong Hou  
>> +
>> + * tree-vect-loop.c: Relax the requirement of the reduction
>
> ChangeLog format is
>
> * tree-vect-loop.c (vect_is_simple_reduction_1): Relax the
> requirement of the reduction.
>
> Ok with that change.
>
> Thanks,
> Richard.
>
>> + pattern so that one operand of the reduction operation can
>> + come from outside of the loop.
>> +
>>  2013-09-25  Tom Tromey  
>>
>>   * Makefile.in (PARTITION_H, LTO_SYMTAB_H, COMMON_TARGET_DEF_H)
>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>> index 09644d2..90496a2 100644
>> --- a/gcc/testsuite/ChangeLog
>> +++ b/gcc/testsuite/ChangeLog
>> @@ -1,3 +1,7 @@
>> +2013-09-27  Cong Hou  
>> +
>> + * gcc.dg/vect/vect-reduc-pattern-3.c: New test.
>> +
>>  2013-09-25  Marek Polacek  
>>
>>   PR sanitizer/58413
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 2871ba1..3c51c3b 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -2091,6 +2091,13 @@ vect_is_slp_reduction (loop_vec_info loop_info,
>> gimple phi, gimple first_stmt)
>>   a3 = ...
>>   a2 = operation (a3, a1)
>>
>> +   or
>> +
>> +   a3 = ...
>> +   loop_header:
>> + a1 = phi < a0, a2 >
>> + a2 = operation (a3, a1)
>> +
>> such that:
>> 1. operation is commutative and associative and it is safe to
>>change the order of the computation (if CHECK_REDUCTION is true)
>> @@ -2451,6 +2458,7 @@ vect_is_simple_reduction_1 (loop_vec_info
>> loop_info, gimple phi,
>>if (def2 && def2 == phi
>>&& (code == COND_EXPR
>>|| !def1 || gimple_nop_p (def1)
>> +  || !flow_bb_inside_loop_p (loop, gimple_bb (def1))
>>|| (def1 && flow_bb_inside_loop_p (loop, gimple_bb (def1))
>>&& (is_gimple_assign (def1)
>>|| is_gimple_call (def1)
>> @@ -2469,6 +2477,7 @@ vect_is_simple_reduction_1 (loop_vec_info
>> loop_info, gimple phi,
>>if (def1 && def1 == phi
>>&& (code == COND_EXPR
>>|| !def2 || gimple_nop_p (def2)
>> +  || !flow_bb_inside_loop_p (loop, gimple_bb (def2))
>>|| (def2 && flow_bb_inside_loop_p (loop, gimple_bb (def2))
>>&& (is_gimple_assign (def2)
>>|| is_gimple_call (def2)
>> diff --git gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-3.c
>> gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-3.c
>> new file mode 100644
>> index 000..06a9416
>> --- /dev/null
>> +++ gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-3.c
>> @@ -0,0 +1,41 @@
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +#include 
>> +#include "tree-vect.h"
>> +
>> +#define N 10
>> +#define RES 1024
>> +
>> +/* A reduction pattern in which there is no data ref in
>> +   the loop and one operand is defined outside of the loop.  */
>> +
>> +__attribute__ ((noinline)) int
>> +foo (int v)
>> +{
>> +  int i;
>> +  int result = 1;
>> +
>> +  ++v;
>> +  for (i = 0; i < N; i++)
>> +result *= v;
>> +
>> +  return result;
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  int res;
>> +
>> +  check_vect ();
>> +
>> +  res = foo (1);
>> +  if (res != RES)
>> +abort ();
>> +
>> +  return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>> +/* { dg-final { cleanup-tree-dump "vect" } } */
>> +

Re: RFA: Remove alias usage from libgcc/sync.c

2013-10-15 Thread Richard Sandiford

Ian Lance Taylor  writes:
> On Tue, Oct 15, 2013 at 2:18 AM, Richard Biener
>  wrote:
>>
>> Ok for the tailcall parts and the testcase - I'd prefer someone else to
>> ack the libgcc change.  CCing maintainer.
>
> The libgcc patch is missing the updates to the comments.  This code is
> confusing enough as it is, having incorrect comments wouldn't help.

How does this look?

Thanks,
Richard


gcc/
2013-10-15  Richard Biener  

* tree-tailcall.c (find_tail_calls): Don't use tail-call recursion
for built-in functions.

gcc/testsuite/
* gcc.dg/torture/builtin-self.c: New file.

libgcc/
* sync.c: Remove static aliases and define each function directly
under its real name.

Index: gcc/tree-tailcall.c
===
--- gcc/tree-tailcall.c 2013-10-15 08:52:07.358853220 +0100
+++ gcc/tree-tailcall.c 2013-10-15 08:53:06.980419473 +0100
@@ -446,7 +446,9 @@ find_tail_calls (basic_block bb, struct
   /* We found the call, check whether it is suitable.  */
   tail_recursion = false;
   func = gimple_call_fndecl (call);
-  if (func && recursive_call_p (current_function_decl, func))
+  if (func
+  && !DECL_BUILT_IN (func)
+  && recursive_call_p (current_function_decl, func))
 {
   tree arg;
 
Index: gcc/testsuite/gcc.dg/torture/builtin-self.c
===
--- /dev/null   2013-10-13 08:29:45.608935301 +0100
+++ gcc/testsuite/gcc.dg/torture/builtin-self.c 2013-10-15 08:58:00.064045777 
+0100
@@ -0,0 +1,10 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* Check that we can use this idiom to define out-of-line copies of built-in
+   functions.  This is used by libgcc/sync.c, for example.  */
+void __sync_synchronize (void)
+{
+  __sync_synchronize ();
+}
+/* { dg-final { scan-assembler "__sync_synchronize" } } */
+/* { dg-final { scan-assembler "\t(lock|mfence)" } } */
+/* { dg-final { scan-assembler-not "\tcall" } } */
Index: libgcc/sync.c
===
--- libgcc/sync.c   2013-10-15 08:52:07.358853220 +0100
+++ libgcc/sync.c   2013-10-15 18:16:24.383123115 +0100
@@ -67,27 +67,26 @@ Software Foundation; either version 3, o
 
 #if defined FN
 
-/* Define macros for each __sync_* function type.  Each macro defines a
-   local function called _ that acts like ___.
-   TYPE is a type that has UNITS bytes.  */
+/* Define functions called __sync__, with one macro per
+   signature.  TYPE is a type that has UNITS bytes.  */
 
 #define DEFINE_V_PV(NAME, UNITS, TYPE) \
-  static TYPE  \
-  NAME##_##UNITS (TYPE *ptr, TYPE value)   \
+  TYPE \
+  __##NAME##_##UNITS (TYPE *ptr, TYPE value)   \
   {\
 return __##NAME (ptr, value);  \
   }
 
-#define DEFINE_V_PVV(NAME, UNITS, TYPE)\
-  static TYPE  \
-  NAME##_##UNITS (TYPE *ptr, TYPE value1, TYPE value2) \
+#define DEFINE_V_PVV(NAME, UNITS, TYPE)
\
+  TYPE \
+  __##NAME##_##UNITS (TYPE *ptr, TYPE value1, TYPE value2) \
   {\
 return __##NAME (ptr, value1, value2); \
   }
 
 #define DEFINE_BOOL_PVV(NAME, UNITS, TYPE) \
-  static _Bool \
-  NAME##_##UNITS (TYPE *ptr, TYPE value1, TYPE value2) \
+  _Bool
\
+  __##NAME##_##UNITS (TYPE *ptr, TYPE value1, TYPE value2) \
   {\
 return __##NAME (ptr, value1, value2); \
   }
@@ -118,9 +117,7 @@ #define local_sync_lock_test_and_set DEF
 #define DEFINE1(NAME, UNITS, TYPE) \
   static int unused[sizeof (TYPE) == UNITS ? 1 : -1]   \
 __attribute__((unused));   \
-  local_##NAME (NAME, UNITS, TYPE);\
-  typeof (NAME##_##UNITS) __##NAME##_##UNITS   \
-__attribute__((alias (#NAME "_" #UNITS)));
+  local_##NAME (NAME, UNITS, TYPE);
 
 /* As above, but performing macro expansion on the arguments.  */
 #define DEFINE(NAME, UNITS, TYPE) DEFINE1 (NAME, UNITS, TYPE)
@@ -167,13 +164,11 @@ DEFINE (FN, 8, UOItype)
 
 #if defined Lsync_synchronize
 
-static void
-sync_synchronize (void)
+void
+__sync_synchronize (void)
 {
   __sync_synchronize ();
 }
-typeof (sync_synchro

Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0

2013-10-15 Thread Jeff Law


On 10/15/13 10:53, Jakub Jelinek wrote:

On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote:

I noticed that we're now including rtl.h and tm_p.h in
tree-ssa-reassoc.c, which seems wrong.


Isn't that required for BRANCH_COST use?
Other option would be to add some dummy wrapper around
BRANCH_COST, put that wrapper into some file that already includes rtl.h and
tm_p.h and just call it from there.
Yea, looking at it now, I'm a bit surprised this hasn't been converted 
to a target hook which would avoid this problem entirely.


jeff

Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0

2013-10-15 Thread Jakub Jelinek

On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote:
> I noticed that we're now including rtl.h and tm_p.h in
> tree-ssa-reassoc.c, which seems wrong.

Isn't that required for BRANCH_COST use?
Other option would be to add some dummy wrapper around
BRANCH_COST, put that wrapper into some file that already includes rtl.h and
tm_p.h and just call it from there.

Jakub

Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0

2013-10-15 Thread Jeff Law


On 10/15/13 02:12, Jakub Jelinek wrote:

On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote:

Is it OK?


Ok, except two comments apparently still need updating.

+/* Optimize X == CST1 || X == CST2
+   if popcount (CST1 ^ CST2) == 1 into
+   (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)).
+   Similarly for ranges.  E.g.
+   X != 2 && X != 3 && X != 10 && X != 11
+   will be transformed by the previous optimization into
+   (X - 2U) <= 1U && (X - 10U) <= 1U
+   and this loop can transform that into
+   ((X & ~8) - 2U) <= 1U.  */

Here the example is using != and &&, so it is transformed into:
!((X - 2U) <= 1U || (X - 10U) <= 1U)
(everything is negated at the end, and note || instead of &&,
with && it could fold into !(0))
and finally into:
!(((X & ~8) - 2U) <= 1U)
Or alternatively change the first expression into:
   X == 2 || X == 3 || X == 10 || X == 11
and the second to:
   (X - 2U) <= 1U || (X - 10U) <= 1U
and the third keep as is.

+/* Optimize X == CST1 || X == CST2
+   if popcount (CST2 - CST1) == 1 into
+   ((X - CST1) & ~(CST2 - CST1)) == 0.
+   Similarly for ranges.  E.g.
+   X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46
+   || X == 75 || X == 45
+   will be transformed by the previous optimization into
+   (X - 43U) <= 3U && (X - 75U) <= 3U
+   and this loop can transform that into
+   ((X - 43U) & ~(75U - 43U)) <= 3U.  */

Here the example is using == and ||, so the only wrong thing in there
is that the second expression should be
(X - 43U) <= 3U || (X - 75U) <= 3U

Ok with those changes.
I noticed that we're now including rtl.h and tm_p.h in 
tree-ssa-reassoc.c, which seems wrong.


Jeff

Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0

2013-10-15 Thread Jeff Law

On 10/11/13 20:11, Zhenqiang Chen wrote:

-Original Message-
From: Jeff Law [mailto:l...@redhat.com]
Sent: Friday, October 11, 2013 1:20 PM
To: Zhenqiang Chen
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2

-

CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0

On 10/10/13 03:25, Zhenqiang Chen wrote:

It comes from Coremark. The code is:

if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-')

I should have guessed ;-)

For ARM, there are three instructions rather than 4 (in thumb state).
For some older gcc, I got performance improvement on ARM chromebook.
But I can not reproduce the performance gain now.

I will collect logs during GCC bootstrap.

Thanks.  It doesn't have to be anything particularly complex.  When I'm
looking at a transformation I usually just put a printf when it triggers

and grep

for the string in a make log.

Thanks. I check the make log. There are 1906 occurrences.

Wow.  That's awsome.  Thanks,

jeff

Re: [PATCH] decide edge's hotness when there is profile info

2013-10-15 Thread Dehao Chen

Yes, that would work. So let's discard this patch because the fix for
comdat can also fix this problem.

Thanks,
Dehao

On Tue, Oct 15, 2013 at 8:42 AM, Teresa Johnson  wrote:
> I'm planning to move it to ipa_profile (pass ipa-profile_estimate) and
> doing it iteratively. Would that location work?
>
> Teresa
>
> On Tue, Oct 15, 2013 at 8:40 AM, Dehao Chen  wrote:
>> Thanks for the pointer to Honza's patch. The patch does exactly what I
>> need. But it only resides in the instrumentation based FDO path. Can
>> we move the code to more common place (like rebuild_cgraph_edges)?
>>
>> Thanks,
>> Dehao
>>
>> On Tue, Oct 15, 2013 at 7:59 AM, Teresa Johnson  wrote:
>>> On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen  wrote:
 On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka  wrote:
>> For my test case, the entire inline instance is optimized away, so
>> there is no info about it in the profile. I can do some fixup in the
>> rebuild_cgraph_edge though.
>
> Yep, I understand that.  In this case we should turn PROFILE_READ to 
> PROFILE_GUESSED
> and guess the profile when we detect this (i.e. we have edges with non-0 
> counts into
> functions with 0 profile).  That should prvent these from getting 
> UNLIKELY_EXECUTED
> and they will be inlined normal way.

 Oh, actually in AutoFDO, only functions with samples will be marked as
 PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
>>>
>>> Here is Honza's patch that he was referring to:
>>>
>>> Index: tree-profile.c
>>> ===
>>> --- tree-profile.c (revision 201838)
>>> +++ tree-profile.c (working copy)
>>> @@ -604,6 +604,34 @@
>>>
>>>pop_cfun ();
>>>  }
>>> +  /* See if 0 count function has non-0 count callers.  In this case we
>>> + lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>>> +  FOR_EACH_DEFINED_FUNCTION (node)
>>> +{
>>> +  struct cgraph_edge *e;
>>> +  bool called = false;
>>> +  if (node->count)
>>> + continue;
>>> +  for (e = node->callers; e; e = e->next_caller)
>>> + {
>>> +  if (e->count)
>>> +called = true;
>>> +  if (cgraph_maybe_hot_edge_p (e))
>>> +break;
>>> + }
>>> +  if (e || called
>>> +  && profile_status_for_function
>>> +  (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
>>> + {
>>> +  if (dump_file)
>>> +fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
>>> + cgraph_node_name (node), node->symbol.order,
>>> + e ? "function is hot" : "function is normal");
>>> +  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
>>> += (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>>> +  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>>> + }
>>> +}
>>>
>>>del_node_map();
>>>return 0;
>>> Index: predict.c
>>> ===
>>> --- predict.c (revision 201838)
>>> +++ predict.c (working copy)
>>> @@ -2715,6 +2715,9 @@
>>>gcov_type count_max, true_count_max = 0;
>>>basic_block bb;
>>>
>>> +  if (!ENTRY_BLOCK_PTR->count)
>>> +return 0;
>>> +
>>>FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>>>  true_count_max = MAX (bb->count, true_count_max);
>>>
>>>
>>> Which is discussed in this email:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html
>>>
>>> For COMDATs I need to extend this to do it a little later to do it
>>> recursively to catch the case of COMDATs feeding other COMDATs and I
>>> need to do some other handling to compute counts from the frequencies
>>> when inlining. I have been meaning to work on this for awhile but
>>> finally am getting to it this week. (Here's the last message from a
>>> later thread that forked off the above one:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)
>>>
>>> In the meantime, perhaps Honza's patch will suffice?
>>>
>>> Teresa
>>>

 Dehao

>
> Honza
>>
>> Dehao
>>
>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li  
>> wrote:
>> > Is it possible to update the callee node summary after profile
>> > annotate (using information from inline instances which are not
>> > inlined in early inline)?
>> >
>> > David
>> >
>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen  wrote:
>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka  wrote:
>>  Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>  could be a potential risk because some callee is marked unlikely
>>  executed simply because they are inlined and eliminated in the O2
>>  binary. But in ipa-inline it will not get inlined because the edge 
>>  is
>>  not hot from cgraph_maybe_hot_edge_p (because callee is
>>  UNLIKELY_EXECUTED), while the edge->count is actually hot.
>> >>>
>> >>> Can't you prevent setting calle to UNLIK

Re: [PATCH] Add --enable-host-shared configuration option

2013-10-15 Thread Jeff Law


On 10/09/13 18:25, David Malcolm wrote:

My JIT branch requires embedding GCC's code as a shared library on the
host.  To do requires building the host code as position-independent,
which unfortunately incurs a small speed hit.

Obviously this is an opt-in right now, so I'm not terribly concerned.



My first time attempt at this handled --enable-host-shared by detecting
it in each child "configure" script (as generated from configure.ac) and
conditionally appending -fPIC to makefile variables such as CFLAGS,
CXXFLAGS and LDFLAGS.

This worked when invoking "make" from each subdirectory, but not when
running "make" in the top-level directory: the top-level make would
pass in "CFLAGS=-g -O2" to the child makes, overriding their CFLAGS,
thus erroneously omitting the "-fPIC" flag.
Yea.  This stuff (particularly the overriding of flags) is a mess and 
far from obvious IMHO.  But I don't think fixing that is "in scope" for 
what you're working on.




This patch instead adds a PICFLAG variable, set up in the child
configure scripts, to contain -fPIC if configured with
--enable-host-shared.  The PICFLAG variable is then added to an
appropriate variable in each child makefile to ensure that it is used
when building host code.
Seems reasonable -- particularly since you've tested it both with a 
toplevel invocation and invocations in subdirectories.




There's precedent for both "PICFLAG" and "PIC_FLAG" in the source tree:
libiberty uses "PICFLAG", libbacktrace usss "PIC_FLAG".  I went with
"PICFLAG" for the new flags.

But of course...



The patch uses -fPIC rather than -fpic.

Right.




Successfully bootstrapped and regtesting on x86_64-unknown-linux.

OK for trunk?

ChangeLog follows inline:

/
* configure.ac: Add --enable-host-shared
* configure: Regenerate.

gcc/
* Makefile.in (PICFLAG): New.
(enable_host_shared): New.
(INTERNAL_CFLAGS): Use PICFLAG.
(LIBIBERTY): Use pic build of libiberty.a if configured with
--enable-host-shared.
* configure.ac: Add --enable-host-shared, setting up new
PICFLAG variable.
* configure: Regenerate.

libbacktrace/
* configure.ac: Add --enable-host-shared, setting up
pre-existing PIC_FLAG variable within Makefile.am et al.
* configure: Regenerate.

libcpp/
* Makefile.in (PICFLAG): New.
(ALL_CFLAGS): Add PICFLAG.
(ALL_CXXFLAGS): Likewise.
* configure.ac: Add --enable-host-shared, setting up new
PICFLAG variable.
* configure: Regenerate.

libdecnumber/
* Makefile.in (PICFLAG): New.
(ALL_CFLAGS): Add PICFLAG.
* configure.ac: Add --enable-host-shared, setting up new
PICFLAG variable.
* configure: Regenerate.

libiberty/
* configure.ac: If --enable-host-shared, use -fPIC.
* configure: Regenerate.

zlib/
* configure.ac: Add --enable-host-shared, setting up new
PICFLAG variable.
* Makefile.am: Add PICFLAG to libz_a_CFLAGS.
* Makefile.in: Regenerate.
* configure: Regenerate.
Presumably other host libraries we depend on such as gmp, mpc, etc are 
available in shared (or at least PIC) form as well?  Obviously these are 
out of our source tree and largely out of our control, but I'm a bit 
curious if there's other host libraries that may cause you headaches in 
the future.


This is fine with the install.texi changes Joseph wanted.

jeff

Re: [C++ Patch] PR 58707

2013-10-15 Thread Jason Merrill


OK.

Jason

Re: [PATCH] Add --enable-host-shared configuration option

2013-10-15 Thread Jeff Law


On 10/11/13 14:49, David Malcolm wrote:

On Fri, 2013-10-11 at 20:45 +, Joseph S. Myers wrote:

On Fri, 11 Oct 2013, David Malcolm wrote:


On Thu, 2013-10-10 at 01:05 +, Joseph S. Myers wrote:

On Wed, 9 Oct 2013, David Malcolm wrote:


This patch adds an "--enable-host-shared" option throughout the various
configure/Make machinery for host code, adding "-fPIC" where appropriate
when enabled.


Please document this in install.texi (even if it isn't particularly useful
at the stage where it just means PIC rather than actual shared libraries).


How does the following look:

gcc/
* doc/install.texi (--enable-shared): Add note contrasting it
with...
(--enable-host-shared): ...new option.


Seems reasonable to me.


Thanks.   Presumably the initially posted configure/make patch still
needs review, right?

Yes.  I'm looking at it now.
jeff

Re: [PATCH i386 3/8] [AVX512] [20/n] Add AVX-512 patterns: Misc.

2013-10-15 Thread Richard Henderson

On 10/09/2013 03:31 AM, Kirill Yukhin wrote:
> +  else if (TARGET_AVX512PF && (write || !TARGET_PREFETCH_SSE))
> +operands[2] = GEN_INT (1);

I don't believe you want the TARGET_PREFETCH_SSE check there.
That was really to select between SSE and 3dNow prefetch.  If we have AVX,
we're guaranteed to have SSE.

Indeed, I think the whole condition ought to be rewritten to

  if (TARGET_AVX512PF && write)
operands[2] = const1_rtx;
  else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
operands[2] = GEN_INT (3);
  else
operands[1] = const0_rtx;

Otherwise OK.

r~

Minor ChangeLog goof

2013-10-15 Thread Jeff Law



Martin put his entry in the wrong ChangeLog.  (toplevel instead of 
inside the gcc subdirectory).


Fixed in the obvious way.

diff --git a/ChangeLog b/ChangeLog
index 63c6cd8..0d3c199 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,9 +1,3 @@
-2013-10-15  Martin Jambor  
-
-   * ipa-utils.h (ipa_edge_within_scc): Declare.
-   * ipa-cp.c (edge_within_scc): Moved...
-   * ipa-utils.c (ipa_edge_within_scc): ...here.  Updated all callers.
-
 2013-01-10  Joern Rennecke  
 
Import from savannah.gnu.org:
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 4142dd2..09e2494 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -753,6 +753,11 @@
(avx512f_getmant): Ditto.
(avx512f_rndscale): Fix formatting.
 
+2013-10-15  Martin Jambor  
+
+   * ipa-utils.h (ipa_edge_within_scc): Declare.
+   * ipa-cp.c (edge_within_scc): Moved...
+   * ipa-utils.c (ipa_edge_within_scc): ...here.  Updated all callers.
 
 2013-10-15  Alexander Ivchenko  
Maxim Kuznetsov

Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.

2013-10-15 Thread Richard Henderson

On 10/09/2013 03:31 AM, Kirill Yukhin wrote:
> +  rtx op1 = operands[1];
> +  if (REG_P (op1))
> +op1 = gen_rtx_REG (V16HImode, REGNO (op1));
> +  else
> +op1 = gen_lowpart (V16HImode, op1);

The IF case is incorrect.  You need to use gen_lowpart always.

> +(define_insn "*avx512f_unpcklpd512"
> +  [(set (match_operand:V8DF 0 "register_operand" "=v,v")
> + (vec_select:V8DF
> +   (vec_concat:V16DF
> + (match_operand:V8DF 1 "nonimmediate_operand" "v,vm")
> + (match_operand:V8DF 2 "nonimmediate_operand" "vm,1"))
> +   (parallel [(const_int 0) (const_int 8)
> +  (const_int 2) (const_int 10)
> +  (const_int 4) (const_int 12)
> +  (const_int 6) (const_int 14)])))]
> +  "TARGET_AVX512F"
> +  "@
> +   vunpcklpd\t{%2, %1, %0|%0, %1, %2}
> +   vmovddup\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "sselog")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "V8DF")])

The second alternative only matches for v/m/1.  While I imagine that it doesn't
really matter, it might be better to swap the two so that vmovddup gets used
for the v/v/1 case too.

Why do you have separate define_expand and define_insn for this pattern?

> +(define_insn "*avx512f_v8div16qi2_store"
> +  [(set (match_operand:V16QI 0 "memory_operand" "=m")
> + (vec_concat:V16QI
> +   (any_truncate:V8QI
> + (match_operand:V8DI 1 "register_operand" "v"))
> +   (vec_select:V8QI
> + (match_dup 0)
> + (parallel [(const_int 8) (const_int 9)
> +(const_int 10) (const_int 11)
> +(const_int 12) (const_int 13)
> +(const_int 14) (const_int 15)]]
> +  "TARGET_AVX512F"
> +  "vpmovqb\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "ssemov")
> +   (set_attr "memory" "store")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "TI")])

Any pattern that uses a match_dup of an input like this must use "+" not "=" to
signal that the operand is in_out not output.

And is this really the best description for this insn?  It's a store to memory.
 Surely it's better to say that we store V8QI.

> +(define_expand "vec_widen_umult_even_v16si"
> +  [(set (match_operand:V8DI 0 "register_operand")
> +(mult:V8DI
> +  (zero_extend:V8DI
> +(vec_select:V8SI
> +  (match_operand:V16SI 1 "nonimmediate_operand")
> +  (parallel [(const_int 0) (const_int 2)
> + (const_int 4) (const_int 6)
> + (const_int 8) (const_int 10)
> + (const_int 12) (const_int 14)])))
> +  (zero_extend:V8DI
> +(vec_select:V8SI
> +  (match_operand:V16SI 2 "nonimmediate_operand")
> +  (parallel [(const_int 0) (const_int 2)
> + (const_int 4) (const_int 6)
> + (const_int 8) (const_int 10)
> + (const_int 12) (const_int 14)])]
> +  "TARGET_AVX512F"
> +  "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);")

We don't need to use ix86_fixup_binary_operands for any three-operand insn.
That function is in order to help SSE's two-operand insns.  You can drop the
define-expand and just keep the define_insn.

> +(define_insn "*vec_widen_umult_even_v16si"
> +  [(set (match_operand:V8DI 0 "register_operand" "=v")
> +(mult:V8DI
> +  (zero_extend:V8DI
> +(vec_select:V8SI
> +  (match_operand:V16SI 1 "nonimmediate_operand" "v")
> +  (parallel [(const_int 0) (const_int 2)
> + (const_int 4) (const_int 6)
> + (const_int 8) (const_int 10)
> + (const_int 12) (const_int 14)])))
> +  (zero_extend:V8DI
> +(vec_select:V8SI
> +  (match_operand:V16SI 2 "nonimmediate_operand" "vm")
> +  (parallel [(const_int 0) (const_int 2)
> + (const_int 4) (const_int 6)
> + (const_int 8) (const_int 10)
> + (const_int 12) (const_int 14)])]
> +  "TARGET_AVX512F && ix86_binary_operator_ok (MULT, V16SImode, operands)"

You want a "%v" for operand 1, to make the operands commutative.

> +(define_insn "*vec_widen_smult_even_v16si"
> +  [(set (match_operand:V8DI 0 "register_operand" "=x")
> +(mult:V8DI

Similarly, plus errant "x".


r~

Re: [PATCH] decide edge's hotness when there is profile info

2013-10-15 Thread Teresa Johnson

I'm planning to move it to ipa_profile (pass ipa-profile_estimate) and
doing it iteratively. Would that location work?

Teresa

On Tue, Oct 15, 2013 at 8:40 AM, Dehao Chen  wrote:
> Thanks for the pointer to Honza's patch. The patch does exactly what I
> need. But it only resides in the instrumentation based FDO path. Can
> we move the code to more common place (like rebuild_cgraph_edges)?
>
> Thanks,
> Dehao
>
> On Tue, Oct 15, 2013 at 7:59 AM, Teresa Johnson  wrote:
>> On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen  wrote:
>>> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka  wrote:
> For my test case, the entire inline instance is optimized away, so
> there is no info about it in the profile. I can do some fixup in the
> rebuild_cgraph_edge though.

 Yep, I understand that.  In this case we should turn PROFILE_READ to 
 PROFILE_GUESSED
 and guess the profile when we detect this (i.e. we have edges with non-0 
 counts into
 functions with 0 profile).  That should prvent these from getting 
 UNLIKELY_EXECUTED
 and they will be inlined normal way.
>>>
>>> Oh, actually in AutoFDO, only functions with samples will be marked as
>>> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
>>
>> Here is Honza's patch that he was referring to:
>>
>> Index: tree-profile.c
>> ===
>> --- tree-profile.c (revision 201838)
>> +++ tree-profile.c (working copy)
>> @@ -604,6 +604,34 @@
>>
>>pop_cfun ();
>>  }
>> +  /* See if 0 count function has non-0 count callers.  In this case we
>> + lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>> +  FOR_EACH_DEFINED_FUNCTION (node)
>> +{
>> +  struct cgraph_edge *e;
>> +  bool called = false;
>> +  if (node->count)
>> + continue;
>> +  for (e = node->callers; e; e = e->next_caller)
>> + {
>> +  if (e->count)
>> +called = true;
>> +  if (cgraph_maybe_hot_edge_p (e))
>> +break;
>> + }
>> +  if (e || called
>> +  && profile_status_for_function
>> +  (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
>> + {
>> +  if (dump_file)
>> +fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
>> + cgraph_node_name (node), node->symbol.order,
>> + e ? "function is hot" : "function is normal");
>> +  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
>> += (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>> +  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>> + }
>> +}
>>
>>del_node_map();
>>return 0;
>> Index: predict.c
>> ===
>> --- predict.c (revision 201838)
>> +++ predict.c (working copy)
>> @@ -2715,6 +2715,9 @@
>>gcov_type count_max, true_count_max = 0;
>>basic_block bb;
>>
>> +  if (!ENTRY_BLOCK_PTR->count)
>> +return 0;
>> +
>>FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>>  true_count_max = MAX (bb->count, true_count_max);
>>
>>
>> Which is discussed in this email:
>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html
>>
>> For COMDATs I need to extend this to do it a little later to do it
>> recursively to catch the case of COMDATs feeding other COMDATs and I
>> need to do some other handling to compute counts from the frequencies
>> when inlining. I have been meaning to work on this for awhile but
>> finally am getting to it this week. (Here's the last message from a
>> later thread that forked off the above one:
>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)
>>
>> In the meantime, perhaps Honza's patch will suffice?
>>
>> Teresa
>>
>>>
>>> Dehao
>>>

 Honza
>
> Dehao
>
> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li  
> wrote:
> > Is it possible to update the callee node summary after profile
> > annotate (using information from inline instances which are not
> > inlined in early inline)?
> >
> > David
> >
> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen  wrote:
> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka  wrote:
>  Not for instrumented FDO (not as I know of). But for AutoFDO, this
>  could be a potential risk because some callee is marked unlikely
>  executed simply because they are inlined and eliminated in the O2
>  binary. But in ipa-inline it will not get inlined because the edge is
>  not hot from cgraph_maybe_hot_edge_p (because callee is
>  UNLIKELY_EXECUTED), while the edge->count is actually hot.
> >>>
> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases 
> >>> instead?
> >>> It seems that having profile set incorrectly will lead to other 
> >>> problems later, too.
> >>> We discussed similar problem with Teresa about the missing profiles 
> >>> for comdat,
> >>> basically one should detect these cases as profile being

Re: [PATCH] decide edge's hotness when there is profile info

2013-10-15 Thread Dehao Chen

Thanks for the pointer to Honza's patch. The patch does exactly what I
need. But it only resides in the instrumentation based FDO path. Can
we move the code to more common place (like rebuild_cgraph_edges)?

Thanks,
Dehao

On Tue, Oct 15, 2013 at 7:59 AM, Teresa Johnson  wrote:
> On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen  wrote:
>> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka  wrote:
 For my test case, the entire inline instance is optimized away, so
 there is no info about it in the profile. I can do some fixup in the
 rebuild_cgraph_edge though.
>>>
>>> Yep, I understand that.  In this case we should turn PROFILE_READ to 
>>> PROFILE_GUESSED
>>> and guess the profile when we detect this (i.e. we have edges with non-0 
>>> counts into
>>> functions with 0 profile).  That should prvent these from getting 
>>> UNLIKELY_EXECUTED
>>> and they will be inlined normal way.
>>
>> Oh, actually in AutoFDO, only functions with samples will be marked as
>> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
>
> Here is Honza's patch that he was referring to:
>
> Index: tree-profile.c
> ===
> --- tree-profile.c (revision 201838)
> +++ tree-profile.c (working copy)
> @@ -604,6 +604,34 @@
>
>pop_cfun ();
>  }
> +  /* See if 0 count function has non-0 count callers.  In this case we
> + lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +{
> +  struct cgraph_edge *e;
> +  bool called = false;
> +  if (node->count)
> + continue;
> +  for (e = node->callers; e; e = e->next_caller)
> + {
> +  if (e->count)
> +called = true;
> +  if (cgraph_maybe_hot_edge_p (e))
> +break;
> + }
> +  if (e || called
> +  && profile_status_for_function
> +  (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
> + {
> +  if (dump_file)
> +fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
> + cgraph_node_name (node), node->symbol.order,
> + e ? "function is hot" : "function is normal");
> +  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
> += (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> +  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
> + }
> +}
>
>del_node_map();
>return 0;
> Index: predict.c
> ===
> --- predict.c (revision 201838)
> +++ predict.c (working copy)
> @@ -2715,6 +2715,9 @@
>gcov_type count_max, true_count_max = 0;
>basic_block bb;
>
> +  if (!ENTRY_BLOCK_PTR->count)
> +return 0;
> +
>FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>  true_count_max = MAX (bb->count, true_count_max);
>
>
> Which is discussed in this email:
> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html
>
> For COMDATs I need to extend this to do it a little later to do it
> recursively to catch the case of COMDATs feeding other COMDATs and I
> need to do some other handling to compute counts from the frequencies
> when inlining. I have been meaning to work on this for awhile but
> finally am getting to it this week. (Here's the last message from a
> later thread that forked off the above one:
> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)
>
> In the meantime, perhaps Honza's patch will suffice?
>
> Teresa
>
>>
>> Dehao
>>
>>>
>>> Honza

 Dehao

 On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li  
 wrote:
 > Is it possible to update the callee node summary after profile
 > annotate (using information from inline instances which are not
 > inlined in early inline)?
 >
 > David
 >
 > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen  wrote:
 >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka  wrote:
  Not for instrumented FDO (not as I know of). But for AutoFDO, this
  could be a potential risk because some callee is marked unlikely
  executed simply because they are inlined and eliminated in the O2
  binary. But in ipa-inline it will not get inlined because the edge is
  not hot from cgraph_maybe_hot_edge_p (because callee is
  UNLIKELY_EXECUTED), while the edge->count is actually hot.
 >>>
 >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases 
 >>> instead?
 >>> It seems that having profile set incorrectly will lead to other 
 >>> problems later, too.
 >>> We discussed similar problem with Teresa about the missing profiles 
 >>> for comdat,
 >>> basically one should detect these cases as profile being lost and go 
 >>> with guessed
 >>> profile.  (I believe patch for that was posted, too, and so far it 
 >>> seems best approach
 >>> to this issue)
 >>
 >> The current AutoFDO implementation will take all functions that do not
 >> have have profile as normally executed, thus use gues

Re: PING Re: [Patch, Fortran] PR58658 - add missing pointer-assign check for CLASS(*)

2013-10-15 Thread Paul Richard Thomas

Hi Tobias,

Have you checked that:

subroutine sub(a)
  class(*),pointer :: a
  a => null()
end subroutine

does not give an error?  I think that it is why the check was introduced.

Cheers

Paul


On 13 October 2013 09:51, Tobias Burnus  wrote:
> *PING*: http://gcc.gnu.org/ml/fortran/2013-10/msg00018.html
>
> Additionally, I'd like to early ping for the do concurrent patch:
> http://gcc.gnu.org/ml/fortran/2013-10/msg00022.html , even if the ME review
> is still pending.
>
> Tobias Burnus wrote:
>>
>> The patch is rather obvious. The question is just why was the check there
>> at the first place.
>>
>> Build and regtested on x86-64-gnu-linux.
>> OK for the trunk?
>>
>> Tobias
>
>



-- 
The knack of flying is learning how to throw yourself at the ground and miss.
   --Hitchhikers Guide to the Galaxy

[C++ Patch] PR 58707

2013-10-15 Thread Paolo Carlini


Hi,

in order to fix this parsing issue with '>' between square brackets, it 
seems we have simply to use the parser->greater_than_is_operator_p 
mechanism. Tested x86_64-linux.


Thanks,
Paolo.

PS: Even if the testcase uses a constexpr array I don't think we need to 
special case c++11 vs c++98. Neither wrap only the cp_parser_expression 
call.


///
/cp
2013-10-15  Paolo Carlini  

PR c++/58707
* parser.c (cp_parser_postfix_open_square_expression): Set
parser->greater_than_is_operator_p for the argument.

/testsuite
2013-10-15  Paolo Carlini  

PR c++/58707
* g++.dg/cpp0x/pr58707.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 203587)
+++ cp/parser.c (working copy)
@@ -6231,10 +6231,14 @@ cp_parser_postfix_open_square_expression (cp_parse
 {
   tree index = NULL_TREE;
   location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+  bool saved_greater_than_is_operator_p;
 
   /* Consume the `[' token.  */
   cp_lexer_consume_token (parser->lexer);
 
+  saved_greater_than_is_operator_p = parser->greater_than_is_operator_p;
+  parser->greater_than_is_operator_p = true;
+
   /* Parse the index expression.  */
   /* ??? For offsetof, there is a question of what to allow here.  If
  offsetof is not being used in an integral constant expression context,
@@ -6278,6 +6282,8 @@ cp_parser_postfix_open_square_expression (cp_parse
index = cp_parser_expression (parser, /*cast_p=*/false, NULL);
 }
 
+  parser->greater_than_is_operator_p = saved_greater_than_is_operator_p;
+
   /* Look for the closing `]'.  */
   cp_parser_require (parser, CPP_CLOSE_SQUARE, RT_CLOSE_SQUARE);
 
Index: testsuite/g++.dg/cpp0x/pr58707.C
===
--- testsuite/g++.dg/cpp0x/pr58707.C(revision 0)
+++ testsuite/g++.dg/cpp0x/pr58707.C(working copy)
@@ -0,0 +1,6 @@
+// PR c++/58707
+// { dg-do compile { target c++11 } }
+
+template class TC { };
+constexpr int foo[] = { 42 };
+TC 1]> bar;

RE: [PATCH] tree_code_name wrapper

2013-10-15 Thread Paulo Matos

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On
> Behalf Of Jakub Jelinek
> Sent: 15 October 2013 15:45
> To: Paulo Matos
> Cc: Richard Biener; Paolo Carlini; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] tree_code_name wrapper
> 
> The ChangeLog is still wrong:
> 

OK, sorry.
If this is ok now I will open a bottle of champagne.

Thank you all for your time reviewing this. Ok to submit?

2013-10-15  Paulo Matos  

gcc/
* tree-core.h (tree_code_name): Remove.
* tree.h (get_tree_code_name): New prototype.
* tree.c (tree_code_name): Make static.
(get_tree_code_name): New function.
(dump_tree_statistics, tree_check_failed, tree_not_check_failed, 
tree_class_check_failed, tree_range_check_failed,
tree_not_class_check_failed, omp_clause_check_failed,
tree_contains_struct_check_failed, tree_operand_check_failed): Use new
wrapper get_tree_code_name instead of calling tree_code_name directly.
* tree-vrp.c (dump_asserts_for): Likewise.
* tree-dump.c (dequeue_and_dump): Likewise.
* tree-pretty-print.c (do_niy, dump_generic_node): Likewise.
* tree-pretty-print.h (pp_unsupported_tree): Likewise.
* lto-streamer-out.c (lto_write_tree, DFS_write_tree): Likewise.
* tree-ssa-dom.c (print_expr_hash_elt): Likewise.
* gimple-pretty-print.c (dump_unary_rhs, dump_binary_rhs,
dump_ternary_rhs, dump_gimple_assign, dump_gimple_cond,
dump_gimple_omp_for): Likewise.
* tree-vect-data-refs.c (vect_create_data_ref_ptr): Likewise.
* tree-ssa-pre.c (print_pre_expr): Likewise.
* ipa-prop.c (ipa_print_node_jump_functions_for_edge): Likewise.
* print-tree.c (print_node_brief, print_node): Likewise.
* gimple.c (gimple_check_failed): Likewise.
* lto-streamer.c (lto_tag_name, print_lto_report): Likewise.
* config/frv/frv.c (frv_init_cumulative_args): Likewise.
* config/mep/mep.c (mep_validate_vliw): Likewise.
* config/iq2000/iq2000.c (init_cumulative_args): Likewise.
* config/rs6000/rs6000.c (init_cumulative_args): Likewise.

gcc/cp/
* error.c (code_to_string): Use new wrapper get_tree_code_name.
* cxx-pretty-print.c (pp_cxx_assignment_operator): Likewise.
* pt.c (tsubst): Likewise.
* semantics.c (cxx_eval_constant_expression,
potential_constant_expression_1): Likewise.
* mangle.c (MANGLE_TRACE_TREE, dump_substitution_candidates,
add_substitution, find_substitution): Likewise.

-- 
Paulo Matos

Re: [Patch] Fix PR58737

2013-10-15 Thread Jonathan Wakely

On 15 October 2013 15:56, Tim Shen wrote:
> This memory leak is because forgetting virtual destructor of the base
> class _Executor.
>
> Thanks!

Great, if it passes the testsuite please commit it.

Thanks for the quick fix, and to Paolo for identifying the cause!

Re: [Patch] Fix PR58737

2013-10-15 Thread Tim Shen

On Tue, Oct 15, 2013 at 11:01 AM, Jonathan Wakely  wrote:
> Great, if it passes the testsuite please commit it.

It surely passed -m32 and -m64, and committed :)


-- 
Tim Shen

Re: [Patch] Fix PR58737

2013-10-15 Thread Paolo Carlini


On 10/15/2013 04:56 PM, Tim Shen wrote:

This memory leak is because forgetting virtual destructor of the base
class _Executor.

Of course.

Thanks,
Paolo.

Re: [PATCH] decide edge's hotness when there is profile info

2013-10-15 Thread Teresa Johnson

On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen  wrote:
> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka  wrote:
>>> For my test case, the entire inline instance is optimized away, so
>>> there is no info about it in the profile. I can do some fixup in the
>>> rebuild_cgraph_edge though.
>>
>> Yep, I understand that.  In this case we should turn PROFILE_READ to 
>> PROFILE_GUESSED
>> and guess the profile when we detect this (i.e. we have edges with non-0 
>> counts into
>> functions with 0 profile).  That should prvent these from getting 
>> UNLIKELY_EXECUTED
>> and they will be inlined normal way.
>
> Oh, actually in AutoFDO, only functions with samples will be marked as
> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.

Here is Honza's patch that he was referring to:

Index: tree-profile.c
===
--- tree-profile.c (revision 201838)
+++ tree-profile.c (working copy)
@@ -604,6 +604,34 @@

   pop_cfun ();
 }
+  /* See if 0 count function has non-0 count callers.  In this case we
+ lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+{
+  struct cgraph_edge *e;
+  bool called = false;
+  if (node->count)
+ continue;
+  for (e = node->callers; e; e = e->next_caller)
+ {
+  if (e->count)
+called = true;
+  if (cgraph_maybe_hot_edge_p (e))
+break;
+ }
+  if (e || called
+  && profile_status_for_function
+  (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
+ {
+  if (dump_file)
+fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
+ cgraph_node_name (node), node->symbol.order,
+ e ? "function is hot" : "function is normal");
+  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
+= (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+ }
+}

   del_node_map();
   return 0;
Index: predict.c
===
--- predict.c (revision 201838)
+++ predict.c (working copy)
@@ -2715,6 +2715,9 @@
   gcov_type count_max, true_count_max = 0;
   basic_block bb;

+  if (!ENTRY_BLOCK_PTR->count)
+return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
 true_count_max = MAX (bb->count, true_count_max);


Which is discussed in this email:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html

For COMDATs I need to extend this to do it a little later to do it
recursively to catch the case of COMDATs feeding other COMDATs and I
need to do some other handling to compute counts from the frequencies
when inlining. I have been meaning to work on this for awhile but
finally am getting to it this week. (Here's the last message from a
later thread that forked off the above one:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)

In the meantime, perhaps Honza's patch will suffice?

Teresa

>
> Dehao
>
>>
>> Honza
>>>
>>> Dehao
>>>
>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li  
>>> wrote:
>>> > Is it possible to update the callee node summary after profile
>>> > annotate (using information from inline instances which are not
>>> > inlined in early inline)?
>>> >
>>> > David
>>> >
>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen  wrote:
>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka  wrote:
>>>  Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>>  could be a potential risk because some callee is marked unlikely
>>>  executed simply because they are inlined and eliminated in the O2
>>>  binary. But in ipa-inline it will not get inlined because the edge is
>>>  not hot from cgraph_maybe_hot_edge_p (because callee is
>>>  UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>> >>>
>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases 
>>> >>> instead?
>>> >>> It seems that having profile set incorrectly will lead to other 
>>> >>> problems later, too.
>>> >>> We discussed similar problem with Teresa about the missing profiles for 
>>> >>> comdat,
>>> >>> basically one should detect these cases as profile being lost and go 
>>> >>> with guessed
>>> >>> profile.  (I believe patch for that was posted, too, and so far it 
>>> >>> seems best approach
>>> >>> to this issue)
>>> >>
>>> >> The current AutoFDO implementation will take all functions that do not
>>> >> have have profile as normally executed, thus use guessed profile for
>>> >> it. This is like using profile for truly hot functions, and using O2
>>> >> for other functions. This works fine. However, it leads to larger code
>>> >> size (approximately 10%~20% larger than FDO).
>>> >>
>>> >> I'd like to introduce another mode for users who care about both
>>> >> performance and code size, and can be sure that profile is
>>> >> representative. In this mode, we will mark all functions without
>>> >> sample as "unlikely e

Re: [PATCH][ubsan] Add VLA bound instrumentation

2013-10-15 Thread Joseph S. Myers

On Tue, 15 Oct 2013, Marek Polacek wrote:

> Ping^2.  Jason, Joseph, are you fine with the C++/C FE changes?

The C changes are fine with me.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: RFA: Remove alias usage from libgcc/sync.c

2013-10-15 Thread Ian Lance Taylor

On Tue, Oct 15, 2013 at 2:18 AM, Richard Biener
 wrote:
>
> Ok for the tailcall parts and the testcase - I'd prefer someone else to
> ack the libgcc change.  CCing maintainer.

The libgcc patch is missing the updates to the comments.  This code is
confusing enough as it is, having incorrect comments wouldn't help.

Ian

Re: [PATCH] decide edge's hotness when there is profile info

2013-10-15 Thread Dehao Chen

The patch updated:

Index: gcc/cgraph.c
===
--- gcc/cgraph.c (revision 203609)
+++ gcc/cgraph.c (working copy)
@@ -877,6 +877,10 @@ cgraph_create_edge_1 (struct cgraph_node *caller,
   if (call_stmt && caller->call_site_hash)
 cgraph_add_edge_to_call_site_hash (edge);

+  if (count > 0 && edge->callee
+  && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)
+edge->callee->frequency = NODE_FREQUENCY_NORMAL;
+
   return edge;
 }

Thanks,
Dehao

On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen  wrote:
> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka  wrote:
>>> For my test case, the entire inline instance is optimized away, so
>>> there is no info about it in the profile. I can do some fixup in the
>>> rebuild_cgraph_edge though.
>>
>> Yep, I understand that.  In this case we should turn PROFILE_READ to 
>> PROFILE_GUESSED
>> and guess the profile when we detect this (i.e. we have edges with non-0 
>> counts into
>> functions with 0 profile).  That should prvent these from getting 
>> UNLIKELY_EXECUTED
>> and they will be inlined normal way.
>
> Oh, actually in AutoFDO, only functions with samples will be marked as
> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
>
> Dehao
>
>>
>> Honza
>>>
>>> Dehao
>>>
>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li  
>>> wrote:
>>> > Is it possible to update the callee node summary after profile
>>> > annotate (using information from inline instances which are not
>>> > inlined in early inline)?
>>> >
>>> > David
>>> >
>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen  wrote:
>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka  wrote:
>>>  Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>>  could be a potential risk because some callee is marked unlikely
>>>  executed simply because they are inlined and eliminated in the O2
>>>  binary. But in ipa-inline it will not get inlined because the edge is
>>>  not hot from cgraph_maybe_hot_edge_p (because callee is
>>>  UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>> >>>
>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases 
>>> >>> instead?
>>> >>> It seems that having profile set incorrectly will lead to other 
>>> >>> problems later, too.
>>> >>> We discussed similar problem with Teresa about the missing profiles for 
>>> >>> comdat,
>>> >>> basically one should detect these cases as profile being lost and go 
>>> >>> with guessed
>>> >>> profile.  (I believe patch for that was posted, too, and so far it 
>>> >>> seems best approach
>>> >>> to this issue)
>>> >>
>>> >> The current AutoFDO implementation will take all functions that do not
>>> >> have have profile as normally executed, thus use guessed profile for
>>> >> it. This is like using profile for truly hot functions, and using O2
>>> >> for other functions. This works fine. However, it leads to larger code
>>> >> size (approximately 10%~20% larger than FDO).
>>> >>
>>> >> I'd like to introduce another mode for users who care about both
>>> >> performance and code size, and can be sure that profile is
>>> >> representative. In this mode, we will mark all functions without
>>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>>> >> (of optimized code) to represent profile, it's possible that some hot
>>> >> functions (say foo) are inlined and fully eliminated into another hot
>>> >> function (say bar). So in the profile, bar is cold, and because the
>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>>> >> before the profile annotation. However, after profile annotate, we can
>>> >> infer from the bb count that foo->bar is hot, thus it should be
>>> >> inlined in ipa-inline phase. However, because bar itself is marked
>>> >> UNLIKELY_EXECUTED, it will not be inlined.
>>> >>
>>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>>> >> we find an edge's callee is unlikely executed, add the edge count to
>>> >> the callee's count and recalculate callee's frequency.
>>> >>
>>> >> Dehao
>>> >>
>>> >>>
>>> >>> Honza

[Patch] Fix PR58737

2013-10-15 Thread Tim Shen

This memory leak is because forgetting virtual destructor of the base
class _Executor.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data

Re: [PATCH] tree_code_name wrapper

2013-10-15 Thread Jakub Jelinek

The ChangeLog is still wrong:

On Tue, Oct 15, 2013 at 02:38:31PM +, Paulo Matos wrote:
> 2013-10-15  Paulo Matos  
> 
> gcc/
>   * tree-core.h: Remove extern declaration of tree_code_name.

* tree-core.h (tree_code_name): Remove.
(or Remove external declaration. etc.).

>   * tree.h: Prototype for new function get_tree_code_name.

* tree.h (get_tree_code_name): New prototype.

>   * tree.c: Make tree_code_name static. Define new function
>   wrapper get_tree_code_name.

* tree.c (tree_code_name): Make static.
(get_tree_code_name): New function.

>   (dump_tree_statistics, tree_check_failed, tree_not_check_failed, 
>   tree_class_check_failed, tree_range_check_failed, 
> tree_not_class_check_failed,

Line length 80 characters.

>   omp_clause_check_failed, tree_contains_struct_check_failed, 
>   tree_operand_check_failed):  Use new wrapper get_tree_code_name 
> instead of

What is the extra tab after : for?

>   calling tree_code_name directly.
>   * tree-vrp.c (dump_asserts_for): Likewise.
>   * tree-dump.c (dequeue_and_dump): Likewise.
>   * tree-pretty-print.c (do_niy, dump_generic_node): Likewise.
>   * tree-pretty-print.h (pp_unsupported_tree): Likewise.
>   * lto-streamer-out.c (lto_write_tree, DFS_write_tree): Likewise.
>   * tree-ssa-dom.c (print_expr_hash_elt): Likewise.
>   * gimple-pretty-print.c (dump_unary_rhs, dump_binary_rhs, 
> dump_ternary_rhs, 

Line length.

>   dump_gimple_assign, dump_gimple_cond, dump_gimple_omp_for): Likewise.
>   * tree-vect-data-refs.c (vect_create_data_ref_ptr): Likewise.
>   * tree-ssa-pre.c (print_pre_expr): Likewise.
>   * ipa-prop.c (ipa_print_node_jump_functions_for_edge): Likewise.
>   * print-tree.c (print_node_brief, print_node): Likewise.
>   * gimple.c (gimple_check_failed): Likewise.
>   * lto-streamer.c (lto_tag_name, print_lto_report): Likewise.
> 
> gcc/cp/
>   * error.c (code_to_string): Use new wrapper get_tree_code_name.
>   * cxx-pretty-print.c (pp_cxx_assignment_operator): Likewise.
>   * pt.c (tsubst): Likewise.
>   * semantics.c (cxx_eval_constant_expression, 
> potential_constant_expression_1): Likewise.

Again.

>   * mangle.c (MANGLE_TRACE_TREE, dump_substitution_candidates, 
> add_substitution,

Again.

>   find_substitution): Likewise.
> 
> gcc/config/

There is no ChangeLog file in gcc/config/ subdirectory, so these should be
* config/frv/frv.c ... etc. in the gcc/ part of the ChangeLog.

>   * frv/frv.c (frv_init_cumulative_args): Use new wrapper 
> get_tree_code_name.

Again.

>   * mep/mep.c (mep_validate_vliw): Likewise.
>   * iq2000/iq2000.c (init_cumulative_args): Likewise.
>   * rs6000/rs6000.c (init_cumulative_args): Likewise.

Jakub

RE: [PATCH] tree_code_name wrapper

2013-10-15 Thread Paulo Matos

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On
> Behalf Of Richard Biener
> Sent: 15 October 2013 14:34
> To: Paulo Matos
> Cc: Jakub Jelinek; Paolo Carlini; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] tree_code_name wrapper
> 
> On Tue, Oct 15, 2013 at 3:19 PM, Paulo Matos  wrote:
> >> -Original Message-
> >> From: Jakub Jelinek [mailto:ja...@redhat.com]
> >> Sent: 15 October 2013 10:51
> >> To: Paulo Matos
> >> Cc: Richard Biener; Paolo Carlini; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] tree_code_name wrapper
> >>
> >> On Tue, Oct 15, 2013 at 09:42:17AM +, Paulo Matos wrote:
> >> > Thanks, regarding the indentation I was convinced we used 2 space
> >> > indentation with maximum line length of 80 characters.
> >>
> >> We do, however 8 consecutive spaces in the indentation should be always
> >> replaced by a tab.
> >>
> >
> > This means that every sequence of 8 spaces should be converted into tabs?
> So, if we indent something 4 times, that becomes a tab instead of 4 times 2
> spaces.
> 
> Correct.
> 
> Richard.
> 

Thanks. Updated patch attached. Ok to submit?

2013-10-15  Paulo Matos  

gcc/
* tree-core.h: Remove extern declaration of tree_code_name.
* tree.h: Prototype for new function get_tree_code_name.
* tree.c: Make tree_code_name static. Define new function
wrapper get_tree_code_name.
(dump_tree_statistics, tree_check_failed, tree_not_check_failed, 
tree_class_check_failed, tree_range_check_failed, 
tree_not_class_check_failed,
omp_clause_check_failed, tree_contains_struct_check_failed, 
tree_operand_check_failed):  Use new wrapper get_tree_code_name 
instead of
calling tree_code_name directly.
* tree-vrp.c (dump_asserts_for): Likewise.
* tree-dump.c (dequeue_and_dump): Likewise.
* tree-pretty-print.c (do_niy, dump_generic_node): Likewise.
* tree-pretty-print.h (pp_unsupported_tree): Likewise.
* lto-streamer-out.c (lto_write_tree, DFS_write_tree): Likewise.
* tree-ssa-dom.c (print_expr_hash_elt): Likewise.
* gimple-pretty-print.c (dump_unary_rhs, dump_binary_rhs, 
dump_ternary_rhs, 
dump_gimple_assign, dump_gimple_cond, dump_gimple_omp_for): Likewise.
* tree-vect-data-refs.c (vect_create_data_ref_ptr): Likewise.
* tree-ssa-pre.c (print_pre_expr): Likewise.
* ipa-prop.c (ipa_print_node_jump_functions_for_edge): Likewise.
* print-tree.c (print_node_brief, print_node): Likewise.
* gimple.c (gimple_check_failed): Likewise.
* lto-streamer.c (lto_tag_name, print_lto_report): Likewise.

gcc/cp/
* error.c (code_to_string): Use new wrapper get_tree_code_name.
* cxx-pretty-print.c (pp_cxx_assignment_operator): Likewise.
* pt.c (tsubst): Likewise.
* semantics.c (cxx_eval_constant_expression, 
potential_constant_expression_1): Likewise.
* mangle.c (MANGLE_TRACE_TREE, dump_substitution_candidates, 
add_substitution,
find_substitution): Likewise.

gcc/config/
* frv/frv.c (frv_init_cumulative_args): Use new wrapper 
get_tree_code_name.
* mep/mep.c (mep_validate_vliw): Likewise.
* iq2000/iq2000.c (init_cumulative_args): Likewise.
* rs6000/rs6000.c (init_cumulative_args): Likewise.

tree_code_name-v2.patch
Description: tree_code_name-v2.patch

Re: [PATCH i386 3/8] [AVX512] [18/n] Add AVX-512 patterns: various RCPs and SQRTs.

2013-10-15 Thread Richard Henderson

On 10/15/2013 06:57 AM, Kirill Yukhin wrote:
> Hello,
> On 14 Oct 13:10, Richard Henderson wrote:
>> On 10/09/2013 03:31 AM, Kirill Yukhin wrote:
>>> +(define_mode_attr ssefixupmode
>>> +  [(V16SF "V16SI") (V4SF "V4SI") (V8DF "V8DI") (V2DF "V2DI")])
>>> +
>>
>> Oh, I forgot.  How is this different from sseintvecmode?
> It is definetely a bug. Ok with corresponding replacement and removal
> of redundant mode attr?

Yes.


r~

Re: [ARM/AARCH64] Remodel type attribute values for Neon Instructions.

2013-10-15 Thread Richard Earnshaw

On 15/10/13 12:23, James Greenhalgh wrote:
> Hi,
> 
> The historical neon_type attribute formed groups over the Neon
> instructions which were well suited for modelling the Cortex-A8
> pipeline, but were cumbersome for other processor models.
> 
> The AArch64 has another classification "simd_type". This is, with a
> few exceptions, and when augmented by simd_mode, suitably high
> resolution for our needs. However, this is not integrated in to the
> "type" attribute, which we would ideally like to be the One True
> instruction classification attribute.
> 
> This patch series aims to solve these problems by defining a new,
> high resolution classification across the Neon instructions. From
> this we can derive two benefits:
> 
>   * Convergence between the A32 and A64 backends.
>   * Better Pipeline modeling.
> 
> The patch series first introduces the new Neon type classifications,
> then updates config/arm/neon.md and config/aarch64/aarch64-simd.md
> to use the new classifications.
> 
> We then update the pipeline models for the new types. For Cortex-A8
> and Cortex-A9, this simply means reforming the old groups. For Cortex-A15,
> this is a chance to form new groups with which we can better model the
> pipeline latencies.
> 
> Finally, we can remove the old types and config/arm/neon-schedgen.ml.
> 
> The patch series has been bootstrapped on a Chromebook and the full
> testsuite run with no regressions. All pipeline models have been
> checked against some sample neon intrinsics code to ensure the new
> schedules are sensible, and there are no holes in the pipeline models.
> 
> Thanks,
> James
> 
> ---
> James Greenhalgh (10):
>   [ARM] [1/10] Add new types to describe Neon insns.
>   [AArch64] [Neon types 2/10] Update Current type attributes to new Neon
> Types.
>   [ARM] [Neon types 3/10] Update Current type attributes to new Neon
> Types.
>   [AArch64] [Neon types 4/10] Add type attributes to all simd insns
>   [ARM] [Neon types 5/10] Update Cortex-A8 pipeline model
>   [ARM] [Neon types 6/10] Cortex-A9 neon pipeline changes
>   [ARM] [Neon types 7/10] Cortex-A15 neon pipeline changes
>   [ARM] [Neon types 8/10] Cortex-A7 neon pipeline model
>   [ARM] [Neon types 9/10] Remove old neon types
>   [ARM] [Neon types 10/10] Remove neon-schedgen.ml
> 

All of the above are OK.

R.

Re: [PATCH v2] Fix libgfortran cross compile configury w.r.t newlib

2013-10-15 Thread Richard Earnshaw

On 15/10/13 12:31, Marcus Shawcroft wrote:
> On 1 October 2013 12:40, Marcus Shawcroft  wrote:
>> On 30/09/13 13:40, Marcus Shawcroft wrote:
>>
 Well, I thought this patch would work for me, but it does not.  It looks
 like gcc_no_link is set to 'no' on my target because, technically, I can
 link even if I don't use a linker script.  I just can't find any
 functions.

>>
>>> In which case gating on gcc_no_link could be replaced with a test that
>>> looks to see if we can link with the library.  Perhaps looking for
>>> exit() or some such that might reasonably be expected to be present.
>>>
>>> For example:
>>>
>>> AC_CHECK_FUNC(exit)
>>> if test "x${with_newlib}" = "xyes" -a "x${ac_cv_func_exit}" = "xno"; then
>>>
>>> /Marcus
>>>
>>>
>>>
>>>
>>
>>
>> Patch attached.
>>
>> /Marcus
>>
>> 2013-10-01  Marcus Shawcroft  
>>
>> * configure.ac (AC_CHECK_FUNCS_ONCE): Add for exit() then make
>> existing AC_CHECK_FUNCS_ONCE dependent on outcome.
> 
> Ping^2
> 
> /Marcus
> 

OK provided no Fortran maintainer objects within the next 24 hours.

R.

Re: [PATCH] Fix pr571518.c test case.

2013-10-15 Thread Jack Howarth

On Fri, Jul 05, 2013 at 09:25:52AM -0700, Mike Stump wrote:
> On Jul 4, 2013, at 9:17 AM, Marcus Shawcroft  wrote:
> >* gcc.dg/pr57518.c: Adjust scan-rtl-dump-not pattern.
> 
> [ If you want a review or need an approval, be sure to ask Ok?  Just in case 
> you forgot...  ]  Ok.
> 
> Thanks.

Looks like this fix should have been applied to the gcc-4_8-branch as well...

http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01077.html

Too late for gcc 4.8.2 I guess but after the branch reopens...

2013-07-04  Marcus Shawcroft  

* gcc.dg/pr57518.c: Adjust scan-rtl-dump-not pattern.
diff --git a/gcc/testsuite/gcc.dg/pr57518.c b/gcc/testsuite/gcc.dg/pr57518.c
index 4c84a85..47e819c 100644
--- a/gcc/testsuite/gcc.dg/pr57518.c
+++ b/gcc/testsuite/gcc.dg/pr57518.c
@@ -2,7 +2,7 @@
 
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-rtl-ira" } */
-/* { dg-final { scan-rtl-dump-not "REG_EQUIV.*mem.*\"ip\"" "ira" } } */
+/* { dg-final { scan-rtl-dump-not "REG_EQUIV\[^\n\]*mem\[^\n\]*\"ip\"" "ira" } 
} */
 
 char ip[10];
 int total;

should be applied there for gcc 4.8.3.
Jack

RE: [PATCH] Fix PR58143 and dups

2013-10-15 Thread Bernd Edlinger

Hi,

On Tue, 15 Oct 2013 15:57:16, Richard Biener wrote:
>
> This is an alternate fix (see
> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00234.html for the other
> one) for the various PRs that show that LIM exposes undefined
> signed overflow on paths where it wasn't executed before LIM
> ultimately leading to a miscompilation.
>
> For this fix we rewrite invariant stmts to use unsigned arithmetic
> when it is one of the operations that SCEV and niter analysis
> handles (thus not division or absolute). The other fix instead
> disables invariant motion for those expressions.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> The issue is also present on the 4.8 branch, so either patch
> should be backported in the end. I will try to benchmark
> both tomorrow (unless somebody beats me on that).
>
> Any preference or other suggestions?
>
> Thanks,
> Richard.
>

Thanks.

Your patch looks good to me.

Although this approach also has a price:
That is rewriting the statement from signed to unsigned arithmetic
may masquerade the possible undefined behaviour to the loop optimization.

Bernd.


> 2013-10-15 Richard Biener 
>
> PR tree-optimization/58143
> * tree-ssa-loop-im.c (arith_code_with_undefined_signed_overflow):
> New function.
> (rewrite_to_defined_overflow): Likewise.
> (move_computations_dom_walker::before_dom): Rewrite stmts
> with undefined signed overflow that are not always executed
> into unsigned arithmetic.
>
> * gcc.dg/torture/pr58143-1.c: New testcase.
> * gcc.dg/torture/pr58143-2.c: Likewise.
> * gcc.dg/torture/pr58143-3.c: Likewise.
>
> Index: gcc/tree-ssa-loop-im.c
> ===
> *** gcc/tree-ssa-loop-im.c (revision 203600)
> --- gcc/tree-ssa-loop-im.c (working copy)
> *** public:
> *** 1117,1122 
> --- 1117,1183 
> unsigned int todo_;
> };
>
> + /* Return true if CODE is an operation that when operating on signed
> + integer types involves undefined behavior on overflow and the
> + operation can be expressed with unsigned arithmetic. */
> +
> + static bool
> + arith_code_with_undefined_signed_overflow (tree_code code)
> + {
> + switch (code)
> + {
> + case PLUS_EXPR:
> + case MINUS_EXPR:
> + case MULT_EXPR:
> + case NEGATE_EXPR:
> + case POINTER_PLUS_EXPR:
> + return true;
> + default:
> + return false;
> + }
> + }
> +
> + /* Rewrite STMT, an assignment with a signed integer or pointer arithmetic
> + operation that can be transformed to unsigned arithmetic by converting
> + its operand, carrying out the operation in the corresponding unsigned
> + type and converting the result back to the original type.
> +
> + Returns a sequence of statements that replace STMT and also contain
> + a modified form of STMT itself. */
> +
> + static gimple_seq
> + rewrite_to_defined_overflow (gimple stmt)
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "rewriting stmt with undefined signed "
> + "overflow ");
> + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> + }
> +
> + tree lhs = gimple_assign_lhs (stmt);
> + tree type = unsigned_type_for (TREE_TYPE (lhs));
> + gimple_seq stmts = NULL;
> + for (unsigned i = 1; i < gimple_num_ops (stmt); ++i)
> + {
> + gimple_seq stmts2 = NULL;
> + gimple_set_op (stmt, i,
> + force_gimple_operand (fold_convert (type,
> + gimple_op (stmt, i)),
> + &stmts2, true, NULL_TREE));
> + gimple_seq_add_seq (&stmts, stmts2);
> + }
> + gimple_assign_set_lhs (stmt, make_ssa_name (type, stmt));
> + if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR)
> + gimple_assign_set_rhs_code (stmt, PLUS_EXPR);
> + gimple_seq_add_stmt (&stmts, stmt);
> + gimple cvt = gimple_build_assign_with_ops
> + (NOP_EXPR, lhs, gimple_assign_lhs (stmt), NULL_TREE);
> + gimple_seq_add_stmt (&stmts, cvt);
> +
> + return stmts;
> + }
> +
> /* Hoist the statements in basic block BB out of the loops prescribed by
> data stored in LIM_DATA structures associated with each statement. Callback
> for walk_dominator_tree. */
> *** move_computations_dom_walker::before_dom
> *** 1247,1253 
> }
> }
> gsi_remove (&bsi, false);
> ! gsi_insert_on_edge (e, stmt);
> }
> }
>
> --- 1308,1328 
> }
> }
> gsi_remove (&bsi, false);
> ! /* In case this is a stmt that is not unconditionally executed
> ! when the target loop header is executed and the stmt may
> ! invoke undefined integer or pointer overflow rewrite it to
> ! unsigned arithmetic. */
> ! if (is_gimple_assign (stmt)
> ! && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
> ! && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (gimple_assign_lhs (stmt)))
> ! && arith_code_with_undefined_signed_overflow
> ! (gimple_assign_rhs_code (stmt))
> ! && (!ALWAYS_EXECUTED_IN (bb)
> ! || !(ALWAYS_EXECUTED_IN (bb) == level
> ! || flow_loop_nested_p (ALWAYS_EXECUTED_IN (bb), level
> ! gsi_insert_seq_on_edge (e, rewrite_to_defined_overflow (stmt));
> ! else
> ! gsi_insert_on_edge (e, stmt);
> }
> }
>
> Index: gcc/testsuite/gcc.dg/torture/pr58143

Re: [PATCH i386 3/8] [AVX512] [18/n] Add AVX-512 patterns: various RCPs and SQRTs.

2013-10-15 Thread Kirill Yukhin

Hello,
On 14 Oct 13:10, Richard Henderson wrote:
> On 10/09/2013 03:31 AM, Kirill Yukhin wrote:
> > +(define_mode_attr ssefixupmode
> > +  [(V16SF "V16SI") (V4SF "V4SI") (V8DF "V8DI") (V2DF "V2DI")])
> > +
> 
> Oh, I forgot.  How is this different from sseintvecmode?
It is definetely a bug. Ok with corresponding replacement and removal
of redundant mode attr?

--
Thanks,  K

[PATCH] Fix PR58143 and dups

2013-10-15 Thread Richard Biener


This is an alternate fix (see 
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00234.html for the other
one) for the various PRs that show that LIM exposes undefined
signed overflow on paths where it wasn't executed before LIM
ultimately leading to a miscompilation.

For this fix we rewrite invariant stmts to use unsigned arithmetic
when it is one of the operations that SCEV and niter analysis
handles (thus not division or absolute).  The other fix instead
disables invariant motion for those expressions.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

The issue is also present on the 4.8 branch, so either patch
should be backported in the end.  I will try to benchmark
both tomorrow (unless somebody beats me on that).

Any preference or other suggestions?

Thanks,
Richard.

2013-10-15  Richard Biener  

PR tree-optimization/58143
* tree-ssa-loop-im.c (arith_code_with_undefined_signed_overflow):
New function.
(rewrite_to_defined_overflow): Likewise.
(move_computations_dom_walker::before_dom): Rewrite stmts
with undefined signed overflow that are not always executed
into unsigned arithmetic.

* gcc.dg/torture/pr58143-1.c: New testcase.
* gcc.dg/torture/pr58143-2.c: Likewise.
* gcc.dg/torture/pr58143-3.c: Likewise.

Index: gcc/tree-ssa-loop-im.c
===
*** gcc/tree-ssa-loop-im.c  (revision 203600)
--- gcc/tree-ssa-loop-im.c  (working copy)
*** public:
*** 1117,1122 
--- 1117,1183 
unsigned int todo_;
  };
  
+ /* Return true if CODE is an operation that when operating on signed
+integer types involves undefined behavior on overflow and the
+operation can be expressed with unsigned arithmetic.  */
+ 
+ static bool
+ arith_code_with_undefined_signed_overflow (tree_code code)
+ {
+   switch (code)
+ {
+ case PLUS_EXPR:
+ case MINUS_EXPR:
+ case MULT_EXPR:
+ case NEGATE_EXPR:
+ case POINTER_PLUS_EXPR:
+   return true;
+ default:
+   return false;
+ }
+ }
+ 
+ /* Rewrite STMT, an assignment with a signed integer or pointer arithmetic
+operation that can be transformed to unsigned arithmetic by converting
+its operand, carrying out the operation in the corresponding unsigned
+type and converting the result back to the original type.
+ 
+Returns a sequence of statements that replace STMT and also contain
+a modified form of STMT itself.  */
+ 
+ static gimple_seq
+ rewrite_to_defined_overflow (gimple stmt)
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file, "rewriting stmt with undefined signed "
+  "overflow ");
+   print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+ }
+ 
+   tree lhs = gimple_assign_lhs (stmt);
+   tree type = unsigned_type_for (TREE_TYPE (lhs));
+   gimple_seq stmts = NULL;
+   for (unsigned i = 1; i < gimple_num_ops (stmt); ++i)
+ {
+   gimple_seq stmts2 = NULL;
+   gimple_set_op (stmt, i,
+force_gimple_operand (fold_convert (type,
+gimple_op (stmt, i)),
+  &stmts2, true, NULL_TREE));
+   gimple_seq_add_seq (&stmts, stmts2);
+ }
+   gimple_assign_set_lhs (stmt, make_ssa_name (type, stmt));
+   if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR)
+ gimple_assign_set_rhs_code (stmt, PLUS_EXPR);
+   gimple_seq_add_stmt (&stmts, stmt);
+   gimple cvt = gimple_build_assign_with_ops
+   (NOP_EXPR, lhs, gimple_assign_lhs (stmt), NULL_TREE);
+   gimple_seq_add_stmt (&stmts, cvt);
+ 
+   return stmts;
+ }
+ 
  /* Hoist the statements in basic block BB out of the loops prescribed by
 data stored in LIM_DATA structures associated with each statement.  
Callback
 for walk_dominator_tree.  */
*** move_computations_dom_walker::before_dom
*** 1247,1253 
}
}
gsi_remove (&bsi, false);
!   gsi_insert_on_edge (e, stmt);
  }
  }
  
--- 1308,1328 
}
}
gsi_remove (&bsi, false);
!   /* In case this is a stmt that is not unconditionally executed
!  when the target loop header is executed and the stmt may
!invoke undefined integer or pointer overflow rewrite it to
!unsigned arithmetic.  */
!   if (is_gimple_assign (stmt)
! && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
! && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (gimple_assign_lhs (stmt)))
! && arith_code_with_undefined_signed_overflow
!  (gimple_assign_rhs_code (stmt))
! && (!ALWAYS_EXECUTED_IN (bb)
! || !(ALWAYS_EXECUTED_IN (bb) == level
!  || flow_loop_nested_p (ALWAYS_EXECUTED_IN (bb), level
!   gsi_insert_seq_on_edge (e, rewrite_to_defined_overflow (stmt));
!   else
!   gsi_insert_on_edge (e, stmt);
  }

Re: [PATCH i386 3/8] [AVX512] [16/n] Add AVX-512 patterns: VI48_512 and VI4F_128 iterators.

2013-10-15 Thread Kirill Yukhin

Hello,
On 11 Oct 11:21, Richard Henderson wrote:
> On 10/09/2013 03:30 AM, Kirill Yukhin wrote:
> > +;; Return true if OP is either -1 constant or stored in register.
> > +(define_predicate "register_or_constm1_operand"
> > +  (ior (match_operand 0 "register_operand")
> > +   (match_test "op == constm1_rtx")))
> 
> This won't do the right thing, because you're not exposing
> that const_int is a valid input.  You need a match_code too.
Thanks, fixed and checked in.

--
Thanks, K

Re: [PATCH i386 3/8] [AVX512] [15/n] Add AVX-512 patterns: VI48F_512 iterator.

2013-10-15 Thread Kirill Yukhin

Hello,
On 11 Oct 10:30, Richard Henderson wrote:
> On 10/09/2013 03:29 AM, Kirill Yukhin wrote:
> > +(define_insn "avx512f_vec_dup_mem"
> > +  [(set (match_operand:VI48F_512 0 "register_operand" "=x")
> > +   (vec_duplicate:VI48F_512
> > + (match_operand: 1 "nonimmediate_operand" "xm")))]
> > +  "TARGET_AVX512F"
> > +  "vbroadcast\t{%1, %0|%0, %1}"
> > +  [(set_attr "type" "ssemov")
> > +   (set_attr "prefix" "evex")
> > +   (set_attr "mode" "")])
> 
> Ought these be 'v' not 'x'?
Good catch! Fixed and checked in.

--
Thanks, K

[PATCH] Fix profile count updates during tail merging

2013-10-15 Thread Teresa Johnson

This patch fixes a profile count insanity introduced by ssa tail
merging. When replacing bb1 with bb2, which has the same successors,
the bb counts were being merged, but the successor edge weights
were not.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa

2013-10-15  Teresa Johnson  

* tree-ssa-tail-merge.c (replace_block_by): Update edge
weights during merging.

Index: tree-ssa-tail-merge.c
===
--- tree-ssa-tail-merge.c   (revision 203389)
+++ tree-ssa-tail-merge.c   (working copy)
@@ -1462,6 +1462,8 @@ static void
 replace_block_by (basic_block bb1, basic_block bb2)
 {
   edge pred_edge;
+  edge e1;
+  edge_iterator ei;
   unsigned int i;
   gimple bb2_phi;

@@ -1488,6 +1490,15 @@ replace_block_by (basic_block bb1, basic_block bb2
   pred_edge, UNKNOWN_LOCATION);
 }

+  /* Merge the outgoing edge counts from bb1 onto bb2.  */
+  FOR_EACH_EDGE (e1, ei, bb1->succs)
+{
+  edge e2;
+  e2 = find_edge (bb2, e1->dest);
+  gcc_assert (e2);
+  e2->count += e1->count;
+}
+
   bb2->frequency += bb1->frequency;
   if (bb2->frequency > BB_FREQ_MAX)
 bb2->frequency = BB_FREQ_MAX;


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH] Relax the requirement of reduction pattern in GCC vectorizer.

2013-10-15 Thread Richard Biener

On Sat, Sep 28, 2013 at 3:28 AM, Cong Hou  wrote:
> The current GCC vectorizer requires the following pattern as a simple
> reduction computation:
>
>loop_header:
>  a1 = phi < a0, a2 >
>  a3 = ...
>  a2 = operation (a3, a1)
>
> But a3 can also be defined outside of the loop. For example, the
> following loop can benefit from vectorization but the GCC vectorizer
> fails to vectorize it:
>
>
> int foo(int v)
> {
>   int s = 1;
>   ++v;
>   for (int i = 0; i < 10; ++i)
> s *= v;
>   return s;
> }
>
>
> This patch relaxes the original requirement by also considering the
> following pattern:
>
>
>a3 = ...
>loop_header:
>  a1 = phi < a0, a2 >
>  a2 = operation (a3, a1)
>
>
> A test case is also added. The patch is tested on x86-64.
>
>
> thanks,
> Cong
>
> 
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 39c786e..45c1667 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2013-09-27  Cong Hou  
> +
> + * tree-vect-loop.c: Relax the requirement of the reduction

ChangeLog format is

* tree-vect-loop.c (vect_is_simple_reduction_1): Relax the
requirement of the reduction.

Ok with that change.

Thanks,
Richard.

> + pattern so that one operand of the reduction operation can
> + come from outside of the loop.
> +
>  2013-09-25  Tom Tromey  
>
>   * Makefile.in (PARTITION_H, LTO_SYMTAB_H, COMMON_TARGET_DEF_H)
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 09644d2..90496a2 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,7 @@
> +2013-09-27  Cong Hou  
> +
> + * gcc.dg/vect/vect-reduc-pattern-3.c: New test.
> +
>  2013-09-25  Marek Polacek  
>
>   PR sanitizer/58413
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 2871ba1..3c51c3b 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2091,6 +2091,13 @@ vect_is_slp_reduction (loop_vec_info loop_info,
> gimple phi, gimple first_stmt)
>   a3 = ...
>   a2 = operation (a3, a1)
>
> +   or
> +
> +   a3 = ...
> +   loop_header:
> + a1 = phi < a0, a2 >
> + a2 = operation (a3, a1)
> +
> such that:
> 1. operation is commutative and associative and it is safe to
>change the order of the computation (if CHECK_REDUCTION is true)
> @@ -2451,6 +2458,7 @@ vect_is_simple_reduction_1 (loop_vec_info
> loop_info, gimple phi,
>if (def2 && def2 == phi
>&& (code == COND_EXPR
>|| !def1 || gimple_nop_p (def1)
> +  || !flow_bb_inside_loop_p (loop, gimple_bb (def1))
>|| (def1 && flow_bb_inside_loop_p (loop, gimple_bb (def1))
>&& (is_gimple_assign (def1)
>|| is_gimple_call (def1)
> @@ -2469,6 +2477,7 @@ vect_is_simple_reduction_1 (loop_vec_info
> loop_info, gimple phi,
>if (def1 && def1 == phi
>&& (code == COND_EXPR
>|| !def2 || gimple_nop_p (def2)
> +  || !flow_bb_inside_loop_p (loop, gimple_bb (def2))
>|| (def2 && flow_bb_inside_loop_p (loop, gimple_bb (def2))
>&& (is_gimple_assign (def2)
>|| is_gimple_call (def2)
> diff --git gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-3.c
> gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-3.c
> new file mode 100644
> index 000..06a9416
> --- /dev/null
> +++ gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-3.c
> @@ -0,0 +1,41 @@
> +/* { dg-require-effective-target vect_int } */
> +
> +#include 
> +#include "tree-vect.h"
> +
> +#define N 10
> +#define RES 1024
> +
> +/* A reduction pattern in which there is no data ref in
> +   the loop and one operand is defined outside of the loop.  */
> +
> +__attribute__ ((noinline)) int
> +foo (int v)
> +{
> +  int i;
> +  int result = 1;
> +
> +  ++v;
> +  for (i = 0; i < N; i++)
> +result *= v;
> +
> +  return result;
> +}
> +
> +int
> +main (void)
> +{
> +  int res;
> +
> +  check_vect ();
> +
> +  res = foo (1);
> +  if (res != RES)
> +abort ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> +

Re: [PATCH] tree_code_name wrapper

2013-10-15 Thread Richard Biener

On Tue, Oct 15, 2013 at 3:19 PM, Paulo Matos  wrote:
>> -Original Message-
>> From: Jakub Jelinek [mailto:ja...@redhat.com]
>> Sent: 15 October 2013 10:51
>> To: Paulo Matos
>> Cc: Richard Biener; Paolo Carlini; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] tree_code_name wrapper
>>
>> On Tue, Oct 15, 2013 at 09:42:17AM +, Paulo Matos wrote:
>> > Thanks, regarding the indentation I was convinced we used 2 space
>> > indentation with maximum line length of 80 characters.
>>
>> We do, however 8 consecutive spaces in the indentation should be always
>> replaced by a tab.
>>
>
> This means that every sequence of 8 spaces should be converted into tabs? So, 
> if we indent something 4 times, that becomes a tab instead of 4 times 2 
> spaces.

Correct.

Richard.

> Just to confirm so I can setup my emacs to use this.
>
> Thanks,
> --
> Paulo Matos
>

Re: [PATCH, i386, MPX, 1/X] Support of Intel MPX ISA. 1/2 Bound type and modes

2013-10-15 Thread Ilya Enkovich

Hey guys,

could please someone look at this small patch? It blocks approved MPX
ISA support on i386 target.

Thanks,
Ilya

2013/10/2 Ilya Enkovich :
> Ping
>
> 2013/9/17 Ilya Enkovich :
>> Hi,
>>
>> Here is a patch introducing new type and mode for bounds. It is a part of 
>> MPX ISA support patch 
>> (http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01094.html).
>>
>> Bootstrapped and tested on linux-x86_64. Is it OK for trunk?
>>
>> Thanks,
>> Ilya
>> --
>>
>> gcc/
>>
>> 2013-09-16  Ilya Enkovich  
>>
>> * mode-classes.def (MODE_BOUND): New.
>> * tree.def (BOUND_TYPE): New.
>> * genmodes.c (complete_mode): Support MODE_BOUND.
>> (BOUND_MODE): New.
>> (make_bound_mode): New.
>> * machmode.h (BOUND_MODE_P): New.
>> * stor-layout.c (int_mode_for_mode): Support MODE_BOUND.
>> (layout_type): Support BOUND_TYPE.
>> * tree-pretty-print.c (dump_generic_node): Support BOUND_TYPE.
>> * tree.c (build_int_cst_wide): Support BOUND_TYPE.
>> (type_contains_placeholder_1): Likewise.
>> * tree.h (BOUND_TYPE_P): New.
>> * varasm.c (output_constant): Support BOUND_TYPE.
>> * doc/rtl.texi (MODE_BOUND): New.
>>
>> diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
>> index 1d62223..02b1214 100644
>> --- a/gcc/doc/rtl.texi
>> +++ b/gcc/doc/rtl.texi
>> @@ -1382,6 +1382,10 @@ any @code{CC_MODE} modes listed in the 
>> @file{@var{machine}-modes.def}.
>>  @xref{Jump Patterns},
>>  also see @ref{Condition Code}.
>>
>> +@findex MODE_BOUND
>> +@item MODE_BOUND
>> +Bound modes class.  Used to represent values of pointer bounds.
>> +
>>  @findex MODE_RANDOM
>>  @item MODE_RANDOM
>>  This is a catchall mode class for modes which don't fit into the above
>> diff --git a/gcc/genmodes.c b/gcc/genmodes.c
>> index dc38483..89174ec 100644
>> --- a/gcc/genmodes.c
>> +++ b/gcc/genmodes.c
>> @@ -333,6 +333,7 @@ complete_mode (struct mode_data *m)
>>break;
>>
>>  case MODE_INT:
>> +case MODE_BOUND:
>>  case MODE_FLOAT:
>>  case MODE_DECIMAL_FLOAT:
>>  case MODE_FRACT:
>> @@ -533,6 +534,18 @@ make_special_mode (enum mode_class cl, const char *name,
>>new_mode (cl, name, file, line);
>>  }
>>
>> +#define BOUND_MODE(N, Y) make_bound_mode (#N, Y, __FILE__, __LINE__)
>> +
>> +static void ATTRIBUTE_UNUSED
>> +make_bound_mode (const char *name,
>> +   unsigned int bytesize,
>> +   const char *file, unsigned int line)
>> +{
>> +  struct mode_data *m = new_mode (MODE_BOUND, name, file, line);
>> +  m->bytesize = bytesize;
>> +}
>> +
>> +
>>  #define INT_MODE(N, Y) FRACTIONAL_INT_MODE (N, -1U, Y)
>>  #define FRACTIONAL_INT_MODE(N, B, Y) \
>>make_int_mode (#N, B, Y, __FILE__, __LINE__)
>> diff --git a/gcc/machmode.h b/gcc/machmode.h
>> index 981ee92..d4a20b2 100644
>> --- a/gcc/machmode.h
>> +++ b/gcc/machmode.h
>> @@ -174,6 +174,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
>> || CLASS == MODE_ACCUM  \
>> || CLASS == MODE_UACCUM)
>>
>> +#define BOUND_MODE_P(MODE)  \
>> +  (GET_MODE_CLASS (MODE) == MODE_BOUND)
>> +
>>  /* Get the size in bytes and bits of an object of mode MODE.  */
>>
>>  extern CONST_MODE_SIZE unsigned char mode_size[NUM_MACHINE_MODES];
>> diff --git a/gcc/mode-classes.def b/gcc/mode-classes.def
>> index 7207ef7..c5ea215 100644
>> --- a/gcc/mode-classes.def
>> +++ b/gcc/mode-classes.def
>> @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
>>DEF_MODE_CLASS (MODE_RANDOM),/* other */  
>>   \
>>DEF_MODE_CLASS (MODE_CC),/* condition code in a register */ \
>>DEF_MODE_CLASS (MODE_INT),   /* integer */  \
>> +  DEF_MODE_CLASS (MODE_BOUND),/* bounds */ \
>>DEF_MODE_CLASS (MODE_PARTIAL_INT),   /* integer with padding bits */\
>>DEF_MODE_CLASS (MODE_FRACT), /* signed fractional number */ \
>>DEF_MODE_CLASS (MODE_UFRACT),/* unsigned fractional 
>> number */   \
>> diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
>> index 6f6b310..82611c7 100644
>> --- a/gcc/stor-layout.c
>> +++ b/gcc/stor-layout.c
>> @@ -383,6 +383,7 @@ int_mode_for_mode (enum machine_mode mode)
>>  case MODE_VECTOR_ACCUM:
>>  case MODE_VECTOR_UFRACT:
>>  case MODE_VECTOR_UACCUM:
>> +case MODE_BOUND:
>>mode = mode_for_size (GET_MODE_BITSIZE (mode), MODE_INT, 0);
>>break;
>>
>> @@ -2135,6 +2136,13 @@ layout_type (tree type)
>>SET_TYPE_MODE (type, VOIDmode);
>>break;
>>
>> +case BOUND_TYPE:
>> +  SET_TYPE_MODE (type,
>> + mode_for_size (TYPE_PRECISION (type), MODE_BOUND, 0));
>> +  TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)));
>> +  TYPE_SIZE_UNIT (type) = size_int (GET_MODE_SIZE (TYPE_MODE (type)));
>> +  break;
>> +
>>  case OFFSET_TYPE:
>>TYPE_SIZE (type) = bitsize

Re: [PING^2][PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2013-10-15 Thread Richard Biener

On Tue, 15 Oct 2013, Kugan wrote:

> Hi Eric,
> 
> Can you please help to review this patch?
> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00452.html

I think that gimple_assign_is_zero_sign_ext_redundant and its
description is somewhat confused.  You seem to have two cases
here, one being NOP_EXPR or CONVERT_EXPR that are in the IL
because they are required for type correctness.  So,

   long = (long) int

which usually expands to

   (set:DI (sext:DI reg:SI))

you want to expand as

   (set:DI (subreg:DI reg:SI))

where you have to be careful for modes smaller than word_mode.
You don't seem to implement this optimization though (but
the gimple_assign_is_zero_sign_ext_redundant talks about it).

Second are promotions required by the target (PROMOTE_MODE)
that do arithmetic on wider registers like for

  char = char + char

where they cannot do

  (set:QI (plus:QI reg:QI reg:QI))

but instead have, for example reg:SI only and you get

  (set:SI (plus:SI reg:SI reg:SI))
  (set:SI ([sz]ext:SI (subreg:QI reg:SI)))

that you try to address with the cfgexpand hunk but I believe
it doesn't work the way you do it.  That is because on GIMPLE
you do not see SImode temporaries and thus no SImode value-ranges.
Consider

  tem = (char) 255 + (char) 1;

which has a value-range of [0,0] but clearly when computed in
SImode the value-range is [256, 256].  That is, VRP computes
value-ranges in the expression type, not in some arbitrary
larger type.

So what you'd have to do is take the value-ranges of the
two operands of the plus and see whether the plus can overflow
QImode when computed in SImode (for the example).

[exposing the effect of PROMOTE_MODE earlier than at RTL expansion
time may make this less awkward]

Thanks,
Richard.

> Thanks,
> Kugan
> 
> > +2013-09-25  Kugan Vivekanandarajah  
> > +
> > +   * dojump.c (do_compare_and_jump): Generate rtl without
> > +   zero/sign extension if redundant.
> > +   * cfgexpand.c (expand_gimple_stmt_1): Likewise.
> > +   * gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New
> > +   function.
> > +   * gimple.h (gimple_assign_is_zero_sign_ext_redundant) : Declare.
> > +
> > 
> >

Re: [PATCH v4 04/20] add configury

2013-10-15 Thread Tom Tromey

Gerald>  - When I do a gmake at the top level of the build tree, nothing is 
Gerald>rebuilt at all.  This only happens during `gmake install`
[...]
Gerald> Is nobody else seeing this?  Or is everyone just lucky enough to have
Gerald> a recent version of GCC as the default compiler?

How exactly are you configuring and invoking "make"?
I will try to reproduce it.

Tom

RE: [PATCH] tree_code_name wrapper

2013-10-15 Thread Paulo Matos

> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: 15 October 2013 10:51
> To: Paulo Matos
> Cc: Richard Biener; Paolo Carlini; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] tree_code_name wrapper
> 
> On Tue, Oct 15, 2013 at 09:42:17AM +, Paulo Matos wrote:
> > Thanks, regarding the indentation I was convinced we used 2 space
> > indentation with maximum line length of 80 characters.
> 
> We do, however 8 consecutive spaces in the indentation should be always
> replaced by a tab.
> 

This means that every sequence of 8 spaces should be converted into tabs? So, 
if we indent something 4 times, that becomes a tab instead of 4 times 2 spaces. 

Just to confirm so I can setup my emacs to use this.

Thanks,
-- 
Paulo Matos

Re: [PATCH][ubsan] Add VLA bound instrumentation

2013-10-15 Thread Marek Polacek

Ping^2.  Jason, Joseph, are you fine with the C++/C FE changes?

Thanks.

On Mon, Oct 07, 2013 at 10:17:38PM +0200, Marek Polacek wrote:
> Ping.
> 
> On Wed, Sep 25, 2013 at 02:41:32PM +0200, Marek Polacek wrote:
> > On Thu, Sep 12, 2013 at 02:26:55PM +0200, Marek Polacek wrote:
> > > This patch adds the instrumentation of VLA bounds.  Basically, it just 
> > > checks that
> > > the size of a VLA is positive.  I.e., We also issue an error if the size 
> > > of the
> > > VLA is 0.  It catches e.g.
> > > 
> > > int i = 1;
> > > int a[i][i - 2];
> > > 
> > > It is pretty straightforward, but I had
> > > issues in the C++ FE, mainly choosing the right spot where to 
> > > instrument...
> > > Hopefully I picked up the right one.  Also note that in C++1y we throw
> > > an exception when the size of a VLA is negative; hence no need to perform
> > > the instrumentation if -std=c++1y is in effect.
> > > 
> > > Regtested/ran bootstrap-ubsan on x86_64-linux, also
> > > make check -C gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} 
> > > ubsan.exp'
> > > passes.
> > > 
> > > Ok for trunk?
> > 
> > I'd like to ping this patch; below is rebased version with the ubsan.c
> > hunk omitted, since that part was already fixed by another patch.
> > 
> > (It still doesn't contain alloca/SIZE_MAX/... checking, since that
> > very much relies on libubsan.  Still, it'd be felicitous to get at
> > least the basic VLA checking in.)
> > 
> > Ran ubsan testsuite + bootstrap-ubsan on x86_64-linux.
> > 
> > 2013-09-25  Marek Polacek  
> > 
> > * opts.c (common_handle_option): Handle vla-bound.
> > * sanitizer.def (BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE):
> > Define.
> > * flag-types.h (enum sanitize_code): Add SANITIZE_VLA.
> > * asan.c (initialize_sanitizer_builtins): Build BT_FN_VOID_PTR_PTR.
> > c-family/
> > * c-ubsan.c: Don't include hash-table.h.
> > (ubsan_instrument_vla): New function.
> > * c-ubsan.h: Declare it.
> > cp/
> > * decl.c (create_array_type_for_decl): Add VLA instrumentation.
> > c/
> > * c-decl.c (grokdeclarator): Add VLA instrumentation.
> > testsuite/
> > * g++.dg/ubsan/cxx1y-vla.C: New test.
> > * c-c++-common/ubsan/vla-3.c: New test.
> > * c-c++-common/ubsan/vla-2.c: New test.
> > * c-c++-common/ubsan/vla-4.c: New test.
> > * c-c++-common/ubsan/vla-1.c: New test.
> > 
> > --- gcc/opts.c.mp   2013-09-25 14:06:58.531276511 +0200
> > +++ gcc/opts.c  2013-09-25 14:07:03.580294566 +0200
> > @@ -1428,6 +1428,7 @@ common_handle_option (struct gcc_options
> >   { "undefined", SANITIZE_UNDEFINED, sizeof "undefined" - 1 },
> >   { "unreachable", SANITIZE_UNREACHABLE,
> > sizeof "unreachable" - 1 },
> > + { "vla-bound", SANITIZE_VLA, sizeof "vla-bound" - 1 },
> >   { NULL, 0, 0 }
> > };
> > const char *comma;
> > --- gcc/c-family/c-ubsan.c.mp   2013-09-25 14:06:58.535276527 +0200
> > +++ gcc/c-family/c-ubsan.c  2013-09-25 14:07:03.580294566 +0200
> > @@ -25,7 +25,6 @@ along with GCC; see the file COPYING3.
> >  #include "alloc-pool.h"
> >  #include "cgraph.h"
> >  #include "gimple.h"
> > -#include "hash-table.h"
> >  #include "output.h"
> >  #include "toplev.h"
> >  #include "ubsan.h"
> > @@ -86,8 +85,7 @@ ubsan_instrument_division (location_t lo
> >return t;
> >  }
> >  
> > -/* Instrument left and right shifts.  If not instrumenting, return
> > -   NULL_TREE.  */
> > +/* Instrument left and right shifts.  */
> >  
> >  tree
> >  ubsan_instrument_shift (location_t loc, enum tree_code code,
> > @@ -157,4 +155,23 @@ ubsan_instrument_shift (location_t loc,
> >t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
> >  
> >return t;
> > +}
> > +
> > +/* Instrument variable length array bound.  */
> > +
> > +tree
> > +ubsan_instrument_vla (location_t loc, tree size)
> > +{
> > +  tree type = TREE_TYPE (size);
> > +  tree t, tt;
> > +
> > +  t = fold_build2 (LE_EXPR, boolean_type_node, size, build_int_cst (type, 
> > 0));
> > +  tree data = ubsan_create_data ("__ubsan_vla_data",
> > +loc, ubsan_type_descriptor (type), NULL_TREE);
> > +  data = build_fold_addr_expr_loc (loc, data);
> > +  tt = builtin_decl_explicit 
> > (BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE);
> > +  tt = build_call_expr_loc (loc, tt, 2, data, ubsan_encode_value (size));
> > +  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
> > +
> > +  return t;
> >  }
> > --- gcc/c-family/c-ubsan.h.mp   2013-09-25 14:06:58.538276539 +0200
> > +++ gcc/c-family/c-ubsan.h  2013-09-25 14:07:03.595294628 +0200
> > @@ -23,5 +23,6 @@ along with GCC; see the file COPYING3.
> >  
> >  extern tree ubsan_instrument_division (location_t, tree, tree);
> >  extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, 
> > tree);
> > +extern tree ubsan_instrument_vla (location_t, tree);
> >  
> >  #endif  /* GCC_C_UBSAN_H  */
> > --- gcc/sanitizer.def.mp2013-0

[RFC] By default if-convert only basic blocks that will be vectorized

2013-10-15 Thread Jakub Jelinek

Hi!

Especially on i?86/x86_64 if-conversion pass seems to be often
a pessimization, but the vectorization relies on it and without it we can't
vectorize a lot of the loops.

Here is a prototype of a patch that will by default (unless explicit
-ftree-loop-if-convert) only if-convert loops internally for vectorization,
so the COND_EXPRs actually only appear as VEC_COND_EXPRs in the vectorized
basic blocks, but will not appear if vectorization fails, or in the
scalar loop if vectorization is conditional, or in the prologue or epilogue
loops around the vectorized loop.

Instead of moving the ifcvt pass inside of the vectorizer, this patch
during ifcvt performs loop versioning depending on a special internal
call, only if the internal call returns true we go to the if-converted
original loop, otherwise the non-if-converted copy of the original loop
is performed.  And the vectorizer is taught to fold this internal call
into true resp. false depending on if the loop was vectorized or not, and
vectorizer loop versioning, peeling for alignment and for bound are adjusted
to also copy from the non-if-converted loop rather than if-converted one.

Besides fixing the various PRs where if-conversion pessimizes code I'd like
to also move forward with this with conditional loads and stores,
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html
where the if-unconversion approach looked like a failure.

This patch doesn't yet handle if-converted inner loop in outer loop
vectorization, something on my todo list (so several vect-cond-*.c tests
FAIL because they are no longer vectorized) plus I had to change two
SLP vectorization tests that silently relied on loop if-conversion being
performed to actually optimize the basic block (if the same thing didn't
appear in a loop, it wouldn't be optimized at all).

On the newly added testcase on x86_64, there are before this patch
18 scalar conditional moves, with the patch just 2 (both in the checking
routine).

Comments?

--- gcc/internal-fn.def.jj  2013-10-11 14:32:57.079909782 +0200
+++ gcc/internal-fn.def 2013-10-11 17:23:58.705526840 +0200
@@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST
 DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
 DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
 DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
+DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
--- gcc/tree-vect-loop-manip.c.jj   2013-09-30 22:13:47.0 +0200
+++ gcc/tree-vect-loop-manip.c  2013-10-15 12:57:54.854970913 +0200
@@ -374,24 +374,31 @@ LOOP->  loop1
 
 static void
 slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop,
+   struct loop *scalar_loop,
 bool is_new_loop, basic_block *new_exit_bb)
 {
-  gimple orig_phi, new_phi;
+  gimple orig_phi, new_phi, scalar_phi = NULL;
   gimple update_phi, update_phi2;
   tree guard_arg, loop_arg;
   basic_block new_merge_bb = guard_edge->dest;
   edge e = EDGE_SUCC (new_merge_bb, 0);
   basic_block update_bb = e->dest;
   basic_block orig_bb = loop->header;
-  edge new_exit_e;
+  edge new_exit_e, scalar_e = NULL;
   tree current_new_name;
-  gimple_stmt_iterator gsi_orig, gsi_update;
+  gimple_stmt_iterator gsi_orig, gsi_update, gsi_scalar = gsi_none ();
 
   /* Create new bb between loop and new_merge_bb.  */
   *new_exit_bb = split_edge (single_exit (loop));
 
   new_exit_e = EDGE_SUCC (*new_exit_bb, 0);
 
+  if (scalar_loop != NULL && !is_new_loop)
+{
+  gsi_scalar = gsi_start_phis (scalar_loop->header);
+  scalar_e = EDGE_SUCC (scalar_loop->latch, 0);
+}
+
   for (gsi_orig = gsi_start_phis (orig_bb),
gsi_update = gsi_start_phis (update_bb);
!gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
@@ -401,6 +408,11 @@ slpeel_update_phi_nodes_for_guard1 (edge
   tree new_res;
   orig_phi = gsi_stmt (gsi_orig);
   update_phi = gsi_stmt (gsi_update);
+  if (scalar_e != NULL)
+   {
+ scalar_phi = gsi_stmt (gsi_scalar);
+ gsi_next (&gsi_scalar);
+   }
 
   /** 1. Handle new-merge-point phis  **/
 
@@ -460,7 +472,13 @@ slpeel_update_phi_nodes_for_guard1 (edge
 current_new_name = loop_arg;
   else
 {
-  current_new_name = get_current_def (loop_arg);
+ if (scalar_e)
+   {
+ current_new_name = PHI_ARG_DEF_FROM_EDGE (scalar_phi, scalar_e);
+ current_new_name = get_current_def (current_new_name);
+   }
+ else
+   current_new_name = get_current_def (loop_arg);
  /* current_def is not available only if the variable does not
 change inside the loop, in which case we also don't care
 about recording a current_def for it because we won't be
@@ -503,6 +521,7 @@ LOOP->  loop2
 
 static void
 slpeel_update_phi_nodes_for_guard2 (edge guard_edge, struct loop *loop,
+

Re: patch to canonize unsigned tree-csts

2013-10-15 Thread Kenneth Zadeck


i added the assertion that richard requested and tested this on x86-64.

committed as revision 203602.


On 10/06/2013 05:13 AM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

On 10/04/2013 01:00 PM, Richard Sandiford wrote:

I was hoping Richard would weigh in here.  In case not...

Kenneth Zadeck  writes:

I was thinking that we should always be able to use the constant as-is
for max_wide_int-based and addr_wide_int-based operations.  The small_prec

again, you can get edge cased to death here.i think it would work
for max because that really is bigger than anything else, but it is
possible (though unlikely) to have something big converted to an address
by truncation.

But I'd have expected that conversion to be represented by an explicit
CONVERT_EXPR or NOP_EXPR.  It seems wrong to use addr_wide_int directly on
something that isn't bit- or byte-address-sized.  It'd be the C equivalent
of int + long -> int rather than the expected int + long -> long.

Same goes for wide_int.  If we're doing arithmetic at a specific
precision, it seems odd for one of the inputs to be wider and yet
not have an explicit truncation.

you miss the second reason why we needed addr-wide-int.   A large amount
of the places where the addressing arithmetic is done are not "type
safe".Only the gimple and rtl that is translated from the source
code is really type safe. In passes like the strength reduction
where they just "grab things from all over", the addr-wide-int or the
max-wide-int are safe haven structures that are guaranteed to be large
enough to not matter.So what i fear here is something like a very
wide loop counter being grabbed into some address calculation.

It still seems really dangerous to be implicitly truncating a wider type
to addr_wide_int.  It's not something we'd ever do in mainline, because
uses of addr_wide_int are replacing uses of double_int, and double_int
is our current maximum-width representation.

Using addr_wide_int rather than max_wide_int is an optimisation.
IMO part of implementating that optimisation should be to introduce
explicit truncations whenever we try to use addr_wide_int to operate
on inputs than might be wider than addr_wide_int.

So I still think the assert is the way to go.

addr_wide_int is not as much of an optimization as it is documentation
of what you are doing - i.e. this is addressing arithmetic.  My
justification for putting it in was that we wanted a sort of an abstract
type to say that this was not just user math, it was addressing
arithmetic and that the ultimate result is going to be slammed into a
target pointer.

I was only using that as an example to try to indicate that I did not
think that it was wrong if someone did truncate.   In particular, would
you want the assert to be that the value was truncated or that the type
of the value would allow numbers that would be truncated?   I actually
think neither.

I'm arguing for:

 gcc_assert (precision >= xprecision);

in wi::int_traits ::decompose.

IIRC one of the reasons for wanting addr_wide_int rather than wide_int
was that we wanted a type that could handle both bit and byte sizes.
And we wanted to be able to convert between bits and bytes seamlessly.
That means that shifting is a valid operation for addr_wide_int.  But if
we also implicitly (and that's the key word) used addr_wide_int
directly on tree constants that are wider than addr_wide_int, and say
shifted the result right, the answer would be different from what you'd
get if you did the shift in max_wide_int.  That seems like new behaviour,
since all address arithmetic is effectively done to maximum precision on
mainline.  It's just that the maximum on mainline is rather small.

If code is looking through casts to see wider-than-addr_wide_int types,
I think it's reasonable for that code to have to explicitly force the
tree to addr_wide_int size, via addr_wide_int::from.  Leaving it implicit
seems too subtle and also means that every caller to wi::int_traits
::decompose does a check that is usually unnecessary.


If a programmer uses a long long on a 32 bit machine for some index
variable and slams that into a pointer, he either knows what he is doing
or has made a mistake.do you really think that the compiler should ice?

No, I'm saying that passes that operate on addr_wide_ints while "grabbing
trees from all over" (still not sure what that means in practice) should
explicitly mark places where a truncation is deliberately allowed.
Those places then guarantee that the dropped bits wouldn't affect any of
the later calculations, which is something only the pass itself can know.

We already forbid direct assignments like:

addr_wide_int x = max_wide_int(...);

at compile time, for similar reasons.

Thanks,
Richard


Index: gcc/tree.c
===
--- gcc/tree.c	(revision 203521)
+++ gcc/tree.c	(working copy)
@@ -1187,10 +1187,10 @@ wide_int_to_tree (tree type, const wide_
   tre

Re: [PATCH] Fix IVOPTs introducing undefined signed overflow

2013-10-15 Thread Richard Biener

On Tue, 15 Oct 2013, Richard Biener wrote:

> 
> This makes sure that we do not record a signed GIV that we do not
> know whether it overflows or not.  For the testcase IVOPTs else
> can end up replacing an unsigned computation with a signed one.
> 
> Note that alternatively we may decide that it is not desirable
> for SCEV to return signed { 2147483643, +, 1 } for this case
> (ISTR adding that as feature ...).  But I think that the
> result, (int) { 2147483643, +, 1 } would not be a simple-iv
> anymore.
> 
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
> 
> Any opinions?

Actually the bug seems to be triggered by local patches in my
tree that change get_computation_aff to better preserve original
types ...

So disregard this.

Thanks,
Richard.

> Thanks,
> Richard.
> 
> 2013-10-15  Richard Biener  
> 
>   PR tree-optimization/58736
>   * tree-ssa-loop-ivopts.c (find_givs_in_stmt_scev): If this
>   is a GIV with undefined behavior on overflow make sure it
>   does not overflow before recording it.
> 
>   * gcc.dg/torture/pr58736.c: New testcase.
> 
> Index: gcc/tree-ssa-loop-ivopts.c
> ===
> --- gcc/tree-ssa-loop-ivopts.c(revision 203590)
> +++ gcc/tree-ssa-loop-ivopts.c(working copy)
> @@ -1084,6 +1084,13 @@ find_givs_in_stmt_scev (struct ivopts_da
>  
>if (!simple_iv (loop, loop_containing_stmt (stmt), lhs, iv, true))
>  return false;
> +
> +  /* If the induction variable invokes undefined behavior when it wraps
> + make sure it does not overflow.  */
> +  if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (lhs))
> +  && !iv->no_overflow)
> +return false;
> +
>iv->base = expand_simple_operations (iv->base);
>  
>if (contains_abnormal_ssa_name_p (iv->base)
> Index: gcc/testsuite/gcc.dg/torture/pr58736.c
> ===
> --- gcc/testsuite/gcc.dg/torture/pr58736.c(revision 0)
> +++ gcc/testsuite/gcc.dg/torture/pr58736.c(working copy)
> @@ -0,0 +1,20 @@
> +/* { dg-do run } */
> +/* { dg-additional-options "-fstrict-overflow" } */
> +
> +int a, b, c, d, e;
> +
> +int
> +main ()
> +{
> +  for (b = 4; b > -30; b--)
> +{
> +  e = a > (int)((unsigned int) __INT_MAX__ - (unsigned int) b);
> +  for (; c;)
> + for (;;)
> +   {
> + if (d)
> +   break;
> +   }
> +}
> +  return 0;
> +}
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend

[PATCH] Fix IVOPTs introducing undefined signed overflow

2013-10-15 Thread Richard Biener


This makes sure that we do not record a signed GIV that we do not
know whether it overflows or not.  For the testcase IVOPTs else
can end up replacing an unsigned computation with a signed one.

Note that alternatively we may decide that it is not desirable
for SCEV to return signed { 2147483643, +, 1 } for this case
(ISTR adding that as feature ...).  But I think that the
result, (int) { 2147483643, +, 1 } would not be a simple-iv
anymore.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Any opinions?

Thanks,
Richard.

2013-10-15  Richard Biener  

PR tree-optimization/58736
* tree-ssa-loop-ivopts.c (find_givs_in_stmt_scev): If this
is a GIV with undefined behavior on overflow make sure it
does not overflow before recording it.

* gcc.dg/torture/pr58736.c: New testcase.

Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c  (revision 203590)
+++ gcc/tree-ssa-loop-ivopts.c  (working copy)
@@ -1084,6 +1084,13 @@ find_givs_in_stmt_scev (struct ivopts_da
 
   if (!simple_iv (loop, loop_containing_stmt (stmt), lhs, iv, true))
 return false;
+
+  /* If the induction variable invokes undefined behavior when it wraps
+ make sure it does not overflow.  */
+  if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (lhs))
+  && !iv->no_overflow)
+return false;
+
   iv->base = expand_simple_operations (iv->base);
 
   if (contains_abnormal_ssa_name_p (iv->base)
Index: gcc/testsuite/gcc.dg/torture/pr58736.c
===
--- gcc/testsuite/gcc.dg/torture/pr58736.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr58736.c  (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fstrict-overflow" } */
+
+int a, b, c, d, e;
+
+int
+main ()
+{
+  for (b = 4; b > -30; b--)
+{
+  e = a > (int)((unsigned int) __INT_MAX__ - (unsigned int) b);
+  for (; c;)
+   for (;;)
+ {
+   if (d)
+ break;
+ }
+}
+  return 0;
+}

Re: [PATCH v2] Fix libgfortran cross compile configury w.r.t newlib

2013-10-15 Thread Marcus Shawcroft

On 1 October 2013 12:40, Marcus Shawcroft  wrote:
> On 30/09/13 13:40, Marcus Shawcroft wrote:
>
>>> Well, I thought this patch would work for me, but it does not.  It looks
>>> like gcc_no_link is set to 'no' on my target because, technically, I can
>>> link even if I don't use a linker script.  I just can't find any
>>> functions.
>>>
>
>> In which case gating on gcc_no_link could be replaced with a test that
>> looks to see if we can link with the library.  Perhaps looking for
>> exit() or some such that might reasonably be expected to be present.
>>
>> For example:
>>
>> AC_CHECK_FUNC(exit)
>> if test "x${with_newlib}" = "xyes" -a "x${ac_cv_func_exit}" = "xno"; then
>>
>> /Marcus
>>
>>
>>
>>
>
>
> Patch attached.
>
> /Marcus
>
> 2013-10-01  Marcus Shawcroft  
>
> * configure.ac (AC_CHECK_FUNCS_ONCE): Add for exit() then make
> existing AC_CHECK_FUNCS_ONCE dependent on outcome.

Ping^2

/Marcus

[ARM] [Neon types 8/10] Cortex-A7 neon pipeline model

2013-10-15 Thread James Greenhalgh


Hi,

This patch updates the A7 pipeline for the new Neon types.

Sanity checked and tested with some neon intrinsics code to see
schedule quality.

Thanks,
James

---
gcc/

2013-10-15  James Greenhalgh  

* config/arm/cortex-a7.md
(cortex_a7_neon_type): New.
(cortex_a7_neon_mul): Update for new types.
(cortex_a7_neon_mla): Likewise.
(cortex_a7_neon): Likewise.
diff --git a/gcc/config/arm/cortex-a7.md b/gcc/config/arm/cortex-a7.md
index a72a88d90af1c5491115ee84af47ec6d4f593535..7db6c5b24fb9cfe0c8a6a9837798736bd94b7788 100644
--- a/gcc/config/arm/cortex-a7.md
+++ b/gcc/config/arm/cortex-a7.md
@@ -20,6 +20,45 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; .
 
+(define_attr "cortex_a7_neon_type"
+  "neon_mul, neon_mla, neon_other"
+  (cond [
+  (eq_attr "type" "neon_mul_b, neon_mul_b_q,\
+	   neon_mul_h, neon_mul_h_q,\
+			   neon_mul_s, neon_mul_s_q,\
+			   neon_mul_b_long, neon_mul_h_long,\
+			   neon_mul_s_long, neon_mul_h_scalar,\
+			   neon_mul_h_scalar_q, neon_mul_s_scalar,\
+			   neon_mul_s_scalar_q, neon_mul_h_scalar_long,\
+			   neon_mul_s_scalar_long,\
+			   neon_sat_mul_b, neon_sat_mul_b_q,\
+			   neon_sat_mul_h, neon_sat_mul_h_q,\
+			   neon_sat_mul_s, neon_sat_mul_s_q,\
+			   neon_sat_mul_b_long, neon_sat_mul_h_long,\
+			   neon_sat_mul_s_long,\
+			   neon_sat_mul_h_scalar, neon_sat_mul_h_scalar_q,\
+			   neon_sat_mul_s_scalar, neon_sat_mul_s_scalar_q,\
+			   neon_sat_mul_h_scalar_long,\
+			   neon_sat_mul_s_scalar_long,\
+			   neon_fp_mul_s, neon_fp_mul_s_q,\
+			   neon_fp_mul_s_scalar, neon_fp_mul_s_scalar_q")
+ (const_string "neon_mul")
+  (eq_attr "type" "neon_mla_b, neon_mla_b_q, neon_mla_h,\
+	   neon_mla_h_q, neon_mla_s, neon_mla_s_q,\
+			   neon_mla_b_long, neon_mla_h_long,\
+   neon_mla_s_long,\
+			   neon_mla_h_scalar, neon_mla_h_scalar_q,\
+			   neon_mla_s_scalar, neon_mla_s_scalar_q,\
+			   neon_mla_h_scalar_long, neon_mla_s_scalar_long,\
+			   neon_sat_mla_b_long, neon_sat_mla_h_long,\
+			   neon_sat_mla_s_long,\
+			   neon_sat_mla_h_scalar_long,\
+   neon_sat_mla_s_scalar_long,\
+			   neon_fp_mla_s, neon_fp_mla_s_q,\
+			   neon_fp_mla_s_scalar, neon_fp_mla_s_scalar_q")
+ (const_string "neon_mla")]
+   (const_string "neon_other")))
+
 (define_automaton "cortex_a7")
 
 
@@ -227,14 +266,7 @@ (define_insn_reservation "cortex_a7_fpmu
 
 (define_insn_reservation "cortex_a7_neon_mul" 4
   (and (eq_attr "tune" "cortexa7")
-   (eq_attr "type"
-"neon_mul_ddd_8_16_qdd_16_8_long_32_16_long,\
- neon_mul_qqq_8_16_32_ddd_32,\
- neon_mul_qdd_64_32_long_qqd_16_ddd_32_scalar_64_32_long_scalar,\
- neon_mul_ddd_16_scalar_32_16_long_scalar,\
- neon_mul_qqd_32_scalar,\
- neon_fp_vmul_ddd,\
- neon_fp_vmul_qqd"))
+   (eq_attr "cortex_a7_neon_type" "neon_mul"))
   "(cortex_a7_both+cortex_a7_fpmul_pipe)*2")
 
 (define_insn_reservation "cortex_a7_fpmacs" 8
@@ -244,16 +276,7 @@ (define_insn_reservation "cortex_a7_fpma
 
 (define_insn_reservation "cortex_a7_neon_mla" 8
   (and (eq_attr "tune" "cortexa7")
-   (eq_attr "type"
-"neon_mla_ddd_8_16_qdd_16_8_long_32_16_long,\
- neon_mla_qqq_8_16,\
- neon_mla_ddd_32_qqd_16_ddd_32_scalar_qdd_64_32_long_scalar_qdd_64_32_long,\
- neon_mla_qqq_32_qqd_32_scalar,\
- neon_mla_ddd_16_scalar_qdd_32_16_long_scalar,\
- neon_fp_vmla_ddd,\
- neon_fp_vmla_qqq,\
- neon_fp_vmla_ddd_scalar,\
- neon_fp_vmla_qqq_scalar"))
+   (eq_attr "cortex_a7_neon_type" "neon_mla"))
   "cortex_a7_both+cortex_a7_fpmul_pipe")
 
 (define_bypass 4 "cortex_a7_fpmacs,cortex_a7_neon_mla"
@@ -366,21 +389,6 @@ (define_bypass 2 "cortex_a7_f_loads, cor
 
 (define_insn_reservation "cortex_a7_neon" 4
   (and (eq_attr "tune" "cortexa7")
-   (eq_attr "type"
-"neon_mul_ddd_8_16_qdd_16_8_long_32_16_long,\
- neon_mul_qqq_8_16_32_ddd_32,\
- neon_mul_qdd_64_32_long_qqd_16_ddd_32_scalar_64_32_long_scalar,\
- neon_mla_ddd_8_16_qdd_16_8_long_32_16_long,\
- neon_mla_qqq_8_16,\
- neon_mla_ddd_32_qqd_16_ddd_32_scalar_qdd_64_32_long_scalar_qdd_64_32_long,\
- neon_mla_qqq_32_qqd_32_scalar,\
- neon_mul_ddd_16_scalar_32_16_long_scalar,\
- neon_mul_qqd_32_scalar,\
- neon_mla_ddd_16_scalar_qdd_32_16_long_scalar,\
- neon_fp_vmul_ddd,\
- neon_fp_vmul_qqd,\
- neon_fp_vmla_ddd,\
- neon_fp_vmla_qqq,\
- neon_fp_vmla_ddd_scalar,

1 2 >

1 - 100 of 126 matches

Mail list logo