Re: [patch] fix fortran regressions on FreeBSD10.0/11.0

2013-12-17 Thread Janne Blomqvist
On Tue, Dec 17, 2013 at 8:51 AM, Jakub Jelinek  wrote:
> On Tue, Dec 17, 2013 at 06:19:43AM +0100, Andreas Tobler wrote:
>> The below patch allows me to get back to normal, means zero unexpected
>> fails, on FreeBSD. The patch has been tested on Linux/x86 as well, no
>> regressions.
>
> On Linux, mkostemp performs on flags:
> (flags & ~O_ACCMODE) | O_RDWR | O_CREAT | O_EXCL
> before passing it to open(2), so the patch makes no real change on Linux.

Yes, with the help of strace and a small test program I figured out
that it must do something like that. See also

http://austingroupbugs.net/view.php?id=411

I sent a request to the linux man pages project to improve the documentation.
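
For illustration, the flag rewrite quoted above can be written as a small C helper. This is only a sketch of the quoted expression, not glibc's actual source; the name effective_mkostemp_flags is hypothetical.

```c
#include <fcntl.h>

/* Hypothetical helper (not part of glibc or libgfortran): applies the
   rewrite quoted above, (flags & ~O_ACCMODE) | O_RDWR | O_CREAT | O_EXCL,
   to show which flags would reach open(2).  */
static int
effective_mkostemp_flags (int flags)
{
  return (flags & ~O_ACCMODE) | O_RDWR | O_CREAT | O_EXCL;
}
```

Under this rewrite, any access mode supplied by the caller is discarded and the three creation flags are always set, so passing O_RDWR | O_CREAT | O_EXCL explicitly gives the same result as passing 0. That is consistent with the remark that the patch makes no real change on Linux.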

-- 
Janne Blomqvist


Re: [PATCH] Fix convert_mode

2013-12-17 Thread Richard Biener
Jakub Jelinek  wrote:
>On Mon, Dec 16, 2013 at 11:12:40PM +0100, Jakub Jelinek wrote:
>> When testing the patch the overflow-2.c testcase didn't exist yet,
>> nor was ubsan on -m32 actually ever reporting overflows on the DImode
>> multiplication (it simply expanded it as normal DImode multiplication
>with
>> no overflow checking).
>> 
>> To me this looks like very old bug (r2174 added it), will
>bootstrap/regtest
>> this:
>> 
>> 2013-12-16  Jakub Jelinek  
>> 
>>  * expr.c (convert_modes): For SUBREG_PROMOTED_VAR_P use SUBREG_REG
>(x)
>>  instead of x as last gen_lowpart argument.
>> 
>> --- gcc/expr.c.jj2013-12-12 09:39:45.0 +0100
>> +++ gcc/expr.c   2013-12-16 23:05:07.519747459 +0100
>> @@ -719,7 +719,7 @@ convert_modes (enum machine_mode mode, e
>>if (GET_CODE (x) == SUBREG && SUBREG_PROMOTED_VAR_P (x)
>>&& GET_MODE_SIZE (GET_MODE (SUBREG_REG (x))) >= GET_MODE_SIZE
>(mode)
>>&& SUBREG_PROMOTED_UNSIGNED_P (x) == unsignedp)
>> -x = gen_lowpart (mode, x);
>> +x = gen_lowpart (mode, SUBREG_REG (x));
>>  
>>if (GET_MODE (x) != VOIDmode)
>>  oldmode = GET_MODE (x);
>> 
>
>Successfully bootstrapped/regtested on x86_64-linux and i686-linux, ok
>for
>trunk?

Ok.

Thanks,
Richard.

>   Jakub




Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-17 Thread Thomas Schwinge
Hi!

For reference, here's my rationale for OpenACC on this topic:

On Tue, 17 Dec 2013 07:17:31 +0100, Jakub Jelinek  wrote:
> On Tue, Dec 17, 2013 at 03:51:14AM +, Iyer, Balaji V wrote:
> > Hi Jakub,   
> > I will work on this, but I need a couple clarifications about some of 
> > your comments. Please see below:
> > 
> > > > +#define CILK_SIMD_FN_CLAUSE_MASK   \
> > > > +   ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_SIMDLEN)
> > >   \
> > > > +   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINEAR)
> > >   \
> > > > +   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_UNIFORM)
> > >   \
> > > > +   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_INBRANCH)
> > >   \
> > > > +   | (OMP_CLAUSE_MASK_1 <<
> > > PRAGMA_OMP_CLAUSE_NOTINBRANCH))
> > > 
> > > I thought you'd instead add there PRAGMA_CILK_CLAUSE_VECTORLENGTH,
> > > PRAGMA_CILK_CLAUSE_MASK and PRAGMA_CILK_CLAUSE_NOMASK (or
> > > similar).
> > > 
> > 
> > I looked at OpenACC implementation and they seem to use the OMP_CLAUSE_* 
> > (line # 11174 in c-parser.c)
> 
> It uses just PRAGMA_OMP_CLAUSE_NONE, which really means no clauses at all (I
> think it is for now).

Right, that's only for now.

> > Also, If I created CILK_CLAUSE_* variants, I have to re-create another 
> > function similar to c_parser_omp_all_clauses, whose workings will be 
> > identical to the c_parser_omp_all_clauses. Is that OK with you?
> 
> No, I'd remove enum pragma_cilk_clause altogether and fold it into the end of
> pragma_omp_clause, as:
>   PRAGMA_CILK_CLAUSE_VECTORLENGTH,
>   PRAGMA_CILK_CLAUSE_MASK,
>   PRAGMA_CILK_CLAUSE_NOMASK,
>   PRAGMA_CILK_CLAUSE_NONE = PRAGMA_OMP_CLAUSE_NONE,
>   PRAGMA_CILK_CLAUSE_LINEAR = PRAGMA_OMP_CLAUSE_LINEAR,
>   PRAGMA_CILK_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE,
>   PRAGMA_CILK_CLAUSE_FIRSTPRIVATE = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
>   PRAGMA_CILK_CLAUSE_LASTPRIVATE = PRAGMA_OMP_CLAUSE_LASTPRIVATE,
>   PRAGMA_CILK_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION
> so that you can use it in the same bitmasks.

Hmm, indeed my inclination (and what I have implemented in my working
trees) has been to literally re-use the existing PRAGMA_OMP_* ones for
OpenACC, without adding new aliases, and extend/add new ones as
required.

My understanding/reasoning is that PRAGMA_OMP_* just literally represents
a parser token of a pragma line (see the one-to-one translation in
c-parser.c:c_parser_omp_clause_name, for example).  This means that
»#pragma omp parallel copyin ([...])« and »#pragma acc parallel copyin
([...])« can share the same PRAGMA_OMP_CLAUSE_COPYIN, even though it
means something different to both of them; PRAGMA_OMP_CLAUSE_* alone
doesn't convey any meaning (apart from the token/"string" used in the
pragma line), and it gets its meaning only if interpreted as part of a
Open* construct/directive.  Just like many other tokens only get their
semantic meaning when parsed inside a specific language construct.  For
OpenACC, the disambiguation, that is, translation from
PRAGMA_OMP_CLAUSE_* to OMP_CLAUSE_*...

> That way, you don't have to change anything in c_parser_omp_all_clauses,
> just add handling of the 3 clauses that don't have OpenMP counterparts.

... then indeed happens in a new c_parser_oacc_all_clauses, which parses
all the applicable PRAGMA_OMP_CLAUSE_* according to the OpenACC
semantics.

For example, said PRAGMA_OMP_CLAUSE_COPYIN is translated to
OMP_CLAUSE_MAP with OMP_CLAUSE_MAP_TO, and the (new)
PRAGMA_OMP_CLAUSE_PRESENT_OR_COPYOUT (which is only interpreted/valid
inside OpenACC contexts) is translated to OMP_CLAUSE_MAP with (new)
OMP_CLAUSE_MAP_PRESENT_OR_FROM (which is only interpreted/valid inside
OpenACC contexts).
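
A minimal sketch of that idea, using simplified stand-in enums rather than GCC's real PRAGMA_OMP_CLAUSE_* and OMP_CLAUSE_* sets: the same parser token is lowered differently depending on the directive family it appears in. All names below are hypothetical, and the OpenMP branch (copyin staying a copyin clause) is an assumption made for the example.

```c
/* Simplified, hypothetical stand-ins for GCC's enums; the real
   PRAGMA_OMP_CLAUSE_* and OMP_CLAUSE_* sets are much larger.  */
enum clause_token { TOKEN_COPYIN, TOKEN_PRESENT_OR_COPYOUT };
enum directive_family { FAMILY_OPENMP, FAMILY_OPENACC };
enum lowered_clause
{
  LOWERED_INVALID,
  LOWERED_COPYIN,              /* Assumed OpenMP meaning of copyin.  */
  LOWERED_MAP_TO,              /* Stands for OMP_CLAUSE_MAP + OMP_CLAUSE_MAP_TO.  */
  LOWERED_MAP_PRESENT_OR_FROM  /* Stands for OMP_CLAUSE_MAP + the new kind.  */
};

/* The token alone carries no semantics; the directive family decides
   what it is translated to.  */
static enum lowered_clause
lower_clause (enum directive_family family, enum clause_token token)
{
  switch (token)
    {
    case TOKEN_COPYIN:
      return family == FAMILY_OPENACC ? LOWERED_MAP_TO : LOWERED_COPYIN;
    case TOKEN_PRESENT_OR_COPYOUT:
      /* Only valid inside OpenACC contexts.  */
      return family == FAMILY_OPENACC
	     ? LOWERED_MAP_PRESENT_OR_FROM : LOWERED_INVALID;
    }
  return LOWERED_INVALID;
}
```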


Grüße,
 Thomas




Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-17 Thread Allan Sandfeld Jensen
On Monday 16 December 2013, Uros Bizjak wrote:
> On Mon, Dec 16, 2013 at 10:34 AM, Uros Bizjak  wrote:
> > On Sun, Dec 15, 2013 at 7:54 PM, Allan Sandfeld Jensen
> > 
> >  wrote:
> >> Hi again
> >> 
> >> On Wednesday 11 December 2013, Uros Bizjak wrote:
> >>> Hello!
> >>> 
> >>> > PR gcc/59422
> >>> > 
> >>> > This patch extends the supported targets for function multi versioning
> >>> > to also include Haswell, Silvermont, and the most recent AMD models.
> >>> > It also prioritizes AVX2 versions over AMD specific pre-AVX2
> >>> > versions.
> >>> 
> >>> Please add a ChangeLog entry and attach the complete patch. Please
> >>> also state how you tested the patch, as outlined in the instructions
> >>> [1].
> >>> 
> >>> [1] http://gcc.gnu.org/contribute.html
> >> 
> >> Updated patch for better CPU model detection and added ChangeLog.
> >> 
> >> The patch has been tested with the attached test.cpp. Verified that it
> >> doesn't build before the patch, and that it builds after, and verified
> >> it selects correct versions at runtime based on either CPU model or
> >> supported ISA (tested on 3 machines: SandyBridge, IvyBridge and Phenom
> >> II).
> >> 
> >> Btw, I couldn't find anything that corresponds to gcc's btver2 arch. Is
> >> that an old term for what has become the Jaguar architecture?
> > 
> > Thanks for the patch!
> > 
> > However, you should not change the existing order of enums in
> > cpuinfo.c (enum processor_vendor, enum processor_types, enum
> > processor_subtypes, enum processor_features), but new entries should
> > be added at the end (before *_MAX entry, if exists) of the enum. The
> > enums (enum processor_features and enum processor_model) in
> > config/i386/i386.c should mirror these changes. Please see [1].
> > 
> > Probably, we should document this in the source...
> > 
> > -  {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
> > +  {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},
> > 
> > Huh... Thanks for catching this. -march=sandybridge is not recognized...
> 
> -  {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
> +  {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},
> +  {"core-avx-i", M_INTEL_COREI7_IVYBRIDGE},
> +  {"core-avx2", M_INTEL_COREI7_HASWELL},
> 
> Ah, no. These names are not intended to be used in -march. We can
> follow the tradition and use sandybridge, ivybridge and haswell here.
> 
I had the problem that "arch=corei7-avx" was not recognized as a valid 
property argument until I made that change. I thought it was the intent to 
merge this list of models with the canonical names, but perhaps it is an error 
in the new parameter validation?

Note that similarly "arch=sandybridge" is accepted as a valid property 
argument but then fails as an invalid argument for march.

Regards
Allan


Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-17 Thread Jakub Jelinek
On Tue, Dec 17, 2013 at 11:03:12AM +0100, Thomas Schwinge wrote:
> > > Also, If I created CILK_CLAUSE_* variants, I have to re-create another 
> > > function similar to c_parser_omp_all_clauses, whose workings will be 
> > > identical to the c_parser_omp_all_clauses. Is that OK with you?
> > 
> > No, I'd remove enum pragma_cilk_clause altogether and fold it into the end 
> > of
> > pragma_omp_clause, as:
> >   PRAGMA_CILK_CLAUSE_VECTORLENGTH,
> >   PRAGMA_CILK_CLAUSE_MASK,
> >   PRAGMA_CILK_CLAUSE_NOMASK,
> >   PRAGMA_CILK_CLAUSE_NONE = PRAGMA_OMP_CLAUSE_NONE,
> >   PRAGMA_CILK_CLAUSE_LINEAR = PRAGMA_OMP_CLAUSE_LINEAR,
> >   PRAGMA_CILK_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE,
> >   PRAGMA_CILK_CLAUSE_FIRSTPRIVATE = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
> >   PRAGMA_CILK_CLAUSE_LASTPRIVATE = PRAGMA_OMP_CLAUSE_LASTPRIVATE,
> >   PRAGMA_CILK_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION
> > so that you can use it in the same bitmasks.
> 
> Hmm, indeed my inclination (and what I have implemented in my working
> trees) has been to literally re-use the existing PRAGMA_OMP_* ones for
> OpenACC, without adding new aliases, and extend/add new ones as
> required.

The aliases would be only if they are needed, I understood that
those are already used in #pragma simd parsing.  Surely, if they are renamed
to PRAGMA_OMP_* where they have counterparts, the aliases aren't needed.

> My understanding/reasoning is that PRAGMA_OMP_* just literally represents
> a parser token of a pragma line (see the one-to-one translation in
> c-parser.c:c_parser_omp_clause_name, for example).  This means that
> »#pragma omp parallel copyin ([...])« and »#pragma acc parallel copyin
> ([...])« can share the same PRAGMA_OMP_CLAUSE_COPYIN, even though it
> means something different to both of them; PRAGMA_OMP_CLAUSE_* alone
> doesn't convey any meaning (apart from the token/"string" used in the
> pragma line), and it gets its meaning only if interpreted as part of a
> Open* construct/directive.  Just like many other tokens only get their
> semantic meaning when parsed inside a specific language construct.  For
> OpenACC, the disambiguation, that is, translation from
> PRAGMA_OMP_CLAUSE_* to OMP_CLAUSE_*...
> 
> > That way, you don't have to change anything in c_parser_omp_all_clauses,
> > just add handling of the 3 clauses that don't have OpenMP counterparts.
> 
> ... then indeed happens in a new c_parser_oacc_all_clauses, which parses
> all the applicable PRAGMA_OMP_CLAUSE_* according to the OpenACC
> semantics.

Unlike OpenACC, Cilk+ for the vector attribute has pretty much the OpenMP
syntax, with just a few exceptions: in particular, 3 clauses have different
names (and there are extra requirements for vectorlength?), and for linear
there is an extension on the Cilk+ side.  So, duplicating the
c_parser_*all_clauses in that case is IMHO not needed, the mask specifies
which clauses are allowed in the particular construct and the only case
which needs disambiguation (linear clauses' step) can be disambiguated
by checking if some Cilk+ specific clause is in the mask (already the
clause splitting code uses such tests).

If OpenACC clauses have different names from the OpenMP/Cilk+ ones, I don't
see why you would need a new *_all_clauses function, just supply a different
mask (unless we run out of the 64-bits in the bitmask, then we'd need extra
steps, perhaps start using real bitmasks or something).
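
As a rough sketch of the mask-based approach, assuming a 64-bit mask and a made-up token numbering (none of these names are GCC's): each construct supplies the set of clauses it accepts, and a construct is treated as Cilk+ when its mask contains any Cilk+-only clause.

```c
#include <stdint.h>

/* Hypothetical token numbering; only the technique matters here.  */
enum simd_clause_token
{
  SIMD_TOKEN_LINEAR,
  SIMD_TOKEN_UNIFORM,
  SIMD_TOKEN_REDUCTION,
  SIMD_TOKEN_VECTORLENGTH,  /* Cilk+ only.  */
  SIMD_TOKEN_MASK,          /* Cilk+ only.  */
  SIMD_TOKEN_NOMASK         /* Cilk+ only.  */
};

typedef uint64_t clause_mask;

#define CLAUSE_BIT(tok) ((clause_mask) 1 << (tok))
#define CILK_ONLY_CLAUSES \
  (CLAUSE_BIT (SIMD_TOKEN_VECTORLENGTH) \
   | CLAUSE_BIT (SIMD_TOKEN_MASK) \
   | CLAUSE_BIT (SIMD_TOKEN_NOMASK))

/* Nonzero if TOK is permitted by the construct's MASK.  */
static int
clause_allowed_p (clause_mask mask, enum simd_clause_token tok)
{
  return (mask & CLAUSE_BIT (tok)) != 0;
}

/* Nonzero if MASK contains a Cilk+-specific clause, which is enough to
   pick the Cilk+ interpretation of a shared clause such as linear.  */
static int
cilk_construct_p (clause_mask mask)
{
  return (mask & CILK_ONLY_CLAUSES) != 0;
}
```

With a plain 64-bit integer the scheme stops working once more than 64 distinct clause tokens exist, which is the limit mentioned above.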

> For example, said PRAGMA_OMP_CLAUSE_COPYIN is translated to
> OMP_CLAUSE_MAP with OMP_CLAUSE_MAP_TO, and the (new)
> PRAGMA_OMP_CLAUSE_PRESENT_OR_COPYOUT (which is only interpreted/valid
> inside OpenACC contexts) is translated to OMP_CLAUSE_MAP with (new)
> OMP_CLAUSE_MAP_PRESENT_OR_FROM (which is only interpreted/valid inside
> OpenACC contexts).

This is weird, because present or {alloc,from,to,fromto} is the OpenMP
behavior, so I'd expect you would be adding a bit for the other, non-OpenMP
compatible behavior instead.

Jakub


[ARM 2/5 big.LITTLE] Allow tuning parameters without unique tuning targets.

2013-12-17 Thread James Greenhalgh

Hi,

A limitation in the ARM backend is that each core added to arm-cores.def
must provide a unique identifier to be used for tuning. This restricts
us when we want to share the same identifier between a number of cores.

The machinery here is a bit messy, and we don't really make it any nicer
in this patch. But, this change does allow you to add core names which
use other tuning targets easily.

This, for example, allows us to wire up -mcpu=cortex-a15.cortex-a7 to
use the scheduler description for Cortex-A7 without requiring
modifications to the Cortex-A7 scheduler description.
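
The idea can be sketched with a reduced three-column table (the real ARM_CORE entries carry six columns). Everything below is a hypothetical stand-alone illustration, not the code in arm-cores.def or arm.c: each entry has a unique internal identifier plus a tuning identifier that may be shared.

```c
#include <string.h>

/* Reduced, hypothetical core table: NAME, unique INTERNAL_IDENT, and a
   TUNE_IDENT that several entries may share.  */
#define CORE_TABLE(ENTRY) \
  ENTRY ("cortex-a7", cortexa7, cortexa7) \
  ENTRY ("cortex-a15", cortexa15, cortexa15) \
  ENTRY ("cortex-a15.cortex-a7", cortexa15cortexa7, cortexa7)

/* One enumerator per entry, generated from the INTERNAL_IDENT column.  */
enum core_ident
{
#define AS_ENUM(NAME, IDENT, TUNE) IDENT,
  CORE_TABLE (AS_ENUM)
#undef AS_ENUM
  core_none
};

struct core_entry
{
  const char *name;
  enum core_ident ident;
  enum core_ident tune;
};

static const struct core_entry all_cores[] =
{
#define AS_ENTRY(NAME, IDENT, TUNE) { NAME, IDENT, TUNE },
  CORE_TABLE (AS_ENTRY)
#undef AS_ENTRY
};

/* Return the tuning target for the core called NAME, or core_none.  */
static enum core_ident
tune_for_cpu (const char *name)
{
  for (size_t i = 0; i < sizeof all_cores / sizeof all_cores[0]; i++)
    if (strcmp (all_cores[i].name, name) == 0)
      return all_cores[i].tune;
  return core_none;
}
```

Here "cortex-a15.cortex-a7" keeps its own identifier but resolves to the cortexa7 tuning target, so no scheduler description has to learn about the new name.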

Bootstrapped in series and checked on arm-none-linux-gnueabi and
arm-none-eabi.

OK?

Thanks,
James

---
gcc/

2013-12-17  James Greenhalgh  

* config/arm/arm-cores.def: Add new column for TUNE_IDENT.
* config/arm/genopt.sh: Improve layout.
* config/arm/arm-tune.md: Regenerate.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-opts.h (ARM_CORE): Modify macro for TUNE_IDENT.
* config/arm/arm.c (ARM_CORE): Modify macro for TUNE_IDENT.
(arm_option_override): When a CPU is chosen, that should also
form the tune target.
* config/arm/arm.h (ARM_CORE): Modify macro for TUNE_IDENT.
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index e7cea63..3264eed 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -20,10 +20,13 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_CORE(CORE_NAME, CORE_IDENT, ARCH, FLAGS, COSTS)
+  ARM_CORE(CORE_NAME, INTERNAL_IDENT, TUNE_IDENT, ARCH, FLAGS, COSTS)
 
The CORE_NAME is the name of the core, represented as a string constant.
-   The CORE_IDENT is the name of the core, represented as an identifier.
+   The INTERNAL_IDENT is the name of the core represented as an identifier.
+   This must be unique for each entry in this table.
+   The TUNE_IDENT is the name of the core for which scheduling decisions
+   should be made, represented as an identifier.
ARCH is the architecture revision implemented by the chip.
FLAGS are the bitwise-or of the traits that apply to that core.
This need not include flags implied by the architecture.
@@ -35,109 +38,115 @@
Some tools assume no whitespace up to the first "," in each entry.  */
 
 /* V2/V2A Architecture Processors */
-ARM_CORE("arm2",   arm2,	2,	FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm250", arm250,	2,	FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm3",   arm3,	2,	FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm2", 	arm2, arm2,	2, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm250", 	arm250, arm250,	2, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm3",	arm3, arm3,	2, FL_CO_PROC | FL_MODE26, slowmul)
 
 /* V3 Architecture Processors */
-ARM_CORE("arm6",  arm6,		3,	FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm60", arm60,	3,	FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm600",arm600,	3,	FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm610",arm610,	3,	 FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm620",arm620,	3,	FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7",  arm7,		3,	FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm7d", arm7d,	3,	FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm7di",arm7di,	3,	FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm70", arm70,	3,	FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm700",arm700,	3,	FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm700i",   arm700i,	3,	FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm710",arm710,	3,	 FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm720",arm720,	3,	 FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm710c",   arm710c,	3,	 FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7100",   arm7100,	3,	 FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7500",   arm7500,	3,	 FL_MODE26 | FL_WBUF, slowmul)
-/* Doesn't have an external co-proc, but does have embedded fpa.  */
-ARM_CORE("arm7500fe", arm7500fe,	3,	FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
+ARM_CORE("arm6",	arm6, arm6,		3, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm60",	arm60, arm60,		3, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm600",	arm600, arm600,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
+ARM_CORE("arm610",	arm610, arm610,		3, FL_MODE26 | FL_WBUF, slowmul)
+ARM_CORE("arm620",	arm620, arm620,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
+ARM_CORE("arm7",	arm7, arm7,		3, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm7d",	arm7d, arm7d,		3, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm7di",	arm7di, arm7di,		3, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm70",	arm70, arm70,		3, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm700",	arm700, arm700,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
+ARM_CORE("arm700i",	arm700i, arm700i,	3, FL_CO_PROC | FL_MODE26 |

[ARM 5/5 big.LITTLE] Add support for -mcpu=cortex-a57.cortex-a53

2013-12-17 Thread James Greenhalgh

Hi,

This patch wires up -mcpu=cortex-a57.cortex-a53 as an option to
-mcpu.

Bootstrapped in series, and sanity checked.

OK?

Thanks,
James

---
2013-12-17  James Greenhalgh  

* config/arm/arm-cores.def (cortex-a57.cortex-a53): New.
* doc/invoke.texi: Document -mcpu=cortex-a57.cortex-a53.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* config/arm/bpabi.h
(BE8_LINK_SPEC): Handle -mcpu=cortex-a57.cortex-a53.
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index d5e562b..9bd3f39 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -154,3 +154,6 @@ ARM_CORE("cortex-a15.cortex-a7", cortexa15cortexa7, cortexa7,	7A,  FL_LDSCHED |
 /* V8 Architecture Processors */
 ARM_CORE("cortex-a53",	cortexa53, cortexa53,	8A, FL_LDSCHED, cortex_a53)
 ARM_CORE("cortex-a57",	cortexa57, cortexa15,	8A, FL_LDSCHED, cortex_a15)
+
+/* V8 big.LITTLE implementations */
+ARM_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 03c1560..702338c 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -291,6 +291,9 @@ Enum(processor_type) String(cortex-a53) Value(cortexa53)
 EnumValue
 Enum(processor_type) String(cortex-a57) Value(cortexa57)
 
+EnumValue
+Enum(processor_type) String(cortex-a57.cortex-a53) Value(cortexa57cortexa53)
+
 Enum
 Name(arm_arch) Type(int)
 Known ARM architectures (for use with the -march= option):
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index d56956d0ab1bd917ad049f835880bdc0186d7d2a..954cab8efb10329eb40042acb0de2c361d6c13d2 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -30,5 +30,5 @@ (define_attr "tune"
 	cortexa15,cortexr4,cortexr4f,
 	cortexr5,cortexr7,cortexm4,
 	cortexm3,marvell_pj4,cortexa15cortexa7,
-	cortexa53,cortexa57"
+	cortexa53,cortexa57,cortexa57cortexa53"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 796003b..5cfaeb8 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -64,6 +64,7 @@
|mcpu=marvell-pj4	\
|mcpu=cortex-a53	\
|mcpu=cortex-a57	\
+   |mcpu=cortex-a57.cortex-a53\
|mcpu=generic-armv7-a\
|march=armv7-m|mcpu=cortex-m3\
|march=armv7e-m|mcpu=cortex-m4   \
@@ -79,6 +80,7 @@
|mcpu=cortex-a15.cortex-a7\
|mcpu=cortex-a53	\
|mcpu=cortex-a57	\
+   |mcpu=cortex-a57.cortex-a53\
|mcpu=marvell-pj4	\
|mcpu=generic-armv7-a\
|march=armv7-m|mcpu=cortex-m3\
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9743387..b102e13 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12170,8 +12170,8 @@ assembly code.  Permissible names are: @samp{arm2}, @samp{arm250},
 @samp{fa606te}, @samp{fa626te}, @samp{fmp626}, @samp{fa726te}.
 
 Additionally, this option can specify that GCC should tune the performance
-of the code for a big.LITTLE system.  The only permissible name is:
-@samp{cortex-a15.cortex-a7}.
+of the code for a big.LITTLE system.  Permissible names are:
+@samp{cortex-a15.cortex-a7}, @samp{cortex-a57.cortex-a53}.
 
 @option{-mcpu=generic-@var{arch}} is also permissible, and is
 equivalent to @option{-march=@var{arch} -mtune=generic-@var{arch}}.

[ARM 3/5 big.LITTLE] Add support for -mcpu=cortex-a15.cortex-a7

2013-12-17 Thread James Greenhalgh

Hi,

This patch wires up -mcpu=cortex-a15.cortex-a7 as an option to
-mcpu.

Bootstrapped in series, with --with-cpu=cortex-a15.cortex-a7.

OK?

Thanks,
James

---
2013-12-17  James Greenhalgh  

* config/arm/arm-cores.def (cortex-a15.cortex-a7): New.
* doc/invoke.texi: Document -mcpu=cortex-a15.cortex-a7.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* config/arm/bpabi.h
(BE8_LINK_SPEC): Handle -mcpu=cortex-a15.cortex-a7.
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 3264eed..0ea5eef 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -148,5 +148,8 @@ ARM_CORE("cortex-m4",		cortexm4, cortexm4,		7EM, FL_LDSCHED, v7m)
 ARM_CORE("cortex-m3",		cortexm3, cortexm3,		7M,  FL_LDSCHED, v7m)
 ARM_CORE("marvell-pj4",		marvell_pj4, marvell_pj4,	7A,  FL_LDSCHED, 9e)
 
+/* V7 big.LITTLE implementations */
+ARM_CORE("cortex-a15.cortex-a7", cortexa15cortexa7, cortexa7,	7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
+
 /* V8 Architecture Processors */
 ARM_CORE("cortex-a53",	cortexa53, cortexa53,	8A, FL_LDSCHED, cortex_a53)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 7da7cc8..d847c10 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -283,6 +283,9 @@ EnumValue
 Enum(processor_type) String(marvell-pj4) Value(marvell_pj4)
 
 EnumValue
+Enum(processor_type) String(cortex-a15.cortex-a7) Value(cortexa15cortexa7)
+
+EnumValue
 Enum(processor_type) String(cortex-a53) Value(cortexa53)
 
 Enum
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index 0386afff7428169ad0e31ae4de4bd677413bc817..beee9af013f6a5a75b7051f3c7077e98fafd45ef 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -29,5 +29,6 @@ (define_attr "tune"
 	cortexa8,cortexa9,cortexa12,
 	cortexa15,cortexr4,cortexr4f,
 	cortexr5,cortexr7,cortexm4,
-	cortexm3,marvell_pj4,cortexa53"
+	cortexm3,marvell_pj4,cortexa15cortexa7,
+	cortexa53"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index b39c4a9..669884d 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -60,6 +60,7 @@
|mcpu=cortex-a7  \
|mcpu=cortex-a8|mcpu=cortex-a9|mcpu=cortex-a15   \
|mcpu=cortex-a12	\
+   |mcpu=cortex-a15.cortex-a7\
|mcpu=marvell-pj4	\
|mcpu=cortex-a53	\
|mcpu=generic-armv7-a\
@@ -74,6 +75,7 @@
|mcpu=cortex-a7  \
|mcpu=cortex-a8|mcpu=cortex-a9|mcpu=cortex-a15   \
|mcpu=cortex-a12	\
+   |mcpu=cortex-a15.cortex-a7\
|mcpu=cortex-a53	\
|mcpu=marvell-pj4	\
|mcpu=generic-armv7-a\
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b655a64..e069305 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12168,6 +12168,9 @@ assembly code.  Permissible names are: @samp{arm2}, @samp{arm250},
 @samp{fa526}, @samp{fa626},
 @samp{fa606te}, @samp{fa626te}, @samp{fmp626}, @samp{fa726te}.
 
+Additionally, this option can specify that GCC should tune the performance
+of the code for a big.LITTLE system.  The only permissible name is:
+@samp{cortex-a15.cortex-a7}.
 
 @option{-mcpu=generic-@var{arch}} is also permissible, and is
 equivalent to @option{-march=@var{arch} -mtune=generic-@var{arch}}.

[ARM 4/5 big.LITTLE] Add support for -mcpu=cortex-a57

2013-12-17 Thread James Greenhalgh

Hi,

This patch wires up -mcpu=cortex-a57 as an option to
-mcpu. As we don't yet have a scheduling model for Cortex-A57
available, for now we use the scheduling description for another
"big" core, the Cortex-A15.

Bootstrapped in series and sanity checked.

OK?

Thanks,
James

---
2013-12-17  James Greenhalgh  

* config/arm/arm-cores.def (cortex-a57): New.
* doc/invoke.texi: Document -mcpu=cortex-a57.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* config/arm/bpabi.h (BE8_LINK_SPEC): Handle -mcpu=cortex-a57.
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 0ea5eef..d5e562b 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -153,3 +153,4 @@ ARM_CORE("cortex-a15.cortex-a7", cortexa15cortexa7, cortexa7,	7A,  FL_LDSCHED |
 
 /* V8 Architecture Processors */
 ARM_CORE("cortex-a53",	cortexa53, cortexa53,	8A, FL_LDSCHED, cortex_a53)
+ARM_CORE("cortex-a57",	cortexa57, cortexa15,	8A, FL_LDSCHED, cortex_a15)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index d847c10..03c1560 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -288,6 +288,9 @@ Enum(processor_type) String(cortex-a15.cortex-a7) Value(cortexa15cortexa7)
 EnumValue
 Enum(processor_type) String(cortex-a53) Value(cortexa53)
 
+EnumValue
+Enum(processor_type) String(cortex-a57) Value(cortexa57)
+
 Enum
 Name(arm_arch) Type(int)
 Known ARM architectures (for use with the -march= option):
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index beee9af013f6a5a75b7051f3c7077e98fafd45ef..d56956d0ab1bd917ad049f835880bdc0186d7d2a 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -30,5 +30,5 @@ (define_attr "tune"
 	cortexa15,cortexr4,cortexr4f,
 	cortexr5,cortexr7,cortexm4,
 	cortexm3,marvell_pj4,cortexa15cortexa7,
-	cortexa53"
+	cortexa53,cortexa57"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 669884d..796003b 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -63,6 +63,7 @@
|mcpu=cortex-a15.cortex-a7\
|mcpu=marvell-pj4	\
|mcpu=cortex-a53	\
+   |mcpu=cortex-a57	\
|mcpu=generic-armv7-a\
|march=armv7-m|mcpu=cortex-m3\
|march=armv7e-m|mcpu=cortex-m4   \
@@ -77,6 +78,7 @@
|mcpu=cortex-a12	\
|mcpu=cortex-a15.cortex-a7\
|mcpu=cortex-a53	\
+   |mcpu=cortex-a57	\
|mcpu=marvell-pj4	\
|mcpu=generic-armv7-a\
|march=armv7-m|mcpu=cortex-m3\
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e069305..9743387 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12157,7 +12157,8 @@ assembly code.  Permissible names are: @samp{arm2}, @samp{arm250},
 @samp{arm1136j-s}, @samp{arm1136jf-s}, @samp{mpcore}, @samp{mpcorenovfp},
 @samp{arm1156t2-s}, @samp{arm1156t2f-s}, @samp{arm1176jz-s}, @samp{arm1176jzf-s},
 @samp{cortex-a5}, @samp{cortex-a7}, @samp{cortex-a8}, @samp{cortex-a9},
-@samp{cortex-a12}, @samp{cortex-a15}, @samp{cortex-a53}, @samp{cortex-r4},
+@samp{cortex-a12}, @samp{cortex-a15}, @samp{cortex-a53}, @samp{cortex-a57},
+@samp{cortex-r4},
 @samp{cortex-r4f}, @samp{cortex-r5}, @samp{cortex-r7}, @samp{cortex-m4},
 @samp{cortex-m3},
 @samp{cortex-m1},

[Patch ARM] Add big.LITTLE tuning options

2013-12-17 Thread James Greenhalgh
Hi,

This patch series adds machinery and functionality to enable
tuning for big.LITTLE systems when compiling for the ARM target.

We adopt the naming convention that, for a big.LITTLE system where the
big core is 'x' and the little core is 'y', the -mcpu name is x.y.

In order to achieve that, we must first tweak some infrastructure.

First, in order to reduce coupling between assembler versions, we
must add name rewriting for the -mcpu command. big.LITTLE systems
use architecturally compatible cores, so we can be sure that if
we are asked to assemble for cortex-a15.cortex-a7, then we can also
assemble for cortex-a15. Thus, we choose to truncate at the first '.'
delimiter between core names.

The ARM backend presently carries the limitation that each entry in
arm-cores.def must provide a unique 'tuning' target. This is
restrictive and would require constant churn modifications to the
scheduler descriptions to add each big.LITTLE flavour which is released.
We modify this infrastructure to still carry a unique identifier, but also
to carry a potentially shared scheduling identifier.

The final 3 patches add support for new -mcpu values:
cortex-a15.cortex-a7, cortex-a57, cortex-a57.cortex-a53.

The series has been regression tested and built on a number of
configurations, bootstrapped with option --with-cpu=cortex-a15.cortex-a7
and benchmarked on a Cortex-A15 based system and a Cortex-A7 based
system with no regressions.

OK?

Thanks,
James

[ARM 1/5 big.LITTLE] Add driver support for rewriting -mcpu names

2013-12-17 Thread James Greenhalgh

Hi,

This patch adds machinery to the driver to ensure that big.LITTLE
style tuning names are rewritten before they are passed to the
assembler. This reduces the coupling needed between GCC versions
and assembler versions.

The rule is simple: we truncate the CPU name at the first '.'
character we see.

Thus -mcpu=cortex-a15.cortex-a7 would be truncated to -mcpu=cortex-a15.
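
The rule can also be shown as a stand-alone sketch that writes into a caller-supplied buffer. This is a hypothetical variant for illustration (the name truncate_cpu_name and its signature are not part of the patch), separate from arm_rewrite_selected_cpu in the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy NAME into BUF, stopping at the first '.' (or at the end of NAME
   if there is none), and return BUF.  The result is always
   NUL-terminated and never longer than BUF_SIZE - 1 characters.  */
static const char *
truncate_cpu_name (const char *name, char *buf, size_t buf_size)
{
  assert (name != NULL && buf != NULL && buf_size > 0);

  /* Length of the leading part of NAME that contains no '.'.  */
  size_t len = strcspn (name, ".");
  if (len >= buf_size)
    len = buf_size - 1;

  memcpy (buf, name, len);
  buf[len] = '\0';
  return buf;
}
```

Called with "cortex-a15.cortex-a7" this yields "cortex-a15", and a name without a '.' such as "cortex-a9" comes back unchanged.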

Bootstrapped on a ChromeBook and checked for an arm-none-eabi and
an arm-none-linux-gnueabi build.

Thanks,
James

---
gcc/

2013-12-17  James Greenhalgh  

* common/config/arm/arm-common.c (arm_rewrite_selected_cpu): New.
(arm_rewrite_mcpu): Likewise.
* config/arm/arm-protos.h (arm_rewrite_selected_cpu): New.
* config/arm/arm.h (BIG_LITTLE_SPEC): New.
(BIG_LITTLE_SPEC_FUNCTIONS): Likewise.
(EXTRA_SPEC_FUNCTIONS): Include BIG_LITTLE_SPEC_FUNCTIONS.
(ASM_CPU_SPEC): Include BIG_LITTLE_SPEC.
* config/arm/arm.c (arm_file_start): Rewrite arm_selected_cpu values.
diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index c43a2ce..87f18ec 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -63,6 +63,41 @@ arm_except_unwind_info (struct gcc_options *opts)
   return UI_SJLJ;
 }
 
+#define ARM_CPU_NAME_LENGTH 20
+
+/* Truncate NAME at the first '.' character seen, or return
+   NAME unmodified.  */
+
+const char *
+arm_rewrite_selected_cpu (const char *name)
+{
+  static char output_buf[ARM_CPU_NAME_LENGTH + 1] = {0};
+  char *arg_pos;
+
+  strncpy (output_buf, name, ARM_CPU_NAME_LENGTH);
+  arg_pos = strchr (output_buf, '.');
+
+  /* If we found a '.' truncate the entry at that point.  */
+  if (arg_pos)
+*arg_pos = '\0';
+
+  return output_buf;
+}
+
+/* Called by the driver to rewrite a name passed to the -mcpu
+   argument in preparation to be passed to the assembler.  The
+   name will be in ARGV[0], ARGC should always be 1.  */
+
+const char *
+arm_rewrite_mcpu (int argc, const char **argv)
+{
+  gcc_assert (argc == 1);
+  return arm_rewrite_selected_cpu (argv[0]);
+}
+
+#undef ARM_CPU_NAME_LENGTH
+
+
 #undef  TARGET_DEFAULT_TARGET_FLAGS
 #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT | MASK_SCHED_PROLOG)
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index c5b16da..558f134 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -289,4 +289,7 @@ extern bool arm_autoinc_modes_ok_p (enum machine_mode, enum arm_auto_incmodes);
 
 extern void arm_emit_eabi_attribute (const char *, int, int);
 
+/* Defined in gcc/common/config/arm/arm-common.c.  */
+extern const char *arm_rewrite_selected_cpu (const char *name);
+
 #endif /* ! GCC_ARM_PROTOS_H */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7027a26..a4ab6be 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27527,7 +27527,11 @@ arm_file_start (void)
   else if (strncmp (arm_selected_cpu->name, "generic", 7) == 0)
 	asm_fprintf (asm_out_file, "\t.arch %s\n", arm_selected_cpu->name + 8);
   else
-	asm_fprintf (asm_out_file, "\t.cpu %s\n", arm_selected_cpu->name);
+	{
+	  const char* truncated_name
+	= arm_rewrite_selected_cpu (arm_selected_cpu->name);
+	  asm_fprintf (asm_out_file, "\t.cpu %s\n", truncated_name);
+	}
 
   if (TARGET_SOFT_FLOAT)
 	{
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 8b8b80e..6539ec6 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2343,16 +2343,25 @@ extern int making_const_table;
instruction.  */
 #define MAX_LDM_STM_OPS 4
 
+#define BIG_LITTLE_SPEC \
+   " %{mcpu=*:%

[PATCH][ARM] Wire up scheduling for Cortex-A12

2013-12-17 Thread Kyrill Tkachov

Hi all,

This patch wires up the Cortex-A12 instruction scheduling to use the Cortex-A15 
pipeline description and sets the issue rate for it to 2 in arm_issue_rate.


This patch depends on James' recent rework of the tuning parameters posted at
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01477.html

Tested arm-none-eabi on qemu.

Ok for trunk after the prerequisite goes in?

Thanks,
Kyrill

2013-12-17  Kyrylo Tkachov  

* config/arm/arm-cores.def (cortex-a12): Use cortexa15 scheduling.
* config/arm/arm.c (arm_issue_rate): Handle cortexa12.
* config/arm/arm.md (generic_vfp): Remove cortexa12.

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 3264eed..abe7636 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -138,7 +138,7 @@ ARM_CORE("cortex-a5",		cortexa5, cortexa5,		7A,  FL_LDSCHED, cortex_a5)
 ARM_CORE("cortex-a7",		cortexa7, cortexa7,		7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a7)
 ARM_CORE("cortex-a8",		cortexa8, cortexa8,		7A,  FL_LDSCHED, cortex)
 ARM_CORE("cortex-a9",		cortexa9, cortexa9,		7A,  FL_LDSCHED, cortex_a9)
-ARM_CORE("cortex-a12",	  	cortexa12, cortexa12,		7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a12)
+ARM_CORE("cortex-a12",	  	cortexa12, cortexa15,		7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a12)
 ARM_CORE("cortex-a15",		cortexa15, cortexa15,		7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
 ARM_CORE("cortex-r4",		cortexr4, cortexr4,		7R,  FL_LDSCHED, cortex)
 ARM_CORE("cortex-r4f",		cortexr4f, cortexr4f,		7R,  FL_LDSCHED, cortex)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 0fc6b76..0d773bb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28979,6 +28979,7 @@ arm_issue_rate (void)
 case cortexa7:
 case cortexa8:
 case cortexa9:
+case cortexa12:
 case cortexa53:
 case fa726te:
 case marvell_pj4:
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 46fc442..c474ff1 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -477,7 +477,7 @@
 (define_attr "generic_vfp" "yes,no"
   (const (if_then_else
 	  (and (eq_attr "fpu" "vfp")
-	   (eq_attr "tune" "!arm1020e,arm1022e,cortexa5,cortexa7,cortexa8,cortexa9,cortexa12,cortexa53,cortexm4,marvell_pj4")
+	   (eq_attr "tune" "!arm1020e,arm1022e,cortexa5,cortexa7,cortexa8,cortexa9,cortexa53,cortexm4,marvell_pj4")
 	   (eq_attr "tune_cortexr4" "no"))
 	  (const_string "yes")
 	  (const_string "no"

Fix devirt2.C testcase

2013-12-17 Thread Jan Hubicka
Hi,
I forgot the following change in my tree.  It fixes the type consistency sanity
check in get_polymorphic_call_info.  With the change to gimple-fold, it is
now needed to devirtualize devirt2.C.  (Previously the bug was latent, since
the old code handled the testcase.)

I am re-testing x86_64-linux and will commit it shortly.  I apologize for
breaking the testcase.

Honza

* ipa-devirt.c (get_polymorphic_call_info): Fix offset when
checking type consistency; do not set bogus outer_type
when check fails.

Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 206040)
+++ ipa-devirt.c(working copy)
@@ -982,23 +982,22 @@ get_polymorphic_call_info (tree fndecl,
 is known.  */
  else if (DECL_P (base))
{
- context->outer_type = TREE_TYPE (base);
  gcc_assert (!POINTER_TYPE_P (context->outer_type));
 
  /* Only type inconsistent programs can have otr_type that is
 not part of outer type.  */
- if (!contains_type_p (context->outer_type,
-   context->offset, *otr_type))
+ if (!contains_type_p (TREE_TYPE (base),
+   context->offset + offset2, *otr_type))
return base_pointer;
+ context->outer_type = TREE_TYPE (base);
  context->offset += offset2;
- base_pointer = NULL;
  /* Make very conservative assumption that all objects
 may be in construction. 
 TODO: ipa-prop already contains code to tell better. 
 merge it later.  */
  context->maybe_in_construction = true;
  context->maybe_derived_type = false;
- return base_pointer;
+ return NULL;
}
  else
break;


Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-17 Thread Thomas Schwinge
Hi!

On Tue, 17 Dec 2013 11:27:51 +0100, Jakub Jelinek  wrote:
> On Tue, Dec 17, 2013 at 11:03:12AM +0100, Thomas Schwinge wrote:
> > My understanding/reasoning is that PRAGMA_OMP_* just literally represents
> > a parser token of a pragma line (see the one-to-one translation in
> > c-parser.c:c_parser_omp_clause_name, for example).  This means that
> > »#pragma omp parallel copyin ([...])« and »#pragma acc parallel copyin
> > ([...])« can share the same PRAGMA_OMP_CLAUSE_COPYIN, even though it
> > means something different to both of them; PRAGMA_OMP_CLAUSE_* alone
> > doesn't convey any meaning (apart from the token/"string" used in the
> > pragma line), and it gets its meaning only if interpreted as part of a
> > Open* construct/directive.  Just like many other tokens only get their
> > semantic meaning when parsed inside a specific language construct.  For
> > OpenACC, the disambiguation, that is, translation from
> > PRAGMA_OMP_CLAUSE_* to OMP_CLAUSE_*...
> > 
> > > That way, you don't have to change anything in c_parser_omp_all_clauses,
> > > just add handling of the 3 clauses that don't have OpenMP counterparts.
> > 
> > ... then indeed happens in a new c_parser_oacc_all_clauses, which parses
> > all the applicable PRAGMA_OMP_CLAUSE_* according to the OpenACC
> > semantics.
> 
> Unlike OpenACC, Cilk+ for the vector attribute has pretty much the OpenMP
> syntax, with just a few exceptions (in particular, 3 clauses have different
> names (and there are extra requirements for vectorlength?) and for linear
> there is an extension on the Cilk+ side.  So, duplicating the
> c_parser_*all_clauses in that case is IMHO not needed, the mask specifies
> which clauses are allowed in the particular construct and the only case
> which needs disambiguation (linear clauses' step) can be disambiguated
> by checking if some Cilk+ specific clause is in the mask (already the
> clause splitting code uses such tests).

Right, if they're that similar, I agree that's the way to go.
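
As a rough illustration of the mask-based approach discussed here, the sketch
below shows how one shared clause parser could serve several constructs when
each construct passes its own mask of permitted clauses.  The enum values and
helper names are hypothetical stand-ins, not GCC's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clause tokens; GCC's real PRAGMA_OMP_CLAUSE_* list is
   much longer.  */
enum pragma_clause
{
  CLAUSE_COPYIN,
  CLAUSE_LINEAR,
  CLAUSE_VECTORLENGTH,   /* Stand-in for a Cilk+-only clause.  */
  CLAUSE_COUNT
};

typedef uint64_t clause_mask;

#define CLAUSE_BIT(c) ((clause_mask) 1 << (c))

/* Return true if clause C may appear in a construct that accepts MASK.  */
static bool
clause_allowed (clause_mask mask, enum pragma_clause c)
{
  return (mask & CLAUSE_BIT (c)) != 0;
}

/* Return true if MASK belongs to a Cilk+ construct, detected by the
   presence of a Cilk+-only clause.  This mirrors the suggestion for
   disambiguating the step of a linear clause.  */
static bool
mask_is_cilkplus (clause_mask mask)
{
  return clause_allowed (mask, CLAUSE_VECTORLENGTH);
}
```

With this shape, adding a construct means supplying a new mask rather than
duplicating the parsing routine.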

> If OpenACC clauses have different names from the OpenMP/Cilk+ ones, I don't
> see why you would need a new *_all_clauses function, just supply a different
> mask

OpenACC clauses share some clause names and their semantics with OpenMP,
and some new ones, but there are also several that have the same name
(such as said copyin) but with a different meaning.

For example for the copyin case, my understanding is that I can either
re-use the existing PRAGMA_OMP_CLAUSE_COPYIN but then need to interpret
it differently in an OpenACC context, and thus need a new
c_parser_oacc_all_clauses (or add some ugly »if ([inside OpenACC
directive]) { [...] } else { [existing OpenMP code]}«.  Alternatively, I
have to add a new PRAGMA_OMP_CLAUSE_OACC_COPYIN, and can then use the
existing c_parser_omp_all_clauses.  Due to the perceived one-to-one
mapping between the existing PRAGMA_OMP_CLAUSE_* and the tokens (such as
"copyin"), the former seemed more appropriate to me, as detailed above.

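The first option (one shared token, separate translation per Open* dialect)
might look roughly like the sketch below.  All type and function names are
hypothetical miniatures and not the real GCC enums; the only facts taken from
this thread are that the same "copyin" token is read differently by OpenMP
and OpenACC, and that the OpenACC reading is a map-to clause.

```c
/* Hypothetical miniature of the parser token enum.  */
enum pragma_clause { PRAGMA_CLAUSE_COPYIN };

/* Hypothetical miniature of the tree-level clause codes and map kinds.  */
enum clause_code { CLAUSE_CODE_COPYIN, CLAUSE_CODE_MAP };
enum map_kind { MAP_NONE, MAP_TO };

struct clause
{
  enum clause_code code;
  enum map_kind kind;
};

/* OpenMP reading: copyin keeps its own clause code.  */
static struct clause
omp_translate (enum pragma_clause tok)
{
  struct clause c = { CLAUSE_CODE_COPYIN, MAP_NONE };
  (void) tok;   /* Only one token exists in this sketch.  */
  return c;
}

/* OpenACC reading: the same token becomes a map clause of kind "to".  */
static struct clause
oacc_translate (enum pragma_clause tok)
{
  struct clause c = { CLAUSE_CODE_MAP, MAP_TO };
  (void) tok;
  return c;
}
```

The token enum stays one-to-one with the pragma text, and the meaning is
attached only by whichever translation routine the enclosing directive uses.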

> > For example, said PRAGMA_OMP_CLAUSE_COPYIN is translated to
> > OMP_CLAUSE_MAP with OMP_CLAUSE_MAP_TO, and the (new)
> > PRAGMA_OMP_CLAUSE_PRESENT_OR_COPYOUT (which is only interpreted/valid
> > inside OpenACC contexts) is translated to OMP_CLAUSE_MAP with (new)
> > OMP_CLAUSE_MAP_PRESENT_OR_FROM (which is only interpreted/valid inside
> > OpenACC contexts).
> 
> This is weird, because present or {alloc,from,to,fromto} is the OpenMP
> behavior, so I'd expect you would be adding a bit for the other, non-OpenMP
> compatible behavior instead.

I'm currently working on this.  Per my current understanding, in the
front end and middle end (gimplify.c, omp-low.c), the handling of
present_or_* vs. their "normal" variants is the same for OpenACC, and the
difference is only apparent once they're interpreted in the runtime
library, which is free to decide which way round to interpret the
present_or bit.  Anyway, you're absolutely right that I should preserve
the same meaning of the map kinds, such as OMP_CLAUSE_MAP_FROM, for both
the OpenACC (meaning present_or_copyout) and OpenMP (meaning map from)
entry points, to avoid confusion at this level.  Especially if their
semantics are exactly the same (to be checked), and especially given
we're trying to converge on the shared infrastructure, as I'm advocating
it myself...


I hope to post some code soon, which will hopefully help to display my
ideas.


Grüße,
 Thomas




[RFC][gomp4] Offloading patches (1/3): Add '-fopenmp_target' option

2013-12-17 Thread Michael V. Zolotukhin
Hi everybody,

Here is a set of patches implementing one more piece of offloading support in
GCC.  These three patches make it possible to build a host binary with the
target image and all tables embedded.  Along with patches for libgomp and the
libgomp plugin, which will hopefully be sent soon, that gives a functional and
runnable executable (or DSO) with actual offloading to MIC.

There is still a lot to do in this area, but these are the necessary basics:
with this we can actually run offloaded code produced entirely by the compiler.

We would like to hear any feedback on these patches: what issues we should
address first before commit (if any), how the patches fit OpenACC work, etc.

Here is patch 1/3: Add '-fopenmp_target' option.  This option tells lto1 to look
for "*.target_lto*" sections instead of the usual "*.lto*".  The option is passed
to the target compiler when we invoke it to build the target image.
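
A minimal sketch of the prefix switch, for illustration only: the value given
to OMP_SECTION_NAME_PREFIX below is an assumption (the patch does not show
it), and the two helper functions are hypothetical names, not part of the
patch.

```c
#include <stdbool.h>
#include <string.h>

#define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
/* Assumed value, for illustration only; the patch does not show it.  */
#define OMP_SECTION_NAME_PREFIX ".gnu.target_lto_"

/* Pick the prefix the section reader should look for.  */
static const char *
select_section_prefix (bool openmp_target)
{
  return openmp_target ? OMP_SECTION_NAME_PREFIX : LTO_SECTION_NAME_PREFIX;
}

/* Return true if section NAME starts with PREFIX.  */
static bool
section_has_prefix (const char *name, const char *prefix)
{
  return strncmp (name, prefix, strlen (prefix)) == 0;
}
```

Keeping the prefix in one variable means the section-matching code does not
need to know which of the two compilers it is running in.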

Thanks,
Michael


---
 gcc/lto/lang.opt |4 
 gcc/lto/lto-object.c |5 +++--
 gcc/lto/lto.c|7 ++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/lto/lang.opt b/gcc/lto/lang.opt
index 7a9aede..cd0098c 100644
--- a/gcc/lto/lang.opt
+++ b/gcc/lto/lang.opt
@@ -40,4 +40,8 @@ fresolution=
 LTO Joined
 The resolution file
 
+fopenmp_target
+LTO Var(flag_openmp_target)
+Run LTO infrastructure to read target-side bytecode and to build it.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
index 19f10cc..64274f3 100644
--- a/gcc/lto/lto-object.c
+++ b/gcc/lto/lto-object.c
@@ -59,6 +59,8 @@ struct lto_simple_object
 
 static simple_object_attributes *saved_attributes;
 
+extern const char *section_name_prefix;
+
 /* Initialize FILE, an LTO file object for FILENAME.  */
 
 static void
@@ -229,8 +231,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
   void **slot;
   struct lto_section_list *list = loasd->list;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX,
-  strlen (LTO_SECTION_NAME_PREFIX)) != 0)
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
 return 1;
 
   new_name = xstrdup (name);
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 0211437..dedf8a8 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -49,6 +49,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "context.h"
 #include "pass_manager.h"
 
+extern const char *section_name_prefix;
+
 /* Vector to keep track of external variables we've seen so far.  */
 vec *lto_global_var_decls;
 
@@ -2081,7 +2083,7 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
 {
   const char *s;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX)))
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
 return 0;
   s = strrchr (name, '.');
   return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1;
@@ -2757,6 +2759,9 @@ read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
 
   timevar_push (TV_IPA_LTO_DECL_IN);
 
+  if (flag_openmp_target)
+section_name_prefix = OMP_SECTION_NAME_PREFIX;
+
   real_file_decl_data
 = decl_data = ggc_alloc_cleared_vec_lto_file_decl_data_ptr (nfiles + 1);
   real_file_count = nfiles;
-- 
1.7.1





[RFC][gomp4] Offloading patches (2/3): Add tables generation

2013-12-17 Thread Michael V. Zolotukhin
Hi everybody,

Here is patch 2/3: Add tables generation.

This patch is a slightly modified version of the patch sent a couple of weeks
ago.  When compiling with '-fopenmp', the compiler generates a special symbol
containing the addresses and sizes of globals/omp_fn-functions, and places it
into a special section.  Later, at link time, these sections are merged together
and we get a single table with all addresses/sizes for the entire binary.  Also,
in this patch we start to pass the '__OPENMP_TARGET__' symbol to GOMP_target
calls.
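
To make the table layout concrete, here is a small sketch of reading a flat
table of (address, size) pairs.  The struct, the function names, and the use
of uintptr_t words are assumptions made for illustration; the patch itself
builds the table as a constructor of pointer-typed elements inside the
compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of one table entry.  In the patch, functions
   are recorded with a size of 1.  */
struct table_entry
{
  uintptr_t addr;
  uintptr_t size;
};

/* Number of entries in a table of N_WORDS words (two words per entry).  */
static size_t
count_entries (size_t n_words)
{
  return n_words / 2;
}

/* Decode entry INDEX from the flat word array TABLE.  */
static struct table_entry
read_entry (const uintptr_t *table, size_t index)
{
  struct table_entry e = { table[2 * index], table[2 * index + 1] };
  return e;
}
```

Because every object file contributes whole pairs, concatenating the
per-object sections at link time yields a table that can be walked the same
way.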

Thanks,
Michael


---
 gcc/omp-low.c |  119 +
 gcc/omp-low.h |1 +
 gcc/toplev.c  |3 +
 3 files changed, 115 insertions(+), 8 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e0f7d1d..f860204 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "optabs.h"
 #include "cfgloop.h"
 #include "target.h"
+#include "common/common-target.h"
 #include "omp-low.h"
 #include "gimple-low.h"
 #include "tree-cfgcleanup.h"
@@ -8371,19 +8372,22 @@ expand_omp_target (struct omp_region *region)
 }
 
   gimple g;
-  /* FIXME: This will be address of
- extern char __OPENMP_TARGET__[] __attribute__((visibility ("hidden")))
- symbol, as soon as the linker plugin is able to create it for us.  */
-  tree openmp_target = build_zero_cst (ptr_type_node);
+  tree openmp_target
+= build_decl (UNKNOWN_LOCATION, VAR_DECL,
+ get_identifier ("__OPENMP_TARGET__"), ptr_type_node);
+  TREE_PUBLIC (openmp_target) = 1;
+  DECL_EXTERNAL (openmp_target) = 1;
   if (kind == GF_OMP_TARGET_KIND_REGION)
 {
   tree fnaddr = build_fold_addr_expr (child_fn);
-  g = gimple_build_call (builtin_decl_explicit (start_ix), 7,
-device, fnaddr, openmp_target, t1, t2, t3, t4);
+  g = gimple_build_call (builtin_decl_explicit (start_ix), 7, device,
+fnaddr, build_fold_addr_expr (openmp_target),
+t1, t2, t3, t4);
 }
   else
-g = gimple_build_call (builtin_decl_explicit (start_ix), 6,
-  device, openmp_target, t1, t2, t3, t4);
+g = gimple_build_call (builtin_decl_explicit (start_ix), 6, device,
+  build_fold_addr_expr (openmp_target),
+  t1, t2, t3, t4);
   gimple_set_location (g, gimple_location (entry_stmt));
   gsi_insert_before (&gsi, g, GSI_SAME_STMT);
   if (kind != GF_OMP_TARGET_KIND_REGION)
@@ -12379,4 +12383,103 @@ make_pass_omp_simd_clone (gcc::context *ctxt)
   return new pass_omp_simd_clone (ctxt);
 }
 
+/* Helper function for omp_finish_file routine.
+   Takes decls from V_DECLS and adds their addresses and sizes to
+   constructor-vector V_CTOR.  It will be later used as DECL_INIT for decl
+   representing a global symbol for OpenMP descriptor.
+   If IS_FUNCTION is true, we use 1 for size.  */
+static void
+add_decls_addresses_to_decl_constructor (vec *v_decls,
+vec *v_ctor,
+bool is_function)
+{
+  unsigned int len = 0, i;
+  tree size, it;
+  len = vec_safe_length (v_decls);
+  for (i = 0; i < len; i++)
+{
+  /* Decls are placed in reversed order in fat-objects, so we need to
+revert them back if we compile target.  */
+  if (!flag_openmp_target)
+   it = (*v_decls)[i];
+  else
+   it = (*v_decls)[len - i - 1];
+  size = is_function ? integer_one_node : DECL_SIZE (it);
+  CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, build_fold_addr_expr (it));
+  CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE,
+ fold_convert (const_ptr_type_node,
+   size));
+}
+}
+
+/* Create new symbol containing (address, size) pairs for omp-marked
+   functions and global variables.  */
+void
+omp_finish_file (void)
+{
+  struct cgraph_node *node;
+  struct varpool_node *vnode;
+  const char *section_name = ".offload_func_table_section";
+  tree new_decl, new_decl_type;
+  vec *v;
+  vec *v_func, *v_var;
+  tree ctor;
+  int num = 0;
+
+  if (!targetm_common.have_named_sections)
+return;
+
+  vec_alloc (v_func, 0);
+  vec_alloc (v_var, 0);
+
+  /* Collect all omp-target functions.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+{
+  /* TODO: This check could fail on functions, created by omp
+parallel/task pragmas.  It's better to name outlined for offloading
+functions in some different way and to check here the function name.
+It could be something like "*_omp_tgtfn" in contrast with "*_omp_fn"
+for functions from omp parallel/task pragmas.  */
+  if (!lookup_attribute ("omp declare target",
+DECL_ATTRIBUTES (node->decl))
+ || !DECL_ARTIFICIAL (node->decl))
+   continue;
+  vec_safe_push (v_func, node->decl);
+  num ++;
+}
+  /* Collect a

[RFC][gomp4] Offloading patches (3/3): Add invocation of target compiler

2013-12-17 Thread Michael V. Zolotukhin
Hi everybody,

Here is patch 3/3: Add invocation of target compiler.

With this patch lto-wrapper invokes the target compilers and embeds
the resulting target images into the host binary.  The targets and the
corresponding compilers are expected to be specified in special environment
variables (that works well with Andrey's recent patch).  For now, we need
the '-flto' option for the infrastructure to run, but I think we could remove
this requirement in the future.
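
For reference, a portable sketch of splitting such a ':'-delimited variable
value into tokens.  The function name is hypothetical and this is not the
code from the patch: it uses plain strchr instead of the GNU-specific
strchrnul, reports allocation failure by returning 0, and never advances past
the terminating NUL after the last token.

```c
#include <stdlib.h>
#include <string.h>

/* Split STR at ':' into freshly allocated strings stored in *PVALUES and
   return the token count.  Empty tokens are kept.  Return 0 if an
   allocation fails.  */
static unsigned
split_colon_list (const char *str, char ***pvalues)
{
  unsigned num = 1, i;
  const char *p;
  char **values;

  /* One more token than there are delimiters.  */
  for (p = str; *p; p++)
    if (*p == ':')
      num++;

  values = (char **) malloc (num * sizeof *values);
  if (!values)
    return 0;

  p = str;
  for (i = 0; i < num; i++)
    {
      const char *end = strchr (p, ':');
      size_t len = end ? (size_t) (end - p) : strlen (p);

      values[i] = (char *) malloc (len + 1);
      if (!values[i])
        {
          while (i > 0)
            free (values[--i]);
          free (values);
          return 0;
        }
      memcpy (values[i], p, len);
      values[i][len] = '\0';
      /* Step past the delimiter only when there is one.  */
      p = end ? end + 1 : p + len;
    }

  *pvalues = values;
  return num;
}
```

A caller would pass the result of getenv for a variable such as
OFFLOAD_TARGET_NAMES and free each token and the array when done.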

We generate C-files which are used for creating symbols for descriptor header
and descriptor end.  With this and after some manipulations with symbols (we
need to call objcopy a couple of times for this) all files: i.e. target images,
descriptor-header and descriptor-end - are linked together into a new object
file which is passed to the linker as an output of lto-wrapper.

Thanks,
Michael

---
 gcc/lto-wrapper.c |  560 +
 1 files changed, 560 insertions(+), 0 deletions(-)

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 335ec8f..a9085b5 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -52,6 +52,11 @@ along with GCC; see the file COPYING3.  If not see
 
 #define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
 
+#define OFFLOAD_FUNC_TABLE_SECTION_NAME ".offload_func_table_section"
+#define OFFLOAD_IMAGE_SECTION_NAME ".offload_image_section"
+#define OFFLOAD_TARGET_NAMES_ENV   "OFFLOAD_TARGET_NAMES"
+#define OFFLOAD_TARGET_COMPILERS_ENV   "OFFLOAD_TARGET_COMPILERS"
+
 /* End of lto-streamer.h copy.  */
 
 int debug; /* true if -save-temps.  */
@@ -447,6 +452,540 @@ merge_and_complain (struct cl_decoded_option **decoded_options,
 }
 }
 
+
+/* Parse STR, saving found tokens into PVALUES and return their number.
+   Tokens are assumed to be delimited by ':'.  */
+
+static unsigned
+parse_env_var (const char *str, char ***pvalues)
+{
+  const char *curval, *nextval;
+  char **values;
+  unsigned num = 1, i;
+
+  curval = strchr (str, ':');
+  while (curval)
+{
+  num++;
+  curval = strchr (curval + 1, ':');
+}
+
+  values = (char**) xmalloc (num * sizeof (char*));
+  curval = str;
+  nextval = strchrnul (curval, ':');
+  for (i = 0; i < num; i++)
+{
+  int l = nextval - curval;
+  values[i] = (char*) xmalloc (l + 1);
+  memcpy (values[i], curval, l);
+  values[i][l] = 0;
+
+  curval = nextval + 1;
+  nextval = strchrnul (curval, ':');
+}
+  *pvalues = values;
+  return num;
+}
+
+/* Generate openmp-descriptor file.  The function generates source-file and then
+   compiles it with COLLECT_GCC.
+   NAMES are the names of the targets, they are used in names of generated
+   symbols.  NUM is the number of targets.
+   FOR_TARGET specifies whether we generate descriptor for host or for
+   target-side.  We add pointers to images into the table only for host side.
+   Return value is the name of the generated object file.  */
+
+static char*
+generate_descriptor_file (int num, char **names, const char *collect_gcc,
+ bool for_target = false)
+{
+  const char **target_argv;
+  struct obstack target_argv_obstack;
+  FILE *desc_src_file = NULL;
+  char *desc_src_filename = NULL;
+  char *desc_obj_filename = NULL;
+  int i;
+
+  desc_src_filename = make_temp_file ("_omp_descr.c");
+  desc_src_file = fopen (desc_src_filename, "wb");
+  if (!desc_src_file)
+{
+  free (desc_src_filename);
+  return NULL;
+}
+
+  for (i = 0; i < num; i++)
+{
+  fprintf (desc_src_file, "extern void *_omp_image_%s_start;\n", names[i]);
+  fprintf (desc_src_file, "extern void *_omp_image_%s_end;\n", names[i]);
+}
+  fprintf (desc_src_file, "extern void *_omp_func_table[];\n");
+  fprintf (desc_src_file, "extern void *_omp_table_end[];\n");
+  fprintf (desc_src_file,
+  "void *__OPENMP_TARGET__[]\n"
+  "  __attribute__ ((__used__, visibility (\"protected\"),\n"
+  "  section (\"%s\"))) = {\n",
+  OFFLOAD_IMAGE_SECTION_NAME);
+  /* First two elements describes Openmp Functions/Globals table.
+ Target side descriptor contains nothing else.  Host side descriptor
+ contains pointers to images after that.  */
+  fprintf (desc_src_file, "  &_omp_func_table, &_omp_table_end,\n");
+  if (!for_target)
+for (i = 0; i < num; i++)
+  fprintf (desc_src_file, "  &_omp_image_%s_start, &_omp_image_%s_end,\n",
+  names[i], names[i]);
+  fprintf (desc_src_file, "};\n");
+
+  if (!for_target)
+fprintf (desc_src_file,
+"void GOMP_register_lib (const void *);\n"
+"__attribute__((constructor))\n"
+"static void\n"
+"init (void)\n"
+"{\n"
+"  GOMP_register_lib (__OPENMP_TARGET__);\n"
+"}\n");
+  else
+fprintf (desc_src_file,
+"void target_register_lib (const void *);\n"
+"__attribute__((constructor))\n"
+"static void\n"
+"init (void)\n"
+"{\n"
+"  target

[RS6000] bswapdi2 pattern, reload and lra

2013-12-17 Thread Alan Modra
This patch is aimed at fixing test failures introduced by my
2013-12-07 change to bswapdi2_32bit:
FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times lwbrx 6
FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times stwbrx 6

The 2013-12-07 change was necessary to make -m32 -mlra generate good
code for pr53199.c:reg_reverse.  Too many '?'s on the r->r alternative
result in lra never choosing that option.  Instead we get Z->r,
ie. storing the input reg to a stack temp then using lwbrx from there.
That means we have a load-hit-store flush with a severe slowdown.
(See http://gcc.gnu.org/ml/gcc-patches/2013-12/msg2.html for the
corresponding -m64 result, a 4x slowdown.)

A similar problem occurs with -m64 -mcpu=power7 -mlra due to
bswapdi2_ldbrx having two '?'s on r->r.

To fix this I ran into a whole lot of pain.  reload and lra are quite
different in their selection of insn alternatives.  I could not find
any combination of '?'s that generated the best code for both reload
and lra on pr53199.c.  To see why, it's necessary to look (from a great
height) at how reload chooses amongst alternatives.  A particular
alternative gets a "loser" score, with the lowest "loser" being
chosen.  "loser" is incremented
a) when an operand does not match its constraint.
b) when an alternative has a '?' in the constraint.
c) when a scratch register is required.
d) when an early clobber output clashes with one of the inputs.

a) is fairly obvious.  For example, if we have a MEM when the operand
   alternative needs a reg, then we'll require a reload.
b) is also quite obvious.  Multiple '?'s accumulate.
c) is a little more subtle.  It bites you when alternatives require
   differing numbers of scratch registers.  Take for example
   bswapdi2_64bit, which before this patch has three alternatives
   Z->r with 3 scratch regs, (Z is a subset of m)
   r->Z with 2 scratch regs,
   r->r with 3 scratch regs.
   All other things being equal, with reload you could correct this
   disparity by adding a '?' to the r->Z alternative.  We might want
   to do that so that Z->r and r->r are the same "distance" apart
   as r->Z is from r->r.  With lra it seems that scratch regs are
   weighted differently.
d) is also tricky, and a trap for anyone optimizing insn selection for
   functions like some in pr53199.c that have just one rtl insn with
   early clobbers.  PowerPC generally returns function results in the
   same register as the first argument, so these hit the early
   clobber.  Code elsewhere in larger functions probably won't.
   lra penalizes early clobbers differently to reload (a lot less).
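
A toy model of the scoring described in (a) to (d), purely to make the
arithmetic concrete.  The struct, the function names, and above all the unit
weights are assumptions for illustration; reload's real increments differ,
and lra weighs these factors differently again.

```c
#include <stddef.h>

/* Toy description of one insn alternative.  Each field contributes one
   point per unit, which is an assumption made for illustration.  */
struct alternative
{
  int unmatched_operands;   /* (a) operands not matching their constraint.  */
  int question_marks;       /* (b) '?' characters in the constraint.  */
  int scratch_regs;         /* (c) scratch registers required.  */
  int earlyclobber_clash;   /* (d) 1 if an early clobber hits an input.  */
};

/* Sum the penalties for ALT.  */
static int
loser_score (const struct alternative *alt)
{
  return alt->unmatched_operands + alt->question_marks
         + alt->scratch_regs + alt->earlyclobber_clash;
}

/* Return the index of the alternative with the lowest score; ties go to
   the earliest alternative.  */
static size_t
choose_alternative (const struct alternative *alts, size_t n)
{
  size_t best = 0, i;

  for (i = 1; i < n; i++)
    if (loser_score (&alts[i]) < loser_score (&alts[best]))
      best = i;
  return best;
}
```

Feeding in the pre-patch bswapdi2_64bit scratch counts (3, 2, 3 for Z->r,
r->Z, r->r) with everything else equal makes r->Z win; adding a single '?' to
r->Z evens the three scores out, which is the correction described in (c).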

So, putting this all together: the test implication of point (d) is covered
by the additional functions in pr53199.c.  Avoiding early clobbers
where possible is good since it reduces differences between reload and
lra insn alternative costing.  We also generate better code.  I
managed to do this for all the Z->r bswapdi patterns, but stopped
short of changing r->r as well since at that point everything looked
OK!  Avoiding extra scratch registers for one alternative is also good.
The op4 r->r scratch wasn't even used, and op4 for Z->r was fairly easy to
do without.  Renaming word_high/low to word1/2 was to make it a little
easier to track lifetime of addr1/2.

Bootstrapped and regression tested powerpc64-linux.  Output of
pr53199.c inspected for sanity with -mcpu=power{6,7} -m{32,64} and
{-mlra,}.  OK to apply?

gcc/
* config/rs6000/rs6000.md (bswapdi2): Remove one scratch reg.
Modify Z->r bswapdi splitter to use dest in place of scratch.
In r->Z and Z->r bswapdi splitter rename word_high, word_low
to word1, word2 and rearrange logic to suit.
(bswapdi2_64bit): Remove early clobber on Z->r alternative.
(bswapdi2_ldbrx): Likewise.  Remove '??' on r->r.
(bswapdi2_32bit): Remove early clobber on Z->r alternative.
Add one '?' on r->r.  Modify Z->r splitter to avoid need for
early clobber.
gcc/testsuite/
* gcc.target/powerpc/pr53199.c: Add extra functions.

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 206009)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -2344,8 +2344,7 @@
   (bswap:DI
(match_operand:DI 1 "reg_or_mem_operand" "")))
  (clobber (match_scratch:DI 2 ""))
- (clobber (match_scratch:DI 3 ""))
- (clobber (match_scratch:DI 4 ""))])]
+ (clobber (match_scratch:DI 3 ""))])]
   ""
 {
   if (!REG_P (operands[0]) && !REG_P (operands[1]))
@@ -2363,11 +2362,10 @@
 
 ;; Power7/cell has ldbrx/stdbrx, so use it directly
 (define_insn "*bswapdi2_ldbrx"
-  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,??&r")
+  [(set (match_operand:DI 0 "reg_or_mem_operand" "=r,Z,&r")
(bswap:DI (match_operand:DI 1 "reg_or_mem_operand" "Z,r,r")))
(clobber (match_scratch:DI 2 "=X,X,&r"))
-   (clob

Re: [ARM 1/5 big.LITTLE] Add driver support for rewriting -mcpu names

2013-12-17 Thread Richard Earnshaw
On 17/12/13 10:40, James Greenhalgh wrote:
> 
> Hi,
> 
> This patch adds machinery to the driver to ensure that big.LITTLE
> style tuning names are rewritten before they are passed to the
> assembler. This reduces the coupling needed between GCC versions
> and assembler versions.
> 
> The rule is simple, we truncate the CPU name at the first '.'
> character we see.
> 
> Thus -mcpu=cortex-a15.cortex-a7 would be truncated to -mcpu=cortex-a15.
> 
> Bootstrapped on a ChromeBook and checked for an arm-none-eabi and
> an arm-none-linux-gnueabi build.
> 
> Thanks,
> James
> 
> ---
> gcc/
> 
> 2013-12-17  James Greenhalgh  
> 
>   * common/config/arm/arm-common.c (arm_rewrite_selected_cpu): New.
>   (arm_rewrite_mcpu): Likewise.
>   * config/arm/arm-protos.h (arm_rewrite_selected_cpu): New.
>   * config/arm/arm.h (BIG_LITTLE_SPEC): New.
>   (BIG_LITTLE_SPEC_FUNCTIONS): Likewise.
>   (EXTRA_SPEC_FUNCTIONS): Include BIG_LITTLE_SPEC_FUNCTIONS.
>   (ASM_CPU_SPEC): Include BIG_LITTLE_SPEC.
>   * config/arm/arm.c (arm_file_start): Rewrite arm_selecetd_cpu values.
> 

OK.

R.
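
For readers following the thread, the truncation rule quoted above can be
sketched in isolation as below.  The function name and the caller-supplied
buffer are assumptions chosen for this example; the patch itself uses a
static buffer inside arm_rewrite_selected_cpu.

```c
#include <stddef.h>
#include <string.h>

/* Copy NAME into OUT (capacity OUT_SIZE, which must be nonzero), stopping
   at the first '.' or when the buffer is full.  Return OUT.  */
static const char *
truncate_cpu_name (const char *name, char *out, size_t out_size)
{
  /* Length of the leading part of NAME that contains no '.'.  */
  size_t len = strcspn (name, ".");

  if (len >= out_size)
    len = out_size - 1;
  memcpy (out, name, len);
  out[len] = '\0';
  return out;
}
```

Passing the buffer in keeps the helper reentrant, at the cost of making each
caller choose a capacity.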




Re: [ARM 2/5 big.LITTLE] Allow tuning parameters without unique tuning targets.

2013-12-17 Thread Richard Earnshaw
On 17/12/13 10:40, James Greenhalgh wrote:
> 
> Hi,
> 
> A limitation in the ARM backend is that each core added to arm-cores.def
> must provide a unique identifier to be used for tuning. This restricts
> us when we want to share the same identifier between a number of cores.
> 
> The machinery here is a bit messy, and we don't really make it any nicer
> in this patch. But, this change does allow you to add core names which
> use other tuning targets easily.
> 
> This, for example allows us to wire up -mcpu=cortex-a15.cortex-a7 to
> use the scheduler description for Cortex-A7 without requiring
> modifications to the Cortex-A7 scheduler description.
> 
> Bootstrapped in series and checked on arm-none-linux-gnueabi and
> arm-none-eabi.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> gcc/
> 
> 2013-12-17  James Greenhalgh  
> 
>   * config/arm/arm-cores.def: Add new column for TUNE_IDENT.
>   * config/arm/genopt.sh: Improve layout.
>   * config/arm/arm-tune.md: Regenerate.
>   * config/arm/arm-tables.opt: Regenerate.
>   * config/arm/arm-opts.h (ARM_CORE): Modify macro for TUNE_IDENT.
>   * config/arm/arm.c (ARM_CORE): Modify macro for TUNE_IDENT.
>   (arm_option_override): When a CPU is chosen, that should also
>   form the tune target.
>   * config/arm/arm.h (ARM_CORE): Modify macro for TUNE_IDENT.
> 

OK.

R.




Re: [ARM 3/5 big.LITTLE] Add support for -mcpu=cortex-a15.cortex-a7

2013-12-17 Thread Richard Earnshaw
On 17/12/13 10:40, James Greenhalgh wrote:
> 
> Hi,
> 
> This patch wires up -mcpu=cortex-a15.cortex-a7 as an option to
> -mcpu.
> 
> Bootstrapped in series, with --with-cpu=cortex-a15.cortex-a7.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> 2013-12-17  James Greenhalgh  
> 
>   * config/arm/arm-cores.def (cortex-a15.cortex-a7): New.
>   * doc/invoke.texi: Document -mcpu=cortex-a15.cortex-a7.
>   * config/arm/arm-tables.opt: Regenerate.
>   * config/arm/arm-tune.md: Regenerate.
>   * config/arm/bpabi.h
>   (BE8_LINK_SPEC): Handle -mcpu=cortex-a5.cortex-a7.
> 
> 
OK.

R.




Re: [ARM 4/5 big.LITTLE] Add support for -mcpu=cortex-a57

2013-12-17 Thread Richard Earnshaw
On 17/12/13 10:40, James Greenhalgh wrote:
> 
> Hi,
> 
> This patch wires up -mcpu=cortex-a57 as an option to
> -mcpu. As we don't yet have a scheduling model for Cortex-A57
> available, for now we use the scheduling description for another
> "big" core, the Cortex-A15.
> 
> Bootstrapped in series and sanity checked.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> 2013-12-17  James Greenhalgh  
> 
>   * config/arm/arm-cores.def (cortex-a57): New.
>   * doc/invoke.texi: Document -mcpu=cortex-a57.
>   * config/arm/arm-tables.opt: Regenerate.
>   * config/arm/arm-tune.md: Regenerate.
>   * config/arm/bpabi.h (BE8_LINK_SPEC): Handle -mcpu=cortex-a57.
> 
> 

OK.

R.




Re: [ARM 5/5 big.LITTLE] Add support for -mcpu=cortex-a57.cortex-a53

2013-12-17 Thread Richard Earnshaw
On 17/12/13 10:40, James Greenhalgh wrote:
> 
> Hi,
> 
> This patch wires up -mcpu=cortex-a57.cortex-a53 as an option to
> -mcpu.
> 
> Bootstrapped in series, and sanity checked.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> 2013-12-17  James Greenhalgh  
> 
>   * config/arm/arm-cores.def (cortex-a57.cortex-a53): New.
>   * doc/invoke.texi: Document -mcpu=cortex-a57.cortex-a53.
>   * config/arm/arm-tables.opt: Regenerate.
>   * config/arm/arm-tune.md: Regenerate.
>   * config/arm/bpabi.h
>   (BE8_LINK_SPEC): Handle -mcpu=cortex-a57.cortex-a53.
> 
> 

OK.

R.




Re: [PATCH][ARM] Wire up scheduling for Cortex-A12

2013-12-17 Thread Richard Earnshaw
On 17/12/13 10:51, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch wires up the Cortex-A12 instruction scheduling to use the 
> Cortex-A15 
> pipeline description and sets the issue rate for it to 2 in arm_issue_rate.
> 
> This patch depends on James' recent rework of the tuning parameters posted at
> http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01477.html
> 
> Tested arm-none-eabi on qemu.
> 
> Ok for trunk after the prerequisite goes in?
> 
> Thanks,
> Kyrill
> 
> 2013-12-17  Kyrylo Tkachov  
> 
>  * config/arm/arm-cores.def (cortex-a12): Use cortexa15 scheduling.
>  * config/arm/arm.c (arm_issue_rate): Handle cortexa12.
>  * config/arm/arm.md (generic_vfp): Remove cortexa12.
> 
> 
OK.

R.



Minor cleanup in expmed.c

2013-12-17 Thread Eric Botcazou
This fixes a few glitches introduced by the recent changes to the file.

Tested on x86-64/Linux, applied on the mainline as obvious.


2013-12-17  Eric Botcazou  

* expmed.c (lowpart_bit_field_p): Fix comment.
(store_bit_field_using_insv): Fix formatting.
(store_bit_field): Likewise.
(store_fixed_bit_field): Move declaration and remove return.
(store_fixed_bit_field_1): Fix formatting.
(extract_fixed_bit_field): Move declaration.
(extract_fixed_bit_field_1): Simplify.


-- 
Eric Botcazou

Index: expmed.c
===
--- expmed.c	(revision 206039)
+++ expmed.c	(working copy)
@@ -422,7 +422,7 @@ lowpart_bit_field_p (unsigned HOST_WIDE_
 return bitnum % BITS_PER_WORD == 0;
 }
 
-/* Return true if -fstrict-volatile-bitfields applies an access of OP0
+/* Return true if -fstrict-volatile-bitfields applies to an access of OP0
containing BITSIZE bits starting at BITNUM, with field mode FIELDMODE.
Return false if the access would touch memory outside the range
BITREGION_START to BITREGION_END for conformance to the C++ memory
@@ -490,7 +490,8 @@ simple_mem_bitfield_p (rtx op0, unsigned
 static bool
 store_bit_field_using_insv (const extraction_insn *insv, rtx op0,
 			unsigned HOST_WIDE_INT bitsize,
-			unsigned HOST_WIDE_INT bitnum, rtx value)
+			unsigned HOST_WIDE_INT bitnum,
+			rtx value)
 {
   struct expand_operand ops[4];
   rtx value1;
@@ -940,7 +941,6 @@ store_bit_field (rtx str_rtx, unsigned H
   if (strict_volatile_bitfield_p (str_rtx, bitsize, bitnum, fieldmode,
   bitregion_start, bitregion_end))
 {
-
   /* Storing any naturally aligned field can be done with a simple
 	 store.  For targets that support fast unaligned memory, any
 	 naturally sized, unit aligned field can be done directly.  */
@@ -957,8 +957,7 @@ store_bit_field (rtx str_rtx, unsigned H
 	  /* Explicitly override the C/C++ memory model; ignore the
 	 bit range so that we can do the access in the mode mandated
 	 by -fstrict-volatile-bitfields instead.  */
-	  store_fixed_bit_field_1 (str_rtx, bitsize, bitnum,
-   value);
+	  store_fixed_bit_field_1 (str_rtx, bitsize, bitnum, value);
 	}
 
   return;
@@ -1002,8 +1001,6 @@ store_fixed_bit_field (rtx op0, unsigned
 		   unsigned HOST_WIDE_INT bitregion_end,
 		   rtx value)
 {
-  enum machine_mode mode;
-
   /* There is a case not handled here:
  a structure with a known alignment of just a halfword
  and a field split across two aligned halfwords within the structure.
@@ -1013,7 +1010,7 @@ store_fixed_bit_field (rtx op0, unsigned
 
   if (MEM_P (op0))
 {
-  mode = GET_MODE (op0);
+  enum machine_mode mode = GET_MODE (op0);
   if (GET_MODE_BITSIZE (mode) == 0
 	  || GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (word_mode))
 	mode = word_mode;
@@ -1033,7 +1030,6 @@ store_fixed_bit_field (rtx op0, unsigned
 }
 
   store_fixed_bit_field_1 (op0, bitsize, bitnum, value);
-  return;
 }
 
 /* Helper function for store_fixed_bit_field, stores
@@ -1041,8 +1037,8 @@ store_fixed_bit_field (rtx op0, unsigned
 
 static void
 store_fixed_bit_field_1 (rtx op0, unsigned HOST_WIDE_INT bitsize,
-		 unsigned HOST_WIDE_INT bitnum,
-		 rtx value)
+			 unsigned HOST_WIDE_INT bitnum,
+			 rtx value)
 {
   enum machine_mode mode;
   rtx temp;
@@ -1793,12 +1789,11 @@ extract_fixed_bit_field (enum machine_mo
 			 unsigned HOST_WIDE_INT bitnum, rtx target,
 			 int unsignedp)
 {
-  enum machine_mode mode;
-
   if (MEM_P (op0))
 {
-  mode = get_best_mode (bitsize, bitnum, 0, 0,
-			MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
+  enum machine_mode mode
+	= get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0), word_mode,
+			 MEM_VOLATILE_P (op0));
 
   if (mode == VOIDmode)
 	/* The only way this should occur is if the field spans word
@@ -1821,9 +1816,7 @@ extract_fixed_bit_field_1 (enum machine_
 			   unsigned HOST_WIDE_INT bitnum, rtx target,
 			   int unsignedp)
 {
-  enum machine_mode mode;
-
-  mode = GET_MODE (op0);
+  enum machine_mode mode = GET_MODE (op0);
   gcc_assert (SCALAR_INT_MODE_P (mode));
 
   /* Note that bitsize + bitnum can be greater than GET_MODE_BITSIZE (mode)


RE: Two build != host fixes

2013-12-17 Thread Bernd Edlinger
Hi Alan,


just for the record, this is how my cross-build fails:

../gcc-4.9-20131215/configure --prefix=/home/ed/gnu/x/arm-linux-gnueabihf-cross 
--host=arm-linux-gnueabihf --target=arm-linux-gnueabihf 
--enable-languages=c,c++ --with-arch=armv7-a --with-tune=cortex-a9 
--with-fpu=vfpv3-d16 --with-float=hard



...
make[2]: Entering directory 
`/home/ed/gnu/x/gcc-build-arm-linux-gnueabihf-cross/gcc'
g++ -c -DIN_GCC -DGENERATOR_FILE -I. -Ibuild -I../../gcc-4.9-20131215/gcc 
-I../../gcc-4.9-20131215/gcc/build -I../../gcc-4.9-20131215/gcc/../include 
-I../../gcc-4.9-20131215/gcc/../libcpp/include 
-I/home/ed/gnu/x/gcc-build-arm-linux-gnueabihf-cross/./gmp 
-I/home/ed/gnu/x/gcc-4.9-20131215/gmp 
-I/home/ed/gnu/x/gcc-build-arm-linux-gnueabihf-cross/./mpfr 
-I/home/ed/gnu/x/gcc-4.9-20131215/mpfr 
-I/home/ed/gnu/x/gcc-4.9-20131215/mpc/src  
-I../../gcc-4.9-20131215/gcc/../libdecnumber 
-I../../gcc-4.9-20131215/gcc/../libdecnumber/dpd -I../libdecnumber 
-I../../gcc-4.9-20131215/gcc/../libbacktrace    \
        -o build/genmddeps.o ../../gcc-4.9-20131215/gcc/genmddeps.c
In file included from ./bconfig.h:3:0,
 from ../../gcc-4.9-20131215/gcc/genmddeps.c:18:
./auto-build.h:2037:16: error: declaration does not declare anything 
[-fpermissive]
 #define rlim_t long
    ^
In file included from ../../gcc-4.9-20131215/gcc/genmddeps.c:19:0:
../../gcc-4.9-20131215/gcc/system.h:450:23: error: conflicting declaration of C 
function 'void* sbrk(int)'
 extern void *sbrk (int);
   ^
In file included from ../../gcc-4.9-20131215/gcc/system.h:262:0,
 from ../../gcc-4.9-20131215/gcc/genmddeps.c:19:
/usr/include/unistd.h:1067:14: note: previous declaration 'void* sbrk(intptr_t)'
 extern void *sbrk (intptr_t __delta) __THROW;
  ^
In file included from ../../gcc-4.9-20131215/gcc/genmddeps.c:19:0:
../../gcc-4.9-20131215/gcc/system.h:454:48: error: ambiguating new declaration 
of 'char* strstr(const char*, const char*)'
 extern char *strstr (const char *, const char *);
    ^
In file included from /home/ed/gnu/install/include/c++/4.9.0/cstring:42:0,
 from ../../gcc-4.9-20131215/gcc/system.h:205,
 from ../../gcc-4.9-20131215/gcc/genmddeps.c:19:
/usr/include/string.h:323:22: note: old declaration 'const char* strstr(const 
char*, const char*)'
 extern __const char *strstr (__const char *__haystack,
  ^
In file included from /usr/include/features.h:357:0,
 from /usr/include/stdio.h:28,
 from ../../gcc-4.9-20131215/gcc/system.h:40,
 from ../../gcc-4.9-20131215/gcc/genmddeps.c:19:
/usr/include/malloc.h:76:32: error: declaration of 'void free(void*) throw ()' 
has a different exception specifier
 extern void free (void *__ptr) __THROW;
    ^
In file included from ../../gcc-4.9-20131215/gcc/genmddeps.c:19:0:
../../gcc-4.9-20131215/gcc/system.h:426:13: error: from previous declaration 
'void free(void*)'
 extern void free (void *);
 ^
In file included from ../../gcc-4.9-20131215/gcc/genmddeps.c:19:0:
../../gcc-4.9-20131215/gcc/system.h:506:34: error: conflicting declaration of C 
function 'const char* strsignal(int)'
 extern const char *strsignal (int);
  ^
In file included from /home/ed/gnu/install/include/c++/4.9.0/cstring:42:0,
 from ../../gcc-4.9-20131215/gcc/system.h:205,
 from ../../gcc-4.9-20131215/gcc/genmddeps.c:19:
/usr/include/string.h:566:14: note: previous declaration 'char* strsignal(int)'
 extern char *strsignal (int __sig) __THROW;
  ^
In file included from ./bconfig.h:5:0,
 from ../../gcc-4.9-20131215/gcc/genmddeps.c:18:
../../gcc-4.9-20131215/gcc/../include/ansidecl.h:308:64: error: ambiguating new 
declaration of 'char* basename(const char*)'
 #  define ATTRIBUTE_NONNULL(m) __attribute__ ((__nonnull__ (m)))
    ^
../../gcc-4.9-20131215/gcc/../include/libiberty.h:110:64: note: in expansion of 
macro 'ATTRIBUTE_NONNULL'
 extern char *basename (const char *) ATTRIBUTE_RETURNS_NONNULL 
ATTRIBUTE_NONNULL(1);
    ^
In file included from /home/ed/gnu/install/include/c++/4.9.0/cstring:42:0,
 from ../../gcc-4.9-20131215/gcc/system.h:205,
 from ../../gcc-4.9-20131215/gcc/genmddeps.c:19:
/usr/include/string.h:603:28: note: old declaration 'const char* basename(const 
char*)'
 extern "C++" __const char *basename (__const char *__filename)
    ^
In file included from ./bconfig.h:5:0,
 from ../../gcc-4.9-20131215/gcc/genmddeps.c:18:
../../gcc-4.9-20131215/gcc/../include/ansidecl.h:308:64: error: declaration of 
'int snprintf(char*, size_t, const char*, ...)' has a different exception 
specifier
 #  de

Re: [Patch,avr]: Fix wrong warning PR59396

2013-12-17 Thread Georg-Johann Lay

Am 12/05/2013 04:09 PM, schrieb Richard Biener:

On Thu, Dec 5, 2013 at 3:53 PM, Georg-Johann Lay  wrote:

This is a fix for a wrong warning about a bad ISR name.  The assumption was
that if DECL_ASSEMBLER_NAME is set, it would always start with a *.

This is not the case for the LTO compiler, where the assembler name is the plain
name of the function (unless an assembler name is set).


That sounds odd to me.  Does the bug reproduce with -fwhole-program?
Or if the interrupt handler is static?


Hi, I tried to debug lto1.

What I see is that SET_DECL_ASSEMBLER_NAME with "__vector_14", i.e. without a 
leading '*', is called from


tree-streamer-in.c:lto_input_ts_decl_with_vis_tree_pointers().

Hope that helps in narrowing down the issue.

Johann


Richard.


Thus, do a more restrictive test if the first character of the function name
has to be skipped.

Ok to commit?

Johann

 PR target/59396
 * config/avr/avr.c (avr_set_current_function): If the first char
 of the function name is skipped, make sure it is actually '*'.
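
The ChangeLog entry above can be illustrated with a minimal sketch. It is not the code from config/avr/avr.c: both helper names are hypothetical, and the "__vector" prefix test is an assumption based on the "__vector_14" name mentioned earlier in the thread. The point is only that the first character is skipped when it really is '*', so a plain LTO assembler name is left intact.

```c
#include <string.h>

/* Hypothetical helper, not taken from avr.c: return NAME without its
   leading '*' marker, but only if that marker is actually present.  */
static const char *
skip_star_prefix (const char *name)
{
  return name[0] == '*' ? name + 1 : name;
}

/* Hypothetical check: nonzero if NAME looks like an ISR vector name once
   an optional leading '*' has been skipped.  The "__vector" prefix is an
   assumption made for this illustration.  */
static int
looks_like_isr_name (const char *name)
{
  static const char prefix[] = "__vector";
  return strncmp (skip_star_prefix (name), prefix, strlen (prefix)) == 0;
}
```

Skipping the first character unconditionally would turn the plain name "__vector_14" into "_vector_14", which is how a correctly named handler could end up triggering the warning.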




Another build!=host fix

2013-12-17 Thread Bernd Edlinger
Hi,

there is a small problem with SSIZE_MAX, because it is not always
defined, especially not in gcc/glimits.h, which seems to be the fall-back
if the target fails to have a working limits.h.

When I create a cross-compiler for --target=arm-linux-gnueabihf, the
working limits.h is overwritten by fix-includes with a copy of gcc/glimits.h.
Probably because it is not possible to compile the target headers with the build
compiler and produce meaningful test results.

However because gcc/glimits.h does not define SSIZE_MAX the following build 
fails with

In file included from ../../gcc-4.9-20131215/gcc/config/host-linux.c:21:0:
../../gcc-4.9-20131215/gcc/config/host-linux.c: In function 'int 
linux_gt_pch_use_address(void*, size_t, int, size_t)':
../../gcc-4.9-20131215/gcc/config/host-linux.c:215:43: error: 'SSIZE_MAX' was 
not declared in this scope
   nbytes = read (fd, base, MIN (size, SSIZE_MAX));
   ^
../../gcc-4.9-20131215/gcc/system.h:351:26: note: in definition of macro 'MIN'
 #define MIN(X,Y) ((X) < (Y) ? (X) : (Y))
  ^


The simplest way to fix this would be to not use SSIZE_MAX
here.

Bootstrapped and regression-tested on x86_64,
plus a cross-build for arm-linux-gnueabihf.

Ok for trunk?


Thanks
Bernd.
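
One way to avoid SSIZE_MAX in a read loop is sketched below. This is not necessarily what the posted patch does (the patch itself is not shown in this message): the function name read_fully and the READ_CHUNK_MAX value are assumptions made for illustration. The idea is to cap each read at a fixed chunk size that is known to fit in ssize_t, and to loop until everything has arrived.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Illustrative cap on a single read; the value is an assumption and only
   needs to be small enough to fit in ssize_t on any host.  */
#define READ_CHUNK_MAX ((size_t) 1 << 20)

/* Hypothetical helper: read exactly SIZE bytes from FD into BASE.
   Returns 0 on success, -1 on a read error or premature end of file.  */
static int
read_fully (int fd, void *base, size_t size)
{
  char *p = (char *) base;

  while (size > 0)
    {
      size_t want = size < READ_CHUNK_MAX ? size : READ_CHUNK_MAX;
      ssize_t n = read (fd, p, want);

      if (n < 0)
        {
          if (errno == EINTR)
            continue;   /* Interrupted before any data: retry.  */
          return -1;
        }
      if (n == 0)
        return -1;      /* End of file before SIZE bytes were read.  */

      p += n;
      size -= (size_t) n;
    }
  return 0;
}
```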

Re: patch for elimination to SP when it is changed in RTL (PR57293)

2013-12-17 Thread Yvan Roux
On 17 December 2013 00:03, Vladimir Makarov  wrote:
> On 12/13/2013, 8:07 AM, Yvan Roux wrote:
>>
>> Thanks for your help Vlad.  More bad news about this PR fix: it has
>> resurrected the thumb_movhi_clobber bug (PR 58785), but in
>> a different manner, as the original failing testcase still passes.  I
>> attached a testcase to be compiled with :
>>
>> cc1 -mthumb -mcpu=cortex-m0 -O2 m.c
>>
>> And Thumb bootstrap seems to be broken with an ICE in check_rtl, I'm
>> checking if it is the same issue.
>>
>
> The compiler crashes because a reload pattern is trying to take address of
> memory which is actually a spilled pseudo for LRA.  The pattern is designed
> for reload which always uses memory not a spilled pseudo as LRA does.
>
> But we don't need to adjust the pattern for LRA.  LRA can manage by itself
> without reload patterns.
>
> I found that I had not fully switched off these patterns for LRA.  The
> following patch solves the problem.  The same was done for
> THUMB_SECONDARY_INPUT_RELOAD_CLASS long ago.
>
> Yvan, could you take this patch from here yourself?  I mean testing it and
> getting its approval from an ARM maintainer.  Thanks.


I remember testing that very same patch when we changed
THUMB_SECONDARY_INPUT_RELOAD_CLASS and hitting build issues, but with
the fixes made since the summer, the build and the testsuite are now
OK.  Before submitting this patch I wanted to check whether a more
general fix is needed: modifying SECONDARY_OUTPUT_RELOAD_CLASS and
SECONDARY_INPUT_RELOAD_CLASS instead of the THUMB macros, because that
is where the IWMMXT target is handled and where we have the LRA loop
issue during constraint solving for that target.  First results show
that it also fixes some Thumb1 regressions, but I don't have the full
results yet.

Thanks
Yvan


[Patch,testsuite] Fix testcases that use bind_pic_locally

2013-12-17 Thread Vidya Praveen
Hello,

bind_pic_locally is broken for targets that don't pass -fPIC/-fpic by
default [1][2].

One of the suggestions was to have an effective-target check called
bind_pic_locally_ok, which checks whether bind_pic_locally will work, and to
include it in all the tests that use bind_pic_locally in dg-add-options [1].

This patch implements that by checking whether -fpic/-fPIC is passed by
default, as well as whether it is passed through any other means. The check
returns 1 when -fpic/-fPIC is passed by default, or when it is passed neither
by default nor through any other means. It therefore also returns 1 when
-fpic/-fPIC is passed both by default and by other means, since we can't
really detect that case and it makes no sense to do so anyway: there is no
reason for a testcase to pass -fPIC/-fpic when it tries to override it
using bind_pic_locally, and if it is passed by default, there is no need to
pass it again through, say, the board file's cflags.

default  other-means  returns
pic      -            1
pic      pic          1 (invalid)
-        pic          0
-        -            1
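
The table can be restated as a small decision function. This is only a model of the logic described above, written in C for illustration: the real check is Tcl in lib/target-support.exp, and the parameter names here are assumptions.

```c
#include <stdbool.h>

/* Hypothetical model of check_effective_target_bind_pic_locally_ok.
   PIC_BY_DEFAULT: the compiler passes -fpic/-fPIC by default.
   PIC_BY_OTHER_MEANS: -fpic/-fPIC arrives some other way, for example
   through the board file's cflags.  */
static int
bind_pic_locally_ok (bool pic_by_default, bool pic_by_other_means)
{
  /* When PIC is the default, the "both" case cannot be told apart from
     the default-only case, so it is accepted as well.  */
  if (pic_by_default)
    return 1;

  /* Otherwise the check only succeeds if nothing else enables PIC.  */
  return !pic_by_other_means;
}
```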

This patch also modifies all the testcases that use bind_pic_locally to 
include this bind_pic_locally_ok check.

Tested for aarch64-none-elf, arm-none-eabi, arm-none-linux-gnueabihf.

OK?

Cheers
VP.

[1] http://gcc.gnu.org/ml/gcc/2013-09/msg00207.html
[2] http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00462.html


gcc/testsuite/ChangeLog:

2013-12-17  Vidya Praveen  

* lib/target-support.exp: (check_effective_target_bind_pic_locally_ok):
New check.
* g++.dg/ipa/iinline-1.C: Introduce bind_pic_locally_ok.
* g++.dg/ipa/iinline-2.C: Likewise.
* g++.dg/ipa/iinline-3.C: Likewise.
* g++.dg/ipa/inline-1.C: Likewise.
* g++.dg/ipa/inline-2.C: Likewise.
* g++.dg/ipa/inline-3.C: Likewise.
* g++.dg/other/first-global.C: Likewise.
* g++.dg/parse/attr-externally-visible-1.C: Likewise.
* g++.dg/torture/pr40323.C: Likewise.
* g++.dg/torture/pr55260-1.C: Likewise.
* g++.dg/torture/pr55260-2.C: Likewise.
* g++.dg/tree-ssa/inline-1.C: Likewise.
* g++.dg/tree-ssa/inline-2.C: Likewise.
* g++.dg/tree-ssa/inline-3.C: Likewise.
* g++.dg/tree-ssa/nothrow-1.C: Likewise.
* gcc.dg/inline-33.c: Likewise.
* gcc.dg/ipa/ipa-1.c: Likewise.
* gcc.dg/ipa/ipa-2.c: Likewise.
* gcc.dg/ipa/ipa-3.c: Likewise.
* gcc.dg/ipa/ipa-4.c: Likewise.
* gcc.dg/ipa/ipa-5.c: Likewise.
* gcc.dg/ipa/ipa-7.c: Likewise.
* gcc.dg/ipa/ipa-8.c: Likewise.
* gcc.dg/ipa/ipacost-2.c: Likewise.
* gcc.dg/ipa/ipcp-1.c: Likewise.
* gcc.dg/ipa/ipcp-2.c: Likewise.
* gcc.dg/ipa/ipcp-4.c: Likewise.
* gcc.dg/ipa/ipcp-agg-1.c: Likewise.
* gcc.dg/ipa/ipcp-agg-2.c: Likewise.
* gcc.dg/ipa/ipcp-agg-3.c: Likewise.
* gcc.dg/ipa/ipcp-agg-4.c: Likewise.
* gcc.dg/ipa/ipcp-agg-5.c: Likewise.
* gcc.dg/ipa/ipcp-agg-6.c: Likewise.
* gcc.dg/ipa/ipcp-agg-7.c: Likewise.
* gcc.dg/ipa/ipcp-agg-8.c: Likewise.
* gcc.dg/ipa/pr56988.c: Likewise.
* gcc.dg/tree-ssa/inline-3.c: Likewise.
* gcc.dg/tree-ssa/inline-4.c: Likewise.
* gcc.dg/tree-ssa/ipa-cp-1.c: Likewise.
* gcc.dg/tree-ssa/local-pure-const.c: Likewise.
* gfortran.dg/whole_file_5.f90: Likewise.
* gfortran.dg/whole_file_6.f90: Likewise.
diff --git a/gcc/testsuite/g++.dg/ipa/iinline-1.C b/gcc/testsuite/g++.dg/ipa/iinline-1.C
index 9f99893..b86daf1 100644
--- a/gcc/testsuite/g++.dg/ipa/iinline-1.C
+++ b/gcc/testsuite/g++.dg/ipa/iinline-1.C
@@ -1,6 +1,7 @@
 /* Verify that simple indirect calls are inlined even without early
inlining..  */
 /* { dg-do compile } */
+/* { dg-require-effective-target bind_pic_locally_ok } */
 /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining"  } */
 /* { dg-add-options bind_pic_locally } */
 
diff --git a/gcc/testsuite/g++.dg/ipa/iinline-2.C b/gcc/testsuite/g++.dg/ipa/iinline-2.C
index 670a5dd..d4329c1 100644
--- a/gcc/testsuite/g++.dg/ipa/iinline-2.C
+++ b/gcc/testsuite/g++.dg/ipa/iinline-2.C
@@ -1,6 +1,7 @@
 /* Verify that simple indirect calls are inlined even without early
inlining..  */
 /* { dg-do compile } */
+/* { dg-require-effective-target bind_pic_locally_ok } */
 /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining"  } */
 /* { dg-add-options bind_pic_locally } */
 
diff --git a/gcc/testsuite/g++.dg/ipa/iinline-3.C b/gcc/testsuite/g++.dg/ipa/iinline-3.C
index 3daee9a..4dc604e 100644
--- a/gcc/testsuite/g++.dg/ipa/iinline-3.C
+++ b/gcc/testsuite/g++.dg/ipa/iinline-3.C
@@ -1,6 +1,7 @@
 /* Verify that we do not indirect-inline using member pointer
parameters which have been modified.  */
 /* { dg-do run } */
+/* { dg-require-effective-target bind_pic_locally_ok } */
 /* { dg-options "-O3 -fno-early-inlining"  } */
 /* { dg-

Re: [PATCH] Masked load/store vectorization (take 7)

2013-12-17 Thread H.J. Lu
On Mon, Dec 9, 2013 at 12:27 PM, Jakub Jelinek  wrote:
> Hi!
>
> On Fri, Dec 06, 2013 at 01:49:50PM +0100, Richard Biener wrote:
>> >basic_block bb = ifc_bbs[i];
>> >gimple_seq stmts;
>> >
>> > -  if (!is_predicated (bb))
>> > +  if (!is_predicated (bb)
>> > + || dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>>
>> isn't that redundant now?
>
> After IRC discussion, moved this dominated_by_p call and another one
> from predicate_bbs to add_to_predicate_list, so that we don't change
> an always true predicate to something else on a bb that dominates
> loop->latch and therefore will be executed unconditionally inside of
> the loop body.
>
>> > +   if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr))
>> > + copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr),
>> > +ref);
>>
>> Eh - can you split out a copy_ref_info_to_addr so you can avoid
>> creating the MEM_REF?
>
> Haven't changed this one, because copy_ref_info uses the new_ref in
> several places (in addition to the initial few ones).
>
>> >  static bool
>> > +version_loop_for_if_conversion (struct loop *loop, bool *do_outer)
>> > +{
>>
>> What's the do_outer parameter?
>
> Outer loops aren't versioned anymore.
>
>> Please add a comment before this.  Seems you match what outer loop
>> vectorization handles?  Thus, best factor out a predicate in
>> tree-loop-vect.c that you can use in both places?
>
> So this went away completely.
>
>> This needs a comment with explaining what code you create.
>
> Ditto (and several other spots).
>
>> Btw I hate that we do update_ssa multiple times per pass per
>> function.  That makes us possibly worse than O(N^2) as update_ssa computes
>> the IDF of the whole function.
>>
>> This is something your patch introduces (it's only rewriting the
>> virtuals, not the incremental SSA update by BB copying).
>
> This too.
>
>> See above.  And factor this out into a function.  Also move this
>> to the cleanup loop below.
>
> Not moved to the cleanup loop, because there is no easy way to find
> then if a loop has been vectorized or not.  But it is in a separate helper
> etc.
>
> The rest should be in this new version of the patch, bootstrapped/regtested
> on x86_64-linux and i686-linux, ok for trunk?
>
> 2013-12-09  Jakub Jelinek  
>
> * tree-vectorizer.h (struct _loop_vec_info): Add scalar_loop field.
> (LOOP_VINFO_SCALAR_LOOP): Define.
> (slpeel_tree_duplicate_loop_to_edge_cfg): Add scalar_loop argument.
> * config/i386/sse.md (maskload, maskstore): New expanders.
> * tree-data-ref.c (get_references_in_stmt): Handle MASK_LOAD and
> MASK_STORE.
> * internal-fn.def (LOOP_VECTORIZED, MASK_LOAD, MASK_STORE): New
> internal fns.
> * tree-if-conv.c: Include expr.h, optabs.h, tree-ssa-loop-ivopts.h and
> tree-ssa-address.h.
> (release_bb_predicate): New function.
> (free_bb_predicate): Use it.
> (reset_bb_predicate): Likewise.  Don't unallocate bb->aux
> just to immediately allocate it again.
> (add_to_predicate_list): Add loop argument.  If basic blocks that
> dominate loop->latch don't insert any predicate.
> (add_to_dst_predicate_list): Adjust caller.
> (if_convertible_phi_p): Add any_mask_load_store argument, if true,
> handle it like flag_tree_loop_if_convert_stores.
> (insert_gimplified_predicates): Likewise.
> (ifcvt_can_use_mask_load_store): New function.
> (if_convertible_gimple_assign_stmt_p): Add any_mask_load_store
> argument, check if some conditional loads or stores can't be
> converted into MASK_LOAD or MASK_STORE.
> (if_convertible_stmt_p): Add any_mask_load_store argument,
> pass it down to if_convertible_gimple_assign_stmt_p.
> (predicate_bbs): Don't return bool, only check if the last stmt
> of a basic block is GIMPLE_COND and handle that.  Adjust
> add_to_predicate_list caller.
> (if_convertible_loop_p_1): Only call predicate_bbs if
> flag_tree_loop_if_convert_stores and free_bb_predicate in that case
> afterwards, check gimple_code of stmts here.  Replace is_predicated
> check with dominance check.  Add any_mask_load_store argument,
> pass it down to if_convertible_stmt_p and if_convertible_phi_p,
> call if_convertible_phi_p only after all if_convertible_stmt_p
> calls.
> (if_convertible_loop_p): Add any_mask_load_store argument,
> pass it down to if_convertible_loop_p_1.
> (predicate_mem_writes): Emit MASK_LOAD and/or MASK_STORE calls.
> (combine_blocks): Add any_mask_load_store argument, pass
> it down to insert_gimplified_predicates and call predicate_mem_writes
> if it is set.  Call predicate_bbs.
> (version_loop_for_if_conversion): New function.
> (tree_if_conversion): Adjust if_conv

Re: Fix devirt2.C testcase

2013-12-17 Thread H.J. Lu
On Tue, Dec 17, 2013 at 3:04 AM, Jan Hubicka  wrote:
> Hi,
> I forgot the following change in my tree.  It fixes type consistency sanity
> check in get_polymorphic_call_info.  With the change to gimple-fold it is
> now needed to devirtualize devirt2.C. (previously the bug went latent since
> the old code handled the testcase)
>
> I am re-testing x86_64-linux and will commit it shortly.  I apologize for
> breaking the testcase.
>
> Honza
>
> * ipa-devirt.c (get_polymorphic_call_info): Fix offset when
> checking type consistency; do not set bogus outer_type
> when check fails.
>

Does it fix:

FAIL: g++.dg/ipa/devirt-13.C -std=gnu++11  scan-ipa-dump cgraph
"Devirtualizing call"
FAIL: g++.dg/ipa/devirt-13.C -std=gnu++98  scan-ipa-dump cgraph
"Devirtualizing call"

on Linux/x86?

-- 
H.J.


RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-17 Thread Iyer, Balaji V


> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Tuesday, December 17, 2013 1:18 AM
> To: Iyer, Balaji V
> Cc: Joseph S. Myers; Aldy Hernandez; 'gcc-patches@gcc.gnu.org'
> Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly
> Elemental functions) for C
> 
> On Tue, Dec 17, 2013 at 03:51:14AM +, Iyer, Balaji V wrote:
> > Hi Jakub,
> > I will work on this, but I need a couple clarifications about some of
> your comments. Please see below:
> >
> > > > +#define CILK_SIMD_FN_CLAUSE_MASK   \
> > > > +   ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_SIMDLEN)
> > >   \
> > > > +   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINEAR)
> > >   \
> > > > +   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_UNIFORM)
> > >   \
> > > > +   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_INBRANCH)
> > >   \
> > > > +   | (OMP_CLAUSE_MASK_1 <<
> > > PRAGMA_OMP_CLAUSE_NOTINBRANCH))
> > >
> > > I thought you'd instead add there
> PRAGMA_CILK_CLAUSE_VECTORLENGTH,
> > > PRAGMA_CILK_CLAUSE_MASK and PRAGMA_CILK_CLAUSE_NOMASK
> (or similar).
> > >
> >
> > I looked at OpenACC implementation and they seem to use the
> OMP_CLAUSE_* (line # 11174 in c-parser.c)
> 
> It uses just PRAGMA_OMP_CLAUSE_NONE, which really means no clauses at
> all (I
> think it is for now).
> 
> > Also, If I created CILK_CLAUSE_* variants, I have to re-create another
> function similar to c_parser_omp_all_clauses, whose workings will be
> identical to the c_parser_omp_all_clauses. Is that OK with you?
> 
> No, I'd remove enum pragma_cilk_clause altogether and fold it into the end
> of
> pragma_omp_clause, as:
>   PRAGMA_CILK_CLAUSE_VECTORLENGTH,
>   PRAGMA_CILK_CLAUSE_MASK,
>   PRAGMA_CILK_CLAUSE_NOMASK,
>   PRAGMA_CILK_CLAUSE_NONE = PRAGMA_OMP_CLAUSE_NONE,
>   PRAGMA_CILK_CLAUSE_LINEAR = PRAGMA_OMP_CLAUSE_LINEAR,
>   PRAGMA_CILK_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE,
>   PRAGMA_CILK_CLAUSE_FIRSTPRIVATE =
> PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
>   PRAGMA_CILK_CLAUSE_LASTPRIVATE =
> PRAGMA_OMP_CLAUSE_LASTPRIVATE,
>   PRAGMA_CILK_CLAUSE_REDUCTION =
> PRAGMA_OMP_CLAUSE_REDUCTION
> so that you can use it in the same bitmasks.
> 
> That way, you don't have to change anything in c_parser_omp_all_clauses,
> just add handling of the 3 clauses that don't have OpenMP counterparts.

I think it sort of makes sense to me now.  I will work on this.

Oh, VECTORLENGTH in SIMD-enabled function is same as SIMDLEN in OMP4

And,

MASK = INBRANCH
NOMASK = NOTINBRANCH.
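
A rough sketch of the enum layout Jakub describes follows. The enum below is a heavily trimmed, hypothetical stand-in for GCC's pragma_omp_clause; clause_mask, CLAUSE_MASK_1 and clause_allowed are illustrative names that replace the real omp_clause_mask machinery. It shows how the Cilk-only clauses appended at the end and the aliases of existing OpenMP clauses can share one bitmask.

```c
/* Hypothetical, trimmed stand-in for GCC's pragma_omp_clause enum.  */
enum pragma_omp_clause
{
  PRAGMA_OMP_CLAUSE_NONE = 0,
  PRAGMA_OMP_CLAUSE_INBRANCH,
  PRAGMA_OMP_CLAUSE_LINEAR,
  PRAGMA_OMP_CLAUSE_NOTINBRANCH,
  PRAGMA_OMP_CLAUSE_SIMDLEN,
  PRAGMA_OMP_CLAUSE_UNIFORM,

  /* Cilk Plus clauses without a distinct OpenMP enumerator get new
     values at the end...  */
  PRAGMA_CILK_CLAUSE_VECTORLENGTH,
  PRAGMA_CILK_CLAUSE_MASK,
  PRAGMA_CILK_CLAUSE_NOMASK,
  /* ...and the rest are plain aliases of their OpenMP counterparts.  */
  PRAGMA_CILK_CLAUSE_NONE = PRAGMA_OMP_CLAUSE_NONE,
  PRAGMA_CILK_CLAUSE_LINEAR = PRAGMA_OMP_CLAUSE_LINEAR
};

/* Stand-in for omp_clause_mask; an assumption for this sketch.  */
typedef unsigned long long clause_mask;
#define CLAUSE_MASK_1 ((clause_mask) 1)

#define CILK_SIMD_FN_CLAUSE_MASK                          \
  ((CLAUSE_MASK_1 << PRAGMA_CILK_CLAUSE_VECTORLENGTH)     \
   | (CLAUSE_MASK_1 << PRAGMA_CILK_CLAUSE_LINEAR)         \
   | (CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_UNIFORM)         \
   | (CLAUSE_MASK_1 << PRAGMA_CILK_CLAUSE_MASK)           \
   | (CLAUSE_MASK_1 << PRAGMA_CILK_CLAUSE_NOMASK))

/* Nonzero if clause C is permitted by MASK.  */
static int
clause_allowed (clause_mask mask, enum pragma_omp_clause c)
{
  return (int) ((mask >> c) & 1);
}
```

Because every enumerator lives in the same enum, an existing clause-parsing loop can test the mask unchanged; only the three new clauses need their own handlers.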


Thanks,

Balaji V. Iyer.



> 
>   Jakub


Re: Fix devirt2.C testcase

2013-12-17 Thread Jan Hubicka
> On Tue, Dec 17, 2013 at 3:04 AM, Jan Hubicka  wrote:
> > Hi,
> > I forgot the following change in my tree.  It fixes type consistency sanity
> > check in get_polymorphic_call_info.  With the change to gimple-fold it is
> > now needed to devirtualize devirt2.C. (previously the bug went latent since
> > the old code handled the testcase)
> >
> > I am re-testing x86_64-linux and will commit it shortly.  I apologize for
> > breaking the testcase.
> >
> > Honza
> >
> > * ipa-devirt.c (get_polymorphic_call_info): Fix offset when
> > checking type consistency; do not set bogus outer_type
> > when check fails.
> >
> 
> Does it fix:
> 
> FAIL: g++.dg/ipa/devirt-13.C -std=gnu++11  scan-ipa-dump cgraph
> "Devirtualizing call"
> FAIL: g++.dg/ipa/devirt-13.C -std=gnu++98  scan-ipa-dump cgraph
> "Devirtualizing call"
> 
> on Linux/x86?

This one needs an update, since the call is folded a lot earlier than before.
I made the change, but it seems I did not include it on the commit line
and subsequently reverted it.  Here is a patch I will commit shortly.

Index: g++.dg/ipa/devirt-13.C
===
--- g++.dg/ipa/devirt-13.C  (revision 206040)
+++ g++.dg/ipa/devirt-13.C  (working copy)
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* Call to foo should be devirtualized because there are no derived types of 
A.  */
-/* { dg-options "-O2 -fdump-ipa-cgraph -fdump-tree-ssa"  } */
+/* { dg-options "-O2 -fdump-tree-ssa"  } */
 namespace {
 class A {
 public:
@@ -16,7 +16,5 @@ main()
   return b->foo();
 }
 
-/* { dg-final { scan-ipa-dump "Devirtualizing call"  "cgraph"  } } */
 /* { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "ssa"} } */
-/* { dg-final { cleanup-ipa-dump "cgraph" } } */
 /* { dg-final { cleanup-tree-dump "ssa" } } */


Re: [PATCH i386] Enable -freorder-blocks-and-partition

2013-12-17 Thread Teresa Johnson
Thanks for the data. A few questions:

- Do you have the raw data used to generate your pdfs available? Since
you gave me the binaries, if I have the data in terms of exactly what
addresses are being plotted I can correlate with the specific cold
functions via nm. Once I know what cold functions are being hit, I
would then need the .i files and the .gcda files to reproduce the
build.

- I tried running the binaries, but don't have the necessary shared
library dependencies installed on my system:
$ ldd gimp-2.8 | grep found
libgimpwidgets-2.0.so.0 => not found
libgimpconfig-2.0.so.0 => not found
libgimpcolor-2.0.so.0 => not found
libgimpmath-2.0.so.0 => not found
libgimpthumb-2.0.so.0 => not found
libgimpmodule-2.0.so.0 => not found
libgimpbase-2.0.so.0 => not found
libgegl-0.2.so.0 => not found
libbabl-0.1.so.0 => not found

I'll try to get these installed, but the last time I did that in an
attempt to build gimp I had a lot of trouble trying to get the right
versions and get them to build for me - any chance you could build an
archive version of the gimp binary?

Thanks,
Teresa

On Sun, Dec 15, 2013 at 2:19 PM, Martin Liška  wrote:
> On 15 December 2013 23:17, Martin Liška  wrote:
>> Dear Jan and Teresa,
>> Jan was right that I've been using changes which were committed by
>> Teresa and do live in trunk. So the graph with time profile presented
>> in my previous post was really with
>> -freorder-blocks-and-partition enabled. I removed the hack in varasm.c and I
>> do use classic section layout. Please open the following dump
>> (includes PDF graph+html report that shows functions with time profile
>> located in cold section and all -fdump-ipa-all dumps):
>>
>> https://drive.google.com/file/d/0B0pisUJ80pO1YW1QWUFkZjdqME0/edit?usp=sharing
>>
>> Apart from that, I also created a PDF graph
>> (https://drive.google.com/file/d/0B0pisUJ80pO1aHhPWW56dXpLVTQ/edit?usp=sharing)
>> that shows that time profile is almost perfect for GIMP. I miss just some
>> examples that do not have profile in generate phase.
>>
>> I will merge current trunk and prepare final patch.
>>
>> Are there any other data that you want to be prepared?
>>
>> Martin
>>
>>
>> On 13 December 2013 02:13, Jan Hubicka  wrote:
>>>> On Wed, Dec 11, 2013 at 1:21 AM, Martin Liška wrote:
>>>> > Hello,
>>>> >    I prepared a collection of systemtap graphs for GIMP.
>>>> >
>>>> > 1) just my profile-based function reordering: 550 pages
>>>> > 2) just -freorder-blocks-and-partitions: 646 pages
>>>> > 3) just -fno-reorder-blocks-and-partitions: 638 pages
>>>> >
>>>> > Please see attached data.
>>>>
>>>> Thanks for the data. A few observations/questions:
>>>>
>>>> With both 1) (your (time-based?) reordering) and 2)
>>>> (-freorder-blocks-and-partitions) there are a fair amount of accesses
>>>> out of the cold section. I'm not seeing so many accesses out of the
>>>> cold section in the apps I am looking at with splitting enabled. In
>>>
>>> I see you already committed the patch, so perhaps Martin's measurements assume
>>> the pass is off by default?
>>>
>>> I rebuilt GCC with profiledbootstrap and with the linker script unmapping
>>> text.unlikely.  I get an ICE in:
>>> (gdb) bt
>>> #0  diagnostic_set_caret_max_width(diagnostic_context*, int) () at 
>>> ../../gcc/diagnostic.c:108
>>> #1  0x00f68457 in diagnostic_initialize (context=0x18ae000 
>>> , n_opts=n_opts@entry=1290) at 
>>> ../../gcc/diagnostic.c:135
>>> #2  0x0100050e in general_init (argv0=) at 
>>> ../../gcc/toplev.c:1110
>>> #3  toplev_main(int, char**) () at ../../gcc/toplev.c:1922
>>> #4  0x7774cbe5 in __libc_start_main () from /lib64/libc.so.6
>>> #5  0x00f7898d in _start () at ../sysdeps/x86_64/start.S:122
>>>
>>> That is relatively early in the startup process. The function seems to be
>>> inlined and it fails only on the second invocation. I did not have time to
>>> investigate further yet, while without -fprofile-use it starts...
>>>
>>> On our periodic testers I see off-noise improvement in crafty 2200->2300
>>> and regression on Vortex, 2900->2800, plus code size increase.
>>>
>>> Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[committed] Fix up retval1.f90 testcase (PR testsuite/59534)

2013-12-17 Thread Jakub Jelinek
Hi!

Because e5 and f5 share space (like in C union), it is undesirable
to use non-shortcircuited comparisons, because e.g. on alpha we can end up
with denormal exception.

Committed thusly:

2013-12-17  Jakub Jelinek  

PR testsuite/59534
* testsuite/libgomp.fortran/retval1.f90 (e5): Avoid non-shortcircuited
comparisons.

--- libgomp/testsuite/libgomp.fortran/retval1.f90.jj  2008-09-05 12:53:58.0 +0200
+++ libgomp/testsuite/libgomp.fortran/retval1.f90  2013-12-17 16:08:54.076319770 +0100
@@ -91,8 +91,8 @@ entry e5 (is_f5)
   l = .false.
 !$omp parallel firstprivate (f5, e5) shared (is_f5) num_threads (2) &
 !$omp reduction (.or.:l)
-  l = .not. is_f5 .and. e5 .ne. 8
-  l = l .or. (is_f5 .and. f5 .ne. 6.5)
+  if (.not. is_f5) l = l .or. e5 .ne. 8
+  if (is_f5) l = l .or. f5 .ne. 6.5
   if (omp_get_thread_num () .eq. 0) e5 = 8
   if (omp_get_thread_num () .eq. 1) e5 = 14
   f5 = e5 - 4.5

Jakub
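
For readers more at home in C, here is a rough analogue of the problem, as a sketch only. The union and function names are hypothetical and not taken from the testcase. The integer 8, reinterpreted as an IEEE single, is a subnormal bit pattern, so a comparison that loads the inactive float member can trap on targets such as alpha. The guarded form below reads only the member that is known to be active, mirroring the two `if` statements in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shared ENTRY result storage: e5 and f5
   overlap, and only one of them holds a meaningful value at a time.  */
union result
{
  int e5;
  float f5;
};

/* Return true if the active member differs from its expected value.
   The inactive member is never read, so no floating-point comparison
   is ever performed on integer bits.  */
bool
check_failed (const union result *r, bool is_f5)
{
  bool l = false;
  if (!is_f5)
    l = l || r->e5 != 8;
  if (is_f5)
    l = l || r->f5 != 6.5f;
  return l;
}
```

In C the `&&` and `||` operators already short-circuit, so the explicit guards are mostly for symmetry with the Fortran fix, where `.and.` carries no such guarantee.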


Re: [RFC] libgcov.c re-factoring and offline profile-tool

2013-12-17 Thread Teresa Johnson
On Mon, Dec 16, 2013 at 2:48 PM, Xinliang David Li  wrote:
> Ok -- gcov_write_counter and gcov_write_tag_length are qualified as
> low level primitives for basic gcov format and probably should be kept
> in gcov-io.c.
>
> gcov_rewrite is pretty much libgcov runtime implementation details so I
> think it should be moved out. gcov_write_summary is not related to
> gcov low level format either, neither is gcov_seek.  Ok for them to be
> moved?

After looking at these some more, with the idea that gcov-io.c should
encapsulate the lower level IO routines, then I think all of these
(including gcov_rewrite) should remain in gcov-io.c. I think
gcov_write_summary belongs there since all of the other gcov_write_*
are there. And gcov_seek and gcov_rewrite are both adjusting gcov_var
fields to affect the file IO operations. And there are currently no
references to gcov_var within libgcc/libgcov* files.

So I think we should leave the patch as-is. Honza, is the current
patch ok for trunk?

Thanks,
Teresa

>
> thanks,
>
> David
>
>
> On Mon, Dec 16, 2013 at 2:34 PM, Jan Hubicka  wrote:
>>> I think so -- they are private to libgcov.  Honza, what do you think?
>>
>> Hmm, the purpose of gcov-io was to be a low-level IO library for the basic
>> gcov file format.  I am not sure if gcov_write_tag_length should really
>> reside in a different file than gcov_write_tag.
>>
>> I see a desire to isolate the actual stdio calls so one can have a
>> replacement driver for e.g. the Linux kernel. For that reason things like
>> gcov_seek and friends probably should be separated, but what is the reason
>> for splitting the file handling itself?
>>
>> Honza
>>>
>>> thanks,
>>>
>>> David
>>>
>>> On Mon, Dec 16, 2013 at 1:17 PM, Teresa Johnson  
>>> wrote:
>>> > On Mon, Dec 16, 2013 at 12:55 PM, Xinliang David Li  
>>> > wrote:
>>> >> gcov_rewrite function is only needed (and defined) with IN_LIBGCOV.
>>> >> Should it be moved from common file gcov-io.c to libgcov.c?
>>> >
>>> > Possibly. I just looked through gcov-io.c and there are several
>>> > additional functions that are only defined under "#ifdef IN_LIBGCOV"
>>> > and only used in libgcov*c (or each other):
>>> >
>>> > gcov_write_counter
>>> > gcov_write_tag_length
>>> > gcov_write_summary
>>> > gcov_seek
>>> >
>>> > Should they all, plus gcov_rewrite, be moved to libgcov-driver.c?
>>> >
>>> > Teresa
>>> >
>>> >>
>>> >>
>>> >> David
>>> >>
>>> >> On Thu, Dec 12, 2013 at 12:11 PM, Teresa Johnson  
>>> >> wrote:
>>> >>> On Wed, Dec 11, 2013 at 10:05 PM, Teresa Johnson  
>>> >>> wrote:
>>>  On Fri, Dec 6, 2013 at 6:23 AM, Jan Hubicka  wrote:
>>> >> Hi, all
>>> >>
>>> >> This is the new patch for gcov-tool (previously profile-tool).
>>> >>
>>> >> Honza: can you comment on the new merge interface? David posted some
>>> >> comments in an earlier email and we want to know what's your opinion.
>>> >>
>>> >> Test patch has been tested with bootstrap, regression,
>>> >> profiledbootstrap and SPEC2006.
>>> >>
>>> >> Noticeable changes from the earlier version:
>>> >>
>>> >> 1. create a new file libgcov.h and move libgcov-*.h headers to
>>> >> libgcov.h
>>> >> So we can include multiple libgcov-*.c without adding new macros.
>>> >>
>>> >> 2. split libgcov.h specific code in gcov-io.h to libgcc/libgcov.h
>>> >> Avoid multiple pages of code under the IN_LIBGCOV macro -- this
>>> >> improves the readability.
>>> >>
>>> >> 3. make gcov_var static, and move the definition from gcov-io.h to
>>> >> gcov-io.c. Also
>>> >> move some static functions accessing gcov_var to gcov-io.c
>>> >> Current code relies on GCOV_LINKAGE tricks to avoid multi-definition.
>>> >> I don't see
>>> >> a reason that gcov_var needs to be exposed as a global.
>>> >> 4. expose gcov_write_strings() and gcov_sync() to gcov_tool usage
>>> >>
>>> >> 5. rename profile-tool to gcov-tool per Honza's suggestion.
>>> >>
>>> >> Thanks,
>>> >
>>> > Hi,
>>> > I did not read the gcov-tool source itself in detail, but let's first
>>> > make the interface changes needed.
>>> >
>>> >> 2013-11-18  Rong Xu  
>>> >>
>>> >>   * gcc/gcov-io.c (gcov_var): Moved from gcov-io.h and make it 
>>> >> static.
>>> >>   (gcov_position): Move from gcov-io.h
>>> >>   (gcov_is_error): Ditto.
>>> >>   (gcov_rewrite): Ditto.
>>> >>   * gcc/gcov-io.h: Re-factoring. Move gcov_var to gcov-io.h and
>>> >> move the libgcov only part of libgcc/libgcov.h.
>>> >>   * libgcc/libgcov.h: New common header files for libgcov-*.h
>>> >>   * libgcc/Makefile.in: Add dependence to libgcov.h
>>> >>   * libgcc/libgcov-profiler.c: Use libgcov.h
>>> >>   * libgcc/libgcov-driver.c: Ditto.
>>> >>   * libgcc/libgcov-interface.c: Ditto.
>>> >>   * libgcc/libgcov-driver-system.c (allocate_filename_struct): 
>>> >> use
>>> 

Re: [Patch,testsuite] Fix testcases that use bind_pic_locally

2013-12-17 Thread Rainer Orth
Hi Vidya,

> bind_pic_locally is broken for targets that don't pass -fPIC/-fpic by
> default [1][2].
>
> One of the suggestions was to have an effective target check called
> bind_pic_locally_ok which checks if bind_pic_locally will work and have it
> included in all the tests that use bind_pic_locally in dg-add-options [1].

if this patch is to go in, it needs documentation for the new keyword in
gcc/doc/sourcebuild.texi.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[Patch] Fix PR 59527 (assert in cfg fixup with function splitting)

2013-12-17 Thread Teresa Johnson
Add handling to fixup_reorder_chain for a region crossing branch, which
cannot be optimized away (since it is needed to cross the region boundary).
In the case when there is no fallthru for a conditional jump the comments
indicate that this can happen if the conditional jump has side effects and
can't be deleted, in which case a barrier is inserted and no change is
made to the branch. Here, since the branch is region crossing,
it also cannot be eliminated, but the assert was not handling that
case. I fixed this by simply adding a check for it to the assert.

fixup_reorder_chain already has some handling for region-crossing branches,
but it was only handling the case where there was both a taken and
fallthru edge. In this case we had no fallthru. The reason was that
the fallthru had been eliminated in an earlier round of cfg
optimizations when going in/out of cfglayout mode during
pro_and_epilogue. The fallthru was an empty block that appears to be
due to switch expansion with the case having a __builtin_unreachable().

Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?

2013-12-17  Teresa Johnson  

PR gcov-profile/59527
* cfgrtl.c (fixup_reorder_chain): Handle a region-crossing
branch, which can't be eliminated.

Index: cfgrtl.c
===
--- cfgrtl.c  (revision 206033)
+++ cfgrtl.c  (working copy)
@@ -3736,7 +3736,8 @@ fixup_reorder_chain (void)
  if (!e_fall)
{
  gcc_assert (!onlyjump_p (bb_end_insn)
- || returnjump_p (bb_end_insn));
+ || returnjump_p (bb_end_insn)
+  || (e_taken->flags & EDGE_CROSSING));
  emit_barrier_after (bb_end_insn);
  continue;
}


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH] Add __int128 test to ubsan

2013-12-17 Thread Marek Polacek
Regtested on x86_64-linux with -m32/-m64.

Ok?

2013-12-17  Marek Polacek  

testsuite/
* c-c++-common/ubsan/overflow-int128.c: New test.

--- gcc/testsuite/c-c++-common/ubsan/overflow-int128.c.mp  2013-12-17 16:54:28.123468111 +0100
+++ gcc/testsuite/c-c++-common/ubsan/overflow-int128.c  2013-12-17 18:07:19.539221035 +0100
@@ -0,0 +1,48 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-fsanitize=signed-integer-overflow -Wno-overflow" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
+
+/* 2^127 - 1 */
+#define INT128_MAX (((__int128) 1 << ((__SIZEOF_INT128__ * __CHAR_BIT__) - 1)) - 1)
+#define INT128_MIN (-INT128_MAX - 1)
+
+int
+main (void)
+{
+  volatile __int128 i = INT128_MAX;
+  volatile __int128 j = 1;
+  volatile __int128 k = i + j;
+  k = j + i;
+  i++;
+  j = INT128_MAX - 100;
+  j += (1 << 10);
+
+  j = INT128_MIN;
+  i = -1;
+  k = i + j;
+  k = j + i;
+  j--;
+  j = INT128_MIN + 100;
+  j += -(1 << 10);
+
+  i = INT128_MAX;
+  j = 2;
+  k = i * j;
+
+  i = INT128_MIN;
+  i = -i;
+
+  return 0;
+}
+
+/* { dg-output "signed integer overflow: 0x7fff \\+ 1 cannot be represented in type '__int128'(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*signed integer overflow: 1 \\+ 0x7fff cannot be represented in type '__int128'(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*signed integer overflow: 0x7fff \\+ 1 cannot be represented in type '__int128'(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*signed integer overflow: 0x7f9b \\+ 1024 cannot be represented in type '__int128'(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*signed integer overflow: -1 \\+ 0x8000 cannot be represented in type '__int128'(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*signed integer overflow: 0x8000 \\+ -1 cannot be represented in type '__int128'(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*signed integer overflow: 0x8000 \\+ -1 cannot be represented in type '__int128'(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*signed integer overflow: 0x8064 \\+ -1024 cannot be represented in type '__int128'(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*signed integer overflow: 0x7fff \\* 2 cannot be represented in type '__int128'(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*negation of 0x8000 cannot be represented in type '__int128'; cast to an unsigned type to negate this value to itself(\n|\r\n|\r)" } */

Marek


RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-17 Thread Iyer, Balaji V
Hi Jakub,
Please see attached patch and my answers to your questions below.

Aldy, I have made a couple changes to #pragma simd routines, can you 
please give me your blessing on those?

Thanks,

Balaji V. Iyer.

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
> Sent: Monday, December 16, 2013 5:01 PM
> To: Iyer, Balaji V; Joseph S. Myers
> Cc: Aldy Hernandez; 'gcc-patches@gcc.gnu.org'
> Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly
> Elemental functions) for C
> 
> On Mon, Dec 16, 2013 at 09:41:43PM +, Iyer, Balaji V wrote:
> > --- gcc/c/c-parser.c(revision 205759)
> > +++ gcc/c/c-parser.c(working copy)
> > @@ -208,6 +208,12 @@
> >/* True if we are in a context where the Objective-C "Property attribute"
> >   keywords are valid.  */
> >BOOL_BITFIELD objc_property_attr_context : 1;
> > +
> > +  /* Cilk Plus specific parser/lexer information.  */
> > +
> > +  /* Buffer to hold all the tokens from parsing the vector attribute for 
> > the
> > + SIMD-enabled functions (formerly known as elemental functions).
> > + */  vec  *cilk_simd_fn_tokens;
> >  } c_parser;
> 
> Joseph, is this ok for you?
> 
> > +/* Returns true if NAME is an IDENTIFIER_NODE with identifier "vector,"
> > +   "__vector" or "__vector__."  */
> > +
> > +static bool
> 
> static inline bool
> 

Fixed.

> > +is_cilkplus_vector_p (tree name)
> > +{
> > +  if (flag_enable_cilkplus
> > +  && (simple_cst_equal (name, get_identifier ("vector")) == 1
> > + || simple_cst_equal (name, get_identifier ("__vector")) == 1
> > + || simple_cst_equal (name, get_identifier ("__vector__")) == 1))
> > +return true;
> > +  return false;
> > +}
> 
> Why not just
>   return flag_enable_cilkplus && is_attribute_p ("vector", name); ?

Fixed.

> 
> > +#define CILK_SIMD_FN_CLAUSE_MASK                             \
> > +   ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_SIMDLEN)        \
> > +   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINEAR)         \
> > +   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_UNIFORM)        \
> > +   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_INBRANCH)       \
> > +   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_NOTINBRANCH))
> 
> I thought you'd instead add there PRAGMA_CILK_CLAUSE_VECTORLENGTH,
> PRAGMA_CILK_CLAUSE_MASK and PRAGMA_CILK_CLAUSE_NOMASK (or
> similar).
> 

Yes, I renamed them to PRAGMA_CILK_CLAUSE_*

> > +  if (token->type == CPP_NAME
> > + && TREE_CODE (token->value) == IDENTIFIER_NODE)
> > +   if (simple_cst_equal (token->value,
> > + get_identifier ("vectorlength")) == 1)
> 
> Why the simple_cst_equal + get_identifier?  I mean, strcmp on
> IDENTIFIER_POINTER should be cheaper, and done elsewhere in c-parser.c.
> 

Fixed.

> > + {
> > +   if (!c_parser_cilk_clause_vectorlength (parser, NULL, true))
> 
> Why do you parse it here?  Just parse it when parsing other clauses, and only
> during parsing of vectorlength clause create OMP_CLAUSE_SAFELEN out of it
> (after all the verifications).
> 
> > +  if (is_cilk_simd_fn && TREE_CODE (step) == PARM_DECL)
> > +   {
> > + error_at (clause_loc, "using parameters for % step is "
> > +   "not supported in this release");
> > + step = integer_one_node;
> 
> That would be sorry, not error_at.
> 

Fixed.

> >here = c_parser_peek_token (parser)->location;
> > -  c_kind = c_parser_omp_clause_name (parser);
> > +
> > +  if (mask == CILK_SIMD_FN_CLAUSE_MASK)
> 
> Ugh, no.
> 
> > +   c_kind = c_parser_cilk_simd_fn_clause_name (parser);
> > +  else
> > +   c_kind = c_parser_omp_clause_name (parser);
> 
> Always parse it the same, just use PRAGMA_CILK_CLAUSE_* for the Cilk+
> specific clauses.

Well, if I don't have a different "*_clause_name" function, then I have to
modify c_parser_omp_clause_name to add support for things like
"vectorlength", "mask" and "unmask." Am I right?

Now, if I do that, then if we compile the following 2 lines:

#pragma omp declare simd vectorlength (4)
void foo () 

with -fcilkplus, they will be parsed correctly, when it should give an error.

> >
> >switch (c_kind)
> > {
> > @@ -10933,7 +11092,8 @@
> >   c_name = "aligned";
> >   break;
> > case PRAGMA_OMP_CLAUSE_LINEAR:
> > - clauses = c_parser_omp_clause_linear (parser, clauses);
> > + clauses = c_parser_omp_clause_linear
> > +   (parser, clauses, mask == CILK_SIMD_FN_CLAUSE_MASK);
> 
> Again, this is too ugly.  Perhaps check if (mask &
> PRAGMA_CILK_CLAUSE_VECTORLENGTH) != 0 or similar.
> 

The reason why I do mask == CILK_SIMD_FN_CLAUSE_MASK is to set the bool
parameter to true if we are compiling a Cilk Plus SIMD-enabled function. This
bool is checked to give the sorry error that using a parameter for step-size is
not implemented.

>   Jakub
Index: gcc/c-family/c-common.c
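
To make the two approaches discussed above concrete, here is a small standalone sketch. Everything in it is hypothetical: the enum, the `CLAUSE_BIT` macro and the two mask macros are simplified stand-ins, not GCC's real `pragma_omp_clause` values or its `OMP_CLAUSE_MASK_1` type. It only illustrates the suggestion of testing for a Cilk-specific bit in the mask instead of comparing the whole mask for equality.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clause identifiers, for illustration only.  */
enum clause_id
{
  CLAUSE_SIMDLEN,
  CLAUSE_LINEAR,
  CLAUSE_UNIFORM,
  CLAUSE_VECTORLENGTH   /* stands in for a Cilk-only clause */
};

#define CLAUSE_BIT(id) ((uint64_t) 1 << (id))

/* Clause set accepted for an OpenMP-style declare simd.  */
#define OMP_SIMD_MASK \
  (CLAUSE_BIT (CLAUSE_SIMDLEN) | CLAUSE_BIT (CLAUSE_LINEAR) \
   | CLAUSE_BIT (CLAUSE_UNIFORM))

/* Clause set accepted for a Cilk-style SIMD-enabled function.  */
#define CILK_SIMD_FN_MASK \
  (CLAUSE_BIT (CLAUSE_VECTORLENGTH) | CLAUSE_BIT (CLAUSE_LINEAR) \
   | CLAUSE_BIT (CLAUSE_UNIFORM))

/* Decide "is this the Cilk flavour?" by looking for a clause that only
   that flavour allows, rather than by mask equality.  */
bool
is_cilk_simd_fn (uint64_t mask)
{
  return (mask & CLAUSE_BIT (CLAUSE_VECTORLENGTH)) != 0;
}
```

The bit test keeps working if more clauses are later added to either mask, whereas an equality test has to be revisited every time the mask definition changes.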

[committed] Add new testcase (PR ipa/pr58290)

2013-12-17 Thread Jakub Jelinek
Hi!

Richard fixed this PR recently by adding a fixup_cfg pass again
right after IPA passes, I'm just including a testcase from this PR,
verified on x86_64-linux and verified it fails again if I comment
out the fixup_cfg pass from passes.def.

Committed as obvious to trunk.

2013-12-17  Jakub Jelinek  

PR ipa/58290
* gfortran.dg/pr58290.f90: New test.

--- gcc/testsuite/gfortran.dg/pr58290.f90.jj  2013-12-17 18:31:32.710677694 +0100
+++ gcc/testsuite/gfortran.dg/pr58290.f90  2013-12-17 18:32:03.048508980 +0100
@@ -0,0 +1,33 @@
+! PR ipa/58290
+! { dg-do compile }
+! { dg-options "-O1 -fipa-pta" }
+
+MODULE pr58290
+  TYPE b
+CHARACTER(10) :: s = ''
+  END TYPE b
+  TYPE c
+TYPE(b) :: d
+  END TYPE c
+  TYPE h
+INTEGER, DIMENSION(:), POINTER :: b
+  END TYPE h
+CONTAINS
+  SUBROUTINE foo(x, y)
+LOGICAL, INTENT(IN) :: x
+TYPE(c), INTENT(INOUT) :: y
+  END SUBROUTINE 
+  FUNCTION bar (g) RESULT (z)
+TYPE(h), INTENT(IN) :: g
+TYPE(c) :: y
+CALL foo (.TRUE., y)
+z = SIZE (g%b)
+  END FUNCTION bar
+  SUBROUTINE baz (g)
+TYPE(h), INTENT(INOUT) :: g
+INTEGER :: i, j
+j = bar(g)
+DO i = 1, j
+ENDDO
+  END SUBROUTINE baz
+END MODULE

Jakub


[patch] libgomp test fixes for FreeBSD

2013-12-17 Thread Andreas Tobler
Hello,

the below patch fixes three unresolved test cases and one FAIL in libgomp.
The FAIL can be solved by removing the include of the
alloca.h header, which is not present on FreeBSD.

On Linux this header is pulled in via stdlib.h if we define
_GNU_SOURCE, which we do.

Tested on Linux/x86_64 and FreeBSD11.0 amd64.
On FreeBSD all test cases now pass (on Linux too, no regressions).

Ok for trunk?

Thanks,
Andreas

2013-12-17  Andreas Tobler  

* testsuite/libgomp.c/affinity-1.c: Remove alloca.h include.
* testsuite/libgomp.c/icv-2.c: Add FreeBSD coverage.
* testsuite/libgomp.c/lock-3.c: Likewise.
* testsuite/libgomp.c/pr48591.c: Likewise.

Index: testsuite/libgomp.c/affinity-1.c
===
--- testsuite/libgomp.c/affinity-1.c  (revision 206062)
+++ testsuite/libgomp.c/affinity-1.c  (working copy)
@@ -23,7 +23,6 @@
 #define _GNU_SOURCE
 #endif
 #include "config.h"
-#include <alloca.h>
 #include 
 #include 
 #include 
Index: testsuite/libgomp.c/icv-2.c
===
--- testsuite/libgomp.c/icv-2.c (revision 206062)
+++ testsuite/libgomp.c/icv-2.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do run { target *-*-linux* *-*-gnu* } } */
+/* { dg-do run { target *-*-linux* *-*-gnu* *-*-freebsd* } } */
 
 #ifndef _GNU_SOURCE
 #define _GNU_SOURCE 1
Index: testsuite/libgomp.c/lock-3.c
===
--- testsuite/libgomp.c/lock-3.c(revision 206062)
+++ testsuite/libgomp.c/lock-3.c(working copy)
@@ -1,4 +1,4 @@
-/* { dg-do run { target *-*-linux* *-*-gnu* } } */
+/* { dg-do run { target *-*-linux* *-*-gnu* *-*-freebsd* } } */
 
 #ifndef _GNU_SOURCE
 #define _GNU_SOURCE 1
Index: testsuite/libgomp.c/pr48591.c
===
--- testsuite/libgomp.c/pr48591.c   (revision 206062)
+++ testsuite/libgomp.c/pr48591.c   (working copy)
@@ -1,5 +1,5 @@
 /* PR middle-end/48591 */
-/* { dg-do run { target i?86-*-linux* i?86-*-gnu* x86_64-*-linux* ia64-*-linux* } } */
+/* { dg-do run { target i?86-*-linux* i?86-*-gnu* x86_64-*-linux* ia64-*-linux* x86_64-*-freebsd* } } */
 /* { dg-options "-fopenmp" } */
 
 extern void abort (void);


Re: [PATCH] Add __int128 test to ubsan

2013-12-17 Thread Jakub Jelinek
On Tue, Dec 17, 2013 at 06:17:01PM +0100, Marek Polacek wrote:
> Regtested on x86_64-linux with -m32/-m64.
> 
> Ok?
> 
> 2013-12-17  Marek Polacek  
> 
> testsuite/
>   * c-c++-common/ubsan/overflow-int128.c: New test.
> 
> --- gcc/testsuite/c-c++-common/ubsan/overflow-int128.c.mp  2013-12-17 16:54:28.123468111 +0100
> +++ gcc/testsuite/c-c++-common/ubsan/overflow-int128.c  2013-12-17 18:07:19.539221035 +0100
> @@ -0,0 +1,48 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-options "-fsanitize=signed-integer-overflow -Wno-overflow" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
> +
> +/* 2^127 - 1 */
> +#define INT128_MAX (((__int128) 1 << ((__SIZEOF_INT128__ * __CHAR_BIT__) - 1)) - 1)

Isn't this undefined behavior in C?  I mean, shouldn't you
shift up (unsigned __int128) 1 and only cast to (__int128) at the end?

Jakub
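
As an illustration of the suggestion above (the revised patch itself is not reproduced here), one way to build the constant without shifting into the sign bit is to do the shift in `unsigned __int128` and cast back only at the end. This is a sketch under assumptions: the `_SKETCH` macro names and the two helper functions are made up for this example, and it needs a compiler and target that provide `__int128`.

```c
/* Width of __int128 in bits, from GCC's predefined macros.  */
#define BITS_INT128 (__SIZEOF_INT128__ * __CHAR_BIT__)

/* 2^127 - 1: shift an unsigned 1, subtract, then cast to the signed
   type.  The value fits in __int128, so the cast is well defined.  */
#define INT128_MAX_SKETCH \
  ((__int128) (((unsigned __int128) 1 << (BITS_INT128 - 1)) - 1))
#define INT128_MIN_SKETCH (-INT128_MAX_SKETCH - 1)

/* Helpers for inspecting the two 64-bit halves of a 128-bit value.  */
unsigned long long
high64 (__int128 v)
{
  return (unsigned long long) ((unsigned __int128) v >> 64);
}

unsigned long long
low64 (__int128 v)
{
  return (unsigned long long) (unsigned __int128) v;
}
```

With this form the upper half of `INT128_MAX_SKETCH` is `0x7fffffffffffffff` and the lower half is all ones, and no signed shift ever reaches the sign bit.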


Re: [patch] libgomp test fixes for FreeBSD

2013-12-17 Thread Jakub Jelinek
On Tue, Dec 17, 2013 at 06:42:22PM +0100, Andreas Tobler wrote:
> 2013-12-17  Andreas Tobler  
> 
>   * testsuite/libgomp.c/affinity-1.c: Remove alloca.h include.

In this case please also change the alloca (...) use in the testcase
to __builtin_alloca (...) and update the ChangeLog entry.
Ok with that change.

>   * testsuite/libgomp.c/icv-2.c: Add FreeBSD coverage.
>   * testsuite/libgomp.c/lock-3.c: Likewise.
>   * testsuite/libgomp.c/pr48591.c: Likewise.

Jakub
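
A minimal sketch of what the requested change looks like in practice, assuming GCC or a compatible compiler: `__builtin_alloca` is available without including `<alloca.h>`, which is why it is convenient for a testcase that must also build on FreeBSD. The function below is a made-up example and is not code from affinity-1.c.

```c
#include <string.h>

/* Copy at most N bytes of SRC into DST through a temporary stack buffer.
   DST must have room for N + 1 bytes.  The buffer comes from
   __builtin_alloca, so no <alloca.h> is needed.  Returns the number of
   bytes copied, not counting the terminating NUL.  */
size_t
copy_via_stack (char *dst, const char *src, size_t n)
{
  char *tmp = __builtin_alloca (n + 1);
  size_t len = strlen (src);

  if (len > n)
    len = n;
  memcpy (tmp, src, len);
  tmp[len] = '\0';
  memcpy (dst, tmp, len + 1);
  return len;
}
```

The stack buffer is released automatically when the function returns, as with plain `alloca`.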


PR middle-end/35535 part I

2013-12-17 Thread Jan Hubicka
Hi,
PR 35545 has trivial testcase of feedback directed devirtualization:
int main()
{
 int i;
  A* ap = 0;

  for (i = 0; i < 1; i++)
  {
 if (i%7==0)
ap = new A();
 else
ap = new B();
ap->foo();


Here we should devirtualize since B's foo is the dominating target, and we
correctly do so.
We however do more; we trace the code into:
  for (i = 0; i < 1; i++)
  {
 if (i%7==0)
{
  ap = new A();
  ap->foo();
}
 else
{
  ap = new B();
  ap->foo();
}
that should allow us to devirtualize completely.  Instead of doing that we get
stuck on stupid
  :
  ap_8 = operator new (16);
  ap_8->i = 0;
  ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
  _19 = foo;
  PROF_26 = [obj_type_ref] OBJ_TYPE_REF(_19;(struct A)ap_8->0);
  if (PROF_26 == foo)
goto ;
  else
goto ;

  :
  ap_13 = operator new (16);
  MEM[(struct B *)ap_13].D.2237.i = 0;
  MEM[(struct B *)ap_13].b = 0;
  MEM[(struct B *)ap_13].D.2237._vptr.A = &MEM[(void *)&_ZTV1B + 16B];
  _1 = foo;
  PROF_30 = [obj_type_ref] OBJ_TYPE_REF(_1;(struct A)ap_13->0);
  if (PROF_30 == foo)
goto ;
  else
goto ;

There are several reasons for it:
 1) most of our passes do not expect OBJ_TYPE_REF in arguments and cowardly
    consider it volatile
 2) the tracer happens too late and there is no VRP pass to clean up afterwards
 3) the folding machinery expects OBJ_TYPE_REF to be only in an argument.

After some consideration I decided not to update gimple_ic to remove
OBJ_TYPE_REF, since this is a perfect example where OBJ_TYPE_REF may be useful
after inlining: in more complex cases a type-based devirt may kick in and save
the day even late after unrolling/tracing and other code specialization.

This is the first trivial patch, to make VRP behave correctly.
Bootstrapped/regtested x86_64-linux, OK?

* tree-vrp.c (extract_range_from_unary_expr_1): Add OBJ_TYPE_REF
Index: tree-vrp.c
===
--- tree-vrp.c  (revision 206040)
+++ tree-vrp.c  (working copy)
@@ -3202,9 +3202,9 @@ extract_range_from_unary_expr_1 (value_r
 }
 
   /* Handle operations that we express in terms of others.  */
-  if (code == PAREN_EXPR)
+  if (code == PAREN_EXPR || code == OBJ_TYPE_REF)
 {
-  /* PAREN_EXPR is a simple copy.  */
+  /* PAREN_EXPR and OBJ_TYPE_REF are simple copies.  */
   copy_value_range (vr, &vr0);
   return;
 }


Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.

2013-12-17 Thread Cong Hou
Ping?


thanks,
Cong


On Mon, Dec 2, 2013 at 5:06 PM, Cong Hou  wrote:
> Hi Richard
>
> Could you please take a look at this patch and see if it is ready for
> the trunk? The patch is pasted as a text file here again.
>
> Thank you very much!
>
>
> Cong
>
>
> On Mon, Nov 11, 2013 at 11:25 AM, Cong Hou  wrote:
>> Hi James
>>
>> Sorry for the late reply.
>>
>>
>> On Fri, Nov 8, 2013 at 2:55 AM, James Greenhalgh
>>  wrote:
>>>> On Tue, Nov 5, 2013 at 9:58 AM, Cong Hou  wrote:
>>>> > Thank you for your detailed explanation.
>>>> >
>>>> > Once GCC detects a reduction operation, it will automatically
>>>> > accumulate all elements in the vector after the loop. In the loop the
>>>> > reduction variable is always a vector whose elements are reductions of
>>>> > corresponding values from other vectors. Therefore in your case the
>>>> > only instruction you need to generate is:
>>>> >
>>>> > VABAL   ops[3], ops[1], ops[2]
>>>> >
>>>> > It is OK if you accumulate the elements into one in the vector inside
>>>> > of the loop (if one instruction can do this), but you have to make
>>>> > sure other elements in the vector should remain zero so that the final
>>>> > result is correct.
>>>> >
>>>> > If you are confused about the documentation, check the one for
>>>> > udot_prod (just above usad in md.texi), as it has very similar
>>>> > behavior as usad. Actually I copied the text from there and did some
>>>> > changes. As those two instruction patterns are both for vectorization,
>>>> > their behavior should not be difficult to explain.
>>>> >
>>>> > If you have more questions or think that the documentation is still
>>>> > improper please let me know.
>>>
>>> Hi Cong,
>>>
>>> Thanks for your reply.
>>>
>>> I've looked at Dorit's original patch adding WIDEN_SUM_EXPR and
>>> DOT_PROD_EXPR and I see that the same ambiguity exists for
>>> DOT_PROD_EXPR. Can you please add a note in your tree.def
>>> that SAD_EXPR, like DOT_PROD_EXPR can be expanded as either:
>>>
>>>   tmp = WIDEN_MINUS_EXPR (arg1, arg2)
>>>   tmp2 = ABS_EXPR (tmp)
>>>   arg3 = PLUS_EXPR (tmp2, arg3)
>>>
>>> or:
>>>
>>>   tmp = WIDEN_MINUS_EXPR (arg1, arg2)
>>>   tmp2 = ABS_EXPR (tmp)
>>>   arg3 = WIDEN_SUM_EXPR (tmp2, arg3)
>>>
>>> Where WIDEN_MINUS_EXPR is a signed MINUS_EXPR, returning a
>>> a value of the same (widened) type as arg3.
>>>
>>
>>
>> I have added it, although we currently don't have WIDEN_MINUS_EXPR (I
>> mentioned it in tree.def).
>>
>>
>>> Also, while looking for the history of DOT_PROD_EXPR I spotted this
>>> patch:
>>>
>>>   [autovect] [patch] detect mult-hi and sad patterns
>>>   http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01394.html
>>>
>>> I wonder what the reason was for that patch to be dropped?
>>>
>>
>> It has been 8 years. I have no idea why this patch was not accepted
>> in the end. There is not even a reply in that thread. But I believe the SAD
>> pattern is very important to recognize. ARM also provides
>> instructions for it.
>>
>>
>> Thank you for your comment again!
>>
>>
>> thanks,
>> Cong
>>
>>
>>
>>> Thanks,
>>> James
>>>
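
For context, the kind of scalar source loop this thread is about can be sketched as follows. This is only an illustration of the sum-of-absolute-differences pattern: the function name is made up, and whether a given compiler version and target actually turn this loop into a SAD instruction depends on the patch under review and on the backend.

```c
#include <stddef.h>
#include <stdlib.h>

/* Sum of absolute differences over two byte arrays of length N.
   Each difference is computed in a wider type (int) and accumulated
   into a wider sum, matching the widen-minus / abs / accumulate shape
   described in the quoted discussion.  */
unsigned int
sad_u8 (const unsigned char *a, const unsigned char *b, size_t n)
{
  unsigned int sum = 0;

  for (size_t i = 0; i < n; i++)
    sum += abs ((int) a[i] - (int) b[i]);
  return sum;
}
```

For the inputs {1, 5, 10, 200} and {4, 5, 0, 100} the per-element differences are 3, 0, 10 and 100, giving 113.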


Re: Another build!=host fix

2013-12-17 Thread Mike Stump
On Dec 17, 2013, at 5:47 AM, Bernd Edlinger  wrote:
> Ok for trunk?

ENOPATCH?

Re: [PATCH] Support addsub/subadd as non-isomorphic operations for SLP vectorizer.

2013-12-17 Thread Cong Hou
Ping?


thanks,
Cong


On Mon, Dec 2, 2013 at 5:02 PM, Cong Hou  wrote:
> Any comment on this patch?
>
>
> thanks,
> Cong
>
>
> On Fri, Nov 22, 2013 at 11:40 AM, Cong Hou  wrote:
>> On Fri, Nov 22, 2013 at 3:57 AM, Marc Glisse  wrote:
>>> On Thu, 21 Nov 2013, Cong Hou wrote:
>>>
>>>> On Thu, Nov 21, 2013 at 4:39 PM, Marc Glisse  wrote:
>>>>>
>>>>> On Thu, 21 Nov 2013, Cong Hou wrote:
>>>>>
>>>>>> While I added the new define_insn_and_split for vec_merge, a bug is
>>>>>> exposed: in config/i386/sse.md, [ define_expand "xop_vmfrcz2" ]
>>>>>> only takes one input, but the corresponding builtin functions have two
>>>>>> inputs, which are shown in i386.c:
>>>>>>
>>>>>>  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_vmfrczv4sf2,
>>>>>> "__builtin_ia32_vfrczss", IX86_BUILTIN_VFRCZSS, UNKNOWN,
>>>>>> (int)MULTI_ARG_2_SF },
>>>>>>  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_vmfrczv2df2,
>>>>>> "__builtin_ia32_vfrczsd", IX86_BUILTIN_VFRCZSD, UNKNOWN,
>>>>>> (int)MULTI_ARG_2_DF },
>>>>>>
>>>>>> In consequence, the ix86_expand_multi_arg_builtin() function tries to
>>>>>> check two args but based on the define_expand of xop_vmfrcz2,
>>>>>> the content of insn_data[CODE_FOR_xop_vmfrczv4sf2].operand[2] may be
>>>>>> incorrect (because it only needs one input).
>>>>>>
>>>>>> The patch below fixed this issue.
>>>>>>
>>>>>> Bootstrapped and tested on an x86-64 machine. Note that this patch
>>>>>> should be applied before the one I sent earlier (sorry for sending
>>>>>> them in wrong order).
>>>>>
>>>>>
>>>>>
>>>>> This is PR 56788. Your patch seems strange to me and I don't think it
>>>>> fixes the real issue, but I'll let more knowledgeable people answer.
>>>>
>>>>
>>>>
>>>> Thank you for pointing out the bug report. This patch is not intended
>>>> to fix PR56788.
>>>
>>>
>>> IMHO, if PR56788 was fixed, you wouldn't have this issue, and if PR56788
>>> doesn't get fixed, I'll post a patch to remove _mm_frcz_sd and the
>>> associated builtin, which would solve your issue as well.
>>
>>
>> I agree. Then I will wait until your patch is merged to the trunk,
>> otherwise my patch could not pass the test.
>>
>>
>>>
>>>
>>>> For your function:
>>>>
>>>> #include 
>>>> __m128d f(__m128d x, __m128d y){
>>>>  return _mm_frcz_sd(x,y);
>>>> }
>>>>
>>>> Note that the second parameter is ignored intentionally, but the
>>>> prototype of this function contains two parameters. My fix is
>>>> explicitly telling GCC that the optab xop_vmfrczv4sf3 should have
>>>> three operands instead of two, to let it have the correct information
>>>> in insn_data[CODE_FOR_xop_vmfrczv4sf3].operand[2] which is used to
>>>> match the type of the second parameter in the builtin function in
>>>> ix86_expand_multi_arg_builtin().
>>>
>>>
>>> I disagree that this is intentional, it is a bug. AFAIK there is no AMD
>>> documentation that could be used as a reference for what _mm_frcz_sd is
>>> supposed to do. The only existing documentations are by Microsoft (which
>>> does *not* ignore the second argument) and by LLVM (which has a single
>>> argument). Whatever we choose for _mm_frcz_sd, the builtin should take a
>>> single argument, and if necessary we'll use 2 builtins to implement
>>> _mm_frcz_sd.
>>>
>>
>>
>> I also only found the one by Microsoft.. If the second argument is
>> ignored, we could just remove it, as long as there is no "standard"
>> that requires two arguments. Hopefully it won't break current projects
>> using _mm_frcz_sd.
>>
>> Thank you for your comments!
>>
>>
>> Cong
>>
>>
>>> --
>>> Marc Glisse


Re: [patch] libgomp test fixes for FreeBSD

2013-12-17 Thread Andreas Tobler
On 17.12.13 18:54, Jakub Jelinek wrote:
> On Tue, Dec 17, 2013 at 06:42:22PM +0100, Andreas Tobler wrote:
>> 2013-12-17  Andreas Tobler  
>>
>>  * testsuite/libgomp.c/affinity-1.c: Remove alloca.h include.
> 
> In this case please also change the alloca (...) use in the testcase
> to __builtin_alloca (...) and update the ChangeLog entry.
> Ok with that change.
> 
>>  * testsuite/libgomp.c/icv-2.c: Add FreeBSD coverage.
>>  * testsuite/libgomp.c/lock-3.c: Likewise.
>>  * testsuite/libgomp.c/pr48591.c: Likewise.


Thank you for the review. Done with r206063.

Andreas

2013-12-17  Andreas Tobler  

* testsuite/libgomp.c/affinity-1.c: Remove alloca.h include.
Replace alloca () with __builtin_alloca ().
* testsuite/libgomp.c/icv-2.c: Add FreeBSD coverage.
* testsuite/libgomp.c/lock-3.c: Likewise.
* testsuite/libgomp.c/pr48591.c: Likewise.



Re: [PATCH] Add __int128 test to ubsan

2013-12-17 Thread Jakub Jelinek
On Tue, Dec 17, 2013 at 07:16:18PM +0100, Marek Polacek wrote:
> On Tue, Dec 17, 2013 at 06:50:24PM +0100, Jakub Jelinek wrote:
> > Isn't this undefined behavior in C?  I mean, shouldn't you
> > shift up (unsigned __int128) 1 and only cast to (__int128) at the end?
> 
> Oh my, how could I.
> 
> Yeah, (__int128) 1 << 127 is UB.  Fixed below, ok now?

Ok.

> 2013-12-17  Marek Polacek  
> 
> testsuite/:
>   * c-c++-common/ubsan/overflow-int128.c: New test.

Jakub


Re: [PATCH] Fix PR 58867: asan and ubsan tests not run for installed testing

2013-12-17 Thread Jakub Jelinek
On Tue, Oct 29, 2013 at 07:56:32AM -0700, Andrew Pinski wrote:
> * lib/ubsan-dg.exp (check_effective_target_fundefined_sanitizer): New 
> function.
> (ubsan_init): Save off ALWAYS_CXXFLAGS.
> (ubsan_finish): Restore ALWAYS_CXXFLAGS correctly.
> * lib/asan-dg.exp (check_effective_target_faddress_sanitizer): Change
> to creating an executable.
> (asan_init): Save off ALWAYS_CXXFLAGS.
> (asan_finish): Restore ALWAYS_CXXFLAGS correctly.
> * gcc.dg/ubsan/ubsan.exp: Don't check the return value of ubsan_init.
> Check check_effective_target_fundefined_sanitizer before running the
> tests.
> * g++.dg/ubsan/ubsan.exp: Likewise.
> * testsuite/gcc.dg/asan/asan.exp: Don't check
> check_effective_target_faddress_sanitizer too early. Don't check the
> return value of asan_init.
> * g++.dg/asan/asan.exp: Likewise.

Try to find out some mailer that doesn't eat tabs?  Once you have tabs,
some of the lines (at least the first one) will be too long.

> --- testsuite/lib/ubsan-dg.exp(revision 204138)
> +++ testsuite/lib/ubsan-dg.exp(working copy)
> @@ -14,6 +14,15 @@
>  # along with GCC; see the file COPYING3.  If not see
>  # .
>  
> +# Return 1 if compilation with -fsanitize=undefined is error-free for trivial
> +# code, 0 otherwise.
> +
> +proc check_effective_target_fundefined_sanitizer {} {

Please rename this to ..._fsanitize_undefined everywhere, and
also *_faddress_sanitizer to *_fsanitize_address.  The history behind
the latter is that the option I think was initially -faddress-sanitizer
and only later got renamed to the current form.

> @@ -60,6 +69,7 @@ proc ubsan_init { args } {
>  global ALWAYS_CXXFLAGS
>  global TOOL_OPTIONS
>  global ubsan_saved_TEST_ALWAYS_FLAGS
> +global usan_saved_ALWAYS_CXXFLAGS

Please spell it as ubsan* instead of usan*.

> --- testsuite/gcc.dg/asan/asan.exp(revision 204138)
> +++ testsuite/gcc.dg/asan/asan.exp(working copy)
> @@ -1,4 +1,4 @@
> -# Copyright (C) 2012-2013 Free Software Foundation, Inc.
> +# Copyright (C) 2012 Free Software Foundation, Inc.

Don't do this.

> --- testsuite/g++.dg/asan/asan.exp(revision 204138)
> +++ testsuite/g++.dg/asan/asan.exp(working copy)
> @@ -1,4 +1,4 @@
> -# Copyright (C) 2012-2013 Free Software Foundation, Inc.
> +# Copyright (C) 2012 Free Software Foundation, Inc.

Nor this.

Otherwise it looks good to me.  There is tsan.exp that will probably need
similar treatment though now.

Jakub


Re: [PATCH] Add __int128 test to ubsan

2013-12-17 Thread Marek Polacek
On Tue, Dec 17, 2013 at 06:50:24PM +0100, Jakub Jelinek wrote:
> Isn't this undefined behavior in C?  I mean, shouldn't you
> shift up (unsigned __int128) 1 and only cast to (__int128) at the end?

Oh my, how could I.

Yeah, (__int128) 1 << 127 is UB.  Fixed below, ok now?
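
The point about the shift can be sketched in isolation. This is a minimal illustration, assuming a GCC-compatible compiler on a target that provides __int128; the macro names mirror the ones in the test below, but the block is a standalone sketch and not part of the patch.

```c
#ifdef __SIZEOF_INT128__
/* Undefined behavior: the shift happens in the signed type and the
   result, 2^127, is not representable in __int128.
   #define BAD_INT128_MIN ((__int128) 1 << 127)  */

/* Well defined: shift and subtract in the unsigned type, and convert to
   the signed type only once the value (2^127 - 1) is representable.  */
#define INT128_MAX \
  ((__int128) (((unsigned __int128) 1 \
                << ((__SIZEOF_INT128__ * __CHAR_BIT__) - 1)) - 1))

/* Derived from INT128_MAX without overflow: -(2^127 - 1) - 1 == -2^127.  */
#define INT128_MIN (-INT128_MAX - 1)
#endif
```

Both macros stay integer constant expressions, so they can also be checked at compile time with _Static_assert.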

2013-12-17  Marek Polacek  

testsuite/:
* c-c++-common/ubsan/overflow-int128.c: New test.

--- gcc/testsuite/c-c++-common/ubsan/overflow-int128.c.mp   2013-12-17 16:54:28.123468111 +0100
+++ gcc/testsuite/c-c++-common/ubsan/overflow-int128.c  2013-12-17 18:54:19.0 +0100
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-fsanitize=signed-integer-overflow" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
+
+/* 2^127 - 1 */
+#define INT128_MAX (__int128) (((unsigned __int128) 1 << ((__SIZEOF_INT128__ * __CHAR_BIT__) - 1)) - 1)
+#define INT128_MIN (-INT128_MAX - 1)
+
+int
+main (void)
+{
+  volatile __int128 i = INT128_MAX;
+  volatile __int128 j = 1;
+  volatile __int128 k = i + j;
+  k = j + i;
+  i++;
+  j = INT128_MAX - 100;
+  j += (1 << 10);
+
+  j = INT128_MIN;
+  i = -1;
+  k = i + j;
+  k = j + i;
+  j--;
+  j = INT128_MIN + 100;
+  j += -(1 << 10);
+
+  i = INT128_MAX;
+  j = 2;
+  k = i * j;
+
+  i = INT128_MIN;
+  i = -i;
+
+  return 0;
+}
+
+/* { dg-output "signed integer overflow: 0x7fff \\+ 1 cannot be represented in type '__int128'(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*signed integer overflow: 1 \\+ 0x7fff cannot be represented in type '__int128'(\n|\r\n|\r)" } */

Marek


Re: [RFC] libgcov.c re-factoring and offline profile-tool

2013-12-17 Thread Xinliang David Li
On Tue, Dec 17, 2013 at 7:48 AM, Teresa Johnson  wrote:
> On Mon, Dec 16, 2013 at 2:48 PM, Xinliang David Li  wrote:
>> Ok -- gcov_write_counter and gcov_write_tag_length are qualified as
>> low level primitives for basic gcov format and probably should be kept
>> in gcov-io.c.
>>
>> gcov_rewrite is pretty much libgcov runtime implementation details so I
>> think it should be moved out. gcov_write_summary is not related to
>> gcov low level format either, neither is gcov_seek.  Ok for them to be
>> moved?
>
> After looking at these some more, with the idea that gcov-io.c should
> encapsulate the lower level IO routines, then I think all of these
> (including gcov_rewrite) should remain in gcov-io.c. I think
> gcov_write_summary belongs there since all of the other gcov_write_*
> are there. And gcov_seek and gcov_rewrite are both adjusting gcov_var
> fields to affect the file IO operations. And there are currently no
> references to gcov_var within libgcc/libgcov* files.
>
> So I think we should leave the patch as-is.

Sounds fine to me.

David

> Honza, is the current
> patch ok for trunk?
>
> Thanks,
> Teresa
>
>>
>> thanks,
>>
>> David
>>
>>
>> On Mon, Dec 16, 2013 at 2:34 PM, Jan Hubicka  wrote:
 I think so -- they are private to libgcov.  Honza, what do you think?
>>>
>>> Hmm, the purpose of gcov-io was to be low level IO library for the basic
>>> gcov file format.  I am not sure if gcov_write_tag_length should really
>>> reside in other file than gcov_write_tag.
>>>
>>> I see a desire to isolate actual stdio calls so one can have replacement
>>> driver for e.g. Linux kernel. For that reason things like gcov_seek and
>>> friends probably should be separated, but what is reason for splitting the
>>> file handling itself?
>>>
>>> Honza

 thanks,

 David

 On Mon, Dec 16, 2013 at 1:17 PM, Teresa Johnson  
 wrote:
 > On Mon, Dec 16, 2013 at 12:55 PM, Xinliang David Li  
 > wrote:
 >> gcov_rewrite function is only needed (and defined) with IN_LIBGCOV.
 >> Should it be moved from common file gcov-io.c to libgcov.c?
 >
 > Possibly. I just looked through gcov-io.c and there are several
 > additional functions that are only defined under "#ifdef IN_LIBGCOV"
 > and only used in libgcov*c (or each other):
 >
 > gcov_write_counter
 > gcov_write_tag_length
 > gcov_write_summary
 > gcov_seek
 >
 > Should they all, plus gcov_rewrite, be moved to libgcov-driver.c?
 >
 > Teresa
 >
 >>
 >>
 >> David
 >>
 >> On Thu, Dec 12, 2013 at 12:11 PM, Teresa Johnson  
 >> wrote:
 >>> On Wed, Dec 11, 2013 at 10:05 PM, Teresa Johnson 
 >>>  wrote:
  On Fri, Dec 6, 2013 at 6:23 AM, Jan Hubicka  wrote:
 >> Hi, all
 >>
 >> This is the new patch for gcov-tool (previously profile-tool).
 >>
 >> Honza: can you comment on the new merge interface? David posted some
 >> comments in an earlier email and we want to know what's your 
 >> opinion.
 >>
 >> Test patch has been tested with bootstrap, regression,
 >> profiledbootstrap and SPEC2006.
 >>
 >> Noticeable changes from the earlier version:
 >>
 >> 1. create a new file libgcov.h and move libgcov-*.h headers to
 >> libgcov.h
 >> So we can include multiple libgcov-*.c without adding new macros.
 >>
 >> 2. split libgcov.h specific code in gcov-io.h to libgcc/libgcov.h
 >> Avoid multiple-page of code under IN_LIBGCOV macro -- this
 >> improves the readability.
 >>
 >> 3. make gcov_var static, and move the definition from gcov-io.h to
 >> gcov-io.c. Also
 >>move some static functions accessing gcov_var to gcov-io.c
 >> Current code relies on GCOV_LINKAGE tricks to avoid multi-definition.
 >> I don't see
 >> a reason that gcov_var needs to be exposed as a global.
 >>
 >> 4. expose gcov_write_strings() and gcov_sync() to gcov_tool usage
 >>
 >> 5. rename profile-tool to gcov-tool per Honza's suggestion.
 >>
 >> Thanks,
 >
 > Hi,
 > I did not read in detail the gcov-tool source itself, but let's first
 > make the interface changes needed.
 >
 >> 2013-11-18  Rong Xu  
 >>
 >>   * gcc/gcov-io.c (gcov_var): Moved from gcov-io.h and make it 
 >> static.
 >>   (gcov_position): Move from gcov-io.h
 >>   (gcov_is_error): Ditto.
 >>   (gcov_rewrite): Ditto.
 >>   * gcc/gcov-io.h: Re-factoring. Move gcov_var to gcov-io.h and
 >> move the libgcov only part of libgcc/libgcov.h.
 >>   * libgcc/libgcov.h: New common header files for libgcov-*.h
 >>   * libgcc/Makefile.in: Add dependence to libgcov.h
 >>   * lib

Re: [Patch] Fix PR 59527 (assert in cfg fixup with function splitting)

2013-12-17 Thread Steven Bosscher
On Tue, Dec 17, 2013 at 5:38 PM, Teresa Johnson wrote:
> PR gcov-profile/59527
> * cfgrtl.c (fixup_reorder_chain): Handle a region-crossing
> branch, which can't be eliminated.


This is OK. Thanks!

Ciao!
Steven


[Patch, microblaze]: Add __builtin_trap instruction pattern

2013-12-17 Thread Spenser Gilliland
Hi,

Just wanted to make a note that I have tested this patch and it works
for me.

Thanks,
Spenser


Re: PR middle-end/35535 part I

2013-12-17 Thread Jeff Law

On 12/17/13 11:00, Jan Hubicka wrote:



Here we should devirtualize, since B's foo is the dominating target, and we
correctly do so.  We however do more; we trace the code into:
   for (i = 0; i < 1; i++)
     {
       if (i%7==0)
         {
           ap = new A();
           ap->foo();
         }
       else
         {
           ap = new B();
           ap->foo();
         }
that should allow us to devirtualize completely.  Instead of doing that we get
stuck on stupid
   :
   ap_8 = operator new (16);
   ap_8->i = 0;
   ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
   _19 = foo;
   PROF_26 = [obj_type_ref] OBJ_TYPE_REF(_19;(struct A)ap_8->0);
   if (PROF_26 == foo)
 goto ;
   else
 goto ;

   :
   ap_13 = operator new (16);
   MEM[(struct B *)ap_13].D.2237.i = 0;
   MEM[(struct B *)ap_13].b = 0;
   MEM[(struct B *)ap_13].D.2237._vptr.A = &MEM[(void *)&_ZTV1B + 16B];
   _1 = foo;
   PROF_30 = [obj_type_ref] OBJ_TYPE_REF(_1;(struct A)ap_13->0);
   if (PROF_30 == foo)
 goto ;
   else
 goto ;

There are several reasons for it:
  1) most of our passes do not expect OBJ_TYPE_REF in arguments and cowardly
     consider it volatile
  2) tracer happens too late and there is no VRP pass to clean up afterwards
  3) folding machinery expects OBJ_TYPE_REF to be only in argument.

After some consideration I decided not to update gimple_ic to remove
OBJ_TYPE_REF, since this is a perfect example where OBJ_TYPE_REF may be useful
after inlining: in more complex cases a type-based devirt may kick in and save
the day even late after unrolling/tracing and other code specialization.

This is the first trivial patch to make VRP behave correctly.
Bootstrapped/regtested x86_64-linux OK?

* tree-vrp.c (extract_range_from_unary_expr_1): Add OBJ_TYPE_REF

s/Add/Handle.  Please add the PR marker as well.

OK with that trivial nit.

jeff



Re: PR middle-end/35535 part II

2013-12-17 Thread Jeff Law

On 12/17/13 13:45, Jan Hubicka wrote:

Hi,
this patch continues on the track of making us handle OBJ_TYPE_REF in
expressions more gracefully.  The change in gimple_fold_stmt_to_constant_1
makes it skip the wrapper, so valueize functions of passes never see it,
and in addition OBJ_TYPE_REF is stripped once the argument becomes constant.

In gimple-fold we also ask the type devirtualization machinery for the call
target the same way as we do for call statements.  I think this is rather
important, since we still resolve a good part of devirtualization only after
inlining.  Often, however, we manage to produce a speculative edge, and we
really do not want to keep the speculation in the code after we have found
the real target.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* gimple-fold.c (fold_gimple_assign): Attempt to devirtualize
OBJ_TYPE_REF.
(gimple_fold_stmt_to_constant_1): Bypass OBJ_TYPE_REF wrappers.
Index: gimple-fold.c
===
--- gimple-fold.c   (revision 206042)
+++ gimple-fold.c   (working copy)
@@ -374,6 +375,30 @@ fold_gimple_assign (gimple_stmt_iterator
if (REFERENCE_CLASS_P (rhs))
  return maybe_fold_reference (rhs, false);

+   else if (TREE_CODE (rhs) == OBJ_TYPE_REF)
+ {
+   tree val = OBJ_TYPE_REF_EXPR (rhs);
+   if (is_gimple_min_invariant (val))
+ return val;
+   else if (flag_devirtualize && virtual_method_call_p (val))
+ {
+   bool final;
+   vec targets
+ = possible_polymorphic_call_targets (val, &final);
+   if (final && targets.length () <= 1)
+ {
+   tree fndecl;
+   if (targets.length () == 1)
+ fndecl = targets[0]->decl;
+   else
+ fndecl = builtin_decl_implicit (BUILT_IN_UNREACHABLE);
+   val = fold_convert (TREE_TYPE (val), fndecl);
+   STRIP_USELESS_TYPE_CONVERSION (val);
+   return val;
+ }
+ }
+
+ }
else if (TREE_CODE (rhs) == ADDR_EXPR)
  {
tree ref = TREE_OPERAND (rhs, 0);
@@ -2525,6 +2550,13 @@ gimple_fold_stmt_to_constant_1 (gimple s

  return build_vector (TREE_TYPE (rhs), vec);
}
+ if (subcode == OBJ_TYPE_REF)
+   {
+ tree val = (*valueize) (OBJ_TYPE_REF_EXPR (rhs));
+ /* If calle is constant, we can fold away the wrapper.  */

s/calle/callee/?

OK with that fix.  Please add reference to the right (35545) PR number 
in the ChangeLog/commit message.


jeff



Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-17 Thread Allan Sandfeld Jensen
On Tuesday 17 December 2013, Allan Sandfeld Jensen wrote:
> On Monday 16 December 2013, Uros Bizjak wrote:
> > On Mon, Dec 16, 2013 at 10:34 AM, Uros Bizjak  wrote:
> > > On Sun, Dec 15, 2013 at 7:54 PM, Allan Sandfeld Jensen
> > > 
> > >  wrote:
> > >> Hi again
> > >> 
> > >> On Wednesday 11 December 2013, Uros Bizjak wrote:
> > >>> Hello!
> > >>> 
> > >>> > PR gcc/59422
> > >>> > 
> > >>> > This patch extends the supported targets for function multi
> > >>> > versioning to also include Haswell, Silvermont, and the most recent
> > >>> > AMD models. It also prioritizes AVX2 versions over AMD specific
> > >>> > pre-AVX2 versions.
> > >>> 
> > >>> Please add a ChangeLog entry and attach the complete patch. Please
> > >>> also state how you tested the patch, as outlined in the instructions
> > >>> [1].
> > >>> 
> > >>> [1] http://gcc.gnu.org/contribute.html
> > >> 
> > >> Updated patch for better CPU model detection and added ChangeLog.
> > >> 
> > >> The patch has been tested with the attached test.cpp. Verified that it
> > >> doesn't build before the patch, and that it builds after, and verified
> > >> it selects correct versions at runtime based on either CPU model or
> > >> supported ISA (tested on 3 machines: SandyBridge, IvyBridge and Phenom
> > >> II).
> > >> 
> > >> Btw, I couldn't find anything that corresponds to gcc's btver2 arch.
> > >> Is that an old term for what has become the Jaguar architecture?
> > > 
> > > Thanks for the patch!
> > > 
> > > However, you should not change the existing order of enums in
> > > cpuinfo.c (enum processor_vendor, enum processor_types, enum
> > > processor_subtypes, enum processor_features), but new entries should
> > > be added at the end (before *_MAX entry, if exists) of the enum. The
> > > enums (enum processor_features and enum processor_model) in
> > > config/i386/i386.c should mirror these changes. Please see [1].
> > > 
> > > Probably, we should document this in the source...
> > > 
> > > -  {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
> > > +  {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},
> > > 
> > > Huh... Thanks for catching this. -march=sandybridge is not
> > > recognized...
> > 
> > -  {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
> > +  {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},
> > +  {"core-avx-i", M_INTEL_COREI7_IVYBRIDGE},
> > +  {"core-avx2", M_INTEL_COREI7_HASWELL},
> > 
> > Ah, no. These names are not intended to be used in -march. We can
> > follow the tradition and use sandybridge, ivybridge and haswell here.
> 
> I had the problem that "arch=corei7-avx" was not recognized as a valid
> property argument until I made that change. I thought it was the intent to
> merge this list of models with the canonical names, but perhaps it is an
> error in the new parameter validation?
> 
Ah, sorry. I think I misremembered the problem. After reviewing the code 
again, I think the only problem is with target("arch=core-avx-i") because it 
is not in the list of architectures (because it is treated as the same 
architecture as corei7-avx presumably).

I will revert the sandybridge name change in the next patch, and make the new 
names match.
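
Uros's request above, to append new enum entries before the *_MAX entry instead of reordering, can be sketched as follows. The enum and enumerator names here are hypothetical, not the ones in cpuinfo.c or i386.c, and the stated motivation (numeric values already shared between separately built components must keep their meaning) is an assumption of this sketch rather than something spelled out in the thread.

```c
/* Version 1 of a hypothetical enum whose numeric values are shared
   between a runtime library and the compiler.  */
enum model_v1
{
  V1_NEHALEM = 1,
  V1_WESTMERE,
  V1_SANDYBRIDGE,
  V1_MAX
};

/* Appending before *_MAX keeps every existing value unchanged.  */
enum model_v2_appended
{
  A_NEHALEM = 1,
  A_WESTMERE,
  A_SANDYBRIDGE,
  A_IVYBRIDGE,
  A_HASWELL,
  A_MAX
};

/* Inserting in the middle silently renumbers the entries after it.  */
enum model_v2_inserted
{
  I_NEHALEM = 1,
  I_WESTMERE,
  I_IVYBRIDGE,
  I_SANDYBRIDGE,
  I_MAX
};

_Static_assert ((int) A_SANDYBRIDGE == (int) V1_SANDYBRIDGE,
                "appending keeps old values");
_Static_assert ((int) I_SANDYBRIDGE != (int) V1_SANDYBRIDGE,
                "inserting renumbers old values");
```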

`Allan



Re: [PATCH] Builtins handling in IVOPT

2013-12-17 Thread Wei Mi
Ping.

Thanks,
Wei.

On Mon, Dec 9, 2013 at 9:54 PM, Wei Mi  wrote:
> Ping.
>
> Thanks,
> wei.
>
> On Sat, Nov 23, 2013 at 10:46 AM, Wei Mi  wrote:
>> bootstrap and regression of the updated patch pass.
>>
>> On Sat, Nov 23, 2013 at 12:05 AM, Wei Mi  wrote:
>>> On Thu, Nov 21, 2013 at 12:19 AM, Zdenek Dvorak
>>>  wrote:
 Hi,

> This patch works on the intrinsic calls handling issue in IVOPT mentioned 
> here:
> http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01295.html
>
> In find_interesting_uses_stmt, it changes
>
> arg = expr
> __builtin_xxx (arg)
>
> to
>
> arg = expr;
> tmp = addr_expr (mem_ref(arg));
> __builtin_xxx (tmp, ...)

 this looks a bit confusing (and wasteful) to me. It would make more sense 
 to
 just record the argument as USE_ADDRESS and do the rewriting in 
 rewrite_use_address.

 Zdenek
>>>
>>> I updated the patch. The gimple changing part is now moved to
>>> rewrite_use_address. Add support for plain address expr in addition to
>>> reference expr in find_interesting_uses_address.
>>>
>>> bootstrap and testing is going on.
>>>
>>> 2013-11-22  Wei Mi  
>>>
>>> * expr.c (expand_expr_addr_expr_1): Not to split TMR.
>>> (expand_expr_real_1): Ditto.
>>> * targhooks.c (default_builtin_has_mem_ref_p): Default
>>> builtin.
>>> * tree-ssa-loop-ivopts.c (builtin_has_mem_ref_p): New function.
>>> (rewrite_use_address): Add TMR for builtin.
>>> (find_interesting_uses_stmt): Special handling of builtins.
>>> * gimple-expr.c (is_gimple_address): Add handling of TMR.
>>> * gimple-expr.h (is_gimple_addressable): Ditto.
>>> * config/i386/i386.c (ix86_builtin_has_mem_ref_p): New target hook.
>>> (ix86_atomic_assign_expand_fenv): Ditto.
>>> (ix86_expand_special_args_builtin): Special handling of TMR for
>>> builtin.
>>> * target.def (builtin_has_mem_ref_p): New hook.
>>> * doc/tm.texi.in: Ditto.
>>> * doc/tm.texi: Generated.
>>>
>>> 2013-11-22  Wei Mi  
>>>
>>> * gcc.dg/tree-ssa/ivopt_5.c: New test.
>>>
>>> Index: testsuite/gcc.dg/tree-ssa/ivopt_5.c
>>> ===
>>> --- testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0)
>>> +++ testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0)
>>> @@ -0,0 +1,21 @@
>>> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
>>> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
>>> +
>>> +/* Make sure only one iv is selected after IVOPT.  */
>>> +
>>> +#include 
>>> +extern __m128i arr[], d[];
>>> +void test (void)
>>> +{
>>> +unsigned int b;
>>> +for (b = 0; b < 1000; b += 2) {
>>> +  __m128i *p = (__m128i *)(&d[b]);
>>> +  __m128i a = _mm_load_si128(&arr[4*b+3]);
>>> +  __m128i v = _mm_loadu_si128(p);
>>> +  v = _mm_xor_si128(v, a);
>>> +  _mm_storeu_si128(p, v);
>>> +}
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump-times "PHI >> +/* { dg-final { cleanup-tree-dump "ivopts" } } */
>>> Index: targhooks.c
>>> ===
>>> --- targhooks.c (revision 204792)
>>> +++ targhooks.c (working copy)
>>> @@ -566,6 +566,13 @@ default_builtin_reciprocal (unsigned int
>>>  }
>>>
>>>  bool
>>> +default_builtin_has_mem_ref_p (int built_in_function ATTRIBUTE_UNUSED,
>>> +  int i ATTRIBUTE_UNUSED)
>>> +{
>>> +  return false;
>>> +}
>>> +
>>> +bool
>>>  hook_bool_CUMULATIVE_ARGS_mode_tree_bool_false (
>>> cumulative_args_t ca ATTRIBUTE_UNUSED,
>>> enum machine_mode mode ATTRIBUTE_UNUSED,
>>> Index: expr.c
>>> ===
>>> --- expr.c  (revision 204792)
>>> +++ expr.c  (working copy)
>>> @@ -7467,7 +7467,19 @@ expand_expr_addr_expr_1 (tree exp, rtx t
>>>   tem = fold_build_pointer_plus (tem, TREE_OPERAND (exp, 1));
>>> return expand_expr (tem, target, tmode, modifier);
>>>}
>>> +case TARGET_MEM_REF:
>>> +  {
>>> +   int old_cse_not_expected;
>>> +   addr_space_t as
>>> + = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0))));
>>>
>>> +   result = addr_for_mem_ref (exp, as, true);
>>> +   old_cse_not_expected = cse_not_expected;
>>> +   cse_not_expected = true;
>>> +   result = memory_address_addr_space (tmode, result, as);
>>> +   cse_not_expected = old_cse_not_expected;
>>> +   return result;
>>> +  }
>>>  case CONST_DECL:
>>>/* Expand the initializer like constants above.  */
>>>result = XEXP (expand_expr_constant (DECL_INITIAL (exp),
>>> @@ -9526,9 +9538,13 @@ expand_expr_real_1 (tree exp, rtx target
>>>   = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0))));
>>> enum insn_code icode;
>>> unsigned int align;
>>> +   int old_cse_not_expected;
>>>

Re: GOMP_target: alignment (was: [gomp4] #pragma omp target* fixes)

2013-12-17 Thread Jakub Jelinek
On Tue, Dec 17, 2013 at 08:21:57PM +0100, Thomas Schwinge wrote:
> On Mon, 16 Dec 2013 17:58:26 +0100, Jakub Jelinek  wrote:
> > I'd indeed prefer if you just used one
> > array, it can be say just uchar array of twice the width, with even chars
> > for alignment and odd for kinds (or vice versa), compared to two arrays
> > it is tiny bit cheaper at the caller side IMHO.
> 
> Like this, for gomp-4_0-branch?  Is hard-coding a shift by eight bits OK,
> or am I to fiddle with CHAR_TYPE_SIZE, and the like?

I think shift by 8 bits is fine, I believe 8 bits is the minimum a char can
have and it is better if you have a constant you can use on both sides.
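
A minimal sketch of the single-array encoding with a hard-coded shift by 8, assuming the kind sits in the low byte and the alignment in the high byte. The function names are hypothetical, and storing the alignment as log2 is an assumption of this sketch, not something settled in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: low 8 bits hold the mapping kind, the next
   8 bits hold log2 of the alignment.  None of these names is a libgomp
   interface.  */
static inline uint16_t
pack_kind_align (uint8_t kind, uint8_t align_log2)
{
  return (uint16_t) (kind | ((uint16_t) align_log2 << 8));
}

/* Extract the kind from the low byte.  */
static inline uint8_t
unpack_kind (uint16_t v)
{
  return (uint8_t) (v & 0xff);
}

/* Extract log2 of the alignment from the high byte.  */
static inline uint8_t
unpack_align_log2 (uint16_t v)
{
  return (uint8_t) (v >> 8);
}
```

Because the shift count is the same constant on both sides, the caller and the runtime agree on the layout regardless of the target's CHAR_TYPE_SIZE.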

Jakub


Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-17 Thread Jakub Jelinek
On Tue, Dec 17, 2013 at 09:13:05PM +, Iyer, Balaji V wrote:
> @@ -10418,6 +10528,12 @@
>step = c_parser_expression (parser).value;
>mark_exp_read (step);
>step = c_fully_fold (step, false, NULL);
> +  if (is_cilk_simd_fn && TREE_CODE (step) == PARM_DECL)
> + {
> +   sorry ("using parameters for % step is not supported yet "
> +  "in this release");

I meant actually
sorry ("using parameters for % step is not supported yet");

> @@ -10933,8 +11051,14 @@
> c_name = "aligned";
> break;
>   case PRAGMA_OMP_CLAUSE_LINEAR:
> -   clauses = c_parser_omp_clause_linear (parser, clauses);
> -   c_name = "linear";
> +   {
> + bool is_cilk_simd_fn = false;
> + if ((mask & PRAGMA_CILK_CLAUSE_VECTORLENGTH) == 0)
> +   is_cilk_simd_fn = true;

I don't think this will work.  PRAGMA_CILK_CLAUSE_VECTORLENGTH
is something like 40 I think, so testing whether the mask doesn't
include copyin and default clauses is not what you wanted probably.
What I meant is
  if (((mask >> PRAGMA_CILK_CLAUSE_VECTORLENGTH) & 1) != 0)
is_cilk_simd_fn = true;
(note, for 32-bit HWI targets, omp_clause_mask is a class and not
all arithmetic is actually supported on it, so better limit yourself
to forms used elsewhere already).
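
The difference between the two tests can be sketched with a plain 64-bit mask. The enumerator names and the values 3 and 5 are hypothetical; only the value 40 follows the guess above, and the real omp_clause_mask may be a class on 32-bit HWI targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clause indices; the real pragma_omp_clause values differ.  */
enum clause_index
{
  CLAUSE_COPYIN = 3,
  CLAUSE_DEFAULT = 5,
  CLAUSE_VECTORLENGTH = 40
};

/* Wrong: ANDs the mask with the index itself.  40 is binary 101000, so
   this tests bits 3 and 5 rather than bit 40.  */
static bool
has_clause_wrong (uint64_t mask, enum clause_index c)
{
  return (mask & (uint64_t) c) != 0;
}

/* Right: shift the mask down by the index and test the low bit.  */
static bool
has_clause (uint64_t mask, enum clause_index c)
{
  return ((mask >> c) & 1) != 0;
}
```

With only bit 40 set, has_clause reports the clause as present while has_clause_wrong does not; with only bit 3 set, the results are reversed.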

> @@ -12754,10 +12882,20 @@
>  c_finish_omp_declare_simd (c_parser *parser, tree fndecl, tree parms,
>  vec clauses)
>  {
> +

Please remove this extra vertical space.

Otherwise looks good to me, just not sure where do you handle processor
clause (or how Cilk+ simd clones specify the ISA they want to use).

Jakub


PR middle-end/35535 part II

2013-12-17 Thread Jan Hubicka
Hi,
this patch continues on the track of making us handle OBJ_TYPE_REF in
expressions more gracefully.  The change in gimple_fold_stmt_to_constant_1
makes it skip the wrapper, so valueize functions of passes never see it,
and in addition OBJ_TYPE_REF is stripped once the argument becomes constant.

In gimple-fold we also ask the type devirtualization machinery for the call
target the same way as we do for call statements.  I think this is rather
important, since we still resolve a good part of devirtualization only after
inlining.  Often, however, we manage to produce a speculative edge, and we
really do not want to keep the speculation in the code after we have found
the real target.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* gimple-fold.c (fold_gimple_assign): Attempt to devirtualize
OBJ_TYPE_REF.
(gimple_fold_stmt_to_constant_1): Bypass OBJ_TYPE_REF wrappers.
Index: gimple-fold.c
===
--- gimple-fold.c   (revision 206042)
+++ gimple-fold.c   (working copy)
@@ -374,6 +375,30 @@ fold_gimple_assign (gimple_stmt_iterator
if (REFERENCE_CLASS_P (rhs))
  return maybe_fold_reference (rhs, false);
 
+   else if (TREE_CODE (rhs) == OBJ_TYPE_REF)
+ {
+   tree val = OBJ_TYPE_REF_EXPR (rhs);
+   if (is_gimple_min_invariant (val))
+ return val;
+   else if (flag_devirtualize && virtual_method_call_p (val))
+ {
+   bool final;
+   vec targets
+ = possible_polymorphic_call_targets (val, &final);
+   if (final && targets.length () <= 1)
+ {
+   tree fndecl;
+   if (targets.length () == 1)
+ fndecl = targets[0]->decl;
+   else
+ fndecl = builtin_decl_implicit (BUILT_IN_UNREACHABLE);
+   val = fold_convert (TREE_TYPE (val), fndecl);
+   STRIP_USELESS_TYPE_CONVERSION (val);
+   return val;
+ }
+ }
+
+ }
else if (TREE_CODE (rhs) == ADDR_EXPR)
  {
tree ref = TREE_OPERAND (rhs, 0);
@@ -2525,6 +2550,13 @@ gimple_fold_stmt_to_constant_1 (gimple s
 
  return build_vector (TREE_TYPE (rhs), vec);
}
+ if (subcode == OBJ_TYPE_REF)
+   {
+ tree val = (*valueize) (OBJ_TYPE_REF_EXPR (rhs));
+ /* If calle is constant, we can fold away the wrapper.  */
+ if (is_gimple_min_invariant (val))
+   return val;
+   }
 
   if (kind == tcc_reference)
{


RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-17 Thread Iyer, Balaji V
Hi Jakub,
Attached, please find a fixed patch. I have fixed all the issues you 
have mentioned below. I have also answered your questions below. Is this OK for 
trunk/branch?

Here are the ChangeLog entries:
gcc/ChangeLog
2013-12-17  Balaji V. Iyer  

* omp-low.c (simd_clone_clauses_extract): Replaced the string
"cilk simd elemental" with "cilk simd function."

gcc/c-family/ChangeLog
2013-12-17  Balaji V. Iyer  

* c-common.c (c_common_attribute_table): Added "cilk simd function"
attribute.
* c-pragma.h (enum pragma_cilk_clause): Remove.
(enum pragma_omp_clause):  Added the following fields:
PRAGMA_CILK_CLAUSE_NOMASK, PRAGMA_CILK_CLAUSE_MASK,
PRAGMA_CILK_CLAUSE_VECTORLENGTH, PRAGMA_CILK_CLAUSE_NONE,
PRAGMA_CILK_CLAUSE_LINEAR, PRAGMA_CILK_CLAUSE_PRIVATE,
PRAGMA_CILK_CLAUSE_FIRSTPRIVATE, PRAGMA_CILK_CLAUSE_LASTPRIVATE,
PRAGMA_CILK_CLAUSE_UNIFORM.

gcc/c/ChangeLog
2013-12-17  Balaji V. Iyer  

* c-parser.c (struct c_parser::cilk_simd_fn_tokens): Added new field.
(c_parser_declaration_or_fndef): Added a check if cilk_simd_fn_tokens
field in parser is not empty.  If not-empty, call the function
c_parser_finish_omp_declare_simd.
(c_parser_cilk_clause_vectorlength): Modified function to be shared
between SIMD-enabled functions and #pragma simd.  Added new parameter.
(c_parser_cilk_all_clauses): Modified the usage of the function
c_parser_cilk_clause_vectorlength as mentioned above.
(c_parser_cilk_simd_fn_vector_attrs): New function.
(c_finish_cilk_simd_fn_tokens): Likewise.
(is_cilkplus_vector_p): Likewise.
(c_parser_omp_clause_name): Added checking for "vectorlength,"
"nomask," and "mask" strings in clause name.
(c_parser_omp_all_clauses): Added 3 new case statements:
PRAGMA_CILK_CLAUSE_VECTORLENGTH, PRAGMA_CILK_CLAUSE_MASK and
PRAGMA_CILK_CLAUSE_NOMASK.
(c_parser_attributes): Added a cilk_simd_fn_tokens parameter.  Added a
check for vector attribute and if so call the function
c_parser_cilk_simd_fn_vector_attrs.  Also, when Cilk plus is enabled,
called the function c_finish_cilk_simd_fn_tokens.
(c_finish_omp_declare_simd): Added a check if cilk_simd_fn_tokens in
parser field is non-empty.  If so, parse them as you would parse
the omp declare simd pragma.
(c_parser_omp_clause_linear): Added a new bool parm. is_cilk_simd_fn.
Added a check when step is a parameter and flag it as error.
(CILK_SIMD_FN_CLAUSE_MASK): New #define.
(c_parser_cilk_clause_name): Changed pragma_cilk_clause to
pragma_omp_clause.

gcc/cp/ChangeLog
2013-12-17  Balaji V. Iyer  

* parser.c (cp_parser_cilk_simd_clause_name): Changed cilk_clause_name
to omp_clause_name.

gcc/testsuite/ChangeLog 
2013-12-17  Balaji V. Iyer  

* c-c++-common/cilk-plus/SE/ef_test.c: New test.
* c-c++-common/cilk-plus/SE/ef_test2.c: Likewise.
* c-c++-common/cilk-plus/SE/vlength_errors.c: Likewise.
* c-c++-common/cilk-plus/SE/ef_error.c: Likewise.
* c-c++-common/cilk-plus/SE/ef_error2.c: Likewise.
* c-c++-common/cilk-plus/SE/ef_error3.c: Likewise.
* gcc.dg/cilk-plus/cilk-plus.exp: Added calls for the above tests.


Thanks,

Balaji V. Iyer.

> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Tuesday, December 17, 2013 1:25 PM
> To: Iyer, Balaji V
> Cc: Joseph S. Myers; Aldy Hernandez (al...@redhat.com); 'gcc-
> patc...@gcc.gnu.org'
> Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly
> Elemental functions) for C
> 
> Hi!
> 
> On Tue, Dec 17, 2013 at 05:23:43PM +, Iyer, Balaji V wrote:
> > +/* Returns name of the next clause in Cilk Plus SIMD-enabled function's
> > +   attribute.
> > +   If the clause is not recognized PRAGMA_OMP_CLAUSE_NONE is returned and
> > +   the token is not consumed.  Otherwise appropriate pragma_omp_clause is
> > +   returned and the token is consumed.  */
> > +
> > +static pragma_omp_clause
> > +c_parser_cilk_simd_fn_clause_name (c_parser *parser)
> > +{
> > +  pragma_omp_clause result = PRAGMA_OMP_CLAUSE_NONE;
> > +
> > +  if (c_parser_next_token_is_not (parser, CPP_NAME))
> > +    return result;
> > +
> > +  const char *p = IDENTIFIER_POINTER (c_parser_peek_token (parser)->value);
> > +  if (!strcmp (p, "vectorlength"))
> > +    result = PRAGMA_CILK_CLAUSE_VECTORLENGTH;
> > +  else if (!strcmp (p, "uniform"))
> > +    result = PRAGMA_CILK_CLAUSE_UNIFORM;
> > +  else if (!strcmp (p, "linear"))
> > +    result = PRAGMA_CILK_CLAUSE_LINEAR;
> > +  else if (!strcmp (p, "mask"))
> > +    result = PRAGMA_CILK_CLAUSE_MASK;
> > +  else if (!strcmp (p, "nomask"))
> > +    result = PRAGMA_CILK_CLAUSE_UNMASK;
> > +
> > +  if (result != PRAGMA_OMP_CLAUSE_NONE)
> > +    c_parser_consume_token (parser);
> > +  retur

Re: [Patch, microblaze]: Fix ICE with mhard-float

2013-12-17 Thread Spenser Gilliland
Hi,

Just wanted to say that this patch works for me.

Thanks,
Spenser


[patch] fix .DOT file generation for IPA passes

2013-12-17 Thread Aldy Hernandez
I was trying to generate a graph file with 
-fdump-ipa-tmipa-blocks-details-vops-graph, but the .dot file was 
corrupted.  It looks like the header bits printed in start_graph_dump() 
are not dumped because we are predicating the calls to 
clean_graph_dump_file->start_graph_dump by:


  && cfun && (cfun->curr_properties & PROP_cfg))

The problem is that for IPA passes (well at least for tmipa) cfun is 
NULL so we don't initialize the dump file, but later we go through each 
function (setting cfun appropriately) and dump the corresponding graphs 
somewhere in:


do_per_function (execute_function_dump, NULL);

I have fixed this by adding a bit in opt_pass to keep track of if a 
graph .DOT file has been initialized, and initialize it if not.  I 
suppose we could move initialization of the graph file further down, but 
that seemed a bit more tedious given all the places where we dump.
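
The idea of the fix, writing the graph header lazily the first time any function of the pass is dumped, can be sketched outside of GCC. Everything below is a hypothetical stand-in: struct pass_sketch is not opt_pass, and the text written is only illustrative, not the output of start_graph_dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for opt_pass; only the flag matters here.  */
struct pass_sketch
{
  bool graph_dump_initialized;
};

/* Append one function's graph to OUT, writing the file header first if
   no earlier call for this pass has done so.  Returns the number of
   headers written by this call (0 or 1).  */
static int
dump_function_graph (struct pass_sketch *pass, FILE *out, const char *fn_name)
{
  int headers = 0;

  if (!pass->graph_dump_initialized)
    {
      /* Illustrative header only.  */
      fputs ("digraph \"dump\" {\n", out);
      pass->graph_dump_initialized = true;
      headers = 1;
    }

  fprintf (out, "subgraph \"%s\" { }\n", fn_name);
  return headers;
}
```

Resetting the flag whenever a pass opens its dump file, as pass_init_dump_file does in the patch, gives each pass exactly one header even when cfun is NULL at initialization time.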


OK?
commit c5715cee17a20918375277ee602a5f0706138aba
Author: Aldy Hernandez 
Date:   Mon Dec 16 12:37:01 2013 -0800

* passes.c (execute_function_dump): Set graph_dump_initialized
appropriately.
(pass_init_dump_file): Similarly.
(execute_one_pass): Pass new argument to do_per_function.
* tree-pass.h (class opt_pass): New field graph_dump_initialized.

diff --git a/gcc/passes.c b/gcc/passes.c
index f30f159..bc7bf06 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -1640,8 +1640,10 @@ do_per_function_toporder (void (*callback) (void *data), void *data)
 /* Helper function to perform function body dump.  */
 
 static void
-execute_function_dump (void *data ATTRIBUTE_UNUSED)
+execute_function_dump (void *data)
 {
+  opt_pass *pass = (opt_pass *)data;
+
   if (dump_file && current_function_decl)
 {
   if (cfun->curr_properties & PROP_trees)
@@ -1655,7 +1657,14 @@ execute_function_dump (void *data ATTRIBUTE_UNUSED)
 
   if ((cfun->curr_properties & PROP_cfg)
  && (dump_flags & TDF_GRAPH))
-   print_graph_cfg (dump_file_name, cfun);
+   {
+ if (!pass->graph_dump_initialized)
+   {
+ clean_graph_dump_file (dump_file_name);
+ pass->graph_dump_initialized = true;
+   }
+ print_graph_cfg (dump_file_name, cfun);
+   }
 }
 }
 
@@ -1936,6 +1945,7 @@ verify_curr_properties (void *data)
 bool
 pass_init_dump_file (opt_pass *pass)
 {
+  pass->graph_dump_initialized = false;
   /* If a dump file name is present, open it if enabled.  */
   if (pass->static_pass_number != -1)
 {
@@ -1950,7 +1960,10 @@ pass_init_dump_file (opt_pass *pass)
   if (initializing_dump
  && dump_file && (dump_flags & TDF_GRAPH)
  && cfun && (cfun->curr_properties & PROP_cfg))
-   clean_graph_dump_file (dump_file_name);
+   {
+ clean_graph_dump_file (dump_file_name);
+ pass->graph_dump_initialized = true;
+   }
   timevar_pop (TV_DUMP);
   return initializing_dump;
 }
@@ -2230,7 +2243,7 @@ execute_one_pass (opt_pass *pass)
 
   verify_interpass_invariants ();
   if (dump_file)
-do_per_function (execute_function_dump, NULL);
+do_per_function (execute_function_dump, pass);
   if (pass->type == IPA_PASS)
 {
   struct cgraph_node *node;
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index b7b43de..d23181a 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -114,6 +114,11 @@ public:
   /* Static pass number, used as a fragment of the dump file name.  */
   int static_pass_number;
 
+  /* When a given dump file is being initialized, this flag is set to
+ true if the corresponding TDF_graph dump file has also been
+ initialized.  */
+  bool graph_dump_initialized;
+
 protected:
   gcc::context *m_ctxt;
 };


Re: [PATCH] Fix PR58944

2013-12-17 Thread Sriraman Tallam
On Fri, Dec 13, 2013 at 5:06 AM, H.J. Lu  wrote:
> On Mon, Dec 2, 2013 at 6:46 PM, Sriraman Tallam  wrote:
>> On Thu, Nov 28, 2013 at 9:36 PM, Bernd Edlinger
>>  wrote:
>>> Hi,
>>>
>>> On Wed, 27 Nov 2013 19:49:39, Uros Bizjak wrote:

>>>> On Mon, Nov 25, 2013 at 10:08 PM, Sriraman Tallam wrote:
>>>>
>>>>> I have attached a patch to fix this bug:
>>>>>
>>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58944
>>>>>
>>>>> A similar problem was also reported here:
>>>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01050.html
>>>>>
>>>>>
>>>>> Recently, ix86_valid_target_attribute_tree in config/i386/i386.c was
>>>>> refactored to not depend on global_options structure and to be able to
>>>>> use any gcc_options structure. One clean way to fix this is by having
>>>>> target_option_default_node save all the default target options which
>>>>> can be restored to any gcc_options structure. The root cause of the
>>>>> above bugs was that ix86_arch_string and ix86_tune_string were not
>>>>> saved in target_option_default_node in PR58944 and
>>>>> ix86_preferred_stack_boundary_arg was not saved in the latter case.
>>>>>
>>>>> This patch saves all the target options used in i386.opt which are
>>>>> either obtained from the command-line or set to some default. Is this
>>>>> patch alright?
>>>>
>>>> Things look rather complicated, but I see no other solution than save
>>>> and restore the way you propose.
>>>>
>>>> Please wait 24h if somebody has a different idea, otherwise please go
>>>> ahead and commit the patch to mainline.

>>>
>>> Maybe you should also look at the handling of preferred_stack_boundary_arg
>>> versus incoming_stack_boundary_arg in ix86_option_override_internal:
>>>
>>> Remember ix86_incoming_stack_boundary_arg is defined to
>>> global_options.x_ix86_incoming_stack_boundary_arg.
>>>
>>> like this?
>>>
>>>   if (opts_set->x_ix86_incoming_stack_boundary_arg)
>>> {
>>> -  if (ix86_incoming_stack_boundary_arg
>>> +  if (opts->x_ix86_incoming_stack_boundary_arg
>>>   < (TARGET_64BIT_P (opts->x_ix86_isa_flags) ? 4 : 2)
>>> -  || ix86_incoming_stack_boundary_arg> 12)
>>> + || opts->x_ix86_incoming_stack_boundary_arg> 12)
>>> error ("-mincoming-stack-boundary=%d is not between %d and 12",
>>> -   ix86_incoming_stack_boundary_arg,
>>> +  opts->x_ix86_incoming_stack_boundary_arg,
>>>TARGET_64BIT_P (opts->x_ix86_isa_flags) ? 4 : 2);
>>>   else
>>> {
>>>   ix86_user_incoming_stack_boundary
>>> -= (1 << ix86_incoming_stack_boundary_arg) * BITS_PER_UNIT;
>>> +   = (1 << opts->x_ix86_incoming_stack_boundary_arg) * BITS_PER_UNIT;
>>>   ix86_incoming_stack_boundary
>>> = ix86_user_incoming_stack_boundary;
>>> }
>>>
>>
>> Thanks for catching this. I will make this change in the same patch.
>>
>
> Your change caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59492

Thanks for fixing this. This is making me wonder if I am missing some
other important flags. Is there a way to detect this? I originally
looked at everything in i386.opt to form my list.

Thanks
Sri

>
>
> --
> H.J.


Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-17 Thread Jakub Jelinek
Hi!

On Tue, Dec 17, 2013 at 05:23:43PM +, Iyer, Balaji V wrote:
> +/* Returns name of the next clause in Cilk Plus SIMD-enabled function's
> +   attribute.
> +   If the clause is not recognized PRAGMA_OMP_CLAUSE_NONE is returned and
> +   the token is not consumed.  Otherwise appropriate pragma_omp_clause is
> +   returned and the token is consumed.  */
> +
> +static pragma_omp_clause
> +c_parser_cilk_simd_fn_clause_name (c_parser *parser)
> +{
> +  pragma_omp_clause result = PRAGMA_OMP_CLAUSE_NONE;
> +
> +  if (c_parser_next_token_is_not (parser, CPP_NAME))
> +return result;
> +  
> +  const char *p = IDENTIFIER_POINTER (c_parser_peek_token (parser)->value);
> +  if (!strcmp (p, "vectorlength"))
> +result = PRAGMA_CILK_CLAUSE_VECTORLENGTH;
> +  else if (!strcmp (p, "uniform"))
> +result = PRAGMA_CILK_CLAUSE_UNIFORM;
> +  else if (!strcmp (p, "linear"))
> +result = PRAGMA_CILK_CLAUSE_LINEAR;
> +  else if (!strcmp (p, "mask"))
> +result = PRAGMA_CILK_CLAUSE_MASK;
> +  else if (!strcmp (p, "nomask"))
> +result = PRAGMA_CILK_CLAUSE_UNMASK;
> +
> +  if (result != PRAGMA_OMP_CLAUSE_NONE)
> +c_parser_consume_token (parser);
> +  return result;
> +}

No, this isn't what I meant.  I meant that you add the 3 new clause names
to c_parser_omp_clause_name (and use PRAGMA_CILK_* for those).

> +  if (token->type == CPP_NAME
> +   && TREE_CODE (token->value) == IDENTIFIER_NODE)   
> + if (!strcmp (IDENTIFIER_POINTER (token->value), "vectorlength"))
> +   {
> + if (!c_parser_cilk_clause_vectorlength (parser, NULL, true))
> +   {
> + c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
> + return;
> +   }
> + else
> +   continue;
> +   }

Why do you need this at all?  I'd expect you just remove this whole if and
the c_parser_cilk_clause_vectorlength function, and instead just parse
vectorlength normally when you see PRAGMA_CILK_CLAUSE_VECTORLENGTH.

> +   sorry ("using parameters for %<linear%> step is not supported "
> +  "in this release");

... is not supported yet".

> -  c_kind = c_parser_omp_clause_name (parser);
> + 
> +  if (mask == CILK_SIMD_FN_CLAUSE_MASK)
> + c_kind = c_parser_cilk_simd_fn_clause_name (parser);
> +  else
> + c_kind = c_parser_omp_clause_name (parser);

Please revert this.

> @@ -10933,7 +11088,8 @@
> c_name = "aligned";
> break;
>   case PRAGMA_OMP_CLAUSE_LINEAR:
> -   clauses = c_parser_omp_clause_linear (parser, clauses);
> +   clauses = c_parser_omp_clause_linear
> + (parser, clauses, mask == CILK_SIMD_FN_CLAUSE_MASK);

Please test for some particular bit in the mask, not == on the masks.
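
To illustrate the difference, here is a minimal standalone sketch.  It assumes
a plain 64-bit integer mask and made-up clause indices; in GCC the real
indices come from the pragma_omp_clause enum, and on some hosts
omp_clause_mask is a class rather than an integer, so the names and values
below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clause indices, for illustration only.  */
enum clause_sketch
{
  CLAUSE_LINEAR = 3,
  CLAUSE_UNIFORM = 7,
  CLAUSE_VECTORLENGTH = 40
};

typedef uint64_t clause_mask;

/* Mask with only the bit for clause C set.  */
clause_mask
clause_bit (enum clause_sketch c)
{
  return (clause_mask) 1 << c;
}

/* True if MASK allows clause C.  This keeps working when other clauses
   are later added to the mask, which an == comparison of whole masks
   does not.  */
bool
mask_has_clause (clause_mask mask, enum clause_sketch c)
{
  return ((mask >> c) & 1) != 0;
}
```

If a later change adds one more clause to the set, a comparison against the
old mask value silently becomes false, while the single-bit test still
identifies the construct.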

Jakub


Go patch committed: Use backend interface for runtime errors

2013-12-17 Thread Ian Lance Taylor
This patch from Chris Manghane uses the backend interface to call the
runtime error function.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

diff -r 4108b7cd8ca9 go/expressions.cc
--- a/go/expressions.cc	Mon Dec 16 11:55:05 2013 -0800
+++ b/go/expressions.cc	Tue Dec 17 12:25:44 2013 -0800
@@ -4305,8 +4305,9 @@
 	   expr,
 	   fold_convert(TREE_TYPE(expr),
 			null_pointer_node));
-		tree crash = gogo->runtime_error(RUNTIME_ERROR_NIL_DEREFERENCE,
-		 loc);
+		Expression* crash_expr =
+		gogo->runtime_error(RUNTIME_ERROR_NIL_DEREFERENCE, loc);
+		tree crash = crash_expr->get_tree(context);
 		expr = fold_build2_loc(loc.gcc_location(), COMPOUND_EXPR,
    TREE_TYPE(expr), build3(COND_EXPR,
 			   void_type_node,
@@ -6183,9 +6184,9 @@
 
 	  // __go_runtime_error(RUNTIME_ERROR_DIVISION_BY_ZERO), 0
 	  int errcode = RUNTIME_ERROR_DIVISION_BY_ZERO;
+	  Expression* crash = gogo->runtime_error(errcode, this->location());
 	  tree panic = fold_build2_loc(gccloc, COMPOUND_EXPR, TREE_TYPE(ret),
-   gogo->runtime_error(errcode,
-			   this->location()),
+   crash->get_tree(context),
    fold_convert_loc(gccloc, TREE_TYPE(ret),
 			integer_zero_node));
 
@@ -6975,8 +6976,9 @@
   if (nil_check != NULL)
 {
   tree nil_check_tree = nil_check->get_tree(context);
-  tree crash =
+  Expression* crash_expr =
 	context->gogo()->runtime_error(RUNTIME_ERROR_NIL_DEREFERENCE, loc);
+  tree crash = crash_expr->get_tree(context);
   if (ret_tree == error_mark_node
 	  || nil_check_tree == error_mark_node
 	  || crash == error_mark_node)
@@ -10715,7 +10717,7 @@
 	  : (this->end_ == NULL
 		 ? RUNTIME_ERROR_SLICE_INDEX_OUT_OF_BOUNDS
 		 : RUNTIME_ERROR_SLICE_SLICE_OUT_OF_BOUNDS));
-  tree crash = gogo->runtime_error(code, loc);
+  tree crash = gogo->runtime_error(code, loc)->get_tree(context);
 
   if (this->end_ == NULL)
 {
@@ -11089,7 +11091,7 @@
   int code = (this->end_ == NULL
 	  ? RUNTIME_ERROR_STRING_INDEX_OUT_OF_BOUNDS
 	  : RUNTIME_ERROR_STRING_SLICE_OUT_OF_BOUNDS);
-  tree crash = context->gogo()->runtime_error(code, loc);
+  tree crash = context->gogo()->runtime_error(code, loc)->get_tree(context);
 
   if (this->end_ == NULL)
 {
@@ -11879,8 +11881,9 @@
 		this->expr_,
 		Expression::make_nil(loc),
 		loc);
-  tree crash = context->gogo()->runtime_error(RUNTIME_ERROR_NIL_DEREFERENCE,
-	  loc);
+  Expression* crash_expr =
+  context->gogo()->runtime_error(RUNTIME_ERROR_NIL_DEREFERENCE, loc);
+  tree crash = crash_expr->get_tree(context);
   if (closure_tree == error_mark_node
   || nil_check_tree == error_mark_node
   || crash == error_mark_node)
diff -r 4108b7cd8ca9 go/gogo-tree.cc
--- a/go/gogo-tree.cc	Mon Dec 16 11:55:05 2013 -0800
+++ b/go/gogo-tree.cc	Tue Dec 17 12:25:44 2013 -0800
@@ -2252,30 +2252,6 @@
   return ret;
 }
 
-// Build a call to the runtime error function.
-
-tree
-Gogo::runtime_error(int code, Location location)
-{
-  Type* int32_type = Type::lookup_integer_type("int32");
-  tree int32_type_tree = type_to_tree(int32_type->get_backend(this));
-
-  static tree runtime_error_fndecl;
-  tree ret = Gogo::call_builtin(&runtime_error_fndecl,
-location,
-"__go_runtime_error",
-1,
-void_type_node,
-int32_type_tree,
-build_int_cst(int32_type_tree, code));
-  if (ret == error_mark_node)
-return error_mark_node;
-  // The runtime error function panics and does not return.
-  TREE_NOTHROW(runtime_error_fndecl) = 0;
-  TREE_THIS_VOLATILE(runtime_error_fndecl) = 1;
-  return ret;
-}
-
 // Return a tree for receiving a value of type TYPE_TREE on CHANNEL.
 // TYPE_DESCRIPTOR_TREE is the channel's type descriptor.  This does a
 // blocking receive and returns the value read from the channel.
diff -r 4108b7cd8ca9 go/gogo.cc
--- a/go/gogo.cc	Mon Dec 16 11:55:05 2013 -0800
+++ b/go/gogo.cc	Tue Dec 17 12:25:44 2013 -0800
@@ -3060,6 +3060,19 @@
   this->traverse(&build_recover_thunks);
 }
 
+// Build a call to the runtime error function.
+
+Expression*
+Gogo::runtime_error(int code, Location location)
+{
+  Type* int32_type = Type::lookup_integer_type("int32");
+  mpz_t val;
+  mpz_init_set_ui(val, code);
+  Expression* code_expr = Expression::make_integer(&val, int32_type, location);
+  mpz_clear(val);
+  return Runtime::make_call(Runtime::RUNTIME_ERROR, location, 1, code_expr);
+}
+
 // Look for named types to see whether we need to create an interface
 // method table.
 
diff -r 4108b7cd8ca9 go/gogo.h
--- a/go/gogo.h	Mon Dec 16 11:55:05 2013 -0800
+++ b/go/gogo.h	Tue Dec 17 12:25:44 2013 -0800
@@ -576,7 +576,7 @@
 	   tree rettype, ...);
 
   // Build a call to the runtime error function.
-  tree
+  Expression*
   runtime_error(int code, Location);
 
   // Build a builtin struct with a list of fields.


[committed] Fix MASK_{LOAD,STORE} caused ICE (PR tree-optimization/59523)

2013-12-17 Thread Jakub Jelinek
Hi!

I forgot to update_stmt stmts I've changed.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk as obvious.

2013-12-17  Jakub Jelinek  

PR tree-optimization/59523
* tree-vectorizer.c (fold_loop_vectorized_call): Call update_stmt
on updated stmts.

* gcc.dg/pr59523.c: New test.

--- gcc/tree-vectorizer.c.jj2013-12-10 12:43:21.0 +0100
+++ gcc/tree-vectorizer.c   2013-12-17 16:54:27.584080849 +0100
@@ -369,8 +369,11 @@ fold_loop_vectorized_call (gimple g, tre
 
   update_call_from_tree (&gsi, value);
   FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
-FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
-  SET_USE (use_p, value);
+{
+  FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+   SET_USE (use_p, value);
+  update_stmt (use_stmt);
+}
 }
 
 /* Function vectorize_loops.
--- gcc/testsuite/gcc.dg/pr59523.c.jj   2013-12-17 16:58:35.706806284 +0100
+++ gcc/testsuite/gcc.dg/pr59523.c  2013-12-17 16:58:23.0 +0100
@@ -0,0 +1,17 @@
+/* PR tree-optimization/59523 */
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
+
+int *
+foo (int a, int *b, int *c, int *d)
+{
+  int i, *r = __builtin_alloca (a * sizeof (int));
+  __builtin_memcpy (r, d, a * sizeof (int));
+  for (i = 0; i < 64; i++)
+c[i] += b[i];
+  for (i = 0; i < a; i++)
+if (r[i] == 0)
+  r[i] = 1;
+  return r;
+}

Jakub


Re: [PATCH] Fix PR58944

2013-12-17 Thread H.J. Lu
On Tue, Dec 17, 2013 at 10:29 AM, Sriraman Tallam  wrote:
> On Fri, Dec 13, 2013 at 5:06 AM, H.J. Lu  wrote:
>> On Mon, Dec 2, 2013 at 6:46 PM, Sriraman Tallam  wrote:
>>> On Thu, Nov 28, 2013 at 9:36 PM, Bernd Edlinger
>>>  wrote:
>>>> Hi,
>>>>
>>>> On Wed, 27 Nov 2013 19:49:39, Uros Bizjak wrote:
>>>>>
>>>>> On Mon, Nov 25, 2013 at 10:08 PM, Sriraman Tallam wrote:
>>>>>
>>>>>> I have attached a patch to fix this bug:
>>>>>>
>>>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58944
>>>>>>
>>>>>> A similar problem was also reported here:
>>>>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01050.html
>>>>>>
>>>>>>
>>>>>> Recently, ix86_valid_target_attribute_tree in config/i386/i386.c was
>>>>>> refactored to not depend on global_options structure and to be able to
>>>>>> use any gcc_options structure. One clean way to fix this is by having
>>>>>> target_option_default_node save all the default target options which
>>>>>> can be restored to any gcc_options structure. The root cause of the
>>>>>> above bugs was that ix86_arch_string and ix86_tune_string were not
>>>>>> saved in target_option_default_node in PR58944 and
>>>>>> ix86_preferred_stack_boundary_arg was not saved in the latter case.
>>>>>>
>>>>>> This patch saves all the target options used in i386.opt which are
>>>>>> either obtained from the command-line or set to some default. Is this
>>>>>> patch alright?
>>>>>
>>>>> Things look rather complicated, but I see no other solution than save
>>>>> and restore the way you propose.
>>>>>
>>>>> Please wait 24h if somebody has a different idea, otherwise please go
>>>>> ahead and commit the patch to mainline.
>>>>>
>>>>
>>>> Maybe you should also look at the handling of preferred_stack_boundary_arg
>>>> versus incoming_stack_boundary_arg in ix86_option_override_internal:
>>>>
>>>> Remember ix86_incoming_stack_boundary_arg is defined to
>>>> global_options.x_ix86_incoming_stack_boundary_arg.
>>>>
>>>> like this?
>>>>
>>>>   if (opts_set->x_ix86_incoming_stack_boundary_arg)
>>>> {
>>>> -  if (ix86_incoming_stack_boundary_arg
>>>> +  if (opts->x_ix86_incoming_stack_boundary_arg
>>>>   < (TARGET_64BIT_P (opts->x_ix86_isa_flags) ? 4 : 2)
>>>> -  || ix86_incoming_stack_boundary_arg> 12)
>>>> + || opts->x_ix86_incoming_stack_boundary_arg> 12)
>>>> error ("-mincoming-stack-boundary=%d is not between %d and 12",
>>>> -   ix86_incoming_stack_boundary_arg,
>>>> +  opts->x_ix86_incoming_stack_boundary_arg,
>>>>TARGET_64BIT_P (opts->x_ix86_isa_flags) ? 4 : 2);
>>>>   else
>>>> {
>>>>   ix86_user_incoming_stack_boundary
>>>> -= (1 << ix86_incoming_stack_boundary_arg) * BITS_PER_UNIT;
>>>> +   = (1 << opts->x_ix86_incoming_stack_boundary_arg) * BITS_PER_UNIT;
>>>>   ix86_incoming_stack_boundary
>>>> = ix86_user_incoming_stack_boundary;
>>>> }

>>>
>>> Thanks for catching this. I will make this change in the same patch.
>>>
>>
>> Your change caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59492
>
> Thanks for fixing this. This is making me wonder if I am missing some

-g?

> other important flags. Is there a way to detect this? I originally
> looked at everything in i386.opt to form my list.
>

You should check all options in

./ada/gcc-interface/lang.opt
./c-family/c.opt
./common.opt
./lto/lang.opt
./go/lang.opt
./fortran/lang.opt
./java/lang.opt

to see if any options in them are relevant and should be preserved.

-- 
H.J.


Re: [patch] Fix PR debug/59418

2013-12-17 Thread Jakub Jelinek
On Sun, Dec 15, 2013 at 06:53:07PM +0100, Eric Botcazou wrote:
> 2013-12-15  Eric Botcazou  
> 
>   PR debug/59418
>   * dwarf2cfi.c (dwarf2out_frame_debug_cfa_offset): Fix comment and clean 
>   up implementation.
>   (dwarf2out_frame_debug_cfa_restore): Handle TARGET_DWARF_REGISTER_SPAN.
>   (dwarf2out_frame_debug_expr): Clean up implementation.
> 
> 
> 2013-12-15  Eric Botcazou  
> 
>   * gcc.dg/pr59418.c: New test.

> @@ -1149,18 +1149,14 @@ dwarf2out_frame_debug_cfa_offset (rtx se
>else
>  {
>/* We have a PARALLEL describing where the contents of SRC live.
> -  Queue register saves for each piece of the PARALLEL.  */
> -  int par_index;
> -  int limit;
> +  Adjust the offset for each piece of the PARALLEL.  */
>HOST_WIDE_INT span_offset = offset;
>  
>gcc_assert (GET_CODE (span) == PARALLEL);
>  
> -  limit = XVECLEN (span, 0);
> -  for (par_index = 0; par_index < limit; par_index++)
> +  for (int par_index = 0; par_index < XVECLEN (span, 0); par_index++)

Is it really a good idea to put the XVECLEN into the loop condition?
I mean, the functions that are called in the loop are unlikely pure
and thus the length will need to be uselessly reread for each iteration.

My preference would be to keep the limit hoisted manually before the loop.
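
As a generic illustration of that preference, the sketch below reads the
length once before a loop whose body calls code the compiler cannot see
through.  The struct, the function name and the callback are hypothetical;
they only stand in for the PARALLEL vector and the functions called per piece.

```c
#include <stddef.h>

/* Hypothetical stand-in for an rtvec: a length plus its elements.  */
struct vec_sketch
{
  int len;
  const long *elts;
};

/* Walk every piece, calling VISIT (which may be NULL) on each one and
   returning the sum of the pieces.  The length is read once into LIMIT,
   so it is not reloaded after each opaque call to VISIT.  */
long
walk_pieces (const struct vec_sketch *span, void (*visit) (long piece))
{
  const int limit = span->len;
  long total = 0;

  for (int i = 0; i < limit; i++)
    {
      if (visit != NULL)
        visit (span->elts[i]);
      total += span->elts[i];
    }
  return total;
}
```

Writing span->len directly in the loop condition would compute the same
result here; the difference is only that the load has to be repeated whenever
the compiler cannot prove that the callee leaves the length alone.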

>   {
> /* We have a PARALLEL describing where the contents of SRC live.
>Queue register saves for each piece of the PARALLEL.  */
> -   int par_index;
> -   int limit;
> HOST_WIDE_INT span_offset = offset;
>  
> gcc_assert (GET_CODE (span) == PARALLEL);
>  
> -   limit = XVECLEN (span, 0);
> -   for (par_index = 0; par_index < limit; par_index++)
> +   for (int par_index = 0; par_index < XVECLEN (span, 0); par_index++)

And here too.

Otherwise looks good to me.

Jakub


Re: GOMP_target: alignment (was: [gomp4] #pragma omp target* fixes)

2013-12-17 Thread Thomas Schwinge
Hi!

On Mon, 16 Dec 2013 17:58:26 +0100, Jakub Jelinek  wrote:
> I'd indeed prefer if you just used one
> array, it can be say just uchar array of twice the width, with even chars
> for alignment and odd for kinds (or vice versa), compared to two arrays
> it is tiny bit cheaper at the caller side IMHO.

Like this, for gomp-4_0-branch?  Is hard-coding a shift by eight bits OK,
or am I to fiddle with CHAR_TYPE_SIZE, and the like?

While conceptually nicer, using some build_* machinery to actually build
a datatype and initializer for an array of a »struct { char kind; char
alignment; }« would be more difficult, for unclear benefit, so I didn't
look into that.
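
For reference, the encoding can be sketched with a few standalone helpers:
the mapping kind lives in the low eight bits and the log2 alignment in the
high eight bits of one unsigned 16-bit element.  The helper names are
hypothetical and uint16_t stands in for unsigned short; to_legacy shows the
repacking into the existing kind | align << 3 byte layout, which is only
unambiguous while the kind fits in three bits.

```c
#include <stdint.h>

/* Combine a mapping kind (low 8 bits) and a log2 alignment (high 8 bits).  */
uint16_t
pack_kind (unsigned kind, unsigned align_log2)
{
  return (uint16_t) ((kind & 0xffu) | ((align_log2 & 0xffu) << 8));
}

/* Extract the mapping kind from a packed element.  */
unsigned
unpack_kind (uint16_t packed)
{
  return packed & 0xffu;
}

/* Extract the log2 alignment from a packed element.  */
unsigned
unpack_align_log2 (uint16_t packed)
{
  return (unsigned) packed >> 8;
}

/* Repack into the byte layout with the alignment starting at bit 3.
   The caller is assumed to have rejected kinds that need more than
   three bits.  */
unsigned char
to_legacy (uint16_t packed)
{
  return (unsigned char) (unpack_kind (packed)
                          | (unpack_align_log2 (packed) << 3));
}
```

For example, kind 2 with 16-byte alignment (log2 of 4) packs to 0x0402 and
repacks to 2 | 4 << 3.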

commit 46002ec0e69e2fbc1f14d2549a5cbb93849c1da1
Author: Thomas Schwinge 
Date:   Tue Dec 17 13:44:46 2013 +0100

Prepare OpenACC memory mapping interface for additional mapping kinds.

gcc/
* omp-low.c (lower_oacc_parallel): Switch kinds array to unsigned
short, and shift alignment description to begin at bit 8.
libgomp/
* libgomp_g.h (GOACC_parallel): Switch kinds array to unsigned
short.
* oacc-parallel.c (GOACC_parallel): Likewise, and catch
unsupported kinds.

diff --git gcc/omp-low.c gcc/omp-low.c
index e0f7d1d..eb755c3 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -8775,7 +8775,7 @@ lower_oacc_parallel (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   TREE_ADDRESSABLE (TREE_VEC_ELT (t, 1)) = 1;
   TREE_STATIC (TREE_VEC_ELT (t, 1)) = 1;
   TREE_VEC_ELT (t, 2)
-   = create_tmp_var (build_array_type_nelts (unsigned_char_type_node,
+   = create_tmp_var (build_array_type_nelts (short_unsigned_type_node,
  map_cnt),
  ".omp_data_kinds");
   DECL_NAMELESS (TREE_VEC_ELT (t, 2)) = 1;
@@ -8884,7 +8884,7 @@ lower_oacc_parallel (gimple_stmt_iterator *gsi_p, omp_context *ctx)
if (TREE_CODE (s) != INTEGER_CST)
  TREE_STATIC (TREE_VEC_ELT (t, 1)) = 0;
 
-   unsigned char tkind = 0;
+   unsigned short tkind = 0;
switch (OMP_CLAUSE_CODE (c))
  {
  case OMP_CLAUSE_MAP:
@@ -8903,9 +8903,9 @@ lower_oacc_parallel (gimple_stmt_iterator *gsi_p, omp_context *ctx)
if (DECL_P (ovar) && DECL_ALIGN_UNIT (ovar) > talign)
  talign = DECL_ALIGN_UNIT (ovar);
talign = ceil_log2 (talign);
-   tkind |= talign << 3;
+   tkind |= talign << 8;
CONSTRUCTOR_APPEND_ELT (vkind, purpose,
-   build_int_cst (unsigned_char_type_node,
+   build_int_cst (short_unsigned_type_node,
   tkind));
if (nc && nc != c)
  c = nc;
diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h
index 394f3a8..34d26f6 100644
--- libgomp/libgomp_g.h
+++ libgomp/libgomp_g.h
@@ -217,6 +217,6 @@ extern void GOMP_teams (unsigned int, unsigned int);
 /* oacc-parallel.c */
 
 extern void GOACC_parallel (int, void (*) (void *), const void *,
-   size_t, void **, size_t *, unsigned char *);
+   size_t, void **, size_t *, unsigned short *);
 
 #endif /* LIBGOMP_G_H */
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 730b83b..bf7b74c 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -25,12 +25,29 @@
 
 /* This file handles the OpenACC parallel construct.  */
 
+#include "libgomp.h"
 #include "libgomp_g.h"
 
 void
 GOACC_parallel (int device, void (*fn) (void *), const void *openmp_target,
size_t mapnum, void **hostaddrs, size_t *sizes,
-   unsigned char *kinds)
+   unsigned short *kinds)
 {
-  GOMP_target (device, fn, openmp_target, mapnum, hostaddrs, sizes, kinds);
+  unsigned char kinds_[mapnum];
+  size_t i;
+
+  /* TODO.  Eventually, we'll be interpreting all mapping kinds according to
+ the OpenACC semantics; for now we're re-using what is implemented for
+ OpenMP.  */
+  for (i = 0; i < mapnum; ++i)
+{
+  unsigned char kind = kinds[i];
+  unsigned char align = kinds[i] >> 8;
+  if (kind > 4)
+   gomp_fatal ("memory mapping kind %x for %zd is not yet supported",
+   kind, i);
+
+  kinds_[i] = kind | align << 3;
+}
+  GOMP_target (device, fn, openmp_target, mapnum, hostaddrs, sizes, kinds_);
 }


Grüße,
 Thomas




[Google][gcc-4_8] Completed backport of r202818 from trunk to google/gcc-4_8 as r206071.

2013-12-17 Thread Paul Pluzhnikov
Greetings,

I've finished the backport of r202818 from trunk to google/gcc-4_8 as r206071.

Thanks,
-- 
Paul Pluzhnikov
Index: libstdc++-v3/include/debug/array
===
--- libstdc++-v3/include/debug/array(revision 206038)
+++ libstdc++-v3/include/debug/array(working copy)
@@ -165,7 +165,10 @@
   at(size_type __n)
   {
if (__n >= _Nm)
- std::__throw_out_of_range(__N("array::at"));
+ std::__throw_out_of_range_fmt(__N("array::at: __n "
+   "(which is %zu) >= _Nm "
+   "(which is %zu)"),
+   __n, _Nm);
return _AT_Type::_S_ref(_M_elems, __n);
   }
 
@@ -175,7 +178,9 @@
// Result of conditional expression must be an lvalue so use
// boolean ? lvalue : (throw-expr, lvalue)
return __n < _Nm ? _AT_Type::_S_ref(_M_elems, __n)
- : (std::__throw_out_of_range(__N("array::at")),
+ : (std::__throw_out_of_range_fmt(__N("array::at: __n (which is %zu) "
+  ">= _Nm (which is %zu)"),
+  __n, _Nm),
 _AT_Type::_S_ref(_M_elems, 0));
   }
 
Index: libstdc++-v3/include/std/bitset
===
--- libstdc++-v3/include/std/bitset (revision 206038)
+++ libstdc++-v3/include/std/bitset (working copy)
@@ -752,7 +752,27 @@
   typedef _Base_bitset<_GLIBCXX_BITSET_WORDS(_Nb)> _Base;
   typedef unsigned long _WordT;
 
+  template<class _CharT, class _Traits, class _Alloc>
   void
+  _M_check_initial_position(const std::basic_string<_CharT, _Traits, _Alloc>& __s,
+   size_t __position) const
+  {
+   if (__position > __s.size())
+ __throw_out_of_range_fmt(__N("bitset::bitset: __position "
+  "(which is %zu) > __s.size() "
+  "(which is %zu)"),
+  __position, __s.size());
+  }
+
+  void _M_check(size_t __position, const char *__s) const
+  {
+   if (__position >= _Nb)
+ __throw_out_of_range_fmt(__N("%s: __position (which is %zu) "
+  ">= _Nb (which is %zu)"),
+  __s, __position, _Nb);
+  }
+
+  void
   _M_do_sanitize() _GLIBCXX_NOEXCEPT
   { 
typedef _Sanitize<_Nb % _GLIBCXX_BITSET_BITS_PER_WORD> __sanitize_type;
@@ -867,9 +887,7 @@
   size_t __position = 0)
: _Base()
{
- if (__position > __s.size())
-   __throw_out_of_range(__N("bitset::bitset initial position "
-"not valid"));
+ _M_check_initial_position(__s, __position);
  _M_copy_from_string(__s, __position,
  std::basic_string<_CharT, _Traits, _Alloc>::npos,
  _CharT('0'), _CharT('1'));
@@ -890,9 +908,7 @@
   size_t __position, size_t __n)
: _Base()
{
- if (__position > __s.size())
-   __throw_out_of_range(__N("bitset::bitset initial position "
-"not valid"));
+ _M_check_initial_position(__s, __position);
  _M_copy_from_string(__s, __position, __n, _CharT('0'), _CharT('1'));
}
 
@@ -904,9 +920,7 @@
   _CharT __zero, _CharT __one = _CharT('1'))
: _Base()
{
- if (__position > __s.size())
-   __throw_out_of_range(__N("bitset::bitset initial position "
-"not valid"));
+ _M_check_initial_position(__s, __position);
  _M_copy_from_string(__s, __position, __n, __zero, __one);
}
 
@@ -1067,8 +1081,7 @@
   bitset<_Nb>&
   set(size_t __position, bool __val = true)
   {
-   if (__position >= _Nb)
- __throw_out_of_range(__N("bitset::set"));
+   this->_M_check(__position, __N("bitset::set"));
return _Unchecked_set(__position, __val);
   }
 
@@ -1092,8 +1105,7 @@
   bitset<_Nb>&
   reset(size_t __position)
   {
-   if (__position >= _Nb)
- __throw_out_of_range(__N("bitset::reset"));
+   this->_M_check(__position, __N("bitset::reset"));
return _Unchecked_reset(__position);
   }
   
@@ -1116,8 +1128,7 @@
   bitset<_Nb>&
   flip(size_t __position)
   {
-   if (__position >= _Nb)
- __throw_out_of_range(__N("bitset::flip"));
+   this->_M_check(__position, __N("bitset::flip"));
return _Unchecked_flip(__position);
   }
   
@@ -1302,8 +1313,7 @@
   bool
   test(size_t __position) const
   {
-   if (__position >= _Nb)
- __throw_out_of_range(__N("bitset::test"));
+   this->_M_check(__position, __N("bitset::test"));
return _Unchecked_test(__po

Re: Two build != host fixes

2013-12-17 Thread Alan Modra
On Tue, Dec 17, 2013 at 01:14:23PM +0100, Bernd Edlinger wrote:
> the reason for this is overwriting GMPINC for the auto-build generation,
> because many test scripts include <gmp.h> which fails now completely (it is
> not installed, I have it in-tree).

Yes, I understand the reason why your setup is failing.  Please try
this patch.

Index: gcc/configure.ac
===
--- gcc/configure.ac(revision 206009)
+++ gcc/configure.ac(working copy)
@@ -1529,8 +1529,13 @@
/* | [A-Za-z]:[\\/]* ) realsrcdir=${srcdir};;
*) realsrcdir=../${srcdir};;
esac
+   # Clearing GMPINC is necessary to prevent host headers being
+   # used by the build compiler.  Defining GENERATOR_FILE stops
+   # system.h from including gmp.h.
CC="${CC_FOR_BUILD}" CFLAGS="${CFLAGS_FOR_BUILD}" \
-   LDFLAGS="${LDFLAGS_FOR_BUILD}" GMPINC="" \
+   CXX="${CXX_FOR_BUILD}" CXXFLAGS="${CXXFLAGS_FOR_BUILD}" \
+   LD="${LD_FOR_BUILD}" LDFLAGS="${LDFLAGS_FOR_BUILD}" \
+   GMPINC="" CPPFLAGS="${CPPFLAGS} -DGENERATOR_FILE" \
${realsrcdir}/configure \
--enable-languages=${enable_languages-all} \
--target=$target_alias --host=$build_alias --build=$build_alias

-- 
Alan Modra
Australia Development Lab, IBM


RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-17 Thread Iyer, Balaji V


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
> Sent: Tuesday, December 17, 2013 4:26 PM
> To: Iyer, Balaji V
> Cc: Joseph S. Myers; Aldy Hernandez (al...@redhat.com); 'gcc-
> patc...@gcc.gnu.org'
> Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly
> Elemental functions) for C
> 
> On Tue, Dec 17, 2013 at 09:13:05PM +, Iyer, Balaji V wrote:
> > @@ -10418,6 +10528,12 @@
> >step = c_parser_expression (parser).value;
> >mark_exp_read (step);
> >step = c_fully_fold (step, false, NULL);
> > +  if (is_cilk_simd_fn && TREE_CODE (step) == PARM_DECL)
> > +   {
> > > + sorry ("using parameters for %<linear%> step is not supported yet "
> > > +"in this release");
> > 
> > I meant actually
> >   sorry ("using parameters for %<linear%> step is not supported yet");
> 
> > @@ -10933,8 +11051,14 @@
> >   c_name = "aligned";
> >   break;
> > case PRAGMA_OMP_CLAUSE_LINEAR:
> > - clauses = c_parser_omp_clause_linear (parser, clauses);
> > - c_name = "linear";
> > + {
> > +   bool is_cilk_simd_fn = false;
> > +   if ((mask & PRAGMA_CILK_CLAUSE_VECTORLENGTH) == 0)
> > + is_cilk_simd_fn = true;
> 
> I don't think this will work.  PRAGMA_CILK_CLAUSE_VECTORLENGTH is
> something like 40 I think, so testing whether the mask doesn't include copyin
> and default clauses is not what you wanted probably.
> What I meant is
>   if (((mask >> PRAGMA_CILK_CLAUSE_VECTORLENGTH) & 1) != 0)
> is_cilk_simd_fn = true;
> (note, for 32-bit HWI targets, omp_clause_mask is a class and not all
> arithmetic is actually supported on it, so better limit yourself to forms used
> elsewhere already).
> 

I have a better idea. The where string, if it is "SIMD-enabled functions
attribute", indicates that this is a Cilk Plus SIMD-enabled function. So if
I check for that, I don't have to do any of this mask ANDing.

This is what I am talking about:

  if (where && !strcmp (where, "SIMD-enabled functions attribute"))
is_cilk_simd_fn = false;



> > @@ -12754,10 +12882,20 @@
> > >  c_finish_omp_declare_simd (c_parser *parser, tree fndecl, tree parms,
> > >vec<c_token> clauses)
> >  {
> > +
> 
> Please remove this extra vertical space.
> 
> Otherwise looks good to me, just not sure where do you handle processor
> clause (or how Cilk+ simd clones specify the ISA they want to use).

The processor clause is an ICC-only clause, so GCC won't have it. I am
planning to handle the ISA clause (renamed to architecture) in the next
release, along with parameters for the step size of linear.


Can I install this? If so, can I push it to trunk? From what I understood, all
the #pragma omp declare simd work has been pushed to trunk, right?

> 
>   Jakub
Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c (revision 205759)
+++ gcc/c-family/c-common.c (working copy)
@@ -771,6 +771,8 @@
  handle_returns_nonnull_attribute, false },
   { "omp declare simd",   0, -1, true,  false, false,
  handle_omp_declare_simd_attribute, false },
+  { "cilk simd function", 0, -1, true,  false, false,
+ handle_omp_declare_simd_attribute, false },
   { "omp declare target", 0, 0, true, false, false,
  handle_omp_declare_target_attribute, false },
   { "bnd_variable_size",  0, 0, true,  false, false,
Index: gcc/c-family/c-pragma.h
===
--- gcc/c-family/c-pragma.h (revision 205759)
+++ gcc/c-family/c-pragma.h (working copy)
@@ -104,20 +104,21 @@
   PRAGMA_OMP_CLAUSE_THREAD_LIMIT,
   PRAGMA_OMP_CLAUSE_TO,
   PRAGMA_OMP_CLAUSE_UNIFORM,
-  PRAGMA_OMP_CLAUSE_UNTIED
+  PRAGMA_OMP_CLAUSE_UNTIED,
+  
+  /* Clauses for Cilk Plus SIMD-enabled function.  */
+  PRAGMA_CILK_CLAUSE_NOMASK,
+  PRAGMA_CILK_CLAUSE_MASK,
+  PRAGMA_CILK_CLAUSE_VECTORLENGTH,
+  PRAGMA_CILK_CLAUSE_NONE = PRAGMA_OMP_CLAUSE_NONE,
+  PRAGMA_CILK_CLAUSE_LINEAR = PRAGMA_OMP_CLAUSE_LINEAR,
+  PRAGMA_CILK_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE,
+  PRAGMA_CILK_CLAUSE_FIRSTPRIVATE = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
+  PRAGMA_CILK_CLAUSE_LASTPRIVATE = PRAGMA_OMP_CLAUSE_LASTPRIVATE,
+  PRAGMA_CILK_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION,
+  PRAGMA_CILK_CLAUSE_UNIFORM = PRAGMA_OMP_CLAUSE_UNIFORM
 } pragma_omp_clause;
 
-/* All Cilk Plus #pragma omp clauses.  */
-typedef enum pragma_cilk_clause {
-  PRAGMA_CILK_CLAUSE_NONE = 0,
-  PRAGMA_CILK_CLAUSE_VECTORLENGTH,
-  PRAGMA_CILK_CLAUSE_LINEAR,
-  PRAGMA_CILK_CLAUSE_PRIVATE,
-  PRAGMA_CILK_CLAUSE_FIRSTPRIVATE,
-  PRAGMA_CILK_CLAUSE_LASTPRIVATE,
-  PRAGMA_CILK_CLAUSE_REDUCTION
-} pragma_cilk_clause;
-
 extern struct cpp_reader* parse_in;
 
 /* It's safe to always leave visibility pragma enabled as if
Index: gc

Contents of PO file 'cpplib-4.8.0.pt_BR.po'

2013-12-17 Thread Translation Project Robot


cpplib-4.8.0.pt_BR.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



New Brazilian Portuguese PO file for 'cpplib' (version 4.8.0)

2013-12-17 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Brazilian Portuguese team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/pt_BR.po

(This file, 'cpplib-4.8.0.pt_BR.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-17 Thread Jakub Jelinek
On Tue, Dec 17, 2013 at 11:38:48PM +, Iyer, Balaji V wrote:
> > What I meant is
> >   if (((mask >> PRAGMA_CILK_CLAUSE_VECTORLENGTH) & 1) != 0)
> > is_cilk_simd_fn = true;
> > (note, for 32-bit HWI targets, omp_clause_mask is a class and not all
> > arithmetic is actually supported on it, so better limit yourself to forms
> > used elsewhere already).
> > 
> 
> I have a better idea.. The where string, if it is "SIMD-enabled functions
> attribute" will indicate that it is a Cilk Plus SIMD-enabled function. 
> So, if I do a check for that, then I don't have to do any of this mask
> anding.
> 
> This is what I am talking about:
> 
>   if (where && !strcmp (where, "SIMD-enabled functions attribute"))
> is_cilk_simd_fn = false;

But this is more expensive, and the string really is meant for diagnostic
messages, so I'd strongly prefer the above mask check instead.
Ok with that change.
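
For illustration, the mask check form can be sketched as below. This is only a
minimal stand-in, not GCC code: demo_clause_mask, demo_clause, has_clause and
demo_usage are hypothetical names, and the clause numbers are made up. It
assumes a mask class that supports only shifts, | and "& with a small
integer", which is why the test is written as ((mask >> kind) & 1) != 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical clause numbering, for illustration only; these are NOT the
// values of GCC's pragma_omp_clause enum.
enum demo_clause {
  DEMO_CLAUSE_NONE = 0,
  DEMO_CLAUSE_LINEAR = 3,
  DEMO_CLAUSE_UNIFORM = 7,
  DEMO_CLAUSE_VECTORLENGTH = 40   // deliberately above bit 31
};

// Hypothetical 64-bit mask built from two 32-bit halves.  It deliberately
// supports only <<, >>, | and "& with a small integer".
class demo_clause_mask {
  std::uint32_t low_, high_;

public:
  demo_clause_mask (std::uint32_t low = 0, std::uint32_t high = 0)
    : low_ (low), high_ (high) {}

  demo_clause_mask operator<< (unsigned n) const {
    if (n == 0) return *this;
    if (n >= 64) return demo_clause_mask ();
    if (n >= 32) return demo_clause_mask (0, low_ << (n - 32));
    return demo_clause_mask (low_ << n, (high_ << n) | (low_ >> (32 - n)));
  }

  demo_clause_mask operator>> (unsigned n) const {
    if (n == 0) return *this;
    if (n >= 64) return demo_clause_mask ();
    if (n >= 32) return demo_clause_mask (high_ >> (n - 32), 0);
    return demo_clause_mask ((low_ >> n) | (high_ << (32 - n)), high_ >> n);
  }

  demo_clause_mask operator| (const demo_clause_mask &o) const {
    return demo_clause_mask (low_ | o.low_, high_ | o.high_);
  }

  // Only the low half takes part; enough for the "& 1" idiom.
  std::uint32_t operator& (std::uint32_t bits) const { return low_ & bits; }
};

// Shift the wanted bit down to position 0, then test it.
inline bool has_clause (const demo_clause_mask &mask, demo_clause kind) {
  return ((mask >> kind) & 1) != 0;
}

// Small usage example: a mask with two clauses set.
inline void demo_usage () {
  const demo_clause_mask one (1);
  demo_clause_mask mask
    = (one << DEMO_CLAUSE_LINEAR) | (one << DEMO_CLAUSE_VECTORLENGTH);
  assert (has_clause (mask, DEMO_CLAUSE_VECTORLENGTH));
  assert (!has_clause (mask, DEMO_CLAUSE_UNIFORM));
}
```

Compared with a strcmp on the diagnostic string, this is a shift and an AND,
and it does not break if the wording of the message changes.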

> From what I understood, all the #pragma omp declare simd work has been
> pushed into trunk, right?

Yes, though I still want to optimize it a little bit (generate thunks
and/or aliases when desirable/possible).  That only affects exported
entry points for OpenMP; for Cilk+ the code follows the Intel ABI paper
more closely and generates only one ISA variant (and expects to parse a
processor clause for other ISA variants), rather than emitting all 3.

Jakub


Re: PR middle-end/35535 part I

2013-12-17 Thread Tobias Burnus

On 17.12.2013 21:56, Jeff Law wrote:
>> * tree-vrp.c (extract_range_from_unary_expr_1): Add OBJ_TYPE_REF
>
> s/Add/Handle.  Please add the PR marker as well.
>
> OK with that trivial nit.

And the proper PR. I don't think that INVALID C++ PR is the PR you want
to refer to.


Tobias


Re: PR middle-end/35535 part I

2013-12-17 Thread Jeff Law

On 12/17/13 23:53, Tobias Burnus wrote:
> On 17.12.2013 21:56, Jeff Law wrote:
>>> * tree-vrp.c (extract_range_from_unary_expr_1): Add OBJ_TYPE_REF
>>
>> s/Add/Handle.  Please add the PR marker as well.
>>
>> OK with that trivial nit.
>
> And the proper PR. I don't think that INVALID C++ PR is the PR you want
> to refer to.

Yea, I mentioned that for the part II patch.  The right number is 35545,
I think.


jeff



RE: Another build!=host fix

2013-12-17 Thread Bernd Edlinger
Hi,

On Tue, 17 Dec 2013 09:57:10, Mike Stump wrote:
>
> On Dec 17, 2013, at 5:47 AM, Bernd Edlinger  wrote:
>> Ok for trunk?
>
> ENOPATCH?

Ooops -- thanks for catching this.

Again, this time with patch:

There is a small problem with SSIZE_MAX: it is not always defined, and
in particular not in gcc/glimits.h, which seems to be the fall-back
if the target fails to have a working limits.h.

When I create a cross-compiler for --target=arm-linux-gnueabihf, the
working limits.h is overwritten by fixincludes with a copy of gcc/glimits.h,
probably because it is not possible to compile the target headers with the
build compiler and produce meaningful test results.
 
However, because gcc/glimits.h does not define SSIZE_MAX, the following build
fails with:
 
In file included from ../../gcc-4.9-20131215/gcc/config/host-linux.c:21:0:
../../gcc-4.9-20131215/gcc/config/host-linux.c: In function 'int linux_gt_pch_use_address(void*, size_t, int, size_t)':
../../gcc-4.9-20131215/gcc/config/host-linux.c:215:43: error: 'SSIZE_MAX' was not declared in this scope
   nbytes = read (fd, base, MIN (size, SSIZE_MAX));
                                       ^
../../gcc-4.9-20131215/gcc/system.h:351:26: note: in definition of macro 'MIN'
 #define MIN(X,Y) ((X) < (Y) ? (X) : (Y))
                          ^
 
 
The simplest way to fix this would be to not use SSIZE_MAX here.
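
As a rough sketch of what "not using SSIZE_MAX" could look like (this is not
necessarily what the attached patch does), a read loop can cap each request
with its own constant and handle short reads. The names demo_read_fully and
DEMO_READ_CHUNK are hypothetical, and the 1 MiB chunk size is an arbitrary
assumption.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Assumed per-call cap, chosen arbitrarily for this sketch.  It replaces
// the SSIZE_MAX clamp, so no limits.h macro is needed.
static const std::size_t DEMO_READ_CHUNK = 1u << 20;

// Hypothetical helper: read exactly SIZE bytes from FD into BASE.
// Returns true on success, false on error or premature end of file.
bool demo_read_fully (int fd, void *base, std::size_t size)
{
  char *p = static_cast<char *> (base);

  while (size > 0)
    {
      std::size_t want = size < DEMO_READ_CHUNK ? size : DEMO_READ_CHUNK;
      ssize_t nbytes = read (fd, p, want);

      if (nbytes < 0)
        {
          if (errno == EINTR)
            continue;           // interrupted by a signal, retry
          return false;         // real I/O error
        }
      if (nbytes == 0)
        return false;           // end of file before SIZE bytes arrived

      p += nbytes;
      size -= static_cast<std::size_t> (nbytes);
    }
  return true;
}
```

Because read may return fewer bytes than requested anyway, a caller has to
loop over short reads regardless, so a fixed cap costs little.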
 
Bootstrapped and regression-tested on x86_64,
plus a cross-build for arm-linux-gnueabihf.
 
Ok for trunk?
 
 
Thanks
Bernd.

patch-host-linux.diff
Description: Binary data