Re: [PATCH] add option to emit more array bounds warnigs

2015-01-13 Thread Martin Uecker

Jeff Law :
> On 01/13/15 17:40, Martin Uecker wrote:
> > Jeff Law :
> >> On 01/13/15 10:34, Martin Uecker wrote:
> >>> Mon, 12 Jan 2015 11:00:44 -0700
> >>> Jeff Law :
>  On 11/11/14 23:13, Martin Uecker wrote:
> >
> > ...
> >
>  Has this patch been bootstrapped and regression tested, if so on what
>  platform.
> >>>
> >>> x86_64-unknown-linux-gnu
> >> Approved.  Please install on the trunk.  Sorry about the delays.
> >
> > I don't have write access ;-(
> I fixed up the ChangeLog entries and installed the patch for you.

Thank you, Jeff!

> If you plan to contribute regularly, you should go ahead and apply for 
> write access to the repository so that you'll be able to commit your own 
> patches once they're approved.

I put a request in with you as sponsor (hope this is ok).

> You'll also need to make sure you have an assignment on file with the 
> FSF.That patch was pretty small (the testcase was larger than the 
> patch itself, which I always like :-) so I didn't request an assignment. 
>   Further submissions likely will require an assignment.

I already have an assignment on file.

Martin



Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2015-01-13 Thread Jeff Law

On 01/13/15 11:55, Eric Botcazou wrote:



(1) we have a non-paradoxical subreg;
(2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full
 hard registers (no padding or unused upper bits);
(3) (reg:ymode xregno) and (reg:xmode xregno) store the same number
 of bytes (X) in each constituent hard register;
(4) the offset is a multiple of X, i.e. the data we're accessing
 is aligned to a register boundary; and
(5) endianness is regular (no differences between words and bytes,
 or between registers and memory)


OK, that's a nice translation of the new code. :-)

It seems to me that the patch wants to extend the support of generic subregs
to modes whose sizes are not multiple of each other, which is a requirement of
the existing code, but does that in a very specific case for the sake of the
ARM port without saying where all the above restrictions come from.
Basically we're lifting the restriction that the the sizes are multiples 
of each other.  The requirements above are the set where we know it will 
work.  They are target independent, but happen to match what the ARM needs.


The certainly do short circuit the meat of the function, that's the 
whole point, there's this set of conditions under which we know this 
will work and when they hold, we bypass.


Now one could argue that instead of bypassing we should put the code to 
handle this situation further down.  I'd be leery of doing that just 
from a complexity standpoint.  But one could also argue that short 
circuiting like the patch does adds complexity as well and may be a bit 
kludgy.


Maybe the way forward here is for someone to try and integrate this 
support in the main part of the code and see how it looks.  Then we can 
pick one.


The downside is since this probably isn't a regression that work would 
need to happen quickly to make it into gcc-5.


Which leads to another option, get the release managers to sign off on 
the kludge after gcc-5 branches and only install the kludge on the gcc-5 
branch and insisting the other solution go in for gcc-6 and beyond.  Not 
sure if they'd do that, but it's a discussion that could happen.



jeff


Re: [PATCH] Allow MIPS call-saved-{4-6}.c tests to correctly run for micromips

2015-01-13 Thread Richard Sandiford
"Maciej W. Rozycki"  writes:
> On Tue, 13 Jan 2015, Matthew Fortune wrote:
>
>> > >> I have tested this for both mips and micromips, and the tests now
>> > >> pass successfully.
>> > >> The ChangeLog and patch are below.
>> > >
>> > >  Hmm, instead of trying to avoid testing microMIPS code generation
>> > > just to satisfy the test suite I'd rather see the test cases updated
>> > > so that LWM/SWM register ranges are expected and accepted whenever
>> > > microMIPS code is produced.  These scan patterns can be made
>> > conditional.
>> > 
>> > FWIW I think Andrew's patch is correct.  If we want to test microMIPS
>> > output against micromips-specific regexps, we should add a separate test
>> > that forces micromips, so that it gets tested regardless of people's
>> > RUNTESTFLAGS.  Doing that shouldn't hold up Andrew's patch though.
>
>  Taking care that the default compilation mode does not conflict (e.g. 
> MIPS16, incompatible) and taking any exceptions into account (e.g. n64, 
> unsupported) I presume, right?

mips.exp sorts that out for you.  Adding "-mmicromips" or "(-micromips)"
to dg-options forces (or at least is supposed to force) the overall flags
to be compatible with microMIPS.

The aim of mips.exp is avoid skipping tests whereever possible.  If
someone runs the testsuite with -mips16 and we have a -micromips test,
it's better to remove -mips16 for that test than to skip the test entirely.

>> I was going to suggest a follow up patch to add copies of the three tests
>> as Richard suggests. I haven't yet done a micromips run of the testsuite
>> to check for any other issues like this but I suspect problems are limited
>> to the tests that I recently added.
>
>  Please always try to test changes reasonably, i.e. at least o32, 
> o32/MIPS16, o32/microMIPS, n32, n64, and then Linux and ELF if applicable, 
> plus any options that may be relevant, unless it is absolutely clear 
> ABI/ISA variations do not matter for a change proposed.

TBH this seems a bit much.  On the one hand it's more testing than you'd
get for almost any other target, but on the other it leaves out important
differences like MIPS I vs MIPS II vs MIPS 32, MIPS III vs MIPS IV vs MIPS64,
r1 vs. r2 vs. r6, Octeon vs. Loongson vs. vanilla, DSP vs. no DSP, etc.
I think we just have to accept that there are so many possible
combinations that we can't test everything that's potentially relevant.
I think it's more useful to be flexible than prescribe a particular list.

Having everyone test the same multilib combinations on the same target
isn't necessarily a good thing anyway.  Diversity in testing (between
developers) is useful too.

Thanks,
Richard


Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2015-01-13 Thread Jeff Law

On 01/10/15 06:05, Richard Sandiford wrote:

Sorry for the slow response.  Jeff has approved the patch in the
meantime, but I didn't want to go ahead and apply it while there
was still disagreement...
Thanks.  I didn't realize there was a disagreement when I approved. 
Let's continue to hash this out a bit in the hopes that we can all get 
to a place where we're comfortable with the final change, whatever it 
happens to me.


jeff



Re: [PATCH] Correct target selector in -mfentry tests

2015-01-13 Thread Jeff Law

On 01/13/15 14:27, H.J. Lu wrote:

-fprofile -mfentry works with PIE if gcrt1.o is compiled with -fPIC.  A
glibc has been filed, PR 17836, and a glibc patch has been submitted.
OK for trunk?

Thanks.


H.J.
--
* gcc.target/i386/fentry-override.c: Properly place {} in target
selector.  Remove nonpic.
* gcc.target/i386/fentry.c: Likewise.
Does this change the pass/fail result of the test on a system without an 
updated glibc?


jeff


Re: [PATCH] PR59448 - Promote consume to acquire

2015-01-13 Thread Jeff Law

On 01/13/15 15:56, Andrew MacLeod wrote:

On 01/13/2015 02:06 PM, Andrew MacLeod wrote:

On 01/13/2015 01:38 PM, Torvald Riegel wrote:

On Tue, 2015-01-13 at 10:11 -0500, Andrew MacLeod wrote:

On 01/13/2015 09:59 AM, Richard Biener wrote:

On Tue, Jan 13, 2015 at 3:56 PM, Andrew MacLeod
 wrote:

Lengthy discussion :
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

Basically we can generate incorrect code for an atomic consume
operation in
some circumstances.  The general feeling seems to be that we
should simply
promote all consume operations to an acquire operation until there
is a
better definition/understanding of the consume model and how GCC
can track
it.

I proposed a simple patch in the PR, and I have not seen or heard
of any
dissenting opinion.   We should get this in before the end of
stage 3 I
think.

The problem with the patch in the PR is the  memory model is
immediately
promoted from consume to acquire.   This happens *before* any of the
memmodel checks are made.  If a consume is illegally specified
(such as in a
compare_exchange), it gets promoted to acquire and the compiler
doesn't
report the error because it never sees the consume.

This new patch simply makes the adjustment after any errors are
checked on
the originally specified model.   It bootstraps on
x86_64-unknown-linux-gnu
and passes all regression testing.
I also built an aarch64 compiler and it appears to issue the LDAR as
specified in the PR, but anyone with a vested interest really
ought to check
it out with a real build to be sure.

OK for trunk?

Why not patch get_memmodel?  (not sure if that catches all cases)

Richard.



That was the original patch.

The issue is that it  promotes consume to acquire before any error
checking gets to look at the model, so then we allow illegal
specification of consume. (It actually triggers a failure in the
testsuite)

(This is this test: gcc/testsuite/gcc.dg/atomic-invalid.c)

The documentation of the atomic builtins also disallows mo_consume on
atomic_exchange.

However, I don't see any such requirement in C11 or C++14 (and I'd be
surprised to see it in C++11).  It would be surprising also because for
other atomic read-modify-write operations (eg, fetch_add), we don't make
such a requirement in the builtins docs -- and atomic_exchange is just a
read-modify-write with a noop, basically.

Does anyone remember why this requirement for no consume on exchange was
added, or sees a reason to keep it?  If not, I think we should drop it.
This would solve the testsuite failure for Andrew.  Dropping it would
prevent GCC from checking the consume-on-success / acquire-on-failure
case for compare_excahnge I mentioned previously, but I think that this
is pretty harmless.

I could imagine that, for some reason, either backends or libatomic do
not implement consume on atomic_exchange just because the docs
disallowed it -- but I haven't checked that.


I imagine it was probably in a previous incarnation of the
standard...  Most of this was actually implemented  based on very
early draft standards years and years ago and never revised.   It
wasnt put in by me unless the standard at some point said had such
wording.   The current standard appears to make no mention of the
situation.

It seems that it should be safe to move back to the original patch,
and remove that error test for using consume on an exchange...

Andrew

Here's the original patch along with the lien removed from the testcase.
x86_64-unknown-linux-gnu bootstraps, no regressions, and so forth.

OK for trunk?

-ENOPATCH

However, I can get it from the BZ and it's OK assuming you also fixup 
the one testcase we've discussed on this thread.


Jeff



Re: [PATCH] [AArch64, NEON] Improve vpmaxX & vpminX intrinsics

2015-01-13 Thread Yangfei (Felix)
> On 09/12/14 08:17, Yangfei (Felix) wrote:
> >> On 28 November 2014 at 09:23, Yangfei (Felix) 
> wrote:
> >>> Hi,
> >>>This patch converts vpmaxX & vpminX intrinsics to use builtin
> >>> functions
> >> instead of the previous inline assembly syntax.
> >>>Regtested with aarch64-linux-gnu on QEMU.  Also passed the
> >>> glorious
> >> testsuite of Christophe Lyon.
> >>>OK for the trunk?
> >>
> >> Hi Felix,   We know from experience that the advsimd intrinsics tend
> >> to be fragile for big endian and in general it is fairly easy to
> >> break the big endian case.  For these advsimd improvements that you
> >> are working on (that we very much appreciate) it is important to run
> >> both little endian and big endian regressions.
> >>
> >> Thanks
> >> /Marcus
> >
> >
> > Okay.  Any plan for the advsimd big-endian improvement?
> > I rebased this patch over Alan Lawrance's patch:
> > https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html
> > No regressions for aarch64_be-linux-gnu target too.  OK for the thunk?
> >
> >
> > Index: gcc/ChangeLog
> >
> =
> ==
> > --- gcc/ChangeLog   (revision 218464)
> > +++ gcc/ChangeLog   (working copy)
> > @@ -1,3 +1,18 @@
> > +2014-12-09  Felix Yang  
> > +
> > +   * config/aarch64/aarch64-simd.md
> (aarch64_p): New
> > +   pattern.
> > +   * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp,
> > +   uminp, smax_nanp, smin_nanp): New builtins.
> > +   * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32,
> > +   vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16,
> vpmaxq_s32,
> > +   vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32,
> vpmaxq_f64,
> > +   vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32,
> vpmaxnmq_f64,
> > +   vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32,
> vpmin_u8,
> > +   vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32,
> vpminq_u8,
> > +   vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64,
> vpminqd_f64,
> > +   vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64,
> > + vpminnmqd_f64,
> > +
> 
> 
> >   __extension__ static __inline float32x2_t __attribute__
> > ((__always_inline__))
> > Index: gcc/config/aarch64/aarch64-simd.md
> >
> =
> ==
> > --- gcc/config/aarch64/aarch64-simd.md  (revision 218464)
> > +++ gcc/config/aarch64/aarch64-simd.md  (working copy)
> > @@ -1017,6 +1017,28 @@
> > DONE;
> >   })
> >
> > +;; Pairwise Integer Max/Min operations.
> > +(define_insn "aarch64_p"
> > + [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
> > +   (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1
> "register_operand" "w")
> > +(match_operand:VDQ_BHSI 2 "register_operand"
> "w")]
> > +   MAXMINV))]
> > + "TARGET_SIMD"
> > + "p\t%0., %1., %2."
> > +  [(set_attr "type" "neon_minmax")]
> > +)
> > +
> 
> Hi Felix,
> 
> Sorry for the delay in getting back to you on this.
> 
> If you've rolled aarch64_reduc__internalv2si into the above
> pattern, do you still need it? For all its call points, just point them to
> aarch64_p?
> 
> Thanks,
> Tejas.
> 


Hello Tejas,

  I didn't do this yet. 
  Currently the aarch64_reduc__internalv2si is only called by 
reduc__scal_. 
  I find it kind of trouble to handle this due to the use of iterators in the 
caller pattern. 
  Are you going to rework this part? 


Re: [PATCH] add option to emit more array bounds warnigs

2015-01-13 Thread Jeff Law

On 01/13/15 17:40, Martin Uecker wrote:

Jeff Law :

On 01/13/15 10:34, Martin Uecker wrote:

Mon, 12 Jan 2015 11:00:44 -0700
Jeff Law :

On 11/11/14 23:13, Martin Uecker wrote:


...


Has this patch been bootstrapped and regression tested, if so on what
platform.


x86_64-unknown-linux-gnu

Approved.  Please install on the trunk.  Sorry about the delays.


I don't have write access ;-(

I fixed up the ChangeLog entries and installed the patch for you.

If you plan to contribute regularly, you should go ahead and apply for 
write access to the repository so that you'll be able to commit your own 
patches once they're approved.


You'll also need to make sure you have an assignment on file with the 
FSF.That patch was pretty small (the testcase was larger than the 
patch itself, which I always like :-) so I didn't request an assignment. 
 Further submissions likely will require an assignment.


Thanks,
jeff



Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)

2015-01-13 Thread Jeff Law

On 01/13/15 09:18, Jakub Jelinek wrote:

Hi!

My PR60663 fix unfortunately stopped CSE of all inline-asms, even when
they e.g. only have the clobbers added by default.

This patch attempts to restore the old behavior, with the exceptions:
1) as always, asm volatile is not CSEd
2) inline-asm with multiple outputs are not CSEd
3) on request from Richard (which Segher on IRC argues against), "memory"
clobber also prevents CSE; this can be removed by removing the
int j, lim = XVECLEN (x, 0); and loop below it
4) inline-asm with clobbers is never copied into an insn that wasn't
inline-asm before, so if there are clobbers, we allow CSEing of
e.g. two same inline-asms, but only by reusing results of one
of those

Bootstrapped/regtested on x86_64-linux and i686-linux, tested also
with arm cross after reverting the PR60663 arm cost fix.

Ok for trunk this way, or with 3) removed?

2015-01-13  Jakub Jelinek  

PR rtl-optimization/63637
PR rtl-optimization/60663
* cse.c (merge_equiv_classes): Set new_elt->cost to MAX_COST
if elt->cost is MAX_COST for ASM_OPERANDS.
(find_sets_in_insn): Fix up comment typo.
(cse_insn): Don't set src_volatile for all non-volatile
ASM_OPERANDS in PARALLELs, but just those with multiple outputs
or with "memory" clobber.  Set elt->cost to MAX_COST
for ASM_OPERANDS in PARALLEL.  Set src_elt->cost to MAX_COST
if new_src is ASM_OPERANDS and elt->cost is MAX_COST.

* gcc.dg/pr63637-1.c: New test.
* gcc.dg/pr63637-2.c: New test.
* gcc.dg/pr63637-3.c: New test.
* gcc.dg/pr63637-4.c: New test.
* gcc.dg/pr63637-5.c: New test.
* gcc.dg/pr63637-6.c: New test.
* gcc.target/i386/pr63637-1.c: New test.
* gcc.target/i386/pr63637-2.c: New test.
* gcc.target/i386/pr63637-3.c: New test.
* gcc.target/i386/pr63637-4.c: New test.
* gcc.target/i386/pr63637-5.c: New test.
* gcc.target/i386/pr63637-6.c: New test.

--- gcc/cse.c.jj2015-01-09 21:59:44.0 +0100
+++ gcc/cse.c   2015-01-13 13:26:23.391216064 +0100
@@ -1792,6 +1792,8 @@ merge_equiv_classes (struct table_elt *c
}
  new_elt = insert (exp, class1, hash, mode);
  new_elt->in_memory = hash_arg_in_memory;
+ if (GET_CODE (exp) == ASM_OPERANDS && elt->cost == MAX_COST)
+   new_elt->cost = MAX_COST;
}
  }
  }
@@ -4258,7 +4260,7 @@ find_sets_in_insn (rtx_insn *insn, struc
  {
int i, lim = XVECLEN (x, 0);

-  /* Go over the epressions of the PARALLEL in forward order, to
+  /* Go over the expressions of the PARALLEL in forward order, to
 put them in the same order in the SETS array.  */
for (i = 0; i < lim; i++)
{
@@ -4634,12 +4636,27 @@ cse_insn (rtx_insn *insn)
  && REGNO (dest) >= FIRST_PSEUDO_REGISTER)
sets[i].src_volatile = 1;

-  /* Also do not record result of a non-volatile inline asm with
-more than one result or with clobbers, we do not want CSE to
-break the inline asm apart.  */
else if (GET_CODE (src) == ASM_OPERANDS
   && GET_CODE (x) == PARALLEL)
-   sets[i].src_volatile = 1;
+   {
+ /* Do not record result of a non-volatile inline asm with
+more than one result.  */
+ if (n_sets > 1)
+   sets[i].src_volatile = 1;
+
+ int j, lim = XVECLEN (x, 0);
+ for (j = 0; j < lim; j++)
+   {
+ rtx y = XVECEXP (x, 0, j);
+ /* And do not record result of a non-volatile inline asm
+with "memory" clobber.  */
+ if (GET_CODE (y) == CLOBBER && MEM_P (XEXP (y, 0)))
Can you please add a comment here which references the full form of the 
"memory" tag.  (clobber (mem:BLK (scratch))).


If we ever have to look at this again (say perhaps to break out the read 
anything vs write anything into separate tags :-) it'll save 
considerable time and angst trying to track all this stuff down.


The tests you've got are a step forward, but there's obviously a lot 
more we could do.  For example testing DSE around ASMs without and 
without a memory "clobber", testing CSE of unrelated memory references 
around an ASM without and without a memory clobber come to mind.   You 
don't have to add them to get approval, but if you were to take the time 
to cobble them together it'd be hugely appreciated.



Given the discussion with Segher, let's give him a chance to chime in on 
tonight's messages before we make a final decision.


jeff



[PATCH, nds32] Committed: Remove some features that are not available yet in nds32 port of GNU binutils package.

2015-01-13 Thread Chung-Ju Wu
Hi, all,

The nds32 target supports two features, fp-as-gp and ex9, designed
for code size optimizations.  They are majorly performed by linker
so that compiler is merely to give some hints or directives with
-mforce-fp-as-gp, -mforbid-fp-as-gp, and -mex9 options.

However, those two features are not available yet in the current
nds32 port of GNU binutils package.  For consistency concern, I think
it would be better to remove them from gcc trunk as well for now.

Committed as Rev. 219576: https://gcc.gnu.org/r219576


gcc/ChangeLog

2015-01-14  Chung-Ju Wu  

* config/nds32/nds32.opt (mforce-fp-as-gp): Remove.
(mforbid-fp-as-gp): Remove.
(mex9): Remove.
* config/nds32/nds32-fp-as-gp.c (nds32_have_prologue_p): Remove.
(nds32_symbol_load_store_p): Remove.
(nds32_fp_as_gp_check_available): Clean up implementation.
* config/nds32/nds32.h (LINK_SPEC): Remove -mforce-as-gp and -mex9
cases.
* config/nds32/nds32.c (nds32_asm_file_start): No need to consider
fp-as-gp and ex9 cases.


Best regards,
jasonwucj


0010-Remove-some-features-that-are-not-available-yet-in-n.patch
Description: Binary data


Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)

2015-01-13 Thread Jeff Law

On 01/13/15 17:03, Segher Boessenkool wrote:

On Tue, Jan 13, 2015 at 03:17:08PM -0700, Jeff Law wrote:

And finally there is the case of non-volatile asm with "memory" clobber
with
no memory stores in between the two - the posted (safer) patch will not
allow to CSE the two, while in theory we could CSE them into just one asm.

I think we have to assume that CSEing them is wrong.  The first may set
something in memory that is read by the second.

Thoughts?


I agree with pretty much everything you say in the thread, except for this
idea that a memory clobber reads memory.  No clobber reads anything.

The commit that introduced the memory clobber concept, 426b38c9 (svn 1207),
by rms, has as only comment

/* `memory', don't cache memory across asm */
RMS botched this and you can see it in that the scheduler was not 
updated at the same time.  The scheduler absolutely must track if an ASM 
does a memory read of an arbitrary location.  I'd have to dig deeper to 
see when this got fixed, but it was clearly botched.



Many years later another pass which needs to precisely track such things 
came along, namely DSE.


The code in DSE is actually easier to grok.


First, if you look at the ASM handling in cfgexpand.c you'll find:

if (j == -4)  /* `memory', don't cache memory across asm */
{
  XVECEXP (body, 0, i++)
= gen_rtx_CLOBBER (VOIDmode,
   gen_rtx_MEM
   (BLKmode,
gen_rtx_SCRATCH (VOIDmode)));
  continue;
}


So we generate (CLOBBER (MEM:BLK (SCRATCH))) when we see "memory" in the 
"clobber" list of an ASM.



If you then look at dse.c we have this in record_store:

 /* At this point we know mem is a mem. */
  if (GET_MODE (mem) == BLKmode)
{
  if (GET_CODE (XEXP (mem, 0)) == SCRATCH)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, " adding wild read for (clobber 
(mem:BLK (scratch))\n");

  add_wild_read (bb_info);
  insn_info->cannot_delete = true;
  return 0;
}


Which says very precisely that we treat (CLOBBER (MEM:BLK (SCRATCH))) as 
potentially *reading* any location.


If you trace through how the scheduler builds dependencies, paying 
particular attention to alias.c you'll see that (CLOBBER (MEM:BLK 
(SCRATCH))) is treated as both a read and a write of an arbitrary location.


It's unfortunate that RMS put the "memory" tag in the "clobber" list. 
But he really wasn't a compiler junkie and didn't realize the right 
thing to do was to have a memory tag in both the inputs and 
[output|clobber] section to represent a read of an arbitrary location 
and a write to an arbitrary location independently.  But it is what it 
is at this point and we have to treat "memory" appearing in the 
"clobber" list as an arbitrary memory read and an arbitrary memory write.




Jeff




Re: [PATCH/expand] PR64011 Adjust bitsize when partial overflow happen for big-endian

2015-01-13 Thread Jeff Law

On 01/13/15 15:42, Joseph Myers wrote:

On Tue, 13 Jan 2015, Jeff Law wrote:


In many ways having the compiler or assembler spitting out an error here is
preferable to silently compiling the code.  That would also help explain why


As usual, an error is incorrect in such a case that only has undefined
behavior at runtime (but it may be compiled into an abort if the behavior
is unconditionally undefined, and the abort doesn't replace anything
before the undefined behavior that might have stopped the undefined
behavior from occurring).
You are, of course, correct.  We can't error here, but we can generate a 
conditional warning.


jeff


Patch ping...

2015-01-13 Thread Jan Hubicka
Hi,
I would like to ping the patch to fix divergence between a type and its main 
variant introduced by C++ FE.
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01202.html

Honza


Drop workaround for old binutils linker plugin bug

2015-01-13 Thread Jan Hubicka
Hi,
this workaround actually triggers bug in quite recent golds, so it seems
to be good motivation to finally drop it.  The bug is long fixed.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

* tree-profile.c (init_ic_make_global_vars): Drop workaround
for bintuils bug 14342.
(init_ic_make_global_vars): Likewise.
(gimple_init_edge_profiler): Likewise.
(gimple_gen_ic_func_profiler): Likewise.
Index: tree-profile.c
===
--- tree-profile.c  (revision 219571)
+++ tree-profile.c  (working copy)
@@ -105,30 +105,15 @@ init_ic_make_global_vars (void)
 
   ptr_void = build_pointer_type (void_type_node);
 
-  /* Workaround for binutils bug 14342.  Once it is fixed, remove lto path.  */
-  if (flag_lto)
-{
-  ic_void_ptr_var
-   = build_decl (UNKNOWN_LOCATION, VAR_DECL,
- get_identifier ("__gcov_indirect_call_callee_ltopriv"),
- ptr_void);
-  TREE_PUBLIC (ic_void_ptr_var) = 1;
-  DECL_COMMON (ic_void_ptr_var) = 1;
-  DECL_VISIBILITY (ic_void_ptr_var) = VISIBILITY_HIDDEN;
-  DECL_VISIBILITY_SPECIFIED (ic_void_ptr_var) = true;
-}
-  else
-{
-  ic_void_ptr_var
-   = build_decl (UNKNOWN_LOCATION, VAR_DECL,
- get_identifier (
- (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ?
-  "__gcov_indirect_call_topn_callee" :
-  "__gcov_indirect_call_callee")),
- ptr_void);
-  TREE_PUBLIC (ic_void_ptr_var) = 1;
-  DECL_EXTERNAL (ic_void_ptr_var) = 1;
-}
+  ic_void_ptr_var
+= build_decl (UNKNOWN_LOCATION, VAR_DECL,
+ get_identifier (
+ (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ?
+  "__gcov_indirect_call_topn_callee" :
+  "__gcov_indirect_call_callee")),
+ ptr_void);
+  TREE_PUBLIC (ic_void_ptr_var) = 1;
+  DECL_EXTERNAL (ic_void_ptr_var) = 1;
   TREE_STATIC (ic_void_ptr_var) = 1;
   DECL_ARTIFICIAL (ic_void_ptr_var) = 1;
   DECL_INITIAL (ic_void_ptr_var) = NULL;
@@ -138,30 +123,16 @@ init_ic_make_global_vars (void)
   varpool_node::finalize_decl (ic_void_ptr_var);
 
   gcov_type_ptr = build_pointer_type (get_gcov_type ());
-  /* Workaround for binutils bug 14342.  Once it is fixed, remove lto path.  */
-  if (flag_lto)
-{
-  ic_gcov_type_ptr_var
-   = build_decl (UNKNOWN_LOCATION, VAR_DECL,
- get_identifier ("__gcov_indirect_call_counters_ltopriv"),
- gcov_type_ptr);
-  TREE_PUBLIC (ic_gcov_type_ptr_var) = 1;
-  DECL_COMMON (ic_gcov_type_ptr_var) = 1;
-  DECL_VISIBILITY (ic_gcov_type_ptr_var) = VISIBILITY_HIDDEN;
-  DECL_VISIBILITY_SPECIFIED (ic_gcov_type_ptr_var) = true;
-}
-  else
-{
-  ic_gcov_type_ptr_var
-   = build_decl (UNKNOWN_LOCATION, VAR_DECL,
- get_identifier (
- (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ?
-  "__gcov_indirect_call_topn_counters" :
-  "__gcov_indirect_call_counters")),
- gcov_type_ptr);
-  TREE_PUBLIC (ic_gcov_type_ptr_var) = 1;
-  DECL_EXTERNAL (ic_gcov_type_ptr_var) = 1;
-}
+
+  ic_gcov_type_ptr_var
+= build_decl (UNKNOWN_LOCATION, VAR_DECL,
+ get_identifier (
+ (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ?
+  "__gcov_indirect_call_topn_counters" :
+  "__gcov_indirect_call_counters")),
+ gcov_type_ptr);
+  TREE_PUBLIC (ic_gcov_type_ptr_var) = 1;
+  DECL_EXTERNAL (ic_gcov_type_ptr_var) = 1;
   TREE_STATIC (ic_gcov_type_ptr_var) = 1;
   DECL_ARTIFICIAL (ic_gcov_type_ptr_var) = 1;
   DECL_INITIAL (ic_gcov_type_ptr_var) = NULL;
@@ -230,33 +201,18 @@ gimple_init_edge_profiler (void)
 
   init_ic_make_global_vars ();
 
-  /* Workaround for binutils bug 14342.  Once it is fixed, remove lto 
path.  */
-  if (flag_lto)
-{
- /* void (*) (gcov_type, void *)  */
- ic_profiler_fn_type
-  = build_function_type_list (void_type_node,
- gcov_type_ptr, gcov_type_node,
- ptr_void, ptr_void,
- NULL_TREE);
- tree_indirect_call_profiler_fn
- = build_fn_decl ("__gcov_indirect_call_profiler",
-ic_profiler_fn_type);
-}
-  else
-{
- /* void (*) (gcov_type, void *)  */
- ic_profiler_fn_type
-  = build_function_type_list (void_type_node,
- gcov_type_node,
- ptr_void,
- 

[PATCH, AArch64] Fix abitest for ilp32

2015-01-13 Thread Hurugalawadi, Naveen
Hi,

Please find attached the patch that fixes abitest for ilp32.

"testfunc_ptr" is a 32bit pointer in ILP32 but is being loaded as 64bit.

Hence some of the func-ret testcases FAIL's for ILP32.

Please review the patch and let us know if its okay?

Regression tested on aarch64-elf.

Thanks,
Naveen

gcc/testsuite

2015-01-15  Andrew Pinski  
Naveen H.S  

* gcc.target/aarch64/aapcs64/abitest.S (LABEL_TEST_FUNC_RETURN): Load
testfunc_ptr as 32bit for ILP32 and 64bit for LP64.--- gcc/testsuite/ChangeLog	2015-01-14 10:00:59.524914610 +0530
+++ gcc/testsuite/ChangeLog	2015-01-14 10:21:20.928932740 +0530
@@ -1,3 +1,9 @@
+2015-01-15  Andrew Pinski  
+	Naveen H.S  
+
+	* gcc.target/aarch64/aapcs64/abitest.S (LABEL_TEST_FUNC_RETURN): Load
+	testfunc_ptr as 32bit for ILP32 and 64bit for LP64.
+
 2015-01-13  David Malcolm  
 
 	* jit.dg/harness.h (set_up_logging): Move string concatenation
--- gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S	2015-01-14 09:37:46.368893934 +0530
+++ gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S	2015-01-14 10:13:08.456925431 +0530
@@ -2,6 +2,13 @@
 	.global myfunc
 	.type dumpregs,%function
 	.type myfunc,%function
+
+#ifdef __LP64__
+#define PTR_REG(n)  x##n
+#else
+#define PTR_REG(n)  w##n
+#endif
+
 dumpregs:
 myfunc:
   mov	x16, sp
@@ -48,7 +55,7 @@ myfunc:
 LABEL_TEST_FUNC_RETURN:
   adrp	x9, testfunc_ptr
   add	x9, x9, :lo12:testfunc_ptr
-  ldr	x9, [x9, #0]
+  ldr  PTR_REG(9), [x9, #0]
   blr	x9// function return value test
   adrp	x9, saved_return_address
   add	x9, x9, :lo12:saved_return_address


[PATCH] Fix PR c++/16160

2015-01-13 Thread Patrick Palka
This patch fixes the above PR where it was reported that the C++
frontend does not reject the malformed class declaration

struct X<5>;

Instead of rejecting it, the FE treats this declaration as if it were a
forward declaration of a template specialization, i.e. as if it were
written

template<> struct X<5>;

First off, the FE should reject the declaration because it is malformed
(not 100% sure, though).  Second, since the user probably intended to
have written an explicit template instantiation (as in the PR), the FE
should suggest adding "template" before such a declaration, that is the
declaration

struct X<5>; // error + suggest adding "template"

This patch does both these things along with adding error messages +
suggestions for

struct X<5> { }; // error + suggest adding "template <>"

and

template struct X<5> { }; // error + suggest replacing with "template <>"

Bootstrap and regtesting in progress.  Does this patch look OK for trunk?

gcc/cp/ChangeLog:

PR c++/16160
* parser.c (cp_parser_class_head): Identify and reject malformed
template-id declarations and definitions.
---
 gcc/cp/parser.c  | 53 +++-
 gcc/testsuite/g++.dg/cpp0x/gen-attrs-9.C |  2 +-
 gcc/testsuite/g++.dg/ext/attrib9.C   |  2 +-
 gcc/testsuite/g++.dg/template/crash54.C  |  2 +-
 gcc/testsuite/g++.dg/template/error55.C  | 11 +++
 5 files changed, 53 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/error55.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 3290dfa..f6dc004 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -20264,6 +20264,34 @@ cp_parser_class_head (cp_parser* parser,
 }
   virt_specifiers = cp_parser_virt_specifier_seq_opt (parser);
 
+  /* Make sure a top-level template-id declaration or definition is preceded
+ by "template" or "template <>".  */
+  if (template_id_p
+  && at_namespace_scope_p ()
+  && parser->num_template_parameter_lists == 0
+  && !processing_explicit_instantiation)
+{
+  if (cp_parser_next_token_starts_class_definition_p (parser))
+   {
+  error_at (type_start_token->location,
+   "an explicit specialization must be preceded by "
+   "%%>");
+ invalid_explicit_specialization_p = true;
+ /* Try to recover gracefully by taking the same action that would
+have been taken by cp_parser_explicit_specialization.  */
+ ++parser->num_template_parameter_lists;
+ begin_specialization ();
+   }
+  else if (cp_parser_declares_only_class_p (parser))
+   {
+  error_at (type_start_token->location,
+   "an explicit instantiation must be preceded by "
+   "%");
+ type = error_mark_node;
+ goto out;
+   }
+}
+
   /* If it's not a `:' or a `{' then we can't really be looking at a
  class-head, since a class-head only appears as part of a
  class-specifier.  We have to detect this situation before calling
@@ -20275,6 +20303,16 @@ cp_parser_class_head (cp_parser* parser,
   goto out;
 }
 
+  if (processing_explicit_instantiation)
+{
+  error_at (type_start_token->location,
+   "an explicit instantiation may not have a definition");
+  inform (type_start_token->location,
+ "use %%> to define an explicit specialization");
+  type = error_mark_node;
+  goto out;
+}
+
   /* At this point, we're going ahead with the class-specifier, even
  if some other problem occurs.  */
   cp_parser_commit_to_tentative_parse (parser);
@@ -20346,20 +20384,7 @@ cp_parser_class_head (cp_parser* parser,
  num_templates = 0;
}
 }
-  /* An explicit-specialization must be preceded by "template <>".  If
- it is not, try to recover gracefully.  */
-  if (at_namespace_scope_p ()
-  && parser->num_template_parameter_lists == 0
-  && template_id_p)
-{
-  error_at (type_start_token->location,
-   "an explicit specialization must be preceded by %%>");
-  invalid_explicit_specialization_p = true;
-  /* Take the same action that would have been taken by
-cp_parser_explicit_specialization.  */
-  ++parser->num_template_parameter_lists;
-  begin_specialization ();
-}
+
   /* There must be no "return" statements between this point and the
  end of this function; set "type "to the correct return value and
  use "goto done;" to return.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-9.C 
b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-9.C
index 3dc51ee..4957ba1 100644
--- a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-9.C
+++ b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-9.C
@@ -9,4 +9,4 @@ enum [[gnu::unused]] e; // { dg-warning "already defined" }
 struct [[gnu::unused]] B *p;   //  { dg-warning "attributes" }
 
 template  struct A { };
-struct [[gnu::unused]] A; //  { dg-warning "att

PR 64481 (bootstrap miscompare)

2015-01-13 Thread Jan Hubicka
Hi,
in December I conditoinally disabled expensive sanity checking in inliner.
This triggeres bootstrap miscompare because caches are getting out of sync.
This patch fixes the problem found by sanity check - the node growth cache
was removed from use in badness calculation by Richard a while ago, but the
cache itself remained while the updating logic was dropped.  This of course
leads to somewhat randomish results.

The other problem fixed is that in some cases we forget to walk through aliases
to get into the callee.

Bootstrapped/regtested x86_64-linux, comitted.
PR ipa/64481
* ipa-inline-analysis.c (node_growth_cache): Remove.
(initialize_growth_caches): Do not initialize it.
(free_growth_caches): Do not free it.
(do_estimate_growth): Rename to ...
(estimate_growth): ... this one; drop growth cache code.
(growth_likely_positive): Always go the heuristics way.
* ipa-inline.c (can_inline_edge_p): Walk through aliases.
(reset_edge_caches): Do not reset node growth.
(heap_edge_removal_hook): Do not maintain cache.
(inline_small_functions): Likewise; strenghten sanity check.
(ipa_inline): Do not maintain caches.
* ipa-inline.h (node_growth_cache): Remove.
(do_estimate_growth): Remove to ...
(estimate_growth): this one; remove inline version.
(reset_node_growth_cache): Remove.
Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c   (revision 219571)
+++ ipa-inline-analysis.c   (working copy)
@@ -167,7 +167,6 @@ function_summary  *inl
 vec inline_edge_summary_vec;
 
 /* Cached node/edge growths.  */
-vec node_growth_cache;
 vec edge_growth_cache;
 
 /* Edge predicates goes here.  */
@@ -1341,8 +1340,6 @@ initialize_growth_caches (void)
 {
   if (symtab->edges_max_uid)
 edge_growth_cache.safe_grow_cleared (symtab->edges_max_uid);
-  if (symtab->cgraph_max_uid)
-node_growth_cache.safe_grow_cleared (symtab->cgraph_max_uid);
 }
 
 
@@ -1352,7 +1349,6 @@ void
 free_growth_caches (void)
 {
   edge_growth_cache.release ();
-  node_growth_cache.release ();
 }
 
 
@@ -3931,7 +3927,7 @@ do_estimate_growth_1 (struct cgraph_node
 /* Estimate the growth caused by inlining NODE into all callees.  */
 
 int
-do_estimate_growth (struct cgraph_node *node)
+estimate_growth (struct cgraph_node *node)
 {
   struct growth_data d = { node, 0, false };
   struct inline_summary *info = inline_summaries->get (node);
@@ -3960,12 +3956,6 @@ do_estimate_growth (struct cgraph_node *
 + 50) / 100;
 }
 
-  if (node_growth_cache.exists ())
-{
-  if ((int) node_growth_cache.length () <= node->uid)
-   node_growth_cache.safe_grow_cleared (symtab->cgraph_max_uid);
-  node_growth_cache[node->uid] = d.growth + (d.growth >= 0);
-}
   return d.growth;
 }
 
@@ -3979,7 +3969,6 @@ bool
 growth_likely_positive (struct cgraph_node *node, int edge_growth 
ATTRIBUTE_UNUSED)
 {
   int max_callers;
-  int ret;
   struct cgraph_edge *e;
   gcc_checking_assert (edge_growth > 0);
 
@@ -3999,10 +3988,6 @@ growth_likely_positive (struct cgraph_no
   || !node->can_remove_if_no_direct_calls_p ())
 return true;
 
-  /* If there is cached value, just go ahead.  */
-  if ((int)node_growth_cache.length () > node->uid
-  && (ret = node_growth_cache[node->uid]))
-return ret > 0;
   if (!node->will_be_removed_from_program_if_no_direct_calls_p ()
   && (!DECL_COMDAT (node->decl)
  || !node->can_remove_if_no_direct_calls_p ()))
Index: ipa-inline.c
===
--- ipa-inline.c(revision 219571)
+++ ipa-inline.c(working copy)
@@ -388,11 +388,11 @@ can_inline_edge_p (struct cgraph_edge *e
   else if (caller_tree != callee_tree)
 {
   if (((opt_for_fn (e->caller->decl, optimize)
-   > opt_for_fn (e->callee->decl, optimize))
+   > opt_for_fn (callee->decl, optimize))
|| (opt_for_fn (e->caller->decl, optimize_size)
-   != opt_for_fn (e->callee->decl, optimize_size)))
+   != opt_for_fn (callee->decl, optimize_size)))
  /* gcc.dg/pr43564.c.  Look at forced inline even in -O0.  */
- && !DECL_DISREGARD_INLINE_LIMITS (e->callee->decl))
+ && !DECL_DISREGARD_INLINE_LIMITS (callee->decl))
{
  e->inline_failed = CIF_OPTIMIZATION_MISMATCH;
  inlinable = false;
@@ -1095,9 +1095,6 @@ reset_edge_caches (struct cgraph_node *n
   if (where->global.inlined_to)
 where = where->global.inlined_to;
 
-  /* WHERE body size has changed, the cached growth is invalid.  */
-  reset_node_growth_cache (where);
-
   for (edge = where->callers; edge; edge = edge->next_caller)
 if (edge->inline_failed)
   reset_edge_growth_cache (edge);
@@ -1428,8 +1425,6 @@ add_new_edges_to_heap (edge_heap_t *heap
 static void
 heap_edge_removal_hook (struc

Re: [PATCH 0/6, nds32] Committed: Have -mcmodel option to support varied code model of programs.

2015-01-13 Thread Chung-Ju Wu
2015-01-14 6:22 GMT+08:00 Joseph Myers :
> On Tue, 13 Jan 2015, Chung-Ju Wu wrote:
>
>> To fix this issue, we are going to use -mcmodel=X options, which probably
>> gives more flexibility to support varied code model on code generation.
>> The -mgp-direct option now becomes meaningless and can be discarded.
>
> If you add or remove command-line options, you need to update invoke.texi
> accordingly.
>

Thanks for your reminder.
I will propose another patch to update documentation as soon as possible.

Best regards,
jasonwucj

> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: Housekeeping work in backends.html

2015-01-13 Thread Anthony Green
Eric Botcazou  writes:

> Some ports are missing (lm32, moxie, nios2, nvptx, rl78, rx) so the relevant 
> maintainers are CCed (see 6.3.9 Anatomy of a Target Back End in the doc).

I think I got this right

   |  Characteristics
Target | HMSLQNFICBD lqrcpfgmbdiates
---+
moxie  |   F   g  ds


AG


Re: [PATCH] add option to emit more array bounds warnigs

2015-01-13 Thread Martin Uecker
Jeff Law :
> On 01/13/15 10:34, Martin Uecker wrote:
> > Mon, 12 Jan 2015 11:00:44 -0700
> > Jeff Law :
> >> On 11/11/14 23:13, Martin Uecker wrote:

...

> >> Has this patch been bootstrapped and regression tested, if so on what
> >> platform.
> >
> > x86_64-unknown-linux-gnu
> Approved.  Please install on the trunk.  Sorry about the delays.

I don't have write access ;-(

Martin





Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)

2015-01-13 Thread Segher Boessenkool
On Tue, Jan 13, 2015 at 03:17:08PM -0700, Jeff Law wrote:
> >And finally there is the case of non-volatile asm with "memory" clobber 
> >with
> >no memory stores in between the two - the posted (safer) patch will not
> >allow to CSE the two, while in theory we could CSE them into just one asm.
> I think we have to assume that CSEing them is wrong.  The first may set 
> something in memory that is read by the second.
> 
> Thoughts?

I agree with pretty much everything you say in the thread, except for this
idea that a memory clobber reads memory.  No clobber reads anything.

The commit that introduced the memory clobber concept, 426b38c9 (svn 1207),
by rms, has as only comment

/* `memory', don't cache memory across asm */


Segher


RE: [MIPS] Re-enable ABI->ISA inference

2015-01-13 Thread Moore, Catherine


> -Original Message-
> From: Matthew Fortune [mailto:matthew.fort...@imgtec.com]
> Sent: Monday, January 05, 2015 6:09 PM
> To: Moore, Catherine
> Cc: 'gcc-patches@gcc.gnu.org' (gcc-patches@gcc.gnu.org)
> Subject: [MIPS] Re-enable ABI->ISA inference
> 
> The R6 patch introduced MIPS_ISA_LEVEL_SPEC into DRIVER_SELF_SPECS for
> all configurations. One part of MIPS_ISA_LEVEL_SPEC is however
> incompatible with those configurations which infer an ISA from an ABI
> without specifically setting the default ISAs using --with-arch-[32|64].
> 
> I.e. a generic mips-linux (--enable-targets=all) and mips64-linux would fail 
> to
> build the n32/n64 multilibs as -mips1 would be introduced by
> DRIVER_SELF_SPECS.
> 
> I have therefore split MIPS_ISA_LEVEL_SPEC into two. One part is suitable
> for all confgurations and one part is only suitable for configurations that 
> infer
> an ABI from an ISA (these tend to be cross-compiler vendor configurations)
> 
> I have built and checked the driver generated options for all relevant
> configurations and everything appears to work.
> 
> Let me know if you can see any problems with this?
> 
> Thanks,
> Matthew
> 
> gcc/
> 
>  * config/mips/mips.h (MIPS_ISA_LEVEL_SPEC): Only infer an ISA
>  level from an ARCH; do not inject the default.
>  (MIPS_DEFAULT_ISA_LEVEL_SPEC): New macro split out from
>  MIPS_ISA_LEVEL_SPEC.
>  (MIPS_ISA_NAN2008_SPEC): Update comment.
>  (BASE_DRIVER_SELF_SPECS): Likewise.
>  * config/mips/elfoabi.h (DRIVER_SELF_SPECS): Add
>  MIPS_DEFAULT_ISA_LEVEL_SPEC.
>  * config/mips/mti-elf.h (DRIVER_SELF_SPECS): Likewise.
>  * config/mips/mti-linux.h (DRIVER_SELF_SPECS): Likewise.
>  * config/mips/sde.h (DRIVER_SELF_SPECS): Likewise.
> ---

This looks OK.



Re: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-01-13 Thread H.J. Lu
On Tue, Jan 13, 2015 at 5:03 AM, H.J. Lu  wrote:
> On Mon, Jan 12, 2015 at 11:50:41PM +, Joseph Myers wrote:
>> On Mon, 12 Jan 2015, H.J. Lu wrote:
>>
>> > +if test x$enable_default_pie = xyes; then
>> > +  AC_MSG_CHECKING(if $target supports default PIE)
>> > +  enable_default_pie=no
>> > +  case $target in
>> > +i?86*-*-linux* | x86_64*-*-linux*)
>> > +  saved_LDFLAGS="$LDFLAGS"
>> > +  saved_CFLAGS="$CFLAGS"
>> > +  CFLAGS="$CFLAGS -fPIE"
>> > +  LDFLAGS="$LDFLAGS -fPIE -pie"
>> > +  AC_TRY_LINK(,,[enable_default_pie=yes],)
>> > +  LDFLAGS="$saved_LDFLAGS"
>> > +  CFLAGS="$saved_CFLAGS"
>> > +  ;;
>> > +*)
>> > +  ;;
>> > +esac
>>
>> There should not be any such hardcoding of targets here without concrete
>> evidence that the targets for which this sets enable_default_pie=no really
>> cannot support PIE.  In particular, there is no reason at all for this to
>> be architecture-specific; all GNU/Linux architectures should support PIE.
>>
>> I believe AC_TRY_LINK here will test for the host, whereas what you want
>> to know is what's supported for the target (but it's not possible to run
>> link tests for the target at this point; the compiler for the target
>> hasn't even been built).
>>
>> So: just presume that if the user passes --enable-default-pie then they
>> know what they are doing, and don't try to override their choice.
>>
>> > diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
>> > index c9e3bf1..89fc305 100644
>> > --- a/gcc/doc/install.texi
>> > +++ b/gcc/doc/install.texi
>> > @@ -1583,6 +1583,10 @@ not be built.
>> >  Specify that the run-time libraries for stack smashing protection
>> >  should not be built.
>> >
>> > +@item --enable-default-pie
>> > +Turn on @option{-fPIE} and @option{-pie} by default if supported.
>> > +Currently supported targets are i?86-*-linux* and x86-64-*-linux*.
>>
>> The "if supported" and target list can then be removed here.
>>
>
> Here is the updated patch.  To support --enable-default-pie, each target
> must update STARTFILE_SPEC to support PIE_SPEC and NO_PIE_SPEC.  I can
> provide STARTFILE_SPEC patch if needed.
>
> Thanks.
>
>
> H.J.
> ---
> gcc/
>
> 2015-01-12  Magnus Granberg  
> H.J. Lu  
>
> * Makefile.in (COMPILER): Add @NO_PIE_CFLAGS@.
> (LINKER): Add @NO_PIE_FLAG@.
> (libgcc.mvars): Set NO_PIE_CFLAGS to -fno-PIE for
> --enable-default-pie.
> * common.opt (fPIE): Initialize to -1.
> (fpie): Likewise.
> (static): Add "RejectNegative Negative(shared)".
> (no-pie): New option.
> (pie): Replace "Negative(shared)" with "Negative(no-pie)".
> * configure.ac: Add --enable-default-pie.
> (NO_PIE_CFLAGS): New.  Check if -fno-PIE works.  AC_SUBST.
> (NO_PIE_FLAG): New.  Check if -no-pie works.  AC_SUBST.
> * defaults.h (DEFAULT_FLAG_PIE): New.  Default PIE to -fPIE.
> * gcc.c (NO_PIE_SPEC): New.
> (PIE_SPEC): Likewise.
> (LD_PIE_SPEC): Likewise.
> (LINK_PIE_SPEC): Handle -no-pie.  Use PIE_SPEC and LD_PIE_SPEC.
> * opts.c (DEFAULT_FLAG_PIE): New.  Set to 0 if ENABLE_DEFAULT_PIE
> is undefined.
> (finish_options): Update opts->x_flag_pie if it is -1.
> * config/gnu-user.h (FVTABLE_VERIFY_SPEC): New.
> (GNU_USER_TARGET_STARTFILE_SPEC): Use FVTABLE_VERIFY_SPEC.  Use
> NO_PIE_SPEC and NO_PIE_SPEC if ENABLE_DEFAULT_PIE is defined.
> (GNU_USER_TARGET_STARTFILE_SPEC): Use FVTABLE_VERIFY_SPEC.
> * doc/install.texi: Document --enable-default-pie.
> * doc/invoke.texi: Document -no-pie.
> * config.in: Regenerated.
> * configure: Likewise.
>
> gcc/ada/
>
> 2015-01-12  H.J. Lu  
>
> * gcc-interface/Makefile.in (TOOLS_LIBS): Add @NO_PIE_FLAG@.
>
> libgcc/
>
> 2015-01-12  H.J. Lu  
>
> * Makefile.in (CRTSTUFF_CFLAGS): Add $(NO_PIE_CFLAGS).
>

This is the updated patch.  I fixed the -r regression.   LTO tests
pass now.

-- 
H.J.
gcc/

2015-01-12  Magnus Granberg  
	H.J. Lu  

	* Makefile.in (COMPILER): Add @NO_PIE_CFLAGS@.
	(LINKER): Add @NO_PIE_FLAG@.
	(libgcc.mvars): Set NO_PIE_CFLAGS to -fno-PIE for
	--enable-default-pie.
	* common.opt (fPIE): Initialize to -1.
	(fpie): Likewise.
	(no-pie): New option.
	(pie): Replace "Negative(shared)" with "Negative(no-pie)".
	* configure.ac: Add --enable-default-pie.
	(NO_PIE_CFLAGS): New.  Check if -fno-PIE works.  AC_SUBST.
	(NO_PIE_FLAG): New.  Check if -no-pie works.  AC_SUBST.
	* defaults.h (DEFAULT_FLAG_PIE): New.  Default PIE to -fPIE.
	* gcc.c (NO_PIE_SPEC): New.
	(PIE_SPEC): Likewise.
	(LD_PIE_SPEC): Likewise.
	(LINK_PIE_SPEC): Handle -no-pie.  Use PIE_SPEC and LD_PIE_SPEC.
	* opts.c (DEFAULT_FLAG_PIE): New.  Set to 0 if ENABLE_DEFAULT_PIE
	is undefined.
	(finish_options): Update opts->x_flag_pie if it is -1.
	* config/gnu-user.h (FVTABLE_VERIFY_SPEC): New.
	(GNU_USER_TARGET_STARTFILE_SPEC): Use FVTABLE_VERIFY_SPEC.  Use
	P

Re: [PATCH] PR59448 - Promote consume to acquire

2015-01-13 Thread Andrew MacLeod

On 01/13/2015 02:06 PM, Andrew MacLeod wrote:

On 01/13/2015 01:38 PM, Torvald Riegel wrote:

On Tue, 2015-01-13 at 10:11 -0500, Andrew MacLeod wrote:

On 01/13/2015 09:59 AM, Richard Biener wrote:
On Tue, Jan 13, 2015 at 3:56 PM, Andrew MacLeod 
 wrote:
Lengthy discussion : 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448


Basically we can generate incorrect code for an atomic consume 
operation in
some circumstances.  The general feeling seems to be that we 
should simply
promote all consume operations to an acquire operation until there 
is a
better definition/understanding of the consume model and how GCC 
can track

it.

I proposed a simple patch in the PR, and I have not seen or heard 
of any
dissenting opinion.   We should get this in before the end of 
stage 3 I

think.

The problem with the patch in the PR is the  memory model is 
immediately

promoted from consume to acquire.   This happens *before* any of the
memmodel checks are made.  If a consume is illegally specified 
(such as in a
compare_exchange), it gets promoted to acquire and the compiler 
doesn't

report the error because it never sees the consume.

This new patch simply makes the adjustment after any errors are 
checked on
the originally specified model.   It bootstraps on 
x86_64-unknown-linux-gnu

and passes all regression testing.
I also built an aarch64 compiler and it appears to issue the LDAR as
specified in the PR, but anyone with a vested interest really 
ought to check

it out with a real build to be sure.

OK for trunk?

Why not patch get_memmodel?  (not sure if that catches all cases)

Richard.



That was the original patch.

The issue is that it  promotes consume to acquire before any error
checking gets to look at the model, so then we allow illegal
specification of consume. (It actually triggers a failure in the 
testsuite)

(This is this test: gcc/testsuite/gcc.dg/atomic-invalid.c)

The documentation of the atomic builtins also disallows mo_consume on
atomic_exchange.

However, I don't see any such requirement in C11 or C++14 (and I'd be
surprised to see it in C++11).  It would be surprising also because for
other atomic read-modify-write operations (eg, fetch_add), we don't make
such a requirement in the builtins docs -- and atomic_exchange is just a
read-modify-write with a noop, basically.

Does anyone remember why this requirement for no consume on exchange was
added, or sees a reason to keep it?  If not, I think we should drop it.
This would solve the testsuite failure for Andrew.  Dropping it would
prevent GCC from checking the consume-on-success / acquire-on-failure
case for compare_excahnge I mentioned previously, but I think that this
is pretty harmless.

I could imagine that, for some reason, either backends or libatomic do
not implement consume on atomic_exchange just because the docs
disallowed it -- but I haven't checked that.

I imagine it was probably in a previous incarnation of the 
standard...  Most of this was actually implemented  based on very 
early draft standards years and years ago and never revised.   It 
wasnt put in by me unless the standard at some point said had such 
wording.   The current standard appears to make no mention of the 
situation.


It seems that it should be safe to move back to the original patch, 
and remove that error test for using consume on an exchange...


Andrew

Here's the original patch along with the lien removed from the testcase.
x86_64-unknown-linux-gnu bootstraps, no regressions, and so forth.

OK for trunk?

Andrew


Re: Housekeeping work in backends.html

2015-01-13 Thread Eric Botcazou
> H - you could argue a hardware implementation does not exist as it's a
>  virtual target, but it can obviously be compiled down to run on
>  hardware that does exist.

Agreed.

> Q - "registers" are typed and you can declare 64 bit registers, so
>  probably this is true

OK.

> f - Not even sure what this is about. It only defines a small epilogue
>  pattern.

I think this means that the port still uses asm prologue/epilogue.  That's not 
the case here, so 'f' should probably be omitted and 'g' be added.

As a matter of fact, there are no ports left with 'f'.

> a - Port uses neither LRA nor reload. Might be a new characteristic
>  (along with several others).
> 
> The closest string is probably
> 
> SQCqfbde

Thanks, I have installed "SQCqgbde" based on the discussion above.

-- 
Eric Botcazou


Re: RFC: Two minor optimization patterns

2015-01-13 Thread Andrew Pinski
On Tue, Jan 13, 2015 at 2:41 PM, Rasmus Villemoes  
wrote:
> [My first attempt at submitting a patch for gcc, so please forgive me
> if I'm not following the right protocol.]

There are a few things missing.  For one, a testcase or two for the
added optimizations.

>
> Sometimes rounding a variable to the next even integer is written x += x
> & 1. This usually means using an extra register (and hence at least an
> extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The
> first pattern below tries to do this transformation.
>
> While playing with various ways of rounding down, I noticed that gcc
> already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply
> x&~3. In fact, x&~(x&y) is rewritten as x&~y. However, the dual of this
> is not handled, so I included the second pattern below.
>
> I've tested the below in the sense that gcc compiles and that trivial
> test cases get compiled as expected.

The other thing you missed is a changelog entry for the change you did.
Also you mentioned you tested the patch below but did not mention
which target you tested it on and you should run the full GCC
testsuite.
https://gcc.gnu.org/contribute.html is a good page to start with how
to handle most of the items above.
https://gcc.gnu.org/wiki/HowToPrepareATestcase is a good page on how
to write the testcase for testing the added optimization.

Thanks,
Andrew Pinski

>
> Rasmus
>
>
>
> diff --git gcc/match.pd gcc/match.pd
> index 81c4ee6..04a0bc4 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>   (abs tree_expr_nonnegative_p@0)
>   @0)
>
> +/* x + (x & 1) -> (x + 1) & ~1 */
> +(simplify
> + (plus @0 (bit_and @0 integer_onep@1))
> + (bit_and (plus @0 @1) (bit_not @1)))
> +
> +/* x | ~(x | y) -> x | ~y */
> +(simplify
> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
> + (bit_ior @0 (bit_not @1)))
> +
>
>  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
> when profitable.


Re: [PATCH/expand] PR64011 Adjust bitsize when partial overflow happen for big-endian

2015-01-13 Thread Joseph Myers
On Tue, 13 Jan 2015, Jeff Law wrote:

> In many ways having the compiler or assembler spitting out an error here is
> preferable to silently compiling the code.  That would also help explain why

As usual, an error is incorrect in such a case that only has undefined 
behavior at runtime (but it may be compiled into an abort if the behavior 
is unconditionally undefined, and the abort doesn't replace anything 
before the undefined behavior that might have stopped the undefined 
behavior from occurring).

-- 
Joseph S. Myers
jos...@codesourcery.com


RFC: Two minor optimization patterns

2015-01-13 Thread Rasmus Villemoes
[My first attempt at submitting a patch for gcc, so please forgive me
if I'm not following the right protocol.]

Sometimes rounding a variable to the next even integer is written x += x
& 1. This usually means using an extra register (and hence at least an
extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The
first pattern below tries to do this transformation.

While playing with various ways of rounding down, I noticed that gcc
already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply
x&~3. In fact, x&~(x&y) is rewritten as x&~y. However, the dual of this
is not handled, so I included the second pattern below.

I've tested the below in the sense that gcc compiles and that trivial
test cases get compiled as expected.

Rasmus



diff --git gcc/match.pd gcc/match.pd
index 81c4ee6..04a0bc4 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
  (abs tree_expr_nonnegative_p@0)
  @0)
 
+/* x + (x & 1) -> (x + 1) & ~1 */
+(simplify
+ (plus @0 (bit_and @0 integer_onep@1))
+ (bit_and (plus @0 @1) (bit_not @1)))
+
+/* x | ~(x | y) -> x | ~y */
+(simplify
+ (bit_ior @0 (bit_not (bit_ior @0 @1)))
+ (bit_ior @0 (bit_not @1)))
+
 
 /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
when profitable.


Re: [PATCH] PR59448 - Promote consume to acquire

2015-01-13 Thread Joseph Myers
On Tue, 13 Jan 2015, Andrew MacLeod wrote:

> It seems that it should be safe to move back to the original patch, and remove
> that error test for using consume on an exchange...

I don't think there should be any such errors, for any of the atomic 
built-in functions, only warnings (with the model converted to 
MEMMODEL_SEQ_CST if not valid, just like a non-constant model).  This is 
just like any other invalid function argument of a suitable type: 
undefined behavior only if the call is executed.

http://www.open-std.org/jtc1/sc22/wg14/12553

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)

2015-01-13 Thread Segher Boessenkool
On Tue, Jan 13, 2015 at 12:45:27PM -0700, Jeff Law wrote:
> On 01/13/15 09:38, Segher Boessenkool wrote:
> >On Tue, Jan 13, 2015 at 05:18:19PM +0100, Jakub Jelinek wrote:
> >>3) on request from Richard (which Segher on IRC argues against), "memory"
> >>clobber also prevents CSE;
> >
> >As extend.texi used to say:
> >
> >"
> >If your assembler instructions access memory in an unpredictable
> >fashion, add @samp{memory} to the list of clobbered registers.  This
> >causes GCC to not keep memory values cached in registers across the
> >assembler instruction and not optimize stores or loads to that memory.
> >You also should add the @code{volatile} keyword if the memory
> >affected is not listed in the inputs or outputs of the @code{asm}, as
> >the @samp{memory} clobber does not count as a side-effect of the
> >@code{asm}.
> >"
> >
> >so a "memory" clobber in a non-volatile asm should not prevent CSE.
> My reading of that paragraph is somewhat different.

It seems so.

I read that as "GCC can delete a memory clobber if it wants to" (just
like it can delete any other clobber when it wants to).

The only difference between ASM_OPERANDS and any other RTL is that
recog is useless for ASM_OPERANDS, it cannot tell you if after you
modify the construct you are left with something valid; so the only
thing the compiler can change about an asm is to delete it whole.
So unlike most RTL, where the compiler is free to remove a clobber
if what is left is valid RTL, the only way to delete a clobber from
an asm is to delete the whole asm.

> The key here is the memory clobber affects optimization of instructions 
> around the asm while the volatile specifier affects the optimization of 
> the ASM itself.

Those are roughly the effects, yes.  Writing unspecified stuff to
unspecified memory is a pretty heavy hammer ;-)

> A memory clobber must inhibit CSE of memory references on either side of 
> the asm because the asm must be assumed to read or write memory in 
> unpredictable ways.

I don't see how that follows.  The asm itself can be CSEd; its clobber
then disappears in a puff of smoke.

> The volatile specifier tells the compiler that the asm itself must be 
> preserved, even if dataflow shows the outputs as not used.

Yes.


Segher


Re: [PATCH 0/6, nds32] Committed: Have -mcmodel option to support varied code model of programs.

2015-01-13 Thread Joseph Myers
On Tue, 13 Jan 2015, Chung-Ju Wu wrote:

> To fix this issue, we are going to use -mcmodel=X options, which probably
> gives more flexibility to support varied code model on code generation.
> The -mgp-direct option now becomes meaningless and can be discarded.

If you add or remove command-line options, you need to update invoke.texi 
accordingly.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH, committed] jit: New API entrypoint: gcc_jit_context_dump_reproducer_to_file

2015-01-13 Thread David Malcolm
When debugging libgccjit or client code, the recipe for reproducing a bug
can be very awkward.  For example, consider client code, linked against
many libraries, which parses some source file into some internal
representation, and then walks this IR, calling into libgccjit.  If this
encounters a bug (e.g. an ICE deep inside gcc) then reproducing it would
require the client code and its dependencies (e.g. data).

This patch adds an API entrypoint:
  gcc_jit_context_dump_reproducer_to_file
which will write out C code for a simple executable linked only to
libgccjit that performs the equivalent calls into libgccjit, without
needing the client code and its data.

This may be useful when debugging the library or client code, for
reducing a complicated recipe for reproducing a bug into a simpler form.

To give us test coverage, the patch also updates the testsuite, so that
most testcase now call this entrypoint, and jit.exp then verifies that
.c reproducers were indeed written, and verifies that they compile.

It doesn't yet verify that they run.  Doing so would be almost a quine;
in theory we could verify that each reproducer can output its own source
code, diffing the results. Implementing that is left for another day
(or as an exercise to the reader...).

This patch takes jit.sum from 7362 to 7494 passes.

Committed to trunk as r219564.

gcc/jit/ChangeLog:
* docs/cp/topics/contexts.rst (Debugging): Add
gccjit::context::dump_reproducer_to_file.
* docs/internals/index.rst (Design notes): New section,
discussing input validation and
gcc_jit_context_dump_reproducer_to_file.
* docs/topics/contexts.rst (Debugging): Add
gcc_jit_context_dump_reproducer_to_file.
* docs/_build/texinfo/libgccjit.texi: Regenerate.
* jit-common.h (gcc::jit::dump::get_context): New accessor.
* jit-recording.c: Include "hash-map.h".
Within namespace ::gcc::jit...
(dump::write): Flush each line.
(dump::make_location): Pass false for new param "created_by_user".
(class allocator): New class.
(allocator::~allocator): New function.
(allocator::xstrdup_printf): New function.
(allocator::xstrdup_printf_va): New function.
(class reproducer): New subclass of dump.
(reproducer::reproducer): New function.
(reproducer::write_params): New function.
(reproducer::write_args): New function.
(reproducer::make_identifier): New function.
(reproducer::make_tmp_identifier): New function.
(reproducer::get_identifier): New pair of functions.
(reproducer::get_identifier_as_rvalue): New function.
(reproducer::get_identifier_as_lvalue): New function.
(reproducer::get_identifier_as_type): New function.
(reproducer::xstrdup_printf): New function.
(recording::context::context): Initialize m_toplevel_ctxt.
(recording::context::new_location): Add param created_by_user.
(str_option_reproducer_strings): New table of strings.
(int_option_reproducer_strings): Likewise.
(bool_option_reproducer_strings): Likewise.
(get_type_enum_strings): Likewise.
(names_of_function_kinds): Likewise.
(global_kind_reproducer_strings): Likewise.
(unary_op_reproducer_strings): Likewise.
(binary_op_reproducer_strings): Likewise.
(comparison_reproducer_strings): Likewise.
Within namespace ::gcc::jit::recording::...
(context::dump_reproducer_to_file): New function.
(string::write_reproducer): Likewise.
(location::write_reproducer): Likewise.
(type::access_as_type): Likewise.
(memento_of_get_type::write_reproducer): Likewise.
(memento_of_get_pointer::write_reproducer): Likewise.
(memento_of_get_const::write_reproducer): Likewise.
(memento_of_get_volatile::write_reproducer): Likewise.
(array_type::write_reproducer): Likewise.
(function_type::write_reproducer): Likewise.
(function_type::write_deferred_reproducer): Likewise.
(field::write_reproducer): Likewise.
(struct_::access_as_type): Likewise.
(struct_::write_reproducer): Likewise.
(union_::write_reproducer): Likewise.
(fields::write_reproducer): Likewise.
(rvalue::access_as_rvalue): Likewise.
(lvalue::access_as_rvalue): Likewise.
(lvalue::access_as_lvalue): Likewise.
(param::access_as_rvalue): Likewise.
(param::access_as_lvalue): Likewise.
(param::write_reproducer): Likewise.
(function::write_reproducer): Likewise.
(block::write_reproducer): Likewise.
(global::write_reproducer): Likewise.
(memento_of_new_rvalue_from_const ::write_reproducer):
Likewise.
(memento_of_new_rvalue_from_const ::write_reproducer):
Likewise.
(memento_of_new_rvalue_from_const ::write_reproducer):
Likewise.
(

Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)

2015-01-13 Thread Jeff Law

On 01/13/15 13:13, Jakub Jelinek wrote:

On Tue, Jan 13, 2015 at 12:45:27PM -0700, Jeff Law wrote:

On 01/13/15 09:38, Segher Boessenkool wrote:

On Tue, Jan 13, 2015 at 05:18:19PM +0100, Jakub Jelinek wrote:

3) on request from Richard (which Segher on IRC argues against), "memory"
clobber also prevents CSE;


As extend.texi used to say:

"
If your assembler instructions access memory in an unpredictable
fashion, add @samp{memory} to the list of clobbered registers.  This
causes GCC to not keep memory values cached in registers across the
assembler instruction and not optimize stores or loads to that memory.
You also should add the @code{volatile} keyword if the memory
affected is not listed in the inputs or outputs of the @code{asm}, as
the @samp{memory} clobber does not count as a side-effect of the
@code{asm}.
"

so a "memory" clobber in a non-volatile asm should not prevent CSE.

My reading of that paragraph is somewhat different.

The key here is the memory clobber affects optimization of instructions
around the asm while the volatile specifier affects the optimization of the
ASM itself.

A memory clobber must inhibit CSE of memory references on either side of the
asm because the asm must be assumed to read or write memory in unpredictable
ways.

The volatile specifier tells the compiler that the asm itself must be
preserved, even if dataflow shows the outputs as not used.


That is not necessarily in conflict.
Possibly not :-)  This stuff isn't trivial and as well meaning as folks 
trying to update the docs have been, it's possible subtle issues like 
these have been missed, even in the review process.


I know that for me it's easier to reason about code changes like this 
than their associated documentation :-)




My reading of Jeff's comment is that in
int a;
int
foo (void)
{
   int b, c, d, e;
   b = a;
   asm ("..." : "=r" (c) : : "memory");
   d = a;
   asm ("..." : "=r" (e) : : "memory");
   return b + d + 2 * (c + e);
}
we are not allowed to CSE d = a; into d = b;
Precisely.  At least that's how I read things and it makes sense to the 
part of my brain that used to split time between kernel & GCC 
development in a previous life.


In effect the "memory" clobber is an aggregation of the read, write, 
clobbers dataflow for memory (and imprecise as it hits all memory).



.  CSE invalidate_from_clobbers

should ensure that already, even when we don't do anything special about
"memory" clobber in the patch.

OK.


  Another thing is if there is a store

in between the two non-volatile asms with "memory" clobber, here I'm not
sure if with the alternate patch we'd treat the "memory" clobber as use of
everything previously stored into memory (in this regard the posted version
is safe).
I woudln't be terribly surprised if DSE isn't safe in this regard.  I 
don't recall CSE doing any kind of dead store elimination so it wouldn't 
likely care that the memory clobber implies a read as well.





And finally there is the case of non-volatile asm with "memory" clobber with
no memory stores in between the two - the posted (safer) patch will not
allow to CSE the two, while in theory we could CSE them into just one asm.
I think we have to assume that CSEing them is wrong.  The first may set 
something in memory that is read by the second.


Thoughts?


Jeff





Re: [PATCH/expand] PR64011 Adjust bitsize when partial overflow happen for big-endian

2015-01-13 Thread Jeff Law

On 01/09/15 06:39, Jiong Wang wrote:



the bug testcase is
===

typedef short U __attribute__((may_alias, aligned (1)));
struct S
{
   _Complex float d __attribute__((aligned (8)));
};
void bar(struct S);
void f5 (int x)
{
   struct S s;
   ((U *)((char *) &s.d + 1))[3] = x;
   bar (s);
}


So I'm going to focus on that assignment statement.  Doesn't that write 
outside the bounds of "s"?If I'm reading everything correctly we 
have an object that is 8 bytes (a complex float).  The assignment is a 2 
byte write starting at byte 7 in the object.  ISTM that writes one byte 
beyond the underlying object, at which point we're in undefined 
behaviour territory.


In many ways having the compiler or assembler spitting out an error here 
is preferable to silently compiling the code.  That would also help 
explain why we haven't seen this on other big endian targets with rich 
bitfield support (PA and H8 come to mind) -- it only arises in cases of 
undefined behaviour AFAICT.


What I do not like is the ICE or unrecognizable insn error.  The odds 
that someone running into those messages is going to realize it's 
because the input source is bogus is pretty small.


I'm really tempted here to use the conditional you want to add to 
store_bit_field_using_insv and when it triggers issue an error/warning 
instead of or in addition to truncating the size of the assignment.


Thoughts?

jeff





[PATCH] Correct target selector in -mfentry tests

2015-01-13 Thread H.J. Lu
-fprofile -mfentry works with PIE if gcrt1.o is compiled with -fPIC.  A
glibc has been filed, PR 17836, and a glibc patch has been submitted.
OK for trunk?

Thanks.


H.J.
--
* gcc.target/i386/fentry-override.c: Properly place {} in target
selector.  Remove nonpic.
* gcc.target/i386/fentry.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/fentry-override.c | 2 +-
 gcc/testsuite/gcc.target/i386/fentry.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/fentry-override.c 
b/gcc/testsuite/gcc.target/i386/fentry-override.c
index 0464454..4dd87a8 100644
--- a/gcc/testsuite/gcc.target/i386/fentry-override.c
+++ b/gcc/testsuite/gcc.target/i386/fentry-override.c
@@ -1,5 +1,5 @@
 /* Test -mfentry override */
-/* { dg-do compile { target { *-*-linux* } && { nonpic || ! { ia32 } } } } */
+/* { dg-do compile { target { { *-*-linux* } && { ! { ia32 } } } } } */
 /* { dg-options "-mfentry" } */
 /* { dg-final { scan-assembler-not "__fentry__" } } */
 /* Origin: Andi Kleen */
diff --git a/gcc/testsuite/gcc.target/i386/fentry.c 
b/gcc/testsuite/gcc.target/i386/fentry.c
index d0d70c6..3548dd7 100644
--- a/gcc/testsuite/gcc.target/i386/fentry.c
+++ b/gcc/testsuite/gcc.target/i386/fentry.c
@@ -1,5 +1,5 @@
 /* Test -mfentry */
-/* { dg-do compile { target { *-*-linux* } && { nonpic || ! { ia32 } } } } */
+/* { dg-do compile { target { { *-*-linux* } && { ! { ia32 } } } } } */
 /* { dg-options "-fprofile -mfentry" } */
 /* { dg-final { scan-assembler "__fentry__" } } */
 /* Origin: Andi Kleen */
-- 
1.9.3



[committted] PATCH: Add dg-require-profiling to gcc.dg/aru-2.c

2015-01-13 Thread H.J. Lu
Hi,

I checked in this patch to add dg-require-profiling to gcc.dg/aru-2.c
as an obvious fix.


H.J.
---
Index: ChangeLog
===
--- ChangeLog   (revision 219560)
+++ ChangeLog   (working copy)
@@ -1,5 +1,9 @@
 2015-01-13  H.J. Lu  
 
+   * gcc.dg/aru-2.c: Add dg-require-profiling.
+
+2015-01-13  H.J. Lu  
+
* lib/target-supports.exp (check_profiling_available): Check if
-pg links.
 
Index: gcc.dg/aru-2.c
===
--- gcc.dg/aru-2.c  (revision 219560)
+++ gcc.dg/aru-2.c  (working copy)
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-profiling "-pg" } */
 /* { dg-options "-O2 -pg" } */
 
 static int __attribute__((noinline))


[Patch, Fortran, F03] PR 58023: ICE on invalid with bad PPC declaration

2015-01-13 Thread Janus Weil
Hi all,

the attached patch fixes an ICE-on-invalid problem with
procedure-pointer components by making sure that we continue resolving
all components of a derived type, even after an error is thrown.

Regtested on x86_64-unknown-linux-gnu. Ok for trunk?

Cheers,
Janus



2015-01-13  Janus Weil  

PR fortran/58023
* resolve.c (resolve_fl_derived0): Continue resolving next component
after error.

2015-01-13  Janus Weil  

PR fortran/58023
* gfortran.dg/proc_ptr_comp_43.f90: New.
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c   (Revision 219552)
+++ gcc/fortran/resolve.c   (Arbeitskopie)
@@ -12389,7 +12389,7 @@ resolve_fl_derived0 (gfc_symbol *sym)
{
  gfc_error ("Coarray component %qs at %L must be allocatable with "
 "deferred shape", c->name, &c->loc);
- return false;
+ continue;
}
 
   /* F2008, C443.  */
@@ -12398,7 +12398,7 @@ resolve_fl_derived0 (gfc_symbol *sym)
{
  gfc_error ("Component %qs at %L of TYPE(C_PTR) or TYPE(C_FUNPTR) "
 "shall not be a coarray", c->name, &c->loc);
- return false;
+ continue;
}
 
   /* F2008, C444.  */
@@ -12409,7 +12409,7 @@ resolve_fl_derived0 (gfc_symbol *sym)
  gfc_error ("Component %qs at %L with coarray component "
 "shall be a nonpointer, nonallocatable scalar",
 c->name, &c->loc);
- return false;
+ continue;
}
 
   /* F2008, C448.  */
@@ -12417,7 +12417,7 @@ resolve_fl_derived0 (gfc_symbol *sym)
{
  gfc_error ("Component %qs at %L has the CONTIGUOUS attribute but "
 "is not an array pointer", c->name, &c->loc);
- return false;
+ continue;
}
 
   if (c->attr.proc_pointer && c->ts.interface)
@@ -12427,7 +12427,7 @@ resolve_fl_derived0 (gfc_symbol *sym)
  if (!sym->attr.vtype && !check_proc_interface (ifc, &c->loc))
{
  c->tb->error = 1;
- return false;
+ continue;
}
 
  if (ifc->attr.if_source || ifc->attr.intrinsic)
@@ -12471,7 +12471,10 @@ resolve_fl_derived0 (gfc_symbol *sym)
  gfc_charlen *cl = gfc_new_charlen (sym->ns, ifc->ts.u.cl);
  if (cl->length && !cl->resolved
  && !gfc_resolve_expr (cl->length))
-   return false;
+   {
+ c->tb->error = 1;
+ continue;
+   }
  c->ts.u.cl = cl;
}
}
@@ -12514,7 +12517,7 @@ resolve_fl_derived0 (gfc_symbol *sym)
 "at %L has no argument %qs", c->name,
 c->tb->pass_arg, &c->loc, c->tb->pass_arg);
  c->tb->error = 1;
- return false;
+ continue;
}
}
  else
@@ -12528,7 +12531,7 @@ resolve_fl_derived0 (gfc_symbol *sym)
 "must have at least one argument",
 c->name, &c->loc);
  c->tb->error = 1;
- return false;
+ continue;
}
  me_arg = c->ts.interface->formal->sym;
}
@@ -12544,7 +12547,7 @@ resolve_fl_derived0 (gfc_symbol *sym)
 " the derived type %qs", me_arg->name, c->name,
 me_arg->name, &c->loc, sym->name);
  c->tb->error = 1;
- return false;
+ continue;
}
 
  /* Check for C453.  */
@@ -12554,7 +12557,7 @@ resolve_fl_derived0 (gfc_symbol *sym)
 "must be scalar", me_arg->name, c->name, me_arg->name,
 &c->loc);
  c->tb->error = 1;
- return false;
+ continue;
}
 
  if (me_arg->attr.pointer)
@@ -12563,7 +12566,7 @@ resolve_fl_derived0 (gfc_symbol *sym)
 "may not have the POINTER attribute", me_arg->name,
 c->name, me_arg->name, &c->loc);
  c->tb->error = 1;
- return false;
+ continue;
}
 
  if (me_arg->attr.allocatable)
@@ -12572,12 +12575,15 @@ resolve_fl_derived0 (gfc_symbol *sym)
 "may not be ALLOCATABLE", me_arg->name, c->name,
 me_arg->name, &c->loc);
  c->tb->error = 1;
- return false;
+ continue;
}
 
  if (gfc_type_is_extensible (sym) && me_arg->ts.type != BT_CLASS)
-   gfc_error ("Non-polymorphic passed-object dummy argument of %qs"
-  " at %L", c->name, &c->loc);
+   {
+ gfc_error ("Non-polymorphic passed-object dummy argument of %qs"
+" at %L", c-

Re: PR54442 build_qualified_type produces a non-canonical type

2015-01-13 Thread Jason Merrill

On 01/13/2015 01:46 PM, Paolo Carlini wrote:

In fact, I noticed today that this is a 4.8/4.9 Regression too. Shall I
try to apply the patchlet to 4_9-branch too and, if testing passes,
commit there and close the bug?


OK.

Jason




Re: [PATCH] IPA ICF: handle IMAGPART_EXPR and REALPART_EXPR

2015-01-13 Thread Jeff Law

On 01/13/15 13:47, Martin Liška wrote:

Hello.

As pointed out in the following PR64307, IPA ICF is missing support for
aforementioned TREE types. Apart from that, the patch also fixes
BIT_FIELD_REF, which was broken.

Tested on x86_64-linux-pc and no regression has been seen.

Ready for trunk?
Thanks,
Martin

0001-IPA-ICF-handle-IMAGPART_EXPR-and-REALPART_EXPR.patch


 From 036a27ec12ad556160cb92a86fd99ed02b66ed1a Mon Sep 17 00:00:00 2001
From: mliska
Date: Tue, 13 Jan 2015 18:52:22 +0100
Subject: [PATCH] IPA ICF: handle IMAGPART_EXPR and REALPART_EXPR.

gcc/testsuite/ChangeLog:

2015-01-13  Martin Liska

* gcc.dg/ipa/pr64307.c: New test.

gcc/ChangeLog:

2015-01-13  Martin Liska

* ipa-icf-gimple.c (func_checker::compare_operand): Add support for
IMAGPART_EXPR and REALPART_EXPR and fix BIT_FIELD_REF comparison.

OK.

Jeff



Re: [testsuite] PATCH: Check if -pg available

2015-01-13 Thread Jeff Law

On 01/13/15 13:42, H.J. Lu wrote:



since check_profiling_available result is cached.  It is wrong to use the
cached result from -m32 for -m64.  Here is the updated patch.  Tested
with RUNTESTFLAGS="--target_board='unix{-m32,-m64}'.  OK for
trunk?

Yes, the updated patch is OK.
jeff


Re: [PATCH][ARM] FreeBSD ARM support, EABI, v3

2015-01-13 Thread Andreas Tobler

On 13.01.15 11:25, Ramana Radhakrishnan wrote:

On Thu, Jan 8, 2015 at 8:51 PM, Andreas Tobler  wrote:

On 08.01.15 17:27, Richard Earnshaw wrote:


On 29/12/14 18:44, Andreas Tobler wrote:


All,

here is the third attempt to support ARM with FreeBSD.

In the meantime we found another issue in the unwinder where I had to
adapt some stuff.

The unwind_phase2_forced function in libgcc calls a stop_fn function.
This stop_fn is in FreeBSD's libthr implementation and is called
thread_unwind_stop. This thread_unwind_stop is a generic function used
on all FreeBSD archs.

The issue is now that this thread_unwind_stop expects a double int for
the exception_class, like on every other arch. For ARM EABI this
exception_class is an array of char which is passed in one register as
pointer vs. two registers for a double int.

To solve this issue we defined the exception_class as double integer for
FreeBSD.


My apologies for the slow response, some other work and then holidays
intervened.


Np, the only issue which made me hurry was the stage 4 entering this week.


From my understanding of the ABI document the implementation is

currently as mandated by the ABI. Also this isn't a part of the ABI
that's available for the platform (here FreeBSD to manipulate and
change as per it's wishes). ARM EHABI is special for software, making
FreeBSD more "special" for ARM appears to be counter intuitive from my
point of view. A number of exception unwinding libraries. for e.g.
libobjc , libstdc++ all use this implementation of exception_class.
Therefore this creates a divergence for the FreeBSD port which is
different from everything else. I expect that a number of language run
time support libraries that supported the ARM EHABI would be using
such an implementation, therefore you need to fix every single
implementation of this in every unwinder that supports the ARM EHABI
which I expect to have been ported to in a number of libraries
already. (I already see this in libobjc and libstdc++ in the GCC tree)


Grr ;) I didn't want to hear this answer, but I expected it somehow.
My proposal was the least effort for me.
The other way round is going to be very hard. Maybe impossible.

It is not only FreeBSD which is affected but also llvm and friends. They 
use for exception_class uint64_t.


I have to take a picture how the effort is and if it would be possible 
to do such a change in FreeBSD and more important in llvm etc.



I would rather fix the thread_unwind_stop implementation in libthr for
ARM EHABI rather than make this change.


It wouldn't be a 'fix' but more a wrapper I think.


This adaptation reduced the failure count in libstdc++ by about 40 fails.

I build and test this port on a regular basis and I post the results to
the usual place.


Thanks for doing this. I'm really glad that FreeBSD is finally moving to EABI.


Thanks for the review and the feedback.

Gruss,
Andreas



C++ PATCH for c++/64356 and libstdc++/58777 (constexpr pointer arithmetic)

2015-01-13 Thread Jason Merrill
In this testcase we are iterating through an array which is a local 
variable in a constexpr function.  So its address is not constant, but 
we can still do arithmetic on it to get to the address of an element and 
then pull out the value of the element; we shouldn't reject the 
evaluation as non-constant unless the non-constant address is part of 
the value of the complete expression.


This patch also fixes libstdc++/58777, another issue with giving up on 
an intermediate non-constant address even though the final value is 
constant.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 10760eb60bfd8f5de38ff1194718851a3d0f9073
Author: Jason Merrill 
Date:   Tue Jan 13 07:45:03 2015 -0500

	PR c++/64356
	PR libstdc++/58777
	* constexpr.c (cxx_eval_binary_expression): Don't VERIFY_CONSTANT
	pointer expressions.
	(cxx_eval_increment_expression): Likewise.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 650250b..1432506 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1616,10 +1616,15 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx, tree t,
   tree lhs, rhs;
   lhs = cxx_eval_constant_expression (ctx, orig_lhs, /*lval*/false,
   non_constant_p, overflow_p);
-  VERIFY_CONSTANT (lhs);
+  /* Don't VERIFY_CONSTANT if this might be dealing with a pointer to
+ a local array in a constexpr function.  */
+  bool ptr = POINTER_TYPE_P (TREE_TYPE (lhs));
+  if (!ptr)
+VERIFY_CONSTANT (lhs);
   rhs = cxx_eval_constant_expression (ctx, orig_rhs, /*lval*/false,
   non_constant_p, overflow_p);
-  VERIFY_CONSTANT (rhs);
+  if (!ptr)
+VERIFY_CONSTANT (lhs);
 
   location_t loc = EXPR_LOCATION (t);
   enum tree_code code = TREE_CODE (t);
@@ -1634,7 +1639,8 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx, tree t,
 }
   else if (cxx_eval_check_shift_p (loc, ctx, code, type, lhs, rhs))
 *non_constant_p = true;
-  VERIFY_CONSTANT (r);
+  if (!ptr)
+VERIFY_CONSTANT (lhs);
   return r;
 }
 
@@ -2704,7 +2710,11 @@ cxx_eval_increment_expression (const constexpr_ctx *ctx, tree t,
   tree val = rvalue (op);
   val = cxx_eval_constant_expression (ctx, val, false,
   non_constant_p, overflow_p);
-  VERIFY_CONSTANT (val);
+  /* Don't VERIFY_CONSTANT if this might be dealing with a pointer to
+ a local array in a constexpr function.  */
+  bool ptr = POINTER_TYPE_P (TREE_TYPE (val));
+  if (!ptr)
+VERIFY_CONSTANT (val);
 
   /* The modified value.  */
   bool inc = (code == PREINCREMENT_EXPR || code == POSTINCREMENT_EXPR);
@@ -2719,7 +2729,8 @@ cxx_eval_increment_expression (const constexpr_ctx *ctx, tree t,
 }
   else
 mod = fold_build2 (inc ? PLUS_EXPR : MINUS_EXPR, type, val, offset);
-  VERIFY_CONSTANT (mod);
+  if (!ptr)
+VERIFY_CONSTANT (mod);
 
   /* Storing the modified value.  */
   tree store = build2 (MODIFY_EXPR, type, op, mod);
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-local2.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-local2.C
new file mode 100644
index 000..fd6143b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-local2.C
@@ -0,0 +1,22 @@
+// PR c++/64356
+// { dg-do compile { target c++14 } }
+
+typedef unsigned long size_t;
+
+template
+constexpr size_t f(const char (&x)[N]) {
+  size_t s = 0;
+  for(size_t c : x)
+s += c;
+  return s;
+}
+
+template
+constexpr size_t g(const char (&x)[N]) {
+  char y[N] = {0};
+  for(size_t i = 0; i < N; ++i)
+y[i] = x[i];
+  return f(y);
+}
+
+constexpr auto x = g(__DATE__);
diff --git a/gcc/testsuite/g++.dg/cpp1y/pr63996.C b/gcc/testsuite/g++.dg/cpp1y/pr63996.C
index d0bf9b5..8f66cdc 100644
--- a/gcc/testsuite/g++.dg/cpp1y/pr63996.C
+++ b/gcc/testsuite/g++.dg/cpp1y/pr63996.C
@@ -6,5 +6,5 @@ foo (int i)
   int a[i] = { }; // { dg-error "forbids variable length" }
 }
 
-constexpr int j = foo (1); // { dg-error "is not a constant expression" }
+constexpr int j = foo (1); // { dg-error "flows off the end" }
 
diff --git a/libstdc++-v3/testsuite/experimental/optional/constexpr/make_optional.cc b/libstdc++-v3/testsuite/experimental/optional/constexpr/make_optional.cc
index f3b4388..d57cf5c 100644
--- a/libstdc++-v3/testsuite/experimental/optional/constexpr/make_optional.cc
+++ b/libstdc++-v3/testsuite/experimental/optional/constexpr/make_optional.cc
@@ -1,7 +1,4 @@
 // { dg-options "-std=gnu++14" }
-// XFAIL pending resolution of PR libstdc++/58777
-// { dg-do compile { xfail *-*-* } }
-// { dg-excess-errors "" }
 
 // Copyright (C) 2013-2015 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/experimental/optional/constexpr/observers/2.cc b/libstdc++-v3/testsuite/experimental/optional/constexpr/observers/2.cc
index 20c307d..803927b 100644
--- a/libstdc++-v3/testsuite/experimental/optional/constexpr/observers/2.cc
+++ b/libstdc++-v3/testsuite/experimental/optional/constexpr/observers/2.cc
@@ -1,7 +1,4 @@
 // { dg-options "-std=gnu++14" }
-// XFAIL pending resolution of PR libstdc++/58777
-// { dg-do compile { xfail *-*-* 

Re: [arm][patch] fix arm_neon_ok check on !arm_arch7

2015-01-13 Thread Andrew Stubbs

On 12/01/15 13:50, Ramana Radhakrishnan wrote:

In principle ok, but I'd like a comment in there explaining why we've
done this. Can you also post under what configurations these have been
tested ?


Is this better?

I tested it by running the vect.exp tests with a variety of -mcpu flags.

Andrew

2015-01-13  Andrew Stubbs  

	gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_arm_neon_ok_nocache): Don't try to test Neon
	on ARM architures before v7.

Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp	(revision 219058)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -2565,6 +2565,11 @@
 	if { [check_no_compiler_messages_nocache arm_neon_ok object {
 		#include "arm_neon.h"
 		int dummy;
+		/* Avoid the case where a test adds -mfpu=neon, but the toolchain is
+		   configured for -mcpu=arm926ej-s, for example.  */
+		#if __ARM_ARCH < 7
+		#error Architecture too old for NEON.
+		#endif
 	} "$flags"] } {
 		set et_arm_neon_flags $flags
 		return 1


Fix ICE in inline_small_functions

2015-01-13 Thread Jan Hubicka
Hi,
this patch fixes ICE in inline_small_functions checking consistency of the
growth cache.  When removing speculation the cache needs to be reset because
the frequencies of edges change.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

PR ipa/64565
* g++.dg/torture/pr64565.C: New testcase.
* ipa-inline.c (inline_small_functions): Update callee keys after
resolving speculation
(inline_small_functions): Always check monotonicity of the queue.
Index: testsuite/g++.dg/torture/pr64565.C
===
--- testsuite/g++.dg/torture/pr64565.C  (revision 0)
+++ testsuite/g++.dg/torture/pr64565.C  (revision 0)
@@ -0,0 +1,89 @@
+/* { dg-do compile } */
+typedef enum
+{
+  NS_OK
+} nsresult;
+struct A
+{
+  static int kIID;
+};
+class B
+{
+};
+class C
+{
+public:
+  C (B p1) { m_fn1 (p1, A::kIID); }
+  void m_fn1 (B, int);
+};
+class D;
+class F
+{
+public:
+  F (int);
+};
+class G
+{
+public:
+  D *operator[](int);
+};
+class H
+{
+  virtual nsresult m_fn2 ();
+
+public:
+  void m_fn3 ();
+};
+class J : H
+{
+  G mQueries;
+  int mLiveUpdate;
+  nsresult m_fn2 ();
+};
+class D
+{
+public:
+  nsresult m_fn4 (int);
+  void m_fn5 (int);
+};
+class I
+{
+public:
+  static I *
+  m_fn6 ()
+  {
+B __trans_tmp_3;
+if (!gHistoryService)
+  C serv = __trans_tmp_3;
+  }
+  void m_fn7 ();
+  static I *gHistoryService;
+};
+D *Refresh___trans_tmp_2;
+D Refresh___trans_tmp_6, Refresh___trans_tmp_5;
+int Refresh_hasDomain;
+nsresult
+J::m_fn2 ()
+{
+  m_fn3 ();
+  I history = *I::m_fn6 ();
+  switch (mLiveUpdate)
+{
+case 1:
+  {
+mQueries[0];
+F query = 0;
+if (Refresh_hasDomain)
+  return NS_OK;
+  }
+case 0:
+  {
+Refresh___trans_tmp_2 = mQueries[0];
+F query = Refresh___trans_tmp_5.m_fn4 (0);
+history.m_fn7 ();
+Refresh___trans_tmp_6.m_fn5 (0);
+  }
+case 3:
+  m_fn2 ();
+}
+}
Index: ipa-inline.c
===
--- ipa-inline.c(revision 219452)
+++ ipa-inline.c(working copy)
@@ -1626,6 +1626,8 @@ inline_small_functions (void)
  reset_edge_caches (where);
   update_caller_keys (&edge_heap, where,
  updated_nodes, NULL);
+  update_callee_keys (&edge_heap, where,
+ updated_nodes);
   bitmap_clear (updated_nodes);
}
 }
@@ -1661,7 +1663,7 @@ inline_small_functions (void)
   /* Disable checking for profile because roundoff errors may cause slight
  deviations in the order.  */
   gcc_assert (max_count || cached_badness == current_badness);
-  gcc_assert (max_count || current_badness >= badness);
+  gcc_assert (current_badness >= badness);
 #else
   current_badness = edge_badness (edge, false);
 #endif


[PATCH] IPA ICF: handle IMAGPART_EXPR and REALPART_EXPR

2015-01-13 Thread Martin Liška

Hello.

As pointed out in the following PR64307, IPA ICF is missing support for 
aforementioned TREE types. Apart from that, the patch also fixes 
BIT_FIELD_REF, which was broken.


Tested on x86_64-linux-pc and no regression has been seen.

Ready for trunk?
Thanks,
Martin
>From 036a27ec12ad556160cb92a86fd99ed02b66ed1a Mon Sep 17 00:00:00 2001
From: mliska 
Date: Tue, 13 Jan 2015 18:52:22 +0100
Subject: [PATCH] IPA ICF: handle IMAGPART_EXPR and REALPART_EXPR.

gcc/testsuite/ChangeLog:

2015-01-13  Martin Liska  

	* gcc.dg/ipa/pr64307.c: New test.

gcc/ChangeLog:

2015-01-13  Martin Liska  

	* ipa-icf-gimple.c (func_checker::compare_operand): Add support for
	IMAGPART_EXPR and REALPART_EXPR and fix BIT_FIELD_REF comparison.
---
 gcc/ipa-icf-gimple.c   | 14 +-
 gcc/testsuite/gcc.dg/ipa/pr64307.c | 32 
 2 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr64307.c

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index ed3cdf5..05c2a23 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -448,6 +448,8 @@ func_checker::compare_operand (tree t1, tree t2)
 
 	return return_with_debug (ret);
   }
+case IMAGPART_EXPR:
+case REALPART_EXPR:
 case ADDR_EXPR:
   {
 	x1 = TREE_OPERAND (t1, 0);
@@ -458,7 +460,17 @@ func_checker::compare_operand (tree t1, tree t2)
   }
 case BIT_FIELD_REF:
   {
-	ret = compare_decl (t1, t2);
+	x1 = TREE_OPERAND (t1, 0);
+	x2 = TREE_OPERAND (t2, 0);
+	y1 = TREE_OPERAND (t1, 1);
+	y2 = TREE_OPERAND (t2, 1);
+	z1 = TREE_OPERAND (t1, 2);
+	z2 = TREE_OPERAND (t2, 2);
+
+	ret = compare_operand (x1, x2)
+	  && compare_cst_or_decl (y1, y2)
+	  && compare_cst_or_decl (z1, z2);
+
 	return return_with_debug (ret);
   }
 case SSA_NAME:
diff --git a/gcc/testsuite/gcc.dg/ipa/pr64307.c b/gcc/testsuite/gcc.dg/ipa/pr64307.c
new file mode 100644
index 000..e8c1b02
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr64307.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -fipa-icf -fdump-ipa-icf"  } */
+
+#include 
+
+typedef _Complex float COMPLEX_FLOAT;
+
+__attribute__ ((noinline))
+static float real_part(COMPLEX_FLOAT a)
+{
+  return *(float*)(&a);
+}
+
+__attribute__ ((noinline))
+static float real_part_2(COMPLEX_FLOAT a)
+{
+  return ((float*)(&a))[0];
+}
+
+int main()
+{
+  COMPLEX_FLOAT f = 1.0f + _Complex_I;
+
+  float r1 = real_part(f);
+  float r2 = real_part(f);
+
+  return r1 - r2;
+}
+
+/* { dg-final { scan-ipa-dump "Semantic equality hit:real_part_2->real_part" "icf"  } } */
+/* { dg-final { scan-ipa-dump "Equal symbols: 1" "icf"  } } */
+/* { dg-final { cleanup-ipa-dump "icf" } } */
-- 
2.1.2



C++ PATCH for c++/64514 (zero-length fixed parameter pack)

2015-01-13 Thread Jason Merrill
The code I added for dealing with fixed parameter packs wasn't dealing 
properly with zero-length packs: if there were any elements we would see 
the pack expansion argument and punt, but we need to do that in the 
zero-length case as well.


Tested x86_64-pc-linux-gnu, applying to trunk and 4.9.
commit 8fa00de341538399cd675d50a6b1cb86bda39159
Author: Jason Merrill 
Date:   Mon Jan 12 22:00:26 2015 -0500

	PR c++/64514
	* pt.c (coerce_template_parameter_pack): Return NULL for a
	zero-length fixed parameter pack with a pack expansion arg.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 3ac93db..55871e5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6825,6 +6825,9 @@ coerce_template_parameter_pack (tree parms,
   if (invalid_nontype_parm_type_p (t, complain))
 return error_mark_node;
 }
+	  /* We don't know how many args we have yet, just
+	 use the unconverted ones for now.  */
+	  return NULL_TREE;
 }
 
   packed_args = make_tree_vec (TREE_VEC_LENGTH (packed_parms));
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic165.C b/gcc/testsuite/g++.dg/cpp0x/variadic165.C
new file mode 100644
index 000..862931f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic165.C
@@ -0,0 +1,17 @@
+// PR c++/64514
+// { dg-do compile { target c++11 } }
+
+template
+struct Functor
+{
+template 
+struct Inner
+{};
+};
+
+template struct Functor<>::Inner<>;
+
+int main()
+{
+
+}


Re: [testsuite] PATCH: Check if -pg available

2015-01-13 Thread H.J. Lu
On Tue, Jan 13, 2015 at 11:27 AM, Jeff Law  wrote:
> On 01/13/15 05:54, H.J. Lu wrote:
>>
>> On Mon, Jan 12, 2015 at 03:04:20PM -0700, Jeff Law wrote:
>>>
>>> On 01/12/15 14:51, Magnus Granberg wrote:

 måndag 12 januari 2015 12.11.17 skrev  H.J. Lu:
>
> On Mon, Jan 12, 2015 at 12:03 PM, Jeff Law  wrote:
>>
>> On 01/12/15 12:59, H.J. Lu wrote:
>>>
>>> I don't know if -pg will work PIE on any targets.  For Linux/x86
>>> the choices of crt1.o are
>>>
>>> %{!shared: %{pg|p|profile:gcrt1.o%s;pie:Scrt1.o%s;:crt1.o%s}}
>>>
>>> -shared, -pg and -pie are mutually exclusive. Those crt1 files are
>>> only crt1 files provided by glibc.  You can't even try -pg -pie on
>>> Linux without changing glibc.
>>
>>
>> You're totally missing the point.  What I care about is *why*.
>>
 With -pg it use gcrt1.o object file and that file is not compile with
 -fPIC.
 When you build a shared lib on x86_64 all the objects files need to be
 buiit
 with -fPIC else you get a error like that one abow and it is the same
 problems
 when you build bin with -fPIE and linke with -pie.
 Glibc do not provide one that is compile with -fPIC
>>>
>>> Is there some reason why glibc could not provide gcrt1.o compiled with
>>> -fPIC?
>>>
>>>
>>
>> Here is a patch to check if -pg is available.  If -pg doesn't link,
>> profiling isn't available.  OK for trunk?
>
> OK with a suitable ChangeLog entry.
>

Unfortunately it doesn't work with

make check-gcc RUNTESTFLAGS="--target_board='unix{-m32,-m64}'

since check_profiling_available result is cached.  It is wrong to use the
cached result from -m32 for -m64.  Here is the updated patch.  Tested
with RUNTESTFLAGS="--target_board='unix{-m32,-m64}'.  OK for
trunk?

Thanks.


-- 
H.J.

* lib/target-supports.exp (check_profiling_available): Check if
-pg links.
---
 gcc/testsuite/lib/target-supports.exp | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp
b/gcc/testsuite/lib/target-supports.exp
index 0ac9646..61bff53 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -549,7 +549,16 @@ proc check_profiling_available { test_what } {
  }
 }

-return $profiling_available_saved
+# -pg link test result can't be cached since it may change between
+# runs.
+set profiling_working $profiling_available_saved
+if { $profiling_available_saved == 1
+ && ![check_no_compiler_messages_nocache profiling executable {
+  int main() { return 0; } } "-pg"] } {
+ set profiling_working 0
+}
+
+return $profiling_working
 }

 # Check to see if a target is "freestanding". This is as per the definition
-- 
1.9.3


C++ PATCH for c++/64520 (ICE with std::initializer_list)

2015-01-13 Thread Jason Merrill
The special rules for deduction of std::initializer_list don't support a 
pack expansion, but we shouldn't crash.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit a1607c38aaf6e04c2a601ee78dca67984e179986
Author: Jason Merrill 
Date:   Mon Jan 12 13:27:20 2015 -0500

	PR c++/64520
	* pt.c (unify): Don't try to deduce to std::initializer_list.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index d8652fb..3ac93db 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -17854,7 +17854,13 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
   if (TREE_CODE (parm) == ARRAY_TYPE)
 	elttype = TREE_TYPE (parm);
   else
-	elttype = TREE_VEC_ELT (CLASSTYPE_TI_ARGS (parm), 0);
+	{
+	  elttype = TREE_VEC_ELT (CLASSTYPE_TI_ARGS (parm), 0);
+	  /* Deduction is defined in terms of a single type, so just punt
+	 on the (bizarre) std::initializer_list.  */
+	  if (PACK_EXPANSION_P (elttype))
+	return unify_success (explain_p);
+	}
 
   FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (arg), i, elt)
 	{
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist91.C b/gcc/testsuite/g++.dg/cpp0x/initlist91.C
new file mode 100644
index 000..1387557
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist91.C
@@ -0,0 +1,8 @@
+// PR c++/64520
+// { dg-do compile { target c++11 } }
+
+#include 
+struct A {
+  template  A(std::initializer_list);
+};
+A a { 0 };			// { dg-error "" }


Re: flatten expr.h (version 2)

2015-01-13 Thread Prathamesh Kulkarni
On 13 January 2015 at 22:02, Prathamesh Kulkarni
 wrote:
> On 13 January 2015 at 16:06, Prathamesh Kulkarni
>  wrote:
>> On 13 January 2015 at 15:34, Richard Biener  wrote:
>>> On Sun, 11 Jan 2015, Prathamesh Kulkarni wrote:
>>>
 Hi,
 This is a revamped expr.h flattening flattening patch rebased on
 tree.h and tree-core.h flattening patch (r219402).
 It depends upon the following patch to get committed.
 https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00565.html

 Changes:
 * Removed all includes except tree-core.h. Put includes required by
 expr.h in a comment.
 * Moved stmt.c, expmed.c prototypes to stmt.h, expmed.h respectively.
 * Adjusted generator programs: genemit.c, gengtype.c, genopinit.c, 
 genoutput.c.
 * Did not put includes in gcc-plugin.h since expr.h cannot be included
 by plugins
 (putting them broke building a file in c-family/ since expr.h is not
 allowed in front-ends)
 * Affects java front-end (expr.h is allowed in java front-end).

 Bootstrapped and tested on x86_64-unknown-linux-gnu with languages:
 all,go,ada,jit
 Built on all targets in config-list.mk with languages: all, go.
 OK to commit ?
>>>
>>> diff --git a/gcc/expr.c b/gcc/expr.c
>>> index fc22862..824541e 100644
>>> --- a/gcc/expr.c
>>> +++ b/gcc/expr.c
>>> @@ -41,11 +41,17 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "regs.h"
>>>  #include "hard-reg-set.h"
>>>  #include "except.h"
>>> -#include "input.h"
>>>  #include "function.h"
>>>  #include "insn-config.h"
>>>  #include "insn-attr.h"
>>>  /* Include expr.h after insn-config.h so we get HAVE_conditional_move.
>>> */
>>> +#include "hashtab.h"
>>> +#include "emit-rtl.h"
>>> +#include "expmed.h"
>>> +#include "stmt.h"
>>> +#include "statistics.h"
>>> +#include "real.h"
>>> +#include "fixed-value.h"
>>>  #include "expr.h"
>>>
>>> Please move the comment to the proper place
>> ah, my flattening tool doesn't look at comments. I will move the
>> comment before expr.h include, thanks.
>>>
>>> diff --git a/gcc/expr.h b/gcc/expr.h
>>> index a7638b8..f1be8dc 100644
>>> --- a/gcc/expr.h
>>> +++ b/gcc/expr.h
>>> @@ -20,7 +20,8 @@ along with GCC; see the file COPYING3.  If not see
>>>  #ifndef GCC_EXPR_H
>>>  #define GCC_EXPR_H
>>>
>>> -/* For inhibit_defer_pop */
>>> +/* expr.h required includes */
>>> +#if 0
>>>  #include "hashtab.h"
>>>  #include "hash-set.h"
>>>  #include "vec.h"
>>> @@ -29,15 +30,17 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "hard-reg-set.h"
>>>  #include "input.h"
>>>  #include "function.h"
>>> -/* For XEXP, GEN_INT, rtx_code */
>>>  #include "rtl.h"
>>> -/* For optimize_size */
>>>  #include "flags.h"
>>> -/* For tree_fits_[su]hwi_p, tree_to_[su]hwi, fold_convert, size_binop,
>>> -   ssize_int, TREE_CODE, TYPE_SIZE, int_size_in_bytes,*/
>>>  #include "tree-core.h"
>>> -/* For GET_MODE_BITSIZE, word_mode */
>>>  #include "insn-config.h"
>>> +#include "alias.h"
>>> +#include "emit-rtl.h"
>>> +#include "expmed.h"
>>> +#include "stmt.h"
>>> +#endif
>>>
>>> Err, please remove the #if 0 section
>> I kept it because if something breaks later (hopefully not!), it will
>> be easier to fix.
>> I will remove it.
>>>
>>> +
>>> +#include "tree-core.h"
>>>
>>> Why?  The original comment says
>>>
>>> -/* For tree_fits_[su]hwi_p, tree_to_[su]hwi, fold_convert, size_binop,
>>> -   ssize_int, TREE_CODE, TYPE_SIZE, int_size_in_bytes,*/
>>>
>>> but all those are declared in tree.h.  Which means the files including
>>> expr.h must already include tree.h.
>>>
>>> If that's not the reason we need to include tree-core.h from expr.c
>>> please add a comment explaining why.
>> bt-load.c fails to compile because it includes expr.h but does not
>> include tree.h
>> I will place tree.h include in all files that include expr.h and rebuild.
> This is not going to work, since tree.h is now flattened. Shall also
> require including all headers required by
> tree.h in all files that include expr.h. Could we retain tree-core.h
> in expr.h for now ?
> Or should I insert tree.h (along with tree.h required includes) in all
> files that include expr.h ?
I am including tree.h along with required includes in all files that
include expr.h.
This removes all includes from expr.h.

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
>>>
>>> -/* Definitions from emit-rtl.c */
>>> -#include "emit-rtl.h"
>>> -
>>>  /* Return a memory reference like MEMREF, but with its mode widened to
>>> MODE and adjusted by OFFSET.  */
>>>  extern rtx widen_memory_access (rtx, machine_mode, HOST_WIDE_INT);
>>>
>>> err - functions defined in emit-rtl.c should be declared in emit-rtl.h.
>>> Please fix that first.  expr.h should _only_ contain prototypes
>>> for stuff defined in expr.c.
>> oops, missed it :(
>>>
>>> Andrew did a good job with this, first cleaning up a header moving
>>> declarations to proper places and only after that flattening it.
>>>
>>> The rest of the patch looks good t

RE: [PATCH] Allow MIPS call-saved-{4-6}.c tests to correctly run for micromips

2015-01-13 Thread Maciej W. Rozycki
On Tue, 13 Jan 2015, Matthew Fortune wrote:

> > >> I have tested this for both mips and micromips, and the tests now
> > >> pass successfully.
> > >> The ChangeLog and patch are below.
> > >
> > >  Hmm, instead of trying to avoid testing microMIPS code generation
> > > just to satisfy the test suite I'd rather see the test cases updated
> > > so that LWM/SWM register ranges are expected and accepted whenever
> > > microMIPS code is produced.  These scan patterns can be made
> > conditional.
> > 
> > FWIW I think Andrew's patch is correct.  If we want to test microMIPS
> > output against micromips-specific regexps, we should add a separate test
> > that forces micromips, so that it gets tested regardless of people's
> > RUNTESTFLAGS.  Doing that shouldn't hold up Andrew's patch though.

 Taking care that the default compilation mode does not conflict (e.g. 
MIPS16, incompatible) and taking any exceptions into account (e.g. n64, 
unsupported) I presume, right?

> > Whereever possible gcc.target/mips should not have conditional dg-
> > finals.

 OK, if that's been the policy.

> I was going to suggest a follow up patch to add copies of the three tests
> as Richard suggests. I haven't yet done a micromips run of the testsuite
> to check for any other issues like this but I suspect problems are limited
> to the tests that I recently added.

 Please always try to test changes reasonably, i.e. at least o32, 
o32/MIPS16, o32/microMIPS, n32, n64, and then Linux and ELF if applicable, 
plus any options that may be relevant, unless it is absolutely clear 
ABI/ISA variations do not matter for a change proposed.  Running 
regression tests is just processing time after all (cheap!), you don't 
have to spend your brain cycles on it (expensive!).

  Maciej


Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)

2015-01-13 Thread Jakub Jelinek
On Tue, Jan 13, 2015 at 12:45:27PM -0700, Jeff Law wrote:
> On 01/13/15 09:38, Segher Boessenkool wrote:
> >On Tue, Jan 13, 2015 at 05:18:19PM +0100, Jakub Jelinek wrote:
> >>3) on request from Richard (which Segher on IRC argues against), "memory"
> >>clobber also prevents CSE;
> >
> >As extend.texi used to say:
> >
> >"
> >If your assembler instructions access memory in an unpredictable
> >fashion, add @samp{memory} to the list of clobbered registers.  This
> >causes GCC to not keep memory values cached in registers across the
> >assembler instruction and not optimize stores or loads to that memory.
> >You also should add the @code{volatile} keyword if the memory
> >affected is not listed in the inputs or outputs of the @code{asm}, as
> >the @samp{memory} clobber does not count as a side-effect of the
> >@code{asm}.
> >"
> >
> >so a "memory" clobber in a non-volatile asm should not prevent CSE.
> My reading of that paragraph is somewhat different.
> 
> The key here is the memory clobber affects optimization of instructions
> around the asm while the volatile specifier affects the optimization of the
> ASM itself.
> 
> A memory clobber must inhibit CSE of memory references on either side of the
> asm because the asm must be assumed to read or write memory in unpredictable
> ways.
> 
> The volatile specifier tells the compiler that the asm itself must be
> preserved, even if dataflow shows the outputs as not used.

That is not necessarily in conflict.
My reading of Jeff's comment is that in
int a;
int
foo (void)
{
  int b, c, d, e;
  b = a;
  asm ("..." : "=r" (c) : : "memory");
  d = a;
  asm ("..." : "=r" (e) : : "memory");
  return b + d + 2 * (c + e);
}
we are not allowed to CSE d = a; into d = b;.  CSE invalidate_from_clobbers
should ensure that already, even when we don't do anything special about
"memory" clobber in the patch.  Another thing is if there is a store
in between the two non-volatile asms with "memory" clobber, here I'm not
sure if with the alternate patch we'd treat the "memory" clobber as use of
everything previously stored into memory (in this regard the posted version
is safe).
And finally there is the case of non-volatile asm with "memory" clobber with
no memory stores in between the two - the posted (safer) patch will not
allow to CSE the two, while in theory we could CSE them into just one asm.

Jakub


RE: [PATCH] Allow MIPS call-saved-{4-6}.c tests to correctly run for micromips

2015-01-13 Thread Matthew Fortune
Richard Sandiford   writes:
> "Maciej W. Rozycki"  writes:
> > On Tue, 13 Jan 2015, Andrew Bennett wrote:
> >
> >> The call-saved-{4-6}.c tests in the mips testsuite fail for
> micromips.
> >> The reason is
> >> that micromips uses the swm and lwm instructions to save/restore the
> >> call-saved registers rather than using the sw and lw instructions.
> >> The swm and lwm instructions only list the range of registers to use
> >> ie. $16-$25 and hence some of the scan-assembler patterns fail.  This
> >> fix adds the NO_COMPRESSION attribute to the foo function to force
> >> the tests to always compile as mips.
> >>
> >> I have tested this for both mips and micromips, and the tests now
> >> pass successfully.
> >> The ChangeLog and patch are below.
> >
> >  Hmm, instead of trying to avoid testing microMIPS code generation
> > just to satisfy the test suite I'd rather see the test cases updated
> > so that LWM/SWM register ranges are expected and accepted whenever
> > microMIPS code is produced.  These scan patterns can be made
> conditional.
> 
> FWIW I think Andrew's patch is correct.  If we want to test microMIPS
> output against micromips-specific regexps, we should add a separate test
> that forces micromips, so that it gets tested regardless of people's
> RUNTESTFLAGS.  Doing that shouldn't hold up Andrew's patch though.
> 
> Whereever possible gcc.target/mips should not have conditional dg-
> finals.

I was going to suggest a follow up patch to add copies of the three tests
as Richard suggests. I haven't yet done a micromips run of the testsuite
to check for any other issues like this but I suspect problems are limited
to the tests that I recently added.

I certainly agree that we shouldn't just ignore micromips expected output
given it is pretty easy to test.

Please go ahead and commit this patch so we clean up the test results for
GCC 5 in case you (or anyone else) doesn't get to submitting the extra test
cases before we hit stage 4.

Thanks,
Matthew


Re: [PATCH] Allow MIPS call-saved-{4-6}.c tests to correctly run for micromips

2015-01-13 Thread Richard Sandiford
"Maciej W. Rozycki"  writes:
> On Tue, 13 Jan 2015, Andrew Bennett wrote:
>
>> The call-saved-{4-6}.c tests in the mips testsuite fail for micromips.
>> The reason is
>> that micromips uses the swm and lwm instructions to save/restore the
>> call-saved registers
>> rather than using the sw and lw instructions.  The swm and lwm
>> instructions only list
>> the range of registers to use ie. $16-$25 and hence some of the
>> scan-assembler
>> patterns fail.  This fix adds the NO_COMPRESSION attribute to the foo
>> function to
>> force the tests to always compile as mips.
>>  
>> I have tested this for both mips and micromips, and the tests now pass
>> successfully.
>> The ChangeLog and patch are below.
>
>  Hmm, instead of trying to avoid testing microMIPS code generation just to 
> satisfy the test suite I'd rather see the test cases updated so that 
> LWM/SWM register ranges are expected and accepted whenever microMIPS code 
> is produced.  These scan patterns can be made conditional.

FWIW I think Andrew's patch is correct.  If we want to test microMIPS
output against micromips-specific regexps, we should add a separate test
that forces micromips, so that it gets tested regardless of people's
RUNTESTFLAGS.  Doing that shouldn't hold up Andrew's patch though.

Whereever possible gcc.target/mips should not have conditional dg-finals.

Thanks,
Richard


Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)

2015-01-13 Thread Jeff Law

On 01/13/15 09:38, Segher Boessenkool wrote:

On Tue, Jan 13, 2015 at 05:18:19PM +0100, Jakub Jelinek wrote:

3) on request from Richard (which Segher on IRC argues against), "memory"
clobber also prevents CSE;


As extend.texi used to say:

"
If your assembler instructions access memory in an unpredictable
fashion, add @samp{memory} to the list of clobbered registers.  This
causes GCC to not keep memory values cached in registers across the
assembler instruction and not optimize stores or loads to that memory.
You also should add the @code{volatile} keyword if the memory
affected is not listed in the inputs or outputs of the @code{asm}, as
the @samp{memory} clobber does not count as a side-effect of the
@code{asm}.
"

so a "memory" clobber in a non-volatile asm should not prevent CSE.

My reading of that paragraph is somewhat different.

The key here is the memory clobber affects optimization of instructions 
around the asm while the volatile specifier affects the optimization of 
the ASM itself.


A memory clobber must inhibit CSE of memory references on either side of 
the asm because the asm must be assumed to read or write memory in 
unpredictable ways.


The volatile specifier tells the compiler that the asm itself must be 
preserved, even if dataflow shows the outputs as not used.


eff



Re: [testsuite] PATCH: Check if -pg available

2015-01-13 Thread Jeff Law

On 01/13/15 05:54, H.J. Lu wrote:

On Mon, Jan 12, 2015 at 03:04:20PM -0700, Jeff Law wrote:

On 01/12/15 14:51, Magnus Granberg wrote:

måndag 12 januari 2015 12.11.17 skrev  H.J. Lu:

On Mon, Jan 12, 2015 at 12:03 PM, Jeff Law  wrote:

On 01/12/15 12:59, H.J. Lu wrote:

I don't know if -pg will work PIE on any targets.  For Linux/x86
the choices of crt1.o are

%{!shared: %{pg|p|profile:gcrt1.o%s;pie:Scrt1.o%s;:crt1.o%s}}

-shared, -pg and -pie are mutually exclusive. Those crt1 files are
only crt1 files provided by glibc.  You can't even try -pg -pie on
Linux without changing glibc.


You're totally missing the point.  What I care about is *why*.


With -pg it use gcrt1.o object file and that file is not compile with -fPIC.
When you build a shared lib on x86_64 all the objects files need to be buiit
with -fPIC else you get a error like that one abow and it is the same problems
when you build bin with -fPIE and linke with -pie.
Glibc do not provide one that is compile with -fPIC

Is there some reason why glibc could not provide gcrt1.o compiled with
-fPIC?




Here is a patch to check if -pg is available.  If -pg doesn't link,
profiling isn't available.  OK for trunk?

OK with a suitable ChangeLog entry.

jeff



Re: [testsuite] PATCH: Add check_effective_target_pie

2015-01-13 Thread Jeff Law

On 01/13/15 05:52, H.J. Lu wrote:

On Mon, Jan 12, 2015 at 03:04:20PM -0700, Jeff Law wrote:

On 01/12/15 14:51, Magnus Granberg wrote:

måndag 12 januari 2015 12.11.17 skrev  H.J. Lu:

On Mon, Jan 12, 2015 at 12:03 PM, Jeff Law  wrote:

On 01/12/15 12:59, H.J. Lu wrote:

I don't know if -pg will work PIE on any targets.  For Linux/x86
the choices of crt1.o are

%{!shared: %{pg|p|profile:gcrt1.o%s;pie:Scrt1.o%s;:crt1.o%s}}

-shared, -pg and -pie are mutually exclusive. Those crt1 files are
only crt1 files provided by glibc.  You can't even try -pg -pie on
Linux without changing glibc.


You're totally missing the point.  What I care about is *why*.


With -pg it use gcrt1.o object file and that file is not compile with -fPIC.
When you build a shared lib on x86_64 all the objects files need to be buiit
with -fPIC else you get a error like that one abow and it is the same problems
when you build bin with -fPIE and linke with -pie.
Glibc do not provide one that is compile with -fPIC

Is there some reason why glibc could not provide gcrt1.o compiled with
-fPIC?


That is a good question. We can compile gcrt1.o with -fPIC and it will
work with both -pg and -pg -pie.  I will open a glibc bug.
Thanks for getting the bug opened, there's a reasonable chance that 
we'll have the gcrt1.o we want in the not too distant future.





Here is the updated patch without the check_profiling_available change.
OK for trunk?

Thanks.

H.J.
---
Subject: [PATCH 1/5] Add check_effective_target_pie

Hi,

This patch adds check_effective_target_pie to check if the current
multilib generates PIE by default.

Thanks.

H.J.
---
2015-01-11  H.J. Lu  

* gcc.target/i386/pie.c: New test.

* lib/target-supports.exp (check_effective_target_pie): New.

OK.
Jeff



Re: [PATCH] Fix ICE with -fgnu-tm and pragma ivdep (PR middle-end/64391)

2015-01-13 Thread Jeff Law

On 01/13/15 09:28, Marek Polacek wrote:

We ICE on this testcase, because the usage of #pragma GCC ivdep
pulls in the ANNOTATE internal functions which don't have underlying
fndecls, hence we segv on a NULL_TREE.  This patch makes get_attrs_for
be prepared for such a scenario.  The callers of get_attrs_for already
check for NULL_TREE.  I don't think internal fns can have transaction_*
attributes anyway.  While at it, I did some cleanups.

Bootstrapped/regtested on {ppc64,x86_64}-linux, ok for trunk?

2015-01-13  Marek Polacek  

PR middle-end/64391
* trans-mem.c (get_attrs_for): Return NULL_TREE if X is NULL_TREE.

* gcc.dg/tm/pr64391.c: New test.
OK.  I looked briefly at perhaps catching this earlier in the call 
chain, but your approach looks best to me.


Thanks,
Jeff



Re: [PATCH] Fix REE for vector modes (PR rtl-optimization/64286, take 2)

2015-01-13 Thread Jeff Law

On 01/13/15 09:11, Jakub Jelinek wrote:

On Mon, Jan 12, 2015 at 02:29:53PM -0700, Jeff Law wrote:

On 01/12/15 12:59, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, giving up for all vector mode extensions
is unnecessary, but unlike scalar integer extensions, where the low part
of the extended value is the original value, for vectors this is not true,
thus the old value is lost.  Which means we can perform REE, but only if
all uses of the definition are the same (code+mode) extension.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-01-12  Jakub Jelinek  

PR rtl-optimization/64286
* ree.c (add_removable_extension): Don't add vector mode
extensions if all uses of the source register aren't the same
vector extensions.

* gcc.target/i386/avx2-pr64286.c: New test.

Does it make sense to remove your change for 59754 in combine_reaching_defs?
Shouldn't this patch handle that case as well?


You're right, this patch handles that too.

New patch, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2015-01-13  Jakub Jelinek  

PR rtl-optimization/64286
* ree.c (combine_reaching_defs): Move part of comment earlier,
remove !SCALAR_INT_MODE_P check.
(add_removable_extension): Don't add vector mode
extensions if all uses of the source register aren't the same
vector extensions.

* gcc.target/i386/avx2-pr64286.c: New test.
OK.  Thanks for taking care of this.  I can't seem to find time for 
doing any real debugging or bugfixing.


jeff



Re: [PATCH] PR59448 - Promote consume to acquire

2015-01-13 Thread Andrew MacLeod

On 01/13/2015 01:38 PM, Torvald Riegel wrote:

On Tue, 2015-01-13 at 10:11 -0500, Andrew MacLeod wrote:

On 01/13/2015 09:59 AM, Richard Biener wrote:

On Tue, Jan 13, 2015 at 3:56 PM, Andrew MacLeod  wrote:

Lengthy discussion : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

Basically we can generate incorrect code for an atomic consume operation in
some circumstances.  The general feeling seems to be that we should simply
promote all consume operations to an acquire operation until there is a
better definition/understanding of the consume model and how GCC can track
it.

I proposed a simple patch in the PR, and I have not seen or heard of any
dissenting opinion.   We should get this in before the end of stage 3 I
think.

The problem with the patch in the PR is the  memory model is immediately
promoted from consume to acquire.   This happens *before* any of the
memmodel checks are made.  If a consume is illegally specified (such as in a
compare_exchange), it gets promoted to acquire and the compiler doesn't
report the error because it never sees the consume.

This new patch simply makes the adjustment after any errors are checked on
the originally specified model.   It bootstraps on x86_64-unknown-linux-gnu
and passes all regression testing.
I also built an aarch64 compiler and it appears to issue the LDAR as
specified in the PR, but anyone with a vested interest really ought to check
it out with a real build to be sure.

OK for trunk?

Why not patch get_memmodel?  (not sure if that catches all cases)

Richard.



That was the original patch.

The issue is that it  promotes consume to acquire before any error
checking gets to look at the model, so then we allow illegal
specification of consume. (It actually triggers a failure in the testsuite)

(This is this test: gcc/testsuite/gcc.dg/atomic-invalid.c)

The documentation of the atomic builtins also disallows mo_consume on
atomic_exchange.

However, I don't see any such requirement in C11 or C++14 (and I'd be
surprised to see it in C++11).  It would be surprising also because for
other atomic read-modify-write operations (eg, fetch_add), we don't make
such a requirement in the builtins docs -- and atomic_exchange is just a
read-modify-write with a noop, basically.

Does anyone remember why this requirement for no consume on exchange was
added, or sees a reason to keep it?  If not, I think we should drop it.
This would solve the testsuite failure for Andrew.  Dropping it would
prevent GCC from checking the consume-on-success / acquire-on-failure
case for compare_excahnge I mentioned previously, but I think that this
is pretty harmless.

I could imagine that, for some reason, either backends or libatomic do
not implement consume on atomic_exchange just because the docs
disallowed it -- but I haven't checked that.

I imagine it was probably in a previous incarnation of the standard...  
Most of this was actually implemented  based on very early draft 
standards years and years ago and never revised.   It wasnt put in by me 
unless the standard at some point said had such wording.   The current 
standard appears to make no mention of the situation.


It seems that it should be safe to move back to the original patch, and 
remove that error test for using consume on an exchange...


Andrew


Re: [PATCH] add option to emit more array bounds warnigs

2015-01-13 Thread Jeff Law

On 01/13/15 10:34, Martin Uecker wrote:


Mon, 12 Jan 2015 11:00:44 -0700
Jeff Law :

On 11/11/14 23:13, Martin Uecker wrote:


...




* gcc/tree-vrp.c (check_array_ref): Emit more warnings
 for warn_array_bounds >= 2.
* gcc/testsuite/gcc.dg/Warray-bounds-11.c: New test-case.
* gcc/c-family/c.opt: New option -Warray-bounds=.
* gcc/common.opt: New option -Warray-bounds=.
* gcc/doc/invoke.texi: Document new option.

Has this patch been bootstrapped and regression tested, if so on what
platform.


x86_64-unknown-linux-gnu

Approved.  Please install on the trunk.  Sorry about the delays.

Thanks,

Jeff


Re: [PATCH] Fix for PR64081 in RTL loop unroller

2015-01-13 Thread Jeff Law

On 01/13/15 11:01, Zamyatin, Igor wrote:


Is it really sufficient here to verify that all the defs are on latch 
predecessors,
what about the case where there is a predecessor without a def.  How do
you guarantee domination in that case?

ISTM that given the structure for the code you're writing that you'd want to
verify that in the event of multiple definitions that all of them appear on
immediate predecessors of the latch *and* that each immediate
predecessor has a definition.


Yes, do you think it's better to check exactly immediate predecessors?
I'd use the same structure that you have in iv_get_reaching_def.  If 
there was a reasonable way to factor that test into a single function 
and call it from both places that would be even better.


Jeff


Re: [PATCH] PR59448 - Promote consume to acquire

2015-01-13 Thread Jeff Law

On 01/13/15 11:38, Torvald Riegel wrote:

On Tue, 2015-01-13 at 10:11 -0500, Andrew MacLeod wrote:

On 01/13/2015 09:59 AM, Richard Biener wrote:

On Tue, Jan 13, 2015 at 3:56 PM, Andrew MacLeod  wrote:

Lengthy discussion : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

Basically we can generate incorrect code for an atomic consume operation in
some circumstances.  The general feeling seems to be that we should simply
promote all consume operations to an acquire operation until there is a
better definition/understanding of the consume model and how GCC can track
it.

I proposed a simple patch in the PR, and I have not seen or heard of any
dissenting opinion.   We should get this in before the end of stage 3 I
think.

The problem with the patch in the PR is the  memory model is immediately
promoted from consume to acquire.   This happens *before* any of the
memmodel checks are made.  If a consume is illegally specified (such as in a
compare_exchange), it gets promoted to acquire and the compiler doesn't
report the error because it never sees the consume.

This new patch simply makes the adjustment after any errors are checked on
the originally specified model.   It bootstraps on x86_64-unknown-linux-gnu
and passes all regression testing.
I also built an aarch64 compiler and it appears to issue the LDAR as
specified in the PR, but anyone with a vested interest really ought to check
it out with a real build to be sure.

OK for trunk?

Why not patch get_memmodel?  (not sure if that catches all cases)

Richard.



That was the original patch.

The issue is that it  promotes consume to acquire before any error
checking gets to look at the model, so then we allow illegal
specification of consume. (It actually triggers a failure in the testsuite)


(This is this test: gcc/testsuite/gcc.dg/atomic-invalid.c)

The documentation of the atomic builtins also disallows mo_consume on
atomic_exchange.

However, I don't see any such requirement in C11 or C++14 (and I'd be
surprised to see it in C++11).  It would be surprising also because for
other atomic read-modify-write operations (eg, fetch_add), we don't make
such a requirement in the builtins docs -- and atomic_exchange is just a
read-modify-write with a noop, basically.

Does anyone remember why this requirement for no consume on exchange was
added, or sees a reason to keep it?  If not, I think we should drop it.
This would solve the testsuite failure for Andrew.  Dropping it would
prevent GCC from checking the consume-on-success / acquire-on-failure
case for compare_excahnge I mentioned previously, but I think that this
is pretty harmless.

I could imagine that, for some reason, either backends or libatomic do
not implement consume on atomic_exchange just because the docs
disallowed it -- but I haven't checked that.
AFAICT that test has been there since the initial commit of 
sync-mem-invalid.c (which was later renamed to atomic-invalid).   In 
fact, that was the only test initially in sync-mem-invalid.c

commit 64d1dbf10e3f08305f4a8569e27fc2224f9074d2
Author: amacleod 
Date:   Thu Jun 23 13:09:31 2011 +

Basica tests for __sync_mem_exchange and framework for further 
additions.


* lib/target-support.exp (check_effective_target_sync_int_128,
check_effective_target_sync_long_long): Check whether the target
supports 64 and 128 bit __sync builtins.
* gcc.dg/sync-mem.h: New. Common code to check memory model 
__syncs.

* gcc.dg/sync-mem-1.c: New. Check char size.
* gcc.dg/sync-mem-2.c: New. Check short size.
* gcc.dg/sync-mem-3.c: New. Check int size.
* gcc.dg/sync-mem-4.c: New. Check long long.
* gcc.dg/sync-mem-5.c: New. Check 128 bit.
* gcc.dg/sync-mem-invalid.c: New. Check invalid memory modes.



git-svn-id: 
svn+ssh://gcc.gnu.org/svn/gcc/branches/cxx-mem-model@175331 
138bc75d-0d04-0410-961f-82ee72b054a4



Mostly hoping this refreshes Andrew's memory and he can provide some 
insight on why we test this particular combination and consider it invalid.


I was kind of hoping that we'd track this down to something like a 
particular target didn't support this capability with the old sync 
builtins and we carried it into the atomics when we made that switch.


I don't have a vested interest in either approach.  I just want to see 
us DTRT.



jeff



Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2015-01-13 Thread Eric Botcazou
> Sorry for the slow response.  Jeff has approved the patch in the
> meantime, but I didn't want to go ahead and apply it while there
> was still disagreement...

I still think that it isn't appropriate to short-circuit the main computation 
as the patch does, but I don't want to block it after Jeff's approval.

> (1) we have a non-paradoxical subreg;
> (2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full
> hard registers (no padding or unused upper bits);
> (3) (reg:ymode xregno) and (reg:xmode xregno) store the same number
> of bytes (X) in each constituent hard register;
> (4) the offset is a multiple of X, i.e. the data we're accessing
> is aligned to a register boundary; and
> (5) endianness is regular (no differences between words and bytes,
> or between registers and memory)

OK, that's a nice translation of the new code. :-)

It seems to me that the patch wants to extend the support of generic subregs 
to modes whose sizes are not multiple of each other, which is a requirement of 
the existing code, but does that in a very specific case for the sake of the 
ARM port without saying where all the above restrictions come from.

-- 
Eric Botcazou


Re: PR54442 build_qualified_type produces a non-canonical type

2015-01-13 Thread Paolo Carlini

Hi,

On 06/09/2014 04:46 PM, Jason Merrill wrote:

On 06/09/2014 10:32 AM, Marc Glisse wrote:

On Mon, 9 Jun 2014, Jason Merrill wrote:


On 06/09/2014 10:18 AM, Marc Glisse wrote:

I doubt the patch can be wrong, but it may be that this is a situation
that is not supposed to happen and should be fixed elsewhere?


Seems likely.  What is the difference between the type returned from
build_qualified_type (TYPE_CANONICAL and it's TYPE_CANONICAL? I would
expect them to be the same.


 throws >>

(in what build_qualified_type returns)


I guess that makes sense, given that the exception specification isn't 
really part of the type.  The patch is OK.


In fact, I noticed today that this is a 4.8/4.9 Regression too. Shall I 
try to apply the patchlet to 4_9-branch too and, if testing passes, 
commit there and close the bug?


Thanks,
Paolo.


Re: [PATCH] PR59448 - Promote consume to acquire

2015-01-13 Thread Torvald Riegel
On Tue, 2015-01-13 at 10:11 -0500, Andrew MacLeod wrote:
> On 01/13/2015 09:59 AM, Richard Biener wrote:
> > On Tue, Jan 13, 2015 at 3:56 PM, Andrew MacLeod  wrote:
> >> Lengthy discussion : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448
> >>
> >> Basically we can generate incorrect code for an atomic consume operation in
> >> some circumstances.  The general feeling seems to be that we should simply
> >> promote all consume operations to an acquire operation until there is a
> >> better definition/understanding of the consume model and how GCC can track
> >> it.
> >>
> >> I proposed a simple patch in the PR, and I have not seen or heard of any
> >> dissenting opinion.   We should get this in before the end of stage 3 I
> >> think.
> >>
> >> The problem with the patch in the PR is the  memory model is immediately
> >> promoted from consume to acquire.   This happens *before* any of the
> >> memmodel checks are made.  If a consume is illegally specified (such as in 
> >> a
> >> compare_exchange), it gets promoted to acquire and the compiler doesn't
> >> report the error because it never sees the consume.
> >>
> >> This new patch simply makes the adjustment after any errors are checked on
> >> the originally specified model.   It bootstraps on x86_64-unknown-linux-gnu
> >> and passes all regression testing.
> >> I also built an aarch64 compiler and it appears to issue the LDAR as
> >> specified in the PR, but anyone with a vested interest really ought to 
> >> check
> >> it out with a real build to be sure.
> >>
> >> OK for trunk?
> > Why not patch get_memmodel?  (not sure if that catches all cases)
> >
> > Richard.
> >
> >
> That was the original patch.
> 
> The issue is that it  promotes consume to acquire before any error 
> checking gets to look at the model, so then we allow illegal 
> specification of consume. (It actually triggers a failure in the testsuite)

(This is this test: gcc/testsuite/gcc.dg/atomic-invalid.c)

The documentation of the atomic builtins also disallows mo_consume on
atomic_exchange.

However, I don't see any such requirement in C11 or C++14 (and I'd be
surprised to see it in C++11).  It would be surprising also because for
other atomic read-modify-write operations (eg, fetch_add), we don't make
such a requirement in the builtins docs -- and atomic_exchange is just a
read-modify-write with a noop, basically.

Does anyone remember why this requirement for no consume on exchange was
added, or sees a reason to keep it?  If not, I think we should drop it.
This would solve the testsuite failure for Andrew.  Dropping it would
prevent GCC from checking the consume-on-success / acquire-on-failure
case for compare_excahnge I mentioned previously, but I think that this
is pretty harmless.

I could imagine that, for some reason, either backends or libatomic do
not implement consume on atomic_exchange just because the docs
disallowed it -- but I haven't checked that.




Re: [PATCH] Allow MIPS call-saved-{4-6}.c tests to correctly run for micromips

2015-01-13 Thread Maciej W. Rozycki
On Tue, 13 Jan 2015, Andrew Bennett wrote:

> The call-saved-{4-6}.c tests in the mips testsuite fail for micromips.  The 
> reason is 
> that micromips uses the swm and lwm instructions to save/restore the 
> call-saved registers 
> rather than using the sw and lw instructions.  The swm and lwm instructions 
> only list 
> the range of registers to use ie. $16-$25 and hence some of the 
> scan-assembler 
> patterns fail.  This fix adds the NO_COMPRESSION attribute to the foo 
> function to 
> force the tests to always compile as mips.
>  
> I have tested this for both mips and micromips, and the tests now pass 
> successfully.  
> The ChangeLog and patch are below.

 Hmm, instead of trying to avoid testing microMIPS code generation just to 
satisfy the test suite I'd rather see the test cases updated so that 
LWM/SWM register ranges are expected and accepted whenever microMIPS code 
is produced.  These scan patterns can be made conditional.

  Maciej


[PATCH][AArch64 Intrinsics] Replace temporary assembler for vst1_lane

2015-01-13 Thread Alan Lawrence
Nowadays, just storing the (bigendian-corrected) vector element to the address, 
generates exactly the same assembler for all cases except 
{float,int,uint}64x1_t, where

st1 {v0.d}[0], [x0]
becomes
str d0, [x0]

This is not a problem, and the change will be much better for optimization 
through the midend, as well as making use of previous improvements in error 
reporting.


Also move the /* vst1q */ comment, which was a couple intrinsics too late.

gcc/ChangeLog:

* config/aarch64/arm_neon.h (vst1_lane_f32, vst1_lane_f64,
vst1_lane_p8, vst1_lane_p16, vst1_lane_s8, vst1_lane_s16,
vst1_lane_s32, vst1_lane_s64, vst1_lane_u8, vst1_lane_u16,
vst1_lane_u32, vst1_lane_u64, vst1q_lane_f32, vst1q_lane_f64,
vst1q_lane_p8, vst1q_lane_p16, vst1q_lane_s8, vst1q_lane_s16,
vst1q_lane_s32, vst1q_lane_s64, vst1q_lane_u8, vst1q_lane_u16,
vst1q_lane_u32, vst1q_lane_u64): Reimplement with pointer dereference
and __aarch64_vget_lane_any.

Cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf.

Ok for trunk?

Cheers, Alancommit 926aec661699e52f617f16068075ef0242a43609
Author: Alan Lawrence 
Date:   Thu Dec 11 17:29:54 2014 +

Replace temporary inline assembler for vst1_lane, move /* vst1q */ comment.

Note for (float|u?int)64x1 vectors,
   st1 {v0.d}[0], [x0]
becomes
   str d0, [x0]

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 3d1bcd5..980490f 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -10304,272 +10304,6 @@ vrsqrtss_f32 (float32_t a, float32_t b)
result;  \
  })
 
-#define vst1_lane_f32(a, b, c)  \
-  __extension__ \
-({  \
-   float32x2_t b_ = (b);\
-   float32_t * a_ = (a);\
-   __asm__ ("st1 {%1.s}[%2],[%0]"   \
-:   \
-: "r"(a_), "w"(b_), "i"(c)  \
-: "memory");\
- })
-
-#define vst1_lane_f64(a, b, c)  \
-  __extension__ \
-({  \
-   float64x1_t b_ = (b);\
-   float64_t * a_ = (a);\
-   __asm__ ("st1 {%1.d}[%2],[%0]"   \
-:   \
-: "r"(a_), "w"(b_), "i"(c)  \
-: "memory");\
- })
-
-#define vst1_lane_p8(a, b, c)   \
-  __extension__ \
-({  \
-   poly8x8_t b_ = (b);  \
-   poly8_t * a_ = (a);  \
-   __asm__ ("st1 {%1.b}[%2],[%0]"   \
-:   \
-: "r"(a_), "w"(b_), "i"(c)  \
-: "memory");\
- })
-
-#define vst1_lane_p16(a, b, c)  \
-  __extension__ \
-({  \
-   poly16x4_t b_ = (b); \
-   poly16_t * a_ = (a); \
-   __asm__ ("st1 {%1.h}[%2],[%0]"   \
-:   \
-: "r"(a_), "w"(b_), "i"(c)  \
-: "memory");\
- })
-
-#define vst1_lane_s8(a, b, c)   \
-  __extension__ \
-({  \
-   int8x8_t b_ = (b);   \
-   int8_t * a_ = (a);   \
-   __asm__ ("st1 {%1.b}[%2],[%0]"   \
-:

RE: [PATCH] Fix for PR64081 in RTL loop unroller

2015-01-13 Thread Zamyatin, Igor
> 
> Is it really sufficient here to verify that all the defs are on latch 
> predecessors,
> what about the case where there is a predecessor without a def.  How do
> you guarantee domination in that case?
> 
> ISTM that given the structure for the code you're writing that you'd want to
> verify that in the event of multiple definitions that all of them appear on
> immediate predecessors of the latch *and* that each immediate
> predecessor has a definition.

Yes, do you think it's better to check exactly immediate predecessors?

> > -
> > -  if (!just_once_each_iteration_p (current_loop, DF_REF_BB (adef)))
> > -   return false;
> > +   {
> > + def_num++;
> > + if (!(def_pred_latch = def_pred_latch_p (adef))
> > + || !rtx_equal_p( PATTERN (DF_REF_INSN (single_rd)),
> 
> Whitespace nit here.  Whitespace goes before the open paren for the
> function call, not after.

Thanks for catching this!

> 
> 
> > @@ -351,10 +384,10 @@ latch_dominating_def (rtx reg, df_ref *def)
> >   static enum iv_grd_result
> >   iv_get_reaching_def (rtx_insn *insn, rtx reg, df_ref *def)
> 
> And in this routine, you appear to do both checks.  ie, each def is on an
> immediate predecessor and each immediate predecessor has a def.  Is there
> some reason why iv_get_reaching_def has the stronger check while
> latch_dominating_def does not?

Looks like I was sure that latch_dominating_def always goes after 
iv_get_reaching_def but now I see it is not true. Will add another check in 
latch_dominating_def.

Thanks,
Igor

> 
> jeff


Re: shift/extract SHIFT_COUNT_TRUNCATED combine bug

2015-01-13 Thread Segher Boessenkool
On Tue, Jan 13, 2015 at 10:51:27AM +0100, Richard Biener wrote:
> IMHO SHIFT_COUNT_TRUNCATED should be removed and instead
> backends should provide shift patterns with a (and:QI ...) for the
> shift amount which simply will omit that operation if suitable.

Note that that catches less though, e.g. in

int f(int x, int n) { return x << ((2*n) & 31); }

without SHIFT_COUNT_TRUNCATED it will try to match an AND with 30,
not with 31.


Segher


Re: [PATCH] add option to emit more array bounds warnigs

2015-01-13 Thread Martin Uecker

Mon, 12 Jan 2015 11:00:44 -0700
Jeff Law :
> On 11/11/14 23:13, Martin Uecker wrote:

...

> >
> >
> > * gcc/tree-vrp.c (check_array_ref): Emit more warnings
> > for warn_array_bounds >= 2.
> > * gcc/testsuite/gcc.dg/Warray-bounds-11.c: New test-case.
> > * gcc/c-family/c.opt: New option -Warray-bounds=.
> > * gcc/common.opt: New option -Warray-bounds=.
> > * gcc/doc/invoke.texi: Document new option.
> Has this patch been bootstrapped and regression tested, if so on what 
> platform.

x86_64-unknown-linux-gnu

> Given the new warnings (as implemented by the patch) are not enabled by 
> default, I'm inclined to approve once Martin verifies things via 
> bootstrap and regression test.

Thank you,

Martin

 



[PATCH] Allow MIPS call-saved-{4-6}.c tests to correctly run for micromips

2015-01-13 Thread Andrew Bennett
Hi,

The call-saved-{4-6}.c tests in the mips testsuite fail for micromips.  The 
reason is 
that micromips uses the swm and lwm instructions to save/restore the call-saved 
registers 
rather than using the sw and lw instructions.  The swm and lwm instructions 
only list 
the range of registers to use ie. $16-$25 and hence some of the scan-assembler 
patterns fail.  This fix adds the NO_COMPRESSION attribute to the foo function 
to 
force the tests to always compile as mips.
 
I have tested this for both mips and micromips, and the tests now pass 
successfully.  
The ChangeLog and patch are below.

Ok to commit?


Many thanks,


Andrew


testsuite/

* gcc.target/mips/call-saved-4.c: Add NO_COMPRESSION attribute.
* gcc.target/mips/call-saved-5.c: Likewise.
* gcc.target/mips/call-saved-6.c: Likewise.



diff --git a/gcc/testsuite/gcc.target/mips/call-saved-4.c 
b/gcc/testsuite/gcc.target/mips/call-saved-4.c
index 846ea32..92881c4 100644
--- a/gcc/testsuite/gcc.target/mips/call-saved-4.c
+++ b/gcc/testsuite/gcc.target/mips/call-saved-4.c
@@ -3,7 +3,7 @@
 
 void bar (void);
 
-void
+NOCOMPRESSION void
 foo (int x)
 {
   __builtin_unwind_init ();
diff --git a/gcc/testsuite/gcc.target/mips/call-saved-5.c 
b/gcc/testsuite/gcc.target/mips/call-saved-5.c
index 2937b31..152b28f 100644
--- a/gcc/testsuite/gcc.target/mips/call-saved-5.c
+++ b/gcc/testsuite/gcc.target/mips/call-saved-5.c
@@ -3,7 +3,7 @@
 
 void bar (void);
 
-void
+NOCOMPRESSION void
 foo (int x)
 {
   __builtin_unwind_init ();
diff --git a/gcc/testsuite/gcc.target/mips/call-saved-6.c 
b/gcc/testsuite/gcc.target/mips/call-saved-6.c
index 0d1a4c8..a384d4a 100644
--- a/gcc/testsuite/gcc.target/mips/call-saved-6.c
+++ b/gcc/testsuite/gcc.target/mips/call-saved-6.c
@@ -3,7 +3,7 @@
 
 void bar (void);
 
-void
+NOCOMPRESSION void
 foo (int x)
 {
   __builtin_unwind_init ();




Re: [PATCH] [AArch64, NEON] Improve vpmaxX & vpminX intrinsics

2015-01-13 Thread Tejas Belagod

On 09/12/14 08:17, Yangfei (Felix) wrote:

On 28 November 2014 at 09:23, Yangfei (Felix)  wrote:

Hi,
   This patch converts vpmaxX & vpminX intrinsics to use builtin functions

instead of the previous inline assembly syntax.

   Regtested with aarch64-linux-gnu on QEMU.  Also passed the glorious

testsuite of Christophe Lyon.

   OK for the trunk?


Hi Felix,   We know from experience that the advsimd intrinsics tend
to be fragile for big endian and in general it is fairly easy to break the big 
endian
case.  For these advsimd improvements that you are working on (that we very
much appreciate) it is important to run both little endian and big endian
regressions.

Thanks
/Marcus



Okay.  Any plan for the advsimd big-endian improvement?
I rebased this patch over Alan Lawrance's patch: 
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html
No regressions for aarch64_be-linux-gnu target too.  OK for the thunk?


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 218464)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,18 @@
+2014-12-09  Felix Yang  
+
+   * config/aarch64/aarch64-simd.md (aarch64_p): New
+   pattern.
+   * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp,
+   uminp, smax_nanp, smin_nanp): New builtins.
+   * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32,
+   vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16, vpmaxq_s32,
+   vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32, vpmaxq_f64,
+   vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32, vpmaxnmq_f64,
+   vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32, vpmin_u8,
+   vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32, vpminq_u8,
+   vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64, vpminqd_f64,
+   vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64, vpminnmqd_f64,
+




  __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
Index: gcc/config/aarch64/aarch64-simd.md
===
--- gcc/config/aarch64/aarch64-simd.md  (revision 218464)
+++ gcc/config/aarch64/aarch64-simd.md  (working copy)
@@ -1017,6 +1017,28 @@
DONE;
  })

+;; Pairwise Integer Max/Min operations.
+(define_insn "aarch64_p"
+ [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
+   (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand" "w")
+(match_operand:VDQ_BHSI 2 "register_operand" "w")]
+   MAXMINV))]
+ "TARGET_SIMD"
+ "p\t%0., %1., %2."
+  [(set_attr "type" "neon_minmax")]
+)
+


Hi Felix,

Sorry for the delay in getting back to you on this.

If you've rolled aarch64_reduc__internalv2si into the above 
pattern, do you still need it? For all its call points, just point them 
to aarch64_p?


Thanks,
Tejas.




Re: [committed] Update copyright years, part 2

2015-01-13 Thread Jakub Jelinek
On Tue, Jan 13, 2015 at 05:06:35PM +, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > Patch too large to attach uncompressed, this
> > has been created with update-copyright.py --this-year.
> > Note, I had to temporarily move away gcc/jit/docs/conf.py,
> > the python script dies on that and leaves almost all files unchanged.
> 
> Thanks for doing the update.  Is the patch below OK to fix the JIT thing?
> After this change, update-copyright.py --this-year seems to update
> gcc/jit correctly (including the texinfo files).
> 
> Richard
> 
> 
> contrib/
>   * update-copyright.py (Copyright.__init__): Add a regexp for
>   "copyright = u'".
>   (Copyright.update_copyright): Don't add a space before the year
>   in that case.

Ok, thanks.

> --- contrib/update-copyright.py   2014-08-05 10:29:02.695491816 +0100
> +++ contrib/update-copyright.py   2015-01-13 14:13:43.500812967 +
> @@ -183,6 +183,7 @@ class Copyright:
>  '|[Cc]opyright\s+%s'
>  '|[Cc]opyright\s+©'
>  '|[Cc]opyright\s+@copyright{}'
> +'|copyright = u\''
>  '|@set\s+copyright[\w-]+)'
>  
>  # 2: the years.  Include the whitespace in the year, so that
> @@ -363,7 +364,8 @@ class Copyright:
>  return (False, orig_line, next_line)
>  
>  line = (line[:match.start (2)]
> -+ ' ' + canon_form + self.separator
> ++ ('' if intro.startswith ('copyright = ') else ' ')
> ++ canon_form + self.separator
>  + line[match.end (2):])
>  
>  # Use the standard (C) form.

Jakub


Re: [committed] Update copyright years, part 2

2015-01-13 Thread Richard Sandiford
Jakub Jelinek  writes:
> Patch too large to attach uncompressed, this
> has been created with update-copyright.py --this-year.
> Note, I had to temporarily move away gcc/jit/docs/conf.py,
> the python script dies on that and leaves almost all files unchanged.

Thanks for doing the update.  Is the patch below OK to fix the JIT thing?
After this change, update-copyright.py --this-year seems to update
gcc/jit correctly (including the texinfo files).

Richard


contrib/
* update-copyright.py (Copyright.__init__): Add a regexp for
"copyright = u'".
(Copyright.update_copyright): Don't add a space before the year
in that case.

Index: contrib/update-copyright.py
===
--- contrib/update-copyright.py 2014-08-05 10:29:02.695491816 +0100
+++ contrib/update-copyright.py 2015-01-13 14:13:43.500812967 +
@@ -183,6 +183,7 @@ class Copyright:
 '|[Cc]opyright\s+%s'
 '|[Cc]opyright\s+©'
 '|[Cc]opyright\s+@copyright{}'
+'|copyright = u\''
 '|@set\s+copyright[\w-]+)'
 
 # 2: the years.  Include the whitespace in the year, so that
@@ -363,7 +364,8 @@ class Copyright:
 return (False, orig_line, next_line)
 
 line = (line[:match.start (2)]
-+ ' ' + canon_form + self.separator
++ ('' if intro.startswith ('copyright = ') else ' ')
++ canon_form + self.separator
 + line[match.end (2):])
 
 # Use the standard (C) form.



RE: Open Issues in the TSAN Runtime

2015-01-13 Thread Bernd Edlinger

On Tue, 13 Jan 2015 17:22:23, Jakub Jelinek wrote:
>
>
> So do you mean following? I've bootstrapped/regtested on x86_64-linux and
> i686-linux, but haven't tried any special testcases on it. If it works for
> you, I'll commit it.
>

OK.

Thanks
Bernd.

> 2015-01-13 Jakub Jelinek 
>
> * sanitizer_common/sanitizer_deadlock_detector.h: Cherry pick
> upstream r224518 and r224519.
> * tsan/tsan_rtl_thread.cc: Cherry pick upstream r224702 and
> r224834.
>
> --- libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h.jj 2014-05-22 
> 10:18:14.893626024 +0200
> +++ libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h 2015-01-13 
> 14:07:40.096414924 +0100
> @@ -48,6 +48,8 @@ class DeadlockDetectorTLS {
> if (epoch_ == current_epoch) return;
> bv_.clear();
> epoch_ = current_epoch;
> + n_recursive_locks = 0;
> + n_all_locks_ = 0;
> }
>
> uptr getEpoch() const { return epoch_; }
> @@ -81,7 +83,8 @@ class DeadlockDetectorTLS {
> }
> }
> // Printf("remLock: %zx %zx\n", lock_id, epoch_);
> - CHECK(bv_.clearBit(lock_id));
> + if (!bv_.clearBit(lock_id))
> + return; // probably addLock happened before flush
> if (n_all_locks_) {
> for (sptr i = n_all_locks_ - 1; i>= 0; i--) {
> if (all_locks_with_contexts_[i].lock == static_cast(lock_id)) {
> @@ -173,6 +176,7 @@ class DeadlockDetector {
> recycled_nodes_.clear();
> available_nodes_.setAll();
> g_.clear();
> + n_edges_ = 0;
> return getAvailableNode(data);
> }
>
> --- libsanitizer/tsan/tsan_rtl_thread.cc.jj 2014-09-24 11:08:03.824028080 
> +0200
> +++ libsanitizer/tsan/tsan_rtl_thread.cc 2015-01-13 14:08:06.167954292 +0100
> @@ -109,12 +109,13 @@ void ThreadContext::OnStarted(void *arg)
> thr->dd_pt = ctx->dd->CreatePhysicalThread();
> thr->dd_lt = ctx->dd->CreateLogicalThread(unique_id);
> }
> + thr->fast_state.SetHistorySize(flags()->history_size);
> + // Commit switch to the new part of the trace.
> + // TraceAddEvent will reset stack0/mset0 in the new part for us.
> + TraceAddEvent(thr, thr->fast_state, EventTypeMop, 0);
> +
> thr->fast_synch_epoch = epoch0;
> AcquireImpl(thr, 0, &sync);
> - thr->fast_state.SetHistorySize(flags()->history_size);
> - const uptr trace = (epoch0 / kTracePartSize) % TraceParts();
> - Trace *thr_trace = ThreadTrace(thr->tid);
> - thr_trace->headers[trace].epoch0 = epoch0;
> StatInc(thr, StatSyncAcquire);
> sync.Reset(&thr->clock_cache);
> DPrintf("#%d: ThreadStart epoch=%zu stk_addr=%zx stk_size=%zx "
>
>
> Jakub
  

Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)

2015-01-13 Thread Segher Boessenkool
On Tue, Jan 13, 2015 at 05:18:19PM +0100, Jakub Jelinek wrote:
> 3) on request from Richard (which Segher on IRC argues against), "memory"
>clobber also prevents CSE;

As extend.texi used to say:

"
If your assembler instructions access memory in an unpredictable
fashion, add @samp{memory} to the list of clobbered registers.  This
causes GCC to not keep memory values cached in registers across the
assembler instruction and not optimize stores or loads to that memory.
You also should add the @code{volatile} keyword if the memory
affected is not listed in the inputs or outputs of the @code{asm}, as
the @samp{memory} clobber does not count as a side-effect of the
@code{asm}.
"

so a "memory" clobber in a non-volatile asm should not prevent CSE.


Segher


Re: shift/extract SHIFT_COUNT_TRUNCATED combine bug

2015-01-13 Thread Jeff Law

On 01/13/15 02:51, Richard Biener wrote:

On a SHIFT_COUNT_TRUNCATED target, I don't think it's ever OK to widen a
shift, variable or constant.

In the case of a variable shift, we could easily have eliminated the masking
code before or during combine.  For a constant shift amount we could have
adjusted the constant (see SHIFT_COUNT_TRUNCATED in cse.c)

I think it's just an oversight and it has simply never bit us before.


IMHO SHIFT_COUNT_TRUNCATED should be removed and instead
backends should provide shift patterns with a (and:QI ...) for the
shift amount which simply will omit that operation if suitable.
Perhaps.  I'm certainly not wed to concept of SHIFT_COUNT_TRUNCATED.  I 
don't see that getting addressed in the gcc-5 timeframe.




aarch64, alpha, epiphany, iq2000, lm32, m32r, mep, microblaze, mips, 
mn103, nds32, pa, sparc, stormy16, tilepro, v850 and xtensa are the 
current SHIFT_COUNT_TRUNCATED targets.



Jeff


[patch] update function comments for lto_symtab_encoder_encode_*

2015-01-13 Thread Aldy Hernandez

Hi Richard.

I'm chasing my tail here looking at an LTO + debug problem, and for the 
life of me I can't figure out how all this partition business affects a 
symbol's `analyzed' bit.  Anyways... the documentation for all these 
functions is wrong.


Can you look at this patch and tell me if it makes sense?  I feel a bit 
uneasy committing under the obvious rule, since I don't entirely 
understand the partitioning thing.


Would anyone mind me fixing this on mainline?  It's just a comment fix.

Also, since you seem to understand all this best, can you suggest some 
better wording for the lto_encoder_entry comments?


/* Entry of LTO symtab encoder.  */
struct lto_encoder_entry
{
  symtab_node *node;
  /* Is the node in this partition (i.e. ltrans of this partition will
 be responsible for outputting it)? */
  unsigned int in_partition:1;
  /* Do we encode body in this partition?  */
  unsigned int body:1;
  /* Do we encode initializer in this partition?
 For example the readonly variable initializers are encoded to aid
 constant folding even if they are not in the partition.  */
  unsigned int initializer:1;
};

Whenever I get to the LTO part of this project, I promise to start 
documenting things better.  This whole thing is a mystery.


Thanks.
Aldy
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index bddb9a8..68f8042 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2015-01-13  Aldy Hernandez  
+
+   * lto-cgraph: Update function comments for
+   lto_symtab_encoder_encode_*.
+
 2014-12-20  Martin Uecker 
 
* doc/invoke.texi: Document -Wdiscarded-array-qualifiers.
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index cf92892..0b9d945 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -187,7 +187,7 @@ lto_symtab_encoder_delete_node (lto_symtab_encoder_t 
encoder,
 }
 
 
-/* Return TRUE if we should encode initializer of NODE (if any).  */
+/* Return TRUE if we should encode the body of NODE (if any).  */
 
 bool
 lto_symtab_encoder_encode_body_p (lto_symtab_encoder_t encoder,
@@ -197,7 +197,7 @@ lto_symtab_encoder_encode_body_p (lto_symtab_encoder_t 
encoder,
   return encoder->nodes[index].body;
 }
 
-/* Return TRUE if we should encode body of NODE (if any).  */
+/* Specify that we encode the body of NODE in this partition.  */
 
 static void
 lto_set_symtab_encoder_encode_body (lto_symtab_encoder_t encoder,
@@ -220,7 +220,7 @@ lto_symtab_encoder_encode_initializer_p 
(lto_symtab_encoder_t encoder,
   return encoder->nodes[index].initializer;
 }
 
-/* Return TRUE if we should encode initializer of NODE (if any).  */
+/* Specify that we should encode initializer of NODE (if any).  */
 
 static void
 lto_set_symtab_encoder_encode_initializer (lto_symtab_encoder_t encoder,
@@ -230,7 +230,7 @@ lto_set_symtab_encoder_encode_initializer 
(lto_symtab_encoder_t encoder,
   encoder->nodes[index].initializer = true;
 }
 
-/* Return TRUE if we should encode initializer of NODE (if any).  */
+/* Return TRUE if NODE is in this partition.  */
 
 bool
 lto_symtab_encoder_in_partition_p (lto_symtab_encoder_t encoder,
@@ -242,7 +242,7 @@ lto_symtab_encoder_in_partition_p (lto_symtab_encoder_t 
encoder,
   return encoder->nodes[index].in_partition;
 }
 
-/* Return TRUE if we should encode body of NODE (if any).  */
+/* Specify that NODE is in this partition.  */
 
 void
 lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,


Re: flatten expr.h (version 2)

2015-01-13 Thread Prathamesh Kulkarni
On 13 January 2015 at 16:06, Prathamesh Kulkarni
 wrote:
> On 13 January 2015 at 15:34, Richard Biener  wrote:
>> On Sun, 11 Jan 2015, Prathamesh Kulkarni wrote:
>>
>>> Hi,
>>> This is a revamped expr.h flattening flattening patch rebased on
>>> tree.h and tree-core.h flattening patch (r219402).
>>> It depends upon the following patch to get committed.
>>> https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00565.html
>>>
>>> Changes:
>>> * Removed all includes except tree-core.h. Put includes required by
>>> expr.h in a comment.
>>> * Moved stmt.c, expmed.c prototypes to stmt.h, expmed.h respectively.
>>> * Adjusted generator programs: genemit.c, gengtype.c, genopinit.c, 
>>> genoutput.c.
>>> * Did not put includes in gcc-plugin.h since expr.h cannot be included
>>> by plugins
>>> (putting them broke building a file in c-family/ since expr.h is not
>>> allowed in front-ends)
>>> * Affects java front-end (expr.h is allowed in java front-end).
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu with languages:
>>> all,go,ada,jit
>>> Built on all targets in config-list.mk with languages: all, go.
>>> OK to commit ?
>>
>> diff --git a/gcc/expr.c b/gcc/expr.c
>> index fc22862..824541e 100644
>> --- a/gcc/expr.c
>> +++ b/gcc/expr.c
>> @@ -41,11 +41,17 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "regs.h"
>>  #include "hard-reg-set.h"
>>  #include "except.h"
>> -#include "input.h"
>>  #include "function.h"
>>  #include "insn-config.h"
>>  #include "insn-attr.h"
>>  /* Include expr.h after insn-config.h so we get HAVE_conditional_move.
>> */
>> +#include "hashtab.h"
>> +#include "emit-rtl.h"
>> +#include "expmed.h"
>> +#include "stmt.h"
>> +#include "statistics.h"
>> +#include "real.h"
>> +#include "fixed-value.h"
>>  #include "expr.h"
>>
>> Please move the comment to the proper place
> ah, my flattening tool doesn't look at comments. I will move the
> comment before expr.h include, thanks.
>>
>> diff --git a/gcc/expr.h b/gcc/expr.h
>> index a7638b8..f1be8dc 100644
>> --- a/gcc/expr.h
>> +++ b/gcc/expr.h
>> @@ -20,7 +20,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #ifndef GCC_EXPR_H
>>  #define GCC_EXPR_H
>>
>> -/* For inhibit_defer_pop */
>> +/* expr.h required includes */
>> +#if 0
>>  #include "hashtab.h"
>>  #include "hash-set.h"
>>  #include "vec.h"
>> @@ -29,15 +30,17 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "hard-reg-set.h"
>>  #include "input.h"
>>  #include "function.h"
>> -/* For XEXP, GEN_INT, rtx_code */
>>  #include "rtl.h"
>> -/* For optimize_size */
>>  #include "flags.h"
>> -/* For tree_fits_[su]hwi_p, tree_to_[su]hwi, fold_convert, size_binop,
>> -   ssize_int, TREE_CODE, TYPE_SIZE, int_size_in_bytes,*/
>>  #include "tree-core.h"
>> -/* For GET_MODE_BITSIZE, word_mode */
>>  #include "insn-config.h"
>> +#include "alias.h"
>> +#include "emit-rtl.h"
>> +#include "expmed.h"
>> +#include "stmt.h"
>> +#endif
>>
>> Err, please remove the #if 0 section
> I kept it because if something breaks later (hopefully not!), it will
> be easier to fix.
> I will remove it.
>>
>> +
>> +#include "tree-core.h"
>>
>> Why?  The original comment says
>>
>> -/* For tree_fits_[su]hwi_p, tree_to_[su]hwi, fold_convert, size_binop,
>> -   ssize_int, TREE_CODE, TYPE_SIZE, int_size_in_bytes,*/
>>
>> but all those are declared in tree.h.  Which means the files including
>> expr.h must already include tree.h.
>>
>> If that's not the reason we need to include tree-core.h from expr.c
>> please add a comment explaining why.
> bt-load.c fails to compile because it includes expr.h but does not
> include tree.h
> I will place tree.h include in all files that include expr.h and rebuild.
This is not going to work, since tree.h is now flattened. Shall also
require including all headers required by
tree.h in all files that include expr.h. Could we retain tree-core.h
in expr.h for now ?
Or should I insert tree.h (along with tree.h required includes) in all
files that include expr.h ?

Thanks,
Prathamesh
>>
>> -/* Definitions from emit-rtl.c */
>> -#include "emit-rtl.h"
>> -
>>  /* Return a memory reference like MEMREF, but with its mode widened to
>> MODE and adjusted by OFFSET.  */
>>  extern rtx widen_memory_access (rtx, machine_mode, HOST_WIDE_INT);
>>
>> err - functions defined in emit-rtl.c should be declared in emit-rtl.h.
>> Please fix that first.  expr.h should _only_ contain prototypes
>> for stuff defined in expr.c.
> oops, missed it :(
>>
>> Andrew did a good job with this, first cleaning up a header moving
>> declarations to proper places and only after that flattening it.
>>
>> The rest of the patch looks good to me but expr.h isn't in a good
>> shape after it.
> I will work on it and send patch with suggested changes by tomorrow.
>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Richard.


Re: [PATCH] Don't set TREE_READONLY on dummy args with VALUE attr (PR fortran/64528)

2015-01-13 Thread Tobias Burnus
Jakub Jelinek wrote:
> With VALUE attr, the PARM_DECLs hold the values and thus are (usually) not
> read-only, therefore telling the middle-end they are read-only leads to
> invalid IL.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK. Thanks for the patch. I haven't checked whether it also applies to
4.8/4.9, if so, the patch is also okay for those branches.

Tobias


[PATCH] Fix ICE with -fgnu-tm and pragma ivdep (PR middle-end/64391)

2015-01-13 Thread Marek Polacek
We ICE on this testcase, because the usage of #pragma GCC ivdep
pulls in the ANNOTATE internal functions which don't have underlying
fndecls, hence we segv on a NULL_TREE.  This patch makes get_attrs_for
be prepared for such a scenario.  The callers of get_attrs_for already
check for NULL_TREE.  I don't think internal fns can have transaction_*
attributes anyway.  While at it, I did some cleanups.

Bootstrapped/regtested on {ppc64,x86_64}-linux, ok for trunk?

2015-01-13  Marek Polacek  

PR middle-end/64391
* trans-mem.c (get_attrs_for): Return NULL_TREE if X is NULL_TREE.

* gcc.dg/tm/pr64391.c: New test.

diff --git gcc/testsuite/gcc.dg/tm/pr64391.c gcc/testsuite/gcc.dg/tm/pr64391.c
index e69de29..235118a 100644
--- gcc/testsuite/gcc.dg/tm/pr64391.c
+++ gcc/testsuite/gcc.dg/tm/pr64391.c
@@ -0,0 +1,10 @@
+/* PR middle-end/64391 */
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm" } */
+
+void
+foo (void)
+{
+#pragma GCC ivdep
+  while (1);
+}
diff --git gcc/trans-mem.c gcc/trans-mem.c
index b449760..21fa497 100644
--- gcc/trans-mem.c
+++ gcc/trans-mem.c
@@ -183,6 +183,9 @@ static void *expand_regions (struct tm_region *,
 static tree
 get_attrs_for (const_tree x)
 {
+  if (x == NULL_TREE)
+return NULL_TREE;
+
   switch (TREE_CODE (x))
 {
 case FUNCTION_DECL:
@@ -191,16 +194,16 @@ get_attrs_for (const_tree x)
 
 default:
   if (TYPE_P (x))
-   return NULL;
+   return NULL_TREE;
   x = TREE_TYPE (x);
   if (TREE_CODE (x) != POINTER_TYPE)
-   return NULL;
+   return NULL_TREE;
   /* FALLTHRU */
 
 case POINTER_TYPE:
   x = TREE_TYPE (x);
   if (TREE_CODE (x) != FUNCTION_TYPE && TREE_CODE (x) != METHOD_TYPE)
-   return NULL;
+   return NULL_TREE;
   /* FALLTHRU */
 
 case FUNCTION_TYPE:

Marek


[PATCH] Don't set TREE_READONLY on dummy args with VALUE attr (PR fortran/64528)

2015-01-13 Thread Jakub Jelinek
Hi!

With VALUE attr, the PARM_DECLs hold the values and thus are (usually) not
read-only, therefore telling the middle-end they are read-only leads to
invalid IL.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2015-01-13  Jakub Jelinek  

PR fortran/64528
* trans-decl.c (create_function_arglist): Don't set TREE_READONLY
on dummy args with VALUE attribute.

* gfortran.dg/pr64528.f90: New test.

--- gcc/fortran/trans-decl.c.jj 2015-01-09 21:59:47.0 +0100
+++ gcc/fortran/trans-decl.c2015-01-13 14:24:22.342682352 +0100
@@ -2327,8 +2327,9 @@ create_function_arglist (gfc_symbol * sy
   /* Fill in arg stuff.  */
   DECL_CONTEXT (parm) = fndecl;
   DECL_ARG_TYPE (parm) = TREE_VALUE (typelist);
-  /* All implementation args are read-only.  */
-  TREE_READONLY (parm) = 1;
+  /* All implementation args except for VALUE are read-only.  */
+  if (!f->sym->attr.value)
+   TREE_READONLY (parm) = 1;
   if (POINTER_TYPE_P (type)
  && (!f->sym->attr.proc_pointer
  && f->sym->attr.flavor != FL_PROCEDURE))
--- gcc/testsuite/gfortran.dg/pr64528.f90.jj2015-01-13 14:27:13.475650977 
+0100
+++ gcc/testsuite/gfortran.dg/pr64528.f90   2015-01-13 14:26:46.0 
+0100
@@ -0,0 +1,20 @@
+! PR fortran/64528
+! { dg-do compile }
+! { dg-options "-O -fno-tree-dce -fno-tree-ccp" }
+
+program pr64528
+  interface
+ subroutine foo(x)
+   integer, value :: x
+ end subroutine foo
+  end interface
+  integer :: x
+  x = 10
+  call foo(x)
+  if(x .ne. 10) then
+  endif
+end program pr64528
+subroutine foo(x)
+  integer, value :: x
+  x = 11
+end subroutine foo

Jakub


Re: [PATCH] IPA ICF: add comparison for target and optimization nodes

2015-01-13 Thread Jan Hubicka
> Hello.
> 
> Following patch adds support for target and optimization nodes comparison, 
> which is
> based on Honza's newly added infrastructure.
> 
> Apart from that, there's a small hunk that corrects formatting and removes 
> unnecessary
> call to a comparison function.
> 
> Hope it can be applied as one patch.
> 
> Tested on x86_64-linux-pc without any new regression introduction.
> 
> Ready for trunk?

OK, thanks!
Honza
> 
> Thank you,
> Martin

> >From 393eaa47c8aef9a91a1c635016f23ca2f5aa25e4 Mon Sep 17 00:00:00 2001
> From: mliska 
> Date: Tue, 6 Jan 2015 15:06:18 +0100
> Subject: [PATCH] IPA ICF: target and optimization flags comparison.
> 
> gcc/ChangeLog:
> 
> 2015-01-06  Martin Liska  
> 
>   * cgraphunit.c (cgraph_node::create_wrapper): Fix level of indentation.
>   * ipa-icf.c (sem_function::equals_private): Add support for target and
>   (sem_item_optimizer::merge_classes): Remove redundant function
>   comparison.
>   optimization flags comparison.
>   * tree.h (target_opts_for_fn): New function.
> ---
>  gcc/cgraphunit.c | 52 ++--
>  gcc/ipa-icf.c| 44 +++-
>  gcc/tree.h   | 10 ++
>  3 files changed, 79 insertions(+), 27 deletions(-)
> 
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index c8c8562..81246e2 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -2385,40 +2385,40 @@ cgraphunit_c_finalize (void)
>  void
>  cgraph_node::create_wrapper (cgraph_node *target)
>  {
> -/* Preserve DECL_RESULT so we get right by reference flag.  */
> -tree decl_result = DECL_RESULT (decl);
> +  /* Preserve DECL_RESULT so we get right by reference flag.  */
> +  tree decl_result = DECL_RESULT (decl);
>  
> -/* Remove the function's body but keep arguments to be reused
> -   for thunk.  */
> -release_body (true);
> -reset ();
> +  /* Remove the function's body but keep arguments to be reused
> + for thunk.  */
> +  release_body (true);
> +  reset ();
>  
> -DECL_RESULT (decl) = decl_result;
> -DECL_INITIAL (decl) = NULL;
> -allocate_struct_function (decl, false);
> -set_cfun (NULL);
> +  DECL_RESULT (decl) = decl_result;
> +  DECL_INITIAL (decl) = NULL;
> +  allocate_struct_function (decl, false);
> +  set_cfun (NULL);
>  
> -/* Turn alias into thunk and expand it into GIMPLE representation.  */
> -definition = true;
> -thunk.thunk_p = true;
> -thunk.this_adjusting = false;
> +  /* Turn alias into thunk and expand it into GIMPLE representation.  */
> +  definition = true;
> +  thunk.thunk_p = true;
> +  thunk.this_adjusting = false;
>  
> -cgraph_edge *e = create_edge (target, NULL, 0, CGRAPH_FREQ_BASE);
> +  cgraph_edge *e = create_edge (target, NULL, 0, CGRAPH_FREQ_BASE);
>  
> -tree arguments = DECL_ARGUMENTS (decl);
> +  tree arguments = DECL_ARGUMENTS (decl);
>  
> -while (arguments)
> -  {
> - TREE_ADDRESSABLE (arguments) = false;
> - arguments = TREE_CHAIN (arguments);
> -  }
> +  while (arguments)
> +{
> +  TREE_ADDRESSABLE (arguments) = false;
> +  arguments = TREE_CHAIN (arguments);
> +}
>  
> -expand_thunk (false, true);
> -e->call_stmt_cannot_inline_p = true;
> +  expand_thunk (false, true);
> +  e->call_stmt_cannot_inline_p = true;
>  
> -/* Inline summary set-up.  */
> -analyze ();
> -inline_analyze_function (this);
> +  /* Inline summary set-up.  */
> +  analyze ();
> +  inline_analyze_function (this);
>  }
>  
>  #include "gt-cgraphunit.h"
> diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
> index c7ba75a..28158b3 100644
> --- a/gcc/ipa-icf.c
> +++ b/gcc/ipa-icf.c
> @@ -427,6 +427,49 @@ sem_function::equals_private (sem_item *item,
>if (!equals_wpa (item, ignored_nodes))
>  return false;
>  
> +  /* Checking function TARGET and OPTIMIZATION flags.  */
> +  cl_target_option *tar1 = target_opts_for_fn (decl);
> +  cl_target_option *tar2 = target_opts_for_fn (m_compared_func->decl);
> +
> +  if (tar1 != NULL || tar2 != NULL)
> +{
> +  if (!cl_target_option_eq (tar1, tar2))
> + {
> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> +   fprintf (dump_file, "Source target flags\n");
> +   cl_target_option_print (dump_file, 2, tar1);
> +   fprintf (dump_file, "Target target flags\n");
> +   cl_target_option_print (dump_file, 2, tar2);
> + }
> +
> +   return return_false_with_msg ("Target flags are different");
> + }
> +}
> +  else if (tar1 != NULL || tar2 != NULL)
> +return return_false_with_msg ("Target flags are different");
> +
> +  cl_optimization *opt1 = opts_for_fn (decl);
> +  cl_optimization *opt2 = opts_for_fn (m_compared_func->decl);
> +
> +  if (opt1 != NULL && opt2 != NULL)
> +{
> +  if (memcmp (opt1, opt2, sizeof(cl_optimization)))
> + {
> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> +   fpr

Re: Open Issues in the TSAN Runtime

2015-01-13 Thread Jakub Jelinek
On Mon, Jan 12, 2015 at 09:55:15PM +0100, Jakub Jelinek wrote:
> > specific changes from there.
> 
> Yes, I'll try to cherry-pick those tomorrow.
> 
> > I am especially interested in fixing these two issues, but there may be 
> > other important improvements too:
> > 
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64350 : TSAN fails after 
> > stress-testing for a while
> > 
> > was fixed by
> > 
> > http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/sanitizer_common/sanitizer_deadlock_detector.h?r1=224518&r2=224517&pathrev=224518
> > http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/sanitizer_common/sanitizer_deadlock_detector.h?r1=224519&r2=224518&pathrev=224519
> > 
> > 
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63251 : tsan: corrupted shadow 
> > stack
> > 
> > was fixed by
> > 
> > http://llvm.org/viewvc/llvm-project?view=revision&revision=224702
> > http://llvm.org/viewvc/llvm-project?view=revision&revision=224834

So do you mean following?  I've bootstrapped/regtested on x86_64-linux and
i686-linux, but haven't tried any special testcases on it.  If it works for
you, I'll commit it.

2015-01-13  Jakub Jelinek  

* sanitizer_common/sanitizer_deadlock_detector.h: Cherry pick
upstream r224518 and r224519.
* tsan/tsan_rtl_thread.cc: Cherry pick upstream r224702 and
r224834.

--- libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h.jj  
2014-05-22 10:18:14.893626024 +0200
+++ libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h 2015-01-13 
14:07:40.096414924 +0100
@@ -48,6 +48,8 @@ class DeadlockDetectorTLS {
 if (epoch_ == current_epoch) return;
 bv_.clear();
 epoch_ = current_epoch;
+n_recursive_locks = 0;
+n_all_locks_ = 0;
   }
 
   uptr getEpoch() const { return epoch_; }
@@ -81,7 +83,8 @@ class DeadlockDetectorTLS {
   }
 }
 // Printf("remLock: %zx %zx\n", lock_id, epoch_);
-CHECK(bv_.clearBit(lock_id));
+if (!bv_.clearBit(lock_id))
+  return;  // probably addLock happened before flush
 if (n_all_locks_) {
   for (sptr i = n_all_locks_ - 1; i >= 0; i--) {
 if (all_locks_with_contexts_[i].lock == static_cast(lock_id)) {
@@ -173,6 +176,7 @@ class DeadlockDetector {
 recycled_nodes_.clear();
 available_nodes_.setAll();
 g_.clear();
+n_edges_ = 0;
 return getAvailableNode(data);
   }
 
--- libsanitizer/tsan/tsan_rtl_thread.cc.jj 2014-09-24 11:08:03.824028080 
+0200
+++ libsanitizer/tsan/tsan_rtl_thread.cc2015-01-13 14:08:06.167954292 
+0100
@@ -109,12 +109,13 @@ void ThreadContext::OnStarted(void *arg)
 thr->dd_pt = ctx->dd->CreatePhysicalThread();
 thr->dd_lt = ctx->dd->CreateLogicalThread(unique_id);
   }
+  thr->fast_state.SetHistorySize(flags()->history_size);
+  // Commit switch to the new part of the trace.
+  // TraceAddEvent will reset stack0/mset0 in the new part for us.
+  TraceAddEvent(thr, thr->fast_state, EventTypeMop, 0);
+
   thr->fast_synch_epoch = epoch0;
   AcquireImpl(thr, 0, &sync);
-  thr->fast_state.SetHistorySize(flags()->history_size);
-  const uptr trace = (epoch0 / kTracePartSize) % TraceParts();
-  Trace *thr_trace = ThreadTrace(thr->tid);
-  thr_trace->headers[trace].epoch0 = epoch0;
   StatInc(thr, StatSyncAcquire);
   sync.Reset(&thr->clock_cache);
   DPrintf("#%d: ThreadStart epoch=%zu stk_addr=%zx stk_size=%zx "


Jakub


[PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)

2015-01-13 Thread Jakub Jelinek
Hi!

My PR60663 fix unfortunately stopped CSE of all inline-asms, even when
they e.g. only have the clobbers added by default.

This patch attempts to restore the old behavior, with the exceptions:
1) as always, asm volatile is not CSEd
2) inline-asm with multiple outputs are not CSEd
3) on request from Richard (which Segher on IRC argues against), "memory"
   clobber also prevents CSE; this can be removed by removing the
   int j, lim = XVECLEN (x, 0); and loop below it
4) inline-asm with clobbers is never copied into an insn that wasn't
   inline-asm before, so if there are clobbers, we allow CSEing of
   e.g. two same inline-asms, but only by reusing results of one
   of those

Bootstrapped/regtested on x86_64-linux and i686-linux, tested also
with arm cross after reverting the PR60663 arm cost fix.

Ok for trunk this way, or with 3) removed?

2015-01-13  Jakub Jelinek  

PR rtl-optimization/63637
PR rtl-optimization/60663
* cse.c (merge_equiv_classes): Set new_elt->cost to MAX_COST
if elt->cost is MAX_COST for ASM_OPERANDS.
(find_sets_in_insn): Fix up comment typo.
(cse_insn): Don't set src_volatile for all non-volatile
ASM_OPERANDS in PARALLELs, but just those with multiple outputs
or with "memory" clobber.  Set elt->cost to MAX_COST
for ASM_OPERANDS in PARALLEL.  Set src_elt->cost to MAX_COST
if new_src is ASM_OPERANDS and elt->cost is MAX_COST.

* gcc.dg/pr63637-1.c: New test.
* gcc.dg/pr63637-2.c: New test.
* gcc.dg/pr63637-3.c: New test.
* gcc.dg/pr63637-4.c: New test.
* gcc.dg/pr63637-5.c: New test.
* gcc.dg/pr63637-6.c: New test.
* gcc.target/i386/pr63637-1.c: New test.
* gcc.target/i386/pr63637-2.c: New test.
* gcc.target/i386/pr63637-3.c: New test.
* gcc.target/i386/pr63637-4.c: New test.
* gcc.target/i386/pr63637-5.c: New test.
* gcc.target/i386/pr63637-6.c: New test.

--- gcc/cse.c.jj2015-01-09 21:59:44.0 +0100
+++ gcc/cse.c   2015-01-13 13:26:23.391216064 +0100
@@ -1792,6 +1792,8 @@ merge_equiv_classes (struct table_elt *c
}
  new_elt = insert (exp, class1, hash, mode);
  new_elt->in_memory = hash_arg_in_memory;
+ if (GET_CODE (exp) == ASM_OPERANDS && elt->cost == MAX_COST)
+   new_elt->cost = MAX_COST;
}
 }
 }
@@ -4258,7 +4260,7 @@ find_sets_in_insn (rtx_insn *insn, struc
 {
   int i, lim = XVECLEN (x, 0);
 
-  /* Go over the epressions of the PARALLEL in forward order, to
+  /* Go over the expressions of the PARALLEL in forward order, to
 put them in the same order in the SETS array.  */
   for (i = 0; i < lim; i++)
{
@@ -4634,12 +4636,27 @@ cse_insn (rtx_insn *insn)
  && REGNO (dest) >= FIRST_PSEUDO_REGISTER)
sets[i].src_volatile = 1;
 
-  /* Also do not record result of a non-volatile inline asm with
-more than one result or with clobbers, we do not want CSE to
-break the inline asm apart.  */
   else if (GET_CODE (src) == ASM_OPERANDS
   && GET_CODE (x) == PARALLEL)
-   sets[i].src_volatile = 1;
+   {
+ /* Do not record result of a non-volatile inline asm with
+more than one result.  */
+ if (n_sets > 1)
+   sets[i].src_volatile = 1;
+
+ int j, lim = XVECLEN (x, 0);
+ for (j = 0; j < lim; j++)
+   {
+ rtx y = XVECEXP (x, 0, j);
+ /* And do not record result of a non-volatile inline asm
+with "memory" clobber.  */
+ if (GET_CODE (y) == CLOBBER && MEM_P (XEXP (y, 0)))
+   {
+ sets[i].src_volatile = 1;
+ break;
+   }
+   }
+   }
 
 #if 0
   /* It is no longer clear why we used to do this, but it doesn't
@@ -5230,8 +5247,8 @@ cse_insn (rtx_insn *insn)
;
 
  /* Look for a substitution that makes a valid insn.  */
- else if (validate_unshare_change
-(insn, &SET_SRC (sets[i].rtl), trial, 0))
+ else if (validate_unshare_change (insn, &SET_SRC (sets[i].rtl),
+   trial, 0))
{
  rtx new_rtx = canon_reg (SET_SRC (sets[i].rtl), insn);
 
@@ -5593,6 +5610,12 @@ cse_insn (rtx_insn *insn)
  }
elt = insert (src, classp, sets[i].src_hash, mode);
elt->in_memory = sets[i].src_in_memory;
+   /* If inline asm has any clobbers, ensure we only reuse
+  existing inline asms and never try to put the ASM_OPERANDS
+  into an insn that isn't inline asm.  */
+   if (GET_CODE (src) == ASM_OPERANDS
+   && GET_CODE (x) == PARALLEL)
+ elt->cost = MAX_COST;
sets[i].src_elt = classp = elt;
  }
if (sets[i].src_cons

[PATCH] Fix REE for vector modes (PR rtl-optimization/64286, take 2)

2015-01-13 Thread Jakub Jelinek
On Mon, Jan 12, 2015 at 02:29:53PM -0700, Jeff Law wrote:
> On 01/12/15 12:59, Jakub Jelinek wrote:
> >Hi!
> >
> >As mentioned in the PR, giving up for all vector mode extensions
> >is unnecessary, but unlike scalar integer extensions, where the low part
> >of the extended value is the original value, for vectors this is not true,
> >thus the old value is lost.  Which means we can perform REE, but only if
> >all uses of the definition are the same (code+mode) extension.
> >
> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> >2015-01-12  Jakub Jelinek  
> >
> > PR rtl-optimization/64286
> > * ree.c (add_removable_extension): Don't add vector mode
> > extensions if all uses of the source register aren't the same
> > vector extensions.
> >
> > * gcc.target/i386/avx2-pr64286.c: New test.
> Does it make sense to remove your change for 59754 in combine_reaching_defs?
> Shouldn't this patch handle that case as well?

You're right, this patch handles that too.

New patch, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2015-01-13  Jakub Jelinek  

PR rtl-optimization/64286
* ree.c (combine_reaching_defs): Move part of comment earlier,
remove !SCALAR_INT_MODE_P check.
(add_removable_extension): Don't add vector mode
extensions if all uses of the source register aren't the same
vector extensions.

* gcc.target/i386/avx2-pr64286.c: New test.

--- gcc/ree.c.jj2015-01-12 21:29:07.023060045 +0100
+++ gcc/ree.c   2015-01-13 09:43:32.158449885 +0100
@@ -783,6 +783,17 @@ combine_reaching_defs (ext_cand *cand, c
!= REGNO (get_extended_src_reg (SET_SRC (PATTERN (cand->insn);
   if (copy_needed)
 {
+  /* Considering transformation of
+(set (reg1) (expression))
+...
+(set (reg2) (any_extend (reg1)))
+
+into
+
+(set (reg2) (any_extend (expression)))
+(set (reg1) (reg2))
+...  */
+
   /* In theory we could handle more than one reaching def, it
 just makes the code to update the insn stream more complex.  */
   if (state->defs_list.length () != 1)
@@ -798,18 +809,6 @@ combine_reaching_defs (ext_cand *cand, c
   if (state->modified[INSN_UID (cand->insn)].kind != EXT_MODIFIED_NONE)
return false;
 
-  /* Transformation of
-(set (reg1) (expression))
-(set (reg2) (any_extend (reg1)))
-into
-(set (reg2) (any_extend (expression)))
-(set (reg1) (reg2))
-is only valid for scalar integral modes, as it relies on the low
-subreg of reg1 to have the value of (expression), which is not true
-e.g. for vector modes.  */
-  if (!SCALAR_INT_MODE_P (GET_MODE (SET_DEST (PATTERN (cand->insn)
-   return false;
-
   machine_mode dst_mode = GET_MODE (SET_DEST (PATTERN (cand->insn)));
   rtx src_reg = get_extended_src_reg (SET_SRC (PATTERN (cand->insn)));
 
@@ -1027,6 +1026,7 @@ add_removable_extension (const_rtx expr,
 different extension.  FIXME: this obviously can be improved.  */
   for (def = defs; def; def = def->next)
if ((idx = def_map[INSN_UID (DF_REF_INSN (def->ref))])
+   && idx != -1U
&& (cand = &(*insn_list)[idx - 1])
&& cand->code != code)
  {
@@ -1038,6 +1038,57 @@ add_removable_extension (const_rtx expr,
  }
return;
  }
+   /* For vector mode extensions, ensure that all uses of the
+  XEXP (src, 0) register are the same extension (both code
+  and to which mode), as unlike integral extensions lowpart
+  subreg of the sign/zero extended register are not equal
+  to the original register, so we have to change all uses or
+  none.  */
+   else if (VECTOR_MODE_P (GET_MODE (XEXP (src, 0
+ {
+   if (idx == 0)
+ {
+   struct df_link *ref_chain, *ref_link;
+
+   ref_chain = DF_REF_CHAIN (def->ref);
+   for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+ {
+   if (ref_link->ref == NULL
+   || DF_REF_INSN_INFO (ref_link->ref) == NULL)
+ {
+   idx = -1U;
+   break;
+ }
+   rtx_insn *use_insn = DF_REF_INSN (ref_link->ref);
+   const_rtx use_set;
+   if (use_insn == insn || DEBUG_INSN_P (use_insn))
+ continue;
+   if (!(use_set = single_set (use_insn))
+   || !REG_P (SET_DEST (use_set))
+   || GET_MODE (SET_DEST (use_set)) != GET_MODE (dest)
+   || GET_CODE (SET_SRC (use_set)) != code
+   || !rtx_equal_p (XEXP (SET_SRC (use_set), 0),
+XEXP (src, 0)))
+ {
+  

Re: [ARM]Make CLZ_DEFINED_VALUE_AT_ZERO and CTZ_DEFINED_VALUE_AT_ZERO return 2.

2015-01-13 Thread Richard Earnshaw
On 13/01/15 15:58, Renlin Li wrote:
> Hi all,
> 
> This patch update CLZ_DEFINED_VALUE_AT_ZERO and 
> CTZ_DEFINED_VALUE_AT_ZERO to make them return 2 in
> arm back-end.
> 
> Here are the explanations from GCC documentation:
> 
> CLZ_DEFINED_VALUE_AT_ZERO (mode, value)
> CTZ_DEFINED_VALUE_AT_ZERO (mode, value)
> A C expression that indicates whether the architecture defines a value
> for @code{clz} or @code{ctz} with a zero operand.
> A result of 0 indicates the value is undefined.
> If the value is defined for only the RTL expression, the macro should
> evaluate to 1; if the value applies also to the corresponding optab
> entry (which is normally the case if it expands directly into
> the corresponding RTL), then the macro should evaluate to 2.
> In the cases where the value is defined, @var{value} should be set to
> this value.
> 
> arm-none-eabi has been test on the model, no new issue.
> Okay for trunk?
> 
> gcc/ChangeLog:
> 
>  2015-01-13 Renlin Li 
> 
>  * config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO): Return 2.
>  (CTZ_DEFINED_VALUE_AT_ZERO): Ditto.
> 
> 

OK.

R.




[ARM]Make CLZ_DEFINED_VALUE_AT_ZERO and CTZ_DEFINED_VALUE_AT_ZERO return 2.

2015-01-13 Thread Renlin Li

Hi all,

This patch update CLZ_DEFINED_VALUE_AT_ZERO and 
CTZ_DEFINED_VALUE_AT_ZERO to make them return 2 in

arm back-end.

Here are the explanations from GCC documentation:

CLZ_DEFINED_VALUE_AT_ZERO (mode, value)
CTZ_DEFINED_VALUE_AT_ZERO (mode, value)
A C expression that indicates whether the architecture defines a value
for @code{clz} or @code{ctz} with a zero operand.
A result of 0 indicates the value is undefined.
If the value is defined for only the RTL expression, the macro should
evaluate to 1; if the value applies also to the corresponding optab
entry (which is normally the case if it expands directly into
the corresponding RTL), then the macro should evaluate to 2.
In the cases where the value is defined, @var{value} should be set to
this value.

arm-none-eabi has been test on the model, no new issue.
Okay for trunk?

gcc/ChangeLog:

2015-01-13 Renlin Li 

* config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO): Return 2.
(CTZ_DEFINED_VALUE_AT_ZERO): Ditto.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index d850982..83c9c33 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2145,9 +2145,9 @@ extern int making_const_table;
: reverse_condition (code))
 
 #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
-  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE))
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
 #define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
-  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE))
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
 
 #define CC_STATUS_INIT \
   do { cfun->machine->thumb1_cc_insn = NULL_RTX; } while (0)

Re: PATCH][ARM][4.8]Backport "Fix definition of __ARM_SIZEOF_WCHAR_T"

2015-01-13 Thread Richard Earnshaw
On 13/01/15 15:53, Renlin Li wrote:
> Hi all,
> 
> This is a backport patch for 
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=213864
> 
> arm-none-eabi regression test has been done, no new issues.
> Okay for branch 4.8?
> 
> gcc/ChangeLog
> Fix PR target/61413
> Backport from mainline.
> 
> 2014-08-12 Ramana Radhakrishnan 
> 
>  PR target/61413
>  * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Fix definition
>  of __ARM_SIZEOF_WCHAR_T.
> 
> 
> 
> backport.patch
> 
> 
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index c0f2184..2eb 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -74,8 +74,8 @@ extern char arm_arch_name[];
>   builtin_define_with_int_value ( \
> "__ARM_SIZEOF_MINIMAL_ENUM",  \
> flag_short_enums ? 1 : 4);\
> - builtin_define_with_int_value ( \
> -   "__ARM_SIZEOF_WCHAR_T", WCHAR_TYPE_SIZE); \
> + builtin_define_type_sizeof ("__ARM_SIZEOF_WCHAR_T", \
> + wchar_type_node);   \
>   if (TARGET_ARM_ARCH_PROFILE)\
> builtin_define_with_int_value (   \
>   "__ARM_ARCH_PROFILE", TARGET_ARM_ARCH_PROFILE); \
> 

OK.




PATCH][ARM][4.8]Backport "Fix definition of __ARM_SIZEOF_WCHAR_T"

2015-01-13 Thread Renlin Li

Hi all,

This is a backport patch for 
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=213864


arm-none-eabi regression test has been done, no new issues.
Okay for branch 4.8?

gcc/ChangeLog
Fix PR target/61413
Backport from mainline.

2014-08-12 Ramana Radhakrishnan 

PR target/61413
* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Fix definition
of __ARM_SIZEOF_WCHAR_T.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index c0f2184..2eb 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -74,8 +74,8 @@ extern char arm_arch_name[];
 	builtin_define_with_int_value (\
 	  "__ARM_SIZEOF_MINIMAL_ENUM",\
 	  flag_short_enums ? 1 : 4);\
-	builtin_define_with_int_value (\
-	  "__ARM_SIZEOF_WCHAR_T", WCHAR_TYPE_SIZE);		\
+	builtin_define_type_sizeof ("__ARM_SIZEOF_WCHAR_T",	\
+wchar_type_node);		\
 	if (TARGET_ARM_ARCH_PROFILE)\
 	  builtin_define_with_int_value (			\
 	"__ARM_ARCH_PROFILE", TARGET_ARM_ARCH_PROFILE);	\

Re: libffi is broken for x32

2015-01-13 Thread Richard Henderson
On 01/13/2015 07:35 AM, H.J. Lu wrote:
> Can I apply it to GCC trunk?

Please.


r~


[[ARM/AArch64][testsuite] 27/36] Add vmull_n tests.

2015-01-13 Thread Christophe Lyon
* gcc.target/aarch64/advsimd-intrinsics/vmull_n.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_n.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_n.c
new file mode 100644
index 000..df28a94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_n.c
@@ -0,0 +1,61 @@
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x11000, 0x11000, 0x11000, 0x11000 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x22000, 0x22000 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0x33000, 0x33000, 0x33000, 0x33000 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x44000, 0x44000 };
+
+#define INSN_NAME vmull
+#define TEST_MSG "VMULL_N"
+void exec_vmull_n (void)
+{
+  int i;
+
+  /* vector_res = vmull_n(vector,val), then store the result.  */
+#define TEST_VMULL_N1(INSN, T1, T2, W, W2, N, L)   \
+  VECT_VAR(vector_res, T1, W2, N) =\
+INSN##_n_##T2##W(VECT_VAR(vector, T1, W, N),   \
+L);\
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N))
+
+#define TEST_VMULL_N(INSN, T1, T2, W, W2, N, L)\
+  TEST_VMULL_N1(INSN, T1, T2, W, W2, N, L)
+
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector, uint, 16, 4);
+  DECL_VARIABLE(vector, uint, 32, 2);
+
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+  DECL_VARIABLE(vector_res, uint, 64, 2);
+
+  clean_results ();
+
+  /* Initialize vector.  */
+  VDUP(vector, , int, s, 16, 4, 0x1000);
+  VDUP(vector, , int, s, 32, 2, 0x1000);
+  VDUP(vector, , uint, u, 16, 4, 0x1000);
+  VDUP(vector, , uint, u, 32, 2, 0x1000);
+
+  /* Choose multiplier arbitrarily.  */
+  TEST_VMULL_N(INSN_NAME, int, s, 16, 32, 4, 0x11);
+  TEST_VMULL_N(INSN_NAME, int, s, 32, 64, 2, 0x22);
+  TEST_VMULL_N(INSN_NAME, uint, u, 16, 32, 4, 0x33);
+  TEST_VMULL_N(INSN_NAME, uint, u, 32, 64, 2, 0x44);
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, "");
+  CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 64, 2, PRIx64, expected, "");
+}
+
+int main (void)
+{
+  exec_vmull_n ();
+  return 0;
+}
-- 
2.1.0



Re: libffi is broken for x32

2015-01-13 Thread H.J. Lu
On Mon, Jan 12, 2015 at 5:13 PM, Richard Henderson  wrote:
> On 01/12/2015 04:57 PM, H.J. Lu wrote:
>> The problem is my x86_64-*-linux-gnux32 patch
>>
>> https://gcc.gnu.org/ml/gcc-patches/2012-08/msg01083.html
>>
>> was never accepted upstream.  Can I apply it to config.guess
>> in GCC?
>
> Ah.  Hmm.  Perhaps the configure.host patch would be better after all.
>

Can I apply it to GCC trunk?

Thanks.

-- 
H.J.


Re: [PATCH] PR59448 - Promote consume to acquire

2015-01-13 Thread Andrew MacLeod

On 01/13/2015 10:20 AM, Torvald Riegel wrote:

On Tue, 2015-01-13 at 09:56 -0500, Andrew MacLeod wrote:

The problem with the patch in the PR is the  memory model is immediately
promoted from consume to acquire.   This happens *before* any of the
memmodel checks are made.  If a consume is illegally specified (such as
in a compare_exchange), it gets promoted to acquire and the compiler
doesn't report the error because it never sees the consume.

The only issue I can think of in compare_exchange is if the program
specifies memory_order_consume for the success path but
memory_order_acquire for the failure path, which is disallowed by the
standard.

However, I don't see a reason why the standard's requirement is anything
but a performance check in our particular case.  The only case we
prevent the compiler from reporting is a consume-on-success /
acquire-on-failure combination.  But we upgrade the former to acquire,
so we can't even cause libatomic (or similar) to issue too weak barriers
due to libatomic relying on the standard's requirement.

Thus, if there's no easy way to upgrade to acquire after the sanity
checks, I think this isn't critical enough to hold up the patch from
being committed.  memory_order_consume is clearly a feature for experts.

The error was actually in exchange... not compare_exchange like I wrote. 
  and causes a testsuite error that specifically tests for an illegal 
consume.


Andrew


Re: [patch RFA libffi SH] Fix configure error for sh4-unknown-linux-gnu

2015-01-13 Thread Richard Henderson
On 01/12/2015 09:38 PM, Kaz Kojima wrote:
> 2015-01-13  Kaz Kojima  
> 
>   * configure.host: Remove extra brackets for sh.

Ok, thanks.


r~


[[ARM/AArch64][testsuite] 08/36] Add vtrn tests. Refactor vzup and vzip tests.

2015-01-13 Thread Christophe Lyon

* gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vtrn.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vuzp.c: Use code from
vshuffle.inc.
* gcc.target/aarch64/advsimd-intrinsics/vzip.c: Use code from
vshuffle.inc.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc
new file mode 100644
index 000..928f338
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc
@@ -0,0 +1,139 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* In this case, output variables are arrays of vectors.  */
+#define DECL_VSHUFFLE(T1, W, N)
\
+  VECT_ARRAY_TYPE(T1, W, N, 2) VECT_ARRAY_VAR(result_vec, T1, W, N, 2);
\
+  VECT_VAR_DECL(result_bis, T1, W, N)[2 * N]
+
+  /* We need to use a temporary result buffer (result_bis), because
+ the one used for other tests is not large enough. A subset of the
+ result data is moved from result_bis to result, and it is this
+ subset which is used to check the actual behaviour. The next
+ macro enables to move another chunk of data from result_bis to
+ result.  */
+#define TEST_VSHUFFLE(INSN, Q, T1, T2, W, N)   \
+  VECT_ARRAY_VAR(result_vec, T1, W, N, 2) =\
+INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \
+ VECT_VAR(vector2, T1, W, N)); \
+  vst2##Q##_##T2##W(VECT_VAR(result_bis, T1, W, N),\
+   VECT_ARRAY_VAR(result_vec, T1, W, N, 2));   \
+  memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(result_bis, T1, W, N),   \
+sizeof(VECT_VAR(result, T1, W, N)));
+
+  /* Overwrite "result" with the contents of "result_bis"[X].  */
+#define TEST_EXTRA_CHUNK(T1, W, N, X)  \
+  memcpy(VECT_VAR(result, T1, W, N), &(VECT_VAR(result_bis, T1, W, N)[X*N]), \
+sizeof(VECT_VAR(result, T1, W, N)));
+
+  DECL_VARIABLE_ALL_VARIANTS(vector1);
+  DECL_VARIABLE_ALL_VARIANTS(vector2);
+
+  /* We don't need 64 bits variants.  */
+#define DECL_ALL_VSHUFFLE()\
+  DECL_VSHUFFLE(int, 8, 8);\
+  DECL_VSHUFFLE(int, 16, 4);   \
+  DECL_VSHUFFLE(int, 32, 2);   \
+  DECL_VSHUFFLE(uint, 8, 8);   \
+  DECL_VSHUFFLE(uint, 16, 4);  \
+  DECL_VSHUFFLE(uint, 32, 2);  \
+  DECL_VSHUFFLE(poly, 8, 8);   \
+  DECL_VSHUFFLE(poly, 16, 4);  \
+  DECL_VSHUFFLE(float, 32, 2); \
+  DECL_VSHUFFLE(int, 8, 16);   \
+  DECL_VSHUFFLE(int, 16, 8);   \
+  DECL_VSHUFFLE(int, 32, 4);   \
+  DECL_VSHUFFLE(uint, 8, 16);  \
+  DECL_VSHUFFLE(uint, 16, 8);  \
+  DECL_VSHUFFLE(uint, 32, 4);  \
+  DECL_VSHUFFLE(poly, 8, 16);  \
+  DECL_VSHUFFLE(poly, 16, 8);  \
+  DECL_VSHUFFLE(float, 32, 4)
+
+  DECL_ALL_VSHUFFLE();
+
+  /* Initialize input "vector" from "buffer".  */
+  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector1, buffer);
+  VLOAD(vector1, buffer, , float, f, 32, 2);
+  VLOAD(vector1, buffer, q, float, f, 32, 4);
+
+  /* Choose arbitrary initialization values.  */
+  VDUP(vector2, , int, s, 8, 8, 0x11);
+  VDUP(vector2, , int, s, 16, 4, 0x22);
+  VDUP(vector2, , int, s, 32, 2, 0x33);
+  VDUP(vector2, , uint, u, 8, 8, 0x55);
+  VDUP(vector2, , uint, u, 16, 4, 0x66);
+  VDUP(vector2, , uint, u, 32, 2, 0x77);
+  VDUP(vector2, , poly, p, 8, 8, 0x55);
+  VDUP(vector2, , poly, p, 16, 4, 0x66);
+  VDUP(vector2, , float, f, 32, 2, 33.6f);
+
+  VDUP(vector2, q, int, s, 8, 16, 0x11);
+  VDUP(vector2, q, int, s, 16, 8, 0x22);
+  VDUP(vector2, q, int, s, 32, 4, 0x33);
+  VDUP(vector2, q, uint, u, 8, 16, 0x55);
+  VDUP(vector2, q, uint, u, 16, 8, 0x66);
+  VDUP(vector2, q, uint, u, 32, 4, 0x77);
+  VDUP(vector2, q, poly, p, 8, 16, 0x55);
+  VDUP(vector2, q, poly, p, 16, 8, 0x66);
+  VDUP(vector2, q, float, f, 32, 4, 33.8f);
+  
+#define TEST_ALL_VSHUFFLE(INSN)\
+  TEST_VSHUFFLE(INSN, , int, s, 8, 8); \
+  TEST_VSHUFFLE(INSN, , int, s, 16, 4);\
+  TEST_VSHUFFLE(INSN, , int, s, 32, 2);\
+  TEST_VSHUFFLE(INSN, , uint, u, 8, 8);\
+  TEST_VSHUFFLE(INSN, , uint, u, 16, 4);   \
+  TEST_VSHUFFLE(INSN, , uint, u, 32, 2);   \
+  TEST_VSHUFFLE(INSN, , poly, p, 8, 8);\
+  TEST_VSHUFFLE(INSN, , poly, p, 16, 4); 

[[ARM/AArch64][testsuite] 06/36] Add vmla and vmls tests.

2015-01-13 Thread Christophe Lyon

* gcc.target/aarch64/advsimd-intrinsics/vmlX.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vmla.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vmls.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlX.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlX.inc
new file mode 100644
index 000..1c8f1be
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlX.inc
@@ -0,0 +1,110 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+#define DECL_VMLX(T, W, N) \
+  DECL_VARIABLE(vector1, T, W, N); \
+  DECL_VARIABLE(vector2, T, W, N); \
+  DECL_VARIABLE(vector3, T, W, N); \
+  DECL_VARIABLE(vector_res, T, W, N)
+
+  /* vector_res = vmla(vector, vector3, vector4),
+ then store the result.  */
+#define TEST_VMLX1(INSN, Q, T1, T2, W, N)  \
+  VECT_VAR(vector_res, T1, W, N) =  \
+INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \
+ VECT_VAR(vector2, T1, W, N),  \
+ VECT_VAR(vector3, T1, W, N)); \
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \
+   VECT_VAR(vector_res, T1, W, N))
+
+#define TEST_VMLX(INSN, Q, T1, T2, W, N)   \
+  TEST_VMLX1(INSN, Q, T1, T2, W, N)
+
+  DECL_VMLX(int, 8, 8);
+  DECL_VMLX(int, 16, 4);
+  DECL_VMLX(int, 32, 2);
+  DECL_VMLX(uint, 8, 8);
+  DECL_VMLX(uint, 16, 4);
+  DECL_VMLX(uint, 32, 2);
+  DECL_VMLX(float, 32, 2);
+  DECL_VMLX(int, 8, 16);
+  DECL_VMLX(int, 16, 8);
+  DECL_VMLX(int, 32, 4);
+  DECL_VMLX(uint, 8, 16);
+  DECL_VMLX(uint, 16, 8);
+  DECL_VMLX(uint, 32, 4);
+  DECL_VMLX(float, 32, 4);
+
+  clean_results ();
+
+  VLOAD(vector1, buffer, , int, s, 8, 8);
+  VLOAD(vector1, buffer, , int, s, 16, 4);
+  VLOAD(vector1, buffer, , int, s, 32, 2);
+  VLOAD(vector1, buffer, , uint, u, 8, 8);
+  VLOAD(vector1, buffer, , uint, u, 16, 4);
+  VLOAD(vector1, buffer, , uint, u, 32, 2);
+  VLOAD(vector1, buffer, , float, f, 32, 2);
+  VLOAD(vector1, buffer, q, int, s, 8, 16);
+  VLOAD(vector1, buffer, q, int, s, 16, 8);
+  VLOAD(vector1, buffer, q, int, s, 32, 4);
+  VLOAD(vector1, buffer, q, uint, u, 8, 16);
+  VLOAD(vector1, buffer, q, uint, u, 16, 8);
+  VLOAD(vector1, buffer, q, uint, u, 32, 4);
+  VLOAD(vector1, buffer, q, float, f, 32, 4);
+
+  VDUP(vector2, , int, s, 8, 8, 0x11);
+  VDUP(vector2, , int, s, 16, 4, 0x22);
+  VDUP(vector2, , int, s, 32, 2, 0x33);
+  VDUP(vector2, , uint, u, 8, 8, 0x44);
+  VDUP(vector2, , uint, u, 16, 4, 0x55);
+  VDUP(vector2, , uint, u, 32, 2, 0x66);
+  VDUP(vector2, , float, f, 32, 2, 33.1f);
+  VDUP(vector2, q, int, s, 8, 16, 0x77);
+  VDUP(vector2, q, int, s, 16, 8, 0x88);
+  VDUP(vector2, q, int, s, 32, 4, 0x99);
+  VDUP(vector2, q, uint, u, 8, 16, 0xAA);
+  VDUP(vector2, q, uint, u, 16, 8, 0xBB);
+  VDUP(vector2, q, uint, u, 32, 4, 0xCC);
+  VDUP(vector2, q, float, f, 32, 4, 99.2f);
+
+  VDUP(vector3, , int, s, 8, 8, 0xFF);
+  VDUP(vector3, , int, s, 16, 4, 0xEE);
+  VDUP(vector3, , int, s, 32, 2, 0xDD);
+  VDUP(vector3, , uint, u, 8, 8, 0xCC);
+  VDUP(vector3, , uint, u, 16, 4, 0xBB);
+  VDUP(vector3, , uint, u, 32, 2, 0xAA);
+  VDUP(vector3, , float, f, 32, 2, 10.23f);
+  VDUP(vector3, q, int, s, 8, 16, 0x99);
+  VDUP(vector3, q, int, s, 16, 8, 0x88);
+  VDUP(vector3, q, int, s, 32, 4, 0x77);
+  VDUP(vector3, q, uint, u, 8, 16, 0x66);
+  VDUP(vector3, q, uint, u, 16, 8, 0x55);
+  VDUP(vector3, q, uint, u, 32, 4, 0x44);
+  VDUP(vector3, q, float, f, 32, 4, 77.8f);
+
+  TEST_VMLX(INSN_NAME, , int, s, 8, 8);
+  TEST_VMLX(INSN_NAME, , int, s, 16, 4);
+  TEST_VMLX(INSN_NAME, , int, s, 32, 2);
+  TEST_VMLX(INSN_NAME, , uint, u, 8, 8);
+  TEST_VMLX(INSN_NAME, , uint, u, 16, 4);
+  TEST_VMLX(INSN_NAME, , uint, u, 32, 2);
+  TEST_VMLX(INSN_NAME, , float, f, 32, 2);
+  TEST_VMLX(INSN_NAME, q, int, s, 8, 16);
+  TEST_VMLX(INSN_NAME, q, int, s, 16, 8);
+  TEST_VMLX(INSN_NAME, q, int, s, 32, 4);
+  TEST_VMLX(INSN_NAME, q, uint, u, 8, 16);
+  TEST_VMLX(INSN_NAME, q, uint, u, 16, 8);
+  TEST_VMLX(INSN_NAME, q, uint, u, 32, 4);
+  TEST_VMLX(INSN_NAME, q, float, f, 32, 4);
+
+  CHECK_RESULTS (TEST_MSG, "");
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmla.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmla.c
new file mode 100644
index 000..e3da60c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmla.c
@@ -0,0 +1,50 @@
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vmla
+#define TEST_MSG "VMLA"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xdf, 0xe0, 0xe1, 0xe2,
+  0xe3, 0xe4, 0xe5, 0xe6 };
+VECT_VAR_DECL(exp

Re: [PATCH] PR59448 - Promote consume to acquire

2015-01-13 Thread Torvald Riegel
On Tue, 2015-01-13 at 09:56 -0500, Andrew MacLeod wrote:
> The problem with the patch in the PR is the  memory model is immediately 
> promoted from consume to acquire.   This happens *before* any of the 
> memmodel checks are made.  If a consume is illegally specified (such as 
> in a compare_exchange), it gets promoted to acquire and the compiler 
> doesn't report the error because it never sees the consume.

The only issue I can think of in compare_exchange is if the program
specifies memory_order_consume for the success path but
memory_order_acquire for the failure path, which is disallowed by the
standard.

However, I don't see a reason why the standard's requirement is anything
but a performance check in our particular case.  The only case we
prevent the compiler from reporting is a consume-on-success /
acquire-on-failure combination.  But we upgrade the former to acquire,
so we can't even cause libatomic (or similar) to issue too weak barriers
due to libatomic relying on the standard's requirement.

Thus, if there's no easy way to upgrade to acquire after the sanity
checks, I think this isn't critical enough to hold up the patch from
being committed.  memory_order_consume is clearly a feature for experts.



[[ARM/AArch64][testsuite] 35/36] Add vqdmull_lane tests.

2015-01-13 Thread Christophe Lyon
* gcc.target/aarch64/advsimd-intrinsics/vqdmull_lane.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull_lane.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull_lane.c
new file mode 100644
index 000..12f2a6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull_lane.c
@@ -0,0 +1,94 @@
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR(expected_cumulative_sat,int,16,4) = 0;
+int VECT_VAR(expected_cumulative_sat,int,32,2) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x8000, 0x8000, 0x8000, 0x8000 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x4000, 0x4000 };
+
+/* Expected values of cumulative_saturation flag when saturation
+   occurs.  */
+int VECT_VAR(expected_cumulative_sat2,int,16,4) = 1;
+int VECT_VAR(expected_cumulative_sat2,int,32,2) = 1;
+
+/* Expected results when saturation occurs.  */
+VECT_VAR_DECL(expected2,int,32,4) [] = { 0x7fff, 0x7fff,
+0x7fff, 0x7fff };
+VECT_VAR_DECL(expected2,int,64,2) [] = { 0x7fff,
+0x7fff };
+
+#define INSN_NAME vqdmull
+#define TEST_MSG "VQDMULL_LANE"
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  int i;
+
+  /* vector_res = vqdmull_lane(vector,vector2,lane), then store the result.  */
+#define TEST_VQDMULL_LANE2(INSN, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, 
CMT) \
+  Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T1, W2, N)); \
+  VECT_VAR(vector_res, T1, W2, N) =\
+INSN##_lane_##T2##W(VECT_VAR(vector, T1, W, N),\
+   VECT_VAR(vector2, T1, W, N),\
+   L); \
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N),  \
+VECT_VAR(vector_res, T1, W2, N));  \
+  CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  /* Two auxliary macros are necessary to expand INSN.  */
+#define TEST_VQDMULL_LANE1(INSN, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, 
CMT) \
+  TEST_VQDMULL_LANE2(INSN, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT)
+
+#define TEST_VQDMULL_LANE(T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT) \
+  TEST_VQDMULL_LANE1(INSN_NAME, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, 
CMT)
+
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector2, int, 16, 4);
+  DECL_VARIABLE(vector2, int, 32, 2);
+
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+
+  clean_results ();
+
+  /* Initialize vector.  */
+  VDUP(vector, , int, s, 16, 4, 0x1000);
+  VDUP(vector, , int, s, 32, 2, 0x1000);
+
+  /* Initialize vector2.  */
+  VDUP(vector2, , int, s, 16, 4, 0x4);
+  VDUP(vector2, , int, s, 32, 2, 0x2);
+
+  /* Choose lane arbitrarily.  */
+  TEST_VQDMULL_LANE(int, s, 16, 32, 4, 2, expected_cumulative_sat, "");
+  TEST_VQDMULL_LANE(int, s, 32, 64, 2, 1, expected_cumulative_sat, "");
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, "");
+
+  VDUP(vector, , int, s, 16, 4, 0x8000);
+  VDUP(vector2, , int, s, 16, 4, 0x8000);
+  VDUP(vector, , int, s, 32, 2, 0x8000);
+  VDUP(vector2, , int, s, 32, 2, 0x8000);
+
+#define TEST_MSG2 "with saturation"
+  TEST_VQDMULL_LANE(int, s, 16, 32, 4, 2, expected_cumulative_sat2, TEST_MSG2);
+  TEST_VQDMULL_LANE(int, s, 32, 64, 2, 1, expected_cumulative_sat2, TEST_MSG2);
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected2, TEST_MSG2);
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected2, TEST_MSG2);
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
-- 
2.1.0



[[ARM/AArch64][testsuite] 34/36] Add vqdmull tests.

2015-01-13 Thread Christophe Lyon
* gcc.target/aarch64/advsimd-intrinsics/vqdmull.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull.c
new file mode 100644
index 000..e71a624
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull.c
@@ -0,0 +1,86 @@
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR(expected_cumulative_sat,int,16,4) = 0;
+int VECT_VAR(expected_cumulative_sat,int,32,2) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x200, 0x1c2, 0x188, 0x152 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x200, 0x1c2 };
+
+/* Expected values of cumulative_saturation flag when saturation
+   occurs.  */
+int VECT_VAR(expected_cumulative_sat2,int,16,4) = 1;
+int VECT_VAR(expected_cumulative_sat2,int,32,2) = 1;
+
+/* Expected results when saturation occurs.  */
+VECT_VAR_DECL(expected2,int,32,4) [] = { 0x7fff, 0x7fff,
+0x7fff, 0x7fff };
+VECT_VAR_DECL(expected2,int,64,2) [] = { 0x7fff,
+0x7fff };
+
+#define INSN_NAME vqdmull
+#define TEST_MSG "VQDMULL"
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y=vqdmull(x,x), then store the result.  */
+#define TEST_VQDMULL2(INSN, T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT)
\
+  Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T1, W2, N)); \
+  VECT_VAR(vector_res, T1, W2, N) =\
+INSN##_##T2##W(VECT_VAR(vector, T1, W, N), \
+  VECT_VAR(vector2, T1, W, N));\
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N),  \
+VECT_VAR(vector_res, T1, W2, N));  \
+  CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  /* Two auxliary macros are necessary to expand INSN.  */
+#define TEST_VQDMULL1(INSN, T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT)
\
+  TEST_VQDMULL2(INSN, T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+#define TEST_VQDMULL(T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT)   \
+  TEST_VQDMULL1(INSN_NAME, T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector2, int, 16, 4);
+  DECL_VARIABLE(vector2, int, 32, 2);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+
+  clean_results ();
+
+  VLOAD(vector, buffer, , int, s, 16, 4);
+  VLOAD(vector, buffer, , int, s, 32, 2);
+  VLOAD(vector2, buffer, , int, s, 16, 4);
+  VLOAD(vector2, buffer, , int, s, 32, 2);
+
+  TEST_VQDMULL(int, s, 16, 32, 4, expected_cumulative_sat, "");
+  TEST_VQDMULL(int, s, 32, 64, 2, expected_cumulative_sat, "");
+
+  CHECK (TEST_MSG, int, 32, 4, PRIx16, expected, "");
+  CHECK (TEST_MSG, int, 64, 2, PRIx32, expected, "");
+
+  VDUP(vector, , int, s, 16, 4, 0x8000);
+  VDUP(vector2, , int, s, 16, 4, 0x8000);
+  VDUP(vector, , int, s, 32, 2, 0x8000);
+  VDUP(vector2, , int, s, 32, 2, 0x8000);
+
+#define TEST_MSG2 "with saturation"
+  TEST_VQDMULL(int, s, 16, 32, 4, expected_cumulative_sat2, TEST_MSG2);
+  TEST_VQDMULL(int, s, 32, 64, 2, expected_cumulative_sat2, TEST_MSG2);
+
+  CHECK (TEST_MSG, int, 32, 4, PRIx16, expected2, TEST_MSG2);
+  CHECK (TEST_MSG, int, 64, 2, PRIx32, expected2, TEST_MSG2);
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
-- 
2.1.0



  1   2   >