Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression

2015-04-20 Thread Shiva Chen
Hi, Jeff

Thanks for your advice.

can_replace_by.patch is the new patch to handle both cases.

pr43920-2.c.244r.jump2.ori is the original jump2 RTL dump.

pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 RTL dump
after applying can_replace_by.patch.

Could you help me review the patch?

Thanks again.

Shiva

2015-04-18 0:03 GMT+08:00 Jeff Law :
> On 04/17/2015 03:57 AM, Shiva Chen wrote:
>>
>> Hi,
>>
>> I think the RTL dump in
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64916
>> is not the jump2 phase RTL dump.
>>
>> Because jump2 runs after IRA, the register numbers should be hard
>> register numbers.
>>
>> The jump2 RTL dump should be as follows:
>>
>> ...
>> 31: NOTE_INSN_BASIC_BLOCK 5
>> 32: [r6:SI]=r4:SI
>>REG_DEAD r6:SI
>>REG_DEAD r4:SI
>> 33: [r5:SI]=r0:SI
>>REG_DEAD r5:SI
>>REG_DEAD r0:SI
>>  7: r0:SI=0
>>REG_EQUAL 0
>> 85: use r0:SI
>> 86:
>> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];}
>>REG_UNUSED pc:SI
>>REG_UNUSED r3:SI
>>REG_CFA_RESTORE r7:SI
>>REG_CFA_RESTORE r6:SI
>>REG_CFA_RESTORE r5:SI
>>REG_CFA_RESTORE r4:SI
>>REG_CFA_RESTORE r3:SI
>> 77: barrier
>> 46: L46:
>> 45: NOTE_INSN_BASIC_BLOCK 6
>>  8: r0:SI=r4:SI
>>REG_DEAD r4:SI
>>REG_EQUAL 0x
>> 87: use r0:SI
>> 88:
>> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];}
>>REG_UNUSED pc:SI
>>REG_UNUSED r3:SI
>>REG_CFA_RESTORE r7:SI
>>REG_CFA_RESTORE r6:SI
>>REG_CFA_RESTORE r5:SI
>>REG_CFA_RESTORE r4:SI
>>REG_CFA_RESTORE r3:SI
>> 79: barrier
>> 54: L54:
>> 53: NOTE_INSN_BASIC_BLOCK 7
>>  9: r0:SI=0x <== lost REG_EQUAL after patch
>> 34: L34:
>> 35: NOTE_INSN_BASIC_BLOCK 8
>> 41: use r0:SI
>> 90:
>> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];}
>>REG_UNUSED pc:SI
>>REG_UNUSED r3:SI
>>REG_CFA_RESTORE r7:SI
>>REG_CFA_RESTORE r6:SI
>>REG_CFA_RESTORE r5:SI
>>REG_CFA_RESTORE r4:SI
>>REG_CFA_RESTORE r3:SI
>> 89: barrier
>
> Instead of the slim dump, can you please include the full RTL dump?  I find
> those much easier to read.
>
>
>
>>
>> Possible patch for can_replace_by in cfgcleanup.c:
>>
>> -  if (!note1 || !note2 || !rtx_equal_p (XEXP (note1, 0), XEXP (note2, 0))
>> -      || !CONST_INT_P (XEXP (note1, 0)))
>> +
>> +  if (!note1 || !CONST_INT_P (XEXP (note1, 0)))
>>      return dir_none;
>>
>> +  if (note2)
>> +    {
>> +      if (!rtx_equal_p (XEXP (note1, 0), XEXP (note2, 0)))
>> +        return dir_none;
>> +    }
>> +  else
>> +    {
>> +      if (!CONST_INT_P (SET_SRC (s2))
>> +          || !rtx_equal_p (XEXP (note1, 0), SET_SRC (s2)))
>> +        return dir_none;
>> +    }
>> +
>>
>> I'm not sure whether the idea is OK or whether it might crash something.
>> Any suggestion would be very helpful.
>
> Seems like you're on a reasonable path to me.  I suggest you stick with it.
>
> Basically what it appears you're trying to do is unify insns from different
> blocks where one looks like
>
> (set x y)  with an attached REG_EQUAL note
>
> And the other looks like
>
> (set x const_int)
>
> Where the REG_EQUAL note has the same value as the const_int in the second
> set.
>
> I think you'd want to handle both cases: i1 has the note and i2 has no note,
> and i1 has no note and i2 has a note.
>
> Jeff
>
> jeff


can_replace_by.patch
pr43920-2.c.244r.jump2.ori
pr43920-2.c.244r.jump2.patch_can_replace_by
Changelog.can_replace_by


Re: [Patch] pr65779 - [5/6 Regression] undefined local symbol on powerpc

2015-04-20 Thread Jakub Jelinek
On Mon, Apr 20, 2015 at 12:40:49PM +0930, Alan Modra wrote:
> with the log for the ubsan fails
> /src/gcc-5/gcc/testsuite/c-c++-common/ubsan/object-size-10.c:19:11: runtime error: index 128 out of bounds for type 'char [128]'
> /src/gcc-5/gcc/testsuite/c-c++-common/ubsan/object-size-10.c:19:11: runtime error: load of address 0x0804a000 with insufficient space for an object of type 'char'
> 0x0804a000: note: pointer points here
> 

The issue here is that libsanitizer wants to print some context around the
variable and doesn't try very hard, so if the variable is too close to the
end of the RW PT_LOAD segment, you get a different message from the expected one.
In your case, most likely the end of the array happens to be exactly at the
end of the PT_LOAD segment.

So, the fix is either to try harder in the ubsan renderMemorySnippet function
(it first computes the region it wishes to print, then has
  if (!IsAccessibleMemoryRange(Min, Max - Min)) {
    Printf("\n");
    return;
  }
).  Presumably, if the Min .. Max region crosses any page boundary, it
could lower the end (to the page boundary) and/or raise the start (to the
page boundary), and retry with that range if it is accessible.

Or we'd need to make the testcases that suffer from this accept also
the  in place of the memory content line, line
with ^ marker (don't remember if there is yet another one).
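The first option could look roughly like this. This is a minimal sketch that assumes 4 KiB pages and a single mapped window. `toy_accessible` is a hypothetical stand-in for libsanitizer's `IsAccessibleMemoryRange`, and `toy_pick_snippet_range` is not renderMemorySnippet's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size; real code would ask the system.  */
#define TOY_PAGE_SIZE ((uintptr_t) 4096)

/* Hypothetical stand-in for IsAccessibleMemoryRange: the only mapped
   memory is assumed to be the window [mapped_lo, mapped_hi).  */
static uintptr_t mapped_lo, mapped_hi;

static bool
toy_accessible (uintptr_t beg, uintptr_t size)
{
  return beg >= mapped_lo && beg + size <= mapped_hi;
}

/* Accept [LO, HI) if it is non-empty and accessible.  */
static bool
toy_try_range (uintptr_t lo, uintptr_t hi, uintptr_t *out_lo,
               uintptr_t *out_hi)
{
  if (lo >= hi || !toy_accessible (lo, hi - lo))
    return false;
  *out_lo = lo;
  *out_hi = hi;
  return true;
}

/* Try [MIN, MAX) first; if it is not accessible, lower the end to a page
   boundary, raise the start to a page boundary, or both, and retry.  */
static bool
toy_pick_snippet_range (uintptr_t min, uintptr_t max, uintptr_t *out_lo,
                        uintptr_t *out_hi)
{
  uintptr_t start_up, end_down;

  assert (min <= max);
  start_up = (min + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
  end_down = max & ~(TOY_PAGE_SIZE - 1);

  return (toy_try_range (min, max, out_lo, out_hi)
          || toy_try_range (min, end_down, out_lo, out_hi)
          || toy_try_range (start_up, max, out_lo, out_hi)
          || toy_try_range (start_up, end_down, out_lo, out_hi));
}
```

With the mapping ending at 0x0804a000, as in the log above, a context window straddling that address would be trimmed to end at the page boundary instead of being dropped entirely.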

> gcc/
>   PR debug/65779
>   * shrink-wrap.c (insn_uses_reg): New function.
>   (move_insn_for_shrink_wrap): Remove debug insns using regs set
>   by the moved insn.
> gcc/testsuite/
>   * gcc.dg/pr65779.c: New.
> 
> Index: gcc/shrink-wrap.c
> ===
> --- gcc/shrink-wrap.c (revision 222160)
> +++ gcc/shrink-wrap.c (working copy)
> @@ -182,6 +182,21 @@ live_edge_for_reg (basic_block bb, int regno, int
>return live_edge;
>  }
>  
> +static bool
> +insn_uses_reg (rtx_insn *insn, unsigned int regno, unsigned int end_regno)
> +{
> +  df_ref use;
> +
> +  FOR_EACH_INSN_USE (use, insn)
> +    {
> +      rtx reg = DF_REF_REG (use);
> +
> +      if (REG_P (reg) && REGNO (reg) >= regno && REGNO (reg) < end_regno)
> +	return true;
> +    }
> +  return false;
> +}
> +
>  /* Try to move INSN from BB to a successor.  Return true on success.
> USES and DEFS are the set of registers that are used and defined
> after INSN in BB.  SPLIT_P indicates whether a live edge from BB
> @@ -340,10 +355,15 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_ins
>*split_p = true;
>  }
>  
> +  vec live_bbs;
> +  if (MAY_HAVE_DEBUG_INSNS)
> +live_bbs.create (5);

I just wonder whether it would be better to use an
  auto_vec live_bbs;

> +   FOR_BB_INSNS_REVERSE (tmp_bb, dinsn)
> + {
> +   if (dinsn == insn)
> + break;
> +   if (DEBUG_INSN_P (dinsn)
> +   && insn_uses_reg (dinsn, dregno, end_dregno))
> + {
> +   if (*split_p)
> + /* If split, then we will be moving insn into a
> +newly created block immediately after the entry
> +block.  Move the debug info there too.  */
> + emit_debug_insn_after (PATTERN (dinsn), bb_note (bb));
> +   delete_insn (dinsn);

Debug insns should never be deleted, nor moved.  You should either
reset them
(INSN_VAR_LOCATION_LOC (insn) = gen_rtx_UNKNOWN_VAR_LOC (); plus
df_insn_rescan_debug_internal (insn);), or try to adjust them based on the
instruction setting the register (say, if insn sets the register to
some other register + 10 and the other register is still live, you could
replace the uses of the register with (plus (the other register) (const_int
10))).
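The second option rewrites the debug expression in terms of a register that is still live. This amounts to substituting the moved insn's source for the register. Below is a minimal sketch on a hypothetical expression tree: `toy_expr`, `toy_subst_reg` and the other helpers are illustrative names, not GCC's RTL API. It assumes the moved insn is `regX = other + 10` and that `other` still holds its value at the debug insn:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature expression tree, loosely modelled on RTL.  */
enum toy_code { TOY_REG, TOY_CONST, TOY_PLUS };

struct toy_expr
{
  enum toy_code code;
  long value;                  /* Register number or constant.  */
  struct toy_expr *op0, *op1;  /* Operands of TOY_PLUS, else NULL.  */
};

static struct toy_expr *
toy_mk (enum toy_code code, long value, struct toy_expr *op0,
        struct toy_expr *op1)
{
  struct toy_expr *e = malloc (sizeof *e);
  assert (e != NULL);
  e->code = code;
  e->value = value;
  e->op0 = op0;
  e->op1 = op1;
  return e;
}

static struct toy_expr *
toy_copy (const struct toy_expr *e)
{
  if (e->code == TOY_PLUS)
    return toy_mk (TOY_PLUS, 0, toy_copy (e->op0), toy_copy (e->op1));
  return toy_mk (e->code, e->value, NULL, NULL);
}

/* Return a copy of E with every use of register REGNO replaced by a
   copy of REPLACEMENT, e.g. (plus (reg other) (const_int 10)).  */
static struct toy_expr *
toy_subst_reg (const struct toy_expr *e, long regno,
               const struct toy_expr *replacement)
{
  if (e->code == TOY_REG && e->value == regno)
    return toy_copy (replacement);
  if (e->code == TOY_PLUS)
    return toy_mk (TOY_PLUS, 0, toy_subst_reg (e->op0, regno, replacement),
                   toy_subst_reg (e->op1, regno, replacement));
  return toy_copy (e);
}

/* Evaluate E given register contents REGS.  */
static long
toy_eval (const struct toy_expr *e, const long *regs)
{
  switch (e->code)
    {
    case TOY_REG:
      return regs[e->value];
    case TOY_CONST:
      return e->value;
    case TOY_PLUS:
      return toy_eval (e->op0, regs) + toy_eval (e->op1, regs);
    }
  return 0;
}

static void
toy_free (struct toy_expr *e)
{
  if (e == NULL)
    return;
  toy_free (e->op0);
  toy_free (e->op1);
  free (e);
}
```

The rewritten expression gives the same value even after regX no longer holds it at that point. Resetting the location to unknown is the fallback when no such live replacement exists.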

> +  live_bbs.release ();

If live_bbs is auto_vec, this would not be needed.

Jakub


Re: patch to fix PR65805

2015-04-20 Thread Jakub Jelinek
On Sun, Apr 19, 2015 at 07:49:52PM -0400, Vladimir Makarov wrote:
> The following patch fixes
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65805
> 
> The problem occurred when SP was changed between the original insn and
> the rematerialized one, and the rematerialized insn contained a reg which will be
> substituted by SP.  In this case the difference between the SP offset and the
> previous SP offset was used twice.
> 
> The patch was bootstrapped and tested on x86/x86-64 and ppc64.
> 
> It is hard to create a stable test for the PR, so no test is
> included.
> 
> Committed to the trunk as rev. 23.
> 
> Jakub, the problem is present in GCC5 RC too.  Should I commit it into
> gcc-5-branch too?

Yes, but only after the 5.1 release (Wednesday), I think it is too risky
this late before the release.

Jakub


Re: [PATCH] -Warray-bounds TLC

2015-04-20 Thread Richard Biener
On Fri, 17 Apr 2015, Steve Ellcey wrote:

> On Sat, 2015-04-18 at 00:15 +0200, Marc Glisse wrote:
> > >
> > > extern void bad (const char *__assertion)  __attribute__ ((__noreturn__));
> > > struct link_map { long int l_ns; };
> > > extern struct link_namespaces
> > >  {
> > >unsigned int _ns_nloaded;
> > >  } _dl_ns[1];
> > > void _dl_close_worker (struct link_map *map)
> > > {
> > >  long int nsid = map->l_ns;
> > >  struct link_namespaces *ns = &_dl_ns[nsid];
> > >  (nsid != 0) ? (void) (0) : bad ("nsid != 0");
> > >  --ns->_ns_nloaded;
> > > }
> > 
> > It looks close enough to me. The actual access to _dl_ns[nsid] only ever 
> > happens for an index that is out of range. The last line of the function 
> > can never make sense (unreachable or undefined behavior), it is good that 
> > the compiler tells you about it.
> 
> I guess, but it left me very confused because the compiler didn't point
> me at the last line; it pointed me at the '*ns = &_dl_ns[nsid]' line.
> If there was a lot of stuff in between that line, the line with the call
> to the noreturn function, and the ns->_ns_nloaded line (as there is in
> the real glibc), it is very hard to understand what the compiler is
> trying to tell me when it only points out the first line as where the
> error is.

Yeah - we actually warn on both

 _4 = MEM[(struct link_namespaces *)&_dl_ns][nsid_8]._ns_nloaded;

and

 MEM[(struct link_namespaces *)&_dl_ns][nsid_8]._ns_nloaded = _5;

which is also why you get two warnings.  The location of the
ARRAY_REF is that of the original address-taking operation.
What is missing is the location of the dereference
statement (line 12).

Richard.


Re: [PATCH PR65767]Fix test case failure on arm-none-eabi

2015-04-20 Thread Ramana Radhakrishnan
On Mon, Apr 20, 2015 at 7:50 AM, Bin Cheng  wrote:
> Hi,
> As commented in PR65767 and PR65718, we should use a namespace other than std
> to avoid the duplicate definition problem on arm-none-eabi.  This patch fixes
> the issue.  It is an obvious change, but I will wait for approval because of
> the GCC5 branch.

Ok for trunk.

Please wait for Jakub to ack for the GCC 5 branch as the release
appears to be imminent.

Ramana

>
> Is it OK?
>
> gcc/testsuite/ChangeLog
> 2015-04-20  Bin Cheng  
>
> PR testsuite/65767
> * g++.dg/lto/pr65276_0.C: Change namespace std to std2.
> * g++.dg/lto/pr65276_1.C: Change namespace std to std2.


Re: [wwwdocs] Update changes.html with libstdc++ changes

2015-04-20 Thread Gerald Pfeifer
On Wed, 8 Apr 2015, Jonathan Wakely wrote:
>> The only drawback of this, and some similar cases, is that we now
>> risk referring to older versions on a release branch.
> Yes, I realised that problem when making the change and linking to the
> versions that were current at the time. One option would be to add a
> gcc-4.8 symlink that points to the latest gcc-4.8.x version, but that
> adds more work for the release managers and only has a small benefit.

Agreed.

> Alternatively, since we only tend to have four or five releases from a
> branch, we could just update them manually when we remember to. That's
> only necessary until the branch closes, at which point the latest
> release won't change. It's not a huge problem if the links don't go
> to the latest docs immediately IMHO.

Agreed, and agreed.  Let's just keep an eye on it and (try to)
update when a new release of an existing branch comes out.

Gerald


Re: [PATCH PR65767]Fix test case failure on arm-none-eabi

2015-04-20 Thread Jakub Jelinek
On Mon, Apr 20, 2015 at 09:17:21AM +0100, Ramana Radhakrishnan wrote:
> On Mon, Apr 20, 2015 at 7:50 AM, Bin Cheng  wrote:
> > Hi,
> > As commented in PR65767 and PR65718, we should use a namespace other than std
> > to avoid the duplicate definition problem on arm-none-eabi.  This patch fixes
> > the issue.  It is an obvious change, but I will wait for approval because of
> > the GCC5 branch.
> 
> Ok for trunk.
> 
> Please wait for Jakub to ack for the GCC 5 branch as the release
> appears to be imminent.

It is ok for the branch even now too.

> > Is it OK?
> >
> > gcc/testsuite/ChangeLog
> > 2015-04-20  Bin Cheng  
> >
> > PR testsuite/65767
> > * g++.dg/lto/pr65276_0.C: Change namespace std to std2.
> > * g++.dg/lto/pr65276_1.C: Change namespace std to std2.

Jakub


[wwwdocs] gcc-5/changes.html

2015-04-20 Thread Gerald Pfeifer
Some minor changes I suggest based on going through the page.

Applied, but happy to reconsider should others feel differently.

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.108
diff -u -r1.108 changes.html
--- changes.html16 Apr 2015 07:53:38 -  1.108
+++ changes.html20 Apr 2015 08:21:32 -
@@ -39,21 +39,21 @@
 -fipa-icf) has been added.  Compared to the identical
 code folding performed by the Gold linker this
 pass does not require function sections.  It also performs merging
-before inlining so inter-procedural optimizations are aware of the
+before inlining, so inter-procedural optimizations are aware of the
 code re-use. On the other hand not all unifications performed
 by a linker are doable by GCC which must honor
 aliasing information. During link-time optimization of Firefox,
 this pass unifies about 31000 functions, that is 14% overall.
  The devirtualization pass was significantly improved by adding
 better support for speculative devirtualization and dynamic type
-detection. About 50% of virtual calls in Firefox are speculatively
-devirtualized during link-time optimization.
+detection. About 50% of virtual calls in Firefox are now
+speculatively devirtualized during link-time optimization.
  A new comdat localization pass allows the linker to eliminate more
  dead code in presence of C++ inline functions.
  Virtual tables are now optimized. Local aliases are used to reduce
 dynamic linking time of C++ virtual tables on ELF targets and
 data alignment has been reduced to limit data segment bloat.
- A new -fno-semantic-interposition flag can be used
+ A new -fno-semantic-interposition option can be used
 to improve code quality of shared libraries where interposition of
 exported symbols is not allowed.
  Write-only variables are now detected and optimized out.
@@ -80,14 +80,14 @@
  -flto-odr-type-merging.
Command-line optimization and target options are now streamed on
  a per-function basis and honored by the link-time optimizer.
- This change makes the link-time optimization a more transparent
+ This change makes link-time optimization a more transparent
  replacement of per-file optimizations.
  It is now possible to build projects that require
  different optimization
  settings for different translation units (such as
  -ffast-math, -mavx, or
  -finline).
- Contrary to the earlier GCC releases, the optimization and target
+ Contrary to earlier GCC releases, the optimization and target
  options passed on the link command line are ignored.
  Note that this applies only to those command-line options
  that can be passed to optimize and


RE: [PATCH, ping1] Fix removing of df problem in df_finish_pass

2015-04-20 Thread Thomas Preud'homme
Ping?

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Tuesday, March 03, 2015 12:02 PM
> To: 'Bernhard Reutner-Fischer'; gcc-patches@gcc.gnu.org; 'Paolo Bonzini';
> 'Seongbae Park'; 'Kenneth Zadeck'
> Subject: RE: [PATCH] Fix removing of df problem in df_finish_pass
> 
> > From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com]
> > Sent: Saturday, February 28, 2015 4:00 AM
> > >   use df_remove_problem rather than manually removing problems, living
> >
> > leaving
> 
> Indeed. Please find updated changelog below:
> 
> 2015-03-03  Thomas Preud'homme  
> 
>   * df-core.c (df_finish_pass): Iterate over df->problems_by_index[] and
>   use df_remove_problem rather than manually removing problems, leaving
>   holes in df->problems_in_order[].
> 
> Best regards,
> 
> Thomas
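The ChangeLog contrasts removing a problem through df_remove_problem with "manually removing" it, which leaves holes in df->problems_in_order[]. The sketch below illustrates the difference. It assumes, as the ChangeLog implies, that the proper removal keeps the array dense. `struct toy_df`, `toy_remove_problem` and `toy_clear_slot` are hypothetical and are not the real df-core.c code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ordered list of dataflow problems.
   Problem ids are nonzero; 0 marks an empty slot.  */
struct toy_df
{
  int problems_in_order[8];
  size_t num_problems;
};

/* Remove problem ID and shift later entries down so the array stays
   dense and num_problems stays accurate.  */
static void
toy_remove_problem (struct toy_df *df, int id)
{
  size_t i, j;

  assert (df != NULL);
  for (i = 0; i < df->num_problems; i++)
    if (df->problems_in_order[i] == id)
      break;
  if (i == df->num_problems)
    return;  /* Not present.  */
  for (j = i + 1; j < df->num_problems; j++)
    df->problems_in_order[j - 1] = df->problems_in_order[j];
  df->num_problems--;
}

/* The pattern being replaced: clearing the slot leaves a hole that
   every later walk over problems_in_order[] would have to skip.  */
static void
toy_clear_slot (struct toy_df *df, int id)
{
  for (size_t i = 0; i < df->num_problems; i++)
    if (df->problems_in_order[i] == id)
      df->problems_in_order[i] = 0;
}
```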


Re: [wwwdocs] Update Fortran section in 4.8/changes.html

2015-04-20 Thread Gerald Pfeifer
On Mon, 19 Nov 2012, Tobias Burnus wrote:
>> There is one sentence (preceding my patch) which I don't quite
>> understand (specifically around the "to"):
>> 
>>"...which diagnose when code to is inserted for automatic
>>(re)allocation of a variable during assignment."
> Let me try to explain what the warning does and what "automatic
> (re)allocation" is; either the meaning becomes clear – or you might get an
> idea how to improve the wording.
> 
> But I concur that the current version is odd; it should either be "when code
> is [or: "will be"] inserted" or, maybe, "when code is to be inserted".

Thanks for the explanation, Tobias.  That was very helpful.

Now that I understand things better, I went ahead and applied the
following minimal patch (per your suggestion).

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.132
diff -u -r1.132 changes.html
--- changes.html8 Apr 2015 10:33:06 -   1.132
+++ changes.html20 Apr 2015 08:24:33 -
@@ -372,7 +372,7 @@
 The https://gcc.gnu.org/onlinedocs/gfortran/Error-and-Warning-Options.html";>
 -Wrealloc-lhs and -Wrealloc-lhs-all warning
-command-line options have been added, which diagnose when code to is
+command-line options have been added, which diagnose when code is
 inserted for automatic (re)allocation of a variable during assignment.
 This option can be used to decide whether it is safe to use https://gcc.gnu.org/onlinedocs/gfortran/Code-Gen-Options.html";>

Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall

2015-04-20 Thread Kyrill Tkachov

Hi Jeff,

On 17/04/15 18:26, Jeff Law wrote:

On 03/19/2015 08:39 AM, Kyrill Tkachov wrote:

Hi all,

This patch fixes PR 65358. For details look at the excellent write-up
by Honggyu in bugzilla. The problem is that we're trying to pass a struct
partially on the stack and partially in regs during a tail-call
optimisation, but the struct we're passing is also a partial incoming arg,
though the split between stack and regs is different from its outgoing usage.

The emit_push_insn code ends up doing a block move for the on-stack part
but ends up overwriting the part that needs to be loaded into regs.
My first thought was to just load the regs part first and then do the stack
part, but that doesn't work, as multiple comments in that function indicate
(the block move being expanded to movmem or other functions being one of
the reasons).

My proposed solution is to detect when the overlap happens, find the
overlapping region and load it into pseudos before the stack pushing, and
after the stack pushing is done, move the overlapping values from the
pseudos into the hard argument regs that they're supposed to go into.

That way this new functionality should only ever be triggered when there's
the overlap in this PR (causing wrong-code) and shouldn't affect codegen
anywhere else.

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu
and x86_64-linux-gnu.

According to the PR this appears at least as far back as 4.6, so this isn't a
regression on the release branches, but it is a wrong-code bug.

I'll let Honggyu upstream the testcase separately
(https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html)

I'll be testing this on the 4.8 and 4.9 branches.
Thoughts on this approach?

Thanks,
Kyrill

2015-03-19  Kyrylo Tkachov 

  PR middle-end/65358
  * expr.c (memory_load_overlap): New function.
  (emit_push_insn): When pushing partial args to the stack would
  clobber the register part, load the overlapping part into a pseudo
  and put it into the hard reg after pushing.

expr.patch


commit 490c5f2074d76a2927afaea99e4dd0bacccb413c
Author: Kyrylo Tkachov
Date:   Wed Mar 18 13:42:37 2015 +

  [expr.c] PR 65358 Avoid clobbering partial argument during sibcall

diff --git a/gcc/expr.c b/gcc/expr.c
index dc13a14..d3b9156 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4121,6 +4121,25 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
   }
   #endif

+/* Add SIZE to X and check whether it's greater than Y.
+   Return X + SIZE - Y, the constant amount by which it's greater or smaller.
+   If the two are not statically comparable (for example, X and Y contain
+   different registers) return -1.  This is used in emit_push_insn to
+   figure out if reading SIZE bytes from location X will end up reading from
+   location Y.  */
+
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -1;
+
+  return INTVAL (sub);
+}

Hmmm, so what happens if the difference is < 0?   I'd be a bit worried
about that case for the PA (for example).

So how about asserting that the INTVAL is >= 0 prior to returning so
that we catch that case if it ever occurs?


INTVAL being >= 0 is the case that I want to catch with this function.
INTVAL < 0 is the usual case on leaf call optimisation. On arm, at least,
it means that x and y use the same base register (i.e. same stack frame)
but the offsets are such that reading SIZE bytes from X will not overlap
with Y, so the workaround in this patch is not required.
Thus, asserting that the result is positive is not right here.

What characteristic of PA makes this problematic? Is it STACK_GROWS_UPWARD?
Should I then extend this function to do something like:

HOST_WIDE_INT res = INTVAL (sub);
#ifndef STACK_GROWS_DOWNWARD
res = -res;
#endif

return res?





OK for the trunk with the added assert.  Please commit the testcase from
Honggyu at the same time you commit the patch.


Thanks, will do after the above is resolved.

Kyrill



Let's let it simmer for a while on the trunk before considering it to be
backported.

jeff





Re: ping: [PATCH, ARM] attribute target (thumb,arm) [0-6]

2015-04-20 Thread Christian Bruel
Hello Ramana

>>
> 
> Can you respin this now that we are in stage1 again ?
> 
> Ramana
> 

Attached the rebased, rechecked set of patches. Original with comments
posted in

https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02455.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02458.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02460.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02461.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02463.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02467.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02468.html

many thanks,

Christian
2014-09-23  Christian Bruel  

	* config/arm/arm.c (arm_option_override): Reorganized and split.
	(arm_option_params_internal): New function.
	(arm_option_check_internal): New function.
	(arm_option_override_internal): New function.
	(restrict_default): New boolean.
	(thumb_code, thumb1_code): Remove.
	* config/arm/arm.h (TREE_TARGET_THUMB, TREE_TARGET_THUMB1): New macros.
	(TREE_TARGET_THUMB2, TREE_TARGET_ARM): Likewise.
	(thumb_code, thumb1_code): Remove.
	* config/arm/arm.md (is_thumb, is_thumb1): Check TARGET flag.

diff -ruN '--exclude=.svn' a/gcc/gcc/config/arm/arm.c a1/gcc/gcc/config/arm/arm.c
--- a/gcc/gcc/config/arm/arm.c	2015-02-04 09:14:26.120602737 +0100
+++ a1/gcc/gcc/config/arm/arm.c	2015-02-05 09:19:32.853338616 +0100
@@ -846,12 +846,6 @@
 /* Nonzero if tuning for Cortex-A9.  */
 int arm_tune_cortex_a9 = 0;
 
-/* Nonzero if generating Thumb instructions.  */
-int thumb_code = 0;
-
-/* Nonzero if generating Thumb-1 instructions.  */
-int thumb1_code = 0;
-
 /* Nonzero if we should define __THUMB_INTERWORK__ in the
preprocessor.
XXX This is a bit of a hack, it's intended to help work around
@@ -2623,6 +2617,148 @@
   return std_gimplify_va_arg_expr (valist, type, pre_p, post_p);
 }
 
+/* Check any incompatible options that the user has specified.  */
+static void
+arm_option_check_internal (struct gcc_options *opts)
+{
+  /* Make sure that the processor choice does not conflict with any of the
+     other command line choices.  */
+  if (TREE_TARGET_ARM (opts) && !(insn_flags & FL_NOTM))
+    error ("target CPU does not support ARM mode");
+
+  /* TARGET_BACKTRACE calls leaf_function_p, which causes a crash if done
+     from here where no function is being compiled currently.  */
+  if ((TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME) && TREE_TARGET_ARM (opts))
+    warning (0, "enabling backtrace support is only meaningful when compiling for the Thumb");
+
+  if (TREE_TARGET_ARM (opts) && TARGET_CALLEE_INTERWORKING)
+    warning (0, "enabling callee interworking support is only meaningful when compiling for the Thumb");
+
+  /* If this target is normally configured to use APCS frames, warn if they
+     are turned off and debugging is turned on.  */
+  if (TREE_TARGET_ARM (opts)
+      && write_symbols != NO_DEBUG
+      && !TARGET_APCS_FRAME
+      && (TARGET_DEFAULT & MASK_APCS_FRAME))
+    warning (0, "-g with -mno-apcs-frame may not give sensible debugging");
+
+  /* iWMMXt unsupported under Thumb mode.  */
+  if (TREE_TARGET_THUMB (opts) && TARGET_IWMMXT)
+    error ("iWMMXt unsupported under Thumb mode");
+
+  if (TARGET_HARD_TP && TREE_TARGET_THUMB1 (opts))
+    error ("can not use -mtp=cp15 with 16-bit Thumb");
+
+  if (TREE_TARGET_THUMB (opts) && TARGET_VXWORKS_RTP && flag_pic)
+    {
+      error ("RTP PIC is incompatible with Thumb");
+      flag_pic = 0;
+    }
+
+  /* We only support -mslow-flash-data on armv7-m targets.  */
+  if (target_slow_flash_data
+      && ((!(arm_arch7 && !arm_arch_notm) && !arm_arch7em)
+	  || (TREE_TARGET_THUMB1 (opts) || flag_pic || TARGET_NEON)))
+    error ("-mslow-flash-data only supports non-pic code on armv7-m targets");
+}
+
+/* Check any params depending on attributes that the user has specified.  */
+static void
+arm_option_params_internal (struct gcc_options *opts)
+{
+  /* If we are not using the default (ARM mode) section anchor offset
+     ranges, then set the correct ranges now.  */
+  if (TREE_TARGET_THUMB1 (opts))
+    {
+      /* Thumb-1 LDR instructions cannot have negative offsets.
+         Permissible positive offset ranges are 5-bit (for byte loads),
+         6-bit (for halfword loads), or 7-bit (for word loads).
+         Empirical results suggest a 7-bit anchor range gives the best
+         overall code size.  */
+      targetm.min_anchor_offset = 0;
+      targetm.max_anchor_offset = 127;
+    }
+  else if (TREE_TARGET_THUMB2 (opts))
+    {
+      /* The minimum is set such that the total size of the block
+         for a particular anchor is 248 + 1 + 4095 bytes, which is
+         divisible by eight, ensuring natural spacing of anchors.  */
+      targetm.min_anchor_offset = -248;
+      targetm.max_anchor_offset = 4095;
+    }
+  else
+    {
+      targetm.min_anchor_offset = TARGET_MIN_ANCHOR_OFFSET;
+      targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET;
+    }
+
+  if (optimize_size)
+    {
+      /* If optimizing for size, 
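The section-anchor ranges in arm_option_params_internal can be checked in isolation. The Thumb-2 comment's arithmetic is 248 + 1 + 4095 = 4344 = 8 * 543, so the block is indeed divisible by eight. The Thumb-1 range gives 0 + 1 + 127 = 128. This is a minimal sketch. `enum toy_isa`, `toy_anchor_range_for` and `toy_anchor_block_size` are hypothetical names. The ARM-mode default is passed in because TARGET_MIN_ANCHOR_OFFSET and TARGET_MAX_ANCHOR_OFFSET are not shown in the patch:

```c
#include <assert.h>

/* Hypothetical enum for the three instruction-set states the quoted
   function distinguishes.  */
enum toy_isa { TOY_ARM, TOY_THUMB1, TOY_THUMB2 };

struct toy_anchor_range
{
  long min_offset;
  long max_offset;
};

/* Ranges copied from the quoted patch; ARM mode keeps the caller's
   default.  */
static struct toy_anchor_range
toy_anchor_range_for (enum toy_isa isa, struct toy_anchor_range arm_default)
{
  struct toy_anchor_range r = arm_default;
  switch (isa)
    {
    case TOY_THUMB1:
      r.min_offset = 0;
      r.max_offset = 127;
      break;
    case TOY_THUMB2:
      r.min_offset = -248;
      r.max_offset = 4095;
      break;
    case TOY_ARM:
      break;
    }
  return r;
}

/* Bytes covered by one anchor: the negative span, the anchor byte
   itself, and the positive span.  */
static long
toy_anchor_block_size (struct toy_anchor_range r)
{
  assert (r.min_offset <= 0 && r.max_offset >= 0);
  return -r.min_offset + 1 + r.max_offset;
}
```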

Re: [Patch] pr65779 - [5/6 Regression] undefined local symbol on powerpc

2015-04-20 Thread Alan Modra
On Mon, Apr 20, 2015 at 09:35:07AM +0200, Jakub Jelinek wrote:
> On Mon, Apr 20, 2015 at 12:40:49PM +0930, Alan Modra wrote:
> > with the log for the ubsan fails
> > /src/gcc-5/gcc/testsuite/c-c++-common/ubsan/object-size-10.c:19:11: runtime error: index 128 out of bounds for type 'char [128]'
> > /src/gcc-5/gcc/testsuite/c-c++-common/ubsan/object-size-10.c:19:11: runtime error: load of address 0x0804a000 with insufficient space for an object of type 'char'
> > 0x0804a000: note: pointer points here
> > 
> 
> The issue here is that libsanitizer wants to print some context around the
> variable, and doesn't try too hard, so if the variable is too close to the
> end of the RW PT_LOAD, you get different message from what is expected.

Thanks for the info.  I don't tend to run sanitizer tests, so this was
the first time I'd seen such a failure.

> Just wonder if using an
>   auto_vec live_bbs;

OK, done.

> > + FOR_BB_INSNS_REVERSE (tmp_bb, dinsn)
> > +   {
> > + if (dinsn == insn)
> > +   break;
> > + if (DEBUG_INSN_P (dinsn)
> > + && insn_uses_reg (dinsn, dregno, end_dregno))
> > +   {
> > + if (*split_p)
> > +   /* If split, then we will be moving insn into a
> > +  newly created block immediately after the entry
> > +  block.  Move the debug info there too.  */
> > +   emit_debug_insn_after (PATTERN (dinsn), bb_note (bb));
> > + delete_insn (dinsn);
> 
> Debug insns should never be deleted, nor moved.  You should either
> reset them
> (INSN_VAR_LOCATION_LOC (insn) = gen_rtx_UNKNOWN_VAR_LOC (); plus
> df_insn_rescan_debug_internal (insn);), or

I had it that way in my first patch, then decided to try deleting..

I can certainly change it back even if only to do it the standard way
for safety's sake, but I'm curious as to why they can't be deleted in
this special case.  My thinking was that we're on a chain of blocks
starting at the entry where there is a single outgoing live edge for
the register being used.  So there shouldn't be any need for a debug
insn to mark info about the variable as invalid.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [Patch] pr65779 - [5/6 Regression] undefined local symbol on powerpc

2015-04-20 Thread Jakub Jelinek
On Mon, Apr 20, 2015 at 06:12:26PM +0930, Alan Modra wrote:
> I had it that way in my first patch, then decided to try deleting..
> 
> I can certainly change it back even if only to do it the standard way
> for safety's sake, but I'm curious as to why they can't be deleted in
> this special case.  My thinking was that we're on a chain of blocks
> starting at the entry where there is a single outgoing live edge for
> the register being used.  So there shouldn't be any need for a debug
> insn to mark info about the variable as invalid.

The debug insns can be for arbitrary variables, there is no "the variable",
and there could be other debug insns for the same variable on that path,
say saying that decl lives in some other register, or can be computed using
an expression involving other registers, or memory etc.  Say you could have
(set regX (whatever))
...
(debug_insn var5 (some expression not referring to regX))
...
(debug_insn var5 (some expression referring to regX))
...
(debug_insn var5 (other expression not referring to regX))
...
(use regX)

where ... contains unrelated insns (not referring to regX) and edges live
for regX.  If shrink wrapping attempts to move the first set somewhere into
the last ..., then if you delete the debug insns referring to regX, you
extend the lifetime of the previous debug_insn, which is wrong: the
registers referenced in the first debug_insn might not contain the right
values afterwards.  And if you move the debug insn later, you might shorten
the lifetime of the third debug_insn; while regX is supposed to contain the
same value, some other register referenced there might have been changed
already.
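The effect of deleting versus resetting can be seen in a toy model of the sequence above. Each debug insn binds var5 to a location, and a debugger uses the latest surviving binding. This is a minimal sketch: `struct toy_insn` and the `toy_*` helpers are hypothetical, and integer ids stand in for location expressions, with 0 meaning "unknown" in the role of gen_rtx_UNKNOWN_VAR_LOC:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_VAR (-1)      /* Ordinary, non-debug insn.  */
#define TOY_EXPR_UNKNOWN 0   /* Stands in for an unknown var location.  */

struct toy_insn
{
  int var;      /* Variable bound by a debug insn, or TOY_NO_VAR.  */
  int expr;     /* Location expression id bound to VAR.  */
  int deleted;  /* Nonzero once the insn has been removed.  */
};

/* Location a debugger would use for VAR just before insn POS: the
   expression of the latest surviving debug insn for VAR, or -1 if
   there is none.  */
static int
toy_binding_at (const struct toy_insn *insns, size_t pos, int var)
{
  int result = -1;
  assert (insns != NULL || pos == 0);
  for (size_t i = 0; i < pos; i++)
    if (!insns[i].deleted && insns[i].var == var)
      result = insns[i].expr;
  return result;
}

/* Deleting drops the binding, so the previous one silently stays live.  */
static void
toy_delete_debug (struct toy_insn *insn)
{
  insn->deleted = 1;
}

/* Resetting keeps the insn but marks the location as unknown.  */
static void
toy_reset_debug (struct toy_insn *insn)
{
  insn->expr = TOY_EXPR_UNKNOWN;
}
```

Consider the sequence (set regX), debug var5 = e1, ..., debug var5 = e2 (uses regX), ..., debug var5 = e3, (use regX). Deleting the second debug insn makes var5 read as e1 where it should not. Resetting makes it unknown until the third debug insn.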

Jakub


Re: [PATCH][expmed] Calculate mult-by-const cost properly in mult_by_coeff_cost

2015-04-20 Thread Kyrill Tkachov


On 15/04/15 16:41, Jeff Law wrote:

On 04/14/2015 02:07 AM, Kyrill Tkachov wrote:

Hi Jeff,

Thanks for looking at this.

On 13/04/15 19:18, Jeff Law wrote:

On 03/16/2015 04:12 AM, Kyrill Tkachov wrote:

Hi all,

Eyeballing the mult_by_coeff_cost function, I think it has a typo/bug.
It's supposed to return the cost of multiplying by a constant 'coeff'.
It calculates that by taking the cost of a MULT rtx by that constant
and comparing it to the cost of synthesizing that multiplication, and
returning the cheapest. However, in the MULT rtx cost calculation it creates
a MULT rtx of two REGs rather than a REG and the GEN_INT of coeff, as I
would expect. This patch fixes that in the obvious way.

Tested aarch64-none-elf and bootstrapped on x86_64-linux-gnu.
I'm guessing this is stage 1 material at this point?

Thanks,
Kyrill

2015-03-13  Kyrylo Tkachov  

   * expmed.c (mult_by_coeff_cost): Pass CONST_INT rtx to MULT cost
   calculation rather than fake_reg.

I'm pretty sure this patch is wrong.

The call you're referring to is computing an upper limit to the cost for
use by choose_mult_variant.  Once a synthesized multiply sequence
exceeds the cost of reg*reg, then that synthesized sequence can be
thrown away because it's not profitable.

But shouldn't the limit be the mult-by-constant cost?

No, because ultimately we're trying to do better than just loading the
constant into a register and doing a reg * reg.  So the reg*reg case is
the upper bound for allowed cost of a synthesized sequence.


So I've thought about it a bit more and I have another concern.
The function returns this:
  if (choose_mult_variant (mode, coeff, &algorithm, &variant, max_cost))
    return algorithm.cost.cost;
  else
    return max_cost;

If I read this right, it tries to synthesise the mult in choose_mult_variant
with the cost of the reg-by-reg mult as the limit, but if the synthesis cost
exceeds that limit, it returns the reg-by-reg mult cost (the "return max_cost;"
path). That can't be right, can it?

Thanks,
Kyrill
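To make the question concrete, here is a toy C model of the control flow quoted above. toy_choose_mult_variant is a hypothetical stand-in for choose_mult_variant that only compares a precomputed synthesis cost against the limit (assumed to succeed only when the synthesized sequence is strictly cheaper); the costs are made-up numbers, not values from any backend.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for choose_mult_variant: assumed to succeed only
   when the synthesized shift/add sequence is strictly cheaper than the
   limit it is given.  */
static bool
toy_choose_mult_variant (int synth_cost, int max_cost, int *chosen_cost)
{
  if (synth_cost < max_cost)
    {
      *chosen_cost = synth_cost;
      return true;
    }
  return false;
}

/* Same control flow as the quoted tail of mult_by_coeff_cost: the limit
   is the cost of a reg*reg multiply, and that limit is also the value
   returned when no cheaper synthesized sequence is found.  */
static int
toy_mult_by_coeff_cost (int synth_cost, int reg_reg_mult_cost)
{
  int max_cost = reg_reg_mult_cost;
  int cost;

  assert (max_cost >= 0);
  if (toy_choose_mult_variant (synth_cost, max_cost, &cost))
    return cost;
  else
    return max_cost;
}
```

With these numbers, toy_mult_by_coeff_cost (3, 5) returns 3, while toy_mult_by_coeff_cost (8, 5) returns 5: when synthesis is not profitable, the caller is told the reg*reg cost, and in this model nothing is added for getting the constant into a register. That fallback value is the case being questioned here.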




Consider also similar logic in expand_mult:
  max_cost = set_src_cost (gen_rtx_MULT (mode, fake_reg, op1), speed);
  if (choose_mult_variant (mode, coeff, &algorithm, &variant, max_cost))
    return expand_mult_const (mode, op0, coeff, target,
                              &algorithm, variant);

This looks wrong to me.  They're certainly inconsistent.

Maybe start by asking Bill (who added mult_by_coeff_cost and whom I've
cc'd) what his intent was to make sure it matches my understanding.

Jeff





Re: [PATCH, i386, Darwin RFT]: Remove reload_in_progress checks

2015-04-20 Thread Dominique d'Humières
After having fixed the typo, regtesting went without regression.

Dominique

> Le 19 avr. 2015 à 20:35, Uros Bizjak  a écrit :
> 
> Hello!
> 
> Attached patch removes reload_in_progress checks for x86 (LRA enabled)
> target. AFAICS, reload_in_progress is never set during the
> compilation, a watchpoint on this variable didn't trigger for a couple
> of complex compilations.
> 
> 2015-04-19  Uros Bizjak  
> 
>* config/i386/i386.c (set_pic_reg_ever_live): Remove.
>(legitimize_pic_address): Do not call set_pic_reg_ever_live.
>(legitimize_tls_address): Ditto.
>(ix86_expand_move): Ditto.
>(ix86_expand_binary_operator): Remove reload_in_progress checks.
>(ix86_expand_unary_operator): Ditto.
>* config/i386/predicates.md (index_register_operand): Ditto.
> 
> Patch was bootstrapped on x86_64-linux-gnu and regression tested for
> x86_64-linux-gnu, i686-linux-gnu with and w/o -fpic.
> 
> The patch also changes Darwin specific code that can't be tested
> properly on Linux. Instead of leaving a reload_in_progress check
> there, I'd ask someone to bootstrap and regression test the patch on
> Darwin target.
> 
> I'll wait for the Darwin regression test results (and possible
> comments) before committing the patch.
> 
> Uros.
> 



Re: [PATCH, i386, Darwin RFT]: Remove reload_in_progress checks

2015-04-20 Thread Iain Sandoe

On 20 Apr 2015, at 10:47, Dominique d'Humières wrote:

> After having fixed the typo, regtesting went without regression.

I have done a bootstrap on i686-darwin10 with the amended patch - slow machine, 
so testing still in progress (but looks OK so far),

NOTE: there are some references to reload_in_progress in the config/darwin.c PIC
code shared between the powerpc and x86 Darwin implementations.  I will do a
follow-up patch to make those assert if triggered on x86 (AFAIK, they still
need to be present for powerpc, at present).

Iain

> 
> Dominique
> 
>> Le 19 avr. 2015 à 20:35, Uros Bizjak  a écrit :
>> 
>> Hello!
>> 
>> Attached patch removes reload_in_progress checks for x86 (LRA enabled)
>> target. AFAICS, reload_in_progress is never set during the
>> compilation, a watchpoint on this variable didn't trigger for a couple
>> of complex compilations.
>> 
>> 2015-04-19  Uros Bizjak  
>> 
>>   * config/i386/i386.c (set_pic_reg_ever_live): Remove.
>>   (legitimize_pic_address): Do not call set_pic_reg_ever_live.
>>   (legitimize_tls_address): Ditto.
>>   (ix86_expand_move): Ditto.
>>   (ix86_expand_binary_operator): Remove reload_in_progress checks.
>>   (ix86_expand_unary_operator): Ditto.
>>   * config/i386/predicates.md (index_register_operand): Ditto.
>> 
>> Patch was bootstrapped on x86_64-linux-gnu and regression tested for
>> x86_64-linux-gnu, i686-linux-gnu with and w/o -fpic.
>> 
>> The patch also changes Darwin specific code that can't be tested
>> properly on Linux. Instead of leaving a reload_in_progress check
>> there, I'd ask someone to bootstrap and regression test the patch on
>> Darwin target.
>> 
>> I'll wait for the Darwin regression test results (and possible
>> comments) before committing the patch.
>> 
>> Uros.
>> 
> 



[PATCH][AArch64] PR target/65491: Classify V1TF vectors as AAPCS64 short vectors rather than composite types

2015-04-20 Thread Kyrill Tkachov
Hi all,

The ICE in the PR happens when we pass a 1x(128-bit float) vector as an
argument.  The aarch64 backend erroneously classifies it as a composite type
when in fact it is a short vector according to AAPCS64 (section 4.1.2 of
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf).

The solution in this patch is to check for a short vector with
aarch64_short_vector_p inside aarch64_composite_type_p, rather than the other
way around (checking aarch64_composite_type_p inside aarch64_short_vector_p).

With this patch the testcase compiles fine, and in the generated code the
argument is passed
in the simd registers like the ABI requires.
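The ordering change can be shown with a toy model in C. The struct and predicates below are hypothetical simplifications, not GCC's trees and machine modes. In particular, the message does not say which composite-type test fires for the 1x(128-bit float) vector, so the model simply records that one does (hits_composite_rule).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a (type, mode) pair; not GCC's tree/machine_mode.
   hits_composite_rule records that one of the composite-type tests fires
   for this type (which test it is remains an open assumption here).  */
struct toy_type
{
  bool is_vector;
  int size_in_bytes;
  bool hits_composite_rule;
};

/* AAPCS64: a short vector is 8 or 16 bytes in total size.  */
static bool
toy_is_short_vector (const struct toy_type *t)
{
  assert (t->size_in_bytes >= 0);
  return t->is_vector && (t->size_in_bytes == 8 || t->size_in_bytes == 16);
}

/* Old ordering: the short-vector predicate first asks whether the type
   is composite, so a composite match hides the short vector.  */
static bool
old_composite_type_p (const struct toy_type *t)
{
  return t->hits_composite_rule;
}

static bool
old_short_vector_p (const struct toy_type *t)
{
  return !old_composite_type_p (t) && toy_is_short_vector (t);
}

/* New ordering: the composite predicate rules out short vectors first,
   and the short-vector predicate no longer consults it.  */
static bool
new_short_vector_p (const struct toy_type *t)
{
  return toy_is_short_vector (t);
}

static bool
new_composite_type_p (const struct toy_type *t)
{
  if (new_short_vector_p (t))
    return false;
  return t->hits_composite_rule;
}
```

For a 16-byte vector that also trips a composite rule, old_short_vector_p returns false, while new_short_vector_p returns true and new_composite_type_p returns false. Non-vector aggregates are still classified as composite.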

Bootstrapped and tested on aarch64-linux.

This bug appears on all release branches so it's not a regression.
Ok for trunk?
Do we want this in the release branches eventually?

Thanks,
Kyrill

2015-04-20  Kyrylo Tkachov  

PR target/65491
* config/aarch64/aarch64.c (aarch64_short_vector_p): Move above
aarch64_composite_type_p.  Remove check for aarch64_composite_type_p.
(aarch64_composite_type_p): Return false if given type and mode are
for a short vector.

2015-04-20  Kyrylo Tkachov  

PR target/65491
* gcc.target/aarch64/pr65491_1.c: New test.
* gcc.target/aarch64/aapcs64/type-def.h (vlf1_t): New typedef.
* gcc.target/aarch64/aapcs64/func-ret-1.c: Add test for vlf1_t.


aarch64-v1tf-aapcs.patch
Description: Binary data


Re: [PATCH, i386, Darwin RFT]: Remove reload_in_progress checks

2015-04-20 Thread Uros Bizjak
On Mon, Apr 20, 2015 at 12:00 PM, Iain Sandoe  wrote:

>> After having fixed the typo, regtesting went without regression.
>
> I have done a bootstrap on i686-darwin10 with the amended patch - slow 
> machine, so testing still in progress (but looks OK so far),
>
> NOTE: there are some references to reload_in_progress in config/darwin.c pic 
> code shared between powerpc and x86 darwin implementations.  I will do a 
> follow-up patch to make those assert if triggered on x86 (AFAIK, they still 
> need to be present for powerpc, at present).

Probably a better way is to include "targetm.lra_p ()" into the check.

Uros.


Re: [PATCH 1/2] PR c++/61636

2015-04-20 Thread Marek Polacek
On Sat, Apr 18, 2015 at 06:53:28PM +0100, Adam Butcher wrote:
> Test like this?
> 
> /* { dg-do run { target c++14 } }  */
> /* { dg-final { scan-assembler-not "..." } }  */

What is this dg-final supposed to do here?

Marek


[PATCH][doc] Improve pipeline description docs a bit

2015-04-20 Thread Kyrill Tkachov
Hi all,

This patch attempts to improve the pipeline description documentation.
It fixes some grammar errors and typos, and clarifies some concepts.

The sections on the syntactic constructs are formatted to have a
short description, an example, a description of the syntax elements, and
some elaboration.

Is this ok for trunk?

Thanks,
Kyrill

2014-04-20  Kyrylo Tkachov  

* doc/md.texi (Specifying processor pipeline description):
Improve wording.
Clarify some constructs.


doc-md-texi.patch
Description: Binary data


[patch,wwwdocs] Add gcc-5 caveats for avr.

2015-04-20 Thread Georg-Johann Lay

Hi Gerald, this is the patch against GCC-5's release notes.

Okay to install?

Johann


Index: gcc-5/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.109
diff -u -p -r1.109 changes.html
--- gcc-5/changes.html	20 Apr 2015 08:22:35 -	1.109
+++ gcc-5/changes.html	20 Apr 2015 10:34:45 -
@@ -28,6 +28,14 @@
 is_trivially_default_constructible,
 is_trivially_copy_constructible and
 is_trivially_copy_assignable should be used instead.
+On AVR, support has been added for the devices ATtiny4/5/9/10/20/40.
+This requires Binutils 2.25 or newer.
+The AVR port uses a new scheme to describe supported devices:
+For each supported device the compiler provides a device-specific
+http://gcc.gnu.org/onlinedocs/gcc/Spec-Files.html";>spec file.
+If the compiler is used together with AVR-LibC, this requires at
+least GCC 5.2 and a version of AVR-LibC which implements
+http://savannah.nongnu.org/bugs/?44574";#44574.
   
 
 General Optimizer Improvements
@@ -690,6 +698,18 @@ here.
  
  
 
+AVR
+
+  A new command option -nodevicelib has been added.
+If this option is turned on the compiler won't link against AVR-LibC's
+device-specific library libdevice.a by omitting
+-ldevice from the linker's command line.
+If the compiler had not been
+http://gcc.gnu.org/install/configure.html";>configured
+to be used with AVR-LibC, the compiler will not link against that
+library and the option has no effect.
+
+
 IA-32/x86-64
   
 New ISA extensions support


[RFC][wwwdocs] bug report reopen policy

2015-04-20 Thread Tom de Vries

Hi,

In PR64683 comment 11, Ian mentioned:
...
This bug may have the same symptoms but it has a completely different cause.
Next time, please do not reopen the bug unless you are certain it has the same
cause.  Please open a new bug instead.  Thanks.
...

I couldn't find a similar rule in the 'Reporting Bugs' section, so this patch 
adds such a rule.


The patch (validated as required) adds the following item to the 'Reporting Bugs 
- What we DON'T want' bit of https://gcc.gnu.org/bugs/ :

...
* Reopening a bug report when you're not certain that the failure you're
  reporting has the same root cause as the reopened bug report
...

Thanks,
- Tom
Index: htdocs/bugs/index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/bugs/index.html,v
retrieving revision 1.116
diff -u -r1.116 index.html
--- htdocs/bugs/index.html	5 Jul 2014 21:52:32 -	1.116
+++ htdocs/bugs/index.html	20 Apr 2015 10:46:37 -
@@ -119,6 +119,9 @@
   Questions about the correctness or the expected behavior of
   certain constructs that are not GCC extensions.  Ask them in forums
   dedicated to the discussion of the programming language
+
+  Reopening a bug report when you're not certain that the failure you're
+  reporting has the same root cause as the reopened bug report
 
 
 Where to post it


Re: [PATCH][ARM][cleanup] Use IN_RANGE more often

2015-04-20 Thread Kyrill Tkachov


On 18/04/15 15:18, Richard Earnshaw wrote:

On 15/04/15 16:22, Kyrill Tkachov wrote:

Hi all,

This patch goes through the arm backend and replaces expressions of the form
a >= lo && a <= hi with IN_RANGE (a, lo, hi), which is that tiny bit smaller
and easier to read in my opinion. I guess there's also a chance it might make
things infinitesimally faster since IN_RANGE evaluates 'a' only once.
The patch also substitutes expressions like a > hi || a < lo with
!IN_RANGE (a, lo, hi) which, again, conveys the intended meaning more
clearly.
I tried to make sure not to introduce any off-by-one errors, and testing
caught some that I had made while writing these.

Bootstrapped and tested arm-none-linux-gnueabihf. Built and ran SPEC2006
successfully.

Ok for trunk once 5.1 is released?


I think this is pretty obvious for those cases where the type of the
range is [unsigned] HOST_WIDE_INT, but much less obvious for those cases
where the type is just int, or unsigned.  Cases that I think need more
careful examination include vfp3_const_double_index and
aapcs_vfp_is_call_or_return_candidate, but I haven't gone through every
instance to check whether there are more cases.


The definition and comment on IN_RANGE in system.h is:
/* A macro to determine whether a VALUE lies inclusively within a
   certain range without evaluating the VALUE more than once.  This
   macro won't warn if the VALUE is unsigned and the LOWER bound is
   zero, as it would e.g. with "VALUE >= 0 && ...".  Note the LOWER
   bound *is* evaluated twice, and LOWER must not be greater than
   UPPER.  However the bounds themselves can be either positive or
   negative.  */
#define IN_RANGE(VALUE, LOWER, UPPER) \
  ((unsigned HOST_WIDE_INT) (VALUE) - (unsigned HOST_WIDE_INT) (LOWER) \
   <= (unsigned HOST_WIDE_INT) (UPPER) - (unsigned HOST_WIDE_INT) (LOWER))

Since it works on positive or negative bounds, I'd think it would work on
signed numbers, wouldn't it?



I'd be particularly concerned about these if the widening of the result
caused a code quality regression on a native 32-bit machine (since HWI
is a 64-bit type).


That being said, I see a 0.6% size increase on cc1 built on a native arm-linux
system. This seems like a non-trivial increase to me. If that is not acceptable,
then we can drop this patch.

Thanks,
Kyrill



R.


Thanks,
Kyrill

2015-04-15  Kyrylo Tkachov  

 * config/arm/arm.md (*zeroextractsi_compare0_scratch): Use IN_RANGE
 instead of two compares.
 (*ne_zeroextractsi): Likewise.
 (*ite_ne_zeroextractsi): Likewise.
 (load_multiple): Likewise.
 (store_multiple): Likewise.
 * config/arm/arm.h (IS_IWMMXT_REGNUM): Likewise.
 (IS_IWMMXT_GR_REGNUM): Likewise.
 (IS_VFP_REGNUM): Likewise.
 * config/arm/arm.c (arm_return_in_memory): Likewise.
 (aapcs_vfp_is_call_or_return_candidate): Likewise.
 (thumb_find_work_register): Likewise.
 (thumb2_legitimate_address_p): Likewise.
 (arm_legitimate_index_p): Likewise.
 (thumb2_legitimate_index_p): Likewise.
 (thumb1_legitimate_address_p): Likewise.
 (thumb_legitimize_address): Likewise.
 (vfp3_const_double_index): Likewise.
 (neon_immediate_valid_for_logic): Likewise.
 (bounds_check): Likewise.
 (load_multiple_sequence): Likewise.
 (store_multiple_sequence): Likewise.
 (offset_ok_for_ldrd_strd): Likewise.
 (callee_saved_reg_p): Likewise.
 (thumb2_emit_strd_push): Likewise.
 (arm_output_load_gr): Likewise.
 (arm_vector_mode_supported_p): Likewise.
 * config/arm/neon.md (ashldi3_neon_noclobber): Likewise.
 (ashrdi3_neon_imm_noclobber): Likewise.
 (lshrdi3_neon_imm_noclobber): Likewise.
 * config/arm/thumb1.md (*thumb1_addsi3): Likewise.
 * config/arm/thumb2.md (define_peephole2's after orsi_not_shiftsi_si):
 Likewise.




Re: [PATCH][ARM][cleanup] Use IN_RANGE more often

2015-04-20 Thread Jakub Jelinek
On Mon, Apr 20, 2015 at 12:03:11PM +0100, Kyrill Tkachov wrote:
> The definition and comment on IN_RANGE in system.h is:
> /* A macro to determine whether a VALUE lies inclusively within a
>certain range without evaluating the VALUE more than once.  This
>macro won't warn if the VALUE is unsigned and the LOWER bound is
>zero, as it would e.g. with "VALUE >= 0 && ...".  Note the LOWER
>bound *is* evaluated twice, and LOWER must not be greater than
>UPPER.  However the bounds themselves can be either positive or
>negative.  */
> #define IN_RANGE(VALUE, LOWER, UPPER) \
>   ((unsigned HOST_WIDE_INT) (VALUE) - (unsigned HOST_WIDE_INT) (LOWER) \
><= (unsigned HOST_WIDE_INT) (UPPER) - (unsigned HOST_WIDE_INT) (LOWER))
> 
> Since it works on positive or negative bounds, I'd think it would work on
> signed numbers, wouldn't it?

Of course it does, as long as the types of VALUE, LOWER and UPPER aren't
wider integer types than {,un}signed HOST_WIDE_INT.  As HWI is always 64-bit
now, that just means it wouldn't work for 128-bit integral types that aren't
used anywhere in GCC.  So just don't use IN_RANGE for floating-point
values, where it might have surprising effects; otherwise it is safe to use.

Jakub
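The point about signed values is easy to check mechanically. The sketch below pastes the quoted IN_RANGE definition into a standalone file, assumes a 64-bit HOST_WIDE_INT (defined here as long long), and compares the macro against the two-comparison forms the patch replaces, for signed int values and negative bounds. It checks correctness only. It does not address Richard's code-size concern, which is about doing the arithmetic in 64 bits on a 32-bit host.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a 64-bit host-wide integer.  It has to be a macro rather
   than a typedef so that "unsigned HOST_WIDE_INT" is valid C.  */
#define HOST_WIDE_INT long long

/* The definition from system.h quoted above.  */
#define IN_RANGE(VALUE, LOWER, UPPER) \
  ((unsigned HOST_WIDE_INT) (VALUE) - (unsigned HOST_WIDE_INT) (LOWER) \
   <= (unsigned HOST_WIDE_INT) (UPPER) - (unsigned HOST_WIDE_INT) (LOWER))

/* Exhaustively compare IN_RANGE (and its negation) with the two-compare
   forms for signed ints in a small window, including negative bounds.  */
static bool
in_range_matches_compares (int span)
{
  assert (span >= 0);
  for (int lo = -span; lo <= span; lo++)
    for (int hi = lo; hi <= span; hi++)          /* LOWER <= UPPER */
      for (int v = -span - 2; v <= span + 2; v++)
        {
          bool inside = IN_RANGE (v, lo, hi);
          bool outside = !IN_RANGE (v, lo, hi);
          if (inside != (v >= lo && v <= hi))
            return false;
          if (outside != (v > hi || v < lo))
            return false;
        }
  return true;
}

/* VALUE is evaluated exactly once, even when it has side effects.  */
static int calls;

static int
next_value (void)
{
  calls++;
  return 7;
}

static bool
value_evaluated_once (void)
{
  calls = 0;
  bool r = IN_RANGE (next_value (), 0, 10);
  return r && calls == 1;
}
```

For example, IN_RANGE (-3, -5, -1) computes 2 <= 4 in unsigned arithmetic and is true, while IN_RANGE (0, -5, -1) computes 5 <= 4 and is false.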


[patch] Improve libstdc++ documentation on using atomics.

2015-04-20 Thread Jonathan Wakely

Currently we don't mention libatomic anywhere in the libstdc++ manual,
even though it might be needed for std::atomic.

This fixes that and makes a few other drive-by improvements.

Committed to trunk. This would be suitable for all active branches,
so I might backport it once the gcc-5-branch opens.
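As a small illustration of the __atomic built-ins the updated text refers to, the C sketch below updates a counter with __atomic_fetch_add and reads it with __atomic_load_n. For an int on hardware with suitable atomic instructions these normally expand inline. Operations the target cannot do lock-free typically become library calls, and those are what libatomic provides. The function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* A counter updated with GCC's __atomic built-ins (the interface that
   replaced the older __sync built-ins).  */
static int counter;

static int
bump (int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    __atomic_fetch_add (&counter, 1, __ATOMIC_SEQ_CST);
  return __atomic_load_n (&counter, __ATOMIC_SEQ_CST);
}

/* Compile-time query: true if objects of this size are always lock-free
   on the target, in which case no libatomic call should be needed.  */
static bool
int_always_lock_free (void)
{
  return __atomic_always_lock_free (sizeof (int), 0);
}
```

If code like this, or std::atomic on a wider type, fails to link with undefined references such as __atomic_load or __atomic_is_lock_free, adding -latomic to the link line is the fix the patch documents.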


commit b86123208d25cf50b75b58a2fc7911972690b414
Author: Jonathan Wakely 
Date:   Mon Apr 20 12:01:59 2015 +0100

	* doc/xml/manual/concurrency_extensions.xml: Update documentation
	on atomics.
	* doc/xml/manual/using.xml: Likewise. Improve markup.
	* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/concurrency_extensions.xml b/libstdc++-v3/doc/xml/manual/concurrency_extensions.xml
index b9bab53..cb79c20 100644
--- a/libstdc++-v3/doc/xml/manual/concurrency_extensions.xml
+++ b/libstdc++-v3/doc/xml/manual/concurrency_extensions.xml
@@ -187,7 +187,7 @@ host hardware and operating system.
 Implementation
   
   
-  Using Builtin Atomic Functions
+  Using Built-in Atomic Functions
 
 
 The functions for atomic operations described above are either
@@ -197,9 +197,21 @@ capable) or by library fallbacks.
 Compiler intrinsics (builtins) are always preferred.  However, as
 the compiler builtins for atomics are not universally implemented,
 using them directly is problematic, and can result in undefined
-function calls. (An example of an undefined symbol from the use
+function calls.
+
+
+Prior to GCC 4.7 the older __sync intrinsics were used.
+An example of an undefined symbol from the use
 of __sync_fetch_and_add on an unsupported host is a
-missing reference to __sync_fetch_and_add_4.)
+missing reference to __sync_fetch_and_add_4.
+
+
+Current releases use the newer __atomic intrinsics,
+which are implemented by library calls if the hardware doesn't support them.
+Undefined references to functions like
+__atomic_is_lock_free should be resolved by linking to
+libatomic, which is usually
+installed alongside libstdc++.
 
 
 In addition, on some hosts the compiler intrinsics are enabled
diff --git a/libstdc++-v3/doc/xml/manual/using.xml b/libstdc++-v3/doc/xml/manual/using.xml
index 3256c58..f6f615e 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -6,8 +6,7 @@
   Command Options
 
 
-  The set of features available in the GNU C++ library is shaped
-  by
+  The set of features available in the GNU C++ library is shaped by
   several http://www.w3.org/1999/xlink"; xlink:href="http://gcc.gnu.org/onlinedocs/gcc-4.3.2/gcc/Invoking-GCC.html";>GCC
   Command Options. Options that impact libstdc++ are
   enumerated and detailed in the table below.
@@ -64,8 +63,20 @@
 
 
   -pthread or -pthreads
-  For ISO C++11 , ,
-  , or .
+  For ISO C++11
+,
+,
+,
+or .
+  
+
+
+
+  -latomic
+  Linking to libatomic
+is required for some uses of ISO C++11
+.
+  
 
 
 
@@ -779,8 +790,9 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 hello.cc -o test.exe
   file c++config.h, which
   is generated during the libstdc++ configuration and build
   process. This file is then included when needed by files part of
-  the public libstdc++ API, like . Most of these macros
-  should not be used by consumers of libstdc++, and are reserved
+  the public libstdc++ API, like
+  . Most of these
+  macros should not be used by consumers of libstdc++, and are reserved
   for internal implementation use. These macros cannot
   be redefined.

@@ -800,10 +812,10 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 hello.cc -o test.exe
   __GLIBCXX__
   
 	The current version of
-libstdc++ in compressed ISO date format, form of an unsigned
+libstdc++ in compressed ISO date format, as an unsigned
 long. For details on the value of this particular macro for a
-particular release, please consult this 
-document.
+particular release, please consult the 
+ABI Policy and Guidelines appendix.
 
 
 
@@ -816,14 +828,15 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 hello.cc -o test.exe
Configurable (or Not configurable) means
   that the symbol is initially chosen (or not) based on
   --enable/--disable options at library build and configure time
-  (documented here), with the
-  various --enable/--disable choices being translated to
+  (documented in
+  Configure),
+  with the various --enable/--disable choices being translated to
   #define/#undef).

 
 ABI means that changing from the default value may
-  mean changing the ABI of compiled code. In other words, these
-  choices control code which has already been compiled (i.e., in a
+  mean changing the ABI of compiled code. In other words,
+  these choices co

Re: [PATCH][expmed] Properly account for the cost and latency of shift+add ops when synthesizing mults

2015-04-20 Thread Kyrill Tkachov

Hi Jeff,

On 17/04/15 20:38, Jeff Law wrote:

On 04/14/2015 02:11 AM, Kyrill Tkachov wrote:

Of course the effect on codegen of this patch depends a lot on the rtx
costs in the backend.
On aarch64 with -mcpu=cortex-a57 tuning I see the cost limit being
exceeded in more cases and the expansion code choosing instead to do a
move-immediate and a mul instruction.
No regressions on SPEC2006 on a Cortex-A57.

For example, for code:
long f0 (int x, int y)
{
  return (long)x * 6L;
}

int f1 (int x)
{
  return x * 10;
}

int f2 (int x)
{
  return x * 100;
}

int f3 (int x)
{
  return x * 20;
}

int f4 (int x)
{
  return x * 25;
}

int f5 (int x)
{
  return x * 11;
}

Please turn this into a test for the testsuite.  It's fine if the
test is specific to AArch64.  You may need to break it into 6 individual
tests since what you want to check for in each one may be significantly
different.  For example, for f0, f4 and f5 you'd probably check for the
constant load & multiply instructions.  Not sure how best to test for
what you want in f1-f3.


f1/f3 still end up synthesising the mult, but prefer a different
algorithm. I don't think the algorithm chosen in f1/f3 is worse or
better than what it was producing before, so I don't think there's
much point in testing for it. If you think it's really better to
test for something, I propose testing that only two instructions are
generated, and neither of them is a 'mul'. I'll repost a patch with
my proposed testcases for f0,f2,f4,f5.
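For reference, the constants in f0-f5 (6, 10, 100, 20, 25, 11) all have short shift-and-add decompositions. That is why the cost limit decides between a synthesized sequence and a move-immediate plus mul. The functions below are hand-written, hypothetical decompositions, with mul7 showing a shift-and-subtract step. They are not necessarily the algorithms synth_mult chooses. On AArch64, a step of the form (a << k) + b can typically be a single add with a shifted register operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative shift-and-add/subtract sequences.  Unsigned arithmetic
   keeps shifts and wraparound well defined.  */
static uint32_t mul6 (uint32_t x)  { return ((x << 1) + x) << 1; }       /* 3x * 2 */
static uint32_t mul10 (uint32_t x) { return ((x << 2) + x) << 1; }       /* 5x * 2 */
static uint32_t mul20 (uint32_t x) { return ((x << 2) + x) << 2; }       /* 5x * 4 */
static uint32_t mul11 (uint32_t x) { return (((x << 2) + x) << 1) + x; } /* 10x + x */
static uint32_t mul7 (uint32_t x)  { return (x << 3) - x; }              /* 8x - x */

static uint32_t
mul25 (uint32_t x)
{
  uint32_t t = (x << 2) + x;   /* 5x */
  return (t << 2) + t;         /* 5 * 5x = 25x */
}

static uint32_t
mul100 (uint32_t x)
{
  return mul25 (x) << 2;       /* 25x * 4 */
}

/* True if every sequence agrees with a plain multiply for X.  */
static bool
sequences_match (uint32_t x)
{
  assert (sizeof (uint32_t) == 4);
  return mul6 (x) == x * 6u && mul10 (x) == x * 10u && mul20 (x) == x * 20u
         && mul11 (x) == x * 11u && mul7 (x) == x * 7u
         && mul25 (x) == x * 25u && mul100 (x) == x * 100u;
}
```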





Bootstrapped and tested on arm, aarch64, x86_64-linux.
Ok for trunk?

Thanks,
Kyrill

2015-04-14  Kyrylo Tkachov  

  * expmed.c (synth_mult): Only assume overlapping
  shift with previous steps in alg_sub_t_m2 case.

OK with a test for the testsuite.


Thanks for looking at this,
Kyrill




jeff





[PATCH] Simplify gimple_build interface, more closely match fold_buildN

2015-04-20 Thread Richard Biener

This changes gimple_build to follow fold_buildN behavior: combine
stmts only from the sequence(s) we are currently building.  This
avoids possible issues with a straightforward transition to
gimple_build in passes that do not keep SSA form up to date.

Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.
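The effect of the new valueization hook can be shown with a toy model in C. Everything below is hypothetical: the structs, toy_valueize and toy_build_plus. Only the rule itself comes from gimple_build_valueize in the patch: follow an SSA name's definition only when that statement is not yet in a basic block. A definition still in the sequence being built is combined with, and one already in the IL is left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not GCC's data structures: each SSA name is defined by a
   statement "name = base + cst", which is either already in a basic
   block or still only in the sequence being built.  */
struct toy_stmt
{
  bool in_bb;           /* stands in for gimple_bb (stmt) != NULL */
  const char *base;
  long cst;
};

struct toy_ssa
{
  const char *name;
  struct toy_stmt *def; /* stands in for SSA_NAME_DEF_STMT */
};

struct toy_expr
{
  const char *operand;
  long cst;
};

/* Same rule as gimple_build_valueize: look through a definition only if
   it is not in any basic block yet; NULL means "do not follow".  */
static struct toy_ssa *
toy_valueize (struct toy_ssa *op)
{
  if (!op->def->in_bb)
    return op;
  return NULL;
}

/* Build "op + c", folding (base + c1) + c into base + (c1 + c) only when
   the valueize hook lets us see op's definition.  */
static struct toy_expr
toy_build_plus (struct toy_ssa *op, long c)
{
  assert (op != NULL && op->def != NULL);
  struct toy_expr e = { op->name, c };

  if (toy_valueize (op) != NULL)
    {
      e.operand = op->def->base;
      e.cst = op->def->cst + c;
    }
  return e;
}
```

If t1 = a + 1 is still only in the sequence, t1 + 2 folds to a + 3. If t2 = b + 1 is already in a basic block, t2 + 2 is built as is.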

Richard.

2015-04-20  Richard Biener  

* gimple-fold.h (gimple_build): Remove optional valueize arguments.
* gimple-fold.c (gimple_build_valueize): New function.
(gimple_build): Always use gimple_build_valueize as valueize hook.

Index: gcc/gimple-fold.c
===
*** gcc/gimple-fold.c   (revision 27)
--- gcc/gimple-fold.c   (working copy)
*** rewrite_to_defined_overflow (gimple stmt
*** 6078,6095 
  }
  
  
  /* Build the expression CODE OP0 of type TYPE with location LOC,
!simplifying it first if possible using VALUEIZE if not NULL.
!OP0 is expected to be valueized already.  Returns the built
 expression value and appends statements possibly defining it
 to SEQ.  */
  
  tree
  gimple_build (gimple_seq *seq, location_t loc,
! enum tree_code code, tree type, tree op0,
! tree (*valueize)(tree))
  {
!   tree res = gimple_simplify (code, type, op0, seq, valueize);
if (!res)
  {
if (gimple_in_ssa_p (cfun))
--- 6078,6105 
  }
  
  
+ /* The valueization hook we use for the gimple_build API simplification.
+This makes us match fold_buildN behavior by only combining with
+statements in the sequence(s) we are currently building.  */
+ 
+ static tree
+ gimple_build_valueize (tree op)
+ {
+   if (gimple_bb (SSA_NAME_DEF_STMT (op)) == NULL)
+ return op;
+   return NULL_TREE;
+ }
+ 
  /* Build the expression CODE OP0 of type TYPE with location LOC,
!simplifying it first if possible.  Returns the built
 expression value and appends statements possibly defining it
 to SEQ.  */
  
  tree
  gimple_build (gimple_seq *seq, location_t loc,
! enum tree_code code, tree type, tree op0)
  {
!   tree res = gimple_simplify (code, type, op0, seq, gimple_build_valueize);
if (!res)
  {
if (gimple_in_ssa_p (cfun))
*** gimple_build (gimple_seq *seq, location_
*** 6110,6126 
  }
  
  /* Build the expression OP0 CODE OP1 of type TYPE with location LOC,
!simplifying it first if possible using VALUEIZE if not NULL.
!OP0 and OP1 are expected to be valueized already.  Returns the built
 expression value and appends statements possibly defining it
 to SEQ.  */
  
  tree
  gimple_build (gimple_seq *seq, location_t loc,
! enum tree_code code, tree type, tree op0, tree op1,
! tree (*valueize)(tree))
  {
!   tree res = gimple_simplify (code, type, op0, op1, seq, valueize);
if (!res)
  {
if (gimple_in_ssa_p (cfun))
--- 6120,6134 
  }
  
  /* Build the expression OP0 CODE OP1 of type TYPE with location LOC,
!simplifying it first if possible.  Returns the built
 expression value and appends statements possibly defining it
 to SEQ.  */
  
  tree
  gimple_build (gimple_seq *seq, location_t loc,
! enum tree_code code, tree type, tree op0, tree op1)
  {
!   tree res = gimple_simplify (code, type, op0, op1, seq, 
gimple_build_valueize);
if (!res)
  {
if (gimple_in_ssa_p (cfun))
*** gimple_build (gimple_seq *seq, location_
*** 6135,6152 
  }
  
  /* Build the expression (CODE OP0 OP1 OP2) of type TYPE with location LOC,
!simplifying it first if possible using VALUEIZE if not NULL.
!OP0, OP1 and OP2 are expected to be valueized already.  Returns the built
 expression value and appends statements possibly defining it
 to SEQ.  */
  
  tree
  gimple_build (gimple_seq *seq, location_t loc,
! enum tree_code code, tree type, tree op0, tree op1, tree op2,
! tree (*valueize)(tree))
  {
tree res = gimple_simplify (code, type, op0, op1, op2,
! seq, valueize);
if (!res)
  {
if (gimple_in_ssa_p (cfun))
--- 6143,6158 
  }
  
  /* Build the expression (CODE OP0 OP1 OP2) of type TYPE with location LOC,
!simplifying it first if possible.  Returns the built
 expression value and appends statements possibly defining it
 to SEQ.  */
  
  tree
  gimple_build (gimple_seq *seq, location_t loc,
! enum tree_code code, tree type, tree op0, tree op1, tree op2)
  {
tree res = gimple_simplify (code, type, op0, op1, op2,
! seq, gimple_build_valueize);
if (!res)
  {
if (gimple_in_ssa_p (cfun))
*** gimple_build (gimple_seq *seq, location_
*** 6167,6183 
  
  /* Build the call FN (ARG0) with a result of type TYPE
 (or no result if TYPE is void) with location LOC,
!simplifying it first if possible using VALUEIZE if not NULL.
!ARG0 is expe

Re: [PATCH][ARM][cleanup] Use IN_RANGE more often

2015-04-20 Thread Richard Earnshaw
On 20/04/15 12:03, Kyrill Tkachov wrote:
> 
> On 18/04/15 15:18, Richard Earnshaw wrote:
>> On 15/04/15 16:22, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> This patch goes through the arm backend and replaces expressions of the
>>> form
>>> a >= lo && a <= hi with IN_RANGE (a, lo, hi) which is that tiny bit
>>> smaller
>>> and easier to read in my opinion. I guess there's also a chance it might
>>> make
>>> things infinitesimally faster since IN_RANGE evaluates 'a' only once.
>>> The patch also substitutes expressions like a > hi || a < lo with
>>> !IN_RANGE (a, lo, hi) which, again, conveys the intended meaning more
>>> clearly.
>>> I tried to make sure not to introduce any off-by-one errors and testing
>>> caught some that I had made while writing these.
>>>
>>> Bootstrapped and tested arm-none-linux-gnueabihf. Built and ran SPEC2006
>>> successfully.
>>>
>>> Ok for trunk once 5.1 is released?
>>>
>> I think this is pretty obvious for those cases where the type of the
>> range is [unsigned] HOST_WIDE_INT, but much less obvious for those cases
>> where the type is just int, or unsigned.  Cases that I think need more
>> careful examination include vfp3_const_double_index and
>> aapcs_vfp_is_call_or_return_candidate, but I haven't gone through every
>> instance to check whether there are more cases.
> 
> The definition and comment on IN_RANGE in system.h is:
> /* A macro to determine whether a VALUE lies inclusively within a
>certain range without evaluating the VALUE more than once.  This
>macro won't warn if the VALUE is unsigned and the LOWER bound is
>zero, as it would e.g. with "VALUE >= 0 && ...".  Note the LOWER
>bound *is* evaluated twice, and LOWER must not be greater than
>UPPER.  However the bounds themselves can be either positive or
>negative.  */
> #define IN_RANGE(VALUE, LOWER, UPPER) \
>   ((unsigned HOST_WIDE_INT) (VALUE) - (unsigned HOST_WIDE_INT) (LOWER) \
><= (unsigned HOST_WIDE_INT) (UPPER) - (unsigned HOST_WIDE_INT) (LOWER))
> 
> Since it works on positive or negative bounds, I'd think it would work on
> signed numbers, wouldn't it?
> 
>>
>> I'd be particularly concerned about these if the widening of the result
>> caused a code quality regression on a native 32-bit machine (since HWI
>> is a 64-bit type).
> 
> That being said, I see a 0.6% size increase on cc1 built on a native
> arm-linux
> system. This seems like a not trivial increase to me. If that is not
> acceptable
> then we can drop this patch.
> 

I suggest we just drop the bits that are not using HWI as the base type.

R.

> Thanks,
> Kyrill
> 
>>
>> R.
>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2015-04-15  Kyrylo Tkachov  
>>>
>>>  * config/arm/arm.md (*zeroextractsi_compare0_scratch): Use IN_RANGE
>>>  instead of two compares.
>>>  (*ne_zeroextractsi): Likewise.
>>>  (*ite_ne_zeroextractsi): Likewise.
>>>  (load_multiple): Likewise.
>>>  (store_multiple): Likewise.
>>>  * config/arm/arm.h (IS_IWMMXT_REGNUM): Likewise.
>>>  (IS_IWMMXT_GR_REGNUM): Likewise.
>>>  (IS_VFP_REGNUM): Likewise.
>>>  * config/arm/arm.c (arm_return_in_memory): Likewise.
>>>  (aapcs_vfp_is_call_or_return_candidate): Likewise.
>>>  (thumb_find_work_register): Likewise.
>>>  (thumb2_legitimate_address_p): Likewise.
>>>  (arm_legitimate_index_p): Likewise.
>>>  (thumb2_legitimate_index_p): Likewise.
>>>  (thumb1_legitimate_address_p): Likewise.
>>>  (thumb_legitimize_address): Likewise.
>>>  (vfp3_const_double_index): Likewise.
>>>  (neon_immediate_valid_for_logic): Likewise.
>>>  (bounds_check): Likewise.
>>>  (load_multiple_sequence): Likewise.
>>>  (store_multiple_sequence): Likewise.
>>>  (offset_ok_for_ldrd_strd): Likewise.
>>>  (callee_saved_reg_p): Likewise.
>>>  (thumb2_emit_strd_push): Likewise.
>>>  (arm_output_load_gr): Likewise.
>>>  (arm_vector_mode_supported_p): Likewise.
>>>  * config/arm/neon.md (ashldi3_neon_noclobber): Likewise.
>>>  (ashrdi3_neon_imm_noclobber): Likewise.
>>>  (lshrdi3_neon_imm_noclobber): Likewise.
>>>  * config/arm/thumb1.md (*thumb1_addsi3): Likewise.
>>>  * config/arm/thumb2.md (define_peephole2's after
>>> orsi_not_shiftsi_si):
>>>  Likewise.
> 



Re: [PING][PATCH][PR65443] Add transform_to_exit_first_loop_alt

2015-04-20 Thread Richard Biener
On Wed, 15 Apr 2015, Tom de Vries wrote:

> On 03-04-15 14:39, Tom de Vries wrote:
> > On 27-03-15 15:10, Tom de Vries wrote:
> > > Hi,
> > > 
> > > this patch fixes PR65443, a todo in the parloops pass for function
> > > transform_to_exit_first_loop:
> > > ...
> > > TODO: the common case is that latch of the loop is empty and
> > > immediately
> > > follows the loop exit.  In this case, it would be better not to copy
> > > the
> > > body of the loop, but only move the entry of the loop directly before
> > > the
> > > exit check and increase the number of iterations of the loop by one.
> > > This may need some additional preconditioning in case NIT = ~0.
> > > ...
> > > 
> > > The current implementation of transform_to_exit_first_loop transforms the
> > > loop
> > > by moving and duplicating the loop body. This patch transforms the loop
> > > into the
> > > same normal form, but without the duplication, if that's possible
> > > (determined by
> > > try_transform_to_exit_first_loop_alt).
> > > 
> > > The actual transformation, done by transform_to_exit_first_loop_alt
> > > transforms
> > > this loop:
> > > ...
> > >   bb preheader:
> > >   ...
> > >   goto 
> > > 
> > >   :
> > >   ...
> > >   if (ivtmp < n)
> > > goto ;
> > >   else
> > > goto ;
> > > 
> > >   :
> > >   ivtmp = ivtmp + inc;
> > >   goto 
> > > ...
> > > 
> > > into this one:
> > > ...
> > >   bb preheader:
> > >   ...
> > >   goto 
> > > 
> > >   :
> > >   ...
> > >   goto ;
> > > 
> > >   :
> > >   if (ivtmp < n + 1)
> > > goto ;
> > >   else
> > > goto ;
> > > 
> > >   :
> > >   ivtmp = ivtmp + inc;
> > >   goto 
> > > ...
> > > 
> > 
> > Updated patch, now using redirect_edge_var_map and flush_pending_stmts.
> > 
> > Bootstrapped and reg-tested on x86_64.
> > 
> > OK for stage1 trunk?
> > 
> 
> Ping.

+static void
+replace_imm_uses (tree val, imm_use_iterator *imm_iter)
+{
+  use_operand_p use_p;
+
+  FOR_EACH_IMM_USE_ON_STMT (use_p, *imm_iter)
+SET_USE (use_p, val);

Use propagate_value.  Why this odd interface passing an imm_iter?!

In fact most of the "repair SSA" in transform_to_exit_first_loop_alt
looks too ad-hoc to me ... it also looks somewhat familiar to other
code ...

Unfortunately the comment before the function isn't in SSA form
so it's hard to follow the transform.

I consider the parloops code bitrotten, no repair possible, so
I might be convinced to not care about new spaghetti in there...

+  /* Fix up loop structure.  TODO: Check whether this is sufficient.  */
+  loop->header = new_header;
+

no, surely not.  Number of iterations (and estimates) are off
after the transform and loop->latch might also need updating.

"Safest" is to simply schedule a fixup (but you'll lose any
loop annotation in that process).

+  /* Figure out whether nit + 1 overflows.  */
+  if (TREE_CODE (nit) == INTEGER_CST)
+{
+  if (!tree_int_cst_equal (nit, TYPE_MAXVAL (nit_type)))

in case nit_type is a pointer type TYPE_MAXVAL will be NULL I think.

Is the whole exercise for performance?  In that case using an
entirely new, unsigned IV, that runs from 0 to niter should
be easiest and just run-time guard that niter == +INF case?
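To make the NIT = ~0 concern in the quoted TODO concrete, here is a C++-level analogue of the two loop shapes. It is only a sketch: the patch rewrites GIMPLE basic blocks, and the function names, the unsigned IV starting at 0, and the step of 1 are assumptions. Both shapes run the body n + 1 times, except that the exit-first bound n + 1 wraps to 0 when n is the maximum value of its type.

```cpp
#include <cassert>
#include <climits>

// Hypothetical analogue of the two loop shapes; the real transform works
// on GIMPLE basic blocks.  Here iv starts at 0 and steps by 1.

// Exit-last shape (the form before the transform): body, then the exit
// test, then the latch.  The body runs n + 1 times.
unsigned
count_exit_last (unsigned n)
{
  unsigned runs = 0;
  for (unsigned iv = 0;; iv++)
    {
      runs++;             // loop body
      if (!(iv < n))      // exit test after the body
        break;
    }
  return runs;
}

// Exit-first shape: the exit test comes first, with bound n + 1.  It also
// runs the body n + 1 times, unless n + 1 wraps around to 0.
unsigned
count_exit_first (unsigned n)
{
  unsigned runs = 0;
  for (unsigned iv = 0; iv < n + 1; iv++)
    runs++;               // loop body
  return runs;
}

// Run-time guard: only take the exit-first shape when n + 1 cannot wrap.
bool
exit_first_is_safe (unsigned n)
{
  return n != UINT_MAX;
}
```

For example, count_exit_last (3) and count_exit_first (3) both return 4, while count_exit_first (UINT_MAX) returns 0, which is the case a run-time guard has to catch.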

For the graphite case, can't you make graphite generate the
loops exit-first in the first place?

The testcases are just correctness ones?  That is, there isn't
any testcase that checks whether the new code is exercised?
(no extra debugging dumping?)

Thanks,
Richard.


Re: [patch, fortran] PR 37131

2015-04-20 Thread Mikael Morin
Le 19/04/2015 17:58, Thomas Koenig a écrit :
> Hello world,
> 
> here is the first installation of the matmul inlining patch.
> 
> This patch calculates c=MATMUL(a,b) using DO loops where there is no
> dependency between a and c/b and c loops, taking care of realloc on
> assignment and bounds checking (using the same error messages that the
> library does), and does not cause any regressions in the test suite.
> 
> There are several directions this should be extended at a later date:
> 
> - Remove unneeded bounds checking for the individual array accesses
> - Add handling of TRANSPOSE of the arguments
> - Add handling of temporaries for arguments, where needed
> 
> However, I think the patch is useful as it is now, and can go
> into trunk.
> 
> So: OK for trunk?
> 
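As a rough picture of what "calculates c=MATMUL(a,b) using DO loops" expands to, the following C++ sketch writes the loop nest over column-major storage. It is not the front-end code from the patch: matmul_loops and its parameters are hypothetical, and it signals a shape mismatch by returning false instead of issuing the library's run-time error. Zeroing c before accumulating is only safe when c does not overlap a or b, which illustrates why a dependency check is needed before inlining.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of c = matmul (a, b) on column-major arrays.
// a is m x p_a, b is p_b x n, c is m x n; indices are 0-based here.
// c must not overlap a or b: it is zeroed before being accumulated into.
bool
matmul_loops (const double *a, std::size_t m, std::size_t p_a,
              const double *b, std::size_t p_b, std::size_t n,
              double *c)
{
  if (p_a != p_b)                  // shape check on the inner dimension
    return false;
  for (std::size_t j = 0; j < n; j++)
    for (std::size_t i = 0; i < m; i++)
      c[i + j * m] = 0.0;
  // c(i,j) += a(i,k) * b(k,j); i innermost for unit stride in a and c.
  for (std::size_t j = 0; j < n; j++)
    for (std::size_t k = 0; k < p_a; k++)
      for (std::size_t i = 0; i < m; i++)
        c[i + j * m] += a[i + k * m] * b[k + j * p_b];
  return true;
}
```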
Hello,

This is impressive.
I have a few comments, but in general it's mostly good.

I couldn't tell whether subreferences array(:,:)%subref%comp are
correctly handled, either positively or negatively. Tests for it are
more than welcome in any case. ;-)
An interesting case is non-default lbound.  The lbound intrinsic is
supposed to return 1 in the case of array subobjects, which may have
interesting effects.
So, test with non-default lbound as well.

I think strides are properly handled, but would feel more comfortable
with tests about them.

To sum up, tests are missing for the following:
array(4,:,:)
array(3:5,:)
array(3:10:2,:)
array(:,:)%comp
with both lbound == 1 and lbound != 1.
One test with lhs-rhs dependency would be good as well.
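The strided and non-default-lbound cases listed above come down to a little index arithmetic that the inlined loops must get right. A sketch with hypothetical helper names: a triplet lo:hi:stride selects MAX((hi - lo + stride) / stride, 0) elements, the section itself has LBOUND 1, and the storage offset depends on the parent's declared lower bound.

```cpp
#include <algorithm>
#include <cassert>

// Number of elements selected by the triplet lo:hi:stride (stride != 0).
long
triplet_extent (long lo, long hi, long stride)
{
  return std::max ((hi - lo + stride) / stride, 0L);
}

// The section has LBOUND 1, so its element i (1-based) is element
// lo + (i - 1) * stride of the parent dimension.
long
parent_subscript (long lo, long stride, long i)
{
  return lo + (i - 1) * stride;
}

// Zero-based storage offset of a parent subscript when the parent's
// declared lower bound is lbound (e.g. x(0:9) has lbound 0).
long
storage_offset (long subscript, long lbound)
{
  return subscript - lbound;
}
```

For array(3:10:2,:), triplet_extent (3, 10, 2) is 4 and section element 4 maps to parent subscript 9.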

More specific comments below.

Mikael


> Index: fortran/array.c
> ===
> --- fortran/array.c   (Revision 18)
> +++ fortran/array.c   (Arbeitskopie)
> @@ -338,6 +338,9 @@ gfc_resolve_array_spec (gfc_array_spec *as, int ch
>if (as == NULL)
>  return true;
>  
> +  if (as->resolved)
> +return true;
> +
Why this?

[...]

> Index: fortran/frontend-passes.c
> ===
> --- fortran/frontend-passes.c (Revision 18)
> +++ fortran/frontend-passes.c (Arbeitskopie)
> @@ -43,7 +44,11 @@ static void doloop_warn (gfc_namespace *);
>  static void optimize_reduction (gfc_namespace *);
>  static int callback_reduction (gfc_expr **, int *, void *);
>  static void realloc_strings (gfc_namespace *);
> -static gfc_expr *create_var (gfc_expr *);
> +static gfc_expr *create_var (gfc_expr *, const char *vname=NULL);
> +static int optimize_matmul_assign (gfc_code **, int *, void *);
The function doesn't really "optimize", so name it inline_matmul_assign
instead.
Same for the comments about "optimizing MATMUL".

[...]

> @@ -524,29 +542,11 @@ constant_string_length (gfc_expr *e)
>  
>  }
>  
> -/* Returns a new expression (a variable) to be used in place of the old one,
> -   with an assignment statement before the current statement to set
> -   the value of the variable. Creates a new BLOCK for the statement if
> -   that hasn't already been done and puts the statement, plus the
> -   newly created variables, in that block.  Special cases:  If the
> -   expression is constant or a temporary which has already
> -   been created, just copy it.  */
> -
> -static gfc_expr*
> -create_var (gfc_expr * e)
Keep a comment here.

> +static gfc_namespace*
> +insert_block ()
>  {
> -  char name[GFC_MAX_SYMBOL_LEN +1];
> -  static int num = 1;
> -  gfc_symtree *symtree;
> -  gfc_symbol *symbol;
> -  gfc_expr *result;
> -  gfc_code *n;
>gfc_namespace *ns;
> -  int i;
>  
> -  if (e->expr_type == EXPR_CONSTANT || is_fe_temp (e))
> -return gfc_copy_expr (e);
> -
>/* If the block hasn't already been created, do so.  */
>if (inserted_block == NULL)
>  {

> @@ -1939,7 +1977,1049 @@ doloop_warn (gfc_namespace *ns)
>gfc_code_walker (&ns->code, doloop_code, do_function, NULL);
>  }
>  
> +/* This selction deals with inlining calls to MATMUL.  */
section
>  
> +/* Auxiliary function to build and simplify an array inquiry function.
> +   dim is zero-based.  */
> +
> +static gfc_expr *
> +get_array_inq_function (gfc_expr *e, int dim, gfc_isym_id id)
It's better if the id is the first argument, so that the function id and
its arguments come in their natural order.

[...]

> +/* Builds a logical expression.  */
> +
> +static gfc_expr*
> +build_logical_expr (gfc_expr *e1, gfc_expr *e2, gfc_intrinsic_op op)
Same here, op first.

[...]

> +
> +/* Return an operation of one two gfc_expr (one if e2 is NULL). This assumes
> +   compatible typespecs.  */
> +
> +static gfc_expr *
> +get_operand (gfc_intrinsic_op op, gfc_expr *e1, gfc_expr *e2)
Here it's good already. :-)

[...]

> +/* Insert code to issue a runtime error if the expressions are not equal.  */
> +
> +static gfc_code *
> +runtime_error_ne (gfc_expr *e1, gfc_expr *e2, const char *msg)
> +{
> +  gfc_expr *cond;
> +  gfc_code *if_1, *if_2;
> +  gfc_code *c;
> +  // const char *name;
Any reason...

> +  gfc_actual_argli

Re: [PATCH] Make split_block and create_basic_block type-safe

2015-04-20 Thread Richard Biener
On Mon, 23 Mar 2015, Richard Biener wrote:

> On Fri, 20 Mar 2015, David Malcolm wrote:
> 
> > On Thu, 2015-03-12 at 14:20 +0100, Richard Biener wrote:
> > > After noticing tree-parloop.c passing crap to split_block (a tree
> > > rather than a gimple or an rtx) I noticed those CFG functions simply
> > > take void * pointers.  The following patch fixes that and adds
> > > two overloads, one for GIMPLE use and one for RTL use.
> > > 
> > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > > 
> > > Ok at this stage?
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > 2015-03-12  Richard Biener  
> > > 
> > >   * cfghooks.h (create_basic_block): Replace with two overloads
> > >   for RTL and GIMPLE.
> > >   (split_block): Likewise.
> > >   * cfghooks.c (split_block): Rename to ...
> > >   (split_block_1): ... this.
> > >   (split_block): Add two type-safe overloads for RTL and GIMPLE.
> > >   (split_block_after_labels): Call split_block_1.
> > >   (create_basic_block): Rename to ...
> > >   (create_basic_block_1): ... this.
> > >   (create_basic_block): Add two type-safe overloads for RTL and GIMPLE.
> > >   (create_empty_bb): Call create_basic_block_1.
> > >   * cfgrtl.c (fixup_fallthru_exit_predecessor): Use
> > >   split_block_after_labels.
> > >   * omp-low.c (expand_parallel_call): Likewise.
> > >   (expand_omp_target): Likewise.
> > >   (simd_clone_adjust): Likewise.
> > >   * tree-chkp.c (chkp_get_entry_block): Likewise.
> > >   * cgraphunit.c (init_lowered_empty_function): Use the GIMPLE
> > >   create_basic_block overload.
> > >   (cgraph_node::expand_thunk): Likewise.
> > >   * tree-cfg.c (make_blocks): Likewise.
> > >   (handle_abnormal_edges): Likewise.
> > >   * tree-inline.c (copy_bb): Likewise.
> > > 
> > > Index: gcc/cfghooks.c
> > > ===
> > > --- gcc/cfghooks.c(revision 221379)
> > > +++ gcc/cfghooks.c(working copy)
> > 
> > [...snip...]
> > 
> > > +edge
> > > +split_block (basic_block bb, rtx i)
> > > +{
> > > +  return split_block_1 (bb, i);
> > > +}
> > 
> > Possibly a dumb question, but could this take an rtx_insn * rather than
> > a plain rtx?
> 
> Well, as you noted below...
> 
> > > +basic_block
> > > +create_basic_block (rtx head, rtx end, basic_block after)
> > > +{
> > > +  return create_basic_block_1 (head, end, after);
> > > +}
> > 
> > Likewise for head and end... though I see a fix would be needed in
> > bfin.c:hwloop_optimize, at least.
> 
> ...it would have required building all sorts of targets.
> 
> But sure, as this is now stage1 stuff I'll see to make it rtx_insns.

Ick.  Even bb-reorder.c isn't safe for that and stuff like
block_label () doesn't return a rtx_insn * either.

Thus I'll go for the original patch.
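For readers following the type-safety argument, this C++ sketch shows the shape of the change: an untyped worker taking void * with two typed overloads in front of it, so passing a tree no longer compiles. The struct names are stand-ins for GCC's rtx and gimple types, not GCC's definitions, and can_split is a hypothetical compile-time probe built on C++17 std::void_t.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for GCC's IL and CFG types.
struct rtx_def {};
struct gimple_statement_base {};
struct tree_node {};
struct basic_block_def {};
struct edge_def {};
using basic_block = basic_block_def *;
using edge = edge_def *;

// Untyped worker: accepts any pointer, so a wrong IL kind slips through.
static edge
split_block_1 (basic_block, void *)
{
  static edge_def e;
  return &e;
}

// Typed entry points: only RTL or GIMPLE arguments are accepted.
edge split_block (basic_block bb, rtx_def *i) { return split_block_1 (bb, i); }
edge split_block (basic_block bb, gimple_statement_base *i) { return split_block_1 (bb, i); }

// True when split_block (bb, T) is a well-formed call.
template <typename T, typename = void>
struct can_split : std::false_type {};

template <typename T>
struct can_split<T, std::void_t<decltype (split_block (std::declval<basic_block> (),
                                                       std::declval<T> ()))>>
  : std::true_type {};
```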

Thanks,
Richard.


Re: [Patch] pr65779 - [5/6 Regression] undefined local symbol on powerpc

2015-04-20 Thread Alan Modra
On Mon, Apr 20, 2015 at 10:55:56AM +0200, Jakub Jelinek wrote:
> On Mon, Apr 20, 2015 at 06:12:26PM +0930, Alan Modra wrote:
> > I had it that way in my first patch, then decided to try deleting..
> > 
> > I can certainly change it back even if only to do it the standard way
> > for safety's sake, but I'm curious as to why they can't be deleted in
> > this special case.  My thinking was that we're on a chain of blocks
> > starting at the entry where there is a single outgoing live edge for
> > the register being used.  So there shouldn't be any need for a debug
> > insn to mark info about the variable as invalid.
> 
> The debug insns can be for arbitrary variables, there is no "the variable",

Sure.

> and there could be other debug insns for the same variable on that path,
> say saying that decl lives in some other register, or can be computed using
> an expression involving other registers, or memory etc.  Say you could have

Yes, that's true in the general case.  For the shrink-wrap case, any
bb (or tail of the entry block) that we move over has no use or def of
the register.  So I'm left wondering how it would be possible for the
var to live in some other register or memory?  Probably lack of
imagination on my part, but the only scenarios I see would involve a
failure of cse.

Anyway, I rewrote the patch to do as you suggested, and started
looking at .debug_loc in gcc/*.o for files that differed between the
two approaches.  (Only 32 files differed, besides the expected
shrinkwrap.o and checksum files.)  That was a bit of a revelation,
and no wonder powerpc debugging is such a pain..

The first file I looked at was reload.o, and the first difference is
in location lists for get_secondary_mem mode param.  It looks like

virgin:
86c5 5c90 5cd7 (DW_OP_reg4 (r4))
86d8 5cd7 5cd8 (DW_OP_GNU_entry_value: (DW_OP_reg4 (r4)); DW_OP_stack_value)
86ee 5cd8 5cf8 (DW_OP_reg30 (r30))
8701 5d2c 5d38 (DW_OP_reg30 (r30))

delete:
86ae 5c90 5cd7 (DW_OP_reg4 (r4))
86c1 5cd7 5ec0 (DW_OP_GNU_entry_value: (DW_OP_reg4 (r4)); DW_OP_stack_value)

zap:
86ae 5c90 5cd7 (DW_OP_reg4 (r4))
86c1 5cd7 5cd8 (DW_OP_GNU_entry_value: (DW_OP_reg4 (r4)); DW_OP_stack_value)

and the code:
5cd0:   48 00 00 01 bl  5cd0 <._Z17get_secondary_memP7rtx_def12machine_modei11reload_type+0x40>
5cd0: R_PPC64_REL24 _Z35rs6000_secondary_memory_needed_mode12machine_mode
5cd4:   60 00 00 00 nop
5cd8:   7c 7d 07 b4 extsw   r29,r3
5cdc:   1f fd 00 1e mulli   r31,r29,30
5ce0:   7d 3f da 14 add r9,r31,r27
5ce4:   79 29 1f 24 rldicr  r9,r9,3,60
5ce8:   7d 3c 4a 14 add r9,r28,r9
5cec:   e9 29 55 a8 ld  r9,21928(r9)
5cf0:   2f a9 00 00 cmpdi   cr7,r9,0
5cf4:   41 9e 00 3c beq cr7,5d30 <._Z17get_secondary_memP7rtx_def12machine_modei11reload_type+0xa0>
5cf8:   38 21 00 c0 addi r1,r1,192
5cfc:   7d 23 4b 78 mr  r3,r9
..rest of epilogue

5d30:   7b b9 1f 24 rldicr  r25,r29,3,60
5d34:   7c 7e 1b 78 mr  r30,r3
5d38:   7f 1c ca 14 add r24,r28,r25

Arrgh!  In the first place the ranges are wrong since r4 dies after
the bl, not the toc restoring nop.  Worse, both deleting and zapping
(ie UNKNOWN_VAR_LOC) the debug insn is wrong.  My simplistic patch
isn't correct.  In fact it makes the debug info worse.  However,
leaving the debug insn alone says "mode" lives in r30, but that is
wrong since the copy to r30 insn has been moved, to 5d34.  Apparently
the move causes the virgin location lists to say "mode" disappears at
that point too.  What a mess!

Of course, all this moving for shrink-wrap is senseless in a block
that contains a call.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Fix check for whether a function is a variadic function

2015-04-20 Thread Patrick Palka
On Sun, Apr 19, 2015 at 11:17 PM, Jason Merrill  wrote:
> On 04/19/2015 07:45 PM, Patrick Palka wrote:
>>
>> stdarg_p() apparently returns false for a variadic function that has no
>> concrete parameters, e.g. "void foo (...);".  This patch fixes this
>> issue by removing the predicate's seemingly bogus "n != NULL_TREE" test.
>
>
> What does this do with K&R non-prototype declarations, e.g. "int main();"?

stdarg_p (decl) now returns true when breaking on finish_decl() for
"int main();".  I'm not sure if that's right or not..

But never mind, I'm getting hundreds of failures from C tests that
define main without a parameter list e.g. "int main () { ... }".  This
is an issue not worth fixing..
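Why dropping the NULL test also flags "int main ();" can be seen with a small model of the argument-type list. The model assumes GCC's convention that a prototype without an ellipsis ends with void (void_list_node), a variadic prototype simply stops, and an unprototyped declaration has no list at all, so "void foo (...)" and a C "int main ()" both look like an empty list. The functions are hypothetical, not tree.c code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a function type's argument-type list.
using arg_list = std::vector<std::string>;

// Current check: the last entry must exist and must not be "void".
bool
stdarg_p_current (const arg_list &args)
{
  return !args.empty () && args.back () != "void";
}

// Without the "n != NULL_TREE" test: an empty list counts as variadic,
// which also catches unprototyped (K&R) declarations.
bool
stdarg_p_proposed (const arg_list &args)
{
  return args.empty () || args.back () != "void";
}
```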

>
> Jason
>


Re: [Patch] pr65779 - [5/6 Regression] undefined local symbol on powerpc

2015-04-20 Thread Jakub Jelinek
On Mon, Apr 20, 2015 at 10:30:32PM +0930, Alan Modra wrote:
> > and there could be other debug insns for the same variable on that path,
> > say saying that decl lives in some other register, or can be computed using
> > an expression involving other registers, or memory etc.  Say you could have
> 
> Yes, that's true in the general case.  For the shrink-wrap case, any
> bb (or tail of the entry block) that we move over has no use or def of
> the register.  So I'm left wondering how it would be possible for the
> var to live in some other register or memory?  Probably lack of
> imagination on my part, but the only scenarios I see would involve a
> failure of cse.

E.g. the variable might have different values in the source code.
You can have
  int var1 = 5;
  // some statements
  var1 = parmx + 20;
  // some statements
  var1 = 30;
  // ...
in the source and consider that the shrink wrapping insn is attempting to move
(set (regX) (plus (reg parmx) (const_int 20)))
later.  You could have different debug insns for the same var1, and trying
to move the debug_insn later would break things.  Usually gcc only adds
further debug stmts (or insns).  E.g. if, in a range between two earlier
debug stmts (or insns), the var is known to contain some particular value,
but the expression having that value, or something used in it (SSA_NAME,
pseudo REG), is optimized away, then if there is some other way to get the
same value, the range could be split in two, with two different expressions, etc.

> Arrgh!  In the first place the ranges are wrong since r4 dies after
> the bl, not the toc restoring nop.  Worse, both deleting and zapping
> (ie UNKNOWN_VAR_LOC) the debug insn is wrong.  My simplistic patch
> isn't correct.  In fact it makes the debug info worse.  However,

Zapping is conservatively correct: if you don't know where the var lives
or how to compute it, you tell the debugger you don't know it.
Of course, it is a QoI issue; if there is an easy way to reconstruct the
value otherwise, it is always better to do so.

> leaving the debug insn alone says "mode" lives in r30, but that is
> wrong since the copy to r30 insn has been moved, to 5d34.  Apparently
> the move causes the virgin location lists to say "mode" disappears at
> that point too.  What a mess!
> 
> Of course, all this moving for shrink-wrap is senseless in a block
> that contains a call.

Yeah, such blocks clearly aren't going to be shrink-wrapped, so there is no
point to move it that far, right?
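Jakub's point that zapping is conservatively correct can be illustrated with a simplified model of a DWARF location list: entries cover half-open address ranges, and an address that no entry covers is reported as optimized out. The entries below are transcribed from the virgin and zap lists Alan quoted for get_secondary_mem's mode parameter; loc_entry and lookup are hypothetical, and real lists encode DWARF expressions rather than strings.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One location-list entry: the variable lives at `loc` for pc in [lo, hi).
struct loc_entry
{
  std::uint64_t lo, hi;
  std::string loc;
};

// Location in effect at pc, or "<optimized out>" when no entry covers it.
std::string
lookup (const std::vector<loc_entry> &list, std::uint64_t pc)
{
  for (const loc_entry &e : list)
    if (pc >= e.lo && pc < e.hi)
      return e.loc;
  return "<optimized out>";
}

const std::vector<loc_entry> virgin = {
  {0x5c90, 0x5cd7, "r4"},
  {0x5cd7, 0x5cd8, "entry_value(r4)"},
  {0x5cd8, 0x5cf8, "r30"},
  {0x5d2c, 0x5d38, "r30"},
};

const std::vector<loc_entry> zap = {
  {0x5c90, 0x5cd7, "r4"},
  {0x5cd7, 0x5cd8, "entry_value(r4)"},
};
```

At 0x5d30 the virgin list claims r30, while the zapped list makes the debugger say it does not know, which may be less useful but cannot be wrong.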

Jakub


[PATCH][AArch64] Fix libstdc++ ABI baseline for aarch64-linux-gnu

2015-04-20 Thread Maxim Kuvyrkov
Hi,

I have been trying to figure out why I constantly get 2 extra TLS symbols in my 
aarch64-linux-gnu libstdc++ ABI tests, and it turned out to be due to support 
for non-TLS toolchain -- as discussed here [*].

However, as far as I understand, aarch64-linux-gnu postdates NPTL 
implementation, so I'm wondering if --disable-tls is even applicable to 
aarch64-linux-gnu.  If --disable-tls is not applicable, then we might as well 
include TLS symbols into aarch64-linux-gnu libstdc++ ABI baseline (as done for 
solaris configs).

Comments?

[*] http://thread.gmane.org/gmane.comp.gcc.patches/308910/focus=308917

--
Maxim Kuvyrkov
www.linaro.org




0001-Fix-libstdc-ABI-baseline-for-aarch64-linux-gnu.patch
Description: Binary data


Re: [PATCH, ping1] Fix removing of df problem in df_finish_pass

2015-04-20 Thread Kenneth Zadeck
As a dataflow maintainer, I approve this patch for the next release.
However, you will have to get approval of a release manager to get it 
into 5.0.




On 04/20/2015 04:22 AM, Thomas Preud'homme wrote:

Ping?


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
Sent: Tuesday, March 03, 2015 12:02 PM
To: 'Bernhard Reutner-Fischer'; gcc-patches@gcc.gnu.org; 'Paolo Bonzini';
'Seongbae Park'; 'Kenneth Zadeck'
Subject: RE: [PATCH] Fix removing of df problem in df_finish_pass


From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com]
Sent: Saturday, February 28, 2015 4:00 AM

   use df_remove_problem rather than manually removing problems,

living

leaving

Indeed. Please find updated changelog below:

2015-03-03  Thomas Preud'homme  

* df-core.c (df_finish_pass): Iterate over df->problems_by_index[] and
use df_remove_problem rather than manually removing problems, leaving
holes in df->problems_in_order[].

Best regards,

Thomas
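A hypothetical model of the two tables the ChangeLog mentions: problems_by_index keeps one fixed slot per problem, while problems_in_order must stay dense. Removing through a single function that shifts the tail down avoids the holes that clearing entries by hand leaves behind. This is a sketch, not df-core.c: df_model and remove_problem are invented for illustration. In this model, a loop over in_order that removes entries would skip the element shifted into the current slot, so walking by_index is the safer iteration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of the dataflow problem tables.
struct df_model
{
  static const std::size_t N = 8;
  std::array<int *, N> by_index{};   // fixed slot per problem id
  std::array<int *, N> in_order{};   // dense list of active problems
  std::size_t num_defined = 0;
};

// Remove problem `id`, keeping in_order dense by shifting the tail down.
void
remove_problem (df_model &df, std::size_t id)
{
  int *p = df.by_index[id];
  if (!p)
    return;
  std::size_t i = 0;
  while (i < df.num_defined && df.in_order[i] != p)
    i++;
  if (i == df.num_defined)
    return;                          // not in the active list
  for (; i + 1 < df.num_defined; i++)
    df.in_order[i] = df.in_order[i + 1];
  df.num_defined--;
  df.in_order[df.num_defined] = nullptr;
  df.by_index[id] = nullptr;
}
```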











Update __atomic builtins documentation.

2015-04-20 Thread Matthew Wahab

Hello,

The documentation for the __atomic builtins isn't clear about their expectations
and behaviour. In particular, assumptions about the C11/C++11 restrictions on
programs should be stated and the different behaviour of memory models in fences
and in operations should be noted. The behaviour of compare-exchange when the
compare fails is also confusing and the description of the implementation of the
__atomics is mixed in with the description of their functionality.
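As an illustration of the compare-exchange behaviour mentioned above: when the comparison fails, __atomic_compare_exchange_n writes the current value back into *expected, which is what makes the usual retry loop work without a separate reload. The builtins and __ATOMIC_* constants below are the GCC ones documented in this section; fetch_multiply is a hypothetical helper.

```cpp
#include <cassert>

// Atomically multiply *p by factor and return the previous value.
int
fetch_multiply (int *p, int factor)
{
  int expected = __atomic_load_n (p, __ATOMIC_RELAXED);
  // On failure (including a spurious failure of the weak form), expected
  // is refreshed with the current value of *p and the loop retries.
  while (!__atomic_compare_exchange_n (p, &expected, expected * factor,
                                       /*weak=*/true,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
    ;
  return expected;
}
```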

This patch tries to deal with some of these problems.

Tested by looking at the html.

Ok for trunk?
Matthew

2015-04-20  Matthew Wahab  

* doc/extend.texi (__atomic Builtins): Move implementation details
to the end of the description, rewrite opening paragraphs, state
difference with __sync builtins, state C11/C++11 assumptions,
weaken itemized descriptions, add explanation of memory model
behaviour, expand description of compare-exchange, simplify text.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7470e40..5b551c1 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8353,45 +8353,47 @@ are not prevented from being speculated to before the barrier.
 @node __atomic Builtins
 @section Built-in Functions for Memory Model Aware Atomic Operations
 
-The following built-in functions approximately match the requirements for
-C++11 memory model. Many are similar to the @samp{__sync} prefixed built-in
-functions, but all also have a memory model parameter.  These are all
-identified by being prefixed with @samp{__atomic}, and most are overloaded
-such that they work with multiple types.
-
-GCC allows any integral scalar or pointer type that is 1, 2, 4, or 8
-bytes in length. 16-byte integral types are also allowed if
-@samp{__int128} (@pxref{__int128}) is supported by the architecture.
-
-Target architectures are encouraged to provide their own patterns for
-each of these built-in functions.  If no target is provided, the original 
-non-memory model set of @samp{__sync} atomic built-in functions are
-utilized, along with any required synchronization fences surrounding it in
-order to achieve the proper behavior.  Execution in this case is subject
-to the same restrictions as those built-in functions.
-
-If there is no pattern or mechanism to provide a lock free instruction
-sequence, a call is made to an external routine with the same parameters
-to be resolved at run time.
+The following built-in functions approximately match the requirements
+for C++11 concurrency and memory models.  They are all
+identified by being prefixed with @samp{__atomic} and most are
+overloaded so that they work with multiple types.
+
+These functions are intended to replace the legacy @samp{__sync}
+builtins.  The main difference is that the memory model to be used is a
+parameter to the functions.  New code should always use the
+@samp{__atomic} builtins rather than the @samp{__sync} builtins.
+
+Note that the @samp{__atomic} builtins assume that programs will
+conform to the C++11 model for concurrency.  In particular, they assume
+that programs are free of data races.  See the C++11 standard for
+detailed definitions.
+
+The @samp{__atomic} builtins can be used with any integral scalar or
+pointer type that is 1, 2, 4, or 8 bytes in length.  16-byte integral
+types are also allowed if @samp{__int128} (@pxref{__int128}) is
+supported by the architecture.
 
 The four non-arithmetic functions (load, store, exchange, and 
 compare_exchange) all have a generic version as well.  This generic
 version works on any data type.  If the data type size maps to one
 of the integral sizes that may have lock free support, the generic
-version utilizes the lock free built-in function.  Otherwise an
+version uses the lock free built-in function.  Otherwise an
 external call is left to be resolved at run time.  This external call is
 the same format with the addition of a @samp{size_t} parameter inserted
 as the first parameter indicating the size of the object being pointed to.
 All objects must be the same size.
 
 There are 6 different memory models that can be specified.  These map
-to the same names in the C++11 standard.  Refer there or to the
-@uref{http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki on
-atomic synchronization} for more detailed definitions.  These memory
-models integrate both barriers to code motion as well as synchronization
-requirements with other threads. These are listed in approximately
-ascending order of strength. It is also possible to use target specific
-flags for memory model flags, like Hardware Lock Elision.
+to the C++11 memory models with the same names, see the C++11 standard
+or the @uref{http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki
+on atomic synchronization} for detailed definitions.  Individual
+targets may also support additional memory models for use on specific
+architectures.  Refer to the target documentation for details of
+these.
+
+The memory mod

Re: [PATCH][AArch64] Fix libstdc++ ABI baseline for aarch64-linux-gnu

2015-04-20 Thread Jakub Jelinek
On Mon, Apr 20, 2015 at 04:23:17PM +0300, Maxim Kuvyrkov wrote:
> I have been trying to figure out why I constantly get 2 extra TLS symbols in 
> my aarch64-linux-gnu libstdc++ ABI tests, and it turned out to be due to 
> support for non-TLS toolchain -- as discussed here [*].
> 
> However, as far as I understand, aarch64-linux-gnu postdates NPTL 
> implementation, so I'm wondering if --disable-tls is even applicable to 
> aarch64-linux-gnu.  If --disable-tls is not applicable, then we might as well 
> include TLS symbols into aarch64-linux-gnu libstdc++ ABI baseline (as done 
> for solaris configs).

If you want to do anything about it, instead hack up the ABI list checker,
so that for configurations with disabled TLS it ignores TLS symbols in the
ABI list, then they can be added on all architectures.
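The suggestion could look roughly like the following filter: keep a single baseline that includes TLS symbols and drop them from the expected set only when TLS is disabled. The record type and function are hypothetical; the real checker and its baseline file format are not shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical baseline record: symbol kind ("FUNC", "OBJECT", "TLS", ...)
// and symbol name.
struct baseline_symbol
{
  std::string kind;
  std::string name;
};

// Expected symbols for this configuration: TLS entries only if TLS is on.
std::vector<baseline_symbol>
expected_symbols (const std::vector<baseline_symbol> &baseline, bool have_tls)
{
  std::vector<baseline_symbol> out;
  for (const baseline_symbol &s : baseline)
    if (have_tls || s.kind != "TLS")
      out.push_back (s);
  return out;
}
```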

Treating aarch64-linux differently than all the other architectures sounds
wrong to me.

Jakub


Re: [patch,wwwdocs] Add gcc-5 caveats for avr.

2015-04-20 Thread Andi Kleen
Georg-Johann Lay  writes:
> +http://gcc.gnu.org/onlinedocs/gcc/Spec-Files.html";>spec 
> file.
> +If the compiler is used together with AVR-LibC, this requires at
> +least GCC 5.2 and a version of AVR-LibC which implements

Really 5.2?

-Andi


[committed] Fix GC ICE due to dwarf2out bug (PR debug/65807)

2015-04-20 Thread Jakub Jelinek
Hi!

add_AT_wide is the only add_AT_* that doesn't clear or otherwise initialize
dw_attr_val.val_entry field, so it contains random garbage, which isn't
desirable when ggc walks it during collections.

Supposedly this omission dates from the wide-int branch having copied some
add_AT_* routine from dwarf2out.c as the model for add_AT_wide before the
val_entry initialization was added to all the other add_AT_* routines.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
and 5.1 as obvious.
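The bug class is easy to reproduce outside dwarf2out.c: an attribute built in an automatic variable keeps whatever garbage its unassigned fields held. The struct below is a hypothetical, trimmed-down stand-in for the attribute value, not GCC's definition. The first builder mirrors the committed fix (assign val_entry explicitly); the second shows value-initialization as an alternative pattern that zeroes every member.

```cpp
#include <cassert>

// Hypothetical stand-in for an attribute value.
struct attr_val
{
  int val_class;
  void *val_entry;   // pointer the garbage collector walks
  void *payload;
};

// Pattern of the fix: every field the GC may look at is set explicitly.
attr_val
make_wide_attr (int cls, void *payload)
{
  attr_val v;              // automatic storage: fields start indeterminate
  v.val_class = cls;
  v.val_entry = nullptr;   // the assignment that was missing
  v.payload = payload;
  return v;
}

// Alternative: value-initialization zeroes all members, so a newly added
// field cannot be forgotten.
attr_val
make_wide_attr_zeroed (int cls, void *payload)
{
  attr_val v{};
  v.val_class = cls;
  v.payload = payload;
  return v;
}
```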

2015-04-20  Jakub Jelinek  

PR debug/65807
* dwarf2out.c (add_AT_wide): Clear attr.dw_attr_val.val_entry.

--- gcc/dwarf2out.c.jj  2015-04-17 09:45:08.0 +0200
+++ gcc/dwarf2out.c 2015-04-20 11:37:59.544596284 +0200
@@ -3886,6 +3886,7 @@ add_AT_wide (dw_die_ref die, enum dwarf_
 
   attr.dw_attr = attr_kind;
   attr.dw_attr_val.val_class = dw_val_class_wide_int;
+  attr.dw_attr_val.val_entry = NULL;
   attr.dw_attr_val.v.val_wide = ggc_alloc ();
   *attr.dw_attr_val.v.val_wide = w;
   add_dwarf_attr (die, &attr);

Jakub


[Obvious][AArch64] Delete unused aarch64_simd_emit_pair_result_insn.

2015-04-20 Thread Alan Lawrence

Bootstrapped on aarch64-none-linux-gnu.

Pushed as r34.

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_simd_emit_pair_result_insn): Delete.
* config/aarch64/aarch64-protos.h (aarch64_simd_emit_pair_result_insn):
Delete.



Re: [PATCH, rs6000] Force lvx and stvx for prologue saves and epilogue restores of Altivec regs

2015-04-20 Thread Bill Schmidt
Hi,

I've added an assert to this patch to detect a related condition.  Prior
to this patch, the prologue code was causing us to incorrectly call the
expand-time routine rs6000_emit_vsx_le_store().  This was harmless, but
we should have caught this sooner.  The patch removes this problem, but
we should still be on the lookout for such behavior in the future.  I'll
commit this version after 5.1 is released, if there are no objections.

Thanks,
Bill


2015-04-20  Bill Schmidt  

* config/rs6000/altivec.md (*altivec_lvx__internal): Remove
asterisk from name so this can be generated directly.
(*altivec_stvx__internal): Likewise.
* config/rs6000/rs6000.c (rs6000_emit_le_vsx_store): Add assert
that this is never called during or after reload/lra.
(rs6000_frame_related): Remove split_reg
argument and logic that references it.
(emit_frame_save): Remove last parameter from call to
rs6000_frame_related.
(rs6000_emit_prologue): Remove last parameter from eight calls to
rs6000_frame_related.  Force generation of stvx instruction for
Altivec register saves.  Remove split_reg handling, which is no
longer needed.
(rs6000_emit_epilogue):  Force generation of lvx instruction for
Altivec register restores.


Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 30)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -2455,7 +2455,7 @@
 }
 })
 
-(define_insn "*altivec_lvx__internal"
+(define_insn "altivec_lvx__internal"
   [(parallel
 [(set (match_operand:VM2 0 "register_operand" "=v")
  (match_operand:VM2 1 "memory_operand" "Z"))
@@ -2478,7 +2478,7 @@
 }
 })
 
-(define_insn "*altivec_stvx__internal"
+(define_insn "altivec_stvx__internal"
   [(parallel
 [(set (match_operand:VM2 0 "memory_operand" "=Z")
  (match_operand:VM2 1 "register_operand" "v"))
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 30)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -8371,6 +8371,11 @@ rs6000_emit_le_vsx_store (rtx dest, rtx source, ma
 {
   rtx tmp, permute_src, permute_tmp;
 
+  /* This should never be called during or after reload, because it does
+ not re-permute the source register.  It is intended only for use
+ during expand.  */
+  gcc_assert (!reload_in_progress && !lra_in_progress && !reload_completed);
+
   /* Use V2DImode to do swaps of types with 128-bit scalare parts (TImode,
  V1TImode).  */
   if (mode == TImode || mode == V1TImode)
@@ -22768,7 +22773,7 @@ output_probe_stack_range (rtx reg1, rtx reg2)
 
 static rtx
 rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE_INT val,
- rtx reg2, rtx rreg, rtx split_reg)
+ rtx reg2, rtx rreg)
 {
   rtx real, temp;
 
@@ -22859,11 +22864,6 @@ rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE
  }
 }
 
-  /* If a store insn has been split into multiple insns, the
- true source register is given by split_reg.  */
-  if (split_reg != NULL_RTX)
-real = gen_rtx_SET (VOIDmode, SET_DEST (real), split_reg);
-
   RTX_FRAME_RELATED_P (insn) = 1;
   add_reg_note (insn, REG_FRAME_RELATED_EXPR, real);
 
@@ -22971,7 +22971,7 @@ emit_frame_save (rtx frame_reg, machine_mode mode,
   reg = gen_rtx_REG (mode, regno);
   insn = emit_insn (gen_frame_store (reg, frame_reg, offset));
   return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp,
-  NULL_RTX, NULL_RTX, NULL_RTX);
+  NULL_RTX, NULL_RTX);
 }
 
 /* Emit an offset memory reference suitable for a frame store, while
@@ -23551,7 +23551,7 @@ rs6000_emit_prologue (void)
 
   insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
   rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
-   treg, GEN_INT (-info->total_size), NULL_RTX);
+   treg, GEN_INT (-info->total_size));
   sp_off = frame_off = info->total_size;
 }
 
@@ -23636,7 +23636,7 @@ rs6000_emit_prologue (void)
 
  insn = emit_move_insn (mem, reg);
  rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
-   NULL_RTX, NULL_RTX, NULL_RTX);
+   NULL_RTX, NULL_RTX);
  END_USE (0);
}
 }
@@ -23692,7 +23692,7 @@ rs6000_emit_prologue (void)
 info->lr_save_offset,
 DFmode, sel);
   rs6000_frame_related (insn, ptr_reg, sp_off,
-   NULL_RTX, NULL_RTX, NULL_RTX);
+   NULL_RTX, NULL_RTX);
   if (lr)
END_USE (0);
 }
@@ -23771,7 +23771,7 @@ rs6000_emit_prologue (void)
 SAVRES_SAVE 

[RS6000] pr65810, powerpc64 alignment of r2

2015-04-20 Thread Alan Modra
This fixes a thinko in offsettable_ok_by_alignment.  It's not the
absolute placement that matters, but the toc-pointer relative offset.
So alignment of r2 also needs to be taken into account.
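The arithmetic behind the new size check, as a simplified sketch: if the TOC pointer is only known to be 8-byte aligned, a symbol's TOC-relative address is only known modulo 8, so only accesses that fit inside one aligned block of that size can be proven not to cross a 32k boundary. toc_relative_ok is hypothetical and assumes power-of-two alignments; the real offsettable_ok_by_alignment also inspects the decl.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Can an access of `size` bytes at (sym + offset) be proven to stay within
// one 32k block of TOC-relative addresses?  The TOC-relative address of sym
// is only known to be a multiple of min (sym_align, toc_align).
bool
toc_relative_ok (std::uint64_t sym_align, std::uint64_t toc_align,
                 std::uint64_t offset, std::uint64_t size)
{
  std::uint64_t a = std::min ({sym_align, toc_align, std::uint64_t (32768)});
  if (size > a)
    return false;
  // The access stays inside one a-aligned block, and a divides 32768.
  return (offset & (a - 1)) + size <= a;
}
```

For example, a 16-byte, 16-byte-aligned object at 0x8000 with the TOC pointer at 8 has TOC-relative offset 0x7ff8, so the access crosses the 32k boundary; toc_relative_ok (16, 8, 0, 16) accordingly returns false.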

Bootstrapped and regression tested powerpc64-linux.  OK for mainline
and gcc-5 branch?  Without the dead code removal for the branch.

I also have a linker fix to align the toc pointer and gcc configury
changes to supply POWERPC64_TOC_POINTER_ALIGNMENT in the works.

PR target/65810
* config/rs6000/rs6000.c (POWERPC64_TOC_POINTER_ALIGNMENT): Define.
(offsettable_ok_by_alignment): Return false if size exceeds
POWERPC64_TOC_POINTER_ALIGNMENT.  Replace dead code with assertion.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 27)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6497,13 +6497,21 @@ virtual_stack_registers_memory_p (rtx op)
 }
 
 /* Return true if a MODE sized memory accesses to OP plus OFFSET
-   is known to not straddle a 32k boundary.  */
+   is known to not straddle a 32k boundary.  This function is used
+   to determine whether -mcmodel=medium code can use TOC pointer
+   relative addressing for OP.  This means the alignment of the TOC
+   pointer must also be taken into account, and unfortunately that is
+   only 8 bytes.  */ 
 
+#ifndef POWERPC64_TOC_POINTER_ALIGNMENT
+#define POWERPC64_TOC_POINTER_ALIGNMENT 8
+#endif
+
 static bool
 offsettable_ok_by_alignment (rtx op, HOST_WIDE_INT offset,
 machine_mode mode)
 {
-  tree decl, type;
+  tree decl;
   unsigned HOST_WIDE_INT dsize, dalign, lsb, mask;
 
   if (GET_CODE (op) != SYMBOL_REF)
@@ -6510,6 +6518,8 @@ offsettable_ok_by_alignment (rtx op, HOST_WIDE_INT
 return false;
 
   dsize = GET_MODE_SIZE (mode);
+  if (dsize > POWERPC64_TOC_POINTER_ALIGNMENT)
+return false;
   decl = SYMBOL_REF_DECL (op);
   if (!decl)
 {
@@ -6553,6 +6563,8 @@ offsettable_ok_by_alignment (rtx op, HOST_WIDE_INT
return false;
 
  dsize = tree_to_uhwi (DECL_SIZE_UNIT (decl));
+ if (dsize > POWERPC64_TOC_POINTER_ALIGNMENT)
+   return false;
  if (dsize > 32768)
return false;
 
@@ -6560,32 +6572,8 @@ offsettable_ok_by_alignment (rtx op, HOST_WIDE_INT
}
 }
   else
-{
-  type = TREE_TYPE (decl);
+gcc_unreachable ();
 
-  dalign = TYPE_ALIGN (type);
-  if (CONSTANT_CLASS_P (decl))
-   dalign = CONSTANT_ALIGNMENT (decl, dalign);
-  else
-   dalign = DATA_ALIGNMENT (decl, dalign);
-
-  if (dsize == 0)
-   {
- /* BLKmode, check the entire object.  */
- if (TREE_CODE (decl) == STRING_CST)
-   dsize = TREE_STRING_LENGTH (decl);
- else if (TYPE_SIZE_UNIT (type)
-  && tree_fits_uhwi_p (TYPE_SIZE_UNIT (type)))
-   dsize = tree_to_uhwi (TYPE_SIZE_UNIT (type));
- else
-   return false;
- if (dsize > 32768)
-   return false;
-
- return dalign / BITS_PER_UNIT >= dsize;
-   }
-}
-
   /* Find how many bits of the alignment we know for this access.  */
   mask = dalign / BITS_PER_UNIT - 1;
   lsb = offset & -offset;

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH][combine] Do not call rtx costs on potentially unrecognisable rtxes in combine

2015-04-20 Thread Kyrill Tkachov

Hi all,

I'm trying to reduce the cases where the midend calls the backend rtx costs on
bogus rtl for which the backend has no patterns or way of handling. Having to
handle these kinds of rtxes sanely bloats those functions and makes them harder
to maintain.

One of the cases where this occurs is in combine, and in
distribute_and_simplify_rtx in particular.
Citing the comment at that function:
" See if X is of the form (* (+ A B) C), and if so convert to
   (+ (* A C) (* B C)) and try to simplify.
 Most of the time, this results in no change.  However, if some of
   the operands are the same or inverses of each other, simplifications
   will result."

The problem is that after it applies the distributive law it calls rtx costs
to figure out whether the rtx became simpler. This rtx can get pretty complex.
For example, on arm I've seen it try to cost:
(plus:SI (mult:SI (plus:SI (reg:SI 232 [ m1 ])
(const_int 1 [0x1]))
(reg:SI 232 [ m1 ]))
(plus:SI (reg:SI 232 [ m1 ])
(const_int 1 [0x1])))

which is never going to match anything on arm anyway, so why should the costs
function handle it?
In any case, I believe combine's design is such that it should first be
attempting to call recog and split on the rtxes, and only if that succeeds
should it be making a target-specific decision on which rtx to prefer.
distribute_and_simplify_rtx goes against that by calling rtx costs on an
unverified rtx in an attempt to gauge its complexity.

This patch remedies that by removing the call to rtx costs and instead manually
performing a relatively simple check on whether the resultant rtx was simplified.
That is, using the example from the comment, whether (+ (* A C) (* B C)) still
has + at the top and * in the two operands. This should give a good indication
of whether any meaningful simplification was made (the '+' and '*' operators in
the example can be any operators that can be distributed over).
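The shape check described above can be modelled outside GCC with a toy expression tree. This is an illustrative sketch only: struct toy_expr, enum toy_code and distribution_not_simplified are hypothetical stand-ins for rtx, rtx_code and the inline test in the patch. As in distribute_and_simplify_rtx, "outer" is the code being distributed (* in the example) and "inner" is the code it is distributed over (+).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression tree standing in for rtl; this is not GCC's rtx type.  */
enum toy_code { TOY_REG, TOY_PLUS, TOY_MULT };

struct toy_expr
{
  enum toy_code code;
  const struct toy_expr *op0, *op1;   /* NULL for leaves.  */
};

/* Same shape test as the patch: given the result TMP of distributing
   OUTER over INNER, report "no simplification" if TMP was folded back
   into an OUTER expression, or is still (INNER (OUTER ...) (OUTER ...)).  */
static bool
distribution_not_simplified (const struct toy_expr *tmp,
                             enum toy_code outer, enum toy_code inner)
{
  if (tmp->code == outer)
    return true;
  return (tmp->code == inner
          && tmp->op0 != NULL && tmp->op0->code == outer
          && tmp->op1 != NULL && tmp->op1->code == outer);
}
```

With this check, (+ (* A C) (* B C)) is reported as not simplified and NULL_RTX would be returned, whereas a result such as (+ (* A C) A), where one of the products folded away, is kept.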

Initially, I wanted to just return the distributed version and let recog reject
the invalid rtxes, but that caused some code quality regressions on arm where
the original rtx would not recog but would match a beneficial splitter, whereas
the distributed rtx would not.

With this patch I saw almost no codegen differences on arm for the whole of
SPEC2006. The one exception was 416.gamess, where it managed to merge a mul and
an add into an mla, which resulted in a slightly better code sequence. That was
in a pretty large file and I don't speak Fortran'ese, so I couldn't really
reduce a testcase for it, but my guess is that before the patch the costs would
return some essentially random value for the arbitrarily complex rtx they were
passed, which changed the decision in distribute_and_simplify_rtx on whether to
return the distributed rtx, which could have impacted further optimisations in
combine.

I tried it on x86_64 as well. Again, there were almost no codegen differences.
The exceptions were tonto and wrf, where a few instructions were eliminated,
but with no significant difference. The resultant binaries for these two were
a tiny bit smaller, with no impact on runtime.

Therefore I claim that this is a safe thing to do, as it leaves the
target-specific rtx cost judgements in combine to be made only on valid
recog-ed rtxes, rather than having them cancel optimisations early due to rtx
costs not handling arbitrary rtxes well.

Bootstrapped on arm, x86_64, aarch64 (all linux). Tested on arm,aarch64.

Ok for trunk?

Thanks,
Kyrill


2015-04-20  Kyrylo Tkachov  

* combine.c (distribute_and_simplify_rtx): Do not check rtx costs.
Look at the rtx codes to see if a simplification occurred.
commit e9833e5e3e996ac68b645fdca14738232f59e1a2
Author: Kyrylo Tkachov 
Date:   Wed Apr 15 14:11:16 2015 +0100

[combine] Do not call rtx costs on potentially invalid rtx in distribute_and_simplify_rtx

diff --git a/gcc/combine.c b/gcc/combine.c
index 46cd6db..56d297b 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -9602,12 +9602,15 @@ distribute_and_simplify_rtx (rtx x, int n)
 
   tmp = apply_distributive_law (simplify_gen_binary (inner_code, mode,
 		 new_op0, new_op1));
-  if (GET_CODE (tmp) != outer_code
-  && (set_src_cost (tmp, optimize_this_for_speed_p)
-	  < set_src_cost (x, optimize_this_for_speed_p)))
-return tmp;
 
-  return NULL_RTX;
+  /* We didn't manage to simplify.  */
+  if (GET_CODE (tmp) == outer_code
+  || (GET_CODE (tmp) == inner_code
+	  && GET_CODE (XEXP (tmp, 0)) == outer_code
+	  && GET_CODE (XEXP (tmp, 1)) == outer_code))
+return NULL_RTX;
+
+  return tmp;
 }
 
 /* Simplify a logical `and' of VAROP with the constant CONSTOP, to be done


[PATCH][OpenMP] Fix resolve_device with -foffload=disable

2015-04-20 Thread Ilya Verbin
Hi!

Currently, if a compiler is configured with offloading enabled, the 'devices'
array in libgomp is properly filled with the available devices.
However, if a program is compiled with -foffload=disable, the resolve_device
function still returns a pointer to the device, so host fallback does not
happen.  The patch below fixes this issue.
make check-target-libgomp passed.  OK for trunk?


libgomp/
* libgomp.h (struct gomp_device_descr): Add num_images.
* target.c (resolve_device): Call gomp_init_device.  Return NULL if
there is no image loaded to the device.
(gomp_offload_image_to_device): Increase num_images.
(GOMP_offload_unregister): Decrease num_images.
(GOMP_target): Don't call gomp_init_device.
(GOMP_target_data): Ditto.
(GOMP_target_update): Ditto.
(gomp_target_init): Set num_images to 0.
* testsuite/libgomp.c/target-1-disable.c: New test.
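The control flow of the patched resolve_device can be sketched as below. This is a simplified model, not libgomp code: the toy_* names are hypothetical, the device lock is left out, and it assumes that device initialization is what loads the registered offload images (and therefore increments num_images). With -foffload=disable no image is registered, so the count stays at 0 and the caller falls back to the host.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct gomp_device_descr.  */
struct toy_device
{
  bool is_initialized;
  int num_images;   /* Offload images currently loaded on the device.  */
};

/* Stand-in for gomp_init_device: assume initialization loads every image
   registered for this device type, bumping num_images once per image.  */
static void
toy_init_device (struct toy_device *dev, int registered_images)
{
  for (int i = 0; i < registered_images; i++)
    dev->num_images++;
  dev->is_initialized = true;
}

/* Mirrors the patched resolve_device: initialize lazily, then return NULL
   (meaning "use host fallback") when no image was loaded.  The real code
   takes the device lock around initialization; omitted here.  */
static struct toy_device *
toy_resolve_device (struct toy_device *devices, int num_devices,
                    int device_id, int registered_images)
{
  if (device_id < 0 || device_id >= num_devices)
    return NULL;
  if (!devices[device_id].is_initialized)
    toy_init_device (&devices[device_id], registered_images);
  if (devices[device_id].num_images <= 0)
    return NULL;
  return &devices[device_id];
}
```

Moving initialization into resolve_device (instead of each GOMP_target* entry point) is what lets the num_images check see the loaded images before deciding between device execution and host fallback.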


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 5272f01..47a064a 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -762,6 +762,9 @@ struct gomp_device_descr
   /* Set to true when device is initialized.  */
   bool is_initialized;
 
+  /* Number of images offloaded to the device.  */
+  int num_images;
+
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
  members.  */
diff --git a/libgomp/target.c b/libgomp/target.c
index d8da783..f5126b9 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -132,6 +132,14 @@ resolve_device (int device_id)
   if (device_id < 0 || device_id >= gomp_get_num_devices ())
 return NULL;
 
+  gomp_mutex_lock (&devices[device_id].lock);
+  if (!devices[device_id].is_initialized)
+gomp_init_device (&devices[device_id]);
+  gomp_mutex_unlock (&devices[device_id].lock);
+
+  if (devices[device_id].num_images <= 0)
+return NULL;
+
   return &devices[device_id];
 }
 
@@ -697,6 +705,7 @@ gomp_offload_image_to_device (struct gomp_device_descr *devicep,
   struct addr_pair *target_table = NULL;
   int i, num_target_entries
 = devicep->load_image_func (devicep->target_id, target_data, &target_table);
+  devicep->num_images++;
 
   if (num_target_entries != num_funcs + num_vars)
 {
@@ -831,6 +840,7 @@ GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
}
 
   devicep->unload_image_func (devicep->target_id, target_data);
+  devicep->num_images--;
 
   /* Remove mapping from splay tree.  */
   struct splay_tree_key_s k;
@@ -966,11 +976,6 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
   return;
 }
 
-  gomp_mutex_lock (&devicep->lock);
-  if (!devicep->is_initialized)
-gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->lock);
-
   void *fn_addr;
 
   if (devicep->capabilities & GOMP_OFFLOAD_CAP_NATIVE_EXEC)
@@ -1034,11 +1039,6 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
   return;
 }
 
-  gomp_mutex_lock (&devicep->lock);
-  if (!devicep->is_initialized)
-gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->lock);
-
   struct target_mem_desc *tgt
 = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
 false);
@@ -1069,11 +1069,6 @@ GOMP_target_update (int device, const void *unused, size_t mapnum,
   || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
 return;
 
-  gomp_mutex_lock (&devicep->lock);
-  if (!devicep->is_initialized)
-gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->lock);
-
   gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
 }
 
@@ -1265,6 +1260,7 @@ gomp_target_init (void)
current_device.type = current_device.get_type_func ();
current_device.mem_map.root = NULL;
current_device.is_initialized = false;
+   current_device.num_images = 0;
current_device.openacc.data_environ = NULL;
for (i = 0; i < new_num_devices; i++)
  {
diff --git a/libgomp/testsuite/libgomp.c/target-1-disable.c b/libgomp/testsuite/libgomp.c/target-1-disable.c
new file mode 100644
index 000..00ea143
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-1-disable.c
@@ -0,0 +1,4 @@
+/* { dg-options "-foffload=disable" } */
+/* { dg-require-effective-target offload_device } */
+
+#include "target-1.c"


  -- Ilya


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-04-20 Thread Thomas Schwinge
Hi!

On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
> In r219682, I have committed to trunk our current set of OpenACC changes,
> which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
> been contributing!
> 
> Note that this is an experimental feature, incomplete, and subject to
> change in future versions of GCC.  We shall update -- and keep updated --
> , to track the current status.

(This has now happened, finally...)

Gerald, is it OK to commit the following to update GCC 5 changes' »New
Languages and Language specific improvements« section?

Index: htdocs/gcc-5/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.109
diff -u -p -r1.109 changes.html
--- htdocs/gcc-5/changes.html   20 Apr 2015 08:22:35 -  1.109
+++ htdocs/gcc-5/changes.html   20 Apr 2015 14:20:54 -
@@ -193,6 +193,12 @@
  Card emulator.

 
+
+  GCC 5 includes a preliminary implementation of the OpenACC 2.0a
+  specification.  OpenACC is intended for programming accelerator devices
+  such as GPUs.  See <a href="https://gcc.gnu.org/wiki/OpenACC">the OpenACC
+  wiki page</a> for more information.
+
   
 
 


Grüße,
 Thomas




Re: [RS6000] pr65810, powerpc64 alignment of r2

2015-04-20 Thread Jakub Jelinek
On Mon, Apr 20, 2015 at 11:23:17PM +0930, Alan Modra wrote:
> This fixes a thinko in offsettable_ok_by_alignment.  It's not the
> absolute placement that matters, but the toc-pointer relative offset.
> So alignment of r2 also needs to be taken into account.
> 
> Bootstrapped and regression tested powerpc64-linux.  OK for mainline
> and gcc-5 branch?  Without the dead code removal for the branch..

Please wait until 5.1 is released on the gcc-5 branch.

Jakub


Re: [PATCH] Fix check for whether a function is a variadic function

2015-04-20 Thread Jason Merrill

On 04/20/2015 09:02 AM, Patrick Palka wrote:

But never mind, I'm getting hundreds of failures from C tests that
define main without a parameter list e.g. "int main () { ... }".  This
is an issue not worth fixing..


Yep, that's what I was wondering.  I think it makes sense to fix this 
testcase in the C++ front end rather than in code shared between front ends.


Jason




Re: RFA (stor-layout): PATCH for c++/65734 (attribute aligned and templates)

2015-04-20 Thread Jakub Jelinek
On Tue, Apr 14, 2015 at 12:06:16PM -0400, Jason Merrill wrote:
> With C++ templates and attribute ((aligned)), you can have TYPE_ALIGN and
> TYPE_USER_ALIGN set on a type before you know its size, so layout_type and
> kin need to respect them if they are already set.
> 
> Tested x86_64-pc-linux-gnu.  OK for trunk?

Wonder what will happen if finalize_type_size or fixup_attribute_variants
is called on a type variant with TYPE_USER_ALIGN before it is called
on the TYPE_MAIN_VARIANT; I'd guess that in that case all the variants
including the main variant would be marked as TYPE_USER_ALIGN and might have
incorrect TYPE_ALIGN.  Perhaps they shouldn't propagate anything to other
variants if user_align is set, or recurse (at least finalize_type_size)
on the TYPE_MAIN_VARIANT if it is called on some other variant, as a way to
propagate the layout to other variants?

Otherwise the patch LGTM.

Jakub


[PATCH][committed] Add missed '%>' to error_at message

2015-04-20 Thread Ilya Verbin
This patch adds the missing '%>' to the error_at () message in
c[p]_parser_omp_target_update.  Committed as obvious.


gcc/c/
* c-parser.c (c_parser_oacc_enter_exit_data): Remove excess semicolon.
(c_parser_omp_target_update): Add missed %> to error_at ().
gcc/cp/
* parser.c (cp_parser_omp_target_update): Add missed %> to error_at ().


Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c(revision 36)
+++ gcc/c/c-parser.c(working copy)
@@ -12153,7 +12153,7 @@
   return;
 }
 
-  stmt = enter ? make_node (OACC_ENTER_DATA) : make_node (OACC_EXIT_DATA);;
+  stmt = enter ? make_node (OACC_ENTER_DATA) : make_node (OACC_EXIT_DATA);
   TREE_TYPE (stmt) = void_type_node;
   if (enter)
 OACC_ENTER_DATA_CLAUSES (stmt) = clauses;
@@ -13851,7 +13851,7 @@
   && find_omp_clause (clauses, OMP_CLAUSE_FROM) == NULL_TREE)
 {
   error_at (loc,
-   "%<#pragma omp target update must contain at least one "
+   "%<#pragma omp target update%> must contain at least one "
"% or % clauses");
   return false;
 }
Index: gcc/cp/parser.c
===
--- gcc/cp/parser.c (revision 36)
+++ gcc/cp/parser.c (working copy)
@@ -31379,7 +31379,7 @@
   && find_omp_clause (clauses, OMP_CLAUSE_FROM) == NULL_TREE)
 {
   error_at (pragma_tok->location,
-   "%<#pragma omp target update must contain at least one "
+   "%<#pragma omp target update%> must contain at least one "
"% or % clauses");
   return false;
 }


  -- Ilya


Re: [PATCH][expmed] Properly account for the cost and latency of shift+add ops when synthesizing mults

2015-04-20 Thread Jeff Law

On 04/20/2015 05:09 AM, Kyrill Tkachov wrote:

Hi Jeff,

On 17/04/15 20:38, Jeff Law wrote:

On 04/14/2015 02:11 AM, Kyrill Tkachov wrote:

Of course the effect on codegen of this patch depends a lot on the rtx
costs in the backend.
On aarch64 with -mcpu=cortex-a57 tuning I see the cost limit being
exceeded in more cases and the
expansion code choosing instead to do a move-immediate and a mul
instruction.
No regressions on SPEC2006 on a Cortex-A57.

For example, for code:
long f0 (int x, int y)
{
  return (long)x * 6L;
}

int f1(int x)
{
  return x * 10;
}

int f2(int x)
{
  return x * 100;
}

int f3(int x)
{
  return x * 20;
}

int f4(int x)
{
  return x * 25;
}

int f5(int x)
{
  return x * 11;
}

Please turn this into a test for the testsuite.  It's fine if the
test is specific to AArch64.  You may need to break it into 6 individual
tests since what you want to check for in each one may be significantly
different.  For example, f0, f4 and f5 you'd probably check for the
constant load & multiply instructions.  Not sure how to best test for
what you want in f1-f3.


f1/f3 still end up synthesising the mult, but prefer a different
algorithm. I don't think the algorithm chosen in f1/f3 is worse or
better than what it was producing before, so I don't think there's
much point in testing for it.
Yea, when I looked at the differences, it wasn't immediately clear if 
there was a real improvement or not.  The new sequences use the single 
register, so one might claim they're marginally better due to that.


If you think it's really better to test for something, I propose testing
that only two instructions are generated, and neither of them is a 'mul'.
I'll repost a patch with my proposed testcases for f0,f2,f4,f5.
Your call on f1/f3.  IIRC f2 didn't change at all, so I'm not sure if 
you need a test for that (perhaps you should make sure it continues to 
use a mul rather than a synthesized sequence).


Pre-approved with whatever you decide on the testing side.

jeff



Re: [PATCH][committed] Add missed '%>' to error_at message

2015-04-20 Thread Jakub Jelinek
On Mon, Apr 20, 2015 at 05:56:16PM +0300, Ilya Verbin wrote:
> This patch adds missed '%>' to the error_at () message in
> c[p]_parser_omp_target_update.  Committed as obvious.

Please commit also to gcc-5-branch.
The Swedish translation is the only one that has this message (with the bug
in it) translated, so either we fix that for the release manually, or that
message won't be translated.

> gcc/c/
>   * c-parser.c (c_parser_oacc_enter_exit_data): Remove excess semicolon.
>   (c_parser_omp_target_update): Add missed %> to error_at ().
> gcc/cp/
>   * parser.c (cp_parser_omp_target_update): Add missed %> to error_at ().

Jakub


Re: [PATCH][expmed] Properly account for the cost and latency of shift+add ops when synthesizing mults

2015-04-20 Thread Kyrill Tkachov


On 20/04/15 16:06, Jeff Law wrote:

On 04/20/2015 05:09 AM, Kyrill Tkachov wrote:

Hi Jeff,

On 17/04/15 20:38, Jeff Law wrote:

On 04/14/2015 02:11 AM, Kyrill Tkachov wrote:

Of course the effect on codegen of this patch depends a lot on the rtx
costs in the backend.
On aarch64 with -mcpu=cortex-a57 tuning I see the cost limit being
exceeded in more cases and the
expansion code choosing instead to do a move-immediate and a mul
instruction.
No regressions on SPEC2006 on a Cortex-A57.

For example, for code:
long f0 (int x, int y)
{
  return (long)x * 6L;
}

int f1(int x)
{
  return x * 10;
}

int f2(int x)
{
  return x * 100;
}

int f3(int x)
{
  return x * 20;
}

int f4(int x)
{
  return x * 25;
}

int f5(int x)
{
  return x * 11;
}

Please turn this into a test for the testsuite.  It's fine if the
test is specific to AArch64.  You may need to break it into 6 individual
tests since what you want to check for in each one may be significantly
different.  For example, f0, f4 and f5 you'd probably check for the
constant load & multiply instructions.  Not sure how to best test for
what you want in f1-f3.

f1/f3 still end up synthesising the mult, but prefer a different
algorithm. I don't think the algorithm chosen in f1/f3 is worse or
better than what it was producing before, so I don't think there's
much point in testing for it.

Yea, when I looked at the differences, it wasn't immediately clear if
there was a real improvement or not.  The new sequences use the single
register, so one might claim they're marginally better due to that.

If you think it's really better to test for something, I propose testing
that only two instructions are generated, and neither of them is a 'mul'.
I'll repost a patch with my proposed testcases for f0,f2,f4,f5.

Your call on f1/f3.  IIRC f2 didn't change at all, so I'm not sure if
you need a test for that (perhaps you should make sure it continues to
use a mul rather than a synthesized sequence).

Pre-approved with whatever you decide on the testing side.


Thanks,
I could've sworn I had sent this version out a couple hours ago.
My mail client has been playing up.

Here it is with 6 tests. For the tests corresponding to f1/f3 in my
example above I scan that we don't use the 'w1' reg.

I'll give the AArch64 maintainers a day or two to comment on the tests
before committing.

Thanks,
Kyrill

2015-04-20  Kyrylo Tkachov  

* expmed.c (synth_mult): Only assume overlapping
shift with previous steps in alg_sub_t_m2 case.

2015-04-20  Kyrylo Tkachov  

* gcc.target/aarch64/mult-synth_1.c: New test.
* gcc.target/aarch64/mult-synth_2.c: Likewise.
* gcc.target/aarch64/mult-synth_3.c: Likewise.
* gcc.target/aarch64/mult-synth_4.c: Likewise.
* gcc.target/aarch64/mult-synth_5.c: Likewise.
* gcc.target/aarch64/mult-synth_6.c: Likewise.


jeff


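The cost/latency bookkeeping that the expmed.c hunk below introduces for the alg_sub_t_m2 case can be isolated into a small function. This is a sketch, not GCC code: struct step_cost and shiftsub_step_cost are hypothetical names, and the three int inputs stand in for what add_cost, shift_cost and shiftsub1_cost would report for the target.

```c
/* Cost and latency charged for one shift-and-subtract step.  */
struct step_cost
{
  int cost;
  int latency;
};

static struct step_cost
shiftsub_step_cost (int add_cost, int shift_cost, int shiftsub_cost)
{
  struct step_cost s;

  /* Baseline: a separate shift insn followed by a sub insn.  */
  s.cost = add_cost + shift_cost;
  if (shiftsub_cost <= s.cost)
    {
      /* Cheap fused insn: treat it as atomic, latency equal to its cost.  */
      s.cost = shiftsub_cost;
      s.latency = s.cost;
    }
  else
    /* Separate insns: assume the shift overlaps with earlier steps on
       superscalar hardware, so only the sub adds latency.  */
    s.latency = add_cost;
  return s;
}
```

With a fused shift-and-subtract costing 4, the step costs 4 and adds 4 to the latency. If the fused form instead costs 12, the separate shift+sub pair costs 8 but only the sub's 4 counts toward latency, which is why op_latency is tracked separately from op_cost.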

commit 6785f2839811c9f8e59e3e36f64cdb3a4ebe
Author: Kyrylo Tkachov 
Date:   Thu Mar 12 17:38:20 2015 +

[expmed] Properly account for the cost and latency of shift+sub ops when synthesizing mults

diff --git a/gcc/expmed.c b/gcc/expmed.c
index f570612..e890a1e 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -2664,14 +2664,28 @@ synth_mult (struct algorithm *alg_out, unsigned HOST_WIDE_INT t,
   m = exact_log2 (-orig_t + 1);
   if (m >= 0 && m < maxm)
 	{
-	  op_cost = shiftsub1_cost (speed, mode, m);
+	  op_cost = add_cost (speed, mode) + shift_cost (speed, mode, m);
+	  /* If the target has a cheap shift-and-subtract insn use
+	 that in preference to a shift insn followed by a sub insn.
+	 Assume that the shift-and-sub is "atomic" with a latency
+	 equal to its cost, otherwise assume that on superscalar
+	 hardware the shift may be executed concurrently with the
+	 earlier steps in the algorithm.  */
+	  if (shiftsub1_cost (speed, mode, m) <= op_cost)
+	{
+	  op_cost = shiftsub1_cost (speed, mode, m);
+	  op_latency = op_cost;
+	}
+	  else
+	op_latency = add_cost (speed, mode);
+
 	  new_limit.cost = best_cost.cost - op_cost;
-	  new_limit.latency = best_cost.latency - op_cost;
+	  new_limit.latency = best_cost.latency - op_latency;
 	  synth_mult (alg_in, (unsigned HOST_WIDE_INT) (-orig_t + 1) >> m,
 		  &new_limit, mode);
 
 	  alg_in->cost.cost += op_cost;
-	  alg_in->cost.latency += op_cost;
+	  alg_in->cost.latency += op_latency;
 	  if (CHEAPER_MULT_COST (&alg_in->cost, &best_cost))
 	{
 	  best_cost = alg_in->cost;
@@ -2704,20 +2718,12 @@ synth_mult (struct algorithm *alg_out, unsigned HOST_WIDE_INT t,
   if (t % d == 0 && t > d && m < maxm
 	  && (!cache_hit || cache_alg == alg_add_factor))
 	{
-	  /* If the target has a cheap shift-and-add instruction use
-	 that in preference to a shift insn followed by an add insn.
-	 Assume that the shift-and-add is "atomic" with a latency
-	 equal to its cost, otherwise assume that on superscalar
-	 hardware the shift may be executed co

Re: [Obvious][AArch64] Delete unused aarch64_simd_emit_pair_result_insn.

2015-04-20 Thread Alan Lawrence

Oops, missed off the patch actually pushed. Attached now.

Cheers, Alan

Alan Lawrence wrote:

Bootstrapped on aarch64-none-linux-gnu.

Pushed as r34.

gcc/ChangeLog:

 * config/aarch64/aarch64.c (aarch64_simd_emit_pair_result_insn): Delete.
 * config/aarch64/aarch64-protos.h (aarch64_simd_emit_pair_result_insn):
 Delete.




diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 59c5824f894cf5dafe93a996180056696518feb4..8676c5c9c85d82f05e8c63e27f29d1e244ac7104 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -264,12 +264,6 @@ void init_aarch64_simd_builtins (void);
 
 void aarch64_simd_emit_reg_reg_move (rtx *, enum machine_mode, unsigned int);
 
-/* Emit code to place a AdvSIMD pair result in memory locations (with equal
-   registers).  */
-void aarch64_simd_emit_pair_result_insn (machine_mode,
-	 rtx (*intfn) (rtx, rtx, rtx), rtx,
-	 rtx);
-
 /* Expand builtins for SIMD intrinsics.  */
 rtx aarch64_simd_expand_builtin (int, tree, rtx);
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a4d42c7d543e0ed96a7b41fcd9c925f245..0c63af040493ba4006324b2e2f6a3c8e901027f4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8569,24 +8569,6 @@ aarch64_simd_lane_bounds (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high,
   }
 }
 
-/* Emit code to place a AdvSIMD pair result in memory locations (with equal
-   registers).  */
-void
-aarch64_simd_emit_pair_result_insn (machine_mode mode,
-			rtx (*intfn) (rtx, rtx, rtx), rtx destaddr,
-rtx op1)
-{
-  rtx mem = gen_rtx_MEM (mode, destaddr);
-  rtx tmp1 = gen_reg_rtx (mode);
-  rtx tmp2 = gen_reg_rtx (mode);
-
-  emit_insn (intfn (tmp1, op1, tmp2));
-
-  emit_move_insn (mem, tmp1);
-  mem = adjust_address (mem, mode, GET_MODE_SIZE (mode));
-  emit_move_insn (mem, tmp2);
-}
-
 /* Return TRUE if OP is a valid vector addressing mode.  */
 bool
 aarch64_simd_mem_operand_p (rtx op)


Re: [PATCH] Add new target h8300-*-linux

2015-04-20 Thread Jeff Law

On 04/19/2015 10:51 PM, Yoshinori Sato wrote:

+  if (TARGET_H8300H && (TARGET_H8300S || TARGET_H8300SX))
+{
+  target_flags ^= MASK_H8300H;
+}

I'm a bit concerned by this.  Why did you need to make this change?



The flags are mutually exclusive, but both are set.
Hmmm, IIRC the port has many places where it may assume that H8300H is 
set for H8300S/H8300SX.  I did a very quick audit and saw:


I would recommend reviewing the extzv_16_8 pattern which has the 
condition "TARGET_H8300H" and changing the condition to
"TARGET_H8300H || TARGET_H8300S"  since AFAICT that pattern should work 
on both processor variants.


Similarly, there are two peephole patterns that have conditions that look
like this:


 "(TARGET_H8300H || TARGET_H8300S)
&& peep2_reg_dead_p (1, operands[0])
&& ((TARGET_H8300H && INTVAL (operands[1]) == 3)
 || INTVAL (operands[1]) == 7
 || INTVAL (operands[1]) == 15
 || INTVAL (operands[1]) == 31
 || INTVAL (operands[1]) == 63
 || INTVAL (operands[1]) == 127
 || INTVAL (operands[1]) == 255)"


I'm pretty sure the second TARGET_H8300H should be (TARGET_H8300H || 
TARGET_H8300S).
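Why these conditions matter can be shown with a toy model of target_flags. This is plain C, not GCC code: the TOY_* bit values are made up (the real MASK_* values are generated from the option files), and H8/SX is left out for brevity. Today an H8S compilation has both bits set, so a test of TARGET_H8300H alone also matches H8S; once the proposed override clears the H bit, only the (TARGET_H8300H || TARGET_H8300S) form keeps matching.

```c
#include <stdbool.h>

/* Hypothetical bit values, for illustration only.  */
#define TOY_MASK_H8300H 0x1u
#define TOY_MASK_H8300S 0x2u

/* The proposed override: when H8300H is set together with H8300S, the
   XOR clears H8300H (the bit is known to be set, so ^= acts like &= ~).  */
static unsigned
normalize_flags (unsigned flags)
{
  if ((flags & TOY_MASK_H8300H) && (flags & TOY_MASK_H8300S))
    flags ^= TOY_MASK_H8300H;
  return flags;
}

/* A pattern condition written as plain TARGET_H8300H.  */
static bool
cond_h_only (unsigned flags)
{
  return (flags & TOY_MASK_H8300H) != 0;
}

/* The suggested rewrite, (TARGET_H8300H || TARGET_H8300S).  */
static bool
cond_h_or_s (unsigned flags)
{
  return (flags & (TOY_MASK_H8300H | TOY_MASK_H8300S)) != 0;
}
```

So every condition that currently relies on H8300H being implied for H8S would silently stop matching after the change, which is the risk in each of the places listed here.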


In h8300.c::get_shift_alg, case HIshift, count 14, does this need to change?

else if (TARGET_H8300H)
{
  info->special = 
"shll.b\t%t0\n\tsubx.b\t%s0,%s0\n\tshll.b\t%t0\n\trotxl.b\t%s0\n\texts.w\t%T0";

  info->cc_special = CC_SET_ZNV;
}
  else /* TARGET_H8300S */
gcc_unreachable ();

Similarly SImode shifts by 28-30 bits should be reviewed in a similar 
manner.  As should the implementation of h8300_shift_needs_scratch_p.


output_a_rotate also needs to be reviewed if you want to make the change 
to turn off H8300H when H8/S is true.  Similarly for 
compute_a_rotate_length.



There may be others, these are what I found with a very quick search. 
If there's not a compelling reason to make the change, I'd recommend 
against it.




Jeff


[PATCH][AArch64] Implement -m{cpu,tune,arch}=native using only /proc/cpuinfo

2015-04-20 Thread Kyrill Tkachov

Hi all,

This is an attempt to add native CPU detection to AArch64 GNU/Linux targets.
Similar to other ports, we use SPEC rewriting to rewrite -m{cpu,tune,arch}=native
options into the appropriate CPU/architecture and, when appropriate, the
architecture extension options (e.g. +crypto/+crc etc).

For CPU/architecture detection it gets a bit involved, especially when running
on a big.LITTLE system. My proposed approach is to look at /proc/cpuinfo and
search for the implementer id and part number fields that uniquely identify each
core (appropriate identifying information is added to aarch64-cores.def). If we
find two types of core we have a big.LITTLE system, so search through the core
definitions extracted from aarch64-cores.def to find if we support such a
combination (currently only cortex-a57.cortex-a53 and cortex-a72.cortex-a53)
and make sure that the implementer id field matches up.

I tested this on a 4xCortex-A53 + 2xCortex-A57 big.LITTLE Ubuntu GNU/Linux system.
There are two formats for /proc/cpuinfo that I'm aware of. The first (old) one
has the format:
--
processor: 0
processor: 1
processor: 2
processor: 3
processor: 4
processor: 5
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer: 0x41
CPU architecture: AArch64
CPU variant: 0x0
CPU part: 0xd03
--

In this format it lists the 6 cores, but the CPU part it reports is only the one
for the core from which /proc/cpuinfo was read (!), in this case one of the
Cortex-A53 cores. This means we detect a different CPU depending on which core
GCC was invoked on. Not ideal really, but there's no more information that we
can extract. Given the /proc/cpuinfo above, this patch will rewrite
-mcpu=native into -mcpu=cortex-a53+fp+simd+crypto+crc

The newer /proc/cpuinfo format proposed at
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=44b82b7700d05a52cd983799d3ecde1a976b3bed
looks like this:

--
processor   : 0
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 0

processor   : 1
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 0

processor   : 2
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 0

processor   : 3
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 0

processor   : 4
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd07
CPU revision: 0

processor   : 5
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd07
CPU revision: 0
--
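A minimal way to find the distinct core types in the newer format is to collect every distinct "CPU part" value; two distinct values would indicate a big.LITTLE system. The sketch below is illustrative only: it works on an in-memory string rather than reading the file, and collect_cpu_parts is a hypothetical name, not a function from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Collect the distinct "CPU part" values found in TEXT (the contents of
   /proc/cpuinfo) into PARTS, which has room for MAX entries.  Returns the
   number of distinct parts, or -1 if there are more than MAX.  */
static int
collect_cpu_parts (const char *text, unsigned long *parts, int max)
{
  int n = 0;
  const char *p = text;

  while ((p = strstr (p, "CPU part")) != NULL)
    {
      const char *colon = strchr (p, ':');
      if (colon == NULL)
        break;
      /* Base 16 skips leading blanks and accepts the "0x" prefix.  */
      unsigned long part = strtoul (colon + 1, NULL, 16);
      int seen = 0;
      for (int i = 0; i < n; i++)
        if (parts[i] == part)
          seen = 1;
      if (!seen)
        {
          if (n == max)
            return -1;   /* More core types than we know how to handle.  */
          parts[n++] = part;
        }
      p = colon + 1;
    }
  return n;
}
```

On the listing above this yields 0xd03 and 0xd07; on the old format it yields a single part. Matching those numbers (and the implementer field) against the aarch64-cores.def entries would come after this step.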

The Features field is used to detect the architectural features that we map to
GCC option extensions, i.e. +fp, +crypto, +simd, +crc etc.
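A sketch of mapping the Features line to extension suffixes follows. The table and the names has_token, ext_table and build_ext_string are hypothetical and not taken from the patch; in particular, the assumption that +crypto requires all of aes, pmull, sha1 and sha2 is for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Return nonzero if TOK appears as a whole space-separated word in FEATURES.  */
static int
has_token (const char *features, const char *tok)
{
  size_t len = strlen (tok);
  const char *p = features;

  while ((p = strstr (p, tok)) != NULL)
    {
      int start_ok = (p == features || p[-1] == ' ');
      int end_ok = (p[len] == '\0' || p[len] == ' ' || p[len] == '\n');
      if (start_ok && end_ok)
        return 1;
      p += len;
    }
  return 0;
}

/* Hypothetical mapping from GCC extension suffixes to the Features words
   that must all be present for the extension to be enabled.  */
struct ext_map
{
  const char *name;
  const char *tokens[5];   /* NULL-terminated.  */
};

static const struct ext_map ext_table[] = {
  { "+fp",     { "fp", NULL } },
  { "+simd",   { "asimd", NULL } },
  { "+crypto", { "aes", "pmull", "sha1", "sha2", NULL } },
  { "+crc",    { "crc32", NULL } },
};

/* Write the suffixes of every fully supported extension into OUT.  */
static void
build_ext_string (const char *features, char *out, size_t outlen)
{
  out[0] = '\0';
  for (size_t i = 0; i < sizeof ext_table / sizeof ext_table[0]; i++)
    {
      int all = 1;
      for (int j = 0; ext_table[i].tokens[j] != NULL; j++)
        if (!has_token (features, ext_table[i].tokens[j]))
          all = 0;
      if (all && strlen (out) + strlen (ext_table[i].name) < outlen)
        strcat (out, ext_table[i].name);
    }
}
```

For the Features line shown above this produces "+fp+simd+crypto+crc", which, appended to the detected core name, gives the -mcpu=cortex-a53+fp+simd+crypto+crc rewrite described earlier. Whole-word matching matters because "asimd" contains "simd" as a substring.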

Similarly, -march=native would be rewritten into -march=armv8-a+fp+simd+crypto+crc,
while -mtune=native would be rewritten into -mtune=cortex-a57.cortex-a53 (the
arch extension options are not valid for -mtune).

If it detects more than one implementer ID, implementer IDs that do not match
up somewhere, some other weirdness in /proc/cpuinfo, or fails to recognise the
CPU, it will bail out and ignore the option entirely (similarly to other ports).

The patch works fine with both /proc/cpuinfo formats although, as mentioned
above, it will not be able to detect the big.LITTLE combination from the first
format.

I've filled in the implementer ID and part numbers for the Cortex-A57,
Cortex-A53, Cortex-A72 and X-Gene 1 cores, but I don't have that info for
thunderx or exynosm1. Could someone from Cavium and Samsung help me out here?
At present this patch has some dummy placeholder values for them that I'd like
to fill in before committing this.

I've bootstrapped this on the system mentioned above with -mcpu=native in the
BOOT_CFLAGS and regtested as well. For the bootstrap I've used the 2nd
/proc/cpuinfo format.

I've also tested it on AArch64 hardware from ARM Ltd. and the ecosystem.

If using the first format, the bootstrap fails the comparison because,
depending on the OS scheduling, some files are compiled with Cortex-A57 tuning
and some with Cortex-A53 tuning, and this is practically non-deterministic
across stage2 and stage3!

What do people think of this approach?

2014-04-20  Kyrylo Tkachov  

* config.host (case ${host}): Add aarch64*-*-linux ca

Re: [patch,wwwdocs] Add gcc-5 caveats for avr.

2015-04-20 Thread Georg-Johann Lay

On 04/20/2015 at 03:40 PM, Andi Kleen wrote:
> Georg-Johann Lay  writes:
>
>> +http://gcc.gnu.org/onlinedocs/gcc/Spec-Files.html">spec file.
>> +If the compiler is used together with AVR-LibC, this requires at
>> +least GCC 5.2 and a version of AVR-LibC which implements
>
> Really 5.2?
>
> -Andi

It is still unclear to me whether changes to 3ary platforms may go into 5.1. See

https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00908.html

Johann



[PATCH][AArch64] Increase static buffer size in aarch64_rewrite_selected_cpu

2015-04-20 Thread Kyrill Tkachov

Hi all,

When trying to compile a testcase with -mcpu=cortex-a57+crypto+nocrc I got this
weird assembler error:
Assembler messages:
Error: missing architectural extension
Error: unrecognized option -mcpu=cortex-a57+crypto+no

The problem is that aarch64_rewrite_selected_cpu, which is used to rewrite -mcpu
for big.LITTLE options, has a limit of 20 characters in what it handles, which
we can exhaust quickly if we specify architectural extensions in a fine-grained
manner.

This patch increases that character limit to 128 and adds an assert to confirm
that no bad things happen.
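The exact assembler error follows from the old 20-character copy. A simplified sketch of that step (the '.' search is left out, and old_truncate is a hypothetical name):

```c
#include <string.h>

#define OLD_CPU_NAME_LENGTH 20   /* The limit before the patch.  */

/* Hypothetical model of the old copy: at most 20 characters of the
   -mcpu string end up in a zero-filled 21-byte buffer.  */
const char *
old_truncate (const char *name)
{
  static char buf[OLD_CPU_NAME_LENGTH + 1];
  memset (buf, 0, sizeof buf);
  strncpy (buf, name, OLD_CPU_NAME_LENGTH);
  return buf;
}
```

old_truncate ("cortex-a57+crypto+nocrc") returns "cortex-a57+crypto+no", which is the string the assembler rejected.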


It also fixes another problem: if we pass a big.LITTLE combination with feature
modifiers like:
-mcpu=cortex-a57.cortex-a53+nosimd

the code will truncate everything after the '.', thus destroying the extensions
that we want to pass. The patch adds code to stitch the extensions back on after
the LITTLE CPU is removed.
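A sketch of the stitching logic, assuming any +extensions follow the big.LITTLE pair. Unlike a bare strncpy, which writes no terminating NUL when it copies exactly the requested number of bytes, this version terminates the result explicitly, so a reused static buffer cannot keep stale characters from an earlier, longer call. rewrite_selected_cpu_sketch is a hypothetical name, and it returns NULL where the patch calls internal_error.

```c
#include <stddef.h>
#include <string.h>

#define CPU_NAME_LENGTH 128   /* Same limit as the patch.  */

/* Hypothetical sketch: turn "big.LITTLE+ext..." into "big+ext...".
   Returns NAME unchanged if it has no '.', or NULL if the result would
   not fit.  Assumes any '+' comes after the '.'.  */
const char *
rewrite_selected_cpu_sketch (const char *name)
{
  static char buf[CPU_NAME_LENGTH + 1];
  const char *dot = strchr (name, '.');
  if (dot == NULL)
    return name;

  const char *feats = strchr (name, '+');
  size_t pref_len = (size_t) (dot - name);
  size_t feat_len = feats ? strlen (feats) : 0;
  if (pref_len + feat_len > CPU_NAME_LENGTH)
    return NULL;

  memcpy (buf, name, pref_len);                /* the big core */
  if (feats)
    memcpy (buf + pref_len, feats, feat_len);  /* the +extensions */
  buf[pref_len + feat_len] = '\0';             /* always terminate */
  return buf;
}
```

For example, cortex-a57.cortex-a53+nosimd becomes cortex-a57+nosimd, and a name without a '.' is returned unchanged.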

Tested aarch64-none-elf and made sure the given -mcpu option works fine with the
assembler.

Ok for trunk?

Thanks,
Kyrill

2015-04-20  Kyrylo Tkachov  

* common/config/aarch64/aarch64-common.c (AARCH64_CPU_NAME_LENGTH):
Increase to 128.
(aarch64_rewrite_selected_cpu): Do not chop off extensions starting
at '.'.  Assert that there's enough space for everything.
commit 9623c859d5f4d0da1a364184bf0ce0dbbc7907b4
Author: Kyrylo Tkachov 
Date:   Thu Feb 19 17:05:48 2015 +0000

[AArch64] Increase static buffer size in aarch64_rewrite_selected_cpu

diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 308f19c..b3fd9dc 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -27,6 +27,7 @@
 #include "common/common-target-def.h"
 #include "opts.h"
 #include "flags.h"
+#include "errors.h"
 
 #ifdef  TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
@@ -89,23 +90,34 @@ aarch64_handle_option (struct gcc_options *opts,
 
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
 
-#define AARCH64_CPU_NAME_LENGTH 20
+#define AARCH64_CPU_NAME_LENGTH 128
 
-/* Truncate NAME at the first '.' character seen, or return
-   NAME unmodified.  */
+/* Truncate NAME at the first '.' character seen up to the first '+'
+   or return NAME unmodified.  */
 
 const char *
 aarch64_rewrite_selected_cpu (const char *name)
 {
   static char output_buf[AARCH64_CPU_NAME_LENGTH + 1] = {0};
-  char *arg_pos;
+  const char *bL_sep;
+  const char *feats;
+  size_t pref_size;
+  size_t feat_size;
 
-  strncpy (output_buf, name, AARCH64_CPU_NAME_LENGTH);
-  arg_pos = strchr (output_buf, '.');
+  bL_sep = strchr (name, '.');
+  if (!bL_sep)
+    return name;
 
-  /* If we found a '.' truncate the entry at that point.  */
-  if (arg_pos)
-    *arg_pos = '\0';
+  feats = strchr (name, '+');
+  feat_size = feats ? strnlen (feats, AARCH64_CPU_NAME_LENGTH) : 0;
+  pref_size = bL_sep - name;
+
+  if ((feat_size + pref_size) > AARCH64_CPU_NAME_LENGTH)
+    internal_error ("-mcpu string too large");
+
+  strncpy (output_buf, name, pref_size);
+  if (feats)
+    strncpy (output_buf + pref_size, feats, feat_size);
 
   return output_buf;
 }


[PATCH][ARM] Handle UNSPEC_VOLATILE in rtx costs and don't recurse inside the unspec

2015-04-20 Thread Kyrill Tkachov

Hi all,

A pet project of mine is to get to the point where backend rtx costs functions
won't have to handle rtxes that don't match any of our patterns/expanders, or at
least to limit such cases.
A case dealt with in this patch is QImode PLUS. We don't actually generate or
handle these anywhere in the arm backend *except* in sync.md where, for example,
atomic_ matches:
(set (match_operand:QHSD 0 "mem_noofs_operand" "+Ua")
(unspec_volatile:QHSD
  [(syncop:QHSD (match_dup 0)
 (match_operand:QHSD 1 "" ""))
   (match_operand:SI 2 "const_int_operand")];; model
  VUNSPEC_ATOMIC_OP))

Here QHSD can contain QImode and HImode while syncop can be PLUS.
Now, immediately during splitting in arm_split_atomic_op we convert that QImode
PLUS into an SImode one, so we never actually generate any kind of QImode add
operations (how would we? we don't have define_insns for such things), but the
RTL optimisers will get hold of the UNSPEC_VOLATILE in the meantime and ask for
its cost (for example, cse when building libatomic).
Currently we don't handle UNSPEC_VOLATILE (VUNSPEC_ATOMIC_OP), so the arm rtx
costs function just recurses into the QImode PLUS, which I'd like to avoid.
This patch stops that by passing the VUNSPEC_ATOMIC_OP into arm_unspec_cost and
handling it there (very straightforwardly, just returning COSTS_N_INSNS (2);
there's no indication that we want to do anything smarter here) and stopping the
recursion.
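The effect of returning true from the unspec cost helper can be modelled with a toy expression tree. This follows the rtx-costs convention that a target hook returns true when the cost it computed is final and false when the caller should go on to add the operand costs; the node kinds and functions below are hypothetical, and COSTS_N_INSNS mirrors GCC's four units per instruction.

```c
#include <stdbool.h>
#include <stddef.h>

/* Four cost units per instruction, as in GCC's COSTS_N_INSNS.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical node kinds standing in for RTL codes.  */
enum code { REG, QI_PLUS, UNSPEC_VOLATILE_ATOMIC_OP };

struct expr
{
  enum code code;
  const struct expr *ops[2];   /* NULL when an operand slot is unused.  */
};

/* Toy target hook: return true if *COST is final, false if the caller
   should also add the operand costs.  *QI_PLUS_SEEN counts how often the
   hook was asked about a QImode PLUS.  */
static bool
target_cost (const struct expr *x, int *cost, int *qi_plus_seen)
{
  switch (x->code)
    {
    case UNSPEC_VOLATILE_ATOMIC_OP:
      *cost = COSTS_N_INSNS (2);
      return true;               /* Do not look inside the unspec.  */
    case QI_PLUS:
      ++*qi_plus_seen;
      *cost = COSTS_N_INSNS (1);
      return false;
    default:
      *cost = 0;
      return false;
    }
}

/* Generic walker: ask the hook first, recurse only if it says so.  */
int
total_cost (const struct expr *x, int *qi_plus_seen)
{
  int cost;
  if (target_cost (x, &cost, qi_plus_seen))
    return cost;
  for (size_t i = 0; i < 2; i++)
    if (x->ops[i] != NULL)
      cost += total_cost (x->ops[i], qi_plus_seen);
  return cost;
}
```

Costing the UNSPEC_VOLATILE node returns COSTS_N_INSNS (2) without ever visiting the QImode PLUS underneath; if that case returned false, as in the recursing behaviour described above, the walk would descend into it.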

This is a small step in the direction of not having to care about obviously
useless rtxes in the backend.
The astute reader might notice that in sync.md we also have the pattern
atomic_fetch_ which expands to/matches this:
(set (match_operand:QHSD 0 "s_register_operand" "=&r")
(match_operand:QHSD 1 "mem_noofs_operand" "+Ua"))
   (set (match_dup 1)
(unspec_volatile:QHSD
  [(syncop:QHSD (match_dup 1)
 (match_operand:QHSD 2 "" ""))
   (match_operand:SI 3 "const_int_operand")];; model
  VUNSPEC_ATOMIC_OP))


Here the QImode PLUS is in a PARALLEL together with the UNSPEC, so it might
have rtx costs called on it as well. This will always be a (plus (reg) (mem))
rtx, which is unlike any other normal rtx we generate in the arm backend. I'll
try to get a patch to handle that case, but I'm still thinking about how best
to do that.

Tested arm-none-eabi; I didn't see any codegen differences in some compiled
codebases.

Ok for trunk?

P.S. I know that expmed creates all kinds of irregular rtxes and asks for their
costs. I'm hoping to clean that up at some point...

2015-04-20  Kyrylo Tkachov  

* config/arm/arm.c (arm_new_rtx_costs): Handle UNSPEC_VOLATILE.
(arm_unspec_cost): Allow UNSPEC_VOLATILE.  Do not recurse inside
unknown unspecs.
commit b86d4036fbbbd15956595ef5a158e1a6881356c1
Author: Kyrylo Tkachov 
Date:   Thu Apr 16 11:17:47 2015 +0100

[ARM] Handle UNSPEC_VOLATILE in costs and don't recurse inside the unspec

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5116817..8f31be0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9607,7 +9607,8 @@ static bool
 arm_unspec_cost (rtx x, enum rtx_code /* outer_code */, bool speed_p, int *cost)
 {
   const struct cpu_cost_table *extra_cost = current_tune->insn_extra_cost;
-  gcc_assert (GET_CODE (x) == UNSPEC);
+  rtx_code code = GET_CODE (x);
+  gcc_assert (code == UNSPEC || code == UNSPEC_VOLATILE);
 
   switch (XINT (x, 1))
 {
@@ -9653,7 +9654,7 @@ arm_unspec_cost (rtx x, enum rtx_code /* outer_code */, bool speed_p, int *cost)
   *cost = COSTS_N_INSNS (2);
   break;
 }
-  return false;
+  return true;
 }
 
 /* Cost of a libcall.  We assume one insn per argument, an amount for the
@@ -11121,6 +11122,7 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
   *cost = LIBCALL_COST (1);
   return false;
 
+case UNSPEC_VOLATILE:
 case UNSPEC:
   return arm_unspec_cost (x, outer_code, speed_p, cost);
 


Re: [patch,wwwdocs] Add gcc-5 caveats for avr.

2015-04-20 Thread Jakub Jelinek
On Mon, Apr 20, 2015 at 06:21:30PM +0200, Georg-Johann Lay wrote:
> On 04/20/2015 at 03:40 PM, Andi Kleen wrote:
> >Georg-Johann Lay  writes:
> >>+http://gcc.gnu.org/onlinedocs/gcc/Spec-Files.html">spec file.
> >>+If the compiler is used together with AVR-LibC, this requires at
> >>+least GCC 5.2 and a version of AVR-LibC which implements
> >
> >Really 5.2?
> >
> >-Andi
> 
> 
> It is still unclear to me whether changes to 3ary platforms may go into 5.1. See
> 
> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00908.html

At this point no.

Jakub


[PATCH][AArch64] Properly handle SHIFT ops and EXTEND in aarch64_rtx_mult_cost

2015-04-20 Thread Kyrill Tkachov

Hi all,

The aarch64_rtx_mult_cost helper is supposed to handle multiplication costs as
well as PLUS/MINUS operations combined with multiplication or shift operations.
The shift operations may contain an extension. Currently we do not handle all
these cases properly. We also don't handle the supported shift types other than
ASHIFT.

This patch addresses that by beefing up aarch64_rtx_mult_cost to handle
extensions inside the shifts and handling the other kinds of supported shifts.
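The cost-field selection in the patched aarch64_rtx_mult_cost can be summarised as below. This is a sketch with a hypothetical cost struct and caller-chosen values; only the branch order and the set of shifts accepted by shift_p follow the patch (the ADD/SUB shifted-register forms accept LSL, LSR and ASR, but not ROR).

```c
#include <stdbool.h>

/* Hypothetical subset of RTL shift codes.  */
enum shift_code { ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE };

/* Like the patch's aarch64_shift_p: shifts that can be folded into an
   arithmetic instruction.  */
bool
shift_p (enum shift_code code)
{
  return code == ASHIFT || code == ASHIFTRT || code == LSHIFTRT;
}

/* Hypothetical per-core cost fields, named after the patch's fields.  */
struct alu_costs
{
  int shift;            /* plain shift by immediate (LSL) */
  int arith_shift;      /* ARITH + shift-by-immediate */
  int arith_shift_reg;  /* ARITH + shift-by-register */
  int extend_arith;     /* ARITH + extended register */
};

/* Same branch order as the patched speed costs for a shift that may feed
   a PLUS/MINUS (COMPOUND_P).  */
int
shift_extra_cost (const struct alu_costs *c, bool compound_p,
                  bool amount_in_reg, bool operand_is_extend)
{
  if (!compound_p)
    return c->shift;
  if (amount_in_reg)
    return c->arith_shift_reg;
  if (operand_is_extend)
    return c->extend_arith;   /* No ARITH+EXTEND+SHIFT field exists.  */
  return c->arith_shift;
}
```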

Bootstrapped and tested on aarch64-linux.

Ok for trunk?

Thanks,
Kyrill

2015-04-20  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_shift_p): New function.
(aarch64_rtx_mult_cost): Update comment to reflect that it also handles
combined arithmetic-shift ops.  Properly handle all shift and extend
operations that can occur in combination with PLUS/MINUS.
Rename maybe_fma to compound_p.
(aarch64_rtx_costs): Use aarch64_shift_p when costing compound
arithmetic and shift operations.
commit 5c9d34ca7f6758ea0402cc0ef97d5db481ba7e40
Author: Kyrylo Tkachov 
Date:   Mon Mar 2 12:04:27 2015 +0000

[AArch64] Properly handle SHIFT ops and EXTEND in aarch64_rtx_mult_cost.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2023f04..65be1b98 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5161,9 +5161,17 @@ aarch64_strip_extend (rtx x)
   return x;
 }
 
+/* Return true iff CODE is a shift supported in combination
+   with arithmetic instructions.  */
+static bool
+aarch64_shift_p (enum rtx_code code)
+{
+  return code == ASHIFT || code == ASHIFTRT || code == LSHIFTRT;
+}
+
 /* Helper function for rtx cost calculation.  Calculate the cost of
-   a MULT, which may be part of a multiply-accumulate rtx.  Return
-   the calculated cost of the expression, recursing manually in to
+   a MULT or ASHIFT, which may be part of a compound PLUS/MINUS rtx.
+   Return the calculated cost of the expression, recursing manually in to
operands where needed.  */
 
 static int
@@ -5173,7 +5181,7 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
   const struct cpu_cost_table *extra_cost
 = aarch64_tune_params->insn_extra_cost;
   int cost = 0;
-  bool maybe_fma = (outer == PLUS || outer == MINUS);
+  bool compound_p = (outer == PLUS || outer == MINUS);
   machine_mode mode = GET_MODE (x);
 
   gcc_checking_assert (code == MULT);
@@ -5188,18 +5196,35 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
   if (GET_MODE_CLASS (mode) == MODE_INT)
 {
   /* The multiply will be canonicalized as a shift, cost it as such.  */
-  if (CONST_INT_P (op1)
-	  && exact_log2 (INTVAL (op1)) > 0)
+  if (aarch64_shift_p (GET_CODE (x))
+	  || (CONST_INT_P (op1)
+	  && exact_log2 (INTVAL (op1)) > 0))
 	{
+	  bool is_extend = GET_CODE (op0) == ZERO_EXTEND
+	   || GET_CODE (op0) == SIGN_EXTEND;
 	  if (speed)
 	{
-	  if (maybe_fma)
-		/* ADD (shifted register).  */
-		cost += extra_cost->alu.arith_shift;
+	  if (compound_p)
+	{
+	  if (REG_P (op1))
+		/* ARITH + shift-by-register.  */
+		cost += extra_cost->alu.arith_shift_reg;
+		  else if (is_extend)
+		/* ARITH + extended register.  We don't have a cost field
+		   for ARITH+EXTEND+SHIFT, so use extend_arith here.  */
+		cost += extra_cost->alu.extend_arith;
+		  else
+		/* ARITH + shift-by-immediate.  */
+		cost += extra_cost->alu.arith_shift;
+		}
 	  else
 		/* LSL (immediate).  */
-		cost += extra_cost->alu.shift;
+	cost += extra_cost->alu.shift;
+
 	}
+	  /* Strip extends as we will have costed them in the case above.  */
+	  if (is_extend)
+	op0 = aarch64_strip_extend (op0);
 
 	  cost += rtx_cost (op0, GET_CODE (op0), 0, speed);
 
@@ -5217,7 +5242,7 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
 
 	  if (speed)
 	{
-	  if (maybe_fma)
+	  if (compound_p)
 		/* MADD/SMADDL/UMADDL.  */
 		cost += extra_cost->mult[0].extend_add;
 	  else
@@ -5235,7 +5260,7 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
 
   if (speed)
 	{
-	  if (maybe_fma)
+	  if (compound_p)
 	/* MADD.  */
 	cost += extra_cost->mult[mode == DImode].add;
 	  else
@@ -5256,7 +5281,7 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
 	  if (GET_CODE (op1) == NEG)
 	op1 = XEXP (op1, 0);
 
-	  if (maybe_fma)
+	  if (compound_p)
 	/* FMADD/FNMADD/FNMSUB/FMSUB.  */
 	cost += extra_cost->fp[mode == DFmode].fma;
 	  else
@@ -5833,7 +5858,7 @@ cost_minus:
 
 	/* Cost this as an FMA-alike operation.  */
 	if ((GET_CODE (new_op1) == MULT
-	 || GET_CODE (new_op1) == ASHIFT)
+	 || aarch64_shift_p (GET_CODE (new_op1)))
 	&& code != COMPARE)
 	  {
 	*cost += aarch64_rtx_mult_cost (new_op1, MULT,
@@ -5904,7 +5929,7 @@ cost_plus:
 	new_op0 = aarch64_strip_extend (op0);
 
 	if (GET_CODE (new_op0) == MULT
-	|| GET_CODE 

[PATCH][AArch64] Properly cost MNEG/[SU]MNEGL patterns

2015-04-20 Thread Kyrill Tkachov

Hi all,

Currently we do not handle the MNEG patterns properly in rtx costs.
These instructions are similar to the MSUB ones.
This patch handles them by catching the NEG at the appropriate position,
extracting its operands and letting the rest of the aarch64_rtx_mult_cost 
function
handle the additional costs.
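The MNEG/MSUB relationship the patch relies on can be checked with plain integer arithmetic. A small sketch on 64-bit values, using uint64_t for wrap-around arithmetic; msub, mneg and mult_of_neg are illustrative names.

```c
#include <stdint.h>

/* MSUB Xd, Xn, Xm, Xa computes Xa - Xn * Xm (modulo 2^64).  */
uint64_t
msub (uint64_t n, uint64_t m, uint64_t a)
{
  return a - n * m;
}

/* MNEG Xd, Xn, Xm computes -(Xn * Xm): MSUB with XZR as the addend.  */
uint64_t
mneg (uint64_t n, uint64_t m)
{
  return msub (n, m, 0);
}

/* The RTL shape seen by the cost code is (mult (neg a) b); in two's
   complement (-a) * b == -(a * b), so stripping the NEG and costing the
   expression like MSUB is consistent.  */
uint64_t
mult_of_neg (uint64_t a, uint64_t b)
{
  return (0 - a) * b;
}
```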

Tested on aarch64-none-elf.

Ok for trunk?

Thanks,
Kyrill

N.B.
This patch's context depends on:
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01049.html

2015-04-20  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Handle MNEG
and [SU]MNEGL patterns.
commit 13b3a8297e6337a3ed89b9ef0182f273cf693ac3
Author: Kyrylo Tkachov 
Date:   Tue Mar 10 15:52:24 2015 +0000

[AArch64] Properly cost MNEG/[SU]MNEGL patterns

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 853cce9..d1635f4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5228,6 +5228,15 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
 	  return cost;
 	}
 
+  /* MNEG or [US]MNEGL.  Extract the NEG operand and indicate that it's a
+	 compound and let the below cases handle it.  After all, MNEG is a
+	 special-case alias of MSUB.  */
+  if (GET_CODE (op0) == NEG)
+	{
+	  op0 = XEXP (op0, 0);
+	  compound_p = true;
+	}
+
   /* Integer multiplies or FMAs have zero/sign extending variants.  */
   if ((GET_CODE (op0) == ZERO_EXTEND
 	   && GET_CODE (op1) == ZERO_EXTEND)
@@ -5240,7 +5249,7 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
 	  if (speed)
 	{
 	  if (compound_p)
-		/* MADD/SMADDL/UMADDL.  */
+		/* SMADDL/UMADDL/UMSUBL/SMSUBL.  */
 		cost += extra_cost->mult[0].extend_add;
 	  else
 		/* MUL/SMULL/UMULL.  */
@@ -5250,7 +5259,7 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
 	  return cost;
 	}
 
-  /* This is either an integer multiply or an FMA.  In both cases
+  /* This is either an integer multiply or a MADD.  In both cases
 	 we want to recurse and cost the operands.  */
   cost += rtx_cost (op0, MULT, 0, speed)
 	  + rtx_cost (op1, MULT, 1, speed);
@@ -5258,7 +5267,7 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
   if (speed)
 	{
 	  if (compound_p)
-	/* MADD.  */
+	/* MADD/MSUB.  */
 	cost += extra_cost->mult[mode == DImode].add;
 	  else
 	/* MUL.  */


[PATCH][AArch64] Use extend_arith rtx cost appropriately

2015-04-20 Thread Kyrill Tkachov

Hi all,

When calculating the rtx costs of an arithmetic operation combined with
zero or sign extension of its operand we should use the extend_arith
cost rather than the arith_shift cost.
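For reference, this is roughly what an arithmetic operation on an extended register computes, using ADD with SXTW/UXTW as the example. A sketch only: the helper names are hypothetical, the shift amount (0 to 4 in the instruction) is not range-checked, and the cast to int32_t relies on GCC's modulo conversion of out-of-range values.

```c
#include <stdint.h>

/* ADD Xd, Xn, Wm, SXTW #amount: sign-extend Wm to 64 bits, shift, add.  */
uint64_t
add_sxtw (uint64_t xn, uint32_t wm, unsigned amount)
{
  int64_t ext = (int64_t) (int32_t) wm;   /* SXTW */
  return xn + ((uint64_t) ext << amount);
}

/* ADD Xd, Xn, Wm, UXTW #amount: zero-extend Wm instead.  */
uint64_t
add_uxtw (uint64_t xn, uint32_t wm, unsigned amount)
{
  return xn + ((uint64_t) wm << amount);
}
```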

Bootstrapped and tested on aarch64-linux.
Ok for trunk?

Thanks,
Kyrill

2015-04-20  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_rtx_costs): Use extend_arith
rather than arith_shift cost when costing ADD/MINUS of an
extended value.
commit 5bf9f716682d7098b52f3f6ba562c74a33ed9485
Author: Kyrylo Tkachov 
Date:   Mon Mar 2 10:31:43 2015 +0000

[AArch64] Use extend_arith rtx cost appropriately.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6e73176..2023f04 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5821,7 +5821,7 @@ cost_minus:
 if (aarch64_rtx_arith_op_extract_p (op1, mode))
 	  {
 	if (speed)
-	  *cost += extra_cost->alu.arith_shift;
+	  *cost += extra_cost->alu.extend_arith;
 
 	*cost += rtx_cost (XEXP (XEXP (op1, 0), 0),
 			   (enum rtx_code) GET_CODE (op1),
@@ -5891,7 +5891,7 @@ cost_plus:
 if (aarch64_rtx_arith_op_extract_p (op0, mode))
 	  {
 	if (speed)
-	  *cost += extra_cost->alu.arith_shift;
+	  *cost += extra_cost->alu.extend_arith;
 
 	*cost += rtx_cost (XEXP (XEXP (op0, 0), 0),
 			   (enum rtx_code) GET_CODE (op0),


Re: [PATCH][committed] Add missed '%>' to error_at message

2015-04-20 Thread Ilya Verbin
On Mon, Apr 20, 2015 at 17:03:27 +0200, Jakub Jelinek wrote:
> On Mon, Apr 20, 2015 at 05:56:16PM +0300, Ilya Verbin wrote:
> > This patch adds missed '%>' to the error_at () message in
> > c[p]_parser_omp_target_update.  Committed as obvious.
> 
> Please commit also to gcc-5-branch.
> The Swedish translation is the only one that has this message (with the bug
> in it) translated, so either we fix that for the release manually, or that
> message won't be translated.

Done.  (I didn't fix the Swedish translation.)

  -- Ilya


[PATCH][DRIVER] Wrong C++ include paths when configuring with "--with-sysroot=/"

2015-04-20 Thread Pavel Kopyl

Hi all,


To build a GCC-4.9.2 ARM cross-compiler for my setting I need to configure it
with "--with-sysroot=/ --with-gxx-include-dir=/usr/include/c++/4.9.2".

But I found that the gcc driver removes the leading slash from the resulting paths:

`gcc -print-prog-name=cc1plus` -v
...
ignoring nonexistent directory "usr/include/c++/4.9.2"   <- HERE
ignoring nonexistent directory "usr/include/c++/4.9.2/armv7l-tizen-linux-gnueabi"   <- AND HERE
ignoring nonexistent directory "usr/include/c++/4.9.2/backward" <- AND HERE
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/armv7l-tizen-linux-gnueabi/4.9.2/include
/usr/local/include
/usr/lib/gcc/armv7l-tizen-linux-gnueabi/4.9.2/include-fixed
/usr/include

It's also reproducible on trunk.

Attached patch fixes this bug.

Thanks,
Pavel.
gcc/ChangeLog

2015-04-20  Pavel Kopyl  

	* gcc.c (add_sysrooted_prefix): Add new variable 'real_sysroot'.
	Pass it to 'concat()' instead of 'sysroot_no_trailing_dir_separator'.
	* incpath.c (add_standard_paths): Likewise.
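A sketch of the joining rule the patch introduces, modelling DIR_SEPARATOR as '/'. It assumes, as the -v output above suggests, that the configured C++ include directory can reach this code without a leading '/' when it was recorded relative to a sysroot of "/"; join_sysroot is a hypothetical helper. Before the patch the trailing '/' of the sysroot was always stripped, so "/" joined with "usr/include/c++/4.9.2" produced the relative path shown above.

```c
#include <stdlib.h>
#include <string.h>

/* Join SYSROOT and FNAME, dropping SYSROOT's trailing '/' only when FNAME
   supplies its own leading '/'.  The caller frees the result.  */
char *
join_sysroot (const char *sysroot, const char *fname)
{
  size_t len = strlen (sysroot);
  size_t stripped = (len > 0 && sysroot[len - 1] == '/') ? len - 1 : len;
  size_t keep = fname[0] == '/' ? stripped : len;
  size_t flen = strlen (fname);
  char *out = malloc (keep + flen + 1);
  if (out == NULL)
    return NULL;
  memcpy (out, sysroot, keep);
  memcpy (out + keep, fname, flen + 1);   /* includes the NUL */
  return out;
}
```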

diff --git a/gcc/gcc.c b/gcc/gcc.c
index c3d44b1..b0b7515 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -2581,11 +2581,19 @@ add_sysrooted_prefix (struct path_prefix *pprefix, const char *prefix,
 	sysroot_no_trailing_dir_separator[sysroot_len - 1] = '\0';
 
   if (target_sysroot_suffix)
-	prefix = concat (sysroot_no_trailing_dir_separator,
-			 target_sysroot_suffix, prefix, NULL);
+	{
+	  const char *real_sysroot
+	 = ((target_sysroot_suffix[0] == DIR_SEPARATOR)
+		? sysroot_no_trailing_dir_separator : target_system_root);
+	  prefix = concat (real_sysroot, target_sysroot_suffix, prefix, NULL);
+	}
   else
-	prefix = concat (sysroot_no_trailing_dir_separator, prefix, NULL);
-
+	{
+	  const char *real_sysroot
+	= ((prefix[0] == DIR_SEPARATOR)
+	   ? sysroot_no_trailing_dir_separator : target_system_root);
+	  prefix = concat (real_sysroot, prefix, NULL);
+	}
   free (sysroot_no_trailing_dir_separator);
 
   /* We have to override this because GCC's notion of sysroot
diff --git a/gcc/incpath.c b/gcc/incpath.c
index f495c0a..2387db6 100644
--- a/gcc/incpath.c
+++ b/gcc/incpath.c
@@ -178,10 +178,14 @@ add_standard_paths (const char *sysroot, const char *iprefix,
 	{
 	  char *sysroot_no_trailing_dir_separator = xstrdup (sysroot);
 	  size_t sysroot_len = strlen (sysroot);
+	  const char *real_sysroot;
 
 	  if (sysroot_len > 0 && sysroot[sysroot_len - 1] == DIR_SEPARATOR)
 		sysroot_no_trailing_dir_separator[sysroot_len - 1] = '\0';
-	  str = concat (sysroot_no_trailing_dir_separator, p->fname, NULL);
+
+	  real_sysroot = ((p->fname[0] == DIR_SEPARATOR)
+			  ? sysroot_no_trailing_dir_separator : sysroot);
+	  str = concat (real_sysroot, p->fname, NULL);
 	  free (sysroot_no_trailing_dir_separator);
 	}
 	  else if (!p->add_sysroot && relocated


Re: [RS6000] pr65810, powerpc64 alignment of r2

2015-04-20 Thread David Edelsohn
On Mon, Apr 20, 2015 at 9:53 AM, Alan Modra  wrote:
> This fixes a thinko in offsettable_ok_by_alignment.  It's not the
> absolute placement that matters, but the toc-pointer relative offset.
> So alignment of r2 also needs to be taken into account.
>
> Bootstrapped and regression tested powerpc64-linux.  OK for mainline
> and gcc-5 branch?  Without the dead code removal for the branch..

See question below.

>
> I also have a linker fix to align the toc pointer and gcc configury
> changes to supply POWERPC64_TOC_POINTER_ALIGNMENT in the works.
>
> PR target/65810
> * config/rs6000/rs6000.c (POWERPC64_TOC_POINTER_ALIGNMENT): Define.
> (offsettable_ok_by_alignment): Return false if size exceeds
> POWERPC64_TOC_POINTER_ALIGNMENT.  Replace dead code with assertion.
>
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  (revision 27)
> +++ gcc/config/rs6000/rs6000.c  (working copy)
> @@ -6497,13 +6497,21 @@ virtual_stack_registers_memory_p (rtx op)
>  }
>
>  /* Return true if a MODE sized memory accesses to OP plus OFFSET
> -   is known to not straddle a 32k boundary.  */
> +   is known to not straddle a 32k boundary.  This function is used
> +   to determine whether -mcmodel=medium code can use TOC pointer
> +   relative addressing for OP.  This means the alignment of the TOC
> +   pointer must also be taken into account, and unfortunately that is
> +   only 8 bytes.  */
>
> +#ifndef POWERPC64_TOC_POINTER_ALIGNMENT
> +#define POWERPC64_TOC_POINTER_ALIGNMENT 8
> +#endif
> +
>  static bool
>  offsettable_ok_by_alignment (rtx op, HOST_WIDE_INT offset,
>  machine_mode mode)
>  {
> -  tree decl, type;
> +  tree decl;
>unsigned HOST_WIDE_INT dsize, dalign, lsb, mask;
>
>if (GET_CODE (op) != SYMBOL_REF)
> @@ -6510,6 +6518,8 @@ offsettable_ok_by_alignment (rtx op, HOST_WIDE_INT
>  return false;
>
>dsize = GET_MODE_SIZE (mode);
> +  if (dsize > POWERPC64_TOC_POINTER_ALIGNMENT)
> +return false;

Why do you immediately fail if the mode size is greater than the alignment?

You may want to sprinkle some comments that this is assuming natural
alignment (which is correct for the current ABI).  But the DECL could
be declared with less alignment.

VMX needs this sort of restriction, but VSX would function okay.

Or, does the generated code assume that the offsets from the TOC are
units of the DECL size and not the TOC size?

Thanks, David

>decl = SYMBOL_REF_DECL (op);
>if (!decl)
>  {
> @@ -6553,6 +6563,8 @@ offsettable_ok_by_alignment (rtx op, HOST_WIDE_INT
> return false;
>
>   dsize = tree_to_uhwi (DECL_SIZE_UNIT (decl));
> + if (dsize > POWERPC64_TOC_POINTER_ALIGNMENT)
> +   return false;
>   if (dsize > 32768)
> return false;
>
> @@ -6560,32 +6572,8 @@ offsettable_ok_by_alignment (rtx op, HOST_WIDE_INT
> }
>  }
>else
> -{
> -  type = TREE_TYPE (decl);
> +gcc_unreachable ();
>
> -  dalign = TYPE_ALIGN (type);
> -  if (CONSTANT_CLASS_P (decl))
> -   dalign = CONSTANT_ALIGNMENT (decl, dalign);
> -  else
> -   dalign = DATA_ALIGNMENT (decl, dalign);
> -
> -  if (dsize == 0)
> -   {
> - /* BLKmode, check the entire object.  */
> - if (TREE_CODE (decl) == STRING_CST)
> -   dsize = TREE_STRING_LENGTH (decl);
> - else if (TYPE_SIZE_UNIT (type)
> -  && tree_fits_uhwi_p (TYPE_SIZE_UNIT (type)))
> -   dsize = tree_to_uhwi (TYPE_SIZE_UNIT (type));
> - else
> -   return false;
> - if (dsize > 32768)
> -   return false;
> -
> - return dalign / BITS_PER_UNIT >= dsize;
> -   }
> -}
> -
>/* Find how many bits of the alignment we know for this access.  */
>mask = dalign / BITS_PER_UNIT - 1;
>lsb = offset & -offset;
>
> --
> Alan Modra
> Australia Development Lab, IBM
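To illustrate Alan's point with made-up addresses (not taken from PR 65810): a 16-byte object that is 16-byte aligned in memory can still sit at a TOC-relative offset that crosses a 32k boundary when r2 is only 8-byte aligned, while an 8-byte access at the same place cannot. toc_offset and straddles_32k are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset of an object at ADDR from the TOC pointer value TOC (r2).  */
int64_t
toc_offset (uint64_t addr, uint64_t toc)
{
  return (int64_t) (addr - toc);
}

/* True if a SIZE-byte access (SIZE >= 1) starting at OFFSET has its first
   and last bytes in different 32k-aligned windows.  */
bool
straddles_32k (int64_t offset, uint64_t size)
{
  uint64_t first = (uint64_t) offset;
  uint64_t last = first + size - 1;
  return (first >> 15) != (last >> 15);
}
```

With r2 at 0x10000008 and the object at 0x10008000 the offset is 0x7ff8, so 16 bytes from there span 0x7ff8 to 0x8007; with r2 at 0x10000000 the same object starts at offset 0x8000 and does not cross.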


Handle redirection blocks with clobbers

2015-04-20 Thread Jeff Law


PR 65658 shows a case where we fail to thread jumps in a block that is 
trivially threadable and would generate no code if threaded.  That in 
turn results in inefficient code and a false positive from -Wuninitialized.


The problem is that the block in question has a clobber statement, and
redirection_block_p thus rejects it as a redirection block.  That in turn
causes the recorded jump threads to be pruned.


Fixed by handling clobber statements in redirection_block_p.

Bootstrapped and regression tested on x86-linux-gnu.  Installed on the 
trunk.


Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a34e846..9d8c2bc 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2015-04-20  Jeff Law  
+
+   PR tree-optimization/65658
+   * tree-ssa-threadupdate.c (redirection_block_p): Ignore clobber
+   statements too.
+
 2015-04-20  Alan Lawrence  
 
* config/aarch64/aarch64.c (aarch64_simd_emit_pair_result_insn): Delete.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 248ffc8..e768f57 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2015-04-20  Jeff Law  
+
+   PR tree-optimization/65658
+   * gcc.dg/pr65658.c: New test.
+
 2015-04-20  Alan Lawrence  
 
PR target/64134
diff --git a/gcc/testsuite/gcc.dg/pr65658.c b/gcc/testsuite/gcc.dg/pr65658.c
new file mode 100644
index 0000000..cce0f2a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr65658.c
@@ -0,0 +1,111 @@
+/* { dg-do compile } */
+/* { dg-options "-Wuninitialized -O2 -Wno-implicit" } */
+
+extern int optind;
+struct undefinfo
+{
+  unsigned long l1;
+  unsigned long l2;
+};
+struct undeffoo
+{
+  char a[64];
+  long b[4];
+  int c[33];
+};
+struct problem
+{
+  unsigned long l1;
+  unsigned long l2;
+  unsigned long l3;
+  unsigned long l4;
+};
+static unsigned int undef1, undef2, undef3, undef4, undef5, undef6;
+static void *undefvp1;
+extern struct undefinfo undefinfo;
+static int
+undefinit1 (void)
+{
+  struct undeffoo foo;
+  int i;
+  for (i = 0; i < 2000; i++)
+{
+  undef6++;
+  external_function5 (((void *) 0), 0, (void *) &foo);
+}
+}
+
+static int
+undefinit2 (void *problemp, unsigned long problem)
+{
+  int ret, u;
+  if (undefinit1 ())
+return 1;
+  if (fn10 ())
+return 1;
+  for (u = 0; u < undef6; u++)
+{
+  ret = external_function1 (3 + u * 10, 10);
+  if (ret)
+   return ret;
+  external_function6 (0, 0, 0, problemp + problem);
+  return 1;
+}
+}
+
+static int
+fn6 (struct undefinfo *uip, struct problem *problem)
+{
+  unsigned long amt;
+  if (external_function3 (((void *) 0), ((void *) 0), &amt, 0, 0))
+return 1;
+  problem->l1 = (unsigned long) undefvp1;
+  problem->l4 = uip->l1;
+  problem->l3 = uip->l2;
+  return 0;
+}
+
+static int
+setup (void)
+{
+  struct problem problem;
+  if (fn6 (&undefinfo, &problem))
+return 1;
+  if (fn2 ())
+return 1;
+  if (fn4 (101))
+return 1;
+  if (undefinit2 ((void *) problem.l1, problem.l3 * 4))  /* { dg-bogus "problem.l3" "uninitialized variable warning" } */
+return 1;
+}
+
+int
+main (int argc, char **argv)
+{
+  int optc;
+  if (external_function (1))
+return 1;
+  if (external_function (1))
+return 1;
+  if (external_function (1))
+return 1;
+  while ((optc =
+ getopt_long (argc, argv, ((void *) 0), ((void *) 0),
+  ((void *) 0))) != -1)
+{
+  switch (optc)
+   {
+   case 0:
+ break;
+   case 'F':
+ external_function (1);
+   default:
+ return 1;
+   }
+}
+  if ((optind != 99))
+{
+  return 1;
+}
+  setup ();
+}
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 709b16e..9f263bd 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -1449,7 +1449,9 @@ redirection_block_p (basic_block bb)
   while (!gsi_end_p (gsi)
 && (gimple_code (gsi_stmt (gsi)) == GIMPLE_LABEL
 || is_gimple_debug (gsi_stmt (gsi))
-|| gimple_nop_p (gsi_stmt (gsi))))
+|| gimple_nop_p (gsi_stmt (gsi))
+|| (gimple_code (gsi_stmt (gsi)) == GIMPLE_ASSIGN
+&& gimple_clobber_p (gsi_stmt (gsi)))))
 gsi_next (&gsi);
 
   /* Check if this is an empty block.  */


Re: Handle redirection blocks with clobbers

2015-04-20 Thread Jakub Jelinek
On Mon, Apr 20, 2015 at 11:14:03AM -0600, Jeff Law wrote:
>while (!gsi_end_p (gsi)
>&& (gimple_code (gsi_stmt (gsi)) == GIMPLE_LABEL
>|| is_gimple_debug (gsi_stmt (gsi))
> -  || gimple_nop_p (gsi_stmt (gsi))))
> +  || gimple_nop_p (gsi_stmt (gsi))
> +  || (gimple_code (gsi_stmt (gsi)) == GIMPLE_ASSIGN

Why this line?  There is no need to test for GIMPLE_ASSIGN
(and canonical test for that would be is_gimple_assign (gsi_stmt (gsi))
anyway),

> +  && gimple_clobber_p (gsi_stmt (gsi)))))

gimple_clobber_p checks for that too.

Jakub
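Jakub's simplification can be seen in a model of the skip loop: because the clobber predicate already implies an assignment, a separate GIMPLE_ASSIGN test adds nothing. The statement kinds and functions below are hypothetical stand-ins for GIMPLE, not the real API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GIMPLE statement kinds.  */
enum stmt_kind { STMT_LABEL, STMT_DEBUG, STMT_NOP, STMT_ASSIGN, STMT_COND };

struct stmt
{
  enum stmt_kind kind;
  bool clobber;   /* Only meaningful for STMT_ASSIGN ("x = {CLOBBER}").  */
};

/* Like gimple_clobber_p: already checks that S is an assignment.  */
static bool
clobber_p (const struct stmt *s)
{
  return s->kind == STMT_ASSIGN && s->clobber;
}

/* Index of the first statement that is not a label, debug statement,
   nop or clobber; N if the block holds nothing else.  */
size_t
first_real_stmt (const struct stmt *stmts, size_t n)
{
  size_t i = 0;
  while (i < n
         && (stmts[i].kind == STMT_LABEL
             || stmts[i].kind == STMT_DEBUG
             || stmts[i].kind == STMT_NOP
             || clobber_p (&stmts[i])))
    i++;
  return i;
}
```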


Re: [PATCH][expmed] Calculate mult-by-const cost properly in mult_by_coeff_cost

2015-04-20 Thread Jeff Law

On 04/20/2015 03:27 AM, Kyrill Tkachov wrote:


On 15/04/15 16:41, Jeff Law wrote:

On 04/14/2015 02:07 AM, Kyrill Tkachov wrote:

Hi Jeff,

Thanks for looking at this.

On 13/04/15 19:18, Jeff Law wrote:

On 03/16/2015 04:12 AM, Kyrill Tkachov wrote:

Hi all,

Eyeballing the mult_by_coeff_cost function I think it has a
typo/bug. It's supposed to return the cost of multiplying by
a constant 'coeff'. It calculates that by taking the cost of
a MULT rtx by that constant and comparing it to the cost of
synthesizing that multiplication, and returning the cheapest.
However, in the MULT rtx cost calculations it creates a MULT
rtx of two REGs rather than the a REG and the GEN_INT of
coeff as I would expect. This patches fixes that in the
obvious way.

Tested aarch64-none-elf and bootstrapped on
x86_64-linux-gnu. I'm guessing this is stage 1 material at
this point?

Thanks, Kyrill

2015-03-13  Kyrylo Tkachov  

* expmed.c (mult_by_coeff_cost): Pass CONT_INT rtx to MULT
cost calculation rather than fake_reg.

I'm pretty sure this patch is wrong.

The call you're referring to is computing an upper limit to the
cost for use by choose_mult_variant.  Once a synthesized
multiply sequence exceeds the cost of reg*reg, then that
synthesized sequence can be thrown away because it's not
profitable.

But shouldn't the limit be the mult-by-constant cost?

No, because ultimately we're trying to do better than just loading
the constant into a register and doing a reg * reg.  So the reg*reg
case is the upper bound for allowed cost of a synthesized
sequence.


So I've thought about it a bit more and I have another concern. The
function returns this: if (choose_mult_variant (mode, coeff,
&algorithm, &variant, max_cost)) return algorithm.cost.cost; else
return max_cost;

If I read this right, it tries to synthesise the mult at
choose_mult_variant with the limit cost of the reg-by-reg mult, but
if the synthesis cost exceeds that, then it returns the reg-by-reg
mult cost (in return max_cost;) so that can't be right, can it?
In the case where the target doesn't have mult imm,reg, then reg*reg 
would be the right estimated cost if there's no cheap synthesis.   It 
doesn't look like we correctly handle costing on targets with mult imm,reg.
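To make the capping behaviour concrete, here is a toy model in C of the control flow quoted above. It is only a sketch: the cost numbers are hypothetical, and `toy_synth_cost` uses a naive shift-and-add estimate instead of GCC's `choose_mult_variant` search, which is far more elaborate (it also considers subtractions and other forms). It shows that whenever synthesis is not cheaper than the reg*reg limit, the function reports the reg*reg cost, and that nothing in this flow consults a mult-by-immediate cost.

```c
#include <stdbool.h>

/* Count set bits by clearing the lowest set bit on each iteration.  */
static int toy_popcount (unsigned long v)
{
  int n = 0;
  for (; v; v &= v - 1)
    n++;
  return n;
}

/* Hypothetical, naive shift-and-add estimate for x * COEFF: one shift for
   every set bit above bit 0 and one add to combine each extra term.  */
static int toy_synth_cost (unsigned long coeff, int add_cost, int shift_cost)
{
  int bits = toy_popcount (coeff);
  int shifts = toy_popcount (coeff & ~1UL);  /* bit 0 needs no shift */
  int adds = bits > 0 ? bits - 1 : 0;
  return adds * add_cost + shifts * shift_cost;
}

/* Mirrors the quoted logic: use the synthesized sequence only if it is
   cheaper than MAX_COST (the reg*reg multiply), otherwise report MAX_COST.  */
static int toy_mult_by_coeff_cost (unsigned long coeff, int add_cost,
                                   int shift_cost, int max_cost)
{
  int synth = toy_synth_cost (coeff, add_cost, shift_cost);
  if (synth < max_cost)
    return synth;
  return max_cost;
}
```

With unit add and shift costs and a reg*reg cost of 4, multiplying by 10 costs 3 (two shifts and an add), while multiplying by 0xFF falls back to 4 even if the target could multiply by an immediate more cheaply.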


jeff


Re: [Patch, fortran] PR65792 unitialized structure constructor array subcomponent

2015-04-20 Thread Dominique d'Humières
I have retested a clean tree with only the patches for pr 65792  [first patch] 
and Andre’s one for pr59678:  i.e., without any patch from pr61831, and I still 
see the conflict between the two patches.

Dominique

> On 19 Apr 2015, at 10:39, Dominique d'Humières wrote:
> 
>> Snip
>> Both patches have been regression tested on trunk on x86_64-linux.
>> 
>> OK for trunk [first patch]?
>> OK for 4.9 and 5 (after the 5.1 release) [second patch]?
>> 
>> Mikael
>> 
>> PS: Dominique reported that the variant of this patch posted on the PR was
>> also fixing PR49324. I couldn't confirm as what seems to be the
>> remaining testcase there (comment #6) doesn't fail with trunk here.
> 
> I have tested both patches on my working tree and on a clean one, but only on 
> top of the [better patch] for pr61831, without the hunk
> 
> @@ -4990,7 +5010,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol *
> 
> tmp = gfc_deallocate_alloc_comp (e->ts.u.derived, tmp, parm_rank);
> 
> -   gfc_add_expr_to_block (&se->post, tmp);
> +   gfc_prepend_expr_to_block (&se->post, tmp);
> }
> 
>   /* Add argument checking of passing an unallocated/NULL actual to
> 
> 
> as said in pr61831 comment 45 (the above hunk causes a regression for 
> gfortran.dg/alloc_comp_assign_10.f90).
> 
> AFAICT this is the [better patch] which fixes  PR49324.
> 
> Now Andre Vehreschild has submitted a patch for pr59678 at 
> https://gcc.gnu.org/ml/fortran/2015-04/msg00061.html. Andre's patch works 
> well with the [second patch]+[better patch], but leads to a regression for 
> gfortran.dg/class_19.f03 (pr65792 comment 3) with the [first patch]+[better 
> patch]. So if the [first patch] is chosen, it will require some change(s) in 
> Andre’s patch.
> 
> Thanks for working on these issues,
> 
> Dominique
> 



Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall

2015-04-20 Thread Jeff Law

On 04/20/2015 02:25 AM, Kyrill Tkachov wrote:

Hi Jeff,

Hmmm, so what happens if the difference is < 0?   I'd be a bit worried
about that case for the PA (for example).

So how about asserting that the INTVAL is >= 0 prior to returning so
that we catch that case if it ever occurs?


INTVAL being >= 0 is the case that I want to catch with this function.
INTVAL <0 is the usual case on leaf call optimisation. On arm, at least,
it means that x and y use the same base register (i.e. same stack frame)
but the offsets are such that reading SIZE bytes from X will not overlap
with Y, thus not requiring the workaround in this patch.
Thus, asserting that the result is positive is not right here.
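The overlap condition being discussed can be written as a half-open interval test. A minimal sketch, assuming both addresses are constant byte offsets from the same base register and that both regions are SIZE bytes long; `toy_regions_overlap` is a hypothetical name, not the function in the patch.

```c
#include <stdbool.h>

/* Return true if [X_OFF, X_OFF + SIZE) and [Y_OFF, Y_OFF + SIZE) share at
   least one byte.  Both offsets are relative to the same base register.  */
static bool toy_regions_overlap (long x_off, long y_off, long size)
{
  long diff = x_off - y_off;   /* the constant difference of the addresses */
  return size > 0 && diff < size && -diff < size;
}
```

This symmetric form does not depend on the sign of the difference, at the cost of being more conservative than a one-sided check; whether a one-sided check is safe is exactly what depends on the stack and argument growth direction discussed in this thread.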

What characteristic on pa makes this problematic? Is it the
STACK_GROWS_UPWARD?
Yea or more correctly that {STACK,FRAME}_GROWS_UPWARD and 
ARGS_GROW_DOWNWARD.  I think the stormy16 may have downward growing args 
too.




Should I then extend this function to do something like:

HOST_WIDE_INT res = INTVAL (sub);
#ifndef STACK_GROWS_DOWNWARD
res = -res;
#endif

return res?
It certainly feels like something is needed for targets where growth is 
in the opposite direction -- but my guess is that without a concrete 
case that triggers on those targets (just the PA in 64 bit mode and 
stormy?) we'll probably get it wrong in one way or another.  Hence my 
suggestion that we assert rather than try to handle it and silently 
generate incorrect code in the process.



Jeff


Re: [Patch] pr65779 - [5/6 Regression] undefined local symbol on powerpc

2015-04-20 Thread Jeff Law

On 04/19/2015 09:10 PM, Alan Modra wrote:

This patch removes bogus debug info left around by shrink-wrapping,
which on some powerpc targets with just the right register allocation
led to assembly errors.

Bootstrapped and regression tested powerpc64-linux and x86_64-linux.
I did see some regressions, but completely unrelated to this patch.
See pr65810 for the powerpc64 regressions.  x86_64-linux showed these
failures:
+FAIL: c-c++-common/ubsan/object-size-10.c   -O2  execution test
+FAIL: c-c++-common/ubsan/object-size-10.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
+FAIL: c-c++-common/ubsan/object-size-10.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
+FAIL: gfortran.dg/class_allocate_18.f90   -O0  execution test
+FAIL: gfortran.dg/class_allocate_18.f90   -O1  execution test
  FAIL: gfortran.dg/class_allocate_18.f90   -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
-FAIL: gfortran.dg/class_allocate_18.f90   -Os  execution test
+FAIL: gfortran.dg/class_allocate_18.f90   -O3 -g  execution test

with the log for the ubsan fails
/src/gcc-5/gcc/testsuite/c-c++-common/ubsan/object-size-10.c:19:11: runtime error: index 128 out of bounds for type 'char [128]'
/src/gcc-5/gcc/testsuite/c-c++-common/ubsan/object-size-10.c:19:11: runtime error: load of address 0x0804a000 with insufficient space for an object of type 'char'
0x0804a000: note: pointer points here

I assume I was thrashing my ubuntu 14.10 x86_64 box too hard and just
ran out of memory.  Running the test by hand resulted in the expected
output.

The class_allocate_18.f90 failures are intermittent, and also occur
occasionally when running the testcase by hand.  :-(
Yea, I think folks are still trying to sort out what's happening with 
class_allocate_18.f90.




gcc/
PR debug/65779
* shrink-wrap.c (insn_uses_reg): New function.
(move_insn_for_shrink_wrap): Remove debug insns using regs set
by the moved insn.
gcc/testsuite/
* gcc.dg/pr65779.c: New.

Index: gcc/shrink-wrap.c
===
--- gcc/shrink-wrap.c   (revision 222160)
+++ gcc/shrink-wrap.c   (working copy)
@@ -182,6 +182,21 @@ live_edge_for_reg (basic_block bb, int regno, int
return live_edge;
  }

+static bool
+insn_uses_reg (rtx_insn *insn, unsigned int regno, unsigned int end_regno)
+{
+  df_ref use;
+
+  FOR_EACH_INSN_USE (use, insn)
+    {
+      rtx reg = DF_REF_REG (use);
+
+      if (REG_P (reg) && REGNO (reg) >= regno && REGNO (reg) < end_regno)
+        return true;
+    }
+  return false;
+}

Need a comment for this function.
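The check insn_uses_reg performs is a half-open range test on register numbers, which is what its comment needs to say. A standalone sketch follows, using a hypothetical simplified insn (an array of used register numbers) in place of GCC's df_ref iteration via FOR_EACH_INSN_USE; the names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an insn: just the register numbers it uses.  */
struct toy_insn
{
  const unsigned int *uses;
  size_t n_uses;
};

/* Return true if INSN uses any hard register in the half-open range
   [REGNO, END_REGNO), i.e. any of the registers occupied by a value that
   starts at REGNO and may span several hard registers.  */
static bool toy_insn_uses_reg (const struct toy_insn *insn,
                               unsigned int regno, unsigned int end_regno)
{
  for (size_t i = 0; i < insn->n_uses; i++)
    if (insn->uses[i] >= regno && insn->uses[i] < end_regno)
      return true;
  return false;
}
```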

So just one question.  Why handle the split case differently?  In the 
split case you effectively move the debug insn to the new block.  In the 
!split case, you just delete the debug insn.


I'm sure there's a reason, it would be worth noting it as a comment in 
this code.


OK with the comments added.

jeff




Re: [PATCH] 65479 - sanitizer stack trace missing frames past #0 on powerpc64

2015-04-20 Thread Jeff Law

On 04/19/2015 07:48 PM, Martin Sebor wrote:

The attached patch resolves the failures in a number of address
sanitizer tests on powerpc64*-*-*-* discussed in bug 65479 (the
failures in c-c++-common/asan/swapcontext-test-1.c reported in
pr65643 remain unresolved).

The patch has been tested on powerpc64*-*-*-* and x86_64 with
no regressions.

Is this okay for trunk? For 5.1?

Martin

gcc-65479.patch


diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index b4052ef..18eede3 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,12 @@
+2015-04-19  Martin Sebor
+
+   PR sanitizer/65479
+   * gcc/testsuite/c-c++-common/asan/misalign-1.c [powerpc*-*-*-*]:
+   Use -fno-omit-frame-pointer.  Adjust line numbers and expect exact
+   matches.
+   * gcc/testsuite/c-c++-common/asan/misalign-2.c: Ditto.
+   * gcc/testsuite/c-c++-common/asan/null-deref-1.c: Ditto.
So the ChangeLog doesn't match the patch.  The changelog references 
"-fno-omit-frame-pointer", but in the patch you actually add 
"-fasynchronous-unwind-tables".


I also wonder if other targets need -fasynchronous-unwind-tables and 
whether or not we should just add it unconditionally.






diff --git a/libbacktrace/ChangeLog b/libbacktrace/ChangeLog
index e385d8f..9348321 100644
--- a/libbacktrace/ChangeLog
+++ b/libbacktrace/ChangeLog
@@ -1,3 +1,11 @@
+2015-04-19  Martin Sebor
+
+   PR sanitizer/65479
+   * libbacktrace/dwarf.c (struct line): Add idx data member.
+   (line_compare): Use struct line idx member.
+   (add_line): Set ln->idx.
+   (read_line_info): Clear ln->idx.

Seems OK.
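The libbacktrace entry adds an `idx` member that `line_compare` uses. A common reason for such a member is to make qsort, which is not guaranteed to be stable, produce a deterministic order for entries with equal PCs. The sketch below assumes that is the intent here; the struct and comparator are simplified, hypothetical versions and not the code in the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical line-table entry.  */
struct toy_line
{
  uintptr_t pc;
  int lineno;
  int idx;   /* insertion order, used only to break ties */
};

/* qsort comparator: order by PC, then by insertion order, so entries with
   the same PC keep the order in which they were added.  */
static int toy_line_compare (const void *a, const void *b)
{
  const struct toy_line *la = a, *lb = b;
  if (la->pc != lb->pc)
    return la->pc < lb->pc ? -1 : 1;
  return (la->idx > lb->idx) - (la->idx < lb->idx);
}
```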



diff --git a/libsanitizer/ChangeLog b/libsanitizer/ChangeLog
index 6f44dcf..7b82378 100644
--- a/libsanitizer/ChangeLog
+++ b/libsanitizer/ChangeLog
@@ -1,3 +1,15 @@
+2015-04-19  Martin Sebor
+
+   PR sanitizer/65479
+   * libsanitizer/sanitizer_common/sanitizer_stacktrace.h
+   (StackTrace::signaled, StackTrace::min_insn_bytes): New data members.
+   (StackTrace::StackTrace): Initialize signaled.
+   * libsanitizer/sanitizer_common/sanitizer_stacktrace.cc
+   (StackTrace::GetPreviousInstructionPc): Rewrite.
+   * libsanitizer/sanitizer_common/sanitizer_stacktrace_libcdep.cc
+   (StackTrace::Print): Use min_insn_bytes to adjust PC value.
+   (BufferedStackTrace::Unwind): Set signaled.
Is libsanitizer maintained in LLVM?  If so, we want to minimize 
divergence, so it may be better to get this approved in LLVM then pick 
it up via a merge.
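The libsanitizer entries revolve around turning a return address into a PC inside the call instruction before symbolizing it. The sketch below is a hypothetical illustration of that idea, not the code in the patch. It assumes MIN_INSN_BYTES is the minimum instruction size (for example 4 on a fixed-width target such as powerpc) and that SIGNALED marks a frame whose PC was taken from a signal context and therefore already points at the faulting instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper.  For an ordinary frame PC is a return address that
   points just past the call, so stepping back MIN_INSN_BYTES lands inside
   the call instruction and yields the caller's source line.  */
static uintptr_t toy_previous_insn_pc (uintptr_t pc, unsigned min_insn_bytes,
                                       bool signaled)
{
  if (signaled)
    return pc;    /* PC from a signal context: the faulting insn itself */
  return pc - min_insn_bytes;
}
```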




Given this hits 3 distinct pieces of code, do any of them make sense in 
isolation or do they have to land together as a unit?


Jeff



Re: [PATCH][MIPS] Enable load-load/store-store bonding

2015-04-20 Thread Mike Stump
With FUSION you might get farther.  See the arm port as I recall.

The quick overview: FUSION allows instructions that are not contiguous to be 
paired up and fused together.  It was built for load/load store/store combining.

On Apr 19, 2015, at 10:09 PM, sameera  wrote:
> Gentle reminder!
> 
> - Thanks and regards,
>  Sameera D.
> 
> On Monday 30 March 2015 04:58 PM, sameera wrote:
>> Hi!
>> 
>> Sorry for delay in sending this patch for review.
>> Please find attached updated patch.
>> 
>> In P5600, 2 consecutive loads/stores of same type which access contiguous 
>> memory locations are bonded together by instruction issue unit to dispatch
>> single load/store instruction which accesses both locations. This allows 2X 
>> improvement in memory intensive code. This optimization can be performed
>> for LH, SH, LW, SW, LWC, SWC, LDC, SDC instructions.
>> 
>> This patch adds peephole2 patterns to identify such loads/stores, and put 
>> them in parallel, so that the scheduler will not split it - thereby
>> guaranteeing h/w level load/store bonding.
>> 
>> The patch is tested with dejagnu for correctness, and tested on hardware for 
>> performance.
>> Ok for trunk?
>> 
>> Changelog:
>> gcc/
>> * config/mips/mips.md (JOIN_MODE): New mode iterator.
>> (join2_load_Store): New pattern.
>> (join2_loadhi): Likewise.
>> (define_peephole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
>> load-load and store-stores.
>> * config/mips/mips.opt (mload-store-pairs): New option.
>> (TARGET_LOAD_STORE_PAIRS): New macro.
>> * config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
>> * config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
>> * config/mips/mips.c (mips_load_store_bonding_p): New function.
>> 
>> - Thanks and regards,
>>   Sameera D.
>> 
>> On Tuesday 24 June 2014 04:12 PM, Sameera Deshpande wrote:
>>> Hi Richard,
>>> 
>>> Thanks for the review.
>>> Please find attached updated patch after your review comments.
>>> 
>>> Changelog:
>>> gcc/
>>>* config/mips/mips.md (JOIN_MODE): New mode iterator.
>>>(join2_load_Store): New pattern.
>>>(join2_loadhi): Likewise.
>>>(define_peephole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
>>>load-load and store-stores.
>>>* config/mips/mips.opt (mload-store-pairs): New option.
>>>(TARGET_LOAD_STORE_PAIRS): New macro.
>>>* config/mips/mips.h (ENABLE_P5600_LD_ST_PAIRS): Likewise.
>>>* config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
>>>* config/mips/mips.c (mips_load_store_bonding_p): New function.
>>> 
>>> The change is tested with dejagnu with additional options 
>>> -mload-store-pairs and -mtune=p5600.
>>> The perf measurement is yet to finish.
>>> 
> We had offline discussion based on your comment. There is additional
> view on the same.
> Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining
> ISAs do not support P5600.
> For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is
> implemented separately, and hence, as you suggested, P5600 Ld-ST
> bonding optimization should not be enabled for them.
> So, is it fine if I emit error for any ISAs other than mips32r2,
> mips32r3 and mips32r5 when P5600 is enabled, or the compilation should
> continue by emitting warning and disabling P5600?
 
 No, the point is that we have two separate concepts: ISA and optimisation
 target.  -mipsN and -march=N control the ISA (which instructions are
 available) and -mtune=M controls optimisation decisions within the
 constraints of that N, such as scheduling and the cost of things like
 multiplication and division.
 
 E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS II-
 compatible code, optimise it for p5600, but make sure that 24k workarounds
 are used.  The code would run correctly on any MIPS II-compatible processor
 without known errata and also on the 24k.
>>> Ok, disabled the peephole pattern for fix-24k and micromips - to allow 
>>> specific patterns to be matched.
>>> 
> +
> +mld-st-pairing
> +Target Report Var(TARGET_ENABLE_LD_ST_PAIRING) Enable load/store
> +pairing
 
 Other options are just "TARGET_" + the capitalised form of the option name,
 so I'd prefer TARGET_LD_ST_PAIRING instead.  Although "ld" might be
 misleading since it's an abbreviation for "load" rather than the LD 
 instruction.
 Maybe -mload-store-pairs, since plurals are more common than "-ing"?
 Not sure that's a great suggestion though.
>>> Renamed the option and corresponding macro as suggested.
>>> 
> Performance testing for this patch is not yet done.
> If the patch proves beneficial in most of the testcases (which we
> believe will do on P5600) we will enable this optimization by default
> for P5600 - in which case this option can be removed.
 
 OK.  Sending the patch for comments before performance testing
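The bonding condition described in the P5600 patch above (two loads or two stores of the same type accessing contiguous memory) can be sketched as a predicate over a simplified description of each access. Everything here is hypothetical: the struct does not correspond to GCC RTL, and the real mips_load_store_bonding_p may check more or different conditions. The check that the first load does not overwrite the shared base register is an added assumption, since otherwise the second access would see a clobbered address.

```c
#include <stdbool.h>

/* Hypothetical, simplified description of one memory access.  */
struct toy_mem_access
{
  bool is_load;
  int reg;        /* destination (load) or source (store) register */
  int base_reg;   /* address base register */
  long offset;    /* constant offset from BASE_REG */
  int size;       /* access size in bytes */
};

/* Return true if A and B are the same kind of access, have the same size,
   share a base register and touch adjacent memory.  */
static bool toy_bondable_p (const struct toy_mem_access *a,
                            const struct toy_mem_access *b)
{
  if (a->is_load != b->is_load || a->size != b->size
      || a->base_reg != b->base_reg)
    return false;
  /* Assumption: a first load that overwrites the base register would change
     the address used by the second access.  */
  if (a->is_load && a->reg == a->base_reg)
    return false;
  return b->offset == a->offset + a->size
         || a->offset == b->offset + b->size;
}
```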

Re: [patch, c, ping] Fix PR c/48956: diagnostics for conversions involving complex types (reviewed)

2015-04-20 Thread Jeff Law

On 04/17/2015 11:37 AM, Mikhail Maltsev wrote:

On 04/17/2015 08:10 PM, Jeff Law wrote:

Have you received confirmation from the FSF that your copyright
assignment was accepted?

jeff


Yes, its ID is [gnu.org #972407]. Should I forward the PDF to you?

Can't hurt for a confirmation.


Jeff



Re: [PATCH] 65479 - sanitizer stack trace missing frames past #0 on powerpc64

2015-04-20 Thread Yury Gribov

On 04/20/2015 09:23 PM, Jeff Law wrote:

On 04/19/2015 07:48 PM, Martin Sebor wrote:

The attached patch resolves the failures in a number of address
sanitizer tests on powerpc64*-*-*-* discussed in bug 65479 (the
failures in c-c++-common/asan/swapcontext-test-1.c reported in
pr65643 remain unresolved).

The patch has been tested on powerpc64*-*-*-* and x86_64 with
no regressions.

Is this okay for trunk? For 5.1?

Martin

gcc-65479.patch


diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index b4052ef..18eede3 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,12 @@
+2015-04-19  Martin Sebor
+
+PR sanitizer/65479
+* gcc/testsuite/c-c++-common/asan/misalign-1.c [powerpc*-*-*-*]:
+Use -fno-omit-frame-pointer.  Adjust line numbers and expect exact
+matches.
+* gcc/testsuite/c-c++-common/asan/misalign-2.c: Ditto.
+* gcc/testsuite/c-c++-common/asan/null-deref-1.c: Ditto.

So the ChangeLog doesn't match the patch.  The changelog references
"-fno-omit-frame-pointer", but in the patch you actually add
"-fasynchronous-unwind-tables".

I also wonder if other targets need -fasynchronous-unwind-tables and
whether or not we should just add it unconditionally.


Perhaps enable unwind tables in GCC spec if -fsanitize=address is 
present? Sanitizer backtraces typically won't work without unwind tables 
anyway so IMHO this makes sense.


BTW why do we need asynchronous tables? Wouldn't simple -funwind-tables 
be enough?


-Y



Re: [PATCH, i386, Darwin RFT]: Remove reload_in_progress checks

2015-04-20 Thread Mike Stump
On Apr 20, 2015, at 3:16 AM, Uros Bizjak  wrote:
> On Mon, Apr 20, 2015 at 12:00 PM, Iain Sandoe  wrote:
> 
>>> After having fixed the typo, regtesting went without regression.
>> 
>> I have done a bootstrap on i686-darwin10 with the amended patch - slow 
>> machine, so testing still in progress (but looks OK so far),
>> 
>> NOTE: there are some references to reload_in_progress in config/darwin.c 
>> pic code shared between powerpc and x86 darwin implementations.  I will do a 
>> follow-up patch to make those assert if triggered on x86 (AFAIK, they still 
>> need to be present for powerpc, at present).
> 
> Probably a better way is to include "targetm.lra_p ()" into the check.

Only if you discount asking the nice rs6000/powerpc people to lra the entire 
port, and by that, I mean, remove the non-lra code.  :-)

Re: [PATCH] 65479 - sanitizer stack trace missing frames past #0 on powerpc64

2015-04-20 Thread Jeff Law

On 04/20/2015 12:38 PM, Yury Gribov wrote:

Perhaps enable unwind tables in GCC spec if -fsanitize=address is
present? Sanitizer backtraces typically won't work without unwind tables
anyway so IMHO this makes sense.

BTW why do we need asynchronous tables? Wouldn't simple -funwind-tables
be enough?
I haven't thought much about it.  I'd kind of expect with the 
instrumentation for the sanitizers that it wouldn't make much, if any 
difference.  But I've never been inside the sanitizer instrumentation :)


jeff


Re: [PATCH] 65479 - sanitizer stack trace missing frames past #0 on powerpc64

2015-04-20 Thread Jakub Jelinek
On Mon, Apr 20, 2015 at 09:38:03PM +0300, Yury Gribov wrote:
> >>--- a/gcc/testsuite/ChangeLog
> >>+++ b/gcc/testsuite/ChangeLog
> >>@@ -1,3 +1,12 @@
> >>+2015-04-19  Martin Sebor
> >>+
> >>+PR sanitizer/65479
> >>+* gcc/testsuite/c-c++-common/asan/misalign-1.c [powerpc*-*-*-*]:
> >>+Use -fno-omit-frame-pointer.  Adjust line numbers and expect exact
> >>+matches.
> >>+* gcc/testsuite/c-c++-common/asan/misalign-2.c: Ditto.
> >>+* gcc/testsuite/c-c++-common/asan/null-deref-1.c: Ditto.
> >So the ChangeLog doesn't match the patch.  The changelog references
> >"-fno-omit-frame-pointer", but in the patch you actually add
> >"-fasynchronous-unwind-tables".
> >
> >I also wonder if other targets need -fasynchronous-unwind-tables and
> >whether or not we should just add it unconditionally.

PowerPC really should use the "fast" unwinding unconditionally, as it always
works there reliably due to the ABI requirements.
So IMHO we shouldn't change the tests this way.

> Perhaps enable unwind tables in GCC spec if -fsanitize=address is present?

No.  That is orthogonal to that, most targets enable them by default anyway
and if somebody for some reason asks for something different, we should
honor that.

> Sanitizer backtraces typically won't work without unwind tables anyway so
> IMHO this makes sense.
> 
> BTW why do we need asynchronous tables? Wouldn't simple -funwind-tables be
> enough?

-funwind-tables enables them only for functions that can throw, while you
really want it for all functions.

Jakub


[PATCH 0/13] Add musl support to GCC

2015-04-20 Thread Szabolcs Nagy
This patch set adds musl libc[0] support to GCC.

The patches were originally developed by Gregor Richards[1][2],
who I believe has already done the FSF copyright assignment and agrees
with the changes I made (please verify).  I only did minor cleanups
to make the patches better suited for upstream.

Issues I don't have patches for:

* On powerpc it seems the only configure option to choose the default
long-double abi is --with-long-double-128, but that's the default with
sufficiently new glibc. (musl gets 64bit long-double because the glibc
version check fails, this is ok, because there is no ibm128 support in
musl, but it would be better if --with-long-double-64 could be set
explicitly or even ieee128 abi if gcc supports that).

* libcilkrts uses pthread_yield on linux instead of sched_yield by
default, but musl does not have the non-standard pthread_yield symbol.
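For the libcilkrts item, the standard alternative mentioned above is POSIX sched_yield, declared in sched.h, which musl does provide. A minimal sketch of such a wrapper; `toy_yield` is a hypothetical name and this is not the libcilkrts code.

```c
#include <sched.h>

/* Give up the processor using the POSIX call rather than the non-standard
   pthread_yield, which musl does not provide.  Returns 0 on success.  */
static int toy_yield (void)
{
  return sched_yield ();
}
```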

Tests:

I only tested the C and C++ languages.  GCJ needs further patching,
I haven't tested gfortran nor Ada.  libsanitizer has to be disabled
for musl.  I built cross compilers for various targets musl supports
and ran x86_64-linux-musl cross compiler (musl target glibc host)
and native musl make check.  There were several errors, some of the
causes:

* libcilkrts.so has undefined ref to pthread_yield
* ifunc is not supported by musl
* ifunc is used in x86 libatomic.so for __atomic_store_16
* gcc stdatomic.h has some incompatible typedefs with musl stdint.h
* musl does not support unwind across libc calls or signal handlers
* splitstacks does not work (undef refs to __morestack* in libgcc)
* musl does not yet have __memcpy_chk etc.
* -fshort-wchar is not supported
* I've seen x86-specific AVX failures
* atexit during exit fails with musl (some c++ destructor tests)

This fixes BZ 58446 and 55807 which requested musl support.

[0] http://www.musl-libc.org/
[1] https://github.com/GregorR/musl-gcc-patches
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58446

--
Changes:
 fixincludes/mkfixinc.sh |3 +-
 gcc/config.gcc  |9 ++-
 gcc/config/aarch64/aarch64-linux.h  |8 ++
 gcc/config/arm/linux-eabi.h |   17 +
 gcc/config/i386/linux.h |1 +
 gcc/config/i386/linux64.h   |4 +
 gcc/config/i386/pmm_malloc.h|9 ++-
 gcc/config/linux.h  |  105 +++
 gcc/config/linux.opt|4 +
 gcc/config/microblaze/linux.h   |   16 +++-
 gcc/config/mips/linux.h |7 ++
 gcc/config/rs6000/linux64.h |   12 ++-
 gcc/config/rs6000/secureplt.h   |1 +
 gcc/config/rs6000/sysv4.h   |   16 +++-
 gcc/config/sh/linux.h   |7 ++
 gcc/configure   |7 ++
 gcc/configure.ac|7 ++
 libgcc/unwind-dw2-fde-dip.c |6 ++
 libgfortran/acinclude.m4|2 +-
 libgfortran/configure   |2 +-
 libitm/config/arm/hwcap.cc  |2 +-
 libitm/config/linux/x86/tls.h   |8 +-
 libstdc++-v3/config/os/generic/os_defines.h |5 ++
 libstdc++-v3/configure.host |3 +
 24 files changed, 227 insertions(+), 34 deletions(-)



Re: [committed] Fix GC ICE due to dwarf2out bug (PR debug/65807)

2015-04-20 Thread Mike Stump
On Apr 20, 2015, at 6:34 AM, Jakub Jelinek  wrote:
> add_AT_wide is the only add_AT_* that doesn't clear or otherwise initialize
> dw_attr_val.val_entry field, so it contains random garbage, which isn't
> desirable when ggc walks it during collections.
> 
> Supposedly this omission originates from the val_entry addition being added
> everywhere only after wide-int branch grabbed some add_AT_* routine from
> dwarf2out.c as example for the add_AT_wide addition.

I can indeed confirm this is what happened.

> Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
> and 5.1 as obvious.

The fix looks good to me.
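The bug class here is a struct field left uninitialized by one constructor-like helper and later walked by the garbage collector. A minimal sketch of a defensive pattern, using a C99 compound literal whose unnamed members are zero-initialized; the struct and function are hypothetical simplifications, not dwarf2out's actual types or the committed fix.

```c
#include <stddef.h>

/* Hypothetical, simplified attribute value.  */
struct toy_attr_val
{
  int val_class;
  void *val_entry;   /* pointer a GC walker would follow */
  long val_int;
};

/* Members not named in the compound literal (here val_entry) are
   zero-initialized, so a GC walk never sees a garbage pointer.  */
static struct toy_attr_val toy_make_wide_attr (long value)
{
  return (struct toy_attr_val) { .val_class = 1, .val_int = value };
}
```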


[PATCH 1/13] libitm fixes for musl support

2015-04-20 Thread Szabolcs Nagy
These are minor correctness fixes required for musl.

(fcntl.h is the standard header and always available on Linux,
sys/fcntl.h is just a legacy alias, so use the standard one.)

libitm/Changelog:

2015-04-16  Gregor Richards  

* config/arm/hwcap.cc: Use fcntl.h instead of sys/fcntl.h.
* config/linux/x86/tls.h: Only use __GLIBC_PREREQ if defined.
diff --git a/libitm/config/arm/hwcap.cc b/libitm/config/arm/hwcap.cc
index a1c2cfd..ea8f023 100644
--- a/libitm/config/arm/hwcap.cc
+++ b/libitm/config/arm/hwcap.cc
@@ -40,7 +40,7 @@ int GTM_hwcap HIDDEN = 0
 
 #ifdef __linux__
 #include 
-#include 
+#include 
 #include 
 
 static void __attribute__((constructor))
diff --git a/libitm/config/linux/x86/tls.h b/libitm/config/linux/x86/tls.h
index e731ab7..54ad8b6 100644
--- a/libitm/config/linux/x86/tls.h
+++ b/libitm/config/linux/x86/tls.h
@@ -25,16 +25,19 @@
 #ifndef LIBITM_X86_TLS_H
 #define LIBITM_X86_TLS_H 1
 
-#if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 10)
+#if defined(__GLIBC_PREREQ)
+#if __GLIBC_PREREQ(2, 10)
 /* Use slots in the TCB head rather than __thread lookups.
GLIBC has reserved words 10 through 13 for TM.  */
 #define HAVE_ARCH_GTM_THREAD 1
 #define HAVE_ARCH_GTM_THREAD_DISP 1
 #endif
+#endif
 
 #include "config/generic/tls.h"
 
-#if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 10)
+#if defined(__GLIBC_PREREQ)
+#if __GLIBC_PREREQ(2, 10)
 namespace GTM HIDDEN {
 
 #ifdef __x86_64__
@@ -101,5 +104,6 @@ static inline void set_abi_disp(struct abi_dispatch *x)
 
 } // namespace GTM
 #endif /* >= GLIBC 2.10 */
+#endif
 
 #endif // LIBITM_X86_TLS_H
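The tls.h hunk splits `#if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 10)` into two nested conditionals because the preprocessor must still parse the whole `#if` expression. When __GLIBC_PREREQ is not defined (as with musl), the identifier is replaced by 0 and `0(2, 10)` is a syntax error, even though `&&` would never evaluate it. A minimal sketch of the pattern; `TOY_HAVE_GLIBC_2_10` is a hypothetical macro name.

```c
/* stdio.h pulls in glibc's features.h, which defines __GLIBC_PREREQ;
   with other C libraries such as musl the macro is simply absent.  */
#include <stdio.h>

#if defined(__GLIBC_PREREQ)
#if __GLIBC_PREREQ(2, 10)
#define TOY_HAVE_GLIBC_2_10 1
#endif
#endif

#ifndef TOY_HAVE_GLIBC_2_10
#define TOY_HAVE_GLIBC_2_10 0   /* non-glibc libc or glibc older than 2.10 */
#endif

static int toy_have_glibc_2_10 (void)
{
  return TOY_HAVE_GLIBC_2_10;
}
```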


Re: [PATCH] 65479 - sanitizer stack trace missing frames past #0 on powerpc64

2015-04-20 Thread Yury Gribov

On 04/20/2015 09:43 PM, Jakub Jelinek wrote:

On Mon, Apr 20, 2015 at 09:38:03PM +0300, Yury Gribov wrote:

--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,12 @@
+2015-04-19  Martin Sebor
+
+PR sanitizer/65479
+* gcc/testsuite/c-c++-common/asan/misalign-1.c [powerpc*-*-*-*]:
+Use -fno-omit-frame-pointer.  Adjust line numbers and expect exact
+matches.
+* gcc/testsuite/c-c++-common/asan/misalign-2.c: Ditto.
+* gcc/testsuite/c-c++-common/asan/null-deref-1.c: Ditto.

So the ChangeLog doesn't match the patch.  The changelog references
"-fno-omit-frame-pointer", but in the patch you actually add
"-fasynchronous-unwind-tables".

I also wonder if other targets need -fasynchronous-unwind-tables and
whether or not we should just add it unconditionally.


PowerPC really should use the "fast" unwinding unconditionally, as it always
works there reliably due to the ABI requirements.
So IMHO we shouldn't change the tests this way.


Agreed, I think Martin just wanted a temp workaround until he gets to 
fix PowerPC unwinder in LLVM.



Perhaps enable unwind tables in GCC spec if -fsanitize=address is present?


No.  That is orthogonal to that, most targets enable them by default anyway
and if somebody for some reason asks for something different, we should
honor that.


Sanitizer backtraces typically won't work without unwind tables anyway so
IMHO this makes sense.

BTW why do we need asynchronous tables? Wouldn't simple -funwind-tables be
enough?


-funwind-tables enables them only for functions that can throw, while you
really want it for all functions.


Right but asynchronous tables also enable them for all instructions 
which is quite an overkill.


-Y



Re: [PATCH][combine] Do not call rtx costs on potentially unrecognisable rtxes in combine

2015-04-20 Thread Jeff Law

On 04/20/2015 08:04 AM, Kyrill Tkachov wrote:

Hi all,

I'm trying to reduce the cases where the midend calls the backend rtx
costs on bogus rtl for which the backend has no patterns or handling.
Having to handle these kinds of rtxes sanely bloats those functions and
makes them harder to maintain.

One of the cases where this occurs is in combine and
distribute_and_simplify_rtx in particular.
Citing the comment at that function:
" See if X is of the form (* (+ A B) C), and if so convert to
(+ (* A C) (* B C)) and try to simplify.
  Most of the time, this results in no change.  However, if some of
the operands are the same or inverses of each other, simplifications
will result."

The problem is that after it applies the distributive law it calls rtx
costs to figure out whether the rtx became simpler. This rtx can get
pretty complex. For example, on arm I've seen it try to cost:
(plus:SI (mult:SI (plus:SI (reg:SI 232 [ m1 ])
 (const_int 1 [0x1]))
 (reg:SI 232 [ m1 ]))
 (plus:SI (reg:SI 232 [ m1 ])
 (const_int 1 [0x1])))

which is never going to match anything on arm anyway, so why should the
costs function handle it?
In any case, I believe combine's design is such that it should first be
attempting to call recog and split on the rtxes, and only if that
succeeds should it be making a target-specific decision on which rtx to
prefer. distribute_and_simplify_rtx goes against that by calling rtx
costs on an unverified rtx in an attempt to gauge its complexity.

This patch remedies that by removing the call to rtx costs and instead
manually performing a relatively simple check on whether the resultant
rtx was simplified. That is, using the example from the comment, whether
(+ (* A C) (* B C)) still has + at the top and * in the two operands.
This should give a good indication of whether any meaningful
simplification was made (the '+' and '*' operators in the example can
be any operators that can be distributed over).
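The structural check described above can be sketched on a toy expression tree: after distributing, if the result still has the top operator at the root and the distributed operator in both operands, nothing simplified. The tree type and names are hypothetical and much simpler than RTL; the patch itself inspects rtx codes.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_code { TOY_REG, TOY_CONST, TOY_PLUS, TOY_MULT };

/* Hypothetical expression node; leaves have NULL operands.  */
struct toy_expr
{
  enum toy_code code;
  const struct toy_expr *op0, *op1;
};

/* RES is the result of rewriting (MULT (PLUS A B) C) as
   (PLUS (MULT A C) (MULT B C)), with TOP = PLUS and OPERAND = MULT in that
   example.  Return true if RES no longer has that unsimplified shape.  */
static bool toy_distribution_simplified_p (const struct toy_expr *res,
                                           enum toy_code top,
                                           enum toy_code operand)
{
  if (res->code != top || res->op0 == NULL || res->op1 == NULL)
    return true;
  return !(res->op0->code == operand && res->op1->code == operand);
}
```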

Initially, I wanted to just return the distributed version and let
recog reject the invalid rtxes, but that caused some code quality
regressions on arm where the original rtx would not recog but would
match a beneficial splitter, whereas the distributed rtx would not.

With this patch I saw almost no codegen differences on arm for the
whole of SPEC2006. The one exception was 416.gamess, where it managed
to merge a mul and an add into an mla, which resulted in a slightly
better code sequence. That was in a pretty large file and I don't speak
Fortran'ese, so I couldn't really reduce a testcase for it, but my guess
is that before the patch the costs would return some essentially random
value for the arbitrarily complex rtx it was passed, which changed the
decision in distribute_and_simplify_rtx on whether to return the
distributed rtx, which could have impacted further optimisations in
combine.

I tried it on x86_64 as well. Again, there were almost no codegen
differences. The exceptions were tonto and wrf, where a few
instructions were eliminated, but with no significant difference. The
resultant binaries for these two were a tiny bit smaller, with no
impact on runtime.

Therefore I claim that this is a safe thing to do, as it leaves the
target-specific rtx cost judgements in combine to be made only on
valid recog-ed rtxes, and does not let them cancel optimisations early
due to rtx costs not handling arbitrary rtxes well.

Bootstrapped on arm, x86_64, aarch64 (all linux). Tested on arm,aarch64.

Ok for trunk?

Thanks,
Kyrill


2015-04-20  Kyrylo Tkachov  

 * combine.c (distribute_and_simplify_rtx): Do not check rtx costs.
 Look at the rtx codes to see if a simplification occurred.

OK.

Though I do wonder if, in practice, we can identify those cases that do 
simplify more directly a priori and just punt everything else rather than 
this rather convoluted approach.


jeff



[PATCH 2/13] musl libc config

2015-04-20 Thread Szabolcs Nagy
Add musl libc support to gcc and the command line option -mmusl following other
libc support code.

Note that -m cannot be entirely correct: there are build time decisions
based on the default libc.

gcc/Changelog:

2015-04-16  Gregor Richards  

* config.gcc (LIBC_MUSL): New tm_defines macro.
* config/linux.h (OPTION_MUSL): Define.
(INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,)
(INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,)
(INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define.

* config/linux.opt (mmusl): New option.
* gcc/configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*.
(gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*.

* gcc/configure: Regenerate.
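The linux.h hunk in this patch extends CHOOSE_DYNAMIC_LINKER1 to a fourth C library. Because the macro is built purely from adjacent string literals, its result can be checked in plain C. A minimal sketch that copies the four-library macro from the patch and expands it the way the glibc-default CHOOSE_DYNAMIC_LINKER does; the linker names G, U, B and M are hypothetical placeholders for the real dynamic-linker paths.

```c
/* Copied from the patch: a GCC spec string that selects LD2, LD3 or LD4
   when the matching -muclibc, -mbionic or -mmusl style option is given and
   falls back to LD1.  LIBC1 is not referenced, as in the original.  */
#define CHOOSE_DYNAMIC_LINKER1(LIBC1, LIBC2, LIBC3, LIBC4, LD1, LD2, LD3, LD4) \
  "%{" LIBC2 ":" LD2 ";:%{" LIBC3 ":" LD3 ";:%{" LIBC4 ":" LD4 ";:" LD1 "}}}"

/* Glibc-default ordering with placeholder linker names.  */
static const char toy_spec[] =
  CHOOSE_DYNAMIC_LINKER1 ("mglibc", "muclibc", "mbionic", "mmusl",
                          "G", "U", "B", "M");
```

The resulting spec is `%{muclibc:U;:%{mbionic:B;:%{mmusl:M;:G}}}`: each alternative libc option selects its own linker, and anything else falls through to the glibc one.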
diff --git a/gcc/config.gcc b/gcc/config.gcc
index cb08a5c..8e59bb0 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -575,7 +575,7 @@ case ${target} in
 esac
 
 # Common C libraries.
-tm_defines="$tm_defines LIBC_GLIBC=1 LIBC_UCLIBC=2 LIBC_BIONIC=3"
+tm_defines="$tm_defines LIBC_GLIBC=1 LIBC_UCLIBC=2 LIBC_BIONIC=3 LIBC_MUSL=4"
 
 # 32-bit x86 processors supported by --with-arch=.  Each processor
 # MUST be separated by exactly one space.
@@ -720,6 +720,9 @@ case ${target} in
 *-*-*uclibc*)
   tm_defines="$tm_defines DEFAULT_LIBC=LIBC_UCLIBC"
   ;;
+*-*-*musl*)
+  tm_defines="$tm_defines DEFAULT_LIBC=LIBC_MUSL"
+  ;;
 *)
   tm_defines="$tm_defines DEFAULT_LIBC=LIBC_GLIBC"
   ;;
diff --git a/gcc/config/linux.h b/gcc/config/linux.h
index 857389a..cda87bf 100644
--- a/gcc/config/linux.h
+++ b/gcc/config/linux.h
@@ -32,10 +32,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define OPTION_GLIBC  (DEFAULT_LIBC == LIBC_GLIBC)
 #define OPTION_UCLIBC (DEFAULT_LIBC == LIBC_UCLIBC)
 #define OPTION_BIONIC (DEFAULT_LIBC == LIBC_BIONIC)
+#define OPTION_MUSL   (DEFAULT_LIBC == LIBC_MUSL)
 #else
 #define OPTION_GLIBC  (linux_libc == LIBC_GLIBC)
 #define OPTION_UCLIBC (linux_libc == LIBC_UCLIBC)
 #define OPTION_BIONIC (linux_libc == LIBC_BIONIC)
+#define OPTION_MUSL   (linux_libc == LIBC_MUSL)
 #endif
 
 #define GNU_USER_TARGET_OS_CPP_BUILTINS()			\
@@ -50,21 +52,25 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 } while (0)
 
 /* Determine which dynamic linker to use depending on whether GLIBC or
-   uClibc or Bionic is the default C library and whether
-   -muclibc or -mglibc or -mbionic has been passed to change the default.  */
+   uClibc or Bionic or musl is the default C library and whether
+   -muclibc or -mglibc or -mbionic or -mmusl has been passed to change
+   the default.  */
 
-#define CHOOSE_DYNAMIC_LINKER1(LIBC1, LIBC2, LIBC3, LD1, LD2, LD3)	\
-  "%{" LIBC2 ":" LD2 ";:%{" LIBC3 ":" LD3 ";:" LD1 "}}"
+#define CHOOSE_DYNAMIC_LINKER1(LIBC1, LIBC2, LIBC3, LIBC4, LD1, LD2, LD3, LD4)	\
+  "%{" LIBC2 ":" LD2 ";:%{" LIBC3 ":" LD3 ";:%{" LIBC4 ":" LD4 ";:" LD1 "}}}"
 
 #if DEFAULT_LIBC == LIBC_GLIBC
-#define CHOOSE_DYNAMIC_LINKER(G, U, B) \
-  CHOOSE_DYNAMIC_LINKER1 ("mglibc", "muclibc", "mbionic", G, U, B)
+#define CHOOSE_DYNAMIC_LINKER(G, U, B, M) \
+  CHOOSE_DYNAMIC_LINKER1 ("mglibc", "muclibc", "mbionic", "mmusl", G, U, B, M)
 #elif DEFAULT_LIBC == LIBC_UCLIBC
-#define CHOOSE_DYNAMIC_LINKER(G, U, B) \
-  CHOOSE_DYNAMIC_LINKER1 ("muclibc", "mglibc", "mbionic", U, G, B)
+#define CHOOSE_DYNAMIC_LINKER(G, U, B, M) \
+  CHOOSE_DYNAMIC_LINKER1 ("muclibc", "mglibc", "mbionic", "mmusl", U, G, B, M)
 #elif DEFAULT_LIBC == LIBC_BIONIC
-#define CHOOSE_DYNAMIC_LINKER(G, U, B) \
-  CHOOSE_DYNAMIC_LINKER1 ("mbionic", "mglibc", "muclibc", B, G, U)
+#define CHOOSE_DYNAMIC_LINKER(G, U, B, M) \
+  CHOOSE_DYNAMIC_LINKER1 ("mbionic", "mglibc", "muclibc", "mmusl", B, G, U, M)
+#elif DEFAULT_LIBC == LIBC_MUSL
+#define CHOOSE_DYNAMIC_LINKER(G, U, B, M) \
+  CHOOSE_DYNAMIC_LINKER1 ("mmusl", "mglibc", "muclibc", "mbionic", M, G, U, B)
 #else
 #error "Unsupported DEFAULT_LIBC"
 #endif /* DEFAULT_LIBC */
@@ -84,21 +90,92 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #define GNU_USER_DYNAMIC_LINKER		\
   CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER, UCLIBC_DYNAMIC_LINKER,	\
-			 BIONIC_DYNAMIC_LINKER)
+			 BIONIC_DYNAMIC_LINKER, MUSL_DYNAMIC_LINKER)
 #define GNU_USER_DYNAMIC_LINKER32	\
   CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER32, UCLIBC_DYNAMIC_LINKER32, \
-			 BIONIC_DYNAMIC_LINKER32)
+			 BIONIC_DYNAMIC_LINKER32, MUSL_DYNAMIC_LINKER32)
 #define GNU_USER_DYNAMIC_LINKER64	\
   CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER64, UCLIBC_DYNAMIC_LINKER64, \
-			 BIONIC_DYNAMIC_LINKER64)
+			 BIONIC_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKER64)
 #define GNU_USER_DYNAMIC_LINKERX32	\
   CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKERX32, UCLIBC_DYNAMIC_LINKERX32, \
-			 BIONIC_DYNAMIC_LINKERX32)
+			 BIONIC_DYNAMIC_LINKERX32, MUSL_DYNAMIC_LINKERX32)
 
 /* Whether we have Bionic libc runtime */
 #undef TARGET_HAS_BIONIC
 #define TARGET_HAS_BIONIC (OPTIO
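
To make the nesting produced by CHOOSE_DYNAMIC_LINKER1 concrete, here is a minimal C sketch that expands the macro from the patch once for a glibc default and once for a musl default, then feeds the result to a toy evaluator. The linker paths are illustrative i386-style values, and pick_linker is a hypothetical helper that only understands this exact "%{OPT:VALUE;:REST}" shape with a single option; it is not GCC's spec parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Macro copied from the patch to gcc/config/linux.h.  */
#define CHOOSE_DYNAMIC_LINKER1(LIBC1, LIBC2, LIBC3, LIBC4, LD1, LD2, LD3, LD4) \
  "%{" LIBC2 ":" LD2 ";:%{" LIBC3 ":" LD3 ";:%{" LIBC4 ":" LD4 ";:" LD1 "}}}"

/* Illustrative paths (assumed for this sketch).  */
#define EX_GLIBC_LD  "/lib/ld-linux.so.2"
#define EX_UCLIBC_LD "/lib/ld-uClibc.so.0"
#define EX_BIONIC_LD "/system/bin/linker"
#define EX_MUSL_LD   "/lib/ld-musl-i386.so.1"

/* Expansion used when DEFAULT_LIBC == LIBC_GLIBC.  */
static const char glibc_default_spec[] =
  CHOOSE_DYNAMIC_LINKER1 ("mglibc", "muclibc", "mbionic", "mmusl",
                          EX_GLIBC_LD, EX_UCLIBC_LD, EX_BIONIC_LD, EX_MUSL_LD);

/* Expansion used when DEFAULT_LIBC == LIBC_MUSL.  */
static const char musl_default_spec[] =
  CHOOSE_DYNAMIC_LINKER1 ("mmusl", "mglibc", "muclibc", "mbionic",
                          EX_MUSL_LD, EX_GLIBC_LD, EX_UCLIBC_LD, EX_BIONIC_LD);

/* Hypothetical toy evaluator: OPT is the one option given ("" for none).
   Each "%{OPT:VALUE;:" level either matches and yields VALUE, or falls
   through to the text after ";:".  */
static void
pick_linker (const char *spec, const char *opt, char *out, size_t outsz)
{
  const char *p = spec;
  size_t opt_len = strlen (opt);

  while (strncmp (p, "%{", 2) == 0)
    {
      const char *name = p + 2;
      const char *colon = strchr (name, ':');
      const char *sep = colon ? strstr (colon + 1, ";:") : NULL;

      if (!colon || !sep)
        {
          /* Not the shape this toy understands.  */
          if (outsz)
            out[0] = '\0';
          return;
        }
      if ((size_t) (colon - name) == opt_len
          && strncmp (name, opt, opt_len) == 0)
        {
          /* Option matched: take the text between ':' and ";:".  */
          snprintf (out, outsz, "%.*s", (int) (sep - colon - 1), colon + 1);
          return;
        }
      p = sep + 2;  /* Try the ";:" alternative.  */
    }
  /* Innermost default: everything up to the first closing brace.  */
  snprintf (out, outsz, "%.*s", (int) strcspn (p, "}"), p);
}
```

With no libc option, every level falls through and the innermost value, the default libc's linker, is chosen; an option naming the default libc (for example -mglibc on a glibc-default compiler) does not appear in the spec at all and also ends up at that default.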

[PATCH 3/13] aarch64 musl support

2015-04-20 Thread Szabolcs Nagy
Set up dynamic linker name for aarch64.

gcc/Changelog:

2015-04-16  Gregor Richards  
Szabolcs Nagy  

* config/aarch64/aarch64-linux.h (MUSL_DYNAMIC_LINKER): Define.
diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 9abb252..5ff83dd 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -23,6 +23,14 @@
 
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1"
 
+#if TARGET_BIG_ENDIAN_DEFAULT
+#define MUSL_DYNAMIC_LINKER_E "%{mlittle-endian:;:_be}"
+#else
+#define MUSL_DYNAMIC_LINKER_E "%{mbig-endian:_be}"
+#endif
+#define MUSL_DYNAMIC_LINKER \
+  "/lib/ld-musl-aarch64" MUSL_DYNAMIC_LINKER_E ".so.1"
+
 #undef  ASAN_CC1_SPEC
 #define ASAN_CC1_SPEC "%{%:sanitize(address):-funwind-tables}"
 


[PATCH 4/13] arm musl support

2015-04-20 Thread Szabolcs Nagy
Set up dynamic linker name for arm.

gcc/Changelog:

2015-04-16  Gregor Richards  

* config/arm/linux-eabi.h (MUSL_DYNAMIC_LINKER): Define.
diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
index e9d65dc..f12e6bd 100644
--- a/gcc/config/arm/linux-eabi.h
+++ b/gcc/config/arm/linux-eabi.h
@@ -77,6 +77,23 @@
 %{mfloat-abi=soft*:" GLIBC_DYNAMIC_LINKER_SOFT_FLOAT "} \
 %{!mfloat-abi=*:" GLIBC_DYNAMIC_LINKER_DEFAULT "}"
 
+/* For ARM musl currently supports four dynamic linkers:
+   - ld-musl-arm.so.1 - for the EABI-derived soft-float ABI
+   - ld-musl-armhf.so.1 - for the EABI-derived hard-float ABI
+   - ld-musl-armeb.so.1 - for the EABI-derived soft-float ABI, EB
+   - ld-musl-armebhf.so.1 - for the EABI-derived hard-float ABI, EB
+   musl does not support the legacy OABI mode.
+   All the dynamic linkers live in /lib.
+   We default to soft-float, EL. */
+#undef  MUSL_DYNAMIC_LINKER
+#if TARGET_BIG_ENDIAN_DEFAULT
+#define MUSL_DYNAMIC_LINKER_E "%{mlittle-endian:;:eb}"
+#else
+#define MUSL_DYNAMIC_LINKER_E "%{mbig-endian:eb}"
+#endif
+#define MUSL_DYNAMIC_LINKER \
+  "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E "%{mfloat-abi=hard:hf}.so.1"
+
 /* At this point, bpabi.h will have clobbered LINK_SPEC.  We want to
use the GNU/Linux version, not the generic BPABI version.  */
 #undef  LINK_SPEC
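
The ARM spec composes the musl loader name from an endianness suffix and a float-ABI suffix. A small C sketch of the same decision logic, assuming hypothetical helper names: big_endian_default stands for TARGET_BIG_ENDIAN_DEFAULT, the boolean flags stand for -mbig-endian / -mlittle-endian being present, and hard_float stands for -mfloat-abi=hard being present on the command line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Mirrors MUSL_DYNAMIC_LINKER_E:
     BE default: "%{mlittle-endian:;:eb}" -> "eb" unless -mlittle-endian.
     LE default: "%{mbig-endian:eb}"      -> "eb" only with -mbig-endian.  */
static bool
example_arm_big_endian_p (bool big_endian_default, bool mbig_endian,
                          bool mlittle_endian)
{
  return big_endian_default ? !mlittle_endian : mbig_endian;
}

/* Mirrors "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E
   "%{mfloat-abi=hard:hf}.so.1".  */
static void
example_arm_musl_ld (bool big_endian, bool hard_float, char *out, size_t n)
{
  snprintf (out, n, "/lib/ld-musl-arm%s%s.so.1",
            big_endian ? "eb" : "", hard_float ? "hf" : "");
}
```

The four possible outputs are exactly the four loaders listed in the patch comment: ld-musl-arm, ld-musl-armhf, ld-musl-armeb and ld-musl-armebhf.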


[PATCH 5/13] microblaze musl support

2015-04-20 Thread Szabolcs Nagy
Set up dynamic linker name for microblaze.

gcc/Changelog:

2015-04-16  Gregor Richards  

* config/microblaze/linux.h (MUSL_DYNAMIC_LINKER): Define.
(DYNAMIC_LINKER): Change.
diff --git a/gcc/config/microblaze/linux.h b/gcc/config/microblaze/linux.h
index a7faa7d..14fe41e 100644
--- a/gcc/config/microblaze/linux.h
+++ b/gcc/config/microblaze/linux.h
@@ -25,7 +25,21 @@
 #undef TLS_NEEDS_GOT
 #define TLS_NEEDS_GOT 1
 
-#define DYNAMIC_LINKER "/lib/ld.so.1"
+#if TARGET_BIG_ENDIAN_DEFAULT == 0 /* LE */
+#define MUSL_DYNAMIC_LINKER_E "%{EB:;:el}"
+#else
+#define MUSL_DYNAMIC_LINKER_E "%{EL:el}"
+#endif
+
+#define MUSL_DYNAMIC_LINKER "/lib/ld-musl-microblaze" MUSL_DYNAMIC_LINKER_E ".so.1"
+#define GLIBC_DYNAMIC_LINKER "/lib/ld.so.1"
+
+#if DEFAULT_LIBC == LIBC_MUSL
+#define DYNAMIC_LINKER MUSL_DYNAMIC_LINKER
+#else
+#define DYNAMIC_LINKER GLIBC_DYNAMIC_LINKER
+#endif
+
 #undef  SUBTARGET_EXTRA_SPECS
 #define SUBTARGET_EXTRA_SPECS \
   { "dynamic_linker", DYNAMIC_LINKER }


[PATCH 6/13] mips musl support

2015-04-20 Thread Szabolcs Nagy
Set up dynamic linker name for mips.

gcc/Changelog:

2015-04-16  Gregor Richards  

* config/mips/linux.h (MUSL_DYNAMIC_LINKER): Define.
diff --git a/gcc/config/mips/linux.h b/gcc/config/mips/linux.h
index 91df261..5057bc5 100644
--- a/gcc/config/mips/linux.h
+++ b/gcc/config/mips/linux.h
@@ -37,6 +37,13 @@ along with GCC; see the file COPYING3.  If not see
 #define UCLIBC_DYNAMIC_LINKERN32 \
   "%{mnan=2008:/lib32/ld-uClibc-mipsn8.so.0;:/lib32/ld-uClibc.so.0}"
 
+#if TARGET_ENDIAN_DEFAULT == 0 /* LE */
+#define MUSL_DYNAMIC_LINKER_E "%{EB:;:el}"
+#else
+#define MUSL_DYNAMIC_LINKER_E "%{EL:el}"
+#endif
+#define MUSL_DYNAMIC_LINKER "/lib/ld-musl-mips" MUSL_DYNAMIC_LINKER_E ".so.1"
+
 #define BIONIC_DYNAMIC_LINKERN32 "/system/bin/linker32"
 #define GNU_USER_DYNAMIC_LINKERN32 \
   CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKERN32, UCLIBC_DYNAMIC_LINKERN32, \


[PATCH 7/13] powerpc musl support

2015-04-20 Thread Szabolcs Nagy
Set up dynamic linker name for powerpc.  Musl only supports powerpc
with secure plt, so appropriate options are passed to the linker by
default.


gcc/Changelog:

2015-04-16  Gregor Richards  

* config.gcc (secure_plt): Add *-linux*-musl*.
* config/rs6000/linux64.h (MUSL_DYNAMIC_LINKER): Define.
(CHOOSE_DYNAMIC_LINKER): Update.

* config/rs6000/secureplt.h (LINK_SECURE_PLT_DEFAULT_SPEC): Define.
* config/rs6000/sysv4.h (LINK_SECURE_PLT_DEFAULT_SPEC): Define.
(CHOOSE_DYNAMIC_LINKER, LINK_TARGET_SPEC, LINK_OS_LINUX_SPEC): Update.
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 8e59bb0..34d4bd4 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2409,6 +2409,10 @@ powerpc*-*-linux*)
 	powerpc*-*-linux*paired*)
 		tm_file="${tm_file} rs6000/750cl.h" ;;
 	esac
+	case ${target} in
+	*-linux*-musl*)
+		enable_secureplt=yes ;;
+	esac
 	if test x${enable_secureplt} = xyes; then
 		tm_file="rs6000/secureplt.h ${tm_file}"
 	fi
diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
index 0879e7e..f00c775 100644
--- a/gcc/config/rs6000/linux64.h
+++ b/gcc/config/rs6000/linux64.h
@@ -365,17 +365,21 @@ extern int dot_symbols;
 #endif
 #define UCLIBC_DYNAMIC_LINKER32 "/lib/ld-uClibc.so.0"
 #define UCLIBC_DYNAMIC_LINKER64 "/lib/ld64-uClibc.so.0"
+#define MUSL_DYNAMIC_LINKER32 "/lib/ld-musl-powerpc.so.1"
+#define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-powerpc64.so.1"
 #if DEFAULT_LIBC == LIBC_UCLIBC
-#define CHOOSE_DYNAMIC_LINKER(G, U) "%{mglibc:" G ";:" U "}"
+#define CHOOSE_DYNAMIC_LINKER(G, U, M) "%{mglibc:" G ";:%{mmusl:" M ";:" U "}}"
 #elif DEFAULT_LIBC == LIBC_GLIBC
-#define CHOOSE_DYNAMIC_LINKER(G, U) "%{muclibc:" U ";:" G "}"
+#define CHOOSE_DYNAMIC_LINKER(G, U, M) "%{muclibc:" U ";:%{mmusl:" M ";:" G "}}"
+#elif DEFAULT_LIBC == LIBC_MUSL
+#define CHOOSE_DYNAMIC_LINKER(G, U, M) "%{mglibc:" G ";:%{muclibc:" U ";:" M "}}"
 #else
 #error "Unsupported DEFAULT_LIBC"
 #endif
 #define GNU_USER_DYNAMIC_LINKER32 \
-  CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER32, UCLIBC_DYNAMIC_LINKER32)
+  CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER32, UCLIBC_DYNAMIC_LINKER32, MUSL_DYNAMIC_LINKER32)
 #define GNU_USER_DYNAMIC_LINKER64 \
-  CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER64, UCLIBC_DYNAMIC_LINKER64)
+  CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER64, UCLIBC_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKER64)
 
 #undef  DEFAULT_ASM_ENDIAN
 #if (TARGET_DEFAULT & MASK_LITTLE_ENDIAN)
diff --git a/gcc/config/rs6000/secureplt.h b/gcc/config/rs6000/secureplt.h
index b463463..77edf2a 100644
--- a/gcc/config/rs6000/secureplt.h
+++ b/gcc/config/rs6000/secureplt.h
@@ -18,3 +18,4 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #define CC1_SECURE_PLT_DEFAULT_SPEC "-msecure-plt"
+#define LINK_SECURE_PLT_DEFAULT_SPEC "--secure-plt"
diff --git a/gcc/config/rs6000/sysv4.h b/gcc/config/rs6000/sysv4.h
index 9917c2f..e6ce1e1 100644
--- a/gcc/config/rs6000/sysv4.h
+++ b/gcc/config/rs6000/sysv4.h
@@ -537,6 +537,9 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
 #ifndef CC1_SECURE_PLT_DEFAULT_SPEC
 #define CC1_SECURE_PLT_DEFAULT_SPEC ""
 #endif
+#ifndef LINK_SECURE_PLT_DEFAULT_SPEC
+#define LINK_SECURE_PLT_DEFAULT_SPEC ""
+#endif
 
 /* Pass -G xxx to the compiler.  */
 #undef CC1_SPEC
@@ -586,7 +589,8 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
 
 /* Override the default target of the linker.  */
 #define	LINK_TARGET_SPEC \
-  ENDIAN_SELECT("", " --oformat elf32-powerpcle", "")
+  ENDIAN_SELECT("", " --oformat elf32-powerpcle", "") \
+  "%{!mbss-plt: %{!msecure-plt: %(link_secure_plt_default)}}"
 
 /* Any specific OS flags.  */
 #define LINK_OS_SPEC "\
@@ -764,15 +768,18 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
 
 #define GLIBC_DYNAMIC_LINKER "/lib/ld.so.1"
 #define UCLIBC_DYNAMIC_LINKER "/lib/ld-uClibc.so.0"
+#define MUSL_DYNAMIC_LINKER "/lib/ld-musl-powerpc.so.1"
 #if DEFAULT_LIBC == LIBC_UCLIBC
-#define CHOOSE_DYNAMIC_LINKER(G, U) "%{mglibc:" G ";:" U "}"
+#define CHOOSE_DYNAMIC_LINKER(G, U, M) "%{mglibc:" G ";:%{mmusl:" M ";:" U "}}"
+#elif DEFAULT_LIBC == LIBC_MUSL
+#define CHOOSE_DYNAMIC_LINKER(G, U, M) "%{mglibc:" G ";:%{muclibc:" U ";:" M "}}"
 #elif !defined (DEFAULT_LIBC) || DEFAULT_LIBC == LIBC_GLIBC
-#define CHOOSE_DYNAMIC_LINKER(G, U) "%{muclibc:" U ";:" G "}"
+#define CHOOSE_DYNAMIC_LINKER(G, U, M) "%{muclibc:" U ";:%{mmusl:" M ";:" G "}}"
 #else
 #error "Unsupported DEFAULT_LIBC"
 #endif
 #define GNU_USER_DYNAMIC_LINKER \
-  CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER, UCLIBC_DYNAMIC_LINKER)
+  CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER, UCLIBC_DYNAMIC_LINKER, MUSL_DYNAMIC_LINKER)
 
 #define LINK_OS_LINUX_SPEC "-m elf32ppclinux %{!shared: %{!static: \
   %{rdynamic:-export-dynamic} \
@@ -895,6 +902,7 @@ ncrtn.o%s"
   { "link_os_openbsd",		LINK_OS_OPENBSD_SPEC },			\
   { "link_os_default",		LINK_OS_DEFAULT_SPEC },			\
   { "cc1_secure_plt_default",	CC1_SECURE_PLT_DEFAULT_SPEC },		\
+  
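
The new LINK_TARGET_SPEC fragment only adds the secure-PLT default when the user picked neither PLT style. A minimal C sketch of that condition, assuming a hypothetical helper name; link_secure_plt_default stands for LINK_SECURE_PLT_DEFAULT_SPEC, which is "--secure-plt" when rs6000/secureplt.h is included (now forced for musl by config.gcc) and "" otherwise (from sysv4.h).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Mirrors "%{!mbss-plt: %{!msecure-plt: %(link_secure_plt_default)}}":
   the default is emitted only if neither -mbss-plt nor -msecure-plt
   was given; otherwise this fragment contributes nothing.  */
static const char *
example_link_secure_plt (bool mbss_plt, bool msecure_plt,
                         const char *link_secure_plt_default)
{
  if (!mbss_plt && !msecure_plt)
    return link_secure_plt_default;
  return "";
}
```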

[PATCH 8/13] sh musl support

2015-04-20 Thread Szabolcs Nagy
Set up dynamic linker name for sh.

gcc/Changelog:

2015-04-16  Gregor Richards  

* config/sh/linux.h (MUSL_DYNAMIC_LINKER): Define.
diff --git a/gcc/config/sh/linux.h b/gcc/config/sh/linux.h
index 0f5d614..16524da 100644
--- a/gcc/config/sh/linux.h
+++ b/gcc/config/sh/linux.h
@@ -43,7 +43,14 @@ along with GCC; see the file COPYING3.  If not see
 
 #define TARGET_ASM_FILE_END file_end_indicate_exec_stack
 
+#if TARGET_BIG_ENDIAN_DEFAULT /* BE */
+#define MUSL_DYNAMIC_LINKER_E "eb"
+#else
+#define MUSL_DYNAMIC_LINKER_E
+#endif
+
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux.so.2"
+#define MUSL_DYNAMIC_LINKER "/lib/ld-musl-sh" MUSL_DYNAMIC_LINKER_E ".so.1"
 
 #undef SUBTARGET_LINK_EMUL_SUFFIX
 #define SUBTARGET_LINK_EMUL_SUFFIX "_linux"


[PATCH 9/13] x86 musl support

2015-04-20 Thread Szabolcs Nagy
Set up dynamic linker name for x86.

gcc/Changelog:

2015-04-16  Gregor Richards  

* config/i386/linux.h (MUSL_DYNAMIC_LINKER): Define.
* config/i386/linux64.h (MUSL_DYNAMIC_LINKER32): Define.
(MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERX32): Define.
diff --git a/gcc/config/i386/linux.h b/gcc/config/i386/linux.h
index a100963..836a97a 100644
--- a/gcc/config/i386/linux.h
+++ b/gcc/config/i386/linux.h
@@ -21,3 +21,4 @@ along with GCC; see the file COPYING3.  If not see
 
 #define GNU_USER_LINK_EMULATION "elf_i386"
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux.so.2"
+#define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1"
diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
index a27d3be..cba322b 100644
--- a/gcc/config/i386/linux64.h
+++ b/gcc/config/i386/linux64.h
@@ -30,3 +30,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define GLIBC_DYNAMIC_LINKER32 "/lib/ld-linux.so.2"
 #define GLIBC_DYNAMIC_LINKER64 "/lib64/ld-linux-x86-64.so.2"
 #define GLIBC_DYNAMIC_LINKERX32 "/libx32/ld-linux-x32.so.2"
+
+#define MUSL_DYNAMIC_LINKER32 "/lib/ld-musl-i386.so.1"
+#define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
+#define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"


[PATCH 10/13] fixincludes

2015-04-20 Thread Szabolcs Nagy
No fixincludes are needed for musl.

fixincludes/Changelog:

2015-04-16  Gregor Richards  

* mkfixinc.sh: Add *-musl* with no fixes.
diff --git a/fixincludes/mkfixinc.sh b/fixincludes/mkfixinc.sh
index 6653fed..0d96c8c 100755
--- a/fixincludes/mkfixinc.sh
+++ b/fixincludes/mkfixinc.sh
@@ -19,7 +19,8 @@ case $machine in
 powerpc-*-eabi*| \
 powerpc-*-rtems*   | \
 powerpcle-*-eabisim* | \
-powerpcle-*-eabi* )
+powerpcle-*-eabi* | \
+*-musl* )
 	#  IF there is no include fixing,
 	#  THEN create a no-op fixer and exit
 	(echo "#! /bin/sh" ; echo "exit 0" ) > ${target}


[PATCH 11/13] unwind fix for musl

2015-04-20 Thread Szabolcs Nagy
The unwinder's use of dl_iterate_phdr depends on USE_PT_GNU_EH_FRAME being defined.

I think USE_PT_GNU_EH_FRAME could be enabled more generally (whenever
libc provides dl_iterate_phdr), but I only made a conservative change.


libgcc/Changelog:

2015-04-16  Gregor Richards  
Szabolcs Nagy  

* unwind-dw2-fde-dip.c (USE_PT_GNU_EH_FRAME): Define it on
Linux if target provides dl_iterate_phdr.
diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index e1e566b..137dced 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -59,6 +59,12 @@
 
 #if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
 && defined(TARGET_DL_ITERATE_PHDR) \
+&& defined(__linux__)
+# define USE_PT_GNU_EH_FRAME
+#endif
+
+#if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
+&& defined(TARGET_DL_ITERATE_PHDR) \
 && (defined(__DragonFly__) || defined(__FreeBSD__))
 # define ElfW __ElfN
 # define USE_PT_GNU_EH_FRAME
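
With USE_PT_GNU_EH_FRAME defined, the unwinder walks the loaded objects with dl_iterate_phdr and looks for their PT_GNU_EH_FRAME segment (the .eh_frame_hdr lookup table). A minimal sketch of that walk, assuming a Linux/ELF system whose libc (glibc or musl) provides dl_iterate_phdr in <link.h>; the function names are hypothetical and this only counts objects rather than searching the table.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Callback: count objects that carry a PT_GNU_EH_FRAME program header.  */
static int
example_count_cb (struct dl_phdr_info *info, size_t size, void *data)
{
  size_t *count = data;
  (void) size;
  for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++)
    if (info->dlpi_phdr[i].p_type == PT_GNU_EH_FRAME)
      {
        ++*count;
        break;
      }
  return 0;  /* Zero means "keep iterating".  */
}

static size_t
example_count_eh_frame_hdr_objects (void)
{
  size_t count = 0;
  dl_iterate_phdr (example_count_cb, &count);
  return count;
}
```

In a dynamically linked program at least the main executable normally has this segment, because GCC passes --eh-frame-hdr to the linker for such links; static links may not.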


[PATCH 12/13] libstdc++, libgfortran gthr workaround for musl

2015-04-20 Thread Szabolcs Nagy
libgcc/gthr-posix.h uses weak reference logic to determine whether libpthread
is linked into the application.  This is broken unless there is a special
workaround relying on libc-internal knowledge, and even then static linking
needs a further manual workaround at link time, so this patch disables it
for os/generic in libstdc++-v3 and for musl in libgfortran.

The change minimizes the impact on other setups, but I think the weak
ref logic should be disabled by default; it is never entirely correct.
Conforming code can crash on a glibc setup too:

$ cat a.cpp
#include 
void(*f)(void) = (void(*)(void))pthread_key_create;
int main(){}
$ g++ -static a.cpp -lpthread
$ ./a.out
Segmentation fault

I reported this previously at
https://gcc.gnu.org/ml/gcc/2014-11/msg00246.html
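
The weak-reference technique in question can be illustrated in a few lines of C. This is a simplified sketch, not the actual gthr-posix.h code, and the names are hypothetical: a static weakref to pthread_key_create has address 0 if nothing in the link provides that symbol. As the a.cpp transcript shows, a non-null result only says that this one symbol was linked in, not that the rest of libpthread was.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Weak alias for pthread_key_create (GCC/Clang weakref attribute;
   the declaration must be static).  */
static __typeof__ (pthread_key_create) example_weak_pthread_key_create
  __attribute__ ((weakref ("pthread_key_create")));

/* "Is libpthread linked in?" as guessed from a single weak symbol.  */
static bool
example_pthread_linked_in (void)
{
  /* An unresolved weak reference has address 0.  */
  return example_weak_pthread_key_create != 0;
}
```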


libgfortran/Changelog:

2015-04-16  Szabolcs Nagy  

* acinclude.m4 (GTHREAD_USE_WEAK): Define as 0 for *-*-musl*.
* configure: Regenerate.

libstdc++-v3/Changelog:

2015-04-16  Szabolcs Nagy  

* config/os/generic/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK): Define.
* configure.host (os_include_dir): Set to "os/generic" for linux-musl*.
diff --git a/libgfortran/acinclude.m4 b/libgfortran/acinclude.m4
index ba890f9..30b8b1a6 100644
--- a/libgfortran/acinclude.m4
+++ b/libgfortran/acinclude.m4
@@ -100,7 +100,7 @@ void foo (void);
 	  [Define to 1 if the target supports #pragma weak])
   fi
   case "$host" in
-*-*-darwin* | *-*-hpux* | *-*-cygwin* | *-*-mingw* )
+*-*-darwin* | *-*-hpux* | *-*-cygwin* | *-*-mingw* | *-*-musl* )
   AC_DEFINE(GTHREAD_USE_WEAK, 0,
 		[Define to 0 if the target shouldn't use #pragma weak])
   ;;
diff --git a/libgfortran/configure b/libgfortran/configure
index e1592f7..07542e1 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -26447,7 +26447,7 @@ $as_echo "#define SUPPORTS_WEAK 1" >>confdefs.h
 
   fi
   case "$host" in
-*-*-darwin* | *-*-hpux* | *-*-cygwin* | *-*-mingw* )
+*-*-darwin* | *-*-hpux* | *-*-cygwin* | *-*-mingw* | *-*-musl* )
 
 $as_echo "#define GTHREAD_USE_WEAK 0" >>confdefs.h
 
diff --git a/libstdc++-v3/config/os/generic/os_defines.h b/libstdc++-v3/config/os/generic/os_defines.h
index 45bf52a..103ec0e 100644
--- a/libstdc++-v3/config/os/generic/os_defines.h
+++ b/libstdc++-v3/config/os/generic/os_defines.h
@@ -33,4 +33,9 @@
 // System-specific #define, typedefs, corrections, etc, go here.  This
 // file will come before all others.
 
+// Disable the weak reference logic in gthr.h for os/generic because it
+// is broken on every platform unless there is implementation specific
+// workaround in gthr-posix.h and at link-time for static linking.
+#define _GLIBCXX_GTHREAD_USE_WEAK 0
+
 #endif
diff --git a/libstdc++-v3/configure.host b/libstdc++-v3/configure.host
index 82ddc52..a349ce3 100644
--- a/libstdc++-v3/configure.host
+++ b/libstdc++-v3/configure.host
@@ -271,6 +271,9 @@ case "${host_os}" in
   freebsd*)
 os_include_dir="os/bsd/freebsd"
 ;;
+  linux-musl*)
+os_include_dir="os/generic"
+;;
   gnu* | linux* | kfreebsd*-gnu | knetbsd*-gnu)
 if [ "$uclibc" = "yes" ]; then
   os_include_dir="os/uclibc"


[PATCH 13/13] fix incompatible posix_memalign declaration on x86

2015-04-20 Thread Szabolcs Nagy
The posix_memalign declaration is incompatible with musl for C++,
because of the exception specification.  It also pollutes the
namespace and lacks protection against a potential macro definition
that is allowed by POSIX.  The fix avoids source level namespace
pollution but retains the dependency on the posix_memalign extern
libc symbol.

The fix is ugly, but it is not possible to correctly redeclare a
libc function in a public gcc header for C++.


gcc/Changelog:

2015-04-16  Szabolcs Nagy  

* config/i386/pmm_malloc.h (posix_memalign): Renamed to ...
(__gcc_posix_memalign): This.  Use posix_memalign as extern
symbol only.
diff --git a/gcc/config/i386/pmm_malloc.h b/gcc/config/i386/pmm_malloc.h
index 901001b..321fcd3 100644
--- a/gcc/config/i386/pmm_malloc.h
+++ b/gcc/config/i386/pmm_malloc.h
@@ -27,12 +27,13 @@
 #include 
 
 /* We can't depend on  since the prototype of posix_memalign
-   may not be visible.  */
+   may not be visible and we can't pollute the namespace either.  */
 #ifndef __cplusplus
-extern int posix_memalign (void **, size_t, size_t);
+extern int __gcc_posix_memalign (void **, size_t, size_t)
 #else
-extern "C" int posix_memalign (void **, size_t, size_t) throw ();
+extern "C" int __gcc_posix_memalign (void **, size_t, size_t) throw ()
 #endif
+__asm__("posix_memalign");
 
 static __inline void *
 _mm_malloc (size_t size, size_t alignment)
@@ -42,7 +43,7 @@ _mm_malloc (size_t size, size_t alignment)
 return malloc (size);
   if (alignment == 2 || (sizeof (void *) == 8 && alignment == 4))
 alignment = sizeof (void *);
-  if (posix_memalign (&ptr, alignment, size) == 0)
+  if (__gcc_posix_memalign (&ptr, alignment, size) == 0)
 return ptr;
   else
 return NULL;
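
The asm-label trick used in pmm_malloc.h can be tried in isolation: declare the libc function under a private name and bind that name to the real symbol with __asm__. A minimal sketch, assuming an ELF target whose C symbols carry no leading underscore (it would not link as-is on, for example, Darwin); example_posix_memalign and example_aligned_alloc are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Private name bound to libc's posix_memalign symbol by the asm label,
   without redeclaring the posix_memalign identifier itself.  */
extern int example_posix_memalign (void **, size_t, size_t)
  __asm__ ("posix_memalign");

/* Return SIZE bytes aligned to ALIGNMENT (assumed to be a power of two),
   or NULL on failure.  Release the result with free ().  */
static void *
example_aligned_alloc (size_t size, size_t alignment)
{
  void *ptr;
  /* posix_memalign requires a multiple of sizeof (void *).  */
  if (alignment < sizeof (void *))
    alignment = sizeof (void *);
  if (example_posix_memalign (&ptr, alignment, size) == 0)
    return ptr;
  return NULL;
}
```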


Re: [patch,wwwdocs] Add gcc-5 caveats for avr.

2015-04-20 Thread Gerald Pfeifer

Hi Johann,

On Mon, 20 Apr 2015, Georg-Johann Lay wrote:

> Okay to install?


> +The AVR port uses a new scheme to describe supported devices:
> +For each supported device the compiler provides a device-specific
> +<a href="http://gcc.gnu.org/onlinedocs/gcc/Spec-Files.html">spec file</a>.
> +If the compiler is used together with AVR-LibC, this requires at
> +least GCC 5.2 and a version of AVR-LibC which implements
> +<a href="http://savannah.nongnu.org/bugs/?44574">#44574</a>.

Can you please make the two links https-links?  (The one to
gcc.gnu.org, in particular, actually redirects.)

Using just "#44574" as a reference: may that be a little confusing,
or is it sufficiently clear to AVR users?

> +  A new command option -nodevicelib has been added.

"command-line option"

> +If this option is turned on the compiler won't link against AVR-LibC's
> +device-specific library libdevice.a by omitting
> +-ldevice from the linker's command line.

How about making this "...-nodevicelib prevents the compiler
from linking against"?

> +If the compiler had not been
> +<a href="http://gcc.gnu.org/install/configure.html">configured</a>
> +to be used with AVR-LibC, the compiler will not link against that
> +library and the option has no effect.

"was not" (or "is") instead of "had not", and can you please use
https here as well?

Though, really, could this be just simplified to "If the compiler is
not configured for use with AVR-LibC to begin with, this option has 
no effect"?



Your patch is fine with the above changes, or after considering them and
deciding not to go for one or the other.

Gerald


Re: Handle redirection blocks with clobbers

2015-04-20 Thread Jeff Law

On 04/20/2015 11:19 AM, Jakub Jelinek wrote:

> On Mon, Apr 20, 2015 at 11:14:03AM -0600, Jeff Law wrote:
>
>> while (!gsi_end_p (gsi)
>>  && (gimple_code (gsi_stmt (gsi)) == GIMPLE_LABEL
>>  || is_gimple_debug (gsi_stmt (gsi))
>> -|| gimple_nop_p (gsi_stmt (gsi))))
>> +|| gimple_nop_p (gsi_stmt (gsi))
>> +|| (gimple_code (gsi_stmt (gsi)) == GIMPLE_ASSIGN
>
> Why this line?  There is no need to test for GIMPLE_ASSIGN
> (and the canonical test for that would be is_gimple_assign (gsi_stmt (gsi))
> anyway),
>
>> +&& gimple_clobber_p (gsi_stmt (gsi)))))
>
> gimple_clobber_p checks for that too.

Build spinning with the redundant check removed...

jeff



RE: [PATCH][MIPS] Enable load-load/store-store bonding

2015-04-20 Thread Matthew Fortune
Sameera Deshpande  writes:
> Gentle reminder!

Thanks Sameera. Just a couple of comments inline below and a question
for Catherine at the end.

> - Thanks and regards,
>Sameera D.
> 
> On Monday 30 March 2015 04:58 PM, sameera wrote:
> > Hi!
> >
> > Sorry for the delay in sending this patch for review.
> > Please find the updated patch attached.
> >
> > In P5600, two consecutive loads/stores of the same type that access
> > contiguous memory locations are bonded together by the instruction
> > issue unit, which dispatches a single load/store instruction that
> > accesses both locations. This allows a 2X improvement in
> > memory-intensive code. This optimization can be performed for LH, SH,
> > LW, SW, LWC, SWC, LDC and SDC instructions.
> >
> > This patch adds peephole2 patterns to identify such loads/stores and
> > put them in a parallel, so that the scheduler will not split them,
> > thereby guaranteeing hardware-level load/store bonding.
> >
> > The patch has been tested with DejaGnu for correctness, and on
> > hardware for performance.
> > Ok for trunk?
> >
> > Changelog:
> > gcc/
> >  * config/mips/mips.md (JOIN_MODE): New mode iterator.
> >  (join2_load_Store): New pattern.
> >  (join2_loadhi): Likewise.
> >  (define_peephole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
> >  load-load and store-stores.
> >  * config/mips/mips.opt (mload-store-pairs): New option.
> >  (TARGET_LOAD_STORE_PAIRS): New macro.
> >  *config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
> >  *config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
> >  *config/mips/mips.c(mips_load_store_bonding_p): New function.

I don't know if this has been corrupted by mail clients, but there should
be a single space after '*' and a space before '('.

>diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
>index b48e04f..244eb8d 100644
>--- a/gcc/config/mips/mips-protos.h
>+++ b/gcc/config/mips/mips-protos.h
>@@ -360,6 +360,7 @@ extern bool mips_epilogue_uses (unsigned int);
> extern void mips_final_prescan_insn (rtx_insn *, rtx *, int);
> extern int mips_trampoline_code_size (void);
> extern void mips_function_profiler (FILE *);
>+extern bool mips_load_store_bonding_p (rtx *, machine_mode, bool);
> 
> typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);
> #ifdef RTX_CODE
>diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
>index 1733457..85f0591 100644
>--- a/gcc/config/mips/mips.c
>+++ b/gcc/config/mips/mips.c
>@@ -18241,6 +18241,64 @@ umips_load_store_pair_p_1 (bool load_p, bool swap_p,
>   return true;
> }
> 
>+bool
>+mips_load_store_bonding_p (rtx *operands, enum machine_mode mode, bool load_p)

Remove enum from machine_mode.

>+{
>+  rtx reg1, reg2, mem1, mem2, base1, base2;
>+  enum reg_class rc1, rc2;
>+  HOST_WIDE_INT offset1, offset2;
>+
>+  if (load_p)
>+{
>+  reg1 = operands[0];
>+  reg2 = operands[2];
>+  mem1 = operands[1];
>+  mem2 = operands[3];
>+}
>+  else
>+{
>+  reg1 = operands[1];
>+  reg2 = operands[3];
>+  mem1 = operands[0];
>+  mem2 = operands[2];
>+}
>+
>+  if (mips_address_insns (XEXP (mem1, 0), mode, false) == 0
>+  || mips_address_insns (XEXP (mem2, 0), mode, false) == 0)
>+return false;
>+
>+  mips_split_plus (XEXP (mem1, 0), &base1, &offset1);
>+  mips_split_plus (XEXP (mem2, 0), &base2, &offset2);
>+
>+  /* Base regs do not match.  */
>+  if (!REG_P (base1) || !rtx_equal_p (base1, base2))
>+return false;
>+
>+  /* Either of the loads is clobbering base register.  */
>+  if (load_p
>+  && (REGNO (reg1) == REGNO (base1)
>+|| (REGNO (reg2) == REGNO (base1))))
>+return false;

Can you add a comment saying that this case does not get bonded by
any known hardware even though it could be valid to bond them if it
is the second load that clobbers the base.

>+  /* Loading in same registers.  */
>+  if (load_p
>+  && REGNO (reg1) == REGNO (reg2))
>+return false;
>+
>+  /* The loads/stores are not of same type.  */
>+  rc1 = REGNO_REG_CLASS (REGNO (reg1));
>+  rc2 = REGNO_REG_CLASS (REGNO (reg2));
>+  if (rc1 != rc2
>+  && !reg_class_subset_p (rc1, rc2)
>+  && !reg_class_subset_p (rc2, rc1))
>+return false;
>+
>+  if (abs (offset1 - offset2) != GET_MODE_SIZE (mode))
>+return false;
>+
>+  return true;
>+}
>+
> /* OPERANDS describes the operands to a pair of SETs, in the order
>dest1, src1, dest2, src2.  Return true if the operands can be used
>in an LWP or SWP instruction; LOAD_P says which.  */
>diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
>index ec69ed5..1bd0dae 100644
>--- a/gcc/config/mips/mips.h
>+++ b/gcc/config/mips/mips.h
>@@ -3147,3 +3147,7 @@ extern GTY(()) struct target_globals *mips16_globals;
> #define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
> #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
> #endif
>+
>+#define ENABLE_LD_ST_PAIRS \
>+  (TARGET_LOAD_STORE_PAIRS && TUNE_P5600 \
>+   && !TARGET_MICROMIPS && !TARGET_FIX_24K)
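
The checks in the quoted mips_load_store_bonding_p amount to a small predicate over two accesses. A simplified C sketch, assuming a hypothetical struct example_access in place of RTL operands, and omitting the mips_address_insns validity check and the reg_class_subset_p relaxation (here the classes must be identical). It uses labs because the offsets are stored as long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of one access: data register, base register and
   constant offset of the address, and a register-class id.  */
struct example_access
{
  int reg;
  int base_reg;
  long offset;
  int reg_class;
};

static bool
example_bonding_p (struct example_access a, struct example_access b,
                   long mode_size, bool load_p)
{
  if (a.base_reg != b.base_reg)
    return false;                  /* Base regs do not match.  */
  if (load_p && (a.reg == a.base_reg || b.reg == a.base_reg))
    return false;                  /* A load clobbers the base register.  */
  if (load_p && a.reg == b.reg)
    return false;                  /* Both load into the same register.  */
  if (a.reg_class != b.reg_class)
    return false;                  /* Not the same type of access.  */
  /* Bondable only if the two accesses are exactly adjacent.  */
  return labs (a.offset - b.offset) == mode_size;
}
```

For example, lw $2,0($4) followed by lw $3,4($4) passes with a 4-byte mode, while an offset gap of 8, or a load into $4 itself, is rejected.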

I've already forgotten
