[PATCH/RFA] Do not set MULTILIB_DEFAULTS for arm*-*-linux-gnueabi* targets

2013-08-19 Thread Matthew Gretton-Dann
All,

The attached patch removes the setting of MULTILIB_DEFAULTS for
arm*-*-linux-gnueabi* targets.

The current setting of MULTILIB_DEFAULTS includes mfloat-abi=hard,
which for arm*-*-linux-gnueabi is not true.  This makes generating a
hard-float multilib impossible in this configuration.
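
A deliberately simplified model of the problem, not the driver's real
logic (which is more involved): an option listed in MULTILIB_DEFAULTS is
treated as already covered by the default library directory, so it cannot
pick out a multilib of its own.  The function names and the "hard" and
"thumb" directories below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The defaults that linux-elf.h listed before this patch.  */
static const char *const multilib_defaults[] = {
  "marm", "mlittle-endian", "mfloat-abi=hard", "mno-thumb-interwork"
};

/* Hypothetical helper: is OPT one of the target defaults?  */
int
is_multilib_default (const char *opt)
{
  size_t i;

  assert (opt != NULL);
  for (i = 0; i < sizeof multilib_defaults / sizeof multilib_defaults[0]; i++)
    if (strcmp (opt, multilib_defaults[i]) == 0)
      return 1;
  return 0;
}

/* Hypothetical, simplified selection: a default option maps to the
   default directory "."; otherwise OPT selects DIR when it matches the
   option the multilib was built for.  */
const char *
select_multilib_dir (const char *opt, const char *multilib_opt,
                     const char *dir)
{
  if (is_multilib_default (opt))
    return ".";
  return strcmp (opt, multilib_opt) == 0 ? dir : ".";
}
```

In this model, asking for mfloat-abi=hard always yields ".", which is the
symptom described above; an option that is not a default (mthumb in the
sketch) would still select its own directory.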

In an off-list conversation with Ramana we decided that
MULTILIB_DEFAULTS should not be set for these targets, as we set
MULTILIB_OPTIONS to an empty string (as per the docs for
MULTILIB_DEFAULTS).

I added comments by the definition of MULTILIB_OPTIONS and
MULTILIB_DEFAULTS to make it clear that the two options are related.

Tested cross arm-none-linux-gnueabi.

OK for trunk?

Thanks,

Matt

gcc/ChangeLog:
2013-08-19  Matthew Gretton-Dann  

* config/arm/linux-elf.h (MULTILIB_DEFAULTS): Remove definition.
* config/arm/t-linux-eabi: (MULTILIB_OPTIONS): Document association
with MULTILIB_DEFAULTS.

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
diff --git a/gcc/config/arm/linux-elf.h b/gcc/config/arm/linux-elf.h
index 488efa4..475e220 100644
--- a/gcc/config/arm/linux-elf.h
+++ b/gcc/config/arm/linux-elf.h
@@ -44,9 +44,9 @@
 
 #define SUBTARGET_EXTRA_LINK_SPEC " -m " TARGET_LINKER_EMULATION " -p"
 
+/* We do not have any MULTILIB_OPTIONS specified, so there are no
+   MULTILIB_DEFAULTS.  */
 #undef  MULTILIB_DEFAULTS
-#define MULTILIB_DEFAULTS \
-   { "marm", "mlittle-endian", "mfloat-abi=hard", "mno-thumb-interwork" }
 
 /* Now we define the strings used to build the spec file.  */
 #undef  LIB_SPEC
diff --git a/gcc/config/arm/t-linux-eabi b/gcc/config/arm/t-linux-eabi
index 2f2f8ff..07e32b3 100644
--- a/gcc/config/arm/t-linux-eabi
+++ b/gcc/config/arm/t-linux-eabi
@@ -18,6 +18,8 @@
 
 # We do not build a Thumb multilib for Linux because the definition of
 # CLEAR_INSN_CACHE in linux-gas.h does not work in Thumb mode.
+# If you set MULTILIB_OPTIONS to a non-empty value you should also set
+# MULTILIB_DEFAULTS in linux-elf.h.
 MULTILIB_OPTIONS   =
 MULTILIB_DIRNAMES  =
 


Re: [RFC Patch, Aarch64] : Macros for profile code generation to enable gprof support

2013-08-12 Thread Matthew Gretton-Dann
Marcus,

On 9 August 2013 18:17, Marcus Shawcroft  wrote:
> On 03/08/13 19:01, Venkataramanan Kumar wrote:
>
>
>> 2013-08-02  Venkataramanan Kumar  
>>
>>   * config/aarch64/aarch64.h (MCOUNT_NAME): Define.
>> (NO_PROFILE_COUNTERS): Likewise.
>> (PROFILE_HOOK): Likewise.
>> (FUNCTION_PROFILER): Likewise.
>>  *  config/aarch64/aarch64.c (aarch64_function_profiler): Remove.
>> .
>>
>> regards,
>> Venkat.
>>
>
> Hi Venkat,
>
> Looking at the various other ports, it looks like the majority choose to use
> FUNCTION_PROFILER_HOOK rather than PROFILE_HOOK.
>
> Using PROFILE_HOOK to inject a regular call to _mcount() means that all
> arguments passed in registers in every function will be spilled and reloaded
> because the _mcount call will kill the caller-saved registers.
>
> Using the FUNCTION_PROFILER_HOOK and taking care not to kill the caller-saved
> registers would be less invasive.  The LR argument to _mcount would need to
> be passed in a temporary register, say x9, and _mcount would also need to
> ensure caller-saved registers are saved and restored.
>
> The latter seems to be a better option to me; is there a compelling reason to
> choose PROFILE_HOOK over FUNCTION_PROFILER_HOOK?

(I think you mean FUNCTION_PROFILER rather than FUNCTION_PROFILER_HOOK
in all the above.)

Using either PROFILE_HOOK or FUNCTION_PROFILER results in a call chain
that looks like the following (assuming the C Library is glibc):

 Function -> _mcount -> _mcount_internal.

Where _mcount_internal is the C function that does the real work and
is provided in glibc.  Importantly this means that _mcount_internal
follows the normal ABI - so we have to save the caller saved registers
somewhere.

Using FUNCTION_PROFILER requires us to write assembler which saves and
restores all caller saved registers every time it is called, and
requires (as you say) a special ABI.  This means _mcount ends up being
a piece of assembly that saves all caller-saved registers (i.e.
parameter-passing & temporary registers) and then makes the call to
_mcount_internal before restoring everything on _mcount's return.

Using PROFILE_HOOK will cause the compiler to do all the heavy
lifting, and it will do the minimum required (for example with a
function with one parameter it will only save and restore x0).
_mcount in this case can be a simple function that sets up some
parameters and calls _mcount_internal (or even _mcount could just
alias _mcount_internal).
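
To make the PROFILE_HOOK case concrete, here is a hand-written C
approximation of how the instrumented code behaves: the profiling call is
an ordinary call at function entry, so the compiler applies the normal ABI
around it and preserves only the values that are live.  mcount_model and
its counters are hypothetical stand-ins, not glibc's _mcount;
__builtin_return_address is a GCC builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for _mcount: records how many instrumented
   functions were entered and the last return address seen.  */
unsigned long profile_calls;
void *profile_last_caller;

void
mcount_model (void *caller_pc)
{
  assert (caller_pc != NULL);
  profile_last_caller = caller_pc;
  profile_calls++;
}

/* What an instrumented one-argument function behaves like.  Because
   mcount_model is a normal call, X must survive it; the compiler only
   has to keep X (x0 on AArch64) alive, not every argument register.  */
int
add_one (int x)
{
  mcount_model (__builtin_return_address (0));
  return x + 1;
}
```

With FUNCTION_PROFILER the equivalent of mcount_model would instead be
hand-written assembly that saves and restores every caller-saved register
itself, since the compiler would not know about the call.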

As to which of PROFILE_HOOK or FUNCTION_PROFILER is "the right way"
(TM) - I don't know - the documentation isn't very clear at all.
PROFILE_HOOK was introduced to support profiling for AIX 4.3.
http://gcc.gnu.org/ml/gcc-patches/2000-12/msg00580.html is the initial
patch, with a reworked patch here:
http://gcc.gnu.org/ml/gcc-patches/2001-02/msg00112.html.  The final
commit happened on 2001-02-05.  The patch was introduced because it
was impossible to make FUNCTION_PROFILER work for AIX 4.3 and so a new
hook that worked earlier in the compiler was needed.  There doesn't
seem to have been a discussion about preferring one form over the
other.

In conclusion - I prefer the PROFILE_HOOK method because it makes the
compiler do all the work, and results in less impact on stack usage
and performance.  FUNCTION_PROFILER may impact the code generated by
the compiler less and produce a smaller overall image - but I'm not
sure that's more beneficial.

Thanks,

Matt


-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


[linaro/gcc-4_8-branch] Backports from trunk and merge from gcc-4_8-branch.

2013-05-14 Thread Matthew Gretton-Dann

All,

I have just backported the following revisions from trunk to 
linaro/gcc-4_8-branch: 197838, 198191, 198490-198496, 198575-198575, and 198677.


I have also merged the gcc-4_8-branch into linaro/gcc-4_8-branch up to 
revision 198615.


Thanks,

Matt

--
Matthew Gretton-Dann
Toolchain Working Group, Linaro


[linaro/gcc-4_8-branch] Selective backports from trunk

2013-05-02 Thread Matthew Gretton-Dann

All,

I have just backported the following revisions from trunk to the 
linaro/gcc-4_8-branch:


196795-196797, 196957, 197489-197491, 197513, 197517-197523, 197526-197528, 
197530, 197642, 197770, 197807, 197921,  197925, 197965, 198004, 
198019-198020, 198029-198030, 198090, 198136-198137, 198142, 198176, 198298, 
198302-198306, 198316, 198394, 198396-198400, 198402-198404, 198406, 198412, 
198424-198425, 198443


Thanks,

Matt

--
Matthew Gretton-Dann
Toolchain Working Group, Linaro


Re: [PATCH, ARM, iWMMXT] PR target/54338 - Include IWMMXT_GR_REGS in ALL_REGS

2013-04-30 Thread Matthew Gretton-Dann

Hi,

On 08/04/13 06:28, Xinyu Qi wrote:

At 2013-04-02 17:50:03,"Ramana Radhakrishnan"  wrote:

On 04/02/13 10:40, Xinyu Qi wrote:

Hi,
According to Vladimir Makarov's analysis, the root cause of PR target/54338 
is that ALL_REGS doesn't contain IWMMXT_GR_REGS in REG_CLASS_CONTENTS.
It seems there is no reason to exclude the IWMMXT_GR_REGS from ALL_REGS as 
IWMMXT_GR_REGS are the real registers.
This patch simply makes ALL_REGS include IWMMXT_GR_REGS to fix this PR.
Since the test case gcc.target/arm/mmx-2.c fails for the same reason 
and passes with this fix, no extra test case needs to be added.
Passes the arm.exp tests. Patch attached.
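
The invariant behind the fix can be sketched with plain bitmasks.  This is
an illustration only: the masks describe a tiny hypothetical register
file, not the real REG_CLASS_CONTENTS initializers in arm.h, and
reg_class_subset_p_model is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, tiny register file: one bit per hard register.  */
#define GENERAL_REGS_MASK    UINT32_C (0x000f)
#define IWMMXT_REGS_MASK     UINT32_C (0x00f0)
#define IWMMXT_GR_REGS_MASK  UINT32_C (0x0f00)

/* ALL_REGS as the union of every class, including IWMMXT_GR_REGS.  */
#define ALL_REGS_MASK \
  (GENERAL_REGS_MASK | IWMMXT_REGS_MASK | IWMMXT_GR_REGS_MASK)

/* Return nonzero when every register in SUB is also in SUPER.  */
int
reg_class_subset_p_model (uint32_t sub, uint32_t super)
{
  return (sub & ~super) == 0;
}

/* The property the patch restores: every real register class is a
   subset of ALL_REGS.  */
void
check_all_regs (void)
{
  assert (reg_class_subset_p_model (IWMMXT_GR_REGS_MASK, ALL_REGS_MASK));
}
```

Leaving IWMMXT_GR_REGS_MASK out of ALL_REGS_MASK makes the subset test
fail, which mirrors the root cause described for the PR.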


Testing just with arm.exp is not enough.

Ok if no regressions running the entire regression testsuite for C and
C++ for arm*-*-*eabi with an iwmmxt configuration.


Hi Ramana,

   I ran the full DejaGnu testsuite with -march=iwmmxt2 specified throughout 
for this patch.
   No regressions, and a lot of new passes were found.
   Please help to commit it.

ChangeLog

2013-04-02  Xinyu Qi  

PR target/54338
* config/arm/arm.h (REG_CLASS_CONTENTS): Include IWMMXT_GR_REGS in 
ALL_REGS.


It looks to me as if this should also be applied to the 4.8 branch - Xinyu 
do you agree?


If so is the backport OK for 4.8?

Thanks,

Matt


--
Matthew Gretton-Dann
Toolchain Working Group, Linaro


[linaro/gcc-4_8-branch] Merge from upstream gcc-4_8-branch and backports from trunk

2013-04-08 Thread Matthew Gretton-Dann

Hi,

I have just merged upstream gcc-4_8-branch into linaro/gcc-4_8-branch, up to 
r197294.  (The merge is r197598.)


I have also backported the following trunk revisions into the 
linaro/gcc-4_8-branch: 196856, 196858, 196876, 197046, 197051, 197052, 
197153, 197207, 197341, 197342, and 197346. (Backports are revisions 
197599:197609).


Thanks,

Matt

--
Matthew Gretton-Dann
Toolchain Working Group, Linaro


Re: [PATCH][ARM] use vsel instruction for floating point conditional moves in ARMv8

2013-04-03 Thread Matthew Gretton-Dann
Would it be possible for this patch and the others Kyrylo has recently done 
for the new ARMv8 AArch32 instructions to be backported to 4.8?


In particular I'm referring to:

http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00994.html (trunk r197052)
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00874.html (trunk r197051)
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00873.html (trunk r197046)
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00652.html (trunk r197040 and 
197041)


Thanks,

Matt


On 25/03/13 15:21, Kyrylo Tkachov wrote:

-Original Message-
From: Ramana Radhakrishnan
Sent: 18 February 2013 11:51
To: Kyrylo Tkachov
Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
Subject: Re: [PATCH][ARM] use vsel instruction for floating point
conditional moves in ARMv8

On 01/30/13 09:24, Kyrylo Tkachov wrote:

Hi all,
This patch uses the new ARMv8 AArch32 vsel instruction to implement
conditional moves of floating point numbers.
For example, an instruction of the form:
vsel.f32  s0, s1, s2
means
s0 := cond ? s1 : s2

This can be useful, among other places, in Thumb2 because it doesn't require
an enclosing IT block.
A small catch: The condition code used in vsel can only be one of {GE, GT,
EQ, VS}. If we want to use their negations {LT, LE, NE, VC} we just flip the
source operands.
A new predicate is introduced that checks that the comparison yields an ARM
condition code in the set {GE, GT, EQ, VS, LT, LE, NE, VC}.
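
The operand flip described above can be sketched in C.  This is an
illustration of the idea only, not the pattern from arm.md; the type and
function names are hypothetical and operands are plain register numbers.

```c
#include <assert.h>

/* ARM condition codes relevant to the description above.  */
enum arm_cond { COND_EQ, COND_NE, COND_GE, COND_LT,
                COND_GT, COND_LE, COND_VS, COND_VC };

/* A vsel to emit: DEST := CC ? IF_TRUE : IF_FALSE (register numbers).  */
struct vsel_insn
{
  enum arm_cond cc;
  int if_true;
  int if_false;
};

/* Hypothetical helper: map any of the eight conditions onto the four
   that vsel encodes, swapping the source operands for a negation.  */
struct vsel_insn
canonicalize_vsel (enum arm_cond cc, int if_true, int if_false)
{
  struct vsel_insn insn = { cc, if_true, if_false };

  assert (cc <= COND_VC);
  switch (cc)
    {
    case COND_NE: insn.cc = COND_EQ; break;
    case COND_LT: insn.cc = COND_GE; break;
    case COND_LE: insn.cc = COND_GT; break;
    case COND_VC: insn.cc = COND_VS; break;
    default:
      /* EQ, GE, GT and VS are encodable as they are.  */
      return insn;
    }

  /* cc ? a : b is the same as !cc ? b : a.  */
  insn.if_true = if_false;
  insn.if_false = if_true;
  return insn;
}
```

For example, a request for LT with sources 1 and 2 becomes GE with
sources 2 and 1, while GT is passed through unchanged.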

New compilation tests are added. They pass on a model and no new regressions
on arm-none-eabi with qemu.




Ok for trunk?


Ok for stage1 4.9.


Hi Ramana,

Thanks for the review.
Re-tested on arm-none-eabi against current trunk and applied as r197052.



Ramana


Thanks,
Kyrill




Thanks,
Kyrill

gcc/ChangeLog

2013-01-30  Kyrylo Tkachov  

* config/arm/arm.md (f_sels, f_seld): New types.
(*cmov): New pattern.
* config/arm/predicates.md (arm_vsel_comparison_operator): New
predicate.


gcc/testsuite/ChangeLog

2013-01-30  Kyrylo Tkachov  

* gcc.target/arm/vseleqdf.c: New test.
* gcc.target/arm/vseleqsf.c: Likewise.
* gcc.target/arm/vselgedf.c: Likewise.
* gcc.target/arm/vselgesf.c: Likewise.
* gcc.target/arm/vselgtdf.c: Likewise.
* gcc.target/arm/vselgtsf.c: Likewise.
* gcc.target/arm/vselledf.c: Likewise.
* gcc.target/arm/vsellesf.c: Likewise.
* gcc.target/arm/vselltdf.c: Likewise.
* gcc.target/arm/vselltsf.c: Likewise.
* gcc.target/arm/vselnedf.c: Likewise.
* gcc.target/arm/vselnesf.c: Likewise.
* gcc.target/arm/vselvcdf.c: Likewise.
* gcc.target/arm/vselvcsf.c: Likewise.
* gcc.target/arm/vselvsdf.c: Likewise.
* gcc.target/arm/vselvssf.c: Likewise.









--
Matthew Gretton-Dann
Toolchain Working Group, Linaro


Re: [PATCH] Fix -Wformat-security warning in arm.c

2013-04-03 Thread Matthew Gretton-Dann

Is it okay for this patch to be backported to the 4.8 branch?

Thanks,

Matt

On 25/03/13 18:34, Roland McGrath wrote:

This fixes a gratuitous warning.


Thanks,
Roland


gcc/
2013-03-25  Roland McGrath  

* config/arm/arm.c (arm_print_operand: case 'w'): Use fputs rather
than fprintf with a non-constant, non-format string.

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17997,7 +17997,7 @@ arm_print_operand (FILE *stream, rtx x, int code)
  "wC12",  "wC13",  "wC14",  "wC15"
};

- fprintf (stream, wc_reg_names [INTVAL (x)]);
+ fputs (wc_reg_names [INTVAL (x)], stream);
    }
return;
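
For readers unfamiliar with the warning, a small stand-alone sketch of the
same change.  The name table is hypothetical (the second entry contains a
'%' on purpose) and print_reg_name is not a GCC function.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical name table; the second entry contains a '%' on purpose.  */
static const char *const reg_names[] = { "wC12", "100%s" };

/* Write the name verbatim.  fputs never interprets its argument as a
   format string, so a '%' in the data is harmless and
   -Wformat-security has nothing to warn about.  */
int
print_reg_name (FILE *stream, unsigned int n)
{
  assert (n < sizeof reg_names / sizeof reg_names[0]);
  return fputs (reg_names[n], stream);
}
```

Passing the same string as the format argument of fprintf would make the
'%s' in the second entry read a nonexistent argument, which is the class
of bug the warning exists to catch.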




--
Matthew Gretton-Dann
Toolchain Working Group, Linaro


linaro/gcc-4_8-branch created and documented

2013-02-19 Thread Matthew Gretton-Dann

All,

I have just created a distribution branch: 'linaro/gcc-4_8-branch'.  I have 
committed the attached patch to the wwwdocs CVS repository to document this 
branch (and future Linaro branches).


The branch will track the equivalent FSF release branch (once created) and 
also accept backports of patches accepted for trunk which are of interest to 
the ARM and AArch64 backends.  Anyone from Linaro with write access to the 
GCC Subversion repository can commit and approve patches for this branch.


The branch has been created early as there are some Stage 1 pending patches 
which I want to backport early to this branch.


Thanks,

Matt

--
Matthew Gretton-Dann
Toolchain Working Group, Linaro
? htdocs/.svn.html.swp
? htdocs/svn.html~
Index: htdocs/svn.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/svn.html,v
retrieving revision 1.179
diff -u -p -r1.179 svn.html
--- htdocs/svn.html 12 Feb 2013 15:53:34 -  1.179
+++ htdocs/svn.html 19 Feb 2013 21:20:11 -
@@ -621,6 +621,11 @@ be prefixed with the initials of the dis
   The branch is maintained by Michael Meissner,
   mailto:meiss...@linux.vnet.ibm.com";>meiss...@linux.vnet.ibm.com.
 
+  linaro/gcc-x_y-branch
+  Linaro compilers based on GCC x.y releases.  These branches
+  only accept backports of patches which have been accepted to trunk.  This
+  family of branches is maintained by personnel from Linaro.
+
   redhat/gcc-3_2-branch
   Red Hat GNU/Linux compilers based on GCC 3.2.x.
 


[RFA] Fix DEBUG_RELOAD support

2013-02-02 Thread Matthew Gretton-Dann
Hi,

Whilst debugging a reload issue I tried enabling DEBUG_RELOAD, only to
find that this caused GCC to fail to build.  I think this failure was
introduced during the change to vec being a C++ type, as DEBUG_RELOAD
is normally forced off.

The attached patch fixes the build issue.  Tested by building a cross
arm-none-linux-gnueabi compiler with DEBUG_RELOAD forced on.

OK for trunk?

gcc/ChangeLog:

2013-02-02  Matthew Gretton-Dann  

* gcc/reload.c (subst_reloads): Fix DEBUG_RELOAD build issue.


--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
diff --git a/gcc/reload.c b/gcc/reload.c
index 889a6cc..2546c1b 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -6313,14 +6313,14 @@ subst_reloads (rtx insn)
  for (check_regno = 0; check_regno < max_regno; check_regno++)
{
 #define CHECK_MODF(ARRAY)  \
- gcc_assert (!reg_equivs[check_regno].ARRAY\
+ gcc_assert (!(*reg_equivs)[check_regno].ARRAY \
  || !loc_mentioned_in_p (r->where, \
-                                     reg_equivs[check_regno).ARRAY)]
+                                     (*reg_equivs)[check_regno].ARRAY))
 
- CHECK_MODF (equiv_constant);
- CHECK_MODF (equiv_memory_loc);
- CHECK_MODF (equiv_address);
- CHECK_MODF (equiv_mem);
+ CHECK_MODF (constant);
+ CHECK_MODF (memory_loc);
+ CHECK_MODF (address);
+ CHECK_MODF (mem);
 #undef CHECK_MODF
}
 #endif /* DEBUG_RELOAD */


[RFA/ARM/4.7] Fix PR54974: Thumb literal pools don't handle PC rounding

2013-01-04 Thread Matthew Gretton-Dann

On 29/11/12 14:42, Matthew Gretton-Dann wrote:

On 24 November 2012 00:27, Ramana Radhakrishnan
 wrote:

On Wed, Nov 21, 2012 at 7:59 PM, Matthew Gretton-Dann
 wrote:

[snip]

The fix is to decrease the pool_range of all insns by 2 when generating
Thumb code.  There is no need to change neg_pool_range values as rounding
down here will reduce the distance of the literal pool.


A comment about this fact around thumb2_pool_range would be appropriate.


[snip]


Tested arm-none-linux-gnueabi cross, and with the testcase attached to the
PR.  No added testcase in the patch as this code is sensitive to other code
generation and so it is not easy to generate a testcase which will reliably
test this condition.

OK for trunk, 4.7, and 4.6?



Ok for trunk today - please wait a few days before backporting into
4.6 and 4.7 to see what the fallout is like.  Watch out for any
fallout with the auto-testers.


No fallout has been seen with the auto-testers.


The attached is what was actually committed as revision 193930
(original patch + requested comment).


The attached patch is a backport of the trunk patch to 4.7.

Cross tested arm-none-linux-gnueabi with QEMU

OK for 4.7?

Thanks,

Matt

2013-01-04  Matthew Gretton-Dann  

Backport from mainline.
2012-11-29  Matthew Gretton-Dann  

PR target/54974
* config/arm/arm.md (thumb2_pool_range, pool_range): Add
comment on Thumb pool ranges.
(thumb1_extendhisi2): Reduce Thumb pool range.
(arm_movdi): Likewise.
(thumb1_movdi_insn): Likewise.
(thumb1_movsi_insn): Likewise.
(pic_load_addr_unified): Likewise.
(pic_load_addr_32bit): Likewise.
(pic_load_addr_thumb1): Likewise.
(thumb1_movhf): Likewise.
(arm_movsf_soft_insn): Likewise.
(thumb1_movsf_soft_insn): Likewise.
(movdf_soft_insn): Likewise.
(thumb1_movdf_soft_insn): Likewise.
* config/arm/neon.md (*neon_mov): Likewise.
(*neon_mov): Likewise.
* config/arm/thumb2.md: (*thumb2_movsi_insn): Likewise.
(*thumb2_movhi_insn): Likewise.
(*thumb2_extendqisi_v6): Likewise.
(*thumb2_zero_extendqisi_v6): Likewise.
(*thumb2_zero_extendqisi2_v6): Likewise.
* config/arm/vfp.md: (*thumb2_movsi_vfp): Likewise.
(*movdi_vfp): Likewise.
(*movdi_vfp_cortexa8): Likewise.
(*thumb2_movsf_vfp): Likewise.
(*thumb2_movdf_vfp): Likewise.

--
Matthew Gretton-Dann
Toolchain Working Group, Linaro
Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md   (revision 194852)
+++ gcc/config/arm/arm.md   (working copy)
@@ -256,6 +256,9 @@ (define_attr "insn_enabled" "no,yes"
 ; POOL_RANGE is how far away from a constant pool entry that this insn
 ; can be placed.  If the distance is zero, then this insn will never
 ; reference the pool.
+; Note that for Thumb constant pools the PC value is rounded down to the
+; nearest multiple of four.  Therefore, THUMB2_POOL_RANGE (and POOL_RANGE for
+; Thumb insns) should be set to <max_range> - 2.
 ; NEG_POOL_RANGE is nonzero for insns that can reference a constant pool entry
 ; before its address.  It is set to <max_range> - (8 + <data_size>).
 (define_attr "arm_pool_range" "" (const_int 0))
@@ -4833,7 +4836,7 @@ (define_insn "thumb1_extendhisi2"
(const_int 2) (const_int 4))
  (const_int 4)])
(set_attr "type" "alu_shift,load_byte")
-   (set_attr "pool_range" "*,1020")]
+   (set_attr "pool_range" "*,1018")]
 )
 
 ;; This pattern will only be used when ldsh is not available
@@ -5239,7 +5242,7 @@ (define_insn "*arm_movdi"
(set_attr "type" "*,*,*,load2,store2")
(set_attr "arm_pool_range" "*,*,*,1020,*")
(set_attr "arm_neg_pool_range" "*,*,*,1004,*")
-   (set_attr "thumb2_pool_range" "*,*,*,4096,*")
+   (set_attr "thumb2_pool_range" "*,*,*,4094,*")
(set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
 )
 
@@ -5379,7 +5382,7 @@ (define_insn "*thumb1_movdi_insn"
   [(set_attr "length" "4,4,6,2,2,6,4,4")
(set_attr "type" "*,*,*,load2,store2,load2,store2,*")
(set_attr "insn" "*,mov,*,*,*,*,*,mov")
-   (set_attr "pool_range" "*,*,*,*,*,1020,*,*")]
+   (set_attr "pool_range" "*,*,*,*,*,1018,*,*")]
 )
 
 (define_expand "movsi"
@@ -5539,7 +5542,7 @@ (define_insn "*thumb1_movsi_insn"
mov\\t%0, %1"
   [(set_attr "length" "2,2,4,4,2,2,2,2,2")
(set_attr "type" "*,*,*,*,load1,store1,load1,store1,*")
-   (set_attr "pool_range" "*,*,*,*,*,*,1020,

Re: [RFA/4.7/ARM] Backport arm-*-linux-gnueabihf triplet support to 4.7

2012-12-21 Thread Matthew Gretton-Dann
On 17 December 2012 14:28, Richard Earnshaw  wrote:
> On 21/11/12 11:48, Matthew Gretton-Dann wrote:
>>
>> On 21 November 2012 00:05, Matthias Klose  wrote:
>>>
>>>
>>> looks fine, except one missing chunk from my original patch. maybe left
>>> out
>>> intentionally.
>>>
>>>Matthias
>>>
>>> Index: b/src/gcc/config.gcc
>>> ===
>>> --- a/src/gcc/config.gcc
>>> +++ b/src/gcc/config.gcc
>>> @@ -934,7 +934,7 @@
>>>  tm_file="dbxelf.h elfos.h arm/unknown-elf.h arm/elf.h
>>> arm/linux-gas.h
>>> arm/uclinux-elf.h glibc-stdint.h"
>>>  tmake_file="arm/t-arm arm/t-arm-elf"
>>>  case ${target} in
>>> -   arm*-*-uclinux*eabi)
>>> +   arm*-*-uclinux*eabi*)
>>>  tm_file="$tm_file arm/bpabi.h arm/uclinux-eabi.h"
>>>  tmake_file="$tmake_file arm/t-bpabi"
>>>  # The BPABI long long divmod functions return a 128-bit
>>> value in
>>
>>
>> This change isn't in your commit to trunk of 2012-10-15 which is what
>> I backported.  This is because Richard Earnshaw effectively made this
>> change when he removed FPA support (SVN rev 188510).
>>
>> I'm happy to do a patch that makes this change - but I think it should
>> be a separate patch to this backport one.
>>
>
> I would have thought this ought to be done for consistency.

Committed above as obvious (after discussions off-list with Richard
Earnshaw).  Attached patch shows what was actually committed.
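
The effect of the extra trailing '*' can be checked with POSIX fnmatch,
which is close to how a shell case pattern treats a target triplet.  This
is a sketch: target_matches is a hypothetical helper, and the triplets in
the note below are made up for illustration.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Return nonzero when TARGET matches the glob PATTERN.  With no flags,
   '*' in fnmatch matches any run of characters.  */
int
target_matches (const char *pattern, const char *target)
{
  assert (pattern != NULL && target != NULL);
  return fnmatch (pattern, target, 0) == 0;
}
```

Under these assumptions "arm*-*-uclinux*eabi" matches
arm-unknown-uclinux-gnueabi but not arm-unknown-uclinux-gnueabihf, whereas
"arm*-*-uclinux*eabi*" matches both.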

Thanks,

Matt


--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 194655)
+++ gcc/config.gcc  (working copy)
@@ -882,7 +882,7 @@
tm_file="dbxelf.h elfos.h arm/unknown-elf.h arm/elf.h arm/linux-gas.h 
arm/uclinux-elf.h glibc-stdint.h"
tmake_file="arm/t-arm arm/t-arm-elf"
case ${target} in
-   arm*-*-uclinux*eabi)
+   arm*-*-uclinux*eabi*)
tm_file="$tm_file arm/bpabi.h arm/uclinux-eabi.h"
tmake_file="$tmake_file arm/t-bpabi"
        # The BPABI long long divmod functions return a 128-bit value in
Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 194655)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2012-12-21  Matthew Gretton-Dann  
+
+   * config.gcc: Match arm*-*-uclinux*eabi* for EABI uCLinux.
+
 2012-12-18  Matthew Gretton-Dann  
 
Backport from mainline


Re: [RFA/4.7/ARM] Backport arm-*-linux-gnueabihf triplet support to 4.7

2012-12-07 Thread Matthew Gretton-Dann
PING^2

On 29 November 2012 14:45, Matthew Gretton-Dann
 wrote:
> PING
>
> On 20 November 2012 20:34, Matthew Gretton-Dann
>  wrote:
>> All,
>>
>> This patch backports Matthias Klose's arm*-*-linux-gnueabihf triplet
>> support patch of 2012-10-15 to 4.7.
>>
>> The backport was not clean as 4.8 has obsoleted various arm*-*-*
>> triplets which are valid in 4.7.
>>
>> I have tested this cross with arm-none-linux-gnueabihf and
>> arm-none-linux-gnueabi.
>>
>> One question I do have having done this work - is there a canonical way to
>> test for the arm*-*-linux-gnueabi triplet (or variants)?  Various configure
>> and testsuite files test for this, but there doesn't seem to be a consistent
>> method.
>>
>> OK for 4.7?
>>
>> Thanks,
>>
>> Matt
>>
>> 2012-11-08  Matthew Gretton-Dann  
>>
>> Backport from mainline
>> 2012-10-15  Matthias Klose  
>>
>>     * config.gcc: Match arm*-*-linux-* for ARM Linux/GNU.
>> * doc/install.texi: Use arm-*-*linux-* instead of 
>> arm-*-*linux-gnueabi.
>>
>> gcc/ada/ChangeLog:
>> 2012-11-08  Matthew Gretton-Dann  
>>
>>     Backport from mainline.
>> 2012-10-15  Matthias Klose  
>>
>> * gcc-interface/Makefile.in: Match arm*-*-linux-*eabi* for
>> ARM Linux/GNU.
>>
>> gcc/testsuite/ChangeLog:
>> 2012-11-08  Matthew Gretton-Dann  
>>
>> Backport from mainline
>> 2012-10-15  Matthias Klose  
>>
>> * lib/target-supports.exp (check_profiling_available): Match
>> arm*-*-linux-* for ARM Linux/GNU.
>> * gfortran.dg/enum_10.f90: Likewise.
>> * gfortran.dg/enum_9.f90: Likewise.
>> * gcc.target/arm/synchronize.c: Likewise.
>> * g++.old-deja/g++.jason/enum6.C: Likewise.
>> * g++.old-deja/g++.law/enum9.C: Likewise.
>> * g++.old-deja/g++.other/enum4.C: Likewise.
>>
>> libgcc/ChangeLog:
>> 2012-11-08  Matthew Gretton-Dann
>>
>> Backport from mainline.
>> 2012-10-15  Matthias Klose  
>>
>> * config.host: Match arm*-*-linux-* for ARM Linux/GNU.
>>
>> libjava/ChangeLog:
>> 2012-11-08  Matthew Gretton-Dann  
>>
>> Backport from mainline.
>> 2012-10-15  Matthias Klose  
>>
>> * configure.ac: Match arm*-*-linux-* for ARM Linux/GNU.
>> * configure: Regenerate.
>>
>> libstdc++-v3/ChangeLog:
>> 2012-11-08  Matthew Gretton-Dann  
>>
>> Backport from mainline
>> 2012-10-15  Matthias Klose  
>>
>> * configure.host: Match arm*-*-linux-* for ARM Linux/GNU.
>> * testsuite/20_util/make_signed/requirements/typedefs-2.cc: Likewise.
>> * testsuite/20_util/make_unsigned/requirements/typedefs-2.cc: 
>> Likewise.

--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: [RFA/4.7/ARM] Backport arm-*-linux-gnueabihf triplet support to 4.7

2012-11-29 Thread Matthew Gretton-Dann
PING

On 20 November 2012 20:34, Matthew Gretton-Dann
 wrote:
> All,
>
> This patch backports Matthias Klose's arm*-*-linux-gnueabihf triplet
> support patch of 2012-10-15 to 4.7.
>
> The backport was not clean as 4.8 has obsoleted various arm*-*-*
> triplets which are valid in 4.7.
>
> I have tested this cross with arm-none-linux-gnueabihf and
> arm-none-linux-gnueabi.
>
> One question I do have having done this work - is there a canonical way to
> test for the arm*-*-linux-gnueabi triplet (or variants)?  Various configure
> and testsuite files test for this, but there doesn't seem to be a consistent
> method.
>
> OK for 4.7?
>
> Thanks,
>
> Matt
>
> 2012-11-08  Matthew Gretton-Dann  
>
> Backport from mainline
> 2012-10-15  Matthias Klose  
>
> * config.gcc: Match arm*-*-linux-* for ARM Linux/GNU.
> * doc/install.texi: Use arm-*-*linux-* instead of 
> arm-*-*linux-gnueabi.
>
> gcc/ada/ChangeLog:
> 2012-11-08  Matthew Gretton-Dann  
>
> Backport from mainline.
> 2012-10-15  Matthias Klose  
>
> * gcc-interface/Makefile.in: Match arm*-*-linux-*eabi* for
> ARM Linux/GNU.
>
> gcc/testsuite/ChangeLog:
> 2012-11-08  Matthew Gretton-Dann  
>
> Backport from mainline
> 2012-10-15  Matthias Klose  
>
> * lib/target-supports.exp (check_profiling_available): Match
> arm*-*-linux-* for ARM Linux/GNU.
> * gfortran.dg/enum_10.f90: Likewise.
> * gfortran.dg/enum_9.f90: Likewise.
> * gcc.target/arm/synchronize.c: Likewise.
> * g++.old-deja/g++.jason/enum6.C: Likewise.
> * g++.old-deja/g++.law/enum9.C: Likewise.
> * g++.old-deja/g++.other/enum4.C: Likewise.
>
> libgcc/ChangeLog:
> 2012-11-08  Matthew Gretton-Dann  
>     Backport from mainline.
> 2012-10-15  Matthias Klose  
>
> * config.host: Match arm*-*-linux-* for ARM Linux/GNU.
>
> libjava/ChangeLog:
> 2012-11-08  Matthew Gretton-Dann  
>
> Backport from mainline.
> 2012-10-15  Matthias Klose  
>
> * configure.ac: Match arm*-*-linux-* for ARM Linux/GNU.
> * configure: Regenerate.
>
> libstdc++-v3/ChangeLog:
> 2012-11-08  Matthew Gretton-Dann  
>
> Backport from mainline
> 2012-10-15  Matthias Klose  
>
> * configure.host: Match arm*-*-linux-* for ARM Linux/GNU.
> * testsuite/20_util/make_signed/requirements/typedefs-2.cc: Likewise.
> * testsuite/20_util/make_unsigned/requirements/typedefs-2.cc: 
> Likewise.
>
> --
> Matthew Gretton-Dann
> Linaro Toolchain Working Group
> matthew.gretton-d...@linaro.org



--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: [RFA/ARM] Fix PR54974: Thumb literal pools don't handle PC rounding

2012-11-29 Thread Matthew Gretton-Dann
On 24 November 2012 00:27, Ramana Radhakrishnan
 wrote:
> On Wed, Nov 21, 2012 at 7:59 PM, Matthew Gretton-Dann
>  wrote:
[snip]
>> The fix is to decrease the pool_range of all insns by 2 when generating
>> Thumb code.  There is no need to change neg_pool_range values as rounding
>> down here will reduce the distance of the literal pool.
>
> A comment about this fact around thumb2_pool_range would be appropriate.
>
[snip]
>>
>> Tested arm-none-linux-gnueabi cross, and with the testcase attached to the
>> PR.  No added testcase in the patch as this code is sensitive to other code
>> generation and so it is not easy to generate a testcase which will reliably
>> test this condition.
>>
>> OK for trunk, 4.7, and 4.6?
>
>
> Ok for trunk today - please wait a few days before backporting into
> 4.6 and 4.7 to see what the fallout is like.  Watch out for any
> fallout with the auto-testers.

The attached is what was actually committed as revision 193930
(original patch + requested comment).

Thanks,

Matt

--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 193929)
+++ gcc/ChangeLog   (revision 193930)
@@ -1,3 +1,33 @@
+2012-11-29  Matthew Gretton-Dann  
+
+   PR target/54974
+   * config/arm/arm.md (thumb2_pool_range, pool_range): Add comment on
+   Thumb pool ranges.
+   (thumb1_extendhisi2): Reduce Thumb pool range.
+   (arm_movdi): Likewise.
+   (thumb1_movdi_insn): Likewise.
+   (thumb1_movsi_insn): Likewise.
+   (pic_load_addr_unified): Likewise.
+   (pic_load_addr_32bit): Likewise.
+   (pic_load_addr_thumb1): Likewise.
+   (thumb1_movhf): Likewise.
+   (arm_movsf_soft_insn): Likewise.
+   (thumb1_movsf_soft_insn): Likewise.
+   (movdf_soft_insn): Likewise.
+   (thumb1_movdf_soft_insn): Likewise.
+   * config/arm/neon.md (*neon_mov): Likewise.
+   (*neon_mov): Likewise.
+   * config/arm/thumb2.md: (*thumb2_movsi_insn): Likewise.
+   (*thumb2_movhi_insn): Likewise.
+   (*thumb2_extendqisi_v6): Likewise.
+   (*thumb2_zero_extendqisi_v6): Likewise.
+   (*thumb2_zero_extendqisi2_v6): Likewise.
+   * config/arm/vfp.md: (*thumb2_movsi_vfp): Likewise.
+   (*movdi_vfp): Likewise.
+   (*movdi_vfp_cortexa8): Likewise.
+   (*thumb2_movsf_vfp): Likewise.
+   (*thumb2_movdf_vfp): Likewise.
+
 2012-11-29  Kai Tietz  
 
PR target/55171
Index: gcc/config/arm/thumb2.md
===
--- gcc/config/arm/thumb2.md(revision 193929)
+++ gcc/config/arm/thumb2.md(revision 193930)
@@ -182,7 +182,7 @@
str%?\\t%1, %0"
   [(set_attr "type" "*,*,*,*,load1,load1,store1,store1")
(set_attr "predicable" "yes")
-   (set_attr "pool_range" "*,*,*,*,1020,4096,*,*")
+   (set_attr "pool_range" "*,*,*,*,1018,4094,*,*")
(set_attr "neg_pool_range" "*,*,*,*,0,0,*,*")]
 )
 
@@ -217,7 +217,7 @@
ldr%(h%)\\t%0, %1\\t%@ movhi"
   [(set_attr "type" "*,*,store1,load1")
(set_attr "predicable" "yes")
-   (set_attr "pool_range" "*,*,*,4096")
+   (set_attr "pool_range" "*,*,*,4094")
(set_attr "neg_pool_range" "*,*,*,250")]
 )
 
@@ -570,7 +570,7 @@
ldr%(sb%)\\t%0, %1"
   [(set_attr "type" "alu_shift,load_byte")
(set_attr "predicable" "yes")
-   (set_attr "pool_range" "*,4096")
+   (set_attr "pool_range" "*,4094")
(set_attr "neg_pool_range" "*,250")]
 )
 
@@ -583,7 +583,7 @@
ldr%(h%)\\t%0, %1"
   [(set_attr "type" "alu_shift,load_byte")
(set_attr "predicable" "yes")
-   (set_attr "pool_range" "*,4096")
+   (set_attr "pool_range" "*,4094")
(set_attr "neg_pool_range" "*,250")]
 )
 
@@ -596,7 +596,7 @@
ldr%(b%)\\t%0, %1\\t%@ zero_extendqisi2"
   [(set_attr "type" "alu_shift,load_byte")
(set_attr "predicable" "yes")
-   (set_attr "pool_range" "*,4096")
+   (set_attr "pool_range" "*,4094")
(set_attr "neg_pool_range" "*,250")]
 )
 
Index: gcc/config/arm/vfp.md
===
--- gcc/config/arm/vfp.md   (revision 193929)
+++ gcc/config/arm/vfp.md   (revision 193930)
@@ -123,7 +123,7 @@
(set_attr "type" 
"*,*,*,*,load1,load1,store1,store1,r_2_f,f_2_r,fcpys,f_loads,f_stores")
(set_at

[RFA/ARM] Fix PR54974: Thumb literal pools don't handle PC rounding

2012-11-21 Thread Matthew Gretton-Dann
All,

The attached patch fixes PR54974.

In Thumb when calculating the PC value for a literal load the value used is 
the current PC rounded down to the nearest multiple of 4.  The ARM backend 
currently does not take this into account when calculating literal pool 
placement.

The fix is to decrease the pool_range of all insns by 2 when generating 
Thumb code.  There is no need to change neg_pool_range values as rounding 
down here will reduce the distance of the literal pool.
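
The arithmetic behind the "- 2" can be worked through in C.  This is a
sketch under stated assumptions: Thumb reads PC as the instruction address
plus 4, the literal load uses that value rounded down to a multiple of 4
as its base, and the range is measured from the unrounded PC.  The
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Thumb reads PC as the instruction address plus 4.  */
uint32_t
thumb_pc (uint32_t insn_addr)
{
  return insn_addr + 4u;
}

/* Base used by a PC-relative literal load: PC rounded down to 4.  */
uint32_t
literal_base (uint32_t insn_addr)
{
  return thumb_pc (insn_addr) & ~UINT32_C (3);
}

/* Furthest forward distance, measured from the unrounded PC, that a
   load with maximum encodable offset MAX_OFFSET can reach.  */
uint32_t
forward_reach (uint32_t insn_addr, uint32_t max_offset)
{
  assert (max_offset >= 2);
  return literal_base (insn_addr) + max_offset - thumb_pc (insn_addr);
}
```

For an instruction at 0x1000 the reach with a 1020-byte offset field is
1020, but at 0x1002 it is only 1018.  Since the exact address is not known
when pools are placed, the smaller figure has to be assumed, hence 1020
becoming 1018 and 4096 becoming 4094.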

The patch attached to the PR is not sufficient as we don't precisely know 
the PC when calculating literal pool ranges and so have to be conservative.

Whilst going through all the code I found the following, possibly related, 
issues that I would like some input from the ARM maintainers on (although 
they have not been touched in this patch):

1) Some Thumb-2 patterns (like thumb2_movhi_insn) have a neg_pool_range of 
250 for ldrh, where my reading of the ARMARM says the range is [-4095, 4095] 
for Thumb-2 (with appropriate rounding).  What is the reason for GCC's 
severe pessimism here?

2) thumb1_zero_extendqisi2 (and other insns) give a Thumb-1 narrow ldrb a 
pool_range of 32.  Surely the pool_range should be 0 (or *) as Thumb-1 
doesn't have a ldrb where the base-register can be PC?

Tested arm-none-linux-gnueabi cross, and with the testcase attached to the 
PR.  No added testcase in the patch as this code is sensitive to other code 
generation and so it is not easy to generate a testcase which will reliably 
test this condition.

OK for trunk, 4.7, and 4.6?

Thanks,

Matt

gcc/ChangeLog:

2012-11-21  Matthew Gretton-Dann  

PR target/54974
* config/arm/arm.md (thumb1_extendhisi2): Reduce Thumb
pool_range.
(arm_movdi): Likewise.
(thumb1_movdi_insn): Likewise.
(thumb1_movsi_insn): Likewise.
(pic_load_addr_unified): Likewise.
(pic_load_addr_32bit): Likewise.
(pic_load_addr_thumb1): Likewise.
(thumb1_movhf): Likewise.
(arm_movsf_soft_insn): Likewise.
(thumb1_movsf_soft_insn): Likewise.
(movdf_soft_insn): Likewise.
(thumb1_movdf_soft_insn): Likewise.
* config/arm/neon.md (*neon_mov): Likewise.
(*neon_mov): Likewise.
* config/arm/thumb2.md (*thumb2_movsi_insn): Likewise.
(*thumb2_movhi_insn): Likewise.
(*thumb2_extendqisi_v6): Likewise.
(*thumb2_zero_extendqisi_v6): Likewise.
(*thumb2_zero_extendqisi2_v6): Likewise.
* config/arm/vfp.md (*thumb2_movsi_vfp): Likewise.
(*movdi_vfp): Likewise.
(*movdi_vfp_cortexa8): Likewise.
(*thumb2_movsf_vfp): Likewise.
(*thumb2_movdf_vfp): Likewise.

--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7e92b69..b3822d9 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4952,7 +4952,7 @@ (define_insn "thumb1_extendhisi2"
(const_int 2) (const_int 4))
  (const_int 4)])
(set_attr "type" "alu_shift,load_byte")
-   (set_attr "pool_range" "*,1020")]
+   (set_attr "pool_range" "*,1018")]
 )
 
 ;; This pattern will only be used when ldsh is not available
@@ -5359,7 +5359,7 @@ (define_insn "*arm_movdi"
(set_attr "type" "*,*,*,load2,store2")
(set_attr "arm_pool_range" "*,*,*,1020,*")
(set_attr "arm_neg_pool_range" "*,*,*,1004,*")
-   (set_attr "thumb2_pool_range" "*,*,*,4096,*")
+   (set_attr "thumb2_pool_range" "*,*,*,4094,*")
(set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
 )
 
@@ -5498,7 +5498,7 @@ (define_insn "*thumb1_movdi_insn"
   [(set_attr "length" "4,4,6,2,2,6,4,4")
(set_attr "type" "*,*,*,load2,store2,load2,store2,*")
(set_attr "insn" "*,mov,*,*,*,*,*,mov")
-   (set_attr "pool_range" "*,*,*,*,*,1020,*,*")]
+   (set_attr "pool_range" "*,*,*,*,*,1018,*,*")]
 )
 
 (define_expand "movsi"
@@ -5668,7 +5668,7 @@ (define_insn "*thumb1_movsi_insn"
mov\\t%0, %1"
   [(set_attr "length" "2,2,4,4,2,2,2,2,2")
(set_attr "type" "*,*,*,*,load1,store1,load1,store1,*")
-   (set_attr "pool_range" "*,*,*,*,*,*,1020,*,*")
+   (set_attr "pool_range" "*,*,*,*,*,*,1018,*,*")
(set_attr "conds" "set,clob,*,*,nocond,nocond,nocond,nocond,nocond")])
 
 (define_split 
@@ -5776,7 +5776,7 @@ (define_insn_and_split "pic_load_addr_unified"
 (match_dup 2)] UNSPEC_PIC_BASE))]
  "operands[3] = TARGET_THUMB ? G

Re: [RFA/4.7/ARM] Backport arm-*-linux-gnueabihf triplet support to 4.7

2012-11-21 Thread Matthew Gretton-Dann
On 21 November 2012 00:05, Matthias Klose  wrote:
> Am 20.11.2012 21:34, schrieb Matthew Gretton-Dann:
>> All,
>>
>> This patch backports Matthias Klose's arm*-*-linux-gnueabihf triplet
>> support patch of 2012-10-15 to 4.7.
>>
>> The backport was not clean as 4.8 has obsoleted various arm*-*-*
>> triplets which are valid in 4.7.
>>
>> I have tested this cross with arm-none-linux-gnueabihf and
>> arm-none-linux-gnueabi.
>>
>> One question I do have having done this work - is there a canonical way to
>> test for the arm*-*-linux-gnueabi triplet (or variants)?  Various configure
>> and testsuite files test for this, but there doesn't seem to be a consistent
>> method.
>>
>> OK for 4.7?
>
> looks fine, except one missing chunk from my original patch. maybe left out
> intentionally.
>
>   Matthias
>
> Index: b/src/gcc/config.gcc
> ===
> --- a/src/gcc/config.gcc
> +++ b/src/gcc/config.gcc
> @@ -934,7 +934,7 @@
> tm_file="dbxelf.h elfos.h arm/unknown-elf.h arm/elf.h arm/linux-gas.h
> arm/uclinux-elf.h glibc-stdint.h"
> tmake_file="arm/t-arm arm/t-arm-elf"
> case ${target} in
> -   arm*-*-uclinux*eabi)
> +   arm*-*-uclinux*eabi*)
> tm_file="$tm_file arm/bpabi.h arm/uclinux-eabi.h"
> tmake_file="$tmake_file arm/t-bpabi"
> # The BPABI long long divmod functions return a 128-bit value in

This change isn't in your commit to trunk of 2012-10-15 which is what
I backported.  This is because Richard Earnshaw effectively made this
change when he removed FPA support (SVN rev 188510).

I'm happy to do a patch that makes this change - but I think it should
be a separate patch to this backport one.

Thanks,

Matt

--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


[RFA/4.7/ARM] Backport arm-*-linux-gnueabihf triplet support to 4.7

2012-11-20 Thread Matthew Gretton-Dann
All,

This patch backports Matthias Klose's arm*-*-linux-gnueabihf triplet
support patch of 2012-10-15 to 4.7.

The backport was not clean as 4.8 has obsoleted various arm*-*-*
triplets which are valid in 4.7.

I have tested this cross with arm-none-linux-gnueabihf and
arm-none-linux-gnueabi.

One question I do have having done this work - is there a canonical way to 
test for the arm*-*-linux-gnueabi triplet (or variants)?  Various configure 
and testsuite files test for this, but there doesn't seem to be a consistent 
method.
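
To make the difference between the patterns concrete, here is a small sketch using POSIX fnmatch as a stand-in.  This is an assumption on my part: config.gcc uses shell case patterns and the testsuite uses DejaGnu target selectors, whose wildcard handling is similar to, but not the same code as, fnmatch.  The helper name triplet_matches is hypothetical.

```c
#include <fnmatch.h>

/* Return 1 if TRIPLET matches the glob PATTERN, 0 otherwise.
   fnmatch itself returns 0 on a match.  */
int
triplet_matches (const char *pattern, const char *triplet)
{
  return fnmatch (pattern, triplet, 0) == 0;
}
```

Under this model arm*-*-linux*eabi matches arm-none-linux-gnueabi but not arm-none-linux-gnueabihf, whereas arm*-*-linux*eabi* and arm*-*-linux-* match both.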

OK for 4.7?

Thanks,

Matt

2012-11-08  Matthew Gretton-Dann  

Backport from mainline
2012-10-15  Matthias Klose  

* config.gcc: Match arm*-*-linux-* for ARM Linux/GNU.
* doc/install.texi: Use arm-*-*linux-* instead of arm-*-*linux-gnueabi.

gcc/ada/ChangeLog:
2012-11-08  Matthew Gretton-Dann  

Backport from mainline.
2012-10-15  Matthias Klose  

* gcc-interface/Makefile.in: Match arm*-*-linux-*eabi* for
ARM Linux/GNU.

gcc/testsuite/ChangeLog:
2012-11-08  Matthew Gretton-Dann  

Backport from mainline
2012-10-15  Matthias Klose  

* lib/target-supports.exp (check_profiling_available): Match
arm*-*-linux-* for ARM Linux/GNU.
* gfortran.dg/enum_10.f90: Likewise.
* gfortran.dg/enum_9.f90: Likewise.
* gcc.target/arm/synchronize.c: Likewise.
* g++.old-deja/g++.jason/enum6.C: Likewise.
* g++.old-deja/g++.law/enum9.C: Likewise.
* g++.old-deja/g++.other/enum4.C: Likewise.

libgcc/ChangeLog:
2012-11-08  Matthew Gretton-Dann  

* config.host: Match arm*-*-linux-* for ARM Linux/GNU.

libjava/ChangeLog:
2012-11-08  Matthew Gretton-Dann  

Backport from mainline.
2012-10-15  Matthias Klose  

* configure.ac: Match arm*-*-linux-* for ARM Linux/GNU.
* configure: Regenerate.

libstdc++-v3/ChangeLog:
2012-11-08  Matthew Gretton-Dann  

Backport from mainline
2012-10-15  Matthias Klose  

* configure.host: Match arm*-*-linux-* for ARM Linux/GNU.
* testsuite/20_util/make_signed/requirements/typedefs-2.cc: Likewise.
* testsuite/20_util/make_unsigned/requirements/typedefs-2.cc: Likewise.

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
diff --git a/gcc/ada/gcc-interface/Makefile.in b/gcc/ada/gcc-interface/Makefile.in
index 9b5135e..9f20f07 100644
--- a/gcc/ada/gcc-interface/Makefile.in
+++ b/gcc/ada/gcc-interface/Makefile.in
@@ -1866,7 +1866,7 @@ ifeq ($(strip $(filter-out powerpc% linux%,$(arch) $(osys))),)
   LIBRARY_VERSION := $(LIB_VERSION)
 endif
 
-ifeq ($(strip $(filter-out arm% linux-gnueabi,$(arch) $(osys)-$(word 4,$(targ)))),)
+ifeq ($(strip $(filter-out arm%-linux,$(arch)-$(osys)) $(if $(findstring eabi,$(word 4,$(targ))),,$(word 4,$(targ)))),)
   LIBGNAT_TARGET_PAIRS = \
   a-intnam.ads
 
diff --git a/gcc/testsuite/g++.old-deja/g++.law/enum9.C b/gcc/testsuite/g++.old-deja/g++.law/enum9.C
index 5a74b2f..0ecb87d 100644
--- a/gcc/testsuite/g++.old-deja/g++.law/enum9.C
+++ b/gcc/testsuite/g++.old-deja/g++.law/enum9.C
@@ -7,10 +7,10 @@
 // enum-size attributes should only be emitted if there are values of
 // enum type that can escape the compilation unit, gcc cannot currently
 // detect this; if this facility is added then this linker option should
-// not be needed.  arm-*-linux*eabi should be a good approximation to
+// not be needed.  arm-*-linux*eabi* should be a good approximation to
 // those platforms where the EABI supplement defines enum values to be
 // 32 bits wide.
-// { dg-options "-fshort-enums -Wl,--no-enum-size-warning" { target arm*-*-linux*eabi } }
+// { dg-options "-fshort-enums -Wl,--no-enum-size-warning" { target arm*-*-linux*eabi* } }
 
 // GROUPS passed enums
   extern "C" int printf (const char *, ...);
diff --git a/gcc/testsuite/g++.old-deja/g++.other/enum4.C b/gcc/testsuite/g++.old-deja/g++.other/enum4.C
index 429e812..509da6d 100644
--- a/gcc/testsuite/g++.old-deja/g++.other/enum4.C
+++ b/gcc/testsuite/g++.old-deja/g++.other/enum4.C
@@ -9,10 +9,10 @@
 // enum-size attributes should only be emitted if there are values of
 // enum type that can escape the compilation unit, gcc cannot currently
 // detect this; if this facility is added then this linker option should
-// not be needed.  arm-*-linux*eabi should be a good approximation to
+// not be needed.  arm-*-linux*eabi* should be a good approximation to
 // those platforms where the EABI supplement defines enum values to be
 // 32 bits wide.
-// { dg-options "-fshort-enums -Wl,--no-enum-size-warning" { target arm*-*-linux*eabi } }
+// { dg-options "-fshort-enums -Wl,--no-enum-size-warning" { target arm*-*-linux*eabi* } }
 
 enum E { 
   a = -312
diff --git a/gcc/testsuite/gcc.target/arm/synchronize.c b/gcc/testsuite/gcc.target/arm/synchronize.c
index 8626d8e..cf5dcdf 100644
--- a/gc

Re: [PATCH ARM]Define LOGICAL_OP_NON_SHORT_CIRCUIT for ARM target

2012-11-19 Thread Matthew Gretton-Dann
On 16 November 2012 12:22, Bin Cheng  wrote:
>
>
>> -----Original Message-----
>> From: Matthew Gretton-Dann [mailto:matthew.gretton-d...@linaro.org]
>> Sent: Friday, November 16, 2012 6:30 PM
>> To: Bin Cheng
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH ARM]Define LOGICAL_OP_NON_SHORT_CIRCUIT for ARM target
>>
>> On 16 November 2012 05:37, Bin Cheng  wrote:
>> > Hi,
>> > This patch defines LOGICAL_OP_NON_SHORT_CIRCUIT for ARM target and
>> > prefers short circuit for armv6-m and Thumb2+Os.
>> >
>>
>> > ===
>> > --- gcc/config/arm/arm.h (revision 193494)
>> > +++ gcc/config/arm/arm.h (working copy)
>> > @@ -2012,10 +2012,16 @@ enum arm_auto_incmodes
>> >   || (X) == arg_pointer_rtx)
>> >
>> > /* Try to generate sequences that don't involve branches, we can then use
>> > -   conditional instructions */
>> > +   conditional instructions.  */
>> > #define BRANCH_COST(speed_p, predictable_p) \
>> >   (current_tune->branch_cost (speed_p, predictable_p))
>> >
>> > +/* False if short circuit operation is preferred.  */
>> > +#define LOGICAL_OP_NON_SHORT_CIRCUIT \
>> > +  ((optimize_size) \
>> > +   ? (TARGET_THUMB ? false : true) \
>> > +   : (current_tune->logical_op_non_short_circuit[TARGET_ARM]))
>> > +
>>
>> This changes the definition of LOGICAL_OP_NON_SHORT_CIRCUIT for all cores
>> supported by the ARM backend.
>>
>> In gcc/fold-const.c LOGICAL_OP_NON_SHORT_CIRCUIT is defined as follows:
>>
>> #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
>> #define LOGICAL_OP_NON_SHORT_CIRCUIT \
>>   (BRANCH_COST (optimize_function_for_speed_p (cfun), \
>> false) >= 2)
>> #endif
>>
>> Now whilst this is probably wrong for most ARM cores, can you please keep it
>> as the default for cores which you haven't benchmarked the change on?  The
>> optimise-for-code-size changes are probably OK on all cores without further
>> testing.
>>
>
> Thanks for your comments,
> I am not sure what's the meaning of "probably wrong for most ARM cores",

I meant that the default definition is not the best definition for ARM cores.

> I deduced the value of field "logical_op_non_short_circuit" from the
> previously default macro and the BRANCH_COST for all arm tune_params, so
> this patch should not change the behavior on ARM cores other than v6m. Or
> did I miss something?

My issue was that you had changed all the ARM backend to not have the
'default' behaviour (in the sense that if the default was changed in
fold-const.c then the ARM backend would no longer pick this up).
However, this is of course not possible to achieve as once you've
defined the hook you have to use it for all cores in the ARM backend.
Sorry.

Thanks,

Matt

--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: [PATCH ARM]Define LOGICAL_OP_NON_SHORT_CIRCUIT for ARM target

2012-11-16 Thread Matthew Gretton-Dann
On 16 November 2012 05:37, Bin Cheng  wrote:
> Hi,
> This patch defines LOGICAL_OP_NON_SHORT_CIRCUIT for ARM target and prefers
> short circuit for armv6-m and Thumb2+Os.
>

> ===
> --- gcc/config/arm/arm.h (revision 193494)
> +++ gcc/config/arm/arm.h (working copy)
> @@ -2012,10 +2012,16 @@ enum arm_auto_incmodes
>   || (X) == arg_pointer_rtx)
>
> /* Try to generate sequences that don't involve branches, we can then use
> -   conditional instructions */
> +   conditional instructions.  */
> #define BRANCH_COST(speed_p, predictable_p) \
>   (current_tune->branch_cost (speed_p, predictable_p))
>
> +/* False if short circuit operation is preferred.  */
> +#define LOGICAL_OP_NON_SHORT_CIRCUIT \
> +  ((optimize_size) \
> +   ? (TARGET_THUMB ? false : true) \
> +   : (current_tune->logical_op_non_short_circuit[TARGET_ARM]))
> +

This changes the definition of LOGICAL_OP_NON_SHORT_CIRCUIT for all
cores supported by the ARM backend.

In gcc/fold-const.c LOGICAL_OP_NON_SHORT_CIRCUIT is defined as follows:

#ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
#define LOGICAL_OP_NON_SHORT_CIRCUIT \
  (BRANCH_COST (optimize_function_for_speed_p (cfun), \
false) >= 2)
#endif

Now whilst this is probably wrong for most ARM cores, can you please
keep it as the default for cores which you haven't benchmarked the
change on?  The optimise-for-code-size changes are probably OK on all
cores without further testing.
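
To spell out the difference between the two definitions, here is a minimal model in plain C.  It is a sketch under assumptions: the function names are hypothetical, and tune_value merely stands in for current_tune->logical_op_non_short_circuit[TARGET_ARM] from the proposed patch.

```c
#include <stdbool.h>

/* Model of the fold-const.c default: prefer non-short-circuit
   evaluation when a branch costs at least 2.  */
bool
default_non_short_circuit (int branch_cost)
{
  return branch_cost >= 2;
}

/* Model of the proposed ARM definition.  When optimizing for size the
   answer depends only on Thumb versus ARM state; otherwise it comes
   from the per-core tuning value.  */
bool
proposed_non_short_circuit (bool optimize_size, bool target_thumb,
                            bool tune_value)
{
  if (optimize_size)
    return !target_thumb;
  return tune_value;
}
```

In this model the branch cost no longer feeds into the result directly, so every core's behaviour depends on what its tuning entry says.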

Thanks,

Matt

--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: [PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures

2012-11-01 Thread Matthew Gretton-Dann
On 31 October 2012 20:06, Teresa Johnson  wrote:
> On Wed, Oct 31, 2012 at 12:58 PM, Christophe Lyon
>  wrote:
>> On 30.10.2012 17:59, Teresa Johnson wrote:
>>>
>>> On Tue, Oct 30, 2012 at 9:26 AM, Steven Bosscher 
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Hot/cold partitioning is apparently a hot topic all of a sudden, which
>>>> is a good thing of course, because it's in need of some TLC.
>>>>
>>>> The attached patch adds another check the RTL cfg checking
>>>> (verify_flow_info) for the partitioning: A hot block can never be
>>>> dominated by a cold block (because the dominated block must also be
>>>> cold). This trips in PR55121.
>>>>
>>>> I haven't tested this with any profiling tests, but it's bound to
>>>> break things. From my POV, whatever gets broken by this patch was
>>>> already broken to begin with :-)   If you're in CC, it's because I
>>>> hope you can help test this patch.
>>>
>>> I will try testing your patch on top of mine with our fdo benchmarks.
>>> For the others on the cc list, you may need to include my patch as
>>> well for testing. Without it, -freorder-blocks-and-partition was DOA
>>> for me. For my patch, see
>>> http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02692.html
>>>
>>> Teresa
>>>
>> I have tried Steven's patch and indeed it reported numerous errors while
>> compiling spec2k.
>>
>> I tried Teresa's patch too, but it changed nothing in my tests. The patches
>> already posted by Matt are still necessary and Teresa's patch does not
>> improve my tests.
>
> With checking enabled I am seeing additional failures that my fixes
> are not addressing. Looking into those now.
> Can someone point me to Matt's patches? Is that this one:
> http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00274.html
> or are there others?

That is one of two.  The other one is:
http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00275.html

I'd be careful with the second one - the original patch posted (in the
link) fixes the issue I was seeing, but Steven Bosscher suggested I
make some changes, and I reposted a patch with some incorrect logic
(which unfortunately also fixes the issue but more by luck than
judgement).  I haven't had the time to fully work through updating and
testing a reworked patch yet.

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Matthew Gretton-Dann
On 30 October 2012 05:20, Teresa Johnson  wrote:
> Index: cfgrtl.c
> ===
> --- cfgrtl.c   (revision 192692)
> +++ cfgrtl.c   (working copy)
> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
>   partition boundaries).  See  the comments at the top of
>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (BB_PARTITION (a) != BB_PARTITION (b))
> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
> +  || BB_PARTITION (a) != BB_PARTITION (b))
>  return false;
>
>/* Protect the loop latches.  */
> @@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi
>   partition boundaries).  See  the comments at the top of
>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (BB_PARTITION (a) != BB_PARTITION (b))
> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
> +  || BB_PARTITION (a) != BB_PARTITION (b))
>  return false;
>
>/* Protect the loop latches.  */

As this if() condition seems to be the canonical way to detect being
in a different partition, should it be moved out into a query function,
and all of cfgrtl.c updated to use it?
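
Something along these lines is what I have in mind.  This is only a self-contained model: struct block, its fields and blocks_cross_partition_p are hypothetical stand-ins for basic_block, BB_PARTITION and the REG_CROSSING_JUMP note, not real GCC interfaces.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's partition markers.  */
enum partition { PARTITION_HOT, PARTITION_COLD };

struct block
{
  enum partition partition;    /* Models BB_PARTITION (bb).  */
  bool ends_in_crossing_jump;  /* Models a REG_CROSSING_JUMP note
                                  on BB_END (bb).  */
};

/* Return true if control flow from A to B crosses a partition
   boundary, using the same two tests as the quoted condition.  */
bool
blocks_cross_partition_p (const struct block *a, const struct block *b)
{
  return a->ends_in_crossing_jump || a->partition != b->partition;
}
```

Callers such as rtl_can_merge_blocks could then test a single predicate rather than repeating both conditions.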

[Note I am not a maintainer and so can't approve/reject your patch].

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


[RFA ARM/4.7] Backport Split all insns before pool placement

2012-10-17 Thread Matthew Gretton-Dann
All,

Ulrich posted the following patch in July:
   http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01123.html

Richard E requested that it be left in testing on trunk for a couple of days 
before being backported to 4.7.  Three months seems to satisfy the 'couple of 
days' requirement :-).

Is this OK to be backported to 4.7?  Cross tested arm-none-linux-gnueabi.

Thanks,

Matt

2012-10-15  Matthew Gretton-Dann  

   Backported from mainline
   2012-07-23  Ulrich Weigand  

   * config/arm/arm.c (arm_reorg): Ensure all insns are split.

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 35b73c5..3796a80 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13337,6 +13337,13 @@ arm_reorg (void)
   if (TARGET_THUMB2)
 thumb2_reorg ();
 
+  /* Ensure all insns that must be split have been split at this point.
+ Otherwise, the pool placement code below may compute incorrect
+ insn lengths.  Note that when optimizing, all insns have already
+ been split at this point.  */
+  if (!optimize)
+split_all_insns_noflow ();
+
   minipool_fix_head = minipool_fix_tail = NULL;
 
   /* The first insn must always be a note, or the code below won't


[PING] Re: [RFA 1/n] Fix if conversion interactions with block partitioning

2012-10-09 Thread Matthew Gretton-Dann
PING.

On 24 September 2012 11:34, Matthew Gretton-Dann
 wrote:
> On Wednesday 05 September 2012 13:47:19 Steven Bosscher wrote:
>> On Wed, Sep 5, 2012 at 1:25 PM, Matthew Gretton-Dann wrote:
>> > +  /* If the two blocks are in different partitions we do not want to mark
>> > + this as a fallthru edge.  */
>> > +  if (BB_PARTITION (b) != BB_PARTITION (c))
>> > +return;
>> > +
>>
>> I think you should look for a REG_CROSSING_JUMP note on BB_END instead
>> of comparing BB_PARTITION.
>
> Sorry for the delay in getting back to this.
>
> Anyway, I had a look at how other parts of cfgrtl.c handled this and it seems
> as if they do both your suggestion and the check against different partitions.
>
> So this is what I have implemented in the attached patch.
>
> Cross tested arm-none-linux-gnueabi with QEmu.
>
> OK for trunk?
>
> Thanks,
>
> Matt
>
> gcc/ChangeLog:
>
> 2012-09-24  Matthew Gretton-Dann  
>
> * cfgrtl.c (rtl_tidy_fallthru_edge): Don't tidy edges which
>   cross partitions.
>
> --
> Matthew Gretton-Dann
> Linaro Toolchain Working Group
> matthew.gretton-d...@linaro.org



-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index c62b5bc..8fcf7e4 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -1572,6 +1572,12 @@ rtl_tidy_fallthru_edge (edge e)
 if (INSN_P (q))
   return;
 
+  /* If the two blocks are in different partitions we do not want to mark
+ this as a fallthru edge.  */
+  if (find_reg_note (BB_END (b), REG_CROSSING_JUMP, NULL_RTX)
+  || BB_PARTITION (b) != BB_PARTITION (c))
+return;
+
   /* Remove what will soon cease being the jump insn from the source block.
  If block B consisted only of this single jump, turn it into a deleted
  note.  */


Re: [RFA 2/n] Don't lift loads above register using jumps in postreload-gcse.c

2012-09-25 Thread Matthew Gretton-Dann
On Wednesday 05 September 2012 17:40:23 Steven Bosscher wrote:
> On Wed, Sep 5, 2012 at 3:18 PM, Matthew Gretton-Dann
> 
>  wrote:
> > On 5 September 2012 13:45, Richard Earnshaw  wrote:
> >> On 05/09/12 13:02, Steven Bosscher wrote:
> >>> On Wed, Sep 5, 2012 at 1:42 PM, Matthew Gretton-Dann wrote:
> >>>> Whilst this fix works for this particular case I am not sure it is the
> >>>> best fix for the general issue, and so if others have a better idea how
> >>>> to fix this I would be very happy.
> >>> 
> >>> postreload-gcse.c is broken in "interesting" ways. Look at this gem for
> >>> example:
> >>> 
> >>> static bool
> >>> reg_changed_after_insn_p (rtx x, int cuid)
> >>> {
> >>>   unsigned int regno, end_regno;
> >>>
> >>>   regno = REGNO (x);
> >>>   end_regno = END_HARD_REGNO (x);
> >>>   do
> >>>     if (reg_avail_info[regno] > cuid)
> >>>       return true;
> >>>   while (++regno < end_regno);
> >>>   return false;
> >>> }
> >>> 
> >>> So the more conservative the fix, the better :-)
> > 
> > I suppose removing the pass is too conservative :-)
> > 
> >>> The patch looks correct to me. But perhaps the pass should just punt
> >>> on blocks not ending in a simple jump in
> >>> bb_has_well_behaved_predecessors?
> > 
> > By 'simple jump' you mean any block with at most only EDGE_FALLTHRU on the
> > edge?
> No, I mean using the onlyjump_p predicate.

Again sorry for the delay.  Attached is an updated patch using the onlyjump_p 
predicate as suggested by Steven.

Tested cross arm-none-linux-gnueabi with QEmu.

OK for trunk?

Thanks,

Matt

gcc/ChangeLog:

2012-09-25  Matthew Gretton-Dann  

* postreload-gcse.c (bb_has_well_behaved_predecessors): Don't handle
blocks that end in a non-simple jump.

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
diff --git a/gcc/postreload-gcse.c b/gcc/postreload-gcse.c
index b9e9f25..412c8fc 100644
--- a/gcc/postreload-gcse.c
+++ b/gcc/postreload-gcse.c
@@ -925,6 +925,9 @@ bb_has_well_behaved_predecessors (basic_block bb)
 
   if (JUMP_TABLE_DATA_P (BB_END (pred->src)))
return false;
+
+  if (onlyjump_p (BB_END (pred->src)))
+   return false;
 }
   return true;
 }


Re: [RFA 1/n] Fix if conversion interactions with block partitioning

2012-09-24 Thread Matthew Gretton-Dann
On Wednesday 05 September 2012 13:47:19 Steven Bosscher wrote:
> On Wed, Sep 5, 2012 at 1:25 PM, Matthew Gretton-Dann wrote:
> > +  /* If the two blocks are in different partitions we do not want to mark
> > + this as a fallthru edge.  */
> > +  if (BB_PARTITION (b) != BB_PARTITION (c))
> > +return;
> > +
> 
> I think you should look for a REG_CROSSING_JUMP note on BB_END instead
> of comparing BB_PARTITION.

Sorry for the delay in getting back to this.

Anyway, I had a look at how other parts of cfgrtl.c handled this and it seems 
as if they do both your suggestion and the check against different partitions.

So this is what I have implemented in the attached patch.

Cross tested arm-none-linux-gnueabi with QEmu.

OK for trunk?

Thanks,

Matt

gcc/ChangeLog:

2012-09-24  Matthew Gretton-Dann  

* cfgrtl.c (rtl_tidy_fallthru_edge): Don't tidy edges which
  cross partitions.

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index c62b5bc..8fcf7e4 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -1572,6 +1572,12 @@ rtl_tidy_fallthru_edge (edge e)
 if (INSN_P (q))
   return;
 
+  /* If the two blocks are in different partitions we do not want to mark
+ this as a fallthru edge.  */
+  if (find_reg_note (BB_END (b), REG_CROSSING_JUMP, NULL_RTX)
+  || BB_PARTITION (b) != BB_PARTITION (c))
+return;
+
   /* Remove what will soon cease being the jump insn from the source block.
  If block B consisted only of this single jump, turn it into a deleted
  note.  */


Re: [Patch, ARM, testsuite]

2012-09-21 Thread Matthew Gretton-Dann
On 20 September 2012 23:06, Christophe Lyon  wrote:
> Hi,
>
> GCC for ARM does not support compiling in Thumb1 mode  and
> float-abi=hard.  But  it does not fail unless the program being
> compiled actually contains a function with parameters and/or a return
> value.
>
> This is a (minor) problem in the testsuite in some configurations.
>
> For instance, if I run the testsuite forcing -mthumb (via site.exp)
> for a GCC configured for float-abi=hard, and a test uses
> /* { dg-require-effective-target arm_arch_v6_ok } */
> /* { dg-add-options arm_arch_v6 } */
>
> it won't be unresolved since effective-target arm_arch_v6_ok is successful.
>
> The attached patch adds a dummy function body in the test such that it fails.
>
> Another way of achieving the same result is by making sure that the
> relevant tests use
> arm_arch_v6_multilib
> instead of
> arm_arch_v6_ok
>
> even if the test is not intended to be executed.
>
> OK?

[I'm not a maintainer]

You could argue that as the test is checking for just ARMv6, but then
uses ARMv6+VFPv2 features - and so it going wrong is to be expected
:-).

So other approaches could be to see what adding

/* { dg-require-effective-target arm_vfp_ok } */

after dg-add-options achieves.

If that doesn't work I would suggest you add an arm_arch_v6_vfp_v2_ok
set of effective-targets instead of restricting the current
arm_arch_v6 effective target.

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: [RFA 2/n] Don't lift loads above register using jumps in postreload-gcse.c

2012-09-05 Thread Matthew Gretton-Dann
On 5 September 2012 13:45, Richard Earnshaw  wrote:
> On 05/09/12 13:02, Steven Bosscher wrote:
>> On Wed, Sep 5, 2012 at 1:42 PM, Matthew Gretton-Dann wrote:
>>> Whilst this fix works for this particular case I am not sure it is the
>>> best fix for the general issue, and so if others have a better idea how
>>> to fix this I would be very happy.
>>
>> postreload-gcse.c is broken in "interesting" ways. Look at this gem for 
>> example:
>>
>> static bool
>> reg_changed_after_insn_p (rtx x, int cuid)
>> {
>>   unsigned int regno, end_regno;
>>
>>   regno = REGNO (x);
>>   end_regno = END_HARD_REGNO (x);
>>   do
>>     if (reg_avail_info[regno] > cuid)
>>       return true;
>>   while (++regno < end_regno);
>>   return false;
>> }
>>
>> So the more conservative the fix, the better :-)

I suppose removing the pass is too conservative :-)

>> The patch looks correct to me. But perhaps the pass should just punt
>> on blocks not ending in a simple jump in
>> bb_has_well_behaved_predecessors?

By 'simple jump' you mean any block with at most only EDGE_FALLTHRU on the edge?

> That sort of makes sense.  Why would we ever want to hoist an insn out
> of a cold block into a hot one?  I could see it making sense to do the
> reverse on occasion, but clearly care is needed here.

So whilst testing -freorder-blocks-and-partition has caused this
behaviour to be exhibited, I believe there is nothing stopping this
happening with any indirect jump - not just crossing jumps.

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


[RFA 2/n] Don't lift loads above register using jumps in postreload-gcse.c

2012-09-05 Thread Matthew Gretton-Dann
All,

When implementing ARM/Thumb support for -freorder-blocks-and-partition I
encountered the following silent code generation fault.

Given the following CFG:

      |               |
      93              97
      |               |
  (FALLTHRU)      (CROSSING)
      \               /
       \--\     /----/
           \   /
            94
            |

Basic Block 94 has the following insn in it which we want to lift into
blocks 93 and 97:

(insn/v 62 767 63 94 (set (reg:SI 3 r3 [orig:1468 ivtmp.85 ] [1468])
(mem/c:SI (plus:SI (reg/f:SI 13 sp)
(const_int 20 [0x14])) [7 %sfp+-52 S4 A32]))

For block 93 this becomes a move from r0 to r3 - and everything is OK.

For block 97 there is no appropriate move so the compiler tries
to copy the load, and insert it on the edge from 97 to 94.  This edge is
a crossing edge, and so is implemented by an indirect jump:

(insn 2795 2590 3940 97 (set (reg:SI 3 r3 [2464])
(mem/u/c:SI (symbol_ref/u:SI ("*.LC19") [flags 0x2]) [2 S4
A32])) 634 {*arm_movsi_vfp}
 (insn_list:REG_LABEL_OPERAND 887 (expr_list:REG_EQUIV (label_ref:SI 887)
(nil))))
(jump_insn 2796 3940 2593 97 (set (pc)
(reg:SI 3 r3 [2464])) 264 {*arm_indirect_jump}
 (expr_list:REG_CROSSING_JUMP (nil)
(nil)))

The compiler tries to insert the copy of insn 62 (in this case it
becomes insn 3940) immediately before the jump_insn - which because this
is a crossing edge is implemented as an indirect jump using a register
in the ARM backend:

(insn 2795 2590 3940 97 (set (reg:SI 3 r3 [2464])
(mem/u/c:SI (symbol_ref/u:SI ("*.LC19") [flags 0x2]) [2 S4
A32])) 634 {*arm_movsi_vfp}
 (insn_list:REG_LABEL_OPERAND 887 (expr_list:REG_EQUIV (label_ref:SI 887)
(nil))))
(insn 3940 2795 2796 97 (set (reg:SI 3 r3 [orig:1468 ivtmp.85 ] [1468])
(mem/c:SI (plus:SI (reg/f:SI 13 sp)
(const_int 20 [0x14])) [7 %sfp+-52 S4 A32])) -1
 (nil))
(jump_insn 2796 3940 2593 97 (set (pc)
(reg:SI 3 r3 [2464])) 264 {*arm_indirect_jump}
 (expr_list:REG_CROSSING_JUMP (nil)
(nil)))

However, this is incorrect as insn 3940 sets r3, and the jump_insn
2796 wants to use r3 (as set by 2795).

The patch fixes this by checking that the register set by the load is
not used by the jump before allowing the load to be lifted.
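
In outline the check amounts to the following.  This is a deliberately simplified model, not the patch itself: struct pred_end and can_insert_load_before_jump_p are hypothetical names, and the real code works on RTL via reg_used_between_p rather than on bare register numbers.

```c
#include <stdbool.h>

/* Hypothetical model of the end of a predecessor block: the hard
   register read by its final jump, or -1 if the jump reads none.  */
struct pred_end
{
  int jump_uses_regno;
};

/* Return true if a load into LOAD_DEST_REGNO may be placed just before
   the jump ending the predecessor, i.e. the jump does not read the
   register the load would overwrite.  */
bool
can_insert_load_before_jump_p (int load_dest_regno,
                               const struct pred_end *end)
{
  return end->jump_uses_regno != load_dest_regno;
}
```

For the case above the jump reads r3 and the load writes r3, so the model returns false and the load is left alone.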

Whilst this fix works for this particular case I am not sure it is the
best fix for the general issue, and so if others have a better idea how
to fix this I would be very happy.

In particular I wonder whether we should be defining
TARGET_CANNOT_MODIFY_JUMPS_P for the ARM backend as indirect jumps use
registers in a similar way to the SH backend.  Not that this would have helped 
in this particular instance.

Tested cross arm-none-linux-gnueabi with the in-progress ARM
-freorder-blocks-and-partition enabling patch.

OK?

Thanks,

Matt

gcc/ChangeLog:

2012-09-05  Matthew Gretton-Dann  

* postreload-gcse.c (eliminate_partially_redundant_load): Ensure
that loads are not lifted over branches which use the register
loaded.


diff --git a/gcc/postreload-gcse.c b/gcc/postreload-gcse.c
index b464d1f..85fb9b3 100644
--- a/gcc/postreload-gcse.c
+++ b/gcc/postreload-gcse.c
@@ -1048,6 +1048,13 @@ eliminate_partially_redundant_load (basic_block bb, rtx
  /* Adding a load on a critical edge will cause a split.  */
  if (EDGE_CRITICAL_P (pred))
critical_edge_split = true;
+
+ /* If the destination register is used at the BB end we can not
+insert the load.  */
+ if (reg_used_between_p (dest, PREV_INSN (BB_END (pred_bb)),
+ next_pred_bb_end))
+   goto cleanup;
+
  not_ok_count += pred->count;
  unoccr = (struct unoccr *) obstack_alloc (&unoccr_obstack,
sizeof (struct unoccr));

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


[RFA 1/n] Fix if conversion interactions with block partitioning

2012-09-05 Thread Matthew Gretton-Dann
All,

This is the first patch in a series with the ultimate aim of enabling
-freorder-blocks-and-partition in the ARM backend.

However, whilst working on this I have come across a number of midend issues 
which should be fixed individually.

This patch fixes an ICE during if-conversion.

The problem is that when we encounter a CFG that looks like:

     |             |
     |             |
     |         167 (COLD)
     |          /     \
     |         /       \
     |   168 (COLD)  169 (COLD)
      \        \       /
       \--\     \     /
           \     \   /
            170 (HOT)
                |
                |

The 'ce3' phase merges blocks 167, 168, and 169, and eventually calls
rtl_tidy_fallthru_edge to convert the edge from 167 to 170 into a
fallthru one.

This causes verify_flow_info to fail as you can't have a fallthru edge
between different partitions.

The fix I have implemented is to have rtl_tidy_fallthru_edge not do
anything if the fallthru edge crosses a partition boundary.

OK?

Thanks,

Matt

gcc/ChangeLog:
2012-09-05  Matthew Gretton-Dann  

* cfgrtl.c (rtl_tidy_fallthru_edge): Don't tidy edges which
cross partitions.

diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index c62b5bc..341ea9e 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -1572,6 +1572,11 @@ rtl_tidy_fallthru_edge (edge e)
 if (INSN_P (q))
   return;
 
+  /* If the two blocks are in different partitions we do not want to mark
+ this as a fallthru edge.  */
+  if (BB_PARTITION (b) != BB_PARTITION (c))
+return;
+
   /* Remove what will soon cease being the jump insn from the source block.
  If block B consisted only of this single jump, turn it into a deleted
  note.  */

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: [PATCH][RFC] Add -Og

2012-09-05 Thread Matthew Gretton-Dann
On 5 September 2012 09:55, Steven Bosscher  wrote:
> On Wed, Sep 5, 2012 at 10:46 AM, Matthew Gretton-Dann
>  wrote:
>>> Please, no inlining.  Think of stack back-traces and their use
>>> when debugging.
>>
>> I would argue [without sufficient knowledge of how easy this would
>> actually be to do in a real compiler :-)] that this is a debugger
>> problem and not a compiler issue.
>
> It's also a compiler issue if you take inlining of clones into
> account, or scheduling such that the inlined body is scattered all
> over in the caller's body. The compiler can tell the debugger
> only so much...

But that's not a problem with inlining, that's a problem with allowing
things to happen out of order (for some definition of things and
order) - which in my understanding -Og is going to tie down.

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: [PATCH][RFC] Add -Og

2012-09-05 Thread Matthew Gretton-Dann
On 4 September 2012 21:42, Hans-Peter Nilsson  wrote:
> On Mon, 3 Sep 2012, Richard Guenther wrote:
>> On Fri, Aug 10, 2012 at 1:30 PM, Richard Guenther  wrote:
>> >
>> > This adds a new optimization level, -Og, as previously discussed.
>> > It aims at providing fast compilation, a superior debugging
>> > experience and reasonable runtime performance.  Instead of making
>> > -O1 this optimization level this adds a new -Og.
>> >
>> > It's a first cut, highlighting that our fixed pass pipeline and
>> > simply enabling/disabling individual passes (but not pass copies
>> > for example) doesn't scale to properly differentiate between
>> > -Og and -O[23].  -O1 should get similar treatment, eventually
>> > just building on -Og but not focusing on debugging experience.
>> > That is, I expect that in the end we will at least have two post-IPA
>> > optimization pipelines.  It also means that you cannot enable
>> > PRE or VRP with -Og at the moment because these passes are not
>> > anywhere scheduled (similar to the situation with -O0).
>> >
>> > It has some funny effect on dump-file naming of the pass copies
>> > though, which hints at that the current setup is too static.
>> > For that reason the new queue comes after the old, to not confuse
>> > too many testcases.
>> >
>> > It also does not yet disable any of the early optimizations that
>> > make debugging harder (SRA comes to my mind here, as does
>> > switch-conversion and partial inlining).
>
> Please, no inlining.  Think of stack back-traces and their use
> when debugging.

I would argue [without sufficient knowledge of how easy this would
actually be to do in a real compiler :-)] that this is a debugger
problem and not a compiler issue.

With DWARF as the debug info format it should certainly be possible to
produce a view that looked like:

$ bt
#0b baz (...)
#0a inlined into bar (...)
#0 inlined into foo (...)
#1 do_something
#2 main

This would involve reading the .debug_frame, and then looking up the
inlined subroutines via .debug_info.

I personally would like as much optimisation as possible at -Og that
doesn't break a defined level of debug illusion.  I have seen too many
cases where people debug at -O0/-O1, then build a release at -O2/-O3
and get bitten by undefined behaviour issues.  The more optimised -Og
code is, the less reason there is to release a build at a higher
optimisation level.

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: [PATCH][RFC] Add -Og

2012-09-04 Thread Matthew Gretton-Dann
nt, and that the state of the machine is
such that everything before that sequence point will have been
completed and that nothing after that sequence point will have been
started.

It is probably also possible to argue that there is a case for having
points between sequence points where we say the code would be in a
good state (let's call them observation points).  So for instance we
might want to say that in:

 int x, a, b, c;
 ...
 x = a + b * c;

If we just say we only promise a known state at sequence points then
the compiler is free to use some form of multiply-accumulate
instruction here.  But a user may want to see the multiply followed by
addition split out.  So we could define the observation points to be
on the *, +, and =.

> 4. Generated code should be small and fast, compile-time and memory
> usage should be low.  Unless either of it defeats 1. to 3.
>
> The patch only provides a starting point and from the GIMPLE side
> should be reasonably close to the goals above.
>
> Richard.

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: [RFA/ARM 1/3] Add VFP support for VFMA and friends

2012-07-05 Thread Matthew Gretton-Dann

On 26/06/12 14:44, Richard Earnshaw wrote:

On 25/06/12 15:59, Matthew Gretton-Dann wrote:

All,

This patch adds support to the ARM backend for generating floating-point
fused multiply-accumulate.

OK?

gcc/ChangeLog:

2012-06-25  Matthew Gretton-Dann  

* config/arm/iterators.md (SDF): New mode iterator.
(V_if_elem): Add support for SF and DF modes.
(V_reg): Likewise.
(F_w_constraint): New mode iterator attribute.
(F_r_constraint): Likewise.
(F_fma_type): Likewise.
(F_target): Likewise.
config/arm/vfp.md (fma4): New pattern.
(*fmsub4): Likewise.
(*fmnsub4): Likewise.
(*fmnadd4): Likewise.



F_target as an attribute name doesn't tell me anything useful.  I
suggest F_maybe_not_df.


+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA <F_target>"


This should be written as

"TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA && <F_maybe_not_df>"

Then the attribute should expand

   (define_mode_attr F_maybe_not_df [(SF "1") (DF "TARGET_VFP_DOUBLE")])

As a style nit, I would also suggest using the iterator name when it
appears in the pattern name, even though it is redundant.  This avoids
potential ambiguities when there are multiple iterators operating on
different expansions.  That is, instead of:

  (define_insn "fma<mode>4"

use:

  (define_insn "fma<SDF:mode>4"

OK with those changes.

R.



Now checked in with some changes (see attached patch for what was committed) 
- changes approved off list.


gcc/ChangeLog:
2012-07-05  Matthew Gretton-Dann  

* config/arm/iterators.md (SDF): New mode iterator.
(V_if_elem): Add support for SF and DF modes.
(V_reg): Likewise.
(F_constraint): New mode iterator attribute.
(F_fma_type): Likewise.
config/arm/vfp.md (fma4): New pattern.
(*fmsub4): Likewise.
    (*fmnsub4): Likewise.
(*fmnadd4): Likewise.


gcc/testsuite/ChangeLog:
2012-07-05  Matthew Gretton-Dann  

* gcc.target/arm/fma-sp.c: New testcase.
    * gcc.target/arm/fma.c: Likewise.
* gcc.target/arm/fma.h: Likewise.

Thanks,

Matt



--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 189282)
+++ gcc/ChangeLog   (revision 189284)
@@ -1,3 +1,15 @@
+2012-07-05  Matthew Gretton-Dann  
+
+   * config/arm/iterators.md (SDF): New mode iterator.
+   (V_if_elem): Add support for SF and DF modes.
+   (V_reg): Likewise.
+   (F_constraint): New mode iterator attribute.
+   (F_fma_type): Likewise.
+   config/arm/vfp.md (fma4): New pattern.
+   (*fmsub4): Likewise.
+   (*fmnsub4): Likewise.
+   (*fmnadd4): Likewise.
+
 2012-07-04  Uros Bizjak  
 
* expmed.c (expand_mult): Initialize coeff and is_neg.
Index: gcc/testsuite/gcc.target/arm/fma.c
===
--- gcc/testsuite/gcc.target/arm/fma.c  (revision 0)
+++ gcc/testsuite/gcc.target/arm/fma.c  (revision 189284)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a15 -mfpu=vfpv4" } */
+
+#include "fma.h"
+
+/* { dg-final { scan-assembler-times "vfma\.f64\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vfma\.f32\ts\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vfms\.f64\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vfms\.f32\ts\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vfnma\.f64\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vfnma\.f32\ts\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vfnms\.f64\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vfnms\.f32\ts\[0-9\]" 1 } } */
Index: gcc/testsuite/gcc.target/arm/fma.h
===
--- gcc/testsuite/gcc.target/arm/fma.h  (revision 0)
+++ gcc/testsuite/gcc.target/arm/fma.h  (revision 189284)
@@ -0,0 +1,50 @@
+extern double fma (double, double, double);
+extern float fmaf (float, float, float);
+
+float
+vfma32 (float x, float y, float z)
+{
+  return fmaf (x, y, z);
+}
+
+float
+vfms32 (float x, float y, float z)
+{
+  return fmaf (-x, y, z);
+}
+
+float
+vfnms32 (float x, float y, float z)
+{
+  return fmaf (x, y, -z);
+}
+
+float
+vfnma32 (float x, float y, float z)
+{
+  return fmaf (-x, y, -z);
+}
+
+double
+vfma64 (double x, double y, double z)
+{
+  return fma (x, y, z);
+}
+
+double
+vfms64 (double x, double y, double z)
+{
+  return fma (-x, y, z);
+}
+
+double
+vfnms64 (double x, double y, double z)
+{
+  return fma (x, y, -z);
+}
+
+double
+vfnma64 (double x, double y, double z)
+{
+  return fma (-x, y, -z);
+}
Index: gcc/testsuite/gcc.target/arm/fma-sp.c
==

Re: [RFA] Enable dump-noaddr test to work in out of build tree testing

2012-06-28 Thread Matthew Gretton-Dann

On 28/06/12 14:38, Mike Stump wrote:

On Jun 28, 2012, at 1:28 AM, Matthew Gretton-Dann wrote:

On 27/06/12 21:35, Andrew Pinski wrote:

On Wed, Jun 27, 2012 at 3:33 AM, Matthew Gretton-Dann
 wrote:

All,

This patch enables the dump-noaddr test to work in out-of-build-tree
testing.

[snip]


I created a much simpler patch which I have been meaning to submit.
I attached it for reference.


Thanks,
Andrew Pinski

ChangeLog:
* testsuite/gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Use
an absolute dump base instead of a relative one.

Index: gcc.c-torture/unsorted/dump-noaddr.x
===
--- gcc.c-torture/unsorted/dump-noaddr.x(revision 61452)
+++ gcc.c-torture/unsorted/dump-noaddr.x(revision 61453)
@@ -11,10 +11,10 @@ proc dump_compare { src options } {
  foreach option $option_list {
  file delete -force dump1
  file mkdir dump1
-c-torture-compile $src "$option $options -dumpbase dump1/$dumpbase -DMASK=1 -x 
c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all 
-fdump-noaddr"
+c-torture-compile $src "$option $options -dumpbase [pwd]/dump1/$dumpbase 
-DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all 
-fdump-noaddr"
  file delete -force dump2
  file mkdir dump2
-c-torture-compile $src "$option $options -dumpbase dump2/$dumpbase -DMASK=2 -x 
c -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
+c-torture-compile $src "$option $options -dumpbase [pwd]/dump2/$dumpbase 
-DMASK=2 -x c -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
  foreach dump1 [lsort [glob -nocomplain dump1/*]] {
  regsub dump1/ $dump1 dump2/ dump2
  set dumptail "gcc.c-torture/unsorted/[file tail $dump1]"


What I don't like about this approach is that dump1 and dump2 are
created in the current working directory.


On vxworks as I recall we did a cd to tmpdir, is that generally true?
Also, if one telnets in or sshes into the host under test, the cd is
mandatory... as otherwise one would dump turds (that's a technical term)
in the home directory which would be very uncool.  Maybe a better
approach would be to cd to the right place if all the Canadian setups cd,
as that then unifies them.


With out of build-tree testing this may not (I believe) be the same as
$tmpdir (where temporaries are normally created).  Also the current
directory may already contain directories/files called dump1 or dump2
which will get destroyed by running the testsuite.


The point of the cd was to get to a place where temps can be created
freely...


I've not committed my version yet in case I am missing something in my
reasoning above with regards to the relationship between the current
working directory and $tmpdir.


So the question would be, does his patch work for you?  It was unclear to
me if the answer is no.


Sorry - the patch works for my use case (build==host), but I was concerned 
over the use of [pwd] vs $tmpdir.



Oh, wait, I know what I don't like about Andrew's patch, pwd, is that the
directory on the target, the host or the build machine?  And is that
going to the host machine?  They are not the same.  One needs a directory
on the host machine.


I don't think this applies to my patch though, so are you still okay for my 
version to go in or is there something else I haven't considered?


Thanks,

Matt

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd






Re: [RFA] Enable dump-noaddr test to work in out of build tree testing

2012-06-28 Thread Matthew Gretton-Dann

On 27/06/12 21:35, Andrew Pinski wrote:

On Wed, Jun 27, 2012 at 3:33 AM, Matthew Gretton-Dann
 wrote:

All,

This patch enables the dump-noaddr test to work in out-of-build-tree
testing.

[snip]


I created a much simpler patch which I have been meaning to submit.
I attached it for reference.


Thanks,
Andrew Pinski

ChangeLog:
* testsuite/gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Use
an absolute dump base instead of a relative one.

Index: gcc.c-torture/unsorted/dump-noaddr.x
===
--- gcc.c-torture/unsorted/dump-noaddr.x(revision 61452)
+++ gcc.c-torture/unsorted/dump-noaddr.x(revision 61453)
@@ -11,10 +11,10 @@ proc dump_compare { src options } {
  foreach option $option_list {
file delete -force dump1
file mkdir dump1
-   c-torture-compile $src "$option $options -dumpbase dump1/$dumpbase -DMASK=1 
-x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all 
-fdump-noaddr"
+   c-torture-compile $src "$option $options -dumpbase [pwd]/dump1/$dumpbase 
-DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all 
-fdump-noaddr"
file delete -force dump2
file mkdir dump2
-   c-torture-compile $src "$option $options -dumpbase dump2/$dumpbase -DMASK=2 
-x c -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
+   c-torture-compile $src "$option $options -dumpbase [pwd]/dump2/$dumpbase 
-DMASK=2 -x c -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
foreach dump1 [lsort [glob -nocomplain dump1/*]] {
regsub dump1/ $dump1 dump2/ dump2
set dumptail "gcc.c-torture/unsorted/[file tail $dump1]"


What I don't like about this approach is that dump1 and dump2 are created in 
the current working directory.  With out of build-tree testing this may not 
(I believe) be the same as $tmpdir (where temporaries are normally created). 
 Also the current directory may already contain directories/files called 
dump1 or dump2 which will get destroyed by running the testsuite.


Hence why my approach used tmpdir.

Does this reasoning make sense?

I've not committed my version yet in case I am missing something in my 
reasoning above with regards to the relationship between the current working 
directory and $tmpdir.


Thanks,

Matt

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd




[RFA] Enable dump-noaddr test to work in out of build tree testing

2012-06-27 Thread Matthew Gretton-Dann

All,

This patch enables the dump-noaddr test to work in out-of-build-tree testing.

It does this by making sure that the dump files generated during the
test are created under $tmpdir.

gcc/testsuite/ChangeLog:
2012-06-27  Matthew Gretton-Dann  

* gcc.c-torture/unsorted/dump-noaddr.x: Generate dump files in
tmpdir.

Tested both in and out of build-tree against an arm-none-eabi targeted
compiler.


OK?

Thanks,

Matt

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
diff --git a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
index a8174e0..bd84c06 100644
--- a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
+++ b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
@@ -9,14 +9,14 @@ proc dump_compare { src options } {
 
 # loop through all the options
 foreach option $option_list {
-	file delete -force dump1
-	file mkdir dump1
-	c-torture-compile $src "$option $options -dumpbase dump1/$dumpbase -DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
-	file delete -force dump2
-	file mkdir dump2
-	c-torture-compile $src "$option $options -dumpbase dump2/$dumpbase -DMASK=2 -x c -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
-	foreach dump1 [lsort [glob -nocomplain dump1/*]] {
-	regsub dump1/ $dump1 dump2/ dump2
+	file delete -force $tmpdir/dump1
+	file mkdir $tmpdir/dump1
+	c-torture-compile $src "$option $options -dumpbase $tmpdir/dump1/$dumpbase -DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
+	file delete -force $tmpdir/dump2
+	file mkdir $tmpdir/dump2
+	c-torture-compile $src "$option $options -dumpbase $tmpdir/dump2/$dumpbase -DMASK=2 -x c -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
+	foreach dump1 [lsort [glob -nocomplain $tmpdir/dump1/*]] {
+	set dump2 "$tmpdir/dump2/[file tail $dump1]"
 	set dumptail "gcc.c-torture/unsorted/[file tail $dump1]"
 	#puts "$option $dump1"
 	set tmp [ diff "$dump1" "$dump2" ]
@@ -30,8 +30,8 @@ proc dump_compare { src options } {
 	#exec diff $dump1 $dump2
 	}
 }
-file delete -force dump1
-file delete -force dump2
+file delete -force $tmpdir/dump1
+file delete -force $tmpdir/dump2
 }
 
 catch {dump_compare $src $options} result

[RFA/ARM 3/3] Add support for vfma* and vfms* Neon intrinsics

2012-06-25 Thread Matthew Gretton-Dann

All,

This commit adds support for the vfma* and vfms* Neon intrinsics.

This updates neon.ml, and the various generation tools which use it,
arm_neon.h, the testsuite and documentation.

The documentation has not been regenerated for a while and so the
changes are larger than expected.

OK?

gcc/ChangeLog:

2012-06-25  Matthew Gretton-Dann  

* config/arm/arm.c (neon_builtin_data): Add vfma and vfms
builtins.
* config/arm/neon-docgen.ml (intrinsic_groups): Add
fused-multiply-* groups.
* config/arm/neon-gen.ml (print_feature_test_start): New function.
(print_feature_test_end): Likewise.
(print_variant): Print feature test macros.
* config/arm/neon-testgen.ml (emit_prologue): Allow different
tests to require different effective targets.
(effective_target): New function.
(test_intrinsic): Specify correct effective targets.
* config/arm/neon.md (*fmsub4): Rename...
(fmsub4): ...to this.
(neon_vfma): New expand.
(neon_vfms): Likewise.
* config/arm/neon.ml (opcode): Add Vfma and Vfms.
(features): Add Requires_feature.
(ops): Add VFMA and VFMS intrinsics.
* config/arm/arm_neon.h: Regenerate.
* doc/arm-neon-intrinsics.texi: Likewise.

gcc/testsuite/ChangeLog:

2012-06-25  Matthew Gretton-Dann  

* gcc.target/arm/neon/vfmaQf32.c: New testcase.
* gcc.target/arm/neon/vfmaf32.c: Likewise.
* gcc.target/arm/neon/vfmsQf32.c: Likewise.
* gcc.target/arm/neon/vfmsf32.c: Likewise.

Thanks,

Matt

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cba98f9..0b8b41e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -18919,6 +18919,8 @@ static neon_builtin_datum neon_builtin_data[] =
   VAR8 (BINOP, vmul, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
   VAR8 (TERNOP, vmla, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
   VAR3 (TERNOP, vmlal, v8qi, v4hi, v2si),
+  VAR2 (TERNOP, vfma, v2sf, v4sf),
+  VAR2 (TERNOP, vfms, v2sf, v4sf),
   VAR8 (TERNOP, vmls, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
   VAR3 (TERNOP, vmlsl, v8qi, v4hi, v2si),
   VAR4 (BINOP, vqdmulh, v4hi, v2si, v8hi, v4si),
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0567895..3afe3b0 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -1350,6 +1350,38 @@ vqdmlsl_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c)
   return (int64x2_t)__builtin_neon_vqdmlslv2si (__a, __b, __c, 1);
 }
 
+#ifdef __ARM_FEATURE_FMA
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vfma_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c)
+{
+  return (float32x2_t)__builtin_neon_vfmav2sf (__a, __b, __c, 3);
+}
+
+#endif
+#ifdef __ARM_FEATURE_FMA
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vfmaq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
+{
+  return (float32x4_t)__builtin_neon_vfmav4sf (__a, __b, __c, 3);
+}
+
+#endif
+#ifdef __ARM_FEATURE_FMA
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vfms_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c)
+{
+  return (float32x2_t)__builtin_neon_vfmsv2sf (__a, __b, __c, 3);
+}
+
+#endif
+#ifdef __ARM_FEATURE_FMA
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vfmsq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
+{
+  return (float32x4_t)__builtin_neon_vfmsv4sf (__a, __b, __c, 3);
+}
+
+#endif
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vsub_s8 (int8x8_t __a, int8x8_t __b)
 {
diff --git a/gcc/config/arm/neon-docgen.ml b/gcc/config/arm/neon-docgen.ml
index 23e37b4..043b1e0 100644
--- a/gcc/config/arm/neon-docgen.ml
+++ b/gcc/config/arm/neon-docgen.ml
@@ -103,6 +103,8 @@ let intrinsic_groups =
 "Multiplication", single_opcode Vmul;
 "Multiply-accumulate", single_opcode Vmla;
 "Multiply-subtract", single_opcode Vmls;
+"Fused-multiply-accumulate", single_opcode Vfma;
+"Fused-multiply-subtract", single_opcode Vfms;
 "Subtraction", single_opcode Vsub;
 "Comparison (equal-to)", single_opcode Vceq;
 "Comparison (greater-than-or-equal-to)", single_opcode Vcge;
diff --git a/gcc/config/arm/neon-gen.ml b/gcc/config/arm/neon-gen.ml
index 112c8be..0297597 100644
--- a/gcc/config/arm/neon-gen.ml
+++ b/gcc/config/arm/neon-gen.ml
@@ -239,6 +239,24 @@ let rec mode_suffix elttype shape =
 and srcmode = mode_of_elt src shape in
 string_of_mode dstmode ^ string_of_mode srcmode
 
+let print_feature_test_start features =
+  try
+match List.find (fun feature ->
+   match feature with Requires_feature _ -> true
+| _ -> false)
+

[RFA/ARM 2/3] Add vectorizer support for VFMA

2012-06-25 Thread Matthew Gretton-Dann

All,

This patch adds vectoriser support for VFMA to the ARM Neon backend.

Note that the VFP VFNMA and VFNMS instructions do not have Neon
equivalents.

OK?

gcc/ChangeLog:

2012-06-25  Matthew Gretton-Dann  

* config/arm/neon.md (fma4): New pattern.
(*fmsub4): Likewise.

gcc/testsuite/ChangeLog:

2012-06-25  Matthew Gretton-Dann  

* gcc.target/arm/neon-vfma-1.c: New testcase.
* gcc.target/arm/neon-vfms-1.c: Likewise.
* lib/target-supports.exp (add_options_for_arm_neonv2): New
function.
(check_effective_target_arm_neonv2_ok_nocache): Likewise.
(check_effective_target_arm_neonv2_ok): Likewise.
(check_effective_target_arm_neonv2_hw): Likewise.
(check_effective_target_arm_neonv2): Likewise.

Thanks,

Matt
--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 4568dea..4d12fb3 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -711,6 +711,33 @@
 (const_string "neon_mla_qqq_32_qqd_32_scalar")]
 )
 
+;; Fused multiply-accumulate
+(define_insn "fma4"
+  [(set (match_operand:VCVTF 0 "register_operand" "=w")
+(fma:VCVTF (match_operand:VCVTF 1 "register_operand" "w")
+		 (match_operand:VCVTF 2 "register_operand" "w")
+		 (match_operand:VCVTF 3 "register_operand" "0")))]
+  "TARGET_NEON && TARGET_FMA"
+  "vfma%?.\\t%0, %1, %2"
+  [(set (attr "neon_type")
+	(if_then_else (match_test "")
+		  (const_string "neon_fp_vmla_ddd")
+		  (const_string "neon_fp_vmla_qqq")))]
+)
+
+(define_insn "*fmsub4"
+  [(set (match_operand:VCVTF 0 "register_operand" "=w")
+(fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
+		   (match_operand:VCVTF 2 "register_operand" "w")
+		   (match_operand:VCVTF 3 "register_operand" "0")))]
+  "TARGET_NEON && TARGET_FMA"
+  "vfms%?.\\t%0, %1, %2"
+  [(set (attr "neon_type")
+	(if_then_else (match_test "")
+		  (const_string "neon_fp_vmla_ddd")
+		  (const_string "neon_fp_vmla_qqq")))]
+)
+
 (define_insn "ior3"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w,w")
 	(ior:VDQ (match_operand:VDQ 1 "s_register_operand" "w,0")
diff --git a/gcc/testsuite/gcc.target/arm/neon-vfma-1.c b/gcc/testsuite/gcc.target/arm/neon-vfma-1.c
new file mode 100644
index 000..a003a82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vfma-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neonv2_ok } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
+/* { dg-add-options arm_neonv2 } */
+/* { dg-final { scan-assembler "vfma\\.f32\[	\]+\[dDqQ]" } } */
+
+/* Verify that VFMA is used.  */
+void f1(int n, float a, float x[], float y[]) {
+  int i;
+  for (i = 0; i < n; ++i)
+y[i] = a * x[i] + y[i];
+}
diff --git a/gcc/testsuite/gcc.target/arm/neon-vfms-1.c b/gcc/testsuite/gcc.target/arm/neon-vfms-1.c
new file mode 100644
index 000..8cefd8a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vfms-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neonv2_ok } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
+/* { dg-add-options arm_neonv2 } */
+/* { dg-final { scan-assembler "vfms\\.f32\[	\]+\[dDqQ]" } } */
+
+/* Verify that VFMS is used.  */
+void f1(int n, float a, float x[], float y[]) {
+  int i;
+  for (i = 0; i < n; ++i)
+y[i] = a * -x[i] + y[i];
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index bc5baa7..9fc8a5c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2082,6 +2082,19 @@ proc add_options_for_arm_neon { flags } {
 return "$flags $et_arm_neon_flags"
 }
 
+# Add the options needed for NEON.  We need either -mfloat-abi=softfp
+# or -mfloat-abi=hard, but if one is already specified by the
+# multilib, use it.  Similarly, if a -mfpu option already enables
+# NEON, do not add -mfpu=neon.
+
+proc add_options_for_arm_neonv2 { flags } {
+if { ! [check_effective_target_arm_neonv2_ok] } {
+	return "$flags"
+}
+global et_arm_neonv2_flags
+return "$flags $et_arm_neonv2_flags"
+}
+
 # Return 1 if this is an ARM target supporting -mfpu=neon
 # -mfloat-abi=softfp or equivalent options.  Some multilibs may be
 # incompatible with these options.  Also set et_arm_neon_flags to the
@@ -2110,6 +2123,38 @@ proc check_effective_target_arm_neon_ok { } {
 		check_effective_target_arm_neon_ok_nocache]
 }
 
+# Return 1 if th

[RFA/ARM 1/3] Add VFP support for VFMA and friends

2012-06-25 Thread Matthew Gretton-Dann

All,

This patch adds support to the ARM backend for generating floating-point
fused multiply-accumulate.

OK?

gcc/ChangeLog:

2012-06-25  Matthew Gretton-Dann  

* config/arm/iterators.md (SDF): New mode iterator.
(V_if_elem): Add support for SF and DF modes.
(V_reg): Likewise.
(F_w_constraint): New mode iterator attribute.
(F_r_constraint): Likewise.
(F_fma_type): Likewise.
(F_target): Likewise.
config/arm/vfp.md (fma4): New pattern.
(*fmsub4): Likewise.
(*fmnsub4): Likewise.
(*fmnadd4): Likewise.

Thanks,

Matt

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 795a5ee..3063f00 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -42,6 +42,9 @@
 ;; A list of the 32bit and 64bit integer modes
 (define_mode_iterator SIDI [SI DI])
 
+;; A list of modes which the VFP unit can handle
+(define_mode_iterator SDF [SF DF])
+
 ;; Integer element sizes implemented by IWMMXT.
 (define_mode_iterator VMMX [V2SI V4HI V8QI])
 
@@ -245,7 +248,8 @@
  (V4HI "P") (V8HI  "q")
  (V2SI "P") (V4SI  "q")
  (V2SF "P") (V4SF  "q")
- (DI   "P") (V2DI  "q")])
+ (DI   "P") (V2DI  "q")
+			 (SF   "")  (DF"P")])
 
 ;; Wider modes with the same number of elements.
 (define_mode_attr V_widen [(V8QI "V8HI") (V4HI "V4SI") (V2SI "V2DI")])
@@ -303,7 +307,8 @@
  (V4HI "i16") (V8HI  "i16")
  (V2SI "i32") (V4SI  "i32")
  (DI   "i64") (V2DI  "i64")
- (V2SF "f32") (V4SF  "f32")])
+ (V2SF "f32") (V4SF  "f32")
+		 (SF "f32") (DF "f64")])
 
 ;; Same, but for operations which work on signed values.
 (define_mode_attr V_s_elem [(V8QI "s8")  (V16QI "s8")
@@ -423,6 +428,12 @@
 ;; Mode attribute for vshll.
 (define_mode_attr V_innermode [(V8QI "QI") (V4HI "HI") (V2SI "SI")])
 
+;; Mode attributes used for fused-multiply-accumulate VFP support
+(define_mode_attr F_w_constraint [(SF "=t") (DF "=w")])
+(define_mode_attr F_r_constraint [(SF "t") (DF "w")])
+(define_mode_attr F_fma_type [(SF "fmacs") (DF "fmacd")])
+(define_mode_attr F_target [(SF "") (DF "&& TARGET_VFP_DOUBLE")])
+
 ;;
 ;; Code attributes
 ;;
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 2061414..2a50353 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -890,6 +890,54 @@
(set_attr "type" "fmacd")]
 )
 
+;; Fused-multiply-accumulate
+
+(define_insn "fma4"
+  [(set (match_operand:SDF 0 "register_operand" "")
+(fma:SDF (match_operand:SDF 1 "register_operand" "")
+		 (match_operand:SDF 2 "register_operand" "")
+		 (match_operand:SDF 3 "register_operand" "0")))]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA "
+  "vfma%?.\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "")]
+)
+
+(define_insn "*fmsub4"
+  [(set (match_operand:SDF 0 "register_operand" "")
+	(fma:SDF (neg:SDF (match_operand:SDF 1 "register_operand" 
+	 ""))
+		 (match_operand:SDF 2 "register_operand" "")
+		 (match_operand:SDF 3 "register_operand" "0")))]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA "
+  "vfms%?.\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "")]
+)
+
+(define_insn "*fnmsub4"
+  [(set (match_operand:SDF 0 "register_operand" "")
+	(fma:SDF (match_operand:SDF 1 "register_operand"  "")
+		 (match_operand:SDF 2 "register_operand" "")
+		 (neg:SDF (match_operand:SDF 3 "register_operand" "0"))))]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA "
+  "vfnms%?.\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "")]
+)
+
+(define_insn "*fnmadd4"
+  [(set (match_operand:SDF 0 "register

[RFA/ARM 0/3] Add support for fused multiply-accumulate to ARM backend

2012-06-25 Thread Matthew Gretton-Dann

All,

This sequence of three patches adds support to the ARM Backend for fused 
multiply-accumulate patterns on cores that support it.


Patch 1 adds floating-point support.
Patch 2 adds Advanced SIMD support in the auto-vectorizer.
Patch 3 adds intrinsic support in arm_neon.h.

These patches have been tested as a whole using QEMU targeting 
arm-none-eabi, and with an arm-linux-gnueabi bootstrap.


They require the ACLE feature test macros patch currently under review here:
http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01592.html

Thanks,

Matt

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd




Re: [RFA/ARM] Add ACLE Predefined macro support

2012-06-25 Thread Matthew Gretton-Dann
Further testing has found a couple of failures to build with a C++ compiler, 
and trunk has moved on a bit so the patch doesn't apply cleanly.


An updated patch is attached.

OK for trunk?

Same ChangeLog as before.

Thanks,

Matt

On 20/06/12 11:18, Matthew Gretton-Dann wrote:

PING.

On Mon, May 28, 2012 at 10:51:27AM +0100, Matthew Gretton-Dann wrote:

All,

This patch adds a variety of predefined macros to reveal the presence of
various features of the ARM architecture.  These are detailed in the ARM
C Language Extensions specification, available here:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053-/index.html

This patch then adds compiler predefines for:

__ARM_SIZEOF_MINIMAL_ENUM which is defined as the size in bytes
of the smallest enum.

__ARM_ARCH which is defined as the major revision of the ARM
instruction set which the target implements.

__ARM_ARCH_ISA_THUMB which is defined as the major revision of
the thumb instruction set which the target implements.

__ARM_ARCH_PROFILE which is defined on ARMv7 targets, and ARMv6-M
targets to be the character value of `A', `R' or `M', as defined
by the target's architecture profile.

__ARM_FEATURE_LDREX which is defined as a bit mask, composed of
the widths of `ldrex' available on the target. These widths are:
bit 0 - byte.
bit 1 - 16-bit halfword.
bit 2 - 32-bit word.
bit 3 - 64-bit doubleword.

__ARM_FEATURE_CLZ which is defined for targets which support
the `clz' instruction.

__ARM_FEATURE_SIMD32 which is defined when the ARMv6 integer
SIMD instructions are available.

__ARM_FEATURE_QBIT which is defined when the Q-Bit is present in the
APSR.

__ARM_FEATURE_SAT which is defined when the saturation instructions are
available.

__ARM_FP which is defined as a bit mask composed of the widths
of floating-point types with hardware support on the target.
These widths are:
bit 1 - 16-bit half precision.
bit 2 - 32-bit single precision.
bit 3 - 64-bit double precision.

__ARM_FP16_FORMAT_IEEE which is defined when the IEEE 754-2008
standard for 16-bit floating point representation is used.

__ARM_FP16_FORMAT_ALTERNATIVE which is defined when the ARM
alternative standard for 16-bit floating point representation
is used.

__ARM_FEATURE_FMA which is defined when the fused multiply-accumulate
instructions are available for floating-point and/or Advanced SIMD
values.

__ARM_NEON_FP which is defined as a bit mask composed of the widths
of floating point values supported by the NEON hardware. These widths
are, as with __ARM_FP:
bit 1 - 16-bit half precision.
bit 2 - 32-bit single precision.
bit 3 - 64-bit double precision.

__ARM_WMMX which is defined where iwmmx operations are available
on the target.

As these macros may expand to something other than `1', we also update
cpp.texi to reflect this fact.

OK?

Thanks,

Matt

gcc/ChangeLog:
2012-05-28  Matthew Gretton-Dann  
  James Greenhalgh  

   * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Add new built-ins.
   (TARGET_FMA): New macro.
   (TARGET_ARM_QBIT, TARGET_ARM_SAT): Likewise.
   (TARGET_ARM_ARCH): Likewise.
   (TARGET_ARM_ARCH_ISA_THUMB): Likewise.
   (TARGET_V6M, TARGET_V7M): Likewise.
   (TARGET_ARM_ARCH_PROFILE): Likewise.
   (TARGET_ARM_FEATURE_LDREX): Likewise.
   (TARGET_ARM_FP, TARGET_NEON_FP): Likewise.
   (ARM_MIN_ENUM_SIZE): Likewise.
   * config/arm/arm.c (arm_file_start): Refactor appropriately.
   (base_architecture): New enumeration.
   (arm_base_arch): New global variable.
   (processors): Add field base_arch.
   (ARM_ARCH, ARM_CORE): Adjust accordingly.
   (arm_option_override): Add initialization of arm_base_arch.
   * doc/cpp.texi (system-specific predefined macros.): Change.

gcc/testsuite/ChangeLog:
2012-05-28  Matthew Gretton-Dann  
  James Greenhalgh  

   * gcc.target/arm/ftest-support-arm.h New testcase.
   * gcc.target/arm/ftest-support-thumb.h Likewise.
   * gcc.target/arm/ftest-support.h Likewise.
   * gcc.target/arm/ftest-armv4-arm.c: Likewise.
   * gcc.target/arm/ftest-armv4t-arm.c: Likewise.
   * gcc.target/arm/ftest-armv4t-thumb.c: Likewise.
   * gcc.target/arm/ftest-armv5t-arm.c Likewise.
   * gcc.target/arm/ftest-armv5t-thumb.c Likewise.
   * gcc.target/arm/ftest-armv5te-arm.c: Likewise.
   * gcc.target/arm/ftest-armv5te-thumb.c: Likewise.
   * gcc.target/arm/ftest-armv6-arm.c Likewise.
   * gcc.target/arm/ftest-armv6-thumb.c Likewise.
   * gcc.target/arm/ftest-armv6k-arm.c Likewise.
   * gcc.target/arm/ftest-armv6k-thumb.c Likewise.
   * gcc.target/arm/ftest-armv6m-thumb.c: Likewise.
   * gcc.target/arm/ftest-armv6t2-arm.c: Likewise.
   * gcc.target/arm/ftest-armv6t2-thumb.c: Likewise.
   * gcc.target/arm/ftest-armv6z-arm.c: Likewise.
   * gcc.target/arm/ftest-armv6z-thumb.c: Likewise.
   * gcc.target/arm/ftest-armv7a-arm.c Likewise.
   * gcc.

Re: [RFA/ARM] Add ACLE Predefined macro support

2012-06-20 Thread Matthew Gretton-Dann
PING.

On Mon, May 28, 2012 at 10:51:27AM +0100, Matthew Gretton-Dann wrote:
> All,
>
> This patch adds a variety of predefined macros to reveal the presence of
> various features of the ARM architecture.  These are detailed in the ARM
> C Language Extensions specification, available here:
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053-/index.html
>
> This patch then adds compiler predefines for:
>
> __ARM_SIZEOF_MINIMAL_ENUM which is defined as the size in bytes
> of the smallest enum.
>
> __ARM_ARCH which is defined as the major revision of the ARM
> instruction set which the target implements.
>
> __ARM_ARCH_ISA_THUMB which is defined as the major revision of
> the thumb instruction set which the target implements.
>
> __ARM_ARCH_PROFILE which is defined on ARMv7 targets, and ARMv6-M
> targets to be the character value of `A', `R' or `M', as defined
> by the target's architecture profile.
>
> __ARM_FEATURE_LDREX which is defined as a bit mask, composed of
> the widths of `ldrex' available on the target. These widths are:
> bit 0 - byte.
> bit 1 - 16-bit halfword.
> bit 2 - 32-bit word.
> bit 3 - 64-bit doubleword.
>
> __ARM_FEATURE_CLZ which is defined for targets which support
> the `clz' instruction.
>
> __ARM_FEATURE_SIMD32 which is defined when the ARMv6 integer
> SIMD instructions are available.
>
> __ARM_FEATURE_QBIT which is defined when the Q-Bit is present in the
> APSR.
>
> __ARM_FEATURE_SAT which is defined when the saturation instructions are
> available.
>
> __ARM_FP which is defined as a bit mask composed of the widths
> of floating-point types with hardware support on the target.
> These widths are:
> bit 1 - 16-bit half precision.
> bit 2 - 32-bit single precision.
> bit 3 - 64-bit double precision.
>
> __ARM_FP16_FORMAT_IEEE which is defined when the IEEE 754-2008
> standard for 16-bit floating point representation is used.
>
> __ARM_FP16_FORMAT_ALTERNATIVE which is defined when the ARM
> alternative standard for 16-bit floating point representation
> is used.
>
> __ARM_FEATURE_FMA which is defined when the fused multiply-accumulate
> instructions are available for floating-point and/or Advanced SIMD
> values.
>
> __ARM_NEON_FP which is defined as a bit mask composed of the widths
> of floating point values supported by the NEON hardware. These widths
> are, as with __ARM_FP:
> bit 1 - 16-bit half precision.
> bit 2 - 32-bit single precision.
> bit 3 - 64-bit double precision.
>
> __ARM_WMMX which is defined where iwmmx operations are available
> on the target.
>
> As these macros may expand to something other than `1', we also update
> cpp.texi to reflect this fact.
>
> OK?
>
> Thanks,
>
> Matt
>
> gcc/ChangeLog:
> 2012-05-28  Matthew Gretton-Dann  
>  James Greenhalgh  
>
>   * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Add new built-ins.
>   (TARGET_FMA): New macro.
>   (TARGET_ARM_QBIT, TARGET_ARM_SAT): Likewise.
>   (TARGET_ARM_ARCH): Likewise.
>   (TARGET_ARM_ARCH_ISA_THUMB): Likewise.
>   (TARGET_V6M, TARGET_V7M): Likewise.
>   (TARGET_ARM_ARCH_PROFILE): Likewise.
>   (TARGET_ARM_FEATURE_LDREX): Likewise.
>   (TARGET_ARM_FP, TARGET_NEON_FP): Likewise.
>   (ARM_MIN_ENUM_SIZE): Likewise.
>   * config/arm/arm.c (arm_file_start): Refactor appropriately.
>   (base_architecture): New enumeration.
>   (arm_base_arch): New global variable.
>   (processors): Add field base_arch.
>   (ARM_ARCH, ARM_CORE): Adjust accordingly.
>   (arm_option_override): Add initialization of arm_base_arch.
>   * doc/cpp.texi (system-specific predefined macros.): Change.
>
> gcc/testsuite/ChangeLog:
> 2012-05-28  Matthew Gretton-Dann  
>  James Greenhalgh  
>
>   * gcc.target/arm/ftest-support-arm.h New testcase.
>   * gcc.target/arm/ftest-support-thumb.h Likewise.
>   * gcc.target/arm/ftest-support.h Likewise.
>   * gcc.target/arm/ftest-armv4-arm.c: Likewise.
>   * gcc.target/arm/ftest-armv4t-arm.c: Likewise.
>   * gcc.target/arm/ftest-armv4t-thumb.c: Likewise.
>   * gcc.target/arm/ftest-armv5t-arm.c Likewise.
>   * gcc.target/arm/ftest-armv5t-thumb.c Likewise.
>   * gcc.target/arm/ftest-armv5te-arm.c: Likewise.
>   * gcc.target/arm/ftest-armv5te-thumb.c: Likewise.
>   * gcc.target/arm/ftest-armv6-arm.c Likewise.
>   * gcc.target/arm/ftest-armv6-thumb.c Likewise.
>   * gcc.target/arm/ftest-armv6k-arm.c Likewise.
>   * gcc.target/arm/ftest-armv6k-thumb.c Likewise.
>   * gcc.target/arm/ftest-armv6m-thumb.c: Likewise.
>   * gcc.target/arm/ftest-armv6

Re: [RFA/ARM] Add ACLE Predefined macro support

2012-05-28 Thread Matthew Gretton-Dann

On 28/05/12 12:27, Joseph S. Myers wrote:

> On Mon, 28 May 2012, Matthew Gretton-Dann wrote:
>
> > This patch adds a variety of predefined macros to reveal the presence of
> > various features of the ARM architecture.  These are detailed in the ARM
> > C Language Extensions specification, available here:
> > http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053-/index.html
>
> Are there any plans to implement the change in __fp16 semantics in this
> document (single rounding for conversion from double to __fp16, whereas
> the specification previously implemented was double rounding)?  And, more
> generally, the various features in the document not currently implemented
> in GCC or implemented differently from the specification?


Yes.  We have these implemented against 4.7 internally, and are slowly 
rebasing these to trunk.  Unfortunately, I have no timescale for when these 
will be released.



> > __ARM_FEATURE_FMA which is defined when the fused multiply-accumulate
> > instructions are available for floating-point and/or Advanced SIMD
> > values.
>
> Note that the ARM port is currently lacking the fma instruction patterns
> to implement the __builtin_fma* built-in functions for processors with
> those instructions.  Support for those would be a straightforward and
> useful addition to GCC.


I have a patchset currently under test that will add FMA support to the ARM 
backend (both for VFP and Neon).  Hopefully this will be sent for community 
review sometime this week.
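
As an illustration of how user code might consume __ARM_FEATURE_FMA once both
the macro and the fma patterns are available, here is a minimal sketch in C.
The helper name mul_add is hypothetical and is not part of either patch;
__builtin_fmaf is an existing GCC built-in.  Where the macro is not defined,
the sketch falls back to a separate multiply and add.

```c
/* Sketch only: mul_add is a hypothetical helper, not part of the patch.  */

/* Compute a * b + c, asking for a fused multiply-accumulate when the
   ACLE feature-test macro says the target provides one.  */
static inline float
mul_add (float a, float b, float c)
{
#if defined (__ARM_FEATURE_FMA)
  /* Single rounding; expected to map onto the target's fused instruction.  */
  return __builtin_fmaf (a, b, c);
#else
  /* Separate multiply and add; the product may be rounded before the add.  */
  return a * b + c;
#endif
}
```

A call such as mul_add (2.0f, 3.0f, 1.0f) gives the same answer on either
path because the result is exactly representable; the two paths can differ
only when the intermediate product needs rounding.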



> Is there a reason the ACLE doesn't include a predefined macro to say
> whether registers d16-d31 are known at compile time to be available?  That
> would occasionally be useful (see my comments in
> <http://sourceware.org/ml/libc-ports/2012-04/msg00087.html>, for example).


ACLE is interested in C language extensions.  So, in general, it only notes 
architecture features whose presence or absence would change how you 
write your C code.


The number of registers available in the VFP unit is of no interest to a C 
programmer, and so ACLE doesn't provide a feature test macro for it.


ACLE does not provide support for those writing assembly directly.

I agree, however, that from a GCC+Binutils toolchain perspective, feature 
test macros for such features would be useful.


Thanks,

Matt


--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd



[RFA/ARM] Add ACLE Predefined macro support

2012-05-28 Thread Matthew Gretton-Dann

All,

This patch adds a variety of predefined macros to reveal the presence of
various features of the ARM architecture.  These are detailed in the ARM
C Language Extensions specification, available here:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053-/index.html

This patch then adds compiler predefines for:

__ARM_SIZEOF_MINIMAL_ENUM which is defined as the size in bytes
of the smallest enum.

__ARM_ARCH which is defined as the major revision of the ARM
instruction set which the target implements.

__ARM_ARCH_ISA_THUMB which is defined as the major revision of
the thumb instruction set which the target implements.

__ARM_ARCH_PROFILE which is defined on ARMv7 targets, and ARMv6-M
targets to be the character value of `A', `R' or `M', as defined
by the target's architecture profile.

__ARM_FEATURE_LDREX which is defined as a bit mask, composed of
the widths of `ldrex' available on the target. These widths are:
bit 0 - byte.
bit 1 - 16-bit halfword.
bit 2 - 32-bit word.
bit 3 - 64-bit doubleword.

__ARM_FEATURE_CLZ which is defined for targets which support
the `clz' instruction.

__ARM_FEATURE_SIMD32 which is defined when the ARMv6 integer
SIMD instructions are available.

__ARM_FEATURE_QBIT which is defined when the Q-Bit is present in the
APSR.

__ARM_FEATURE_SAT which is defined when the saturation instructions are
available.

__ARM_FP which is defined as a bit mask composed of the widths
of floating-point types with hardware support on the target.
These widths are:
bit 1 - 16-bit half precision.
bit 2 - 32-bit single precision.
bit 3 - 64-bit double precision.

__ARM_FP16_FORMAT_IEEE which is defined when the IEEE 754-2008
standard for 16-bit floating point representation is used.

__ARM_FP16_FORMAT_ALTERNATIVE which is defined when the ARM
alternative standard for 16-bit floating point representation
is used.

__ARM_FEATURE_FMA which is defined when the fused multiply-accumulate
instructions are available for floating-point and/or Advanced SIMD
values.

__ARM_NEON_FP which is defined as a bit mask composed of the widths
of floating point values supported by the NEON hardware. These widths
are, as with __ARM_FP:
bit 1 - 16-bit half precision.
bit 2 - 32-bit single precision.
bit 3 - 64-bit double precision.

__ARM_WMMX which is defined where iwmmx operations are available
on the target.

As these macros may expand to something other than `1', we also update
cpp.texi to reflect this fact.
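
To illustrate how the bit-mask macros described above might be consumed, here
is a minimal sketch in C.  The helper names and TARGET_FP_MASK are
hypothetical and not part of the patch.  The masks are passed in as
parameters so that the helpers compile on any host, and the sketch assumes
that an undefined __ARM_FP can be treated as an empty mask.

```c
/* Sketch only: the names below are hypothetical, not part of the patch.  */

/* Return nonzero if MASK (a value in the __ARM_FP or __ARM_NEON_FP format)
   advertises hardware support for a floating-point type BITS wide.
   Bit 1 is 16-bit, bit 2 is 32-bit and bit 3 is 64-bit.  */
static inline int
fp_mask_has_width (unsigned int mask, unsigned int bits)
{
  switch (bits)
    {
    case 16: return (mask & (1u << 1)) != 0;
    case 32: return (mask & (1u << 2)) != 0;
    case 64: return (mask & (1u << 3)) != 0;
    default: return 0;
    }
}

/* Return nonzero if MASK (a value in the __ARM_FEATURE_LDREX format)
   advertises an `ldrex' of the given width in BYTES (1, 2, 4 or 8).
   Bit 0 is byte, bit 1 halfword, bit 2 word and bit 3 doubleword.  */
static inline int
ldrex_mask_has_width (unsigned int mask, unsigned int bytes)
{
  switch (bytes)
    {
    case 1: return (mask & (1u << 0)) != 0;
    case 2: return (mask & (1u << 1)) != 0;
    case 4: return (mask & (1u << 2)) != 0;
    case 8: return (mask & (1u << 3)) != 0;
    default: return 0;
    }
}

/* Assumption: an undefined macro means no widths are supported.  */
#ifdef __ARM_FP
# define TARGET_FP_MASK __ARM_FP
#else
# define TARGET_FP_MASK 0u
#endif
```

With these helpers, fp_mask_has_width (TARGET_FP_MASK, 64) would ask whether
the current target has hardware double precision.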

OK?

Thanks,

Matt

gcc/ChangeLog:
2012-05-28  Matthew Gretton-Dann  
James Greenhalgh  

* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Add new built-ins.
(TARGET_FMA): New macro.
(TARGET_ARM_QBIT, TARGET_ARM_SAT): Likewise.
(TARGET_ARM_ARCH): Likewise.
(TARGET_ARM_ARCH_ISA_THUMB): Likewise.
(TARGET_V6M, TARGET_V7M): Likewise.
(TARGET_ARM_ARCH_PROFILE): Likewise.
(TARGET_ARM_FEATURE_LDREX): Likewise.
(TARGET_ARM_FP, TARGET_NEON_FP): Likewise.
(ARM_MIN_ENUM_SIZE): Likewise.
* config/arm/arm.c (arm_file_start): Refactor appropriately.
(base_architecture): New enumeration.
(arm_base_arch): New global variable.
(processors): Add field base_arch.
(ARM_ARCH, ARM_CORE): Adjust accordingly.
(arm_option_override): Add initialization of arm_base_arch.
* doc/cpp.texi (system-specific predefined macros.): Change.

gcc/testsuite/ChangeLog:
2012-05-28  Matthew Gretton-Dann  
James Greenhalgh  

* gcc.target/arm/ftest-support-arm.h: New testcase.
* gcc.target/arm/ftest-support-thumb.h: Likewise.
* gcc.target/arm/ftest-support.h: Likewise.
* gcc.target/arm/ftest-armv4-arm.c: Likewise.
* gcc.target/arm/ftest-armv4t-arm.c: Likewise.
* gcc.target/arm/ftest-armv4t-thumb.c: Likewise.
* gcc.target/arm/ftest-armv5t-arm.c: Likewise.
* gcc.target/arm/ftest-armv5t-thumb.c: Likewise.
* gcc.target/arm/ftest-armv5te-arm.c: Likewise.
* gcc.target/arm/ftest-armv5te-thumb.c: Likewise.
* gcc.target/arm/ftest-armv6-arm.c: Likewise.
* gcc.target/arm/ftest-armv6-thumb.c: Likewise.
* gcc.target/arm/ftest-armv6k-arm.c: Likewise.
* gcc.target/arm/ftest-armv6k-thumb.c: Likewise.
* gcc.target/arm/ftest-armv6m-thumb.c: Likewise.
* gcc.target/arm/ftest-armv6t2-arm.c: Likewise.
* gcc.target/arm/ftest-armv6t2-thumb.c: Likewise.
* gcc.target/arm/ftest-armv6z-arm.c: Likewise.
* gcc.target/arm/ftest-armv6z-thumb.c: Likewise.
* gcc.target/arm/ftest-armv7a-arm.c: Likewise.
* gcc.target/arm/ftest-armv7a-thumb.c: Likewise.
* gcc.target/arm/ftest-armv7m-thumb.c: Likewise.
* gcc.target/arm/ftest-armv7em-thumb.c: Likewise.
* gcc.target/arm/ftest-armv7r-arm.c: Likewise.
* gcc.target/arm/ftest-armv7r-thumb.c: Likewise.
* gcc/testsuite/lib/target-supports.exp
(check_e

[RFA/ARM]: Correct Neon testsuite generation

2012-03-12 Thread Matthew Gretton-Dann
All,

The commit to fix PR51534 did not update the testsuite (as no changes were
expected there).

Unfortunately, this means that I didn't notice that the Neon testsuite generator
is broken.  The attached patch fixes the generator.

Checked by re-running the Neon testsuite and arm_neon.h generators and ensuring
no changes in the generated testsuite/header.

OK for trunk?

OK for backporting to GCC 4.7?

Thanks,

Matt

gcc/ChangeLog:

2012-03-12  Matthew Gretton-Dann  

* config/arm/neon.ml (ops): Fixup expected instructions for
unsigned vector compares.
-- 
Matthew Gretton-Dann
Principal Engineer, PD Software, ARM Ltd.
diff --git a/gcc/config/arm/neon.ml b/gcc/config/arm/neon.ml
index 363e55c..85eb5ec 100644
--- a/gcc/config/arm/neon.ml
+++ b/gcc/config/arm/neon.ml
@@ -780,14 +780,19 @@ let ops =
 
 (* Comparison, greater-than or equal.  *)
 Vcge, [], All (3, Dreg), "vcge", cmp_sign_matters, F32 :: s_8_32;
-Vcge, [Builtin_name "vcgeu"], All (3, Dreg), "vcge", cmp_sign_matters, u_8_32;
+Vcge, [Instruction_name ["vcge"]; Builtin_name "vcgeu"],
+  All (3, Dreg), "vcge", cmp_sign_matters,
+  u_8_32;
 Vcge, [], All (3, Qreg), "vcgeQ", cmp_sign_matters, F32 :: s_8_32;
-Vcge, [Builtin_name "vcgeu"], All (3, Qreg), "vcgeQ", cmp_sign_matters, u_8_32;
+Vcge, [Instruction_name ["vcge"]; Builtin_name "vcgeu"],
+  All (3, Qreg), "vcgeQ", cmp_sign_matters,
+  u_8_32;
 
 (* Comparison, less-than or equal.  *)
 Vcle, [Flipped "vcge"], All (3, Dreg), "vcle", cmp_sign_matters,
   F32 :: s_8_32;
-Vcle, [Flipped "vcgeu"], All (3, Dreg), "vcle", cmp_sign_matters,
+Vcle, [Instruction_name ["vcge"]; Flipped "vcgeu"],
+  All (3, Dreg), "vcle", cmp_sign_matters,
   u_8_32;
 Vcle, [Instruction_name ["vcge"]; Flipped "vcgeQ"],
   All (3, Qreg), "vcleQ", cmp_sign_matters,
@@ -798,14 +803,19 @@ let ops =
 
 (* Comparison, greater-than.  *)
 Vcgt, [], All (3, Dreg), "vcgt", cmp_sign_matters, F32 :: s_8_32;
-Vcgt, [Builtin_name "vcgtu"], All (3, Dreg), "vcgt", cmp_sign_matters, u_8_32;
+Vcgt, [Instruction_name ["vcgt"]; Builtin_name "vcgtu"],
+  All (3, Dreg), "vcgt", cmp_sign_matters,
+  u_8_32;
 Vcgt, [], All (3, Qreg), "vcgtQ", cmp_sign_matters, F32 :: s_8_32;
-Vcgt, [Builtin_name "vcgtu"], All (3, Qreg), "vcgtQ", cmp_sign_matters, u_8_32;
+Vcgt, [Instruction_name ["vcgt"]; Builtin_name "vcgtu"],
+  All (3, Qreg), "vcgtQ", cmp_sign_matters,
+  u_8_32;
 
 (* Comparison, less-than.  *)
 Vclt, [Flipped "vcgt"], All (3, Dreg), "vclt", cmp_sign_matters,
   F32 :: s_8_32;
-Vclt, [Flipped "vcgtu"], All (3, Dreg), "vclt", cmp_sign_matters,
+Vclt, [Instruction_name ["vcgt"]; Flipped "vcgtu"],
+  All (3, Dreg), "vclt", cmp_sign_matters,
   u_8_32;
 Vclt, [Instruction_name ["vcgt"]; Flipped "vcgtQ"],
   All (3, Qreg), "vcltQ", cmp_sign_matters,

Re: [Patch ARM] Fix PR 49069.

2012-03-01 Thread Matthew Gretton-Dann
PING.

On Tue, Jan 24, 2012 at 04:10:19PM +, Sameera Deshpande wrote:
> Hi,
> 
> Please find attached the patch fixing bug 49069.
> 
> This patch is tested with check-gcc on trunk and 4.6 without regression.
> OK for trunk?
> Is it fine to backport to 4.6 branch?
> 
> ChangeLog:
> 2012-01-24  Sameera Deshpande  
> PR target/49069
> gcc/config/arm/arm.md (cstoredi4): Handle the case when both
> operands are const_int.
> 
> gcc/testsuite/ChangeLog:
> 2012-01-24  Sameera Deshpande  
> PR target/49069
> gcc.target/arm/pr49069.c: New compile-only test.
> 
> - Thanks and regards,
>   Sameera D.

> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 751997f..e3dc98f 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -7911,8 +7911,9 @@
>   enum rtx_code code = GET_CODE (operands[1]);
>  
>   /* We should not have two constants.  */
> - gcc_assert (GET_MODE (operands[2]) == DImode
> -  || GET_MODE (operands[3]) == DImode);
> + if (!(GET_MODE (operands[2]) == DImode || GET_MODE (operands[3]) == DImode)
> + && !(reload_in_progress || reload_completed))
> +   operands[3] = force_reg (DImode, operands[3]);
>  
>  /* Flip unimplemented DImode comparisons to a form that
> arm_gen_compare_reg can handle.  */
> diff --git a/gcc/testsuite/gcc.target/arm/pr49069.c b/gcc/testsuite/gcc.target/arm/pr49069.c
> new file mode 100644
> index 000..3cc903e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pr49069.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -mfloat-abi=softfp -mfpu=vfpv3-d16" } */
> +
> +__extension__ typedef unsigned long long int uint64_t;
> +
> +static int
> +func2 (int a, int b)
> +{
> +  return a == 0 ? a : a / b;
> +}
> +
> +int array1[1];
> +const uint64_t array2[1] = { 1 };
> +
> +void
> +foo (void)
> +{
> +  for (array1[0] = 0; array1[0] == 1; array1[0]++)
> +{
> +}
> +  if (bar (array2[0] == func2 (array1[0], 0)) == 0)
> +{
> +}
> +}

-- 
Matthew Gretton-Dann
Principal Engineer, PD Software, ARM Ltd.



[RFA/ARM] Revert r183011: Cortex-A15 branch costs

2012-02-21 Thread Matthew Gretton-Dann
The attached patch reverts revision 183011, which changed the branch
cost heuristics for Cortex-A15.

This is because further benchmarking shows that there are significant
regressions on some benchmarks which outweigh other performance gains.

OK?

Thanks,

Matt

gcc/ChangeLog:

2012-02-21  Matthew Gretton-Dann  

Revert r183011
* config/arm/arm-cores.def (cortex-a15): Use generic Cortex tuning
parameters.
* config/arm/arm.c (arm_cortex_a15_tune): Remove.
-- 
Matthew Gretton-Dann
Principal Engineer, PD Software, ARM Ltd.
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index b0bd172..80609e0 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -129,7 +129,7 @@ ARM_CORE("cortex-a5", cortexa5, 7A, FL_LDSCHED, cortex_a5)
 ARM_CORE("cortex-a7",cortexa7, 7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
 ARM_CORE("cortex-a8",cortexa8, 7A,  FL_LDSCHED, cortex)
 ARM_CORE("cortex-a9",cortexa9, 7A,  FL_LDSCHED, cortex_a9)
-ARM_CORE("cortex-a15",   cortexa15,7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
+ARM_CORE("cortex-a15",   cortexa15,7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
 ARM_CORE("cortex-r4",cortexr4, 7R,  FL_LDSCHED, cortex)
 ARM_CORE("cortex-r4f",   cortexr4f,7R,  FL_LDSCHED, cortex)
 ARM_CORE("cortex-r5",cortexr5, 7R,  FL_LDSCHED | FL_ARM_DIV, cortex)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4a94145..3ef3d19 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -968,17 +968,6 @@ const struct tune_params arm_cortex_a9_tune =
   arm_default_branch_cost
 };
 
-const struct tune_params arm_cortex_a15_tune =
-{
-  arm_9e_rtx_costs,
-  NULL,
-  1,   /* Constant limit.  */
-  1,   /* Max cond insns.  */
-  ARM_PREFETCH_NOT_BENEFICIAL, /* TODO: Calculate correct values.  */
-  false,   /* Prefer constant pool.  */
-  arm_cortex_a5_branch_cost
-};
-
 const struct tune_params arm_fa726te_tune =
 {
   arm_9e_rtx_costs,

[RFA/ARM] target/51534 Fix unsigned vector comparisons

2012-02-21 Thread Matthew Gretton-Dann
The attached patch fixes instruction generation for unsigned vector
comparisons against a known-zero vector.

ARM's Neon extensions only allow unsigned equality comparison against
unsigned zero, not less than or greater than comparisons.

This patch fixes code generation.  Further changes that would improve the
generated code will come in a separate patch.

This issue does not affect the auto-vectorizer.
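
To make the zero-comparison point concrete, the following scalar C sketch
models a single 8-bit lane.  The function names are hypothetical and the
model is only an illustration; it does not reproduce the Neon patterns
themselves.  A compare-against-zero that reads the lane as signed gives the
wrong mask for unsigned inputs with the top bit set, whereas an unsigned
"x >= 0" is always true and an unsigned "x > 0" is simply "x != 0".

```c
#include <stdint.h>

/* Sketch only: hypothetical one-lane models, not code from the patch.  */

/* Signed ">= 0" lane compare: all ones when the sign bit is clear.  */
static inline uint8_t
lane_cge_zero_signed (uint8_t x)
{
  return (x & 0x80) ? 0x00 : 0xff;
}

/* What an unsigned ">= 0" must return: every unsigned value qualifies.  */
static inline uint8_t
lane_cge_zero_unsigned (uint8_t x)
{
  (void) x;
  return 0xff;
}

/* What an unsigned "> 0" must return: true for every nonzero lane.  */
static inline uint8_t
lane_cgt_zero_unsigned (uint8_t x)
{
  return (x != 0) ? 0xff : 0x00;
}
```

For the input 0x80 the signed model returns 0x00 while both unsigned models
return 0xff, which is the kind of mismatch a signed zero-compare would
introduce for unsigned vector types.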

OK?

Matt

gcc/ChangeLog:

2012-02-21  Matthew Gretton-Dann  

PR target/51534
* config/arm/arm.c (neon_builtin_data): Add entries for vcgeu
and vcgtu.
* config/arm/arm_neon.h: Regenerate.
* config/arm/neon.md (unspec): Add UNSPEC_VCGEU, and UNSPEC_VCGTU.
(neon_vcgeu): New insn.
(neon_vcgtu): Likewise.
* config/arm/neon.ml (s_8_32, u_8_32): New lists.
(ops): Unsigned comparison intrinsics call a different
builtin.

gcc/testsuite/ChangeLog:

2012-02-21  Matthew Gretton-Dann  

PR target/51534
gcc.target/arm/neon/pr51534.c: New testcase.

-- 
Matthew Gretton-Dann
Principal Engineer, PD Software, ARM Ltd.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7f0dc6b..4a173ac 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19108,7 +19108,9 @@ static neon_builtin_datum neon_builtin_data[] =
   VAR3 (BINOP, vsubhn, v8hi, v4si, v2di),
   VAR8 (BINOP, vceq, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
   VAR8 (BINOP, vcge, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
+  VAR6 (BINOP, vcgeu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
   VAR8 (BINOP, vcgt, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
+  VAR6 (BINOP, vcgtu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
   VAR2 (BINOP, vcage, v2sf, v4sf),
   VAR2 (BINOP, vcagt, v2sf, v4sf),
   VAR6 (BINOP, vtst, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 9cba0a9..0567895 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -1893,19 +1893,19 @@ vcge_f32 (float32x2_t __a, float32x2_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcge_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vcgev8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vcgeuv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcge_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vcgev4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vcgeuv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcge_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgev2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vcgeuv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
@@ -1935,19 +1935,19 @@ vcgeq_f32 (float32x4_t __a, float32x4_t __b)
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcgeq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vcgev16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return (uint8x16_t)__builtin_neon_vcgeuv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcgeq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vcgev8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vcgeuv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgeq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgev4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vcgeuv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
@@ -1977,19 +1977,19 @@ vcle_f32 (float32x2_t __a, float32x2_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcle_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vcgev8qi ((int8x8_t) __b, (int8x8_t) __a, 0);
+  return (uint8x8_t)__builtin_neon_vcgeuv8qi ((int8x8_t) __b, (int8x8_t) __a, 0);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcle_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vcgev4hi ((int16x4_t) __b, (int16x4_t) __a, 0);
+  return (uint16x4_t)__builtin_neon_vcgeuv4hi ((int16x4_t) __b, (int16x4_t) __a, 0);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcle_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgev2si ((int32x2_t) __b, (int3

[RFA/ARM] Fix thumb2_mov_notscc pattern in thumb2.md

2012-01-30 Thread Matthew Gretton-Dann
Hi,

This patch fixes the thumb2_mov_notscc pattern in the same way as the
ARM state mov_notscc pattern was fixed earlier in January.

This was highlighted by the gcc.target/arm/20120111-1.c testcase.

OK?

Thanks,

Matt

gcc/ChangeLog:

2012-01-30  Matthew Gretton-Dann  

config/arm/thumb2.md (thumb2_mov_notscc): Use MVN for true
condition.
--
Matthew Gretton-Dann
Principal Engineer, PD Software, ARM Ltd.
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 05585da..ad05feb 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -259,7 +259,7 @@
(not:SI (match_operator:SI 1 "arm_comparison_operator"
 [(match_operand 2 "cc_register" "") (const_int 0)])))]
   "TARGET_THUMB2"
-  "ite\\t%D1\;mov%D1\\t%0, #0\;mvn%d1\\t%0, #1"
+  "ite\\t%D1\;mvn%D1\\t%0, #0\;mvn%d1\\t%0, #1"
   [(set_attr "conds" "use")
(set_attr "length" "10")]
 )


[RFC/ARM] Correct mov_notscc pattern

2012-01-11 Thread Matthew Gretton-Dann
All,

The attached patch corrects the mov_notscc pattern in arm.md.
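
As a sketch of what the pattern has to compute, here is a small C model.  The
pattern represents the bitwise NOT of a comparison result, which is 1 when
the condition holds and 0 otherwise, so the register must end up as ~1 for a
true condition and ~0 for a false one.  The function names are hypothetical;
the two models simply mirror the instruction sequences before and after the
change, with %d1 read as the condition and %D1 as its inverse.

```c
#include <stdint.h>

/* Sketch only: hypothetical C models of the two instruction sequences.  */

/* Pre-patch sequence: "mov%D1 %0, #0" when the condition is false,
   "mvn%d1 %0, #1" when it is true.  */
static inline uint32_t
mov_notscc_old (int cond)
{
  return cond ? ~UINT32_C (1) : UINT32_C (0);
}

/* Patched sequence: "mvn%D1 %0, #0" when the condition is false,
   "mvn%d1 %0, #1" when it is true.  */
static inline uint32_t
mov_notscc_new (int cond)
{
  return cond ? ~UINT32_C (1) : ~UINT32_C (0);
}

/* Reference: bitwise NOT of a comparison result, which is 0 or 1.  */
static inline uint32_t
not_scc_reference (int cond)
{
  return ~(uint32_t) (cond != 0);
}
```

The old model disagrees with the reference only when the condition is false,
where it yields 0 instead of all ones.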

This issue also exists in 4.5 and 4.6.  Is it okay for me to backport the
fix to those branches, as well as trunk?

OK?

Thanks,

Matt

gcc/ChangeLog:
2012-01-10  Matthew Gretton-Dann  

* config/arm/arm.md (mov_notscc): Use MVN for false condition.

gcc/testsuite/ChangeLog:
2012-01-10  Matthew Gretton-Dann  

* testsuite/gcc.c-torture/execute/20120110-1.c: New testcase.
-- 
Matthew Gretton-Dann
Principal Engineer, PD Software, ARM Ltd.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0e4bc3e..5620d7d 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -7726,7 +7726,7 @@
(not:SI (match_operator:SI 1 "arm_comparison_operator"
 [(match_operand 2 "cc_register" "") (const_int 0)])))]
   "TARGET_ARM"
-  "mov%D1\\t%0, #0\;mvn%d1\\t%0, #1"
+  "mvn%D1\\t%0, #0\;mvn%d1\\t%0, #1"
   [(set_attr "conds" "use")
(set_attr "insn" "mov")
(set_attr "length" "8")]
diff --git a/gcc/testsuite/gcc.c-torture/execute/20120111-1.c b/gcc/testsuite/gcc.c-torture/execute/20120111-1.c
new file mode 100644
index 000..eac086e
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/20120111-1.c
@@ -0,0 +1,18 @@
+#include <stdlib.h>
+#include <stdint.h>
+
+uint32_t f0a (uint64_t arg2) __attribute__((noinline));
+
+uint32_t
+f0a (uint64_t arg)
+{
+  return ~(arg > -3);
+}
+
+int main() {
+  uint32_t r1;
+  r1 = f0a (12094370573988097329ULL);
+  if (r1 != ~0U)
+    abort ();
+  return 0;
+}

[RFC/ARM] Prefer branches over conditional execution for Cortex-A15.

2012-01-09 Thread Matthew Gretton-Dann
The attached patch tunes the costs of conditional execution and branches
for Cortex-A15 to be the same as that for Cortex-A5.

This gives an average improvement of over 6% on a popular embedded
benchmark.

Note that the tuning parameter structure added for Cortex-A15 is only tuned
for branch costs, and otherwise takes the generic values.  Tuning for other
optimisations (notably preload insertion) is still to be done.

Thanks,

Matt

gcc/ChangeLog:

2012-01-06  Matthew Gretton-Dann  

* config/arm/arm-cores.def (cortex-a15): Use cortex_a15_tune for
tuning parameters.
* config/arm/arm.c (arm_cortex_a15_tune): New static variable.
-- 
Matthew Gretton-Dann
Principal Engineer, PD Software, ARM Ltd.
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 80609e0..b0bd172 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -129,7 +129,7 @@ ARM_CORE("cortex-a5", cortexa5, 7A, FL_LDSCHED, cortex_a5)
 ARM_CORE("cortex-a7", cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
 ARM_CORE("cortex-a8", cortexa8, 7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a9", cortexa9, 7A, FL_LDSCHED, cortex_a9)
-ARM_CORE("cortex-a15", cortexa15, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
+ARM_CORE("cortex-a15", cortexa15, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
 ARM_CORE("cortex-r4", cortexr4, 7R, FL_LDSCHED, cortex)
 ARM_CORE("cortex-r4f", cortexr4f, 7R, FL_LDSCHED, cortex)
 ARM_CORE("cortex-r5", cortexr5, 7R, FL_LDSCHED | FL_ARM_DIV, cortex)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 0bded8d..6f1eb13 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -961,6 +961,17 @@ const struct tune_params arm_cortex_a9_tune =
   arm_default_branch_cost
 };
 
+const struct tune_params arm_cortex_a15_tune =
+{
+  arm_9e_rtx_costs,
+  NULL,
+  1,   /* Constant limit.  */
+  1,   /* Max cond insns.  */
+  ARM_PREFETCH_NOT_BENEFICIAL, /* TODO: Calculate correct values.  */
+  false,   /* Prefer constant pool.  */
+  arm_cortex_a5_branch_cost
+};
+
 const struct tune_params arm_fa726te_tune =
 {
   arm_9e_rtx_costs,

[RFA/testsuite] Use rand instead of random (again)

2011-11-30 Thread Matthew Gretton-Dann

All,

The attached patch changes another use of random in the testsuite to rand.

Tested on an arm-none-eabi target using QEmu.

Please can someone review.

Thanks,

Matt

gcc/testsuite/ChangeLog:

2011-11-30  Matthew Gretton-Dann  

* gcc.dg/torture/vec-cvt-1.c (FLTTEST): Call rand instead of
random.

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
diff --git a/gcc/testsuite/gcc.dg/torture/vec-cvt-1.c b/gcc/testsuite/gcc.dg/torture/vec-cvt-1.c
index 601f098..4354237 100644
--- a/gcc/testsuite/gcc.dg/torture/vec-cvt-1.c
+++ b/gcc/testsuite/gcc.dg/torture/vec-cvt-1.c
@@ -104,9 +104,9 @@ flttointtest##intt (void) \
   abort (); \
  for (i = 0; i < N; i++) \
 { \
-  unsigned long long r = random (); \
-  r = (r << 21) ^ (unsigned) random (); \
-  r = (r << 21) ^ (unsigned) random (); \
+  unsigned long long r = rand (); \
+  r = (r << 21) ^ (unsigned) rand (); \
+  r = (r << 21) ^ (unsigned) rand (); \
   asm (""); \
   f[i] = (r >> 59) / 32.0f + (__typeof (intt[0])) r; \
   if (f[i] < fltmin) f[i] = fltmin; \
@@ -157,9 +157,9 @@ inttoflttest##intt (void) \
 } \
  for (i = 0; i < N; i++) \
 { \
-  unsigned long long r = random (); \
-  r = (r << 21) ^ (unsigned) random (); \
-  r = (r << 21) ^ (unsigned) random (); \
+  unsigned long long r = rand (); \
+  r = (r << 21) ^ (unsigned) rand (); \
+  r = (r << 21) ^ (unsigned) rand (); \
   asm (""); \
   intt[i] = r; \
 } \

Re: [RFA/ARM] Add an integer pipeline description for Cortex-A15

2011-11-30 Thread Matthew Gretton-Dann

On 29/11/11 11:04, Ramana Radhakrishnan wrote:

gcc/ChangeLog:
2011-11-28  Matthew Gretton-Dann

* config/arm/arm.c (arm_issue_rate): Cortex-A15 can triple
issue.
* config/arm/arm.md (mul64): Add new attribute.
(generic_sched): Cortex-A15 is not scheduled generically
(cortex-a15.md): Include.
* config/arm/cortex-a15.md: New machine description.
* config/arm/t-arm (MD_INCLUDES): Add cortex-a15.md


OK .


This had a conflict with the MD_INCLUDES patch in config/arm/t-arm - so 
the attached is what actually got committed.


Thanks,

Matt
--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
Index: gcc/config/arm/cortex-a15.md
===
--- gcc/config/arm/cortex-a15.md	(revision 0)
+++ gcc/config/arm/cortex-a15.md	(revision 0)
@@ -0,0 +1,186 @@
+;; ARM Cortex-A15 pipeline description
+;; Copyright (C) 2011 Free Software Foundation, Inc.
+;;
+;; Written by Matthew Gretton-Dann 
+
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "cortex_a15")
+
+;; The Cortex-A15 core is modelled as a triple issue pipeline that has
+;; the following dispatch units.
+;; 1. Two pipelines for simple integer operations: SX1, SX2
+;; 2. Two pipelines for Neon and FP data-processing operations: CX1, CX2
+;; 3. One pipeline for branch operations: BX
+;; 4. One pipeline for integer multiply and divide operations: MX
+;; 5. Two pipelines for load and store operations: LS1, LS2
+;;
+;; We can issue into three pipelines per-cycle.
+;;
+;; We assume that where we have unit pairs xx1 is always filled before xx2.
+
+;; The three issue units
+(define_cpu_unit "ca15_i0, ca15_i1, ca15_i2" "cortex_a15")
+
+(define_reservation "ca15_issue1" "(ca15_i0|ca15_i1|ca15_i2)")
+(define_reservation "ca15_issue2" "((ca15_i0+ca15_i1)|(ca15_i1+ca15_i2))")
+(define_reservation "ca15_issue3" "(ca15_i0+ca15_i1+ca15_i2)")
+(final_presence_set "ca15_i1" "ca15_i0")
+(final_presence_set "ca15_i2" "ca15_i1")
+
+;; The main dispatch units
+(define_cpu_unit "ca15_sx1, ca15_sx2" "cortex_a15")
+(define_cpu_unit "ca15_cx1, ca15_cx2" "cortex_a15")
+(define_cpu_unit "ca15_ls1, ca15_ls2" "cortex_a15")
+(define_cpu_unit "ca15_bx, ca15_mx" "cortex_a15")
+
+(define_reservation "ca15_ls" "(ca15_ls1|ca15_ls2)")
+
+;; The extended load-store pipeline
+(define_cpu_unit "ca15_ldr, ca15_str" "cortex_a15")
+
+;; The extended ALU pipeline
+(define_cpu_unit "ca15_sx1_alu, ca15_sx1_shf, ca15_sx1_sat" "cortex_a15")
+(define_cpu_unit "ca15_sx2_alu, ca15_sx2_shf, ca15_sx2_sat" "cortex_a15")
+
+;; Simple Execution Unit:
+;;
+;; Simple ALU without shift
+(define_insn_reservation "cortex_a15_alu" 2
+  (and (eq_attr "tune" "cortexa15")
+   (and (eq_attr "type" "alu")
+(eq_attr "neon_type" "none")))
+  "ca15_issue1,(ca15_sx1,ca15_sx1_alu)|(ca15_sx2,ca15_sx2_alu)")
+
+;; ALU ops with immediate shift
+(define_insn_reservation "cortex_a15_alu_shift" 3
+  (and (eq_attr "tune" "cortexa15")
+   (and (eq_attr "type" "alu_shift")
+(eq_attr "neon_type" "none")))
+  "ca15_issue1,(ca15_sx1,ca15_sx1+ca15_sx1_shf,ca15_sx1_alu)\
+  |(ca15_sx2,ca15_sx2+ca15_sx2_shf,ca15_sx2_alu)")
+
+;; ALU ops with register controlled shift
+(define_insn_reservation "cortex_a15_alu_shift_reg" 3
+  (and (eq_attr "tune" "cortexa15")
+   (and (eq_attr "type" "alu_shift_reg")
+   (eq_attr "neon_type" "none")))
+  "(ca15_issue2,ca15_sx1+ca15_sx2,ca15_sx1_shf,ca15_sx2_alu)\
+   |(ca15_issue1,(ca15_issue1+ca15_sx2,ca15_sx1+ca15_sx2_shf)\
+   |(ca15_issue1+ca15_sx1,ca15_sx1+ca15_sx1_shf),ca15_sx1_alu)")
+
+;; Multiply Execution Unit:
+;;
+;; 32-bit multiplies
+(define_insn_reservation "cortex_a15_mult32" 3
+  (and (eq_attr "tune" "

[PATCH/Committed] Add self to write-after-approval section of MAINTAINERS file

2011-11-30 Thread Matthew Gretton-Dann

All,

I have just committed the attached patch to add myself to the 
write-after-approval section of the MAINTAINERS file.


Thanks,

Matt

ChangeLog:

2011-11-30  Matthew Gretton-Dann  

* MAINTAINERS (write-after-approval): Add self.

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
Index: MAINTAINERS
===
--- MAINTAINERS (revision 181835)
+++ MAINTAINERS (working copy)
@@ -372,6 +372,7 @@
 Tristan Gingold        ging...@adacore.com
 Anthony Green          gr...@redhat.com
 Doug Gregor            doug.gre...@gmail.com
+Matthew Gretton-Dann   matthew.gretton-d...@arm.com
 Jon Grimm              jgri...@us.ibm.com
 Laurent Guerby         laur...@guerby.net
 Xuepeng Guo            terry@arm.com

Tidy up MD_INCLUDES in config/arm/t-arm

2011-11-29 Thread Matthew Gretton-Dann

All,

Whilst developing the Cortex-A15 integer pipeline patch it was noted 
that the MD_INCLUDES variable in config/arm/t-arm has not been kept 
up-to-date.


The attached patch fixes this, and rearranges the list of md files into 
alphabetical order.


The list was generated using `ls -1 *.md | grep -v arm\\.md`.

Tested by doing an arm-none-eabi build.

Can someone please review, and if appropriate apply?

Thanks,

Matt

gcc/ChangeLog:
2011-11-29  Matthew Gretton-Dann  

* config/arm/t-arm (MD_INCLUDES): Ensure all md files are
listed.


--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index a9a174d..4b94a7e 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -19,26 +19,43 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-MD_INCLUDES=   $(srcdir)/config/arm/arm-tune.md \
-   $(srcdir)/config/arm/predicates.md \
-   $(srcdir)/config/arm/arm-generic.md \
-   $(srcdir)/config/arm/arm1020e.md \
+# All md files - except for arm.md.
+# This list should be kept in alphabetical order and updated whenever an md
+# file is added or removed.
+MD_INCLUDES=   $(srcdir)/config/arm/arm1020e.md \
$(srcdir)/config/arm/arm1026ejs.md \
$(srcdir)/config/arm/arm1136jfs.md \
+   $(srcdir)/config/arm/arm926ejs.md \
+   $(srcdir)/config/arm/arm-fixed.md \
+   $(srcdir)/config/arm/arm-generic.md \
+   $(srcdir)/config/arm/arm-tune.md \
+   $(srcdir)/config/arm/cirrus.md \
+   $(srcdir)/config/arm/constraints.md \
+   $(srcdir)/config/arm/cortex-a5.md \
+   $(srcdir)/config/arm/cortex-a8.md \
+   $(srcdir)/config/arm/cortex-a8-neon.md \
+   $(srcdir)/config/arm/cortex-a9.md \
+   $(srcdir)/config/arm/cortex-a9-neon.md \
+   $(srcdir)/config/arm/cortex-m4-fpu.md \
+   $(srcdir)/config/arm/cortex-m4.md \
+   $(srcdir)/config/arm/cortex-r4f.md \
+   $(srcdir)/config/arm/cortex-r4.md \
$(srcdir)/config/arm/fa526.md \
$(srcdir)/config/arm/fa606te.md \
$(srcdir)/config/arm/fa626te.md \
-   $(srcdir)/config/arm/fmp626.md \
$(srcdir)/config/arm/fa726te.md \
-   $(srcdir)/config/arm/arm926ejs.md \
-   $(srcdir)/config/arm/cirrus.md \
+   $(srcdir)/config/arm/fmp626.md \
$(srcdir)/config/arm/fpa.md \
-   $(srcdir)/config/arm/vec-common.md \
+   $(srcdir)/config/arm/iterators.md \
$(srcdir)/config/arm/iwmmxt.md \
-   $(srcdir)/config/arm/vfp.md \
+   $(srcdir)/config/arm/ldmstm.md \
$(srcdir)/config/arm/neon.md \
+   $(srcdir)/config/arm/predicates.md \
+   $(srcdir)/config/arm/sync.md \
$(srcdir)/config/arm/thumb2.md \
-   $(srcdir)/config/arm/arm-fixed.md
+   $(srcdir)/config/arm/vec-common.md \
+   $(srcdir)/config/arm/vfp11.md \
+   $(srcdir)/config/arm/vfp.md
 
 s-config s-conditions s-flags s-codes s-constants s-emit s-recog s-preds \
s-opinit s-extract s-peep s-attr s-attrtab s-output: $(MD_INCLUDES)

[RFA/ARM] Add an integer pipeline description for Cortex-A15

2011-11-28 Thread Matthew Gretton-Dann

All,

The attached patch adds an integer pipeline description for Cortex-A15.

Although the patch does not depend on them, my testing has been done on top
of Sameera Deshpande's A15 Prologue/Epilogue patches (see:
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00856.html and following).
Testing on some popular embedded benchmarks shows the pipeline
description gives a 1.8% performance improvement on (geomean) average.
The testsuite shows no regressions targeting arm-none-eabi and using QEmu.


Can someone please review.

Thanks,

Matt

gcc/ChangeLog:
2011-11-28  Matthew Gretton-Dann

* config/arm/arm.c (arm_issue_rate): Cortex-A15 can triple
issue.
* config/arm/arm.md (mul64): Add new attribute.
(generic_sched): Cortex-A15 is not scheduled generically
(cortex-a15.md): Include.
* config/arm/cortex-a15.md: New machine description.
* config/arm/t-arm (MD_INCLUDES): Add cortex-a15.md

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e3b0b88..d17f2b5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24123,6 +24123,9 @@ arm_issue_rate (void)
 {
   switch (arm_tune)
 {
+case cortexa15:
+  return 3;
+
 case cortexr4:
 case cortexr4f:
 case cortexr5:
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a78ba88..facbf92 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -355,6 +355,13 @@
 (const_string "mult")
 (const_string "alu")))
 
+; Is this an (integer side) multiply with a 64-bit result?
+(define_attr "mul64" "no,yes"
+(if_then_else
+  (eq_attr "insn" "smlalxy,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals")
+  (const_string "yes")
+  (const_string "no")))
+
 ; Load scheduling, set from the arm_ld_sched variable
 ; initialized by arm_option_override()
 (define_attr "ldsched" "no,yes" (const (symbol_ref "arm_ld_sched")))
@@ -518,7 +525,7 @@
 
 (define_attr "generic_sched" "yes,no"
   (const (if_then_else
-  (ior (eq_attr "tune" "fa526,fa626,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1020e,arm1026ejs,arm1136js,arm1136jfs,cortexa5,cortexa8,cortexa9,cortexm4")
+  (ior (eq_attr "tune" "fa526,fa626,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1020e,arm1026ejs,arm1136js,arm1136jfs,cortexa5,cortexa8,cortexa9,cortexa15,cortexm4")
   (eq_attr "tune_cortexr4" "yes"))
   (const_string "no")
   (const_string "yes"
@@ -544,6 +551,7 @@
 (include "cortex-a5.md")
 (include "cortex-a8.md")
 (include "cortex-a9.md")
+(include "cortex-a15.md")
 (include "cortex-r4.md")
 (include "cortex-r4f.md")
 (include "cortex-m4.md")
diff --git a/gcc/config/arm/cortex-a15.md b/gcc/config/arm/cortex-a15.md
new file mode 100644
index 000..ccab7cb
--- /dev/null
+++ b/gcc/config/arm/cortex-a15.md
@@ -0,0 +1,186 @@
+;; ARM Cortex-A15 pipeline description
+;; Copyright (C) 2011 Free Software Foundation, Inc.
+;;
+;; Written by Matthew Gretton-Dann 
+
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "cortex_a15")
+
+;; The Cortex-A15 core is modelled as a triple issue pipeline that has
+;; the following dispatch units.
+;; 1. Two pipelines for simple integer operations: SX1, SX2
+;; 2. Two pipelines for Neon and FP data-processing operations: CX1, CX2
+;; 3. One pipeline for branch operations: BX
+;; 4. One pipeline for integer multiply and divide operations: MX
+;; 5. Two pipelines for load and store operations: LS1, LS2
+;;
+;; We can issue into three pipelines per-cycle.
+;;
+;; We assume that where we have unit pairs xx1 is always filled before xx2.
+
+;; The three issue units
+(define_cpu_unit "ca15_i0, ca15_i1, ca15_i2" "cortex_a15")
+
+(define_reservation "ca15_issue1" "(ca15_i0|ca15_i1|ca15_i2)")
+(define_reservation "ca15_issue2" "((ca15_i0+ca15_i1)|(ca15_i1+ca15_i2))")
+(define_reservation "ca15_issue3" "(c

[RFA/testsuite] Update gcc.dg/vshift-*.c tests to use rand and not random

2011-11-21 Thread Matthew Gretton-Dann

All,

[Apologies to those getting this twice - used wrong account to send it 
initially].


The attached patch updates the gcc.dg/vshift-*.c tests to call the 
function rand and not random, as random is not available on all targets, 
but rand should be as it is in the Standard C Library.
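
As a rough illustration of the portable form, the sketch below builds a test word using only `rand` and `srand` from `<stdlib.h>`. The helper names are assumptions made for this example and do not appear in the testsuite; the casts to `unsigned int` are also an addition here, so that the shift cannot overflow a signed `int`.

```c
#include <stdlib.h>

/* Hypothetical helper, not from the testsuite: combine two rand () calls
   into one word, in the same shift-and-or style as the vshift tests.  */
static unsigned int
rand_word (void)
{
  unsigned int hi = (unsigned int) rand ();
  unsigned int lo = (unsigned int) rand () & 1u;
  return (hi << 1) | lo;
}

/* Seed the generator first so that the sequence is reproducible.  */
static unsigned int
rand_word_seeded (unsigned int seed)
{
  srand (seed);
  return rand_word ();
}
```

Note that ISO C only guarantees RAND_MAX to be at least 32767, so the number of useful bits per call varies between C libraries.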


Can someone please review the patch?

Thanks,

Matt

gcc/testsuite/ChangeLog:

2011-11-21  Matthew Gretton-Dann  

* gcc.dg/vshift-1.c (main): Call rand instead of random.
* gcc.dg/vshift-3.c (main): Likewise.

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
diff --git a/gcc/testsuite/gcc.dg/vshift-1.c b/gcc/testsuite/gcc.dg/vshift-1.c
index 2a237aa..2220ad5 100644
--- a/gcc/testsuite/gcc.dg/vshift-1.c
+++ b/gcc/testsuite/gcc.dg/vshift-1.c
@@ -94,10 +94,10 @@ main ()
   for (i = 0; i < N; i++)
 {
   asm ("");
-  c[i] = (random () << 1) | (random () & 1);
+  c[i] = (rand () << 1) | (rand () & 1);
   b[i] = (i * 85) & (sizeof (TYPE1) * __CHAR_BIT__ - 1);
   a[i] = c[i];
-  d[i] = (random () << 1) | (random () & 1);
+  d[i] = (rand () << 1) | (rand () & 1);
   d[i] |= (unsigned long long) c[i] << 32;
   e[i] = (i * 85) & (sizeof (TYPE2) * __CHAR_BIT__ - 1);
   f[i] = d[i];
diff --git a/gcc/testsuite/gcc.dg/vshift-3.c b/gcc/testsuite/gcc.dg/vshift-3.c
index e62c76b..367e660 100644
--- a/gcc/testsuite/gcc.dg/vshift-3.c
+++ b/gcc/testsuite/gcc.dg/vshift-3.c
@@ -100,9 +100,9 @@ main ()
   for (i = 0; i < N; i++)
 {
   asm ("");
-  c[i] = (random () << 1) | (random () & 1);
+  c[i] = (rand () << 1) | (rand () & 1);
   a[i] = c[i];
-  d[i] = (random () << 1) | (random () & 1);
+  d[i] = (rand () << 1) | (rand () & 1);
   d[i] |= (unsigned long long) c[i] << 32;
   f[i] = d[i];
 }


[wwwdocs] [RFA] Update gcc-4.7/changes.html to document -mcpu=cortex-a7

2011-11-21 Thread Matthew Gretton-Dann

All,

The attached patch updates gcc-4.7/changes.html to document the addition 
of support in the ARM backend for Cortex-A7 via the -mcpu=cortex-a7 
command line option.


The wording is based upon that used for Cortex-M4 in gcc-4.6/changes.html.

Can someone please review, and if appropriate apply the patch?

Thanks,

Matt

ChangeLog:

2011-11-21  Matthew Gretton-Dann  

* htdocs/gcc-4.7/changes.html: Document -mcpu=cortex-a7.

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
Index: htdocs/gcc-4.7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.61
diff -u -p -r1.61 changes.html
--- htdocs/gcc-4.7/changes.html 19 Nov 2011 22:04:07 -  1.61
+++ htdocs/gcc-4.7/changes.html 21 Nov 2011 10:51:05 -
@@ -456,6 +456,8 @@ well.
 
 ARM
   
+GCC now supports the Cortex-A7 processor implementing the v7-a version
+  of the architecture using the option -mcpu=cortex-a7.
 The default vector size in auto-vectorization for NEON is now 128 bits.
   If vectorization fails thusly, the vectorizer tries again with
   64-bit vectors.

[RFA/ARM] Make libgcc use UDIV/SDIV instructions when they are available.

2011-11-15 Thread Matthew Gretton-Dann

All,

The attached patch causes libgcc to use the UDIV and SDIV instructions 
when possible in the implementation of the ARM div/mod functions in libgcc.


This will benefit Cortex-M3, Cortex-M4, all Cortex-R* CPUs, Cortex-A7, 
and Cortex-A15.


The special case of some Cortex-R* CPUs where the UDIV/SDIV instructions 
are only available in Thumb mode, making it beneficial to force these 
library functions into Thumb mode to make use of those instructions, is 
not handled.
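
To make the divide/modulo relationship concrete, here is a small C sketch of what the new code paths compute: the quotient comes from one hardware divide, and the remainder from a multiply-and-subtract of that quotient. The struct and function names are invented for this illustration, and the sketch assumes a non-zero divisor (the assembly branches to the div0 handler for that case instead).

```c
#include <stdint.h>

/* Hypothetical result type for this sketch only.  */
struct uidivmod_result
{
  uint32_t quot;
  uint32_t rem;
};

/* Sketch of the UDIV + MLS sequence.  Precondition: d != 0.  */
static struct uidivmod_result
uidivmod_sketch (uint32_t n, uint32_t d)
{
  struct uidivmod_result r;
  r.quot = n / d;          /* Corresponds to: udiv r0, r0, r1.  */
  r.rem = n - r.quot * d;  /* Corresponds to: mls r1, r0, r1, r2.  */
  return r;
}
```

For example, dividing 17 by 5 gives a quotient of 3 and a remainder of 17 - 3 * 5 = 2.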


This was tested by configuring GCC --with-cpu cortex-a15, and then 
running the testsuite with -mcpu=cortex-a9.  I've also manually 
inspected libgcc to make sure the functions are being built as expected.


Please can someone review?

Thanks,

Matt

libgcc/ChangeLog:

2011-11-15  Matthew Gretton-Dann  

* config/arm/lib1funcs.asm (udivsi3): Add support for divide
functions.
(aeabi_uidivmod): Likewise. 
(umodsi3): Likewise.
(divsi3): Likewise.
(aeabi_idivmod): Likewise.
(modsi3): Likewise.

Thanks,

Matt

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 2e76c01..094d79a 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -951,6 +951,17 @@ LSYM(udivsi3_skip_div0_test):
pop { work }
RET
 
+#elif defined(__ARM_ARCH_EXT_IDIV__)
+
+   ARM_FUNC_START udivsi3
+   ARM_FUNC_ALIAS aeabi_uidiv udivsi3
+
+   cmp r1, #0
+   beq LSYM(Ldiv0)
+
+   udiv r0, r0, r1
+   RET
+
 #else /* ARM version/Thumb-2.  */
 
ARM_FUNC_START udivsi3
@@ -997,6 +1008,14 @@ FUNC_START aeabi_uidivmod
mul r2, r0
sub r1, r1, r2
bx  r3
+#elif defined(__ARM_ARCH_EXT_IDIV__)
+ARM_FUNC_START aeabi_uidivmod
+   cmp r1, #0
+   beq LSYM(Ldiv0)
+   mov r2, r0 
+   udiv r0, r0, r1
+   mls r1, r0, r1, r2
+   RET
 #else
 ARM_FUNC_START aeabi_uidivmod
cmp r1, #0
@@ -1014,9 +1033,19 @@ ARM_FUNC_START aeabi_uidivmod
 /*  */
 #ifdef L_umodsi3
 
-   FUNC_START umodsi3
+#ifdef __ARM_ARCH_EXT_IDIV__
 
-#ifdef __thumb__
+   ARM_FUNC_START umodsi3
+
+   cmp r1, #0
+   beq LSYM(Ldiv0)
+   udiv r2, r0, r1
+   mls r0, r1, r2, r0
+   RET
+
+#elif defined(__thumb__)
+
+   FUNC_START umodsi3
 
cmp divisor, #0
beq LSYM(Ldiv0)
@@ -1035,6 +1064,8 @@ LSYM(Lover10):

 #else  /* ARM version.  */

+   FUNC_START umodsi3
+
subs r2, r1, #1  @ compare divisor with 1
bcc LSYM(Ldiv0)
cmpne   r0, r1  @ compare dividend with divisor
@@ -1091,6 +1122,16 @@ LSYM(Lover12):
pop { work }
RET
 
+#elif defined(__ARM_ARCH_EXT_IDIV__)
+
+   ARM_FUNC_START divsi3
+   ARM_FUNC_ALIAS aeabi_idiv divsi3
+
+   cmp r1, #0
+   beq LSYM(Ldiv0)
+   sdiv r0, r0, r1
+   RET
+
 #else /* ARM/Thumb-2 version.  */

ARM_FUNC_START divsi3   
@@ -1153,6 +1194,14 @@ FUNC_START aeabi_idivmod
mul r2, r0
sub r1, r1, r2
bx  r3
+#elif defined(__ARM_ARCH_EXT_IDIV__)
+ARM_FUNC_START aeabi_idivmod
+   cmp r1, #0
+   beq LSYM(Ldiv0)
+   mov r2, r0
+   sdiv r0, r0, r1
+   mls r1, r0, r1, r2
+   RET
 #else
 ARM_FUNC_START aeabi_idivmod
cmp r1, #0
@@ -1170,9 +1219,20 @@ ARM_FUNC_START aeabi_idivmod
 /*  */
 #ifdef L_modsi3
 
-   FUNC_START modsi3
+#if defined(__ARM_ARCH_EXT_IDIV__)
 
-#ifdef __thumb__
+   ARM_FUNC_START modsi3
+
+   cmp r1, #0
+   beq LSYM(Ldiv0)
+
+   sdiv r2, r0, r1
+   mls r0, r1, r2, r0
+   RET
+
+#elif defined(__thumb__)
+
+   FUNC_START modsi3
 
mov curbit, #1
cmp divisor, #0
@@ -1204,6 +1264,8 @@ LSYM(Lover12):
 
 #else /* ARM version.  */

+   FUNC_START modsi3
+
cmp r1, #0
beq LSYM(Ldiv0)
rsbmi   r1, r1, #0  @ loops below use unsigned.

[RFA/ARM] Add support for -mcpu=cortex-a7

2011-11-07 Thread Matthew Gretton-Dann

All,

The attached patch adds support for -mcpu=cortex-a7 to the GCC command line.

A related patch has just been committed to binutils so that gas will 
recognize cortex-a7 on the command line and in the appropriate 
directives.  See http://sourceware.org/ml/binutils/2011-11/msg00048.html.


Can someone please review and comment.

Thanks,

Matt

gcc/ChangeLog:

2011-11-07  Matthew Gretton-Dann  

* config/arm/arm-cores.def: Add -mcpu=cortex-a7.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-A7.
* doc/invoke.texi: Document -mcpu=cortex-a7.

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 742b5e8..80609e0 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -126,6 +126,7 @@ ARM_CORE("arm1156t2-s",	  arm1156t2s,	6T2, FL_LDSCHED, v6t2)
 ARM_CORE("arm1156t2f-s",  arm1156t2fs,  6T2, FL_LDSCHED | FL_VFPV2, v6t2)
 ARM_CORE("generic-armv7-a", genericv7a,	7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a5",	  cortexa5,	7A, FL_LDSCHED, cortex_a5)
+ARM_CORE("cortex-a7",	  cortexa7,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
 ARM_CORE("cortex-a8",	  cortexa8,	7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a9",	  cortexa9,	7A, FL_LDSCHED, cortex_a9)
 ARM_CORE("cortex-a15",	  cortexa15,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 23339c7..c0b2437 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -238,6 +238,9 @@ EnumValue
 Enum(processor_type) String(cortex-a5) Value(cortexa5)
 
 EnumValue
+Enum(processor_type) String(cortex-a7) Value(cortexa7)
+
+EnumValue
 Enum(processor_type) String(cortex-a8) Value(cortexa8)
 
 EnumValue
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index 4b439d0..54ef0f1 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from arm-cores.def
 (define_attr "tune"
-	"arm2,arm250,arm3,arm6,arm60,arm600,arm610,arm620,arm7,arm7d,arm7di,arm70,arm700,arm700i,arm710,arm720,arm710c,arm7100,arm7500,arm7500fe,arm7m,arm7dm,arm7dmi,arm8,arm810,strongarm,strongarm110,strongarm1100,strongarm1110,fa526,fa626,arm7tdmi,arm7tdmis,arm710t,arm720t,arm740t,arm9,arm9tdmi,arm920,arm920t,arm922t,arm940t,ep9312,arm10tdmi,arm1020t,arm9e,arm946es,arm966es,arm968es,arm10e,arm1020e,arm1022e,xscale,iwmmxt,iwmmxt2,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1026ejs,arm1136js,arm1136jfs,arm1176jzs,arm1176jzfs,mpcorenovfp,mpcore,arm1156t2s,arm1156t2fs,genericv7a,cortexa5,cortexa8,cortexa9,cortexa15,cortexr4,cortexr4f,cortexr5,cortexm4,cortexm3,cortexm1,cortexm0"
+	"arm2,arm250,arm3,arm6,arm60,arm600,arm610,arm620,arm7,arm7d,arm7di,arm70,arm700,arm700i,arm710,arm720,arm710c,arm7100,arm7500,arm7500fe,arm7m,arm7dm,arm7dmi,arm8,arm810,strongarm,strongarm110,strongarm1100,strongarm1110,fa526,fa626,arm7tdmi,arm7tdmis,arm710t,arm720t,arm740t,arm9,arm9tdmi,arm920,arm920t,arm922t,arm940t,ep9312,arm10tdmi,arm1020t,arm9e,arm946es,arm966es,arm968es,arm10e,arm1020e,arm1022e,xscale,iwmmxt,iwmmxt2,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1026ejs,arm1136js,arm1136jfs,arm1176jzs,arm1176jzfs,mpcorenovfp,mpcore,arm1156t2s,arm1156t2fs,genericv7a,cortexa5,cortexa7,cortexa8,cortexa9,cortexa15,cortexr4,cortexr4f,cortexr5,cortexm4,cortexm3,cortexm1,cortexm0"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 64d7df4..7d8e508 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -57,6 +57,7 @@
 
 #define BE8_LINK_SPEC \
   " %{mbig-endian:%{march=armv7-a|mcpu=cortex-a5	\
+   |mcpu=cortex-a7	\
|mcpu=cortex-a8|mcpu=cortex-a9|mcpu=cortex-a15	\
|mcpu=generic-armv7-a\
|march=armv7-m|mcpu=cortex-m3			\
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b68f607..f2c3108 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10398,8 +10398,8 @@ assembly code.  Permissible names are: @samp{arm2}, @samp{arm250},
 @samp{arm10e}, @samp{arm1020e}, @samp{arm1022e},
 @samp{arm1136j-s}, @samp{arm1136jf-s}, @samp{mpcore}, @samp{mpcorenovfp},
 @samp{arm1156t2-s}, @samp{arm1156t2f-s}, @samp{arm1176jz-s}, @samp{arm1176jzf-s},
-@samp{cortex-a5}, @samp{cortex-a8}, @samp{cortex-a9}, @samp{cortex-a15},
-@samp{cortex-r4}, @samp{cortex-r4f}, @samp{cortex-r5},
+@samp{cortex-a5}, @samp{cortex-a7}, @samp{cortex-a8}, @samp{cortex-a9}, 
+@samp{cortex-a15}, @samp{cortex-r4}, @samp{cortex-r4f}, @samp{cortex-r5},
 @samp{cortex-m4}, @samp{cortex-m3},
 @samp{cortex-m1},
 @samp{cortex-m0},

Re: CFT: [build] Move crtstuff support to toplevel libgcc

2011-11-03 Thread Matthew Gretton-Dann

On 02/11/11 12:37, Rainer Orth wrote:

Rainer Orth  writes:


The next patch in the series moves crtstuff.c, extra_parts, EXTRA_PARTS,
EXTRA_MULTILIB_PARTS and referenced files to libgcc.  This will avoid
errors due to inconsistencies in extra_parts between gcc and libgcc in
the future.

Much of this is pretty mechanical, but given that I cannot test most of
the platforms, there are likely to be mistakes in the process.


Given Joseph's approval, I'm about to check in the following rebased
version of the patch, after regtesting on i386-pc-solaris2.11,
sparc-sun-solaris2.11, x86_64-unknown-linux-gnu, i386-apple-darwin9.8.0,
and powerpc-apple-darwin9.8.0.

 Rainer


This also breaks arm-none-eabi builds (fails to find unwind-arm-common.h 
from gcc/ginclude).  I have raised PR50978 for this 
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50978).


Thanks,

Matt

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd



[PATCH/RFA] Fix up gcc.dg/vect/pr30858.c expected output

2011-10-14 Thread Matthew Gretton-Dann

All,

The attached patch corrects the expected output of the
gcc.dg/vect/pr30858.c testcase.

Historically it has expected the output "Unknown def-use cycle pattern." 
just once.


However, recent changes to GCC for ARM targets mean that vectorization 
is attempted twice: once with a vector size of 128 bits and once with a 
vector size of 64 bits.  This means that the output appears more than once.


The patch works around this by making the testcase expect one or more 
instances of "Unknown def-use cycle pattern."
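
The difference between the two checks can be sketched in C as below. This is only a model of the matching logic under the assumption of a literal substring search; it is not how the DejaGnu directives are implemented (they match regular expressions), and all function names are made up for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping literal occurrences of NEEDLE in DUMP.  */
static size_t
count_occurrences (const char *dump, const char *needle)
{
  size_t count = 0;
  size_t len = strlen (needle);

  if (len == 0)
    return 0;
  for (const char *p = strstr (dump, needle); p != NULL;
       p = strstr (p + len, needle))
    count++;
  return count;
}

/* Models the old check: the message must appear exactly once.  */
static int
matches_exactly_once (const char *dump, const char *needle)
{
  return count_occurrences (dump, needle) == 1;
}

/* Models the new check: the message must appear one or more times.  */
static int
matches_at_least_once (const char *dump, const char *needle)
{
  return count_occurrences (dump, needle) >= 1;
}
```

With a dump that contains the message twice (one per vector size), the exact-count check fails while the at-least-once check still passes.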


Can someone review please?

Thanks,

Matt

gcc/testsuite/ChangeLog:

2011-10-13  Matthew Gretton-Dann  

 * gcc.dg/vect/pr30858.c: Update expected output for
 architectures with multiple vector sizes.

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
diff --git a/gcc/testsuite/gcc.dg/vect/pr30858.c b/gcc/testsuite/gcc.dg/vect/pr30858.c
index 0af2f8e..0e7f7e1 100644
--- a/gcc/testsuite/gcc.dg/vect/pr30858.c
+++ b/gcc/testsuite/gcc.dg/vect/pr30858.c
@@ -11,5 +11,5 @@ foo (int ko)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Unknown def-use cycle pattern." 1 "vect" } } */
+/* { dg-final { scan-tree-dump "Unknown def-use cycle pattern." "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */


Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching

2011-10-13 Thread Matthew Gretton-Dann
@@ -2089,7 +2093,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
if (TREE_CODE (type) != INTEGER_TYPE)
  return false;

-  if (!is_widening_mult_p (stmt,&type1,&rhs1,&type2,&rhs2))
+  if (!is_widening_mult_p (type, stmt,&type1,&rhs1,&type2,&rhs2))
  return false;

to_mode = TYPE_MODE (type);
@@ -2255,7 +2259,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
if (code == PLUS_EXPR
&&  (rhs1_code == MULT_EXPR || rhs1_code == WIDEN_MULT_EXPR))
  {
-  if (!is_widening_mult_p (rhs1_stmt,&type1,&mult_rhs1,
+  if (!is_widening_mult_p (type, rhs1_stmt,&type1,&mult_rhs1,
&type2,&mult_rhs2))
return false;
add_rhs = rhs2;
@@ -2263,7 +2267,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
  }
else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
  {
-  if (!is_widening_mult_p (rhs2_stmt,&type1,&mult_rhs1,
+  if (!is_widening_mult_p (type, rhs2_stmt,&type1,&mult_rhs1,
&type2,&mult_rhs2))
return false;
add_rhs = rhs1;



--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd



Re: Vector Comparison patch

2011-09-30 Thread Matthew Gretton-Dann

On 29/09/11 12:27, Richard Guenther wrote:

On Thu, Sep 29, 2011 at 12:00 PM, Richard Guenther wrote:

On Wed, Sep 28, 2011 at 4:23 PM, Richard Guenther wrote:

On Mon, Sep 26, 2011 at 5:43 PM, Richard Guenther wrote:

On Mon, Sep 26, 2011 at 4:25 PM, Richard Guenther wrote:

On Wed, Sep 7, 2011 at 5:06 PM, Joseph S. Myers wrote:

This looks like it has the same issue with maybe needing to use
TYPE_MAIN_VARIANT in type comparisons as the shuffle patch.


I don't think so, we move qualifiers to the vector type from the element type
in make_vector_type and the tests only look at the component type.

I am re-testing the patch currently and will commit it if that succeeds.


Unfortunately gcc.c-torture/execute/vector-compare-1.c fails with -m32
for

vector (2, double) d0;
vector (2, double) d1;
vector (2, long) idres;

d0 = (vector (2, double)){(double)argc,  10.};
d1 = (vector (2, double)){0., (double)-23};
idres = (d0 > d1);

as apparently the type we chose to assign to (d0 > d1) is different
from that of idres:

/space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
error: incompatible types when assigning to type '__vector(2) long
int' from type '__vector(2) long long int'

Adjusting it to vector (2, long long) otoh yields, for -m64:

/space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
error: incompatible types when assigning to type '__vector(2) long
long int' from type '__vector(2) long int'

But those two types are at least compatible from their modes.  Joseph,
should we accept mode-compatible types in assignments or maybe
transparently convert them?
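
For illustration, user code can sidestep the long / long long mismatch
described above by letting the compiler name the comparison's result type.
This is a minimal sketch, not part of the patch or of vector-compare-1.c.
It assumes a compiler that supports GNU vector comparison and vector
subscripting, and the names v2df and compare_gt are hypothetical.

```c
#include <assert.h>

/* GNU vector extension: two doubles packed into a 16-byte vector.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Compare a and b lane by lane and store the result in out[0..1].
   Each lane is -1 where a > b holds and 0 where it does not.
   (Hypothetical helper, for illustration only.)  */
static void
compare_gt (const double a[2], const double b[2], long long out[2])
{
  v2df va = { a[0], a[1] };
  v2df vb = { b[0], b[1] };

  /* __typeof__ lets the compiler pick the mask type, so this does not
     depend on whether the 64-bit element is 'long' or 'long long'.  */
  __typeof__ (va > vb) mask = va > vb;

  /* The mask elements have the same width as the compared elements.  */
  assert (sizeof (mask[0]) == sizeof (double));

  out[0] = mask[0];
  out[1] = mask[1];
}
```

Calling compare_gt with a = {1.0, 10.0} and b = {5.0, -23.0} stores 0 in
out[0] and -1 in out[1].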


Looks like we have a more suitable solution for these automatically
generated vector types - mark them with TYPE_VECTOR_OPAQUE.

I'm testing the following incremental patch.

Richard.

Index: gcc/c-typeck.c
===
--- gcc/c-typeck.c.orig 2011-09-28 16:22:10.0 +0200
+++ gcc/c-typeck.c  2011-09-28 16:18:39.0 +0200
@@ -9928,8 +9928,10 @@ build_binary_op (location_t location, en
 }

   /* Always construct signed integer vector type.  */
-  intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
-  result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+  intt = c_common_type_for_size (GET_MODE_BITSIZE
+  (TYPE_MODE (TREE_TYPE (type0))), 0);
+  result_type = build_opaque_vector_type (intt,
+ TYPE_VECTOR_SUBPARTS (type0));
   converted = 1;
   break;
 }
@@ -10063,8 +10065,10 @@ build_binary_op (location_t location, en
 }

   /* Always construct signed integer vector type.  */
-  intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
-  result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+  intt = c_common_type_for_size (GET_MODE_BITSIZE
+  (TYPE_MODE (TREE_TYPE (type0))), 0);
+  result_type = build_opaque_vector_type (intt,
+ TYPE_VECTOR_SUBPARTS (type0));
   converted = 1;
   break;
 }


That doesn't seem to work either.  Because we treat the opaque and
non-opaque variants of vector  as different (the opaque type isn't
a variant type of the non-opaque one - something suspicious anyway).

I'm going to try to apply some surgery on how we build opaque variants
and then re-visit the above again.


Bootstrapped and tested on x86_64-unknown-linux-gnu and installed.

Richard.


Richard.





I'm still getting errors with the latest trunk (r179378) for arm-none-eabi.
Please see http://gcc.gnu.org/PR50576.


Thanks,

Matt


--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd



[RFA/arm] Fix gcc.target/arm/pr42835.c testcase

2011-09-28 Thread Matthew Gretton-Dann

All,

gcc.target/arm/pr42835.c started failing as a result of the following
change, which adds tree-tail merging:
  http://gcc.gnu.org/viewcvs?view=revision&revision=179275

With tree-tail merging the testcase's behaviour is correct, but it is not
what the test expects.


The attached patch adds -fno-tree-tail-merge to the test options.

Tested arm-none-eabi.

Can someone review and approve please?

Thanks,

Matt

gcc/testsuite/ChangeLog:

2011-09-28  Matthew Gretton-Dann  

* gcc.target/arm/pr42835.c: Add -fno-tree-tail-merge.

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd
diff --git a/gcc/testsuite/gcc.target/arm/pr42835.c b/gcc/testsuite/gcc.target/arm/pr42835.c
index 71c51eb..867dd02 100644
--- a/gcc/testsuite/gcc.target/arm/pr42835.c
+++ b/gcc/testsuite/gcc.target/arm/pr42835.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mthumb -Os" }  */
+/* { dg-options "-mthumb -Os -fno-tree-tail-merge" }  */
 /* { dg-require-effective-target arm_thumb2_ok } */
 
 int foo(int *p, int i)