RE: [PATCH, AArch64] Fix macro in vdup_lane_2 test case
From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com]
Sent: 19 May 2014 11:45
To: Ian Bolton
Cc: gcc-patches
Subject: Re: [PATCH, AArch64] Fix macro in vdup_lane_2 test case

> On 8 May 2014 18:41, Ian Bolton ian.bol...@arm.com wrote:
>
>> gcc/testsuite
>> 	* gcc.target/aarch64/vdup_lane_2.c (force_simd): Emit an actual
>> 	instruction to move into the allocated register.
>
> This macro is attempting to force a value to a particular class of
> register; we don't need or want the mov instruction at all.  Isn't
> something like this sufficient:
>
> #define force_simd(V1)  asm volatile (""	\
> 	   : "+w"(V1)				\
> 	   :					\
> 	   : /* No clobbers */)
>
> ?
>
> /Marcus

Thanks for the review, Marcus.  I did not think of that, and it looks
sane, but your suggested approach leads to some of the dup instructions
being optimised away.  Ordinarily, that would be great, but these test
cases are trying to force the dups to occur.

Cheers,
Ian
RE: Using particular register class (like floating point registers) as spill register class
> Please can you try that on trunk and report back.

OK, this is trunk, and I'm no longer seeing that happen.  However, I am
seeing:

   0x007fb76dc82c <+160>:	adrp	x25, 0x7fb7c8
   0x007fb76dc830 <+164>:	add	x25, x25, #0x480
   0x007fb76dc834 <+168>:	fmov	d8, x0
   0x007fb76dc838 <+172>:	add	x0, x29, #0x160
   0x007fb76dc83c <+176>:	fmov	d9, x0
   0x007fb76dc840 <+180>:	add	x0, x29, #0xd8
   0x007fb76dc844 <+184>:	fmov	d10, x0
   0x007fb76dc848 <+188>:	add	x0, x29, #0xf8
   0x007fb76dc84c <+192>:	fmov	d11, x0

followed later by:

   0x007fb76dd224 <+2712>:	fmov	x0, d9
   0x007fb76dd228 <+2716>:	add	x6, x29, #0x118
   0x007fb76dd22c <+2720>:	str	x20, [x0,w27,sxtw #3]
   0x007fb76dd230 <+2724>:	fmov	x0, d10
   0x007fb76dd234 <+2728>:	str	w28, [x0,w27,sxtw #2]
   0x007fb76dd238 <+2732>:	fmov	x0, d11
   0x007fb76dd23c <+2736>:	str	w19, [x0,w27,sxtw #2]

which seems a bit suboptimal, given that these double registers now have
to be saved in the prologue.

Thanks for doing that.  Many AArch64 improvements have gone in since 4.8
was released.  I think we'd have to see the output for the whole
function to determine whether that code is sane.  I don't suppose the
source code is shareable, or that you have a testcase for this you can
share?

Cheers,
Ian
RE: Using particular register class (like floating point registers) as spill register class
On 05/16/2014 12:05 PM, Kugan wrote:
> On 16/05/14 20:40, pins...@gmail.com wrote:
>> On May 16, 2014, at 3:23 AM, Kugan kugan.vivekanandara...@linaro.org wrote:
>>> I would like to know if there is any way we can use registers from a
>>> particular register class just as spill registers (in places where
>>> the register allocator would normally spill to stack and nothing
>>> more), when it can be useful.
>>>
>>> In AArch64, in some cases, compiling with -mgeneral-regs-only
>>> produces better performance compared to not using it.  The difference
>>> here is that when -mgeneral-regs-only is not used, floating-point
>>> registers are also used in register allocation.  IRA/LRA then has to
>>> move them to core registers before performing operations, as shown
>>> below.
>>
>> Can you show the code with fp registers disabled?  Does it use the
>> stack to spill?  Normally this is due to register-to-register-class
>> move costs compared to register-to-memory move cost.  Also I think it
>> depends on the processor rather than the target.  For Thunder, using
>> the fp registers might actually be better than using the stack,
>> depending on whether the stack was in L1.
>
> Not all the LDR/STR combinations match to fmov.  In the testcase I have:
>
>   aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S -mgeneral-regs-only
>   grep -c ldr sha_dgst.s
>   50
>   grep -c str sha_dgst.s
>   42
>   grep -c fmov sha_dgst.s
>   0
>
>   aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S
>   grep -c ldr sha_dgst.s
>   42
>   grep -c str sha_dgst.s
>   31
>   grep -c fmov sha_dgst.s
>   105
>
> I am not saying that we shouldn't use floating-point registers here.
> But from the above, it seems like the register allocator is using them
> more like core registers (even though the cost model has higher cost)
> and then moving the values to core registers before operations.  If
> that is the case, my question is: how do we make this just a spill
> register class, so that we replace ldr/str with an equal number of
> fmov instructions where possible?
I'm also seeing stuff like this:

   0x7fb72a0928 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2500>:	add	x21, x4, x21, lsl #3
   0x7fb72a092c <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2504>:	fmov	w2, s8
   0x7fb72a0930 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2508>:	str	w2, [x21,#88]

I guess GCC doesn't know how to store an SImode value held in an FP
register into memory?  This is 4.8.1.

Please can you try that on trunk and report back.

Thanks,
Ian
RE: soft-fp functions support without using libgcc
> On Fri, May 16, 2014 at 6:34 AM, Sheheryar Zahoor Qazi
> sheheryar.zahoor.q...@gmail.com wrote:
>> I am trying to provide soft-fp support to an 18-bit soft-core
>> processor architecture at my university.  But the problem is that
>> libgcc has not been cross-compiled for my target architecture and some
>> functions are missing, so I cannot build libgcc.  I believe soft-fp is
>> compiled in libgcc, so I am unable to invoke soft-fp functions from
>> libgcc.
>>
>> Is it possible for me to provide soft-fp support without using libgcc?
>> How should I proceed in defining the functions?  Any idea?  And does
>> any architecture provide floating-point support without using libgcc?
>
> I'm sorry, I don't understand the premise of your question.  It is not
> necessary to build libgcc before building libgcc.  That would not make
> sense.  If you have a working compiler that is missing some functions
> provided by libgcc, that should be sufficient to build libgcc.

If you replace "cross-compiled" with "ported", I think it makes sense.
Can one provide soft-fp support without porting libgcc for their
architecture?

Cheers,
Ian
RE: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
Ping.  This should be relatively simple to review.  Many thanks.

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
On Behalf Of Ian Bolton
Sent: 08 May 2014 18:36
To: gcc-patches
Subject: [PATCH, AArch64] Use MOVN to generate 64-bit negative
immediates where sensible

Hi,

It currently takes 4 instructions to generate certain immediates on
AArch64 (unless we put them in the constant pool).

For example ...

  long long
  beefcafebabe ()
  {
    return 0xFFFFBEEFCAFEBABEll;
  }

leads to ...

  mov	x0, 0x47806
  mov	x0, 0xcafe, lsl 16
  mov	x0, 0xbeef, lsl 32
  orr	x0, x0, -281474976710656

The above case is tackled in this patch by employing MOVN to generate
the top 32 bits in a single instruction ...

  mov	x0, -71536975282177
  movk	x0, 0xcafe, lsl 16
  movk	x0, 0xbabe, lsl 0

(Note that where at least two half-words are 0xffff, existing code that
does the immediate in two instructions is still used.)

Tested on standard gcc regressions and the attached test case.

OK for commit?

Cheers,
Ian

2014-05-08  Ian Bolton  <ian.bol...@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
	Use MOVN when top-most half-word (and only that half-word)
	is 0xffff.
gcc/testsuite/
	* gcc.target/aarch64/movn_1.c: New test.
RE: [PATCH, AArch64] Fix macro in vdup_lane_2 test case
Ping.  This may well be classed as obvious, but that's not obvious to
me, so I request a review.  Many thanks.

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
On Behalf Of Ian Bolton
Sent: 08 May 2014 18:42
To: gcc-patches
Subject: [PATCH, AArch64] Fix macro in vdup_lane_2 test case

This patch fixes a defective macro definition, based on the correct
definition in similar testcases.  The test currently passes through
luck rather than correctness.

OK for commit?

Cheers,
Ian

2014-05-08  Ian Bolton  <ian.bol...@arm.com>

gcc/testsuite
	* gcc.target/aarch64/vdup_lane_2.c (force_simd): Emit an actual
	instruction to move into the allocated register.
GCC driver to Compile twice, score the assembly, choose the best?
Hi, fellow GCC developers!

I was wondering if the gcc driver could be made to invoke cc1 twice,
with different flags, and then just keep the better of the two .s files
that come out?

I'm sure this is not a new idea, but I'm not aware of anything being
done in this area, so I've made this post to gather your views. :)

The kinds of flags I am thinking could be toggled are register
allocation and instruction scheduling ones, since it's very hard to
find a one-size-fits-all setting there, and we don't really want the
user to have to depend on knowing the right one.

Obviously, compilation time will go up, but the run-time benefits could
be huge.

What are your thoughts?  What work in this area have I failed to dig up
in my limited research?

Many thanks,
Ian
RE: GCC driver to Compile twice, score the assembly, choose the best?
Thanks for the quick response.

> On Thu, May 15, 2014 at 1:46 PM, Ian Bolton ian.bol...@arm.com wrote:
>> Hi, fellow GCC developers!
>>
>> I was wondering if the gcc driver could be made to invoke cc1 twice,
>> with different flags, and then just keep the better of the two .s
>> files that come out?
>
> I'd be interested in your .s comparison tool that decides which one is
> better!

Well, yes, that's the hard part, and it could be done a number of ways.
I think it would make a good contest actually, or a summer coding
project.  Or at least a nice subject for brainstorming at the GNU
Cauldron. :)

Cheers,
Ian
[PATCH, AArch64] Implement HARD_REGNO_CALLER_SAVE_MODE
Currently, on AArch64, when a caller-save register is saved/restored,
GCC is accessing the maximum size of the hard register.  So an SImode
integer (4 bytes) value is being stored as DImode (8 bytes) because the
int registers are 8 bytes wide, and an SFmode float (4 bytes) and
DFmode double (8 bytes) are being stored as TImode (16 bytes) to
capture the full 128 bits of the vector register.

This patch corrects this by implementing the
HARD_REGNO_CALLER_SAVE_MODE hook, which is called by LRA to determine
the minimum-size mode it might need to save/restore.

Tested on the GCC regression suite and verified impact on a number of
examples.

OK for trunk?

Cheers,
Ian

2014-05-12  Ian Bolton  <ian.bol...@arm.com>

	* config/aarch64/aarch64-protos.h
	(aarch64_hard_regno_caller_save_mode): New prototype.
	* config/aarch64/aarch64.c (aarch64_hard_regno_caller_save_mode):
	New function.
	* config/aarch64/aarch64.h (HARD_REGNO_CALLER_SAVE_MODE): New
	macro.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 04cbc78..7cf7d9f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -202,6 +202,8 @@ enum aarch64_symbol_type aarch64_classify_symbol (rtx,
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
+enum machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
+						       enum machine_mode);
 int aarch64_hard_regno_mode_ok (unsigned, enum machine_mode);
 int aarch64_hard_regno_nregs (unsigned, enum machine_mode);
 int aarch64_simd_attr_length_move (rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8655f04..c2cc81b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -424,6 +424,24 @@ aarch64_hard_regno_mode_ok (unsigned regno, enum machine_mode mode)
   return 0;
 }
 
+/* Implement HARD_REGNO_CALLER_SAVE_MODE.  */
+enum machine_mode
+aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned nregs,
+				     enum machine_mode mode)
+{
+  /* Handle modes that fit within single registers.  */
+  if (nregs == 1 && GET_MODE_SIZE (mode) <= 16)
+    {
+      if (GET_MODE_SIZE (mode) >= 4)
+	return mode;
+      else
+	return SImode;
+    }
+  /* Fall back to generic for multi-reg and very large modes.  */
+  else
+    return choose_hard_reg_mode (regno, nregs, false);
+}
+
 /* Return true if calls to DECL should be treated as
    long-calls (ie called via a register).  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index c9b30d0..0574593 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -824,6 +824,11 @@ do {									     \
 
 #define SHIFT_COUNT_TRUNCATED !TARGET_SIMD
 
+/* Choose appropriate mode for caller saves, so we do the minimum
+   required size of load/store.  */
+#define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \
+  aarch64_hard_regno_caller_save_mode ((REGNO), (NREGS), (MODE))
+
 /* Callee only saves lower 64-bits of a 128-bit register.  Tell the
    compiler the callee clobbers the top 64-bits when restoring the
    bottom 64-bits.  */
[PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
Hi,

It currently takes 4 instructions to generate certain immediates on
AArch64 (unless we put them in the constant pool).

For example ...

  long long
  beefcafebabe ()
  {
    return 0xFFFFBEEFCAFEBABEll;
  }

leads to ...

  mov	x0, 0x47806
  mov	x0, 0xcafe, lsl 16
  mov	x0, 0xbeef, lsl 32
  orr	x0, x0, -281474976710656

The above case is tackled in this patch by employing MOVN to generate
the top 32 bits in a single instruction ...

  mov	x0, -71536975282177
  movk	x0, 0xcafe, lsl 16
  movk	x0, 0xbabe, lsl 0

(Note that where at least two half-words are 0xffff, existing code that
does the immediate in two instructions is still used.)

Tested on standard gcc regressions and the attached test case.

OK for commit?

Cheers,
Ian

2014-05-08  Ian Bolton  <ian.bol...@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
	Use MOVN when top-most half-word (and only that half-word)
	is 0xffff.
gcc/testsuite/
	* gcc.target/aarch64/movn_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 43a83566..a8e504e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1177,6 +1177,18 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	}
     }
 
+  /* Look for case where upper 16 bits are set, so we can use MOVN.  */
+  if ((val & 0xffff000000000000ll) == 0xffff000000000000ll)
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest,
+			      GEN_INT (~ (~val & (0xffffll << 32)))));
+      emit_insn (gen_insv_immdi (dest, GEN_INT (16),
+				 GEN_INT ((val >> 16) & 0xffff)));
+      emit_insn (gen_insv_immdi (dest, GEN_INT (0),
+				 GEN_INT (val & 0xffff)));
+      return;
+    }
+
 simple_sequence:
   first = true;
   mask = 0xffff;
diff --git a/gcc/testsuite/gcc.target/aarch64/movn_1.c b/gcc/testsuite/gcc.target/aarch64/movn_1.c
new file mode 100644
index 000..cc11ade
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movn_1.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline --save-temps" } */
+
+extern void abort (void);
+
+long long
+foo ()
+{
+  /* { dg-final { scan-assembler "mov\tx\[0-9\]+, -71536975282177" } } */
+  return 0xffffbeefcafebabell;
+}
+
+long long
+merge4 (int a, int b, int c, int d)
+{
+  return ((long long) a << 48 | (long long) b << 32
+	  | (long long) c << 16 | (long long) d);
+}
+
+int main ()
+{
+  if (foo () != merge4 (0xffff, 0xbeef, 0xcafe, 0xbabe))
+    abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
[PATCH, AArch64] Fix macro in vdup_lane_2 test case
This patch fixes a defective macro definition, based on the correct
definition in similar testcases.  The test currently passes through
luck rather than correctness.

OK for commit?

Cheers,
Ian

2014-05-08  Ian Bolton  <ian.bol...@arm.com>

gcc/testsuite
	* gcc.target/aarch64/vdup_lane_2.c (force_simd): Emit an actual
	instruction to move into the allocated register.

diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
index 7c04e75..2072c79 100644
--- a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
@@ -4,10 +4,11 @@
 
 #include <arm_neon.h>
 
-#define force_simd(V1) asm volatile (""	\
-	   : "=w"(V1)				\
-	   : "w"(V1)				\
-	   : /* No clobbers */)
+/* Used to force a variable to a SIMD register.  */
+#define force_simd(V1) asm volatile ("orr %0.16b, %1.16b, %1.16b"	\
+	   : "=w"(V1)							\
+	   : "w"(V1)							\
+	   : /* No clobbers */);
 
 extern void abort (void);
RE: [PATCH, ARM] Suppress Redundant Flag Setting for Cortex-A15
Hi,

> On 28 January 2014 13:10, Ramana Radhakrishnan
> ramana@googlemail.com wrote:
>> On Fri, Jan 24, 2014 at 5:16 PM, Ian Bolton ian.bol...@arm.com wrote:
>>> Hi there!
>>>
>>> An existing optimisation for Thumb-2 converts t32 encodings to t16
>>> encodings to reduce codesize, at the expense of causing redundant
>>> flag setting for ADD, AND, etc.  This redundant flag setting can have
>>> a negative performance impact on Cortex-A15.
>>>
>>> This patch introduces two new tuning options so that the conversion
>>> from t32 to t16, which takes place in thumb2_reorg, can be suppressed
>>> for Cortex-A15.  To maintain some of the original benefit (reduced
>>> codesize), the suppression is only done where the enclosing basic
>>> block is deemed worthy of optimising for speed.
>>>
>>> This tested with no regressions, and performance has improved for the
>>> workloads tested on Cortex-A15.  (It might be beneficial to other
>>> processors too, but that has not been investigated yet.)
>>>
>>> OK for stage 1?
>>
>> This is OK for stage1.
>>
>> Ramana
>>
>>> Cheers,
>>> Ian
>>>
>>> 2014-01-24  Ian Bolton  <ian.bol...@arm.com>
>>>
>>> gcc/
>>> 	* config/arm/arm-protos.h (tune_params): New struct members.
>>> 	* config/arm/arm.c: Initialise tune_params per processor.
>>> 	(thumb2_reorg): Suppress conversion from t32 to t16 when
>>> 	optimizing for speed, based on new tune_params.
>
> This causes gcc.target/arm/negdi-1.c and gcc.target/arm/negdi-2.c to
> FAIL when GCC is configured as:
>
>   --with-mode=arm --with-cpu=cortex-a15 --with-fpu=neon-vfpv4
>
> Both tests used to PASS.  (See
> http://cbuild.validation.linaro.org/build/cross-validation/gcc/209561/report-build-info.html)

Hi Christophe,

I don't recall the failure when I did the work, but I see now that the
test is looking for "negs" when my patch is specifically trying to
avoid flag-setting operations.  So we are now getting an "rsb" instead
of a "negs", as intended, and the test needs fixing!

Open question: Should I look for either "rsb" or "negs" in a single
scan-assembler, or look for different ones dependent on the cpu in
question, or just not run the test for cortex-a15?

Cheers,
Ian
RE: [PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A
-----Original Message-----
From: Richard Earnshaw
Sent: 21 March 2014 13:57
To: Ian Bolton
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A

On 19/03/14 16:53, Ian Bolton wrote:
> This is a follow-on patch to one already committed:
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01128.html
>
> It implements patterns to simplify our RTL as follows:
>
>   OR  (Not:DI (A:DI), ZeroExtend:DI (B:SI))
>     -- the top half can be done with a MVN
>
>   AND (Not:DI (A:DI), ZeroExtend:DI (B:SI))
>     -- the top half becomes zero.
>
> I've added test cases for both of these and also the existing
> anddi_notdi patterns.  The tests all pass.  Full regression runs
> passed.
>
> OK for stage 1?
>
> Cheers,
> Ian
>
> 2014-03-19  Ian Bolton  <ian.bol...@arm.com>
>
> gcc/
> 	* config/arm/arm.md (*anddi_notdi_zesidi): New pattern.
> 	* config/arm/thumb2.md (*iordi_notdi_zesidi): New pattern.
> testsuite/
> 	* gcc.target/arm/anddi_notdi-1.c: New test.
> 	* gcc.target/arm/iordi_notdi-1.c: New test case.
>
> arm-and-ior-notdi-zeroextend-patch-v1.txt
>
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 2ddda02..d2d85ee 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -2962,6 +2962,28 @@
>     (set_attr "type" "multiple")]
>  )
>
> +(define_insn_and_split "*anddi_notdi_zesidi"
> +  [(set (match_operand:DI 0 "s_register_operand" "=r,r")
> +	(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
> +		(zero_extend:DI
> +		 (match_operand:SI 1 "s_register_operand" "r,r"))))]

The early clobber and register tying here is unnecessary.  All of the
input operands are consumed in the first instruction, so you can
eliminate the ties and the restriction on the overlap.  Something like
(untested):

+(define_insn_and_split "*anddi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r")
+	(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "r"))
+		(zero_extend:DI
+		 (match_operand:SI 1 "s_register_operand" "r"))))]

Ok for stage-1 with that change (though I'd recommend another test run
to validate the above).

R.

Thanks, Richard.

Regression runs came back OK with that change, so I will consider this
ready for stage 1.  The patch is attached for reference.

Cheers,
Ian

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 2ddda02..4176b7ff 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2962,6 +2962,28 @@
    (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*anddi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r")
+	(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "r"))
+		(zero_extend:DI
+		 (match_operand:SI 1 "s_register_operand" "r"))))]
+  "TARGET_32BIT"
+  "#"
+  "TARGET_32BIT && reload_completed"
+  [(set (match_dup 0) (and:SI (not:SI (match_dup 2)) (match_dup 1)))
+   (set (match_dup 3) (const_int 0))]
+  "
+  {
+    operands[3] = gen_highpart (SImode, operands[0]);
+    operands[0] = gen_lowpart (SImode, operands[0]);
+    operands[2] = gen_lowpart (SImode, operands[2]);
+  }"
+  [(set_attr "length" "8")
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "multiple")]
+)
+
 (define_insn_and_split "*anddi_notsesidi_di"
   [(set (match_operand:DI 0 "s_register_operand" "=r,r")
 	(and:DI (not:DI (sign_extend:DI
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 467c619..10bc8b1 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1418,6 +1418,30 @@
    (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*iordi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r,r")
+	(ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
+		(zero_extend:DI
+		 (match_operand:SI 1 "s_register_operand" "r,r"))))]
+  "TARGET_THUMB2"
+  "#"
+  "TARGET_THUMB2 && reload_completed"
+  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
+   (set (match_dup 3) (not:SI (match_dup 4)))]
+  "
+  {
+    operands[3] = gen_highpart (SImode, operands[0]);
+    operands[0] = gen_lowpart (SImode, operands[0]);
+    operands[1] = gen_lowpart (SImode, operands[1]);
+    operands[4] = gen_highpart (SImode, operands[2]);
+    operands[2] = gen_lowpart (SImode, operands[2]);
+  }"
+  [(set_attr "length" "8")
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "multiple")]
+)
+
 (define_insn_and_split "*iordi_notsesidi_di"
   [(set (match_operand:DI 0 "s_register_operand" "=r,r")
 	(ior:DI (not:DI (sign_extend:DI
diff --git a/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c
new file mode 100644
index 000..cfb33fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline --save-temps" } */
+
+extern void abort (void);
+
+typedef
[PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A
This is a follow-on patch to one already committed:
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01128.html

It implements patterns to simplify our RTL as follows:

  OR  (Not:DI (A:DI), ZeroExtend:DI (B:SI))
    -- the top half can be done with a MVN

  AND (Not:DI (A:DI), ZeroExtend:DI (B:SI))
    -- the top half becomes zero.

I've added test cases for both of these and also the existing
anddi_notdi patterns.  The tests all pass.  Full regression runs
passed.

OK for stage 1?

Cheers,
Ian

2014-03-19  Ian Bolton  <ian.bol...@arm.com>

gcc/
	* config/arm/arm.md (*anddi_notdi_zesidi): New pattern.
	* config/arm/thumb2.md (*iordi_notdi_zesidi): New pattern.
testsuite/
	* gcc.target/arm/anddi_notdi-1.c: New test.
	* gcc.target/arm/iordi_notdi-1.c: New test case.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 2ddda02..d2d85ee 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2962,6 +2962,28 @@
    (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*anddi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r,r")
+	(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
+		(zero_extend:DI
+		 (match_operand:SI 1 "s_register_operand" "r,r"))))]
+  "TARGET_32BIT"
+  "#"
+  "TARGET_32BIT && reload_completed"
+  [(set (match_dup 0) (and:SI (not:SI (match_dup 2)) (match_dup 1)))
+   (set (match_dup 3) (const_int 0))]
+  "
+  {
+    operands[3] = gen_highpart (SImode, operands[0]);
+    operands[0] = gen_lowpart (SImode, operands[0]);
+    operands[2] = gen_lowpart (SImode, operands[2]);
+  }"
+  [(set_attr "length" "8")
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "multiple")]
+)
+
 (define_insn_and_split "*anddi_notsesidi_di"
   [(set (match_operand:DI 0 "s_register_operand" "=r,r")
 	(and:DI (not:DI (sign_extend:DI
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 467c619..10bc8b1 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1418,6 +1418,30 @@
    (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*iordi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r,r")
+	(ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
+		(zero_extend:DI
+		 (match_operand:SI 1 "s_register_operand" "r,r"))))]
+  "TARGET_THUMB2"
+  "#"
+  "TARGET_THUMB2 && reload_completed"
+  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
+   (set (match_dup 3) (not:SI (match_dup 4)))]
+  "
+  {
+    operands[3] = gen_highpart (SImode, operands[0]);
+    operands[0] = gen_lowpart (SImode, operands[0]);
+    operands[1] = gen_lowpart (SImode, operands[1]);
+    operands[4] = gen_highpart (SImode, operands[2]);
+    operands[2] = gen_lowpart (SImode, operands[2]);
+  }"
+  [(set_attr "length" "8")
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "multiple")]
+)
+
 (define_insn_and_split "*iordi_notsesidi_di"
   [(set (match_operand:DI 0 "s_register_operand" "=r,r")
 	(ior:DI (not:DI (sign_extend:DI
diff --git a/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c
new file mode 100644
index 000..cfb33fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline --save-temps" } */
+
+extern void abort (void);
+
+typedef long long s64int;
+typedef int s32int;
+typedef unsigned long long u64int;
+typedef unsigned int u32int;
+
+s64int
+anddi_di_notdi (s64int a, s64int b)
+{
+  return (a & ~b);
+}
+
+s64int
+anddi_di_notzesidi (s64int a, u32int b)
+{
+  return (a & ~(u64int) b);
+}
+
+s64int
+anddi_notdi_zesidi (s64int a, u32int b)
+{
+  return (~a & (u64int) b);
+}
+
+s64int
+anddi_di_notsesidi (s64int a, s32int b)
+{
+  return (a & ~(s64int) b);
+}
+
+int main ()
+{
+  s64int a64 = 0xdeadbeefll;
+  s64int b64 = 0x5f470112ll;
+  s64int c64 = 0xdeadbeef300fll;
+
+  u32int c32 = 0x01124f4f;
+  s32int d32 = 0xabbaface;
+
+  s64int z = anddi_di_notdi (c64, b64);
+  if (z != 0xdeadbeef2008ll)
+    abort ();
+
+  z = anddi_di_notzesidi (a64, c32);
+  if (z != 0xdeadbeefb0b0ll)
+    abort ();
+
+  z = anddi_notdi_zesidi (c64, c32);
+  if (z != 0x01104f4fll)
+    abort ();
+
+  z = anddi_di_notsesidi (a64, d32);
+  if (z != 0x0531ll)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "bic\t" 6 } } */
+
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c
index cda9c0e..249f080 100644
--- a/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c
+++ b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c
@@ -9,19 +9,25 @@ typedef unsigned long long u64int;
 typedef unsigned int u32int;
 
 s64int
-iordi_notdi (s64int a, s64int b)
+iordi_di_notdi (s64int a, s64int b)
 {
   return (a | ~b);
 }
 
 s64int
-iordi_notzesidi (s64int a, u32int b
[PATCH] Keep -ffp-contract=fast by default if we have -funsafe-math-optimizations
Hi,

In common.opt, -ffp-contract=fast is set as the default for GCC.  But
then it gets disabled in c-family/c-opts.c if you are using ISO C
(e.g. with -std=c99).

The reason for this patch is that if you have also specified
-funsafe-math-optimizations (or -Ofast or -ffast-math) then it is
likely your preference to have -ffp-contract=fast on, so you can
generate fused multiply-adds (the fma standard pattern).

This patch works by blocking the override if you have
-funsafe-math-optimizations (directly or indirectly), causing fused
multiply-add to be used in the places where we might hope to see it.

(I had considered forcing -ffp-contract=fast on in opts.c if you have
-funsafe-math-optimizations, but it is already on by default ... and it
didn't work either!  The problem is that it is forced off unless you
have explicitly asked for -ffp-contract=fast at the command line.)

Standard regressions passed.

OK for trunk or stage 1?

Cheers,
Ian

2014-03-10  Ian Bolton  <ian.bol...@arm.com>

	* gcc/c-family/c-opts.c (c_common_post_options): Don't
	override -ffp-contract=fast if unsafe-math-optimizations is on.

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index b7478f3..92ba481 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -834,7 +834,8 @@ c_common_post_options (const char **pfilename)
   if (flag_iso && !c_dialect_cxx ()
       && (global_options_set.x_flag_fp_contract_mode
-	  == (enum fp_contract_mode) 0))
+	  == (enum fp_contract_mode) 0)
+      && flag_unsafe_math_optimizations == 0)
     flag_fp_contract_mode = FP_CONTRACT_OFF;
 
   /* By default we use C99 inline semantics in GNU99 or C99 mode.  C99
RE: Shouldn't unsafe-math-optimizations (re-)enable fp-contract=fast?
-----Original Message-----
From: Joseph S. Myers

On Thu, 6 Mar 2014, Ian Bolton wrote:

>> I see in common.opt that fp-contract=fast is the default for GCC.
>> But then it gets disabled in c-family/c-opts.c if you are using ISO C
>> (e.g. with -std=c99).
>>
>> But surely if you have also specified -funsafe-math-optimizations
>> then it should flip it back onto fast?
>
> That seems reasonable.

Thanks for the feedback, Joseph.  I've been working on the patch, but I
see in gcc/opts.c that there is a definite distinction between
set_fast_math_flags and set_unsafe_math_optimizations_flags.

I'm thinking this is more of a fast-math thing than an
unsafe_math_optimizations thing, so I should actually be adding it in
set_fast_math_flags.  Is this correct?

Whilst I'm on, I think I found a bug in the Optimize-Options
documentation ...  It says that -ffast-math is enabled by -Ofast and
that -ffast-math enables -funsafe-math-optimizations.  But the
definition of -funsafe-math-optimizations says it is not turned on by
any -O option.  I think it should say that it is turned on by -Ofast
(via -ffast-math), just for clarity and consistency.  Does that make
sense?

Many thanks,
Ian

-- 
Joseph S. Myers
jos...@codesourcery.com
RE: Shouldn't unsafe-math-optimizations (re-)enable fp-contract=fast?
> Thanks for the feedback, Joseph.  I've been working on the patch, but
> I see in gcc/opts.c that there is a definite distinction between
> set_fast_math_flags and set_unsafe_math_optimizations_flags.
>
> I'm thinking this is more of a fast-math thing than an
> unsafe_math_optimizations thing, so I should actually be adding it in
> set_fast_math_flags.  Is this correct?

Scratch that.  I went ahead and tried to set it in opts.c, but the
c-opts.c override (to OFF) will currently happen in every case UNLESS
you explicitly asked for -ffp-contract=fast.

I therefore think the better solution is now to not override it in
c-opts.c if unsafe-math-optimizations (which itself is enabled by
-ffast-math and/or -Ofast) is enabled.  We already have
fp-contract=fast on by default, so we should just stop it being turned
off unnecessarily, instead.

I've successfully coded this one and it works as expected when compiled
with -Ofast -std=c99.  I'll prepare the patch now.

Cheers,
Ian
Shouldn't unsafe-math-optimizations (re-)enable fp-contract=fast?
Hi there,

I see in common.opt that fp-contract=fast is the default for GCC.  But
then it gets disabled in c-family/c-opts.c if you are using ISO C
(e.g. with -std=c99).

But surely if you have also specified -funsafe-math-optimizations then
it should flip back to fast?

I see in gcc/opts.c that a few other flags are triggered based on
-funsafe-math-optimizations, but not -ffp-contract=fast.

I think this is a bug, and it could lead to less optimal code than
people would like if they want C99 compliance for reason A but also
unsafe math for reason B.

What do you think?

Cheers,
Ian
[PATCH, AArch64] Define __ARM_NEON by default
Hi,

This is needed for when people are porting their aarch32 code to
aarch64.  They will have #ifdef __ARM_NEON (as specified in ACLE), and
their intrinsics currently won't get used on aarch64 because the macro
is not defined there by default.

This patch defines __ARM_NEON so long as we are not using general
registers only.

Tested on a simple testcase to ensure __ARM_NEON was defined.

OK for trunk?

Cheers,
Ian

2014-02-24  Ian Bolton  <ian.bol...@arm.com>

	* config/aarch64/aarch64.h: Define __ARM_NEON by default if we
	are not using general regs only.

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 13c424c..fc21981 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -32,6 +32,9 @@
       else					\
 	builtin_define ("__AARCH64EL__");	\
 						\
+      if (!TARGET_GENERAL_REGS_ONLY)		\
+	builtin_define ("__ARM_NEON");		\
+						\
       switch (aarch64_cmodel)			\
 	{					\
 	case AARCH64_CMODEL_TINY:		\
[PATCH, ARM] Support ORN for DImode
Hi, Patterns had previously been added to thumb2.md to support ORN, but only for SImode. This patch adds DImode support, to cover the full 64|64-64 operation and the various 32|64-64 operations (see AND:DI variants that use NOT). The patch comes with its own execution test and looks for correct number of ORN instructions in the assembly. Regressions passed. OK for stage 1? 2014-02-19 Ian Bolton ian.bol...@arm.com gcc/ * config/arm/thumb2.md (*iordi_notdi_di): New pattern. (*iordi_notzesidi): New pattern. (*iordi_notsesidi_di): New pattern. testsuite/ * gcc.target/arm/iordi_notdi-1.c: New test.diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 4f247f8..6a71fec 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -1366,6 +1366,79 @@ (set_attr type alu_reg)] ) +; Constants for op 2 will never be given to these patterns. +(define_insn_and_split *iordi_notdi_di + [(set (match_operand:DI 0 s_register_operand =r,r) + (ior:DI (not:DI (match_operand:DI 1 s_register_operand 0,r)) + (match_operand:DI 2 s_register_operand r,0)))] + TARGET_THUMB2 + # + TARGET_THUMB2 reload_completed + [(set (match_dup 0) (ior:SI (not:SI (match_dup 1)) (match_dup 2))) + (set (match_dup 3) (ior:SI (not:SI (match_dup 4)) (match_dup 5)))] + + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[4] = gen_highpart (SImode, operands[1]); +operands[1] = gen_lowpart (SImode, operands[1]); +operands[5] = gen_highpart (SImode, operands[2]); +operands[2] = gen_lowpart (SImode, operands[2]); + } + [(set_attr length 8) + (set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr type multiple)] +) + +(define_insn_and_split *iordi_notzesidi_di + [(set (match_operand:DI 0 s_register_operand =r,r) + (ior:DI (not:DI (zero_extend:DI +(match_operand:SI 2 s_register_operand r,r))) + (match_operand:DI 1 s_register_operand 0,?r)))] + TARGET_THUMB2 + # + ; (not (zero_extend...)) means operand0 will always 
be 0x + TARGET_THUMB2 reload_completed + [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (const_int -1))] + + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[1] = gen_lowpart (SImode, operands[1]); + } + [(set_attr length 4,8) + (set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr type multiple)] +) + +(define_insn_and_split *iordi_notsesidi_di + [(set (match_operand:DI 0 s_register_operand =r,r) + (ior:DI (not:DI (sign_extend:DI +(match_operand:SI 2 s_register_operand r,r))) + (match_operand:DI 1 s_register_operand 0,r)))] + TARGET_THUMB2 + # + TARGET_THUMB2 reload_completed + [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (ior:SI (not:SI + (ashiftrt:SI (match_dup 2) (const_int 31))) + (match_dup 4)))] + + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[4] = gen_highpart (SImode, operands[1]); +operands[1] = gen_lowpart (SImode, operands[1]); + } + [(set_attr length 8) + (set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr type multiple)] +) + (define_insn *orsi_notsi_si [(set (match_operand:SI 0 s_register_operand =r) (ior:SI (not:SI (match_operand:SI 2 s_register_operand r)) diff --git a/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c new file mode 100644 index 000..cda9c0e --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c @@ -0,0 +1,54 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fno-inline --save-temps } */ + +extern void abort (void); + +typedef long long s64int; +typedef int s32int; +typedef unsigned long long u64int; +typedef unsigned int u32int; + +s64int +iordi_notdi (s64int a, s64int b) +{ + return (a | ~b); +} + +s64int +iordi_notzesidi (s64int a, u32int b) +{ + return (a | ~(u64int) b); +} + +s64int +iordi_notsesidi (s64int a, s32int b) +{ + 
return (a | ~(s64int) b); +} + +int main () +{ + s64int a64 = 0xdeadbeefll; + s64int b64 = 0x4f4f0112ll; + + u32int c32 = 0x01124f4f; + s32int d32 = 0xabbaface; + + s64int z = iordi_notdi (a64, b64); + if (z != 0xb0b0feedll) +abort (); + + z = iordi_notzesidi (a64, c32); + if (z != 0xfeedb0b0ll) +abort (); + + z = iordi_notsesidi (a64, d32); + if (z != 0xdeadbeef54450531ll) +abort (); + + return 0; +} + +/* { dg-final { scan-assembler-times orn\t 5 { target arm_thumb2 } } } */ + +/* { dg-final { cleanup-saved-temps } } */
RE: [PATCH, ARM] Skip pr59858.c test for -mfloat-abi=hard
The pr59858.c testcase explicitly sets -msoft-float which is incompatible with our -mfloat-abi=hard variant. This patch therefore should not be run if you have -mfloat-abi=hard. Tested with both variations for arm-none-eabi build. OK for commit? Cheers, Ian 2014-02-13 Ian Bolton ian.bol...@arm.com testsuite/ * gcc.target/arm/pr59858.c: Skip test if -mfloat-abi=hard. pr59858-skip-if-hard-float-patch-v2.txt diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c b/gcc/testsuite/gcc.target/arm/pr59858.c index 463bd38..1e03203 100644 --- a/gcc/testsuite/gcc.target/arm/pr59858.c +++ b/gcc/testsuite/gcc.target/arm/pr59858.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options -march=armv5te -marm -mthumb-interwork -Wall - Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno- asm -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack- protector -Os -g -feliminate-unused-debug-types -funit-at-a-time - fmerge-all-constants -fstrict-aliasing -fno-tree-loop-optimize -fno- tree-dominator-opts -fno-strength-reduce -fPIC -w } */ +/* { dg-skip-if Test is not compatible with hard-float { *-*-* } { -mfloat-abi=hard } { } } */ typedef enum { REG_ENOSYS = -1, This won't work if hard-float is the default. Take a look at the way other tests check for this. Hi Richard, The test does actually pass if it is hard float by default. My comment on the skip line was misleading, because the precise issue is when someone specifies -mfloat-abi=hard on the command line. I've fixed up that comment in the attached patch now. I've also reduced the number of command-line options passed (without affecting the code generated) in the patch and changed -msoft-float into -mfloat-abi=soft, since the former is deprecated and maps to the latter anyway. OK for commit? 
Cheers,
Ian

diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c b/gcc/testsuite/gcc.target/arm/pr59858.c
index 463bd38..a944b9a 100644
--- a/gcc/testsuite/gcc.target/arm/pr59858.c
+++ b/gcc/testsuite/gcc.target/arm/pr59858.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv5te -marm -mthumb-interwork -Wall -Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno-asm -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack-protector -Os -g -feliminate-unused-debug-types -funit-at-a-time -fmerge-all-constants -fstrict-aliasing -fno-tree-loop-optimize -fno-tree-dominator-opts -fno-strength-reduce -fPIC -w" } */
+/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC -w" } */
+/* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft -mfloat-abi=hard" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
 typedef enum {
   REG_ENOSYS = -1,
[PATCH, ARM] Skip pr59858.c test for -mfloat-abi=hard
Hi, The pr59858.c testcase explicitly sets -msoft-float which is incompatible with our -mfloat-abi=hard variant. This patch therefore should not be run if you have -mfloat-abi=hard. Tested with both variations for arm-none-eabi build. OK for commit? Cheers, Ian 2014-02-13 Ian Bolton ian.bol...@arm.com testsuite/ * gcc.target/arm/pr59858.c: Skip test if -mfloat-abi=hard.diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c b/gcc/testsuite/gcc.target/arm/pr59858.c index 463bd38..1e03203 100644 --- a/gcc/testsuite/gcc.target/arm/pr59858.c +++ b/gcc/testsuite/gcc.target/arm/pr59858.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options -march=armv5te -marm -mthumb-interwork -Wall -Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno-asm -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack-protector -Os -g -feliminate-unused-debug-types -funit-at-a-time -fmerge-all-constants -fstrict-aliasing -fno-tree-loop-optimize -fno-tree-dominator-opts -fno-strength-reduce -fPIC -w } */ +/* { dg-skip-if Test is not compatible with hard-float { *-*-* } { -mfloat-abi=hard } { } } */ typedef enum { REG_ENOSYS = -1,
[PATCH] Make pr59597 test PIC-friendly
PR59597 reinstated some code to cancel unnecessary jump threading, and brought with it a testcase to check that the cancelling happened.

http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01448.html

With PIC enabled for arm and aarch64, the unnecessary jump threading already never took place, so there is nothing to cancel, leading the test case to fail. My suspicion is that similar issues will happen for other architectures too. This patch changes the called function to be static, so that jump threading and the resulting cancellation happen for PIC variants too.

OK for stage 4, or wait for stage 1?

Cheers,
Ian

2014-02-05  Ian Bolton  ian.bol...@arm.com

	testsuite/
	* gcc.dg/tree-ssa/pr59597.c: Make called function static so
	that expected outcome works for PIC variants too.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
index 814d299..bc9d730 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
@@ -8,7 +8,8 @@ typedef unsigned int u32;
 
 u32 f[NNN], t[NNN];
 
-u16 Calc_crc8(u8 data, u16 crc )
+static u16
+Calc_crc8 (u8 data, u16 crc)
 {
   u8 i=0,x16=0,carry=0;
   for (i = 0; i < 8; i++)
@@ -31,7 +32,9 @@ u16 Calc_crc8(u8 data, u16 crc )
   }
   return crc;
 }
-int main (int argc, char *argv[])
+
+int
+main (int argc, char *argv[])
 {
   int i, j;
   u16 crc;
   for (j = 0; j < 1000; j++)
[PATCH, ARM] Suppress Redundant Flag Setting for Cortex-A15
Hi there! An existing optimisation for Thumb-2 converts t32 encodings to t16 encodings to reduce codesize, at the expense of causing redundant flag setting for ADD, AND, etc. This redundant flag setting can have negative performance impact on cortex-a15. This patch introduces two new tuning options so that the conversion from t32 to t16, which takes place in thumb2_reorg, can be suppressed for cortex-a15. To maintain some of the original benefit (reduced codesize), the suppression is only done where the enclosing basic block is deemed worthy of optimising for speed. This tested with no regressions and performance has improved for the workloads tested on cortex-a15. (It might be beneficial to other processors too, but that has not been investigated yet.) OK for stage 1? Cheers, Ian 2014-01-24 Ian Bolton ian.bol...@arm.com gcc/ * config/arm/arm-protos.h (tune_params): New struct members. * config/arm/arm.c: Initialise tune_params per processor. (thumb2_reorg): Suppress conversion from t32 to t16 when optimizing for speed, based on new tune_params. diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 13874ee..74645ee 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -272,6 +272,11 @@ struct tune_params const struct cpu_vec_costs* vec_costs; /* Prefer Neon for 64-bit bitops. */ bool prefer_neon_for_64bits; + /* Prefer 32-bit encoding instead of flag-setting 16-bit encoding. */ + bool disparage_flag_setting_t16_encodings; + /* Prefer 32-bit encoding instead of 16-bit encoding where subset of flags + would be set. */ + bool disparage_partial_flag_setting_t16_encodings; }; extern const struct tune_params *current_tune; diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index fc81bf6..1ebaf84 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1481,7 +1481,8 @@ const struct tune_params arm_slowmul_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. 
*/ arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; const struct tune_params arm_fastmul_tune = @@ -1497,7 +1498,8 @@ const struct tune_params arm_fastmul_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; /* StrongARM has early execution of branches, so a sequence that is worth @@ -1516,7 +1518,8 @@ const struct tune_params arm_strongarm_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; const struct tune_params arm_xscale_tune = @@ -1532,7 +1535,8 @@ const struct tune_params arm_xscale_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; const struct tune_params arm_9e_tune = @@ -1548,7 +1552,8 @@ const struct tune_params arm_9e_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; const struct tune_params arm_v6t2_tune = @@ -1564,7 +1569,8
RE: [PATCH, ARM] Implement __builtin_trap
Hi, Currently, on ARM, you have to either call abort() or raise(SIGTRAP) to achieve a handy crash. This patch allows you to instead call __builtin_trap() which is much more efficient at falling over because it becomes just a single instruction that will trap for you. Two testcases have been added (for ARM and Thumb) and both pass. Note: This is a modified version of a patch originally submitted by Mark Mitchell back in 2010, which came in response to PR target/59091. http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00639.html http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59091 The main update, other than cosmetic differences, is that we've chosen the same ARM encoding as LLVM for practical purposes. (The Thumb encoding in Mark's patch already matched LLVM.) OK for trunk? Cheers, Ian 2013-12-04 Ian Bolton ian.bol...@arm.com Mark Mitchell m...@codesourcery.com gcc/ * config/arm/arm.md (trap): New pattern. * config/arm/types.md: Added a type for trap. testsuite/ * gcc.target/arm/builtin-trap.c: New test. * gcc.target/arm/thumb-builtin-trap.c: Likewise. aarch32-builtin-trap-v2.txt This needs to set the conds attribute to unconditional. Otherwise the ARM backend might try to turn this into a conditional instruction. R. Thanks, Richard. 
I fixed it up, tested it, and committed it as a trivial difference compared to what was approved already.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index dd73366..934b859 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9927,6 +9927,23 @@
    (set_attr "type" "mov_reg")]
 )
 
+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  ""
+  "*
+  if (TARGET_ARM)
+    return \".inst\\t0xe7f000f0\";
+  else
+    return \".inst\\t0xdeff\";
+  "
+  [(set (attr "length")
+	(if_then_else (eq_attr "is_thumb" "yes")
+		      (const_int 2)
+		      (const_int 4)))
+   (set_attr "type" "trap")
+   (set_attr "conds" "unconditional")]
+)
+
 ;; Patterns to allow combination of arithmetic, cond code and shifts
 
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index 1c4b9e3..6351f08 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -152,6 +152,7 @@
 ; store2	store 2 words to memory from arm registers.
 ; store3	store 3 words to memory from arm registers.
 ; store4	store 4 (or more) words to memory from arm registers.
+; trap		cause a trap in the kernel.
 ; udiv		unsigned division.
 ; umaal		unsigned multiply accumulate accumulate long.
 ; umlal		unsigned multiply accumulate long.
@@ -645,6 +646,7 @@ store2,\ store3,\ store4,\ + trap,\ udiv,\ umaal,\ umlal,\ diff --git a/gcc/testsuite/gcc.target/arm/builtin-trap.c b/gcc/testsuite/gcc.target/arm/builtin-trap.c new file mode 100644 index 000..4ff8d25 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/builtin-trap.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm32 } */ + +void +trap () +{ + __builtin_trap (); +} + +/* { dg-final { scan-assembler 0xe7f000f0 { target { arm_nothumb } } } } */ diff --git a/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c new file mode 100644 index 000..22e90e7 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options -mthumb } */ +/* { dg-require-effective-target arm_thumb1_ok } */ + +void +trap () +{ + __builtin_trap (); +} + +/* { dg-final { scan-assembler 0xdeff } } */
[PATCH, ARM] Implement __builtin_trap
Hi, Currently, on ARM, you have to either call abort() or raise(SIGTRAP) to achieve a handy crash. This patch allows you to instead call __builtin_trap() which is much more efficient at falling over because it becomes just a single instruction that will trap for you. Two testcases have been added (for ARM and Thumb) and both pass. Note: This is a modified version of a patch originally submitted by Mark Mitchell back in 2010, which came in response to PR target/59091. http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00639.html http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59091 The main update, other than cosmetic differences, is that we've chosen the same ARM encoding as LLVM for practical purposes. (The Thumb encoding in Mark's patch already matched LLVM.) OK for trunk? Cheers, Ian 2013-12-04 Ian Bolton ian.bol...@arm.com Mark Mitchell m...@codesourcery.com gcc/ * config/arm/arm.md (trap): New pattern. * config/arm/types.md: Added a type for trap. testsuite/ * gcc.target/arm/builtin-trap.c: New test. * gcc.target/arm/thumb-builtin-trap.c: Likewise. diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index dd73366..3b7a827 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -9927,6 +9927,22 @@ (set_attr type mov_reg)] ) +(define_insn trap + [(trap_if (const_int 1) (const_int 0))] + + * + if (TARGET_ARM) +return \.inst\\t0xe7f000f0\; + else +return \.inst\\t0xdeff\; + + [(set (attr length) + (if_then_else (eq_attr is_thumb yes) + (const_int 2) + (const_int 4))) + (set_attr type trap)] +) + ;; Patterns to allow combination of arithmetic, cond code and shifts diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md index 1c4b9e3..6351f08 100644 --- a/gcc/config/arm/types.md +++ b/gcc/config/arm/types.md @@ -152,6 +152,7 @@ ; store2 store 2 words to memory from arm registers. ; store3 store 3 words to memory from arm registers. ; store4 store 4 (or more) words to memory from arm registers. +; trap cause a trap in the kernel. 
; udiv unsigned division. ; umaal unsigned multiply accumulate accumulate long. ; umlal unsigned multiply accumulate long. @@ -645,6 +646,7 @@ store2,\ store3,\ store4,\ + trap,\ udiv,\ umaal,\ umlal,\ diff --git a/gcc/testsuite/gcc.target/arm/builtin-trap.c b/gcc/testsuite/gcc.target/arm/builtin-trap.c new file mode 100644 index 000..4ff8d25 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/builtin-trap.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm32 } */ + +void +trap () +{ + __builtin_trap (); +} + +/* { dg-final { scan-assembler 0xe7f000f0 { target { arm_nothumb } } } } */ diff --git a/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c new file mode 100644 index 000..22e90e7 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options -mthumb } */ +/* { dg-require-effective-target arm_thumb1_ok } */ + +void +trap () +{ + __builtin_trap (); +} + +/* { dg-final { scan-assembler 0xdeff } } */
RE: [PATCH, ARM] Implement __builtin_trap
On Wed, 4 Dec 2013, Ian Bolton wrote:

> The main update, other than cosmetic differences, is that we've chosen
> the same ARM encoding as LLVM for practical purposes.  (The Thumb
> encoding in Mark's patch already matched LLVM.)

Do the encodings match what plain udf does in recent-enough gas (too recent for us to assume it in GCC or glibc for now), or is it something else?

Hi Joseph,

Yes, these encodings match the UDF instruction that is defined in the most recent edition of the ARM Architecture Reference Manual:

  Thumb: 0xde00 | imm8 (we chose 0xff for the imm8)
  ARM:   0xe7f000f0 | (imm12 << 8) | imm4 (we chose to use 0 for both imms)

So as not to break old versions of gas that don't recognise UDF, the encoding is output directly. Apologies if I have over-explained there!

Cheers,
Ian
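The two encodings quoted above can be assembled mechanically from their immediate fields. These helpers are purely illustrative (GCC emits the fixed bytes directly via .inst, precisely to avoid depending on gas knowing UDF):

```c
#include <stdint.h>

/* Thumb UDF encoding: 0xde00 | imm8.  GCC/LLVM pick imm8 = 0xff.  */
uint16_t
thumb_udf (uint8_t imm8)
{
  return (uint16_t) (0xde00u | imm8);
}

/* ARM UDF encoding: 0xe7f000f0 | (imm12 << 8) | imm4, with imm12 in
   bits 19:8 and imm4 in bits 3:0.  GCC/LLVM pick zero for both
   immediates, i.e. the bare 0xe7f000f0.  */
uint32_t
arm_udf (uint16_t imm12, uint8_t imm4)
{
  return 0xe7f000f0u | ((uint32_t) (imm12 & 0xfffu) << 8) | (imm4 & 0xfu);
}
```

With the immediates the patch chooses, these reproduce exactly the bytes scanned for in the two testcases: 0xdeff and 0xe7f000f0.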
[PATCH, AArch64] Improve handling of constants destined for FP_REGS
(This patch supercedes this one: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01462.html) The movdi_aarch64 pattern allows moving a constant into an FP_REG, but has the constraint Dd, which is stricter than the constraint N for moving a constant into a CORE_REG. This is due to restricted values allowed for MOVI instruction. Due to the predicate allowing any constant that is valid for the CORE_REGs, we can run into situations where IRA/reload has decided to use FP_REGs but the value is not actually valid for MOVI. This patch makes use of TARGET_PREFERRED_RELOAD_CLASS to ensure that NO_REGS (which leads to literal pool) is returned, when the immediate can't be put directly into FP_REGS. A testcase is included. Linux regressions all came back good. OK for trunk? Cheers, Ian 2013-09-04 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_preferred_reload_class): Return NO_REGS for immediate that can't be moved directly into FP_REGS. testsuite/ * gcc.target/aarch64/movdi_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index aed035a..2c07ccf 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4236,10 +4236,18 @@ aarch64_class_max_nregs (reg_class_t regclass, enum machine_mode mode) } static reg_class_t -aarch64_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t regclass) +aarch64_preferred_reload_class (rtx x, reg_class_t regclass) { - return ((regclass == POINTER_REGS || regclass == STACK_REG) - ? GENERAL_REGS : regclass); + if (regclass == POINTER_REGS || regclass == STACK_REG) +return GENERAL_REGS; + + /* If it's an integer immediate that MOVI can't handle, then + FP_REGS is not an option, so we return NO_REGS instead. 
     */
+  if (CONST_INT_P (x) && reg_class_subset_p (regclass, FP_REGS)
+      && !aarch64_simd_imm_scalar_p (x, GET_MODE (x)))
+    return NO_REGS;
+
+  return regclass;
 }
 
 void
diff --git a/gcc/testsuite/gcc.target/aarch64/movdi_1.c b/gcc/testsuite/gcc.target/aarch64/movdi_1.c
new file mode 100644
index 000..a22378d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movdi_1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-inline" } */
+
+#include <arm_neon.h>
+
+void
+foo1 (uint64_t *a)
+{
+  uint64x1_t val18;
+  uint32x2_t val19;
+  uint64x1_t val20;
+  val19 = vcreate_u32 (0x80004cf3dffbUL);
+  val20 = vrsra_n_u64 (val18, vreinterpret_u64_u32 (val19), 34);
+  vst1_u64 (a, val20);
+}
+
+void
+foo2 (uint64_t *a)
+{
+  uint64x1_t val18;
+  uint32x2_t val19;
+  uint64x1_t val20;
+  val19 = vcreate_u32 (0xdffbUL);
+  val20 = vrsra_n_u64 (val18, vreinterpret_u64_u32 (val19), 34);
+  vst1_u64 (a, val20);
+}
[PATCH, AArch64] Add secondary reload for immediates into FP_REGS
Our movdi_aarch64 pattern allows moving a constant into an FP_REG, but has the constraint Dd, which is stricter than the one for moving a constant into a CORE_REG. This is due to restricted values allowed for MOVI instructions. Due to the predicate for the pattern allowing any constant that is valid for the CORE_REGs, we can run into situations where IRA/reload has decided to use FP_REGs but the value is not actually valid for MOVI. This patch introduces a secondary reload to handle this case. Supplied with testcase that highlighted original problem. Tested on Linux GNU regressions. OK for trunk? Cheers, Ian 2013-07-30 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_secondary_reload)): Handle constant into FP_REGs that is not valid for MOVI. testsuite/ * gcc.target/aarch64/movdi_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 9941d7c..f16988e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4070,6 +4070,15 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x, if (rclass == FP_REGS (mode == TImode || mode == TFmode) CONSTANT_P(x)) return CORE_REGS; + /* Only a subset of the DImode immediate values valid for CORE_REGS are + valid for FP_REGS. Where we have an immediate value that isn't valid + for FP_REGS, and RCLASS is FP_REGS, we return CORE_REGS to cause the + value to be generated into there first and later copied to FP_REGS to be + used. 
     */
+  if (rclass == FP_REGS && mode == DImode && CONST_INT_P (x)
+      && !aarch64_simd_imm_scalar_p (x, GET_MODE (x)))
+    return CORE_REGS;
+
   return NO_REGS;
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/movdi_1.c b/gcc/testsuite/gcc.target/aarch64/movdi_1.c
new file mode 100644
index 000..1decd99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movdi_1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-inline" } */
+
+#include <arm_neon.h>
+
+void
+foo (uint64_t *a)
+{
+  uint64x1_t val18;
+  uint32x2_t val19;
+  uint64x1_t val20;
+  val19 = vcreate_u32 (0x80004cf3dffbUL);
+  val20 = vrsra_n_u64 (val18, vreinterpret_u64_u32 (val19), 34);
+  vst1_u64 (a, val20);
+}
[PATCH, AArch64] Support NEG in vector registers for DI and SI mode
Support added for scalar NEG instruction in vector registers. Execution testcase included. Tested on usual GCC Linux regressions. OK for trunk? Cheers, Ian 2013-07-23 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64-simd.md (negmode2): Offer alternative that uses vector registers. testsuite/ * gcc.target/aarch64/neg_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e88e5be..d76056c 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2004,12 +2004,17 @@ ) (define_insn negmode2 - [(set (match_operand:GPI 0 register_operand =r) - (neg:GPI (match_operand:GPI 1 register_operand r)))] + [(set (match_operand:GPI 0 register_operand =r,w) + (neg:GPI (match_operand:GPI 1 register_operand r,w)))] - neg\\t%w0, %w1 + @ + neg\\t%w0, %w1 + neg\\t%rtn0vas, %rtn1vas [(set_attr v8type alu) - (set_attr mode MODE)] + (set_attr simd_type *,simd_negabs) + (set_attr simd *,yes) + (set_attr mode MODE) + (set_attr simd_mode MODE)] ) ;; zero_extend version of above diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 8e40c5d..7acbcfd 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -277,6 +277,12 @@ (V2DI ) (V2SF ) (V4SF ) (V2DF )]) +;; Register Type Name and Vector Arrangement Specifier for when +;; we are doing scalar for DI and SIMD for SI (ignoring all but +;; lane 0). 
+(define_mode_attr rtn [(DI d) (SI )]) +(define_mode_attr vas [(DI ) (SI .2s)]) + ;; Map a floating point mode to the appropriate register name prefix (define_mode_attr s [(SF s) (DF d)]) diff --git a/gcc/testsuite/gcc.target/aarch64/neg_1.c b/gcc/testsuite/gcc.target/aarch64/neg_1.c new file mode 100644 index 000..04b0fdd --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/neg_1.c @@ -0,0 +1,67 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fno-inline --save-temps } */ + +extern void abort (void); + +long long +neg64 (long long a) +{ + /* { dg-final { scan-assembler neg\tx\[0-9\]+ } } */ + return 0 - a; +} + +long long +neg64_in_dreg (long long a) +{ + /* { dg-final { scan-assembler neg\td\[0-9\]+, d\[0-9\]+ } } */ + register long long x asm (d8) = a; + register long long y asm (d9); + asm volatile ( : : w (x)); + y = 0 - x; + asm volatile ( : : w (y)); + return y; +} + +int +neg32 (int a) +{ + /* { dg-final { scan-assembler neg\tw\[0-9\]+ } } */ + return 0 - a; +} + +int +neg32_in_sreg (int a) +{ + /* { dg-final { scan-assembler neg\tv\[0-9\]+\.2s, v\[0-9\]+\.2s } } */ + register int x asm (s8) = a; + register int y asm (s9); + asm volatile ( : : w (x)); + y = 0 - x; + asm volatile ( : : w (y)); + return y; +} + +int +main (void) +{ + long long a; + int b; + a = 61; + b = 313; + + if (neg64 (a) != -61) +abort (); + + if (neg64_in_dreg (a) != -61) +abort (); + + if (neg32 (b) != -313) +abort (); + + if (neg32_in_sreg (b) != -313) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */
[PATCH, AArch64] Add vabs_s64 intrinsic
This patch implements the following intrinsic:

  int64x1_t vabs_s64 (int64x1_t a)

It uses __builtin_llabs (), which will lead to "abs Dn, Dm" being generated for this, now that my other patch has been committed. Test case added to scalar_intrinsics.c.

OK for trunk?

Cheers,
Ian

2013-07-12  Ian Bolton  ian.bol...@arm.com

	gcc/
	* config/aarch64/arm_neon.h (vabs_s64): New function.

	testsuite/
	* gcc.target/aarch64/scalar_intrinsics.c (test_vabs_s64): Added
	new test.

Index: gcc/config/aarch64/arm_neon.h
===================================================================
--- gcc/config/aarch64/arm_neon.h	(revision 200594)
+++ gcc/config/aarch64/arm_neon.h	(working copy)
@@ -17886,6 +17886,12 @@ vabsq_f64 (float64x2_t __a)
   return __builtin_aarch64_absv2df (__a);
 }
 
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vabs_s64 (int64x1_t a)
+{
+  return __builtin_llabs (a);
+}
+
 /* vadd */
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
Index: gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c	(revision 200594)
+++ gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c	(working copy)
@@ -32,6 +32,18 @@ test_vaddd_s64_2 (int64x1_t a, int64x1_t
 		    vqaddd_s64 (a, d));
 }
 
+/* { dg-final { scan-assembler-times "\\tabs\\td\[0-9\]+, d\[0-9\]+" 1 } } */
+
+int64x1_t
+test_vabs_s64 (int64x1_t a)
+{
+  uint64x1_t res;
+  force_simd (a);
+  res = vabs_s64 (a);
+  force_simd (res);
+  return res;
+}
+
 /* { dg-final { scan-assembler-times "\\tcmeq\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */
 
 uint64x1_t
[PATCH, AArch64] Support abs standard pattern for DI mode
Hi, I'm adding support for abs standard pattern name for DI mode, via the ABS instruction in FP registers and the EOR/SUB combo in GP registers. Regression tests for Linux and bare-metal all passed. OK for trunk? Cheers, Ian 2013-06-25 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64-simd.md (absdi2): Support abs for DI mode. testsuite/ * gcc.target/aarch64/abs_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e88e5be..3700977 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2003,6 +2003,38 @@ (set_attr mode SI)] ) +(define_insn_and_split absdi2 + [(set (match_operand:DI 0 register_operand =r,w) + (abs:DI (match_operand:DI 1 register_operand r,w))) + (clobber (match_scratch:DI 2 =r,X))] + + @ + # + abs\\t%d0, %d1 + reload_completed +GP_REGNUM_P (REGNO (operands[0])) +GP_REGNUM_P (REGNO (operands[1])) + [(const_int 0)] + { +emit_insn (gen_rtx_SET (VOIDmode, operands[2], + gen_rtx_XOR (DImode, +gen_rtx_ASHIFTRT (DImode, + operands[1], + GEN_INT (63)), +operands[1]))); +emit_insn (gen_rtx_SET (VOIDmode, + operands[0], + gen_rtx_MINUS (DImode, + operands[2], + gen_rtx_ASHIFTRT (DImode, +operands[1], +GEN_INT (63); +DONE; + } + [(set_attr v8type alu) + (set_attr mode DI)] +) + (define_insn negmode2 [(set (match_operand:GPI 0 register_operand =r) (neg:GPI (match_operand:GPI 1 register_operand r)))] diff --git a/gcc/testsuite/gcc.target/aarch64/abs_1.c b/gcc/testsuite/gcc.target/aarch64/abs_1.c new file mode 100644 index 000..938bc84 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/abs_1.c @@ -0,0 +1,53 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fno-inline --save-temps } */ + +extern long long llabs (long long); +extern void abort (void); + +long long +abs64 (long long a) +{ + /* { dg-final { scan-assembler eor\t } } */ + /* { dg-final { scan-assembler sub\t } } */ + return llabs (a); +} + +long long +abs64_in_dreg (long long a) +{ + /* { dg-final { scan-assembler 
abs\td\[0-9\]+, d\[0-9\]+" } } */
+  register long long x asm ("d8") = a;
+  register long long y asm ("d9");
+  asm volatile ("" : : "w" (x));
+  y = llabs (x);
+  asm volatile ("" : : "w" (y));
+  return y;
+}
+
+int
+main (void)
+{
+  volatile long long ll0 = 0LL, ll1 = 1LL, llm1 = -1LL;
+
+  if (abs64 (ll0) != 0LL)
+    abort ();
+
+  if (abs64 (ll1) != 1LL)
+    abort ();
+
+  if (abs64 (llm1) != 1LL)
+    abort ();
+
+  if (abs64_in_dreg (ll0) != 0LL)
+    abort ();
+
+  if (abs64_in_dreg (ll1) != 1LL)
+    abort ();
+
+  if (abs64_in_dreg (llm1) != 1LL)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
[PATCH, AArch64] Update insv_1.c test for Big Endian
Hi, The insv_1.c test case I added recently was not compatible with big endian. I attempted to fix with #ifdefs but dejagnu thinks all dg directives in a file, regardless of #ifdefs, are applicable, so I had to instead make a new test and add a new effective target to show when each test is supported. I've tested these two tests on little and big. All was OK. OK for trunk? Cheers, Ian 2013-06-24 Ian Bolton ian.bol...@arm.com * gcc.target/config/aarch64/insv_1.c: Update to show it doesn't work on big endian. * gcc.target/config/aarch64/insv_2.c: New test for big endian. * lib/target-supports.exp: Define aarch64_little_endian.diff --git a/gcc/testsuite/gcc.target/aarch64/insv_1.c b/gcc/testsuite/gcc.target/aarch64/insv_1.c index bc8928d..6e3c7f0 100644 --- a/gcc/testsuite/gcc.target/aarch64/insv_1.c +++ b/gcc/testsuite/gcc.target/aarch64/insv_1.c @@ -1,5 +1,6 @@ -/* { dg-do run } */ +/* { dg-do run { target aarch64*-*-* } } */ /* { dg-options -O2 --save-temps -fno-inline } */ +/* { dg-require-effective-target aarch64_little_endian } */ extern void abort (void); diff --git a/gcc/testsuite/gcc.target/aarch64/insv_2.c b/gcc/testsuite/gcc.target/aarch64/insv_2.c new file mode 100644 index 000..a7691a3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/insv_2.c @@ -0,0 +1,85 @@ +/* { dg-do run { target aarch64*-*-* } } */ +/* { dg-options -O2 --save-temps -fno-inline } */ +/* { dg-require-effective-target aarch64_big_endian } */ + +extern void abort (void); + +typedef struct bitfield +{ + unsigned short eight: 8; + unsigned short four: 4; + unsigned short five: 5; + unsigned short seven: 7; + unsigned int sixteen: 16; +} bitfield; + +bitfield +bfi1 (bitfield a) +{ + /* { dg-final { scan-assembler bfi\tx\[0-9\]+, x\[0-9\]+, 56, 8 } } */ + a.eight = 3; + return a; +} + +bitfield +bfi2 (bitfield a) +{ + /* { dg-final { scan-assembler bfi\tx\[0-9\]+, x\[0-9\]+, 43, 5 } } */ + a.five = 7; + return a; +} + +bitfield +movk (bitfield a) +{ + /* { dg-final { scan-assembler 
movk\tx\[0-9\]+, 0x1d6b, lsl 16 } } */ + a.sixteen = 7531; + return a; +} + +bitfield +set1 (bitfield a) +{ + /* { dg-final { scan-assembler orr\tx\[0-9\]+, x\[0-9\]+, 272678883688448 } } */ + a.five = 0x1f; + return a; +} + +bitfield +set0 (bitfield a) +{ + /* { dg-final { scan-assembler and\tx\[0-9\]+, x\[0-9\]+, -272678883688449 } } */ + a.five = 0; + return a; +} + + +int +main (int argc, char** argv) +{ + static bitfield a; + bitfield b = bfi1 (a); + bitfield c = bfi2 (b); + bitfield d = movk (c); + + if (d.eight != 3) +abort (); + + if (d.five != 7) +abort (); + + if (d.sixteen != 7531) +abort (); + + d = set1 (d); + if (d.five != 0x1f) +abort (); + + d = set0 (d); + if (d.five != 0) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index a80078a..aca4215 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2105,6 +2105,15 @@ proc check_effective_target_aarch64_big_endian { } { }] } +# Return 1 if this is a AArch64 target supporting little endian +proc check_effective_target_aarch64_little_endian { } { +return [check_no_compiler_messages aarch64_little_endian assembly { +#if !defined(__aarch64__) || defined(__AARCH64EB__) +#error FOO +#endif +}] +} + # Return 1 is this is an arm target using 32-bit instructions proc check_effective_target_arm32 { } { return [check_no_compiler_messages arm32 assembly {
[AArch64, PATCH 1/5] Improve MOVI handling (Change interface of aarch64_simd_valid_immediate)
(This patch is the first of five, where the first 4 do some clean-up and the last fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) This one is pretty simple: it just alters an interface, so we can later remove an unnecessary wrapper function. OK for trunk? Cheers, Ian

2013-06-03  Ian Bolton  ian.bol...@arm.com

	* config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Change
	return type to bool for prototype.
	(aarch64_legitimate_constant_p): Check for true instead of not -1.
	(aarch64_simd_valid_immediate): Fix up each return to return a bool.
	(aarch64_simd_immediate_valid_for_move): Update retval for bool.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12a7055..05ff5fa 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -103,7 +103,7 @@ static bool aarch64_vfp_is_call_or_return_candidate (enum machine_mode,
 static void aarch64_elf_asm_constructor (rtx, int) ATTRIBUTE_UNUSED;
 static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED;
 static void aarch64_override_options_after_change (void);
-static int aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *,
+static bool aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *,
					  int *, unsigned char *, int *, int *);
 static bool aarch64_vector_mode_supported_p (enum machine_mode);
 static unsigned bit_count (unsigned HOST_WIDE_INT);
@@ -5153,7 +5153,7 @@ aarch64_legitimate_constant_p (enum machine_mode mode, rtx x)
      we now decompose CONST_INTs according to expand_mov_immediate.
*/
   if ((GET_CODE (x) == CONST_VECTOR
        && aarch64_simd_valid_immediate (x, mode, false,
-					NULL, NULL, NULL, NULL, NULL) != -1)
+					NULL, NULL, NULL, NULL, NULL))
       || CONST_INT_P (x) || aarch64_valid_floating_const (mode, x))
     return !targetm.cannot_force_const_mem (mode, x);
@@ -6144,11 +6144,8 @@ aarch64_vect_float_const_representable_p (rtx x)
   return aarch64_float_const_representable_p (x0);
 }
 
-/* TODO: This function returns values similar to those
-   returned by neon_valid_immediate in gcc/config/arm/arm.c
-   but the API here is different enough that these magic numbers
-   are not used.  It should be sufficient to return true or false.  */
-static int
+/* Return true for valid and false for invalid.  */
+static bool
 aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse,
			      rtx *modconst, int *elementwidth,
			      unsigned char *elementchar,
@@ -6184,24 +6181,21 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse,
       if (!(simd_imm_zero
	     || aarch64_vect_float_const_representable_p (op)))
-	return -1;
+	return false;
 
-      if (modconst)
-	*modconst = CONST_VECTOR_ELT (op, 0);
+      if (modconst)
+	*modconst = CONST_VECTOR_ELT (op, 0);
 
-      if (elementwidth)
-	*elementwidth = elem_width;
+      if (elementwidth)
+	*elementwidth = elem_width;
 
-      if (elementchar)
-	*elementchar = sizetochar (elem_width);
+      if (elementchar)
+	*elementchar = sizetochar (elem_width);
 
-      if (shift)
-	*shift = 0;
+      if (shift)
+	*shift = 0;
 
-      if (simd_imm_zero)
-	return 19;
-      else
-	return 18;
+      return true;
 }
 
 /* Splat vector constant out into a byte vector.
*/
@@ -6299,7 +6293,7 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse,
   if (immtype == -1
       || (immtype >= 12 && immtype <= 15)
       || immtype == 18)
-    return -1;
+    return false;
 
   if (elementwidth)
@@ -6351,7 +6345,7 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse,
	}
     }
 
-  return immtype;
+  return (immtype >= 0);
 #undef CHECK
 }
@@ -6369,11 +6363,11 @@ aarch64_simd_immediate_valid_for_move (rtx op, enum machine_mode mode,
   int tmpwidth;
   unsigned char tmpwidthc;
   int tmpmvn = 0, tmpshift = 0;
-  int retval = aarch64_simd_valid_immediate (op, mode, 0, &tmpconst,
-					     &tmpwidth, &tmpwidthc,
-					     &tmpmvn, &tmpshift);
+  bool retval = aarch64_simd_valid_immediate (op, mode, 0, &tmpconst,
					      &tmpwidth, &tmpwidthc,
					      &tmpmvn, &tmpshift);
 
-  if (retval == -1)
+  if (!retval)
     return 0;
 
   if (modconst)
[AArch64, PATCH 3/5] Improve MOVI handling (Don't update RTX operand in-place)
(This patch is the third of five, where the first 4 do some clean-up and the last fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) This one is focused on cleaning up aarch64_simd_valid_immediate, with better use of arguments and no in-place modification of RTX operands. Specifically, I've changed the set of pointers that are passed in (it's now a struct) and the caller prints out the immediate value directly instead of letting operand[1] get fudged. OK for trunk? Cheers, Ian 2013-06-03 Ian Bolton ian.bol...@arm.com * config/aarch64/aarch64.c (simd_immediate_info): Struct to hold information completed by aarch64_simd_valid_immediate. (aarch64_legitimate_constant_p): Update arguments. (aarch64_simd_valid_immediate): Work with struct rather than many pointers. (aarch64_simd_scalar_immediate_valid_for_move): Update arguments. (aarch64_simd_make_constant): Update arguments. (aarch64_output_simd_mov_immediate): Work with struct rather than many pointers. Output immediate directly rather than as operand. * config/aarch64/aarch64-protos.h (aarch64_simd_valid_immediate): Update prototype. 
* config/aarch64/constraints.md (Dn): Update arguments.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index d1de14e..083ce91 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -156,8 +156,8 @@ bool aarch64_simd_imm_scalar_p (rtx x, enum machine_mode mode); bool aarch64_simd_imm_zero_p (rtx, enum machine_mode); bool aarch64_simd_scalar_immediate_valid_for_move (rtx, enum machine_mode); bool aarch64_simd_shift_imm_p (rtx, enum machine_mode, bool); -bool aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *, - int *, unsigned char *, int *, int *); +bool aarch64_simd_valid_immediate (rtx, enum machine_mode, bool, + struct simd_immediate_info *); bool aarch64_symbolic_address_p (rtx); bool aarch64_symbolic_constant_p (rtx, enum aarch64_symbol_context, enum aarch64_symbol_type *); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5f97efe..d83e645 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -87,6 +87,14 @@ struct aarch64_address_info { enum aarch64_symbol_type symbol_type; }; +struct simd_immediate_info { + rtx value; + int shift; + int element_width; + unsigned char element_char; + bool mvn; +}; + /* The current code model. */ enum aarch64_code_model aarch64_cmodel; @@ -5150,8 +5158,7 @@ aarch64_legitimate_constant_p (enum machine_mode mode, rtx x) /* This could probably go away because we now decompose CONST_INTs according to expand_mov_immediate. */ if ((GET_CODE (x) == CONST_VECTOR -aarch64_simd_valid_immediate (x, mode, false, - NULL, NULL, NULL, NULL, NULL)) +aarch64_simd_valid_immediate (x, mode, false, NULL)) || CONST_INT_P (x) || aarch64_valid_floating_const (mode, x)) return !targetm.cannot_force_const_mem (mode, x); @@ -6144,10 +6151,8 @@ aarch64_vect_float_const_representable_p (rtx x) /* Return true for valid and false for invalid. 
*/ bool -aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, - rtx *modconst, int *elementwidth, - unsigned char *elementchar, - int *mvn, int *shift) +aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse, + struct simd_immediate_info *info) { #define CHECK(STRIDE, ELSIZE, CLASS, TEST, SHIFT, NEG) \ matches = 1; \ @@ -6181,17 +6186,14 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, || aarch64_vect_float_const_representable_p (op))) return false; - if (modconst) - *modconst = CONST_VECTOR_ELT (op, 0); - - if (elementwidth) - *elementwidth = elem_width; - - if (elementchar) - *elementchar = sizetochar (elem_width); - - if (shift) - *shift = 0; + if (info) + { + info-value = CONST_VECTOR_ELT (op, 0); + info-element_width = elem_width; + info-element_char = sizetochar (elem_width); + info-mvn = false; + info-shift = 0; + } return true; } @@ -6293,21 +6295,13 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, || immtype == 18) return false; - - if (elementwidth) -*elementwidth = elsize; - - if (elementchar) -*elementchar = elchar; - - if (mvn) -*mvn = emvn; - - if (shift) -*shift = eshift; - - if (modconst) + if (info) { + info-element_width = elsize
[AArch64, PATCH 2/5] Improve MOVI handling (Remove wrapper function)
(This patch is the second of five, where the first 4 do some clean-up and the last fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) This one is also very simple - removing a wrapper function that was an unnecessary level of indirection. OK for trunk? Cheers, Ian 13-06-03 Ian Bolton ian.bol...@arm.com * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): No longer static. (aarch64_simd_immediate_valid_for_move): Remove. (aarch64_simd_scalar_immediate_valid_for_move): Update call. (aarch64_simd_make_constant): Update call. (aarch64_output_simd_mov_immediate): Update call. * config/aarch64/aarch64-protos.h (aarch64_simd_valid_immediate): Add prototype. * config/aarch64/constraints.md (Dn): Update call.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 91fcde8..d1de14e 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -156,6 +156,8 @@ bool aarch64_simd_imm_scalar_p (rtx x, enum machine_mode mode); bool aarch64_simd_imm_zero_p (rtx, enum machine_mode); bool aarch64_simd_scalar_immediate_valid_for_move (rtx, enum machine_mode); bool aarch64_simd_shift_imm_p (rtx, enum machine_mode, bool); +bool aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *, + int *, unsigned char *, int *, int *); bool aarch64_symbolic_address_p (rtx); bool aarch64_symbolic_constant_p (rtx, enum aarch64_symbol_context, enum aarch64_symbol_type *); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 05ff5fa..aec59b0 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -103,8 +103,6 @@ static bool aarch64_vfp_is_call_or_return_candidate (enum machine_mode, static void aarch64_elf_asm_constructor (rtx, int) ATTRIBUTE_UNUSED; static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED; static void aarch64_override_options_after_change (void); -static bool aarch64_simd_valid_immediate (rtx, enum 
machine_mode, int, rtx *, -int *, unsigned char *, int *, int *); static bool aarch64_vector_mode_supported_p (enum machine_mode); static unsigned bit_count (unsigned HOST_WIDE_INT); static bool aarch64_const_vec_all_same_int_p (rtx, @@ -6145,7 +6143,7 @@ aarch64_vect_float_const_representable_p (rtx x) } /* Return true for valid and false for invalid. */ -static bool +bool aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, rtx *modconst, int *elementwidth, unsigned char *elementchar, @@ -6349,45 +6347,6 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, #undef CHECK } -/* Return TRUE if rtx X is legal for use as either a AdvSIMD MOVI instruction - (or, implicitly, MVNI) immediate. Write back width per element - to *ELEMENTWIDTH, and a modified constant (whatever should be output - for a MOVI instruction) in *MODCONST. */ -int -aarch64_simd_immediate_valid_for_move (rtx op, enum machine_mode mode, - rtx *modconst, int *elementwidth, - unsigned char *elementchar, - int *mvn, int *shift) -{ - rtx tmpconst; - int tmpwidth; - unsigned char tmpwidthc; - int tmpmvn = 0, tmpshift = 0; - bool retval = aarch64_simd_valid_immediate (op, mode, 0, tmpconst, -tmpwidth, tmpwidthc, -tmpmvn, tmpshift); - - if (!retval) -return 0; - - if (modconst) -*modconst = tmpconst; - - if (elementwidth) -*elementwidth = tmpwidth; - - if (elementchar) -*elementchar = tmpwidthc; - - if (mvn) -*mvn = tmpmvn; - - if (shift) -*shift = tmpshift; - - return 1; -} - static bool aarch64_const_vec_all_same_int_p (rtx x, HOST_WIDE_INT minval, @@ -6492,9 +6451,8 @@ aarch64_simd_scalar_immediate_valid_for_move (rtx op, enum machine_mode mode) gcc_assert (!VECTOR_MODE_P (mode)); vmode = aarch64_preferred_simd_mode (mode); rtx op_v = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (op)); - int retval = aarch64_simd_immediate_valid_for_move (op_v, vmode, 0, - NULL, NULL, NULL, NULL); - return retval; + return aarch64_simd_valid_immediate (op_v, vmode, 0, 
NULL, + NULL, NULL, NULL, NULL); } /* Construct and return a PARALLEL RTX vector. */ @@ -6722,8 +6680,8 @@ aarch64_simd_make_constant (rtx vals) gcc_unreachable (); if (const_vec != NULL_RTX - aarch64_simd_immediate_valid_for_move (const_vec, mode, NULL, NULL, - NULL, NULL, NULL
[AArch64, PATCH 4/5] Improve MOVI handling (Other minor clean-up)
(This patch is the fourth of five, where the first 4 do some clean-up and the last fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) I think the changelog says it all here. Nothing major, just tidying up. OK for trunk? Cheers, Ian 2013-06-03 Ian Bolton ian.bol...@arm.com * config/aarch64/aarch64.c (simd_immediate_info): Remove element_char member. (sizetochar): Return signed char. (aarch64_simd_valid_immediate): Remove elchar and other unnecessary variables. (aarch64_output_simd_mov_immediate): Take rtx instead of rtx. Calculate element_char as required. * config/aarch64/aarch64-protos.h: Update and move prototype for aarch64_output_simd_mov_immediate. * config/aarch64/aarch64-simd.md (*aarch64_simd_movmode): Update arguments.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 083ce91..d21a2f5 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -148,6 +148,7 @@ bool aarch64_legitimate_pic_operand_p (rtx); bool aarch64_move_imm (HOST_WIDE_INT, enum machine_mode); bool aarch64_mov_operand_p (rtx, enum aarch64_symbol_context, enum machine_mode); +char *aarch64_output_simd_mov_immediate (rtx, enum machine_mode, unsigned); bool aarch64_pad_arg_upward (enum machine_mode, const_tree); bool aarch64_pad_reg_upward (enum machine_mode, const_tree, bool); bool aarch64_regno_ok_for_base_p (int, bool); @@ -258,6 +259,4 @@ extern void aarch64_split_combinev16qi (rtx operands[3]); extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel); extern bool aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel); - -char* aarch64_output_simd_mov_immediate (rtx *, enum machine_mode, unsigned); #endif /* GCC_AARCH64_PROTOS_H */ diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 04fbdbd..e5990d4 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -409,7 +409,7 @@ 
case 4: return ins\t%0.d[0], %1; case 5: return mov\t%0, %1; case 6: - return aarch64_output_simd_mov_immediate (operands[1], + return aarch64_output_simd_mov_immediate (operands[1], MODEmode, 64); default: gcc_unreachable (); } @@ -440,7 +440,7 @@ case 5: return #; case 6: - return aarch64_output_simd_mov_immediate (operands[1], MODEmode, 128); + return aarch64_output_simd_mov_immediate (operands[1], MODEmode, 128); default: gcc_unreachable (); } diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index d83e645..001f9c5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -91,7 +91,6 @@ struct simd_immediate_info { rtx value; int shift; int element_width; - unsigned char element_char; bool mvn; }; @@ -6102,7 +6101,7 @@ aarch64_mangle_type (const_tree type) } /* Return the equivalent letter for size. */ -static unsigned char +static char sizetochar (int size) { switch (size) @@ -6163,7 +6162,6 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse, { \ immtype = (CLASS); \ elsize = (ELSIZE); \ - elchar = sizetochar (elsize);\ eshift = (SHIFT);\ emvn = (NEG);\ break; \ @@ -6172,25 +6170,20 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse, unsigned int i, elsize = 0, idx = 0, n_elts = CONST_VECTOR_NUNITS (op); unsigned int innersize = GET_MODE_SIZE (GET_MODE_INNER (mode)); unsigned char bytes[16]; - unsigned char elchar = 0; int immtype = -1, matches; unsigned int invmask = inverse ? 0xff : 0; int eshift, emvn; if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) { - bool simd_imm_zero = aarch64_simd_imm_zero_p (op, mode); - int elem_width = GET_MODE_BITSIZE (GET_MODE (CONST_VECTOR_ELT (op, 0))); - - if (!(simd_imm_zero - || aarch64_vect_float_const_representable_p (op))) + if (! 
(aarch64_simd_imm_zero_p (op, mode) +|| aarch64_vect_float_const_representable_p (op))) return false; if (info) { info-value = CONST_VECTOR_ELT (op, 0); - info-element_width = elem_width; - info-element_char = sizetochar (elem_width); + info-element_width = GET_MODE_BITSIZE (GET_MODE (info-value)); info-mvn = false; info-shift = 0; } @@ -6298,7 +6291,6 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse
[AArch64, PATCH 5/5] Improve MOVI handling (Fix invalid assembler bug)
(This patch is the last of five, where the first 4 did some clean-up and this one fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) GCC currently generates invalid assembler for MOVI if the value in question needs to be shifted. For example: prog.s:270: Error: immediate value out of range -128 to 255 at operand 2 -- `movi v16.4h,1024' The correct assembler for the example should be: movi v16.4h, 0x4, lsl 8 The fix involves calling into a function to output the instruction, rather than just leaving it to aarch64_print_operand, as is done for vector immediates. Regression runs have passed for Linux and bare-metal. OK for trunk? Cheers, Ian

2013-06-03  Ian Bolton  ian.bol...@arm.com

	gcc/
	* config/aarch64/aarch64.md (*mov<mode>_aarch64): Call into function
	to generate MOVI instruction.
	* config/aarch64/aarch64.c (aarch64_simd_container_mode): New function.
	(aarch64_preferred_simd_mode): Turn into wrapper.
	(aarch64_output_scalar_simd_mov_immediate): New function.
	* config/aarch64/aarch64-protos.h: Add prototype for above.

	testsuite/
	* gcc.target/aarch64/movi_1.c: New test.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index d21a2f5..0dface1 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -148,6 +148,7 @@ bool aarch64_legitimate_pic_operand_p (rtx); bool aarch64_move_imm (HOST_WIDE_INT, enum machine_mode); bool aarch64_mov_operand_p (rtx, enum aarch64_symbol_context, enum machine_mode); +char *aarch64_output_scalar_simd_mov_immediate (rtx, enum machine_mode); char *aarch64_output_simd_mov_immediate (rtx, enum machine_mode, unsigned); bool aarch64_pad_arg_upward (enum machine_mode, const_tree); bool aarch64_pad_reg_upward (enum machine_mode, const_tree, bool); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 001f9c5..0ea05d8 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -5988,32 +5988,57 @@ aarch64_vector_mode_supported_p (enum machine_mode mode) return false; } -/* Return quad mode as the preferred SIMD mode. */ +/* Return appropriate SIMD container + for MODE within a vector of WIDTH bits. 
*/ static enum machine_mode -aarch64_preferred_simd_mode (enum machine_mode mode) +aarch64_simd_container_mode (enum machine_mode mode, unsigned width) { + gcc_assert (width == 64 || width == 128); if (TARGET_SIMD) -switch (mode) - { - case DFmode: -return V2DFmode; - case SFmode: -return V4SFmode; - case SImode: -return V4SImode; - case HImode: -return V8HImode; - case QImode: -return V16QImode; - case DImode: - return V2DImode; -break; - - default:; - } +{ + if (width == 128) + switch (mode) + { + case DFmode: + return V2DFmode; + case SFmode: + return V4SFmode; + case SImode: + return V4SImode; + case HImode: + return V8HImode; + case QImode: + return V16QImode; + case DImode: + return V2DImode; + default: + break; + } + else + switch (mode) + { + case SFmode: + return V2SFmode; + case SImode: + return V2SImode; + case HImode: + return V4HImode; + case QImode: + return V8QImode; + default: + break; + } +} return word_mode; } +/* Return 128-bit container as the preferred SIMD mode for MODE. */ +static enum machine_mode +aarch64_preferred_simd_mode (enum machine_mode mode) +{ + return aarch64_simd_container_mode (mode, 128); +} + /* Return the bitmask of possible vector sizes for the vectorizer to iterate over. */ static unsigned int @@ -7280,6 +7305,18 @@ aarch64_output_simd_mov_immediate (rtx const_vector, return templ; } +char* +aarch64_output_scalar_simd_mov_immediate (rtx immediate, + enum machine_mode mode) +{ + enum machine_mode vmode; + + gcc_assert (!VECTOR_MODE_P (mode)); + vmode = aarch64_simd_container_mode (mode, 64); + rtx v_op = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (immediate)); + return aarch64_output_simd_mov_immediate (v_op, vmode, 64); +} + /* Split operands into moves from op[1] + op[2] into op[0]. 
*/ void diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e1ec48f..458239e 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -774,17 +774,34 @@ (match_operand:SHORT 1 general_operand r,M,Dhq,m, m,rZ,*w,*w, r,*w))] (register_operand (operands[0], MODEmode) || aarch64_reg_or_zero (operands[1], MODEmode)) - @ - mov\\t%w0, %w1
RE: [PATCH, AArch64] Support BFI instruction and insv standard pattern
On 05/20/2013 11:55 AM, Ian Bolton wrote: I improved this patch during the work I did on the recent insv_imm patch (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01007.html). Thanks, you cleaned up almost everything on which I would have commented with the previous patch revision. The only thing left is:

+  else if (!register_operand (value, <MODE>mode))
+    operands[3] = force_reg (<MODE>mode, value);

Checking register_operand before force_reg is unnecessary; you're not saving a function call, and force_reg will itself perform the register check. Thanks for the review, Richard. Latest patch is attached, which fixes this. Linux and bare-metal regression runs successful. OK for trunk? Cheers, Ian

2013-05-30  Ian Bolton  ian.bol...@arm.com

	gcc/
	* config/aarch64/aarch64.md (insv): New define_expand.
	(*insv_reg<mode>): New define_insn.

	testsuite/
	* gcc.target/aarch64/insv_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2bdbfa9..89db092 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3163,6 +3163,50 @@
    (set_attr "mode" "<MODE>")]
 )
 
+;; Bitfield Insert (insv)
+(define_expand "insv<mode>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand")
+			  (match_operand 1 "const_int_operand")
+			  (match_operand 2 "const_int_operand"))
+	(match_operand:GPI 3 "general_operand"))]
+  ""
+{
+  unsigned HOST_WIDE_INT width = UINTVAL (operands[1]);
+  unsigned HOST_WIDE_INT pos = UINTVAL (operands[2]);
+  rtx value = operands[3];
+
+  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;
+
+  if (CONST_INT_P (value))
+    {
+      unsigned HOST_WIDE_INT mask = ((unsigned HOST_WIDE_INT)1 << width) - 1;
+
+      /* Prefer AND/OR for inserting all zeros or all ones.  */
+      if ((UINTVAL (value) & mask) == 0
+	  || (UINTVAL (value) & mask) == mask)
+	FAIL;
+
+      /* 16-bit aligned 16-bit wide insert is handled by insv_imm.  */
+      if (width == 16 && (pos % 16) == 0)
+	DONE;
+    }
+
+  operands[3] = force_reg (<MODE>mode, value);
+})
+
+(define_insn "*insv_reg<mode>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
+			  (match_operand 1 "const_int_operand" "n")
+			  (match_operand 2 "const_int_operand" "n"))
+	(match_operand:GPI 3 "register_operand" "r"))]
+  "!(UINTVAL (operands[1]) == 0
+     || (UINTVAL (operands[2]) + UINTVAL (operands[1])
+	 > GET_MODE_BITSIZE (<MODE>mode)))"
+  "bfi\\t%<w>0, %<w>3, %2, %1"
+  [(set_attr "v8type" "bfm")
+   (set_attr "mode" "<MODE>")]
+)
+
 (define_insn "*<optab><ALLX:mode>_shft_<GPI:mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
	(ashift:GPI (ANY_EXTEND:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/insv_1.c b/gcc/testsuite/gcc.target/aarch64/insv_1.c
new file mode 100644
index 000..bc8928d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/insv_1.c
@@ -0,0 +1,84 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -fno-inline" } */
+
+extern void abort (void);
+
+typedef struct bitfield
+{
+  unsigned short eight: 8;
+  unsigned short four: 4;
+  unsigned short five: 5;
+  unsigned short seven: 7;
+  unsigned int sixteen: 16;
+} bitfield;
+
+bitfield
+bfi1 (bitfield a)
+{
+  /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 0, 8" } } */
+  a.eight = 3;
+  return a;
+}
+
+bitfield
+bfi2 (bitfield a)
+{
+  /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 16, 5" } } */
+  a.five = 7;
+  return a;
+}
+
+bitfield
+movk (bitfield a)
+{
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x1d6b, lsl 32" } } */
+  a.sixteen = 7531;
+  return a;
+}
+
+bitfield
+set1 (bitfield a)
+{
+  /* { dg-final { scan-assembler "orr\tx\[0-9\]+, x\[0-9\]+, 2031616" } } */
+  a.five = 0x1f;
+  return a;
+}
+
+bitfield
+set0 (bitfield a)
+{
+  /* { dg-final { scan-assembler "and\tx\[0-9\]+, x\[0-9\]+, -2031617" } } */
+  a.five = 0;
+  return a;
+}
+
+
+int
+main (int argc, char** argv)
+{
+  static bitfield a;
+  bitfield b = bfi1 (a);
+  bitfield c = bfi2 (b);
+  bitfield d = movk (c);
+
+  if (d.eight != 3)
+    abort ();
+
if (d.five != 7) +abort (); + + if (d.sixteen != 7531) +abort (); + + d = set1 (d); + if (d.five != 0x1f) +abort (); + + d = set0 (d); + if (d.five != 0) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */
[PATCH, AArch64] Fix invalid assembler in scalar_intrinsics.c test
The test file scalar_intrinsics.c (in gcc.target/aarch64) is currently compile-only. If you attempt to make it run, rather than just generate assembler, it fails because the output won't assemble. There are two issues causing trouble here: 1) Use of the invalid instruction "mov d0, d1". It should be "mov d0, v1.d[0]". 2) The vdupd_lane_s64 and vdupd_lane_u64 calls are given a lane that is out of range, which causes invalid assembler output. This patch fixes both, so that we can build on this to make executable test cases for scalar intrinsics. OK for trunk? Cheers, Ian

2013-05-22  Ian Bolton  ian.bol...@arm.com

	testsuite/
	* gcc.target/aarch64/scalar_intrinsics.c (force_simd): Use a valid
	instruction.
	(test_vdupd_lane_s64): Pass a valid lane argument.
	(test_vdupd_lane_u64): Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
index 7427c62..16537ce 100644
--- a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
+++ b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
@@ -4,7 +4,7 @@
 #include <arm_neon.h>
 
 /* Used to force a variable to a SIMD register.  */
-#define force_simd(V1) asm volatile ("mov %d0, %d1"		\
+#define force_simd(V1) asm volatile ("mov %d0, %1.d[0]"	\
	   : "=w"(V1)						\
	   : "w"(V1)						\
	   : /* No clobbers */);
@@ -228,13 +228,13 @@ test_vdups_lane_u32 (uint32x4_t a)
 int64x1_t
 test_vdupd_lane_s64 (int64x2_t a)
 {
-  return vdupd_lane_s64 (a, 2);
+  return vdupd_lane_s64 (a, 1);
 }
 
 uint64x1_t
 test_vdupd_lane_u64 (uint64x2_t a)
 {
-  return vdupd_lane_u64 (a, 2);
+  return vdupd_lane_u64 (a, 1);
 }
 
 /* { dg-final { scan-assembler-times "\\tcmtst\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 2 } } */
RE: [PATCH, AArch64] Support BFI instruction and insv standard pattern
Hi,

This patch implements the BFI variant of BFM. In doing so, it also implements the insv standard pattern. I've regression tested on bare-metal and linux. It comes complete with its own compilation and execution testcase. OK for trunk? Cheers, Ian

2013-05-08  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.md (insv<mode>): New define_expand.
	(*insv_reg<mode>): New define_insn.
testsuite/
	* gcc.target/aarch64/bfm_1.c: New test.

(This patch did not yet get commit approval.)  I improved this patch during the work I did on the recent insv_imm patch (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01007.html). I also renamed the testcase. Regression testing completed successfully. OK for trunk? Cheers, Ian

2013-05-20  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.md (insv<mode>): New define_expand.
	(*insv_reg<mode>): New define_insn.
testsuite/
	* gcc.target/aarch64/insv_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b27bcda..e5d6950 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3164,6 +3164,52 @@
    (set_attr "mode" "<MODE>")]
 )
 
+;; Bitfield Insert (insv)
+(define_expand "insv<mode>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand")
+			  (match_operand 1 "const_int_operand")
+			  (match_operand 2 "const_int_operand"))
+	(match_operand:GPI 3 "general_operand"))]
+  ""
+{
+  unsigned HOST_WIDE_INT width = UINTVAL (operands[1]);
+  unsigned HOST_WIDE_INT pos = UINTVAL (operands[2]);
+  rtx value = operands[3];
+
+  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;
+
+  if (CONST_INT_P (value))
+    {
+      unsigned HOST_WIDE_INT mask = ((unsigned HOST_WIDE_INT)1 << width) - 1;
+
+      /* Prefer AND/OR for inserting all zeros or all ones.  */
+      if ((UINTVAL (value) & mask) == 0
+	  || (UINTVAL (value) & mask) == mask)
+	FAIL;
+
+      /* Force the constant into a register, unless this is a 16-bit aligned
+	 16-bit wide insert, which is handled by insv_imm.  */
+      if (width != 16 || (pos % 16) != 0)
+	operands[3] = force_reg (<MODE>mode, value);
+    }
+  else if (!register_operand (value, <MODE>mode))
+    operands[3] = force_reg (<MODE>mode, value);
+})
+
+(define_insn "*insv_reg<mode>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
+			  (match_operand 1 "const_int_operand" "n")
+			  (match_operand 2 "const_int_operand" "n"))
+	(match_operand:GPI 3 "register_operand" "r"))]
+  "!(UINTVAL (operands[1]) == 0
+     || (UINTVAL (operands[2]) + UINTVAL (operands[1])
+	 > GET_MODE_BITSIZE (<MODE>mode)))"
+  "bfi\\t%<w>0, %<w>3, %2, %1"
+  [(set_attr "v8type" "bfm")
+   (set_attr "mode" "<MODE>")]
+)
+
 (define_insn "*<optab><ALLX:mode>_shft_<GPI:mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
	(ashift:GPI (ANY_EXTEND:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/insv_1.c b/gcc/testsuite/gcc.target/aarch64/insv_1.c
new file mode 100644
index 000..0977e15
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/insv_1.c
@@ -0,0 +1,84 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -fno-inline" } */
+
+extern void abort (void);
+
+typedef struct bitfield
+{
+  unsigned short eight: 8;
+  unsigned short four: 4;
+  unsigned short five: 5;
+  unsigned short seven: 7;
+  unsigned int sixteen: 16;
+} bitfield;
+
+bitfield
+bfi1 (bitfield a)
+{
+  /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 0, 8" } } */
+  a.eight = 3;
+  return a;
+}
+
+bitfield
+bfi2 (bitfield a)
+{
+  /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 16, 5" } } */
+  a.five = 7;
+  return a;
+}
+
+bitfield
+movk (bitfield a)
+{
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x1d6b, lsl 32" } } */
+  a.sixteen = 7531;
+  return a;
+}
+
+bitfield
+set1 (bitfield a)
+{
+  /* { dg-final { scan-assembler "orr\tx\[0-9\]+, x\[0-9\]+, 2031616" } } */
+  a.five = 0x1f;
+  return a;
+}
+
+bitfield
+set0 (bitfield a)
+{
+  /* { dg-final { scan-assembler "and\tx\[0-9\]+, x\[0-9\]+, -2031617" } } */
+  a.five = 0;
+  return a;
+}
+
+
+int
+main (int argc, char** argv)
+{
+  static bitfield a;
+  bitfield b = bfi1 (a);
+  bitfield c = bfi2 (b);
+  bitfield d = movk (c);
+
+  if (d.eight != 3)
+    abort ();
+
+  if (d.five != 7)
+    abort ();
+
+  if (d.sixteen != 7531)
+    abort ();
+
+  d = set1 (d);
+  if (d.five != 0x1f)
+    abort ();
+
+  d = set0 (d);
+  if (d.five != 0)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
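For readers unfamiliar with BFM, the following is a minimal C model of the bit-field insert that the pattern above emits. The function name and types are illustrative, not part of the patch: BFI Xd, Xn, #pos, #width replaces bits [pos, pos+width) of the destination with the low `width` bits of the source.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of BFI Xd, Xn, #pos, #width: insert the low
   'width' bits of SRC into DST at bit POS, leaving all other bits of
   DST unchanged.  Assumes 0 < width and pos + width <= 64, mirroring
   the expander's FAIL checks.  */
static uint64_t
bfi_model (uint64_t dst, uint64_t src, unsigned pos, unsigned width)
{
  uint64_t field = (width < 64 ? ((uint64_t) 1 << width) - 1
                               : ~(uint64_t) 0);
  uint64_t mask = field << pos;               /* bits being replaced */
  return (dst & ~mask) | ((src << pos) & mask);
}
```

This matches the semantics the insv_1.c testcase exercises: for example, `a.eight = 3` on an all-ones struct only touches bits 0..7.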
[PATCH, AArch64] Allow insv_imm to handle bigger immediates via masking to 16-bits
The MOVK instruction is currently not used when operand 2 is more than 16 bits, which leads to sub-optimal code. This patch improves those situations by removing the check and instead masking down to 16 bits within the new X format specifier I added recently. OK for trunk? Cheers, Ian

2013-05-17  Ian Bolton  ian.bol...@arm.com

	* config/aarch64/aarch64.c (aarch64_print_operand): Change the X
	format specifier to only display bottom 16 bits.
	* config/aarch64/aarch64.md (insv_imm<mode>): Allow any-sized
	immediate to match for operand 2, since it will be masked.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b57416c..1bdfd85 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3424,13 +3424,13 @@ aarch64_print_operand (FILE *f, rtx x, char code)
       break;
 
     case 'X':
-      /* Print integer constant in hex.  */
+      /* Print bottom 16 bits of integer constant in hex.  */
       if (GET_CODE (x) != CONST_INT)
	{
	  output_operand_lossage ("invalid operand for '%%%c'", code);
	  return;
	}
-      asm_fprintf (f, "0x%wx", UINTVAL (x));
+      asm_fprintf (f, "0x%wx", UINTVAL (x) & 0xffff);
       break;
 
     case 'w':
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b27bcda..403d717 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -858,9 +858,8 @@
			  (const_int 16)
			  (match_operand:GPI 1 "const_int_operand" "n"))
	(match_operand:GPI 2 "const_int_operand" "n"))]
-  "INTVAL (operands[1]) < GET_MODE_BITSIZE (<MODE>mode)
-   && INTVAL (operands[1]) % 16 == 0
-   && UINTVAL (operands[2]) <= 0xffff"
+  "UINTVAL (operands[1]) < GET_MODE_BITSIZE (<MODE>mode)
+   && UINTVAL (operands[1]) % 16 == 0"
   "movk\\t%w0, %X2, lsl %1"
   [(set_attr "v8type" "movk")
    (set_attr "mode" "<MODE>")]
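The effect of the masking can be sketched in C. This is an illustrative model, not compiler code: the `%X` specifier now prints only the bottom 16 bits of the constant, and MOVK itself only ever replaces one 16-bit chunk of the register.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of MOVK Xd, #imm16, LSL #lsl after the patch:
   however wide the original constant is, only its low 16 bits reach
   the instruction, which then overwrites one 16-bit chunk of REG.
   'lsl' is 0, 16, 32 or 48 on real hardware.  */
static uint64_t
movk_model (uint64_t reg, uint64_t imm, unsigned lsl)
{
  uint64_t imm16 = imm & 0xffff;             /* the new masking in %X */
  uint64_t mask = (uint64_t) 0xffff << lsl;
  return (reg & ~mask) | (imm16 << lsl);
}
```

Under this model, accepting an over-wide operand 2 is safe: the extra high bits simply never make it into the emitted immediate.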
[PATCH, AArch64] Support BFI instruction and insv standard pattern
Hi,

This patch implements the BFI variant of BFM. In doing so, it also implements the insv standard pattern. I've regression tested on bare-metal and linux. It comes complete with its own compilation and execution testcase. OK for trunk? Cheers, Ian

2013-05-08  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.md (insv<mode>): New define_expand.
	(*insv_reg<mode>): New define_insn.
testsuite/
	* gcc.target/aarch64/bfm_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 330f78c..b730ed0 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3118,6 +3118,53 @@
    (set_attr "mode" "<MODE>")]
 )
 
+;; Bitfield Insert (insv)
+(define_expand "insv<mode>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand")
+			  (match_operand 1 "const_int_operand")
+			  (match_operand 2 "const_int_operand"))
+	(match_operand:GPI 3 "general_operand"))]
+  ""
+{
+  HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << INTVAL (operands[1])) - 1;
+
+  if (GET_MODE_BITSIZE (<MODE>mode) > BITS_PER_WORD
+      || INTVAL (operands[1]) < 1
+      || INTVAL (operands[1]) >= GET_MODE_BITSIZE (<MODE>mode)
+      || INTVAL (operands[2]) < 0
+      || (INTVAL (operands[2]) + INTVAL (operands[1]))
+	 > GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;
+
+  /* Prefer AND/OR for inserting all zeros or all ones.  */
+  if (CONST_INT_P (operands[3])
+      && ((INTVAL (operands[3]) & mask) == 0
+	  || (INTVAL (operands[3]) & mask) == mask))
+    FAIL;
+
+  if (!register_operand (operands[3], <MODE>mode))
+    operands[3] = force_reg (<MODE>mode, operands[3]);
+
+  /* Intentional fall-through, which will lead to below pattern
+     being matched.  */
+})
+
+(define_insn "*insv_reg<mode>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
+			  (match_operand 1 "const_int_operand" "n")
+			  (match_operand 2 "const_int_operand" "n"))
+	(match_operand:GPI 3 "register_operand" "r"))]
+  "!(GET_MODE_BITSIZE (<MODE>mode) > BITS_PER_WORD
+     || INTVAL (operands[1]) < 1
+     || INTVAL (operands[1]) >= GET_MODE_BITSIZE (<MODE>mode)
+     || INTVAL (operands[2]) < 0
+     || (INTVAL (operands[2]) + INTVAL (operands[1]))
+	> GET_MODE_BITSIZE (<MODE>mode))"
+  "bfi\\t%<w>0, %<w>3, %2, %1"
+  [(set_attr "v8type" "bfm")
+   (set_attr "mode" "<MODE>")]
+)
+
 (define_insn "*<optab><ALLX:mode>_shft_<GPI:mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
	(ashift:GPI (ANY_EXTEND:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/bfm_1.c b/gcc/testsuite/gcc.target/aarch64/bfm_1.c
new file mode 100644
index 000..d9a73a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/bfm_1.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -fno-inline" } */
+
+extern void abort (void);
+
+typedef struct bitfield
+{
+  unsigned short eight: 8;
+  unsigned short four: 4;
+  unsigned short five: 5;
+  unsigned short seven: 7;
+} bitfield;
+
+bitfield
+bfi1 (bitfield a)
+{
+  /* { dg-final { scan-assembler "bfi\tw\[0-9\]+, w\[0-9\]+, 0, 8" } } */
+  a.eight = 3;
+  return a;
+}
+
+bitfield
+bfi2 (bitfield a)
+{
+  /* { dg-final { scan-assembler "bfi\tw\[0-9\]+, w\[0-9\]+, 16, 5" } } */
+  a.five = 7;
+  return a;
+}
+
+int
+main (int argc, char** argv)
+{
+  bitfield a;
+  bitfield b = bfi1 (a);
+  bitfield c = bfi2 (b);
+
+  if (c.eight != 3)
+    abort ();
+
+  if (c.five != 7)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
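The "prefer AND/OR" FAIL in the expander can be pictured with a small predicate. This is a sketch of the screening policy, not compiler code, and it assumes a field narrower than 64 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the expander's screening of constant inserts: a field of
   all zeros is better done with AND (clearing bits) and a field of
   all ones with ORR (setting bits), so the expander FAILs for those
   and keeps BFI for everything else.  Assumes width < 64.  */
static int
bfi_preferred (uint64_t value, unsigned width)
{
  uint64_t mask = ((uint64_t) 1 << width) - 1;
  uint64_t field = value & mask;
  return field != 0 && field != mask;
}
```

This is exactly the split the later insv_1.c testcase checks: `a.five = 0` must become AND, `a.five = 0x1f` must become ORR, and only mixed values like 7 go through BFI.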
[PATCH, AArch64] Testcases for TST instruction
I previously fixed a bug with the patterns that generate TST. I added these testcases to make our regression testing more solid. They've been running on our internal branch for about a month. OK to commit to trunk? Cheers, Ian 2013-05-02 Ian Bolton ian.bol...@arm.com * gcc.target/aarch64/tst_1.c: New test. * gcc.target/aarch64/tst_2.c: LikewiseIndex: gcc/testsuite/gcc.target/aarch64/tst_1.c === --- gcc/testsuite/gcc.target/aarch64/tst_1.c(revision 0) +++ gcc/testsuite/gcc.target/aarch64/tst_1.c(revision 0) @@ -0,0 +1,150 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +tst_si_test1 (int a, int b, int c) +{ + int d = a b; + + /* { dg-final { scan-assembler-times tst\tw\[0-9\]+, w\[0-9\]+ 2 } } */ + if (d == 0) +return 12; + else +return 18; +} + +int +tst_si_test2 (int a, int b, int c) +{ + int d = a 0x; + + /* { dg-final { scan-assembler tst\tw\[0-9\]+, -1717986919 } } */ + if (d == 0) +return 12; + else +return 18; +} + +int +tst_si_test3 (int a, int b, int c) +{ + int d = a (b 3); + + /* { dg-final { scan-assembler tst\tw\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d == 0) +return 12; + else +return 18; +} + +typedef long long s64; + +s64 +tst_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a b; + + /* { dg-final { scan-assembler-times tst\tx\[0-9\]+, x\[0-9\]+ 2 } } */ + if (d == 0) +return 12; + else +return 18; +} + +s64 +tst_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a 0xll; + + /* { dg-final { scan-assembler tst\tx\[0-9\]+, -6148914691236517206 } } */ + if (d == 0) +return 12; + else +return 18; +} + +s64 +tst_di_test3 (s64 a, s64 b, s64 c) +{ + s64 d = a (b 3); + + /* { dg-final { scan-assembler tst\tx\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + if (d == 0) +return 12; + else +return 18; +} + +int +main () +{ + int x; + s64 y; + + x = tst_si_test1 (29, 4, 5); + if (x != 18) +abort (); + + x = tst_si_test1 (5, 2, 20); + if (x != 12) +abort (); + + x = tst_si_test2 (29, 4, 5); + if (x != 18) +abort (); + + x = 
tst_si_test2 (1024, 2, 20); + if (x != 12) +abort (); + + x = tst_si_test3 (35, 4, 5); + if (x != 18) +abort (); + + x = tst_si_test3 (5, 2, 20); + if (x != 12) +abort (); + + y = tst_di_test1 (0x13029ll, + 0x32004ll, + 0x505050505ll); + + if (y != 18) +abort (); + + y = tst_di_test1 (0x5000500050005ll, + 0x2111211121112ll, + 0x02020ll); + if (y != 12) +abort (); + + y = tst_di_test2 (0x13029ll, + 0x32004ll, + 0x505050505ll); + if (y != 18) +abort (); + + y = tst_di_test2 (0x540004100ll, + 0x32004ll, + 0x805050205ll); + if (y != 12) +abort (); + + y = tst_di_test3 (0x13029ll, + 0x06408ll, + 0x505050505ll); + if (y != 18) +abort (); + + y = tst_di_test3 (0x130002900ll, + 0x08808ll, + 0x505050505ll); + if (y != 12) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/tst_2.c === --- gcc/testsuite/gcc.target/aarch64/tst_2.c(revision 0) +++ gcc/testsuite/gcc.target/aarch64/tst_2.c(revision 0) @@ -0,0 +1,156 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +tst_si_test1 (int a, int b, int c) +{ + int d = a b; + + /* { dg-final { scan-assembler-not tst\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + /* { dg-final { scan-assembler-times and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ 2 } } */ + if (d = 0) +return 12; + else +return 18; +} + +int +tst_si_test2 (int a, int b, int c) +{ + int d = a 0x; + + /* { dg-final { scan-assembler-not tst\tw\[0-9\]+, w\[0-9\]+, -1717986919 } } */ + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, -1717986919 } } */ + if (d = 0) +return 12; + else +return 18; +} + +int +tst_si_test3 (int a, int b, int c) +{ + int d = a (b 3); + + /* { dg-final { scan-assembler-not tst\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d = 0) +return 12; + else +return 18; +} + +typedef long long s64; + +s64 +tst_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = 
a b; + + /* { dg-final { scan-assembler-not tst\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + /* { dg-final { scan-assembler-times and\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ 2 } } */ + if (d = 0) +return 12; + else +return 18; +} + +s64 +tst_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a 0xll
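As background for these tests: TST is an alias of ANDS with the zero register as destination, so the AND result is discarded and only the condition flags survive. A minimal C model of the flag the tests branch on (illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of TST Xn, Xm (alias of ANDS XZR, Xn, Xm): the
   AND result is thrown away and only the flags remain.  The testcases
   above branch on d == 0, i.e. on the Z flag this computes.  */
static int
tst_sets_z (uint64_t a, uint64_t b)
{
  return (a & b) == 0;
}
```

The `d <= 0` variants in tst_2.c are the cases where the Z flag alone is not enough, so a plain AND plus a separate compare must be kept instead.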
RE: [PATCH, AArch64] Testcases for ANDS instruction
From: Richard Earnshaw This rule + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, w\[0- 9\]+ } } */ Will match anything that this rule + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, w\[0- 9\]+, lsl 3 } } */ matches (though not vice versa). Similarly for the x register variants. Thanks for the review. I've fixed this up in the attached patch, by counting the number of matches for the first rule and expecting it to match additional times to cover the overlap with the lsl based rule. I've also renamed the testcases in line with the suggested GCC testcase naming convention. OK for commit? Cheers, Ian 2013-05-01 Ian Bolton ian.bol...@arm.com * gcc.target/aarch64/ands_1.c: New test. * gcc.target/aarch64/ands_2.c: LikewiseIndex: gcc/testsuite/gcc.target/aarch64/ands_1.c === --- gcc/testsuite/gcc.target/aarch64/ands_1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/ands_1.c (revision 0) @@ -0,0 +1,151 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a b; + + /* { dg-final { scan-assembler-times ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test2 (int a, int b, int c) +{ + int d = a 0xff; + + /* { dg-final { scan-assembler ands\tw\[0-9\]+, w\[0-9\]+, 255 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test3 (int a, int b, int c) +{ + int d = a (b 3); + + /* { dg-final { scan-assembler ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +ands_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a b; + + /* { dg-final { scan-assembler-times ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a 0xff; + + /* { dg-final { scan-assembler 
ands\tx\[0-9\]+, x\[0-9\]+, 255 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test3 (s64 a, s64 b, s64 c) +{ + s64 d = a (b 3); + + /* { dg-final { scan-assembler ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = ands_si_test1 (29, 4, 5); + if (x != 13) +abort (); + + x = ands_si_test1 (5, 2, 20); + if (x != 25) +abort (); + + x = ands_si_test2 (29, 4, 5); + if (x != 38) +abort (); + + x = ands_si_test2 (1024, 2, 20); + if (x != 1044) +abort (); + + x = ands_si_test3 (35, 4, 5); + if (x != 41) +abort (); + + x = ands_si_test3 (5, 2, 20); + if (x != 25) +abort (); + + y = ands_di_test1 (0x13029ll, + 0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll 0x32004ll) + 0x32004ll + 0x505050505ll)) +abort (); + + y = ands_di_test1 (0x5000500050005ll, + 0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = ands_di_test2 (0x13029ll, + 0x32004ll, + 0x505050505ll); + if (y != ((0x13029ll 0xff) + 0x32004ll + 0x505050505ll)) +abort (); + + y = ands_di_test2 (0x130002900ll, + 0x32004ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + y = ands_di_test3 (0x13029ll, + 0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll (0x06408ll 3)) + + 0x06408ll + 0x505050505ll)) +abort (); + + y = ands_di_test3 (0x130002900ll, + 0x08808ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/ands_2.c === --- gcc/testsuite/gcc.target/aarch64/ands_2.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/ands_2.c (revision 0) @@ -0,0 +1,157 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a b; + + /* { dg-final { scan-assembler-not ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + 
/* { dg-final { scan-assembler-times and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ 2 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test2 (int a, int b, int c) +{ + int d = a 0x; + + /* { dg-final { scan-assembler-not ands\tw\[0-9\]+, w\[0-9\]+, -1717986919 } } */ + /* { dg-final { scan
RE: [PATCH, AArch64] Support BICS instruction in the backend
From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com]

+ /* { dg-final { scan-assembler "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" } } */
+ /* { dg-final { scan-assembler "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */

Ian, These two patterns have the same issue Richard just highlighted on your other patch, ie the first pattern will also match anything matched by the second pattern. /Marcus

I've fixed the rules in the testcases and renamed the files to match naming conventions in the latest patch (attached). OK to commit? Cheers, Ian

2013-05-01  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.md (*and_one_cmpl<mode>3_compare0): New pattern.
	(*and_one_cmplsi3_compare0_uxtw): Likewise.
	(*and_one_cmpl_<SHIFT:optab><mode>3_compare0): Likewise.
	(*and_one_cmpl_<SHIFT:optab>si3_compare0_uxtw): Likewise.
testsuite/
	* gcc.target/aarch64/bics_1.c: New test.
	* gcc.target/aarch64/bics_2.c: Likewise.
[PATCH, AArch64] Fix for LDR/STR to/from S and D registers
This is a fix for this patch: http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01621.html

If someone compiles with -mgeneral-regs-only then those instructions shouldn't be used. We can enforce that by adding the fp attribute to the relevant alternatives in the patterns. Regression tests all good. OK for trunk? Cheers, Ian

2013-05-01  Ian Bolton  ian.bol...@arm.com

	* config/aarch64/aarch64.md (movsi_aarch64): Only allow to/from
	S reg when fp attribute set.
	(movdi_aarch64): Only allow to/from D reg when fp attribute set.

Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md	(revision 198456)
+++ gcc/config/aarch64/aarch64.md	(working copy)
@@ -825,7 +825,7 @@ (define_insn "*movsi_aarch64"
    fmov\\t%s0, %s1"
   [(set_attr "v8type" "move,alu,load1,load1,store1,store1,fmov,fmov,fmov")
    (set_attr "mode" "SI")
-   (set_attr "fp" "*,*,*,*,*,*,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,yes,*,yes,yes,yes,yes")]
 )
 
 (define_insn "*movdi_aarch64"
@@ -850,7 +850,7 @@ (define_insn "*movdi_aarch64"
    movi\\t%d0, %1"
   [(set_attr "v8type" "move,move,move,alu,load1,load1,store1,store1,adr,adr,fmov,fmov,fmov,fmov")
    (set_attr "mode" "DI")
-   (set_attr "fp" "*,*,*,*,*,*,*,*,*,*,yes,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,yes")]
 )
 
 (define_insn "insv_imm<mode>"
RE: [PATCH, AArch64] Support BICS instruction in the backend
Can we have the patch attached ? OK Index: gcc/testsuite/gcc.target/aarch64/bics_1.c === --- gcc/testsuite/gcc.target/aarch64/bics_1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/bics_1.c (revision 0) @@ -0,0 +1,107 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a ~b; + + /* { dg-final { scan-assembler-times bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a ~(b 3); + + /* { dg-final { scan-assembler bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a ~b; + + /* { dg-final { scan-assembler-times bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a ~(b 3); + + /* { dg-final { scan-assembler bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + + x = bics_si_test2 (35, ~4, 5); + if (x != ((35 ~(~4 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20); + if (x != 116) +abort (); + + y = bics_di_test1 (0x13029ll, + ~0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll 0x32004ll) + ~0x32004ll + 0x505050505ll)) +abort (); + + y = bics_di_test1 (0x5000500050005ll, + ~0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = bics_di_test2 (0x13029ll, + ~0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll ~(~0x06408ll 3)) + + ~0x06408ll + 0x505050505ll)) +abort (); + + y = bics_di_test2 (0x130002900ll, + ~0x08808ll, + 0x505050505ll); + 
if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/bics_2.c === --- gcc/testsuite/gcc.target/aarch64/bics_2.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/bics_2.c (revision 0) @@ -0,0 +1,111 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a ~b; + + /* { dg-final { scan-assembler-not bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + /* { dg-final { scan-assembler-times bic\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ 2 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a ~(b 3); + + /* { dg-final { scan-assembler-not bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler bic\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a ~b; + + /* { dg-final { scan-assembler-not bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + /* { dg-final { scan-assembler-times bic\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ 2 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a ~(b 3); + + /* { dg-final { scan-assembler-not bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler bic\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + + x = bics_si_test2 (35, ~4, 5); + if (x != ((35 ~(~4 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20); + if (x != 116) +abort (); + + y = bics_di_test1 (0x13029ll, + ~0x32004ll, + 0x505050505ll); + + if 
(y != ((0x13029ll 0x32004ll) + ~0x32004ll + 0x505050505ll)) +abort (); + + y = bics_di_test1 (0x5000500050005ll, + ~0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = bics_di_test2 (0x13029ll, + ~0x06408ll, + 0x505050505ll); +
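The operation these testcases exercise can be modelled in a few lines of C. This is an illustrative sketch of the instruction's data path, not part of the patch:

```c
#include <assert.h>

/* Illustrative model of BICS Xd, Xn, Xm: AND the first source with
   the complement of the second, setting the condition flags from the
   result.  This is the 'd = a & ~b' shape the new patterns match, so
   the following compare against zero can be folded into the BICS.  */
static long long
bics_model (long long a, long long b)
{
  return a & ~b;
}
```

For example, `bics_model (29, ~4LL)` is `29 & 4`, mirroring the `bics_si_test1 (29, ~4, 5)` call in the testcase's main.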
[PATCH, AArch64] Testcases for ANDS instruction
I made some testcases to go with my implementation of ANDS in the backend, but Naveen Hurugalawadi got the ANDS patterns in before me! I'm now just left with the testcases, but they are still worth adding, so here they are. Tests are working correctly as of current trunk. OK to commit? Cheers, Ian 2013-04-26 Ian Bolton ian.bol...@arm.com * gcc.target/aarch64/ands.c: New test. * gcc.target/aarch64/ands2.c: LikewiseIndex: gcc/testsuite/gcc.target/aarch64/ands2.c === --- gcc/testsuite/gcc.target/aarch64/ands2.c(revision 0) +++ gcc/testsuite/gcc.target/aarch64/ands2.c(revision 0) @@ -0,0 +1,157 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a b; + + /* { dg-final { scan-assembler-not ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test2 (int a, int b, int c) +{ + int d = a 0x; + + /* { dg-final { scan-assembler-not ands\tw\[0-9\]+, w\[0-9\]+, -1717986919 } } */ + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, -1717986919 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test3 (int a, int b, int c) +{ + int d = a (b 3); + + /* { dg-final { scan-assembler-not ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +ands_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a b; + + /* { dg-final { scan-assembler-not ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + /* { dg-final { scan-assembler and\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a 0xll; + + /* { dg-final { scan-assembler-not ands\tx\[0-9\]+, 
x\[0-9\]+, -6148914691236517206 } } */ + /* { dg-final { scan-assembler and\tx\[0-9\]+, x\[0-9\]+, -6148914691236517206 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test3 (s64 a, s64 b, s64 c) +{ + s64 d = a (b 3); + + /* { dg-final { scan-assembler-not ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler and\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = ands_si_test1 (29, 4, 5); + if (x != 13) +abort (); + + x = ands_si_test1 (5, 2, 20); + if (x != 25) +abort (); + + x = ands_si_test2 (29, 4, 5); + if (x != 34) +abort (); + + x = ands_si_test2 (1024, 2, 20); + if (x != 1044) +abort (); + + x = ands_si_test3 (35, 4, 5); + if (x != 41) +abort (); + + x = ands_si_test3 (5, 2, 20); + if (x != 25) +abort (); + + y = ands_di_test1 (0x13029ll, + 0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll 0x32004ll) + 0x32004ll + 0x505050505ll)) +abort (); + + y = ands_di_test1 (0x5000500050005ll, + 0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = ands_di_test2 (0x13029ll, + 0x32004ll, + 0x505050505ll); + if (y != ((0x13029ll 0xll) + 0x32004ll + 0x505050505ll)) +abort (); + + y = ands_di_test2 (0x540004100ll, + 0x32004ll, + 0x805050205ll); + if (y != (0x540004100ll + 0x805050205ll)) +abort (); + + y = ands_di_test3 (0x13029ll, + 0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll (0x06408ll 3)) + + 0x06408ll + 0x505050505ll)) +abort (); + + y = ands_di_test3 (0x130002900ll, + 0x08808ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/ands.c === --- gcc/testsuite/gcc.target/aarch64/ands.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/ands.c (revision 0) @@ -0,0 +1,151 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + 
+extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a b; + + /* { dg-final { scan-assembler ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test2 (int a, int b, int c) +{ + int d = a 0xff
[PATCH, AArch64] Support BICS instruction in the backend
With these patterns, we can now generate BICS in the appropriate places. I've included test cases. This has been run on linux and bare-metal regression tests. OK to commit? Cheers, Ian 2013-04-26 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.md (*and_one_cmplmode3_compare0): New pattern. (*and_one_cmplsi3_compare0_uxtw): Likewise. (*and_one_cmpl_SHIFT:optabmode3_compare0): Likewise. (*and_one_cmpl_SHIFT:optabsi3_compare0_uxtw): Likewise. testsuite/ * gcc.target/aarch64/bics.c: New test. * gcc.target/aarch64/bics2.c: Likewise.Index: gcc/testsuite/gcc.target/aarch64/bics.c === --- gcc/testsuite/gcc.target/aarch64/bics.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/bics.c (revision 0) @@ -0,0 +1,107 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a ~b; + + /* { dg-final { scan-assembler bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a ~(b 3); + + /* { dg-final { scan-assembler bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a ~b; + + /* { dg-final { scan-assembler bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a ~(b 3); + + /* { dg-final { scan-assembler bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + + x = bics_si_test2 (35, ~4, 5); + if (x != ((35 ~(~4 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20); + if (x != 116) +abort (); 
+ + y = bics_di_test1 (0x13029ll, + ~0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll 0x32004ll) + ~0x32004ll + 0x505050505ll)) +abort (); + + y = bics_di_test1 (0x5000500050005ll, + ~0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = bics_di_test2 (0x13029ll, + ~0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll ~(~0x06408ll 3)) + + ~0x06408ll + 0x505050505ll)) +abort (); + + y = bics_di_test2 (0x130002900ll, + ~0x08808ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/bics2.c === --- gcc/testsuite/gcc.target/aarch64/bics2.c(revision 0) +++ gcc/testsuite/gcc.target/aarch64/bics2.c(revision 0) @@ -0,0 +1,111 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a ~b; + + /* { dg-final { scan-assembler-not bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + /* { dg-final { scan-assembler bic\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a ~(b 3); + + /* { dg-final { scan-assembler-not bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler bic\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a ~b; + + /* { dg-final { scan-assembler-not bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + /* { dg-final { scan-assembler bic\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a ~(b 3); + + /* { dg-final { scan-assembler-not bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler bic\tx\[0-9\]+, x\[0-9\]+, 
x\[0-9\]+, lsl 3 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + + x = bics_si_test2 (35, ~4, 5); + if (x != ((35 ~(~4 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20
[PATCH, AArch64] Support LDR/STR to/from S and D registers
This patch allows us to load to and store from the S and D registers, which helps with doing scalar operations in those registers. This has been regression tested on bare-metal and linux. OK for trunk? Cheers, Ian

2013-04-26 Ian Bolton ian.bol...@arm.com

* config/aarch64/aarch64.md (movsi_aarch64): Support LDR/STR from/to S register. (movdi_aarch64): Support LDR/STR from/to D register.

Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md (revision 198231)
+++ gcc/config/aarch64/aarch64.md (working copy)
@@ -808,26 +808,28 @@ (define_expand "mov<mode>"
 )
 
 (define_insn "*movsi_aarch64"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,m, *w, r,*w")
-	(match_operand:SI 1 "aarch64_mov_operand"  " r,M,m,rZ,rZ,*w,*w"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,*w,m, m,*w, r,*w")
+	(match_operand:SI 1 "aarch64_mov_operand"  " r,M,m, m,rZ,*w,rZ,*w,*w"))]
   "(register_operand (operands[0], SImode)
     || aarch64_reg_or_zero (operands[1], SImode))"
   "@
    mov\\t%w0, %w1
    mov\\t%w0, %1
    ldr\\t%w0, %1
+   ldr\\t%s0, %1
    str\\t%w1, %0
+   str\\t%s1, %0
    fmov\\t%s0, %w1
    fmov\\t%w0, %s1
    fmov\\t%s0, %s1"
-  [(set_attr "v8type" "move,alu,load1,store1,fmov,fmov,fmov")
+  [(set_attr "v8type" "move,alu,load1,load1,store1,store1,fmov,fmov,fmov")
    (set_attr "mode" "SI")
-   (set_attr "fp" "*,*,*,*,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,*,yes,yes,yes")]
 )
 
 (define_insn "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,m, r, r, *w, r,*w,w")
-	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,m,rZ,Usa,Ush,rZ,*w,*w,Dd"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,*w,m, m,r, r, *w, r,*w,w")
+	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,m, m,rZ,*w,Usa,Ush,rZ,*w,*w,Dd"))]
   "(register_operand (operands[0], DImode)
     || aarch64_reg_or_zero (operands[1], DImode))"
   "@
@@ -836,16 +838,18 @@ (define_insn "*movdi_aarch64"
    mov\\t%x0, %1
    mov\\t%x0, %1
    ldr\\t%x0, %1
+   ldr\\t%d0, %1
    str\\t%x1, %0
+   str\\t%d1, %0
    adr\\t%x0, %a1
    adrp\\t%x0, %A1
    fmov\\t%d0, %x1
    fmov\\t%x0, %d1
    fmov\\t%d0, %d1
    movi\\t%d0, %1"
-  [(set_attr "v8type" "move,move,move,alu,load1,store1,adr,adr,fmov,fmov,fmov,fmov")
+  [(set_attr "v8type" "move,move,move,alu,load1,load1,store1,store1,adr,adr,fmov,fmov,fmov,fmov")
    (set_attr "mode" "DI")
-   (set_attr "fp" "*,*,*,*,*,*,*,*,yes,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,*,*,*,*,*,yes,yes,yes,yes")]
 )
 
 (define_insn "insv_imm<mode>"
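As a hedged illustration (mine, not part of the patch), this is the kind of scalar code the new alternatives help: a 32-bit bit pattern loaded straight into an S register avoids a separate general-to-FP `fmov` transfer. The function name and scenario are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch: reinterpret a stored 32-bit pattern as a float.
   With an "ldr s0, [x0]" alternative available, the load can target the
   FP register file directly instead of "ldr w1, [x0]" + "fmov s0, w1". */
float load_bits_as_float (const uint32_t *p)
{
  float f;
  memcpy (&f, p, sizeof f);   /* well-defined type pun in C */
  return f;
}
```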
RE: [PATCH, AArch64] Make MOVK output operand 2 in hex
Since this is a bug fix, I'll need to backport to 4.8. Is that OK? Cheers, Ian

OK /Marcus

On 20 March 2013 17:21, Ian Bolton ian.bol...@arm.com wrote: MOVK should not be generated with a negative immediate, which the assembler rightfully rejects. This patch makes MOVK output its 2nd operand in hex instead. Tested on bare-metal and linux. OK for trunk? Cheers, Ian

2013-03-20 Ian Bolton ian.bol...@arm.com

gcc/ * config/aarch64/aarch64.c (aarch64_print_operand): New format specifier for printing a constant in hex. * config/aarch64/aarch64.md (insv_imm<mode>): Use the X format specifier for printing second operand. testsuite/ * gcc.target/aarch64/movk.c: New test.
[PATCH, AArch64] Enable Redundant Extension Elimination by default at -O2 or higher
This patch enables the Redundant Extension Elimination (REE) pass for AArch64. Testing shows no regressions on linux and bare-metal. In terms of performance impact, it reduces code-size for some benchmarks and makes no difference on others. OK to commit to trunk? Cheers, Ian

2013-04-24 Ian Bolton ian.bol...@arm.com

* common/config/aarch64/aarch64-common.c: Enable REE pass at -O2 or higher by default.

Index: gcc/common/config/aarch64/aarch64-common.c
===
--- gcc/common/config/aarch64/aarch64-common.c (revision 198231)
+++ gcc/common/config/aarch64/aarch64-common.c (working copy)
@@ -44,6 +44,8 @@ static const struct default_options aarc
   {
     /* Enable section anchors by default at -O1 or higher.  */
     { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
+    /* Enable redundant extension instructions removal at -O2 and higher.  */
+    { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
     { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
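A sketch (mine, not from the patch) of the redundancy REE removes: on AArch64, every write to a W register zeroes bits 63:32 of the full X register, so an explicit widening after a 32-bit operation is often dead.

```c
#include <stdint.h>

/* The 32-bit add ("add w0, w0, w1") already zeroes the upper half of
   the X register, so the cast below should not cost a separate "uxtw"
   instruction once the redundant extension is eliminated. */
uint64_t widen_sum (uint32_t a, uint32_t b)
{
  uint32_t s = a + b;     /* wraps modulo 2^32 */
  return (uint64_t) s;    /* candidate extension for REE to delete */
}
```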
[PATCH AArch64] Make omit-frame-pointer work correctly
Currently, if you compile with -fomit-frame-pointer, the frame record and frame pointer are still maintained (i.e. there is no way to get the behaviour you are asking for!). This patch fixes that. It also makes sure that if you ask for no frame pointers in leaf functions then they are not generated there unless LR gets clobbered in the leaf for some reason. (I have testcases here to check for that.) OK to commit to trunk? Cheers, Ian 2013-03-28 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_can_eliminate): Only keep frame record when required. testsuite/ * gcc.target/aarch64/inc/asm-adder-clobber-lr.c: New test. * gcc.target/aarch64/inc/asm-adder-no-clobber-lr.c: Likewise. * gcc.target/aarch64/test-framepointer-1.c: Likewise. * gcc.target/aarch64/test-framepointer-2.c: Likewise. * gcc.target/aarch64/test-framepointer-3.c: Likewise. * gcc.target/aarch64/test-framepointer-4.c: Likewise. * gcc.target/aarch64/test-framepointer-5.c: Likewise. * gcc.target/aarch64/test-framepointer-6.c: Likewise. * gcc.target/aarch64/test-framepointer-7.c: Likewise. * gcc.target/aarch64/test-framepointer-8.c: Likewise.
Index: gcc/testsuite/gcc.target/aarch64/test-framepointer-2.c === --- gcc/testsuite/gcc.target/aarch64/test-framepointer-2.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/test-framepointer-2.c (revision 0) @@ -0,0 +1,15 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fomit-frame-pointer -mno-omit-leaf-frame-pointer -fno-inline --save-temps } */ + +#include asm-adder-no-clobber-lr.c + +/* omit-frame-pointer is TRUE. + omit-leaf-frame-pointer is false, but irrelevant due to omit-frame-pointer. + LR is not being clobbered in the leaf. + + Since we asked to have no frame pointers anywhere, we expect no frame + record in main or the leaf. */ + +/* { dg-final { scan-assembler-not stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]! 
} } */ + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/test-framepointer-6.c === --- gcc/testsuite/gcc.target/aarch64/test-framepointer-6.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/test-framepointer-6.c (revision 0) @@ -0,0 +1,15 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fomit-frame-pointer -mno-omit-leaf-frame-pointer -fno-inline --save-temps } */ + +#include asm-adder-clobber-lr.c + +/* omit-frame-pointer is TRUE. + omit-leaf-frame-pointer is false, but irrelevant due to omit-frame-pointer. + LR is being clobbered in the leaf. + + Since we asked to have no frame pointers anywhere, we expect no frame + record in main or the leaf. */ + +/* { dg-final { scan-assembler-not stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]! } } */ + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/test-framepointer-3.c === --- gcc/testsuite/gcc.target/aarch64/test-framepointer-3.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/test-framepointer-3.c (revision 0) @@ -0,0 +1,15 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fomit-frame-pointer -momit-leaf-frame-pointer -fno-inline --save-temps } */ + +#include asm-adder-no-clobber-lr.c + +/* omit-frame-pointer is TRUE. + omit-leaf-frame-pointer is true, but irrelevant due to omit-frame-pointer. + LR is not being clobbered in the leaf. + + Since we asked to have no frame pointers anywhere, we expect no frame + record in main or the leaf. */ + +/* { dg-final { scan-assembler-not stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]! 
} } */ + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/test-framepointer-7.c === --- gcc/testsuite/gcc.target/aarch64/test-framepointer-7.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/test-framepointer-7.c (revision 0) @@ -0,0 +1,15 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fomit-frame-pointer -momit-leaf-frame-pointer -fno-inline --save-temps } */ + +#include asm-adder-clobber-lr.c + +/* omit-frame-pointer is TRUE. + omit-leaf-frame-pointer is true, but irrelevant due to omit-frame-pointer. + LR is being clobbered in the leaf. + + Since we asked to have no frame pointers anywhere, we expect no frame + record in main or the leaf. */ + +/* { dg-final { scan-assembler-not stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]! } } */ + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/asm-adder-clobber-lr.c === --- gcc/testsuite/gcc.target/aarch64/asm-adder-clobber-lr.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/asm-adder-clobber-lr.c (revision 0) @@ -0,0 +1,24 @@ +extern void abort (void
[PATCH, AArch64] Make MOVK output operand 2 in hex
MOVK should not be generated with a negative immediate, which the assembler rightfully rejects. This patch makes MOVK output its 2nd operand in hex instead. Tested on bare-metal and linux. OK for trunk? Cheers, Ian

2013-03-20 Ian Bolton ian.bol...@arm.com

gcc/ * config/aarch64/aarch64.c (aarch64_print_operand): New format specifier for printing a constant in hex. * config/aarch64/aarch64.md (insv_imm<mode>): Use the X format specifier for printing second operand. testsuite/ * gcc.target/aarch64/movk.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1404a70..5e51630 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3365,6 +3365,16 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 		 REGNO (x) - V0_REGNUM + (code - 'S'));
       break;
 
+    case 'X':
+      /* Print integer constant in hex.  */
+      if (GET_CODE (x) != CONST_INT)
+	{
+	  output_operand_lossage ("invalid operand for '%%%c'", code);
+	  return;
+	}
+      asm_fprintf (f, "0x%x", UINTVAL (x));
+      break;
+
     case 'w':
     case 'x':
       /* Print a general register name or the zero register (32-bit or
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 40e66db..9c89413 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -850,8 +850,8 @@
 	  (match_operand:GPI 2 "const_int_operand" "n"))]
   "INTVAL (operands[1]) < GET_MODE_BITSIZE (<MODE>mode)
    && INTVAL (operands[1]) % 16 == 0
-   && INTVAL (operands[2]) <= 0xffff"
-  "movk\\t%w0, %2, lsl %1"
+   && UINTVAL (operands[2]) <= 0xffff"
+  "movk\\t%w0, %X2, lsl %1"
   [(set_attr "v8type" "movk")
    (set_attr "mode" "<MODE>")]
 )
diff --git a/gcc/testsuite/gcc.target/aarch64/movk.c b/gcc/testsuite/gcc.target/aarch64/movk.c
new file mode 100644
index 000..e4b2209
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movk.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -fno-inline" } */
+
+extern void abort (void);
+
+long long int
+dummy_number_generator ()
+{
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0xefff, lsl 16" } } */
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0xc4cc, lsl 32" } } */
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0xfffe, lsl 48" } } */
+  return -346565474575675;
+}
+
+int
+main (void)
+{
+
+  long long int num = dummy_number_generator ();
+  if (num > 0)
+    abort ();
+
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x4667, lsl 16" } } */
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x7a3d, lsl 32" } } */
+  if (num / 69313094915135 != -5)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
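To see why hex printing matters, here is a small sketch (mine, not from the patch) of how a 64-bit constant decomposes into the 16-bit MOVK chunks the test scans for. A chunk such as 0xefff would print as a negative number in signed decimal, which the assembler rejects.

```c
#include <stdint.h>

/* Extract the 16-bit chunk that a MOVK at the given shift would insert.
   A MOV/MOVK sequence builds a 64-bit constant 16 bits at a time. */
unsigned movk_chunk (uint64_t value, int shift)
{
  return (unsigned) ((value >> shift) & 0xffff);
}
```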
RE: [PING^1] [AArch64] Implement Bitwise AND and Set Flags
Please consider this as a reminder to review the patch posted at the following link: http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01374.html The patch is slightly modified to use CC_NZ mode instead of CC. Please review the patch and let me know if it's okay.

Hi Naveen, With the CC_NZ fix, the patch looks good apart from one thing: the second set in each pattern should have the =r,rk constraint rather than just =r,r. That said, I've attached a patch that provides more thorough test cases, including execute ones. When you get commit approval (which will be after GCC goes into stage 1 again) then I can add in the test cases. You might as well run them now though, for more confidence in your work. BTW, I have an implementation of BICS that's been waiting for GCC to hit stage 1. I'll send that out for review soon. NOTE: I do not have maintainer powers here, so you need someone else to give the OK to your patch. Cheers, Ian

diff --git a/gcc/testsuite/gcc.target/aarch64/ands1.c b/gcc/testsuite/gcc.target/aarch64/ands1.c
new file mode 100644
index 000..e2bf956
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ands1.c
@@ -0,0 +1,150 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort (void);
+
+int
+ands_si_test1 (int a, int b, int c)
+{
+  int d = a & b;
+
+  /* { dg-final { scan-assembler "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+int
+ands_si_test2 (int a, int b, int c)
+{
+  int d = a & 0xff;
+
+  /* { dg-final { scan-assembler "ands\tw\[0-9\]+, w\[0-9\]+, 255" } } */
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+int
+ands_si_test3 (int a, int b, int c)
+{
+  int d = a & (b << 3);
+
+  /* { dg-final { scan-assembler "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+typedef long long s64;
+
+s64
+ands_di_test1 (s64 a, s64 b, s64 c)
+{
+  s64 d = a & b;
+
+  /* { dg-final { scan-assembler "ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" } } */
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+s64
+ands_di_test2 (s64 a, s64 b, s64 c)
+{
+  s64 d = a & 0xff;
+
+  /* { dg-final { scan-assembler "ands\tx\[0-9\]+, x\[0-9\]+, 255" } } */
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+s64
+ands_di_test3 (s64 a, s64 b, s64 c)
+{
+  s64 d = a & (b << 3);
+
+  /* { dg-final { scan-assembler "ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+int main ()
+{
+  int x;
+  s64 y;
+
+  x = ands_si_test1 (29, 4, 5);
+  if (x != 13)
+    abort();
+
+  x = ands_si_test1 (5, 2, 20);
+  if (x != 25)
+    abort();
+
+  x = ands_si_test2 (29, 4, 5);
+  if (x != 38)
+    abort();
+
+  x = ands_si_test2 (1024, 2, 20);
+  if (x != 1044)
+    abort();
+
+  x = ands_si_test3 (35, 4, 5);
+  if (x != 41)
+    abort();
+
+  x = ands_si_test3 (5, 2, 20);
+  if (x != 25)
+    abort();
+
+  y = ands_di_test1 (0x13029ll,
+		     0x32004ll,
+		     0x505050505ll);
+
+  if (y != ((0x13029ll & 0x32004ll) + 0x32004ll + 0x505050505ll))
+    abort();
+
+  y = ands_di_test1 (0x5000500050005ll,
+		     0x2111211121112ll,
+		     0x02020ll);
+  if (y != 0x5000500052025ll)
+    abort();
+
+  y = ands_di_test2 (0x13029ll,
+		     0x32004ll,
+		     0x505050505ll);
+  if (y != ((0x13029ll & 0xff) + 0x32004ll + 0x505050505ll))
+    abort();
+
+  y = ands_di_test2 (0x130002900ll,
+		     0x32004ll,
+		     0x505050505ll);
+  if (y != (0x130002900ll + 0x505050505ll))
+    abort();
+
+  y = ands_di_test3 (0x13029ll,
+		     0x06408ll,
+		     0x505050505ll);
+  if (y != ((0x13029ll & (0x06408ll << 3))
+	    + 0x06408ll + 0x505050505ll))
+    abort();
+
+  y = ands_di_test3 (0x130002900ll,
+		     0x08808ll,
+		     0x505050505ll);
+  if (y != (0x130002900ll + 0x505050505ll))
+    abort();
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ands2.c b/gcc/testsuite/gcc.target/aarch64/ands2.c
new file mode 100644
index 000..c778a54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ands2.c
@@ -0,0 +1,156 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort (void);
+
+int
+ands_si_test1 (int a, int b, int c)
+{
+  int d = a & b;
+
+  /* { dg-final { scan-assembler-not "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */
+  /* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */
+  if (d <= 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+int
+ands_si_test2 (int a, int b, int c)
+{
+  int d = a & 0x99999999;
+
+  /* { dg-final { scan-assembler-not "ands\tw\[0-9\]+, w\[0-9\]+, -1717986919" } } */
+  /* {
[PATCH, AArch64] Support EXTR in backend
We couldn't generate EXTR for AArch64 ... until now! This patch includes the pattern and a test. Full regression testing for Linux and bare-metal passed. OK for trunk stage-1? Thanks, Ian

2013-03-14 Ian Bolton ian.bol...@arm.com

gcc/ * config/aarch64/aarch64.md (*extr<mode>5_insn): New pattern. (*extrsi5_insn_uxtw): Likewise. testsuite/ * gcc.target/aarch64/extr.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 73d86a7..ef1c0f3 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2703,6 +2703,34 @@
    (set_attr "mode" "<MODE>")]
 )
 
+(define_insn "*extr<mode>5_insn"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ior:GPI (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
+			     (match_operand 3 "const_int_operand" "n"))
+		 (lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r")
+			       (match_operand 4 "const_int_operand" "n"))))]
+  "UINTVAL (operands[3]) < GET_MODE_BITSIZE (<MODE>mode)
+   && (UINTVAL (operands[3]) + UINTVAL (operands[4]) == GET_MODE_BITSIZE (<MODE>mode))"
+  "extr\\t%w0, %w1, %w2, %4"
+  [(set_attr "v8type" "shift")
+   (set_attr "mode" "<MODE>")]
+)
+
+;; zero_extend version of the above
+(define_insn "*extrsi5_insn_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(zero_extend:DI
+	 (ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "r")
+			    (match_operand 3 "const_int_operand" "n"))
+		 (lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
+			      (match_operand 4 "const_int_operand" "n")))))]
+  "UINTVAL (operands[3]) < 32
+   && (UINTVAL (operands[3]) + UINTVAL (operands[4]) == 32)"
+  "extr\\t%w0, %w1, %w2, %4"
+  [(set_attr "v8type" "shift")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*<ANY_EXTEND:optab><GPI:mode>_ashl<SHORT:mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(ANY_EXTEND:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/extr.c b/gcc/testsuite/gcc.target/aarch64/extr.c
new file mode 100644
index 000..a78dd8d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/extr.c
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 --save-temps" } */
+/* { dg-do run } */
+
+extern void abort (void);
+
+int
+test_si (int a, int b)
+{
+  /* { dg-final { scan-assembler "extr\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, 27\n" } } */
+  return (a << 5) | ((unsigned int) b >> 27);
+}
+
+long long
+test_di (long long a, long long b)
+{
+  /* { dg-final { scan-assembler "extr\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, 45\n" } } */
+  return (a << 19) | ((unsigned long long) b >> 45);
+}
+
+int
+main ()
+{
+  int v;
+  long long w;
+  v = test_si (0x00000004, 0x30000000);
+  if (v != 0x00000086)
+    abort();
+  w = test_di (0x0001040040040004ll, 0x0070050000000000ll);
+  if (w != 0x2002002000200380ll)
+    abort();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
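For reference, EXTR Wd, Wn, Wm, #lsb returns 32 bits of the concatenation Wn:Wm starting at bit #lsb, which is exactly the shift/OR idiom the new pattern matches. A C model (mine, for illustration):

```c
#include <stdint.h>

/* C model of 32-bit EXTR: (hi << (32 - lsb)) | (lo >> lsb), valid for
   0 < lsb < 32 -- the same condition the pattern's predicate enforces
   (the two shift amounts must sum to the register width). */
uint32_t extr32 (uint32_t hi, uint32_t lo, unsigned lsb)
{
  return (hi << (32 - lsb)) | (lo >> lsb);
}
```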
[PATCH, AArch64] Support ROR in backend
We couldn't generate ROR (preferred alias of EXTR when both source registers are the same) for AArch64, when rotating by an immediate, ... until now! This patch includes the pattern and a test. Full regression testing for Linux and bare-metal passed. OK for trunk stage-1? Thanks, Ian

2013-03-14 Ian Bolton ian.bol...@arm.com

gcc/ * config/aarch64/aarch64.md (*ror<mode>3_insn): New pattern. (*rorsi3_insn_uxtw): Likewise. testsuite/ * gcc.target/aarch64/ror.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ef1c0f3..367c0e3 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2731,6 +2731,34 @@
    (set_attr "mode" "SI")]
 )
 
+(define_insn "*ror<mode>3_insn"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(rotate:GPI (match_operand:GPI 1 "register_operand" "r")
+		    (match_operand 2 "const_int_operand" "n")))]
+  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)"
+{
+  operands[3] = GEN_INT (<sizen> - UINTVAL (operands[2]));
+  return "ror\\t%w0, %w1, %3";
+}
+  [(set_attr "v8type" "shift")
+   (set_attr "mode" "<MODE>")]
+)
+
+;; zero_extend version of the above
+(define_insn "*rorsi3_insn_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(zero_extend:DI
+	 (rotate:SI (match_operand:SI 1 "register_operand" "r")
+		    (match_operand 2 "const_int_operand" "n"))))]
+  "UINTVAL (operands[2]) < 32"
+{
+  operands[3] = GEN_INT (32 - UINTVAL (operands[2]));
+  return "ror\\t%w0, %w1, %3";
+}
+  [(set_attr "v8type" "shift")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*<ANY_EXTEND:optab><GPI:mode>_ashl<SHORT:mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(ANY_EXTEND:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/ror.c b/gcc/testsuite/gcc.target/aarch64/ror.c
new file mode 100644
index 000..4d266f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ror.c
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 --save-temps" } */
+/* { dg-do run } */
+
+extern void abort (void);
+
+int
+test_si (int a)
+{
+  /* { dg-final { scan-assembler "ror\tw\[0-9\]+, w\[0-9\]+, 27\n" } } */
+  return (a << 5) | ((unsigned int) a >> 27);
+}
+
+long long
+test_di (long long a)
+{
+  /* { dg-final { scan-assembler "ror\tx\[0-9\]+, x\[0-9\]+, 45\n" } } */
+  return (a << 19) | ((unsigned long long) a >> 45);
+}
+
+int
+main ()
+{
+  int v;
+  long long w;
+  v = test_si (0x0203050);
+  if (v != 0x4060a00)
+    abort();
+  w = test_di (0x020506010304ll);
+  if (w != 0x102830081820ll)
+    abort();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
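The idiom the pattern matches is a left-rotate by a constant, which maps to ROR with the complementary amount (rotl(a, n) == ror(a, 32 - n)); that is why the output code computes GEN_INT (size - amount). A sketch of the source-level idiom:

```c
#include <stdint.h>

/* Rotate left by n, 0 < n < 32.  GCC canonicalizes this shift/OR pair
   into a rotate, which the new pattern then emits as a single "ror"
   with rotate amount 32 - n. */
uint32_t rotl32 (uint32_t a, unsigned n)
{
  return (a << n) | (a >> (32 - n));
}
```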
[PATCH, AArch64] Support SBC in the backend
We couldn't generate SBC for AArch64 ... until now! This patch includes the main pattern, a zero_extend form of it and a test. Full regression testing for Linux and bare-metal passed. OK for trunk stage-1? Thanks, Ian

2013-03-14 Ian Bolton ian.bol...@arm.com

gcc/ * config/aarch64/aarch64.md (*sub<mode>3_carryin): New pattern. (*subsi3_carryin_uxtw): Likewise. testsuite/ * gcc.target/aarch64/sbc.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4358b44..c99e188 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1790,6 +1790,34 @@
    (set_attr "mode" "SI")]
 )
 
+(define_insn "*sub<mode>3_carryin"
+  [(set
+    (match_operand:GPI 0 "register_operand" "=r")
+    (minus:GPI (minus:GPI
+		(match_operand:GPI 1 "register_operand" "r")
+		(ltu:GPI (reg:CC CC_REGNUM) (const_int 0)))
+	       (match_operand:GPI 2 "register_operand" "r")))]
+   ""
+   "sbc\\t%w0, %w1, %w2"
+  [(set_attr "v8type" "adc")
+   (set_attr "mode" "<MODE>")]
+)
+
+;; zero_extend version of the above
+(define_insn "*subsi3_carryin_uxtw"
+  [(set
+    (match_operand:DI 0 "register_operand" "=r")
+    (zero_extend:DI
+     (minus:SI (minus:SI
+		(match_operand:SI 1 "register_operand" "r")
+		(ltu:SI (reg:CC CC_REGNUM) (const_int 0)))
+	       (match_operand:SI 2 "register_operand" "r"))))]
+   ""
+   "sbc\\t%w0, %w1, %w2"
+  [(set_attr "v8type" "adc")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*sub_uxt<mode>_multp2"
   [(set (match_operand:GPI 0 "register_operand" "=rk")
 	(minus:GPI (match_operand:GPI 4 "register_operand" "r")
diff --git a/gcc/testsuite/gcc.target/aarch64/sbc.c b/gcc/testsuite/gcc.target/aarch64/sbc.c
new file mode 100644
index 000..e479910
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sbc.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort (void);
+
+typedef unsigned int u32int;
+typedef unsigned long long u64int;
+
+u32int
+test_si (u32int w1, u32int w2, u32int w3, u32int w4)
+{
+  u32int w0;
+  /* { dg-final { scan-assembler "sbc\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+\n" } } */
+  w0 = w1 - w2 - (w3 < w4);
+  return w0;
+}
+
+u64int
+test_di (u64int x1, u64int x2, u64int x3, u64int x4)
+{
+  u64int x0;
+  /* { dg-final { scan-assembler "sbc\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\n" } } */
+  x0 = x1 - x2 - (x3 < x4);
+  return x0;
+}
+
+int
+main ()
+{
+  u32int x;
+  u64int y;
+  x = test_si (7, 8, 12, 15);
+  if (x != -2)
+    abort();
+  y = test_di (0x987654321ll, 0x123456789ll, 0x345345345ll, 0x123123123ll);
+  if (y != 0x8641fdb98ll)
+    abort();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
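A hedged sketch of where the borrow idiom comes from: the high word of a double-word subtraction is hi1 - hi2 - borrow, with the borrow produced by comparing the low words, which is the (w3 < w4) shape in the test above.

```c
#include <stdint.h>

/* 128-bit subtraction in two 64-bit halves.  The low half maps to
   "subs" (which sets the carry flag); the high half is the
   subtract-with-borrow that maps onto "sbc". */
uint64_t sub128_lo (uint64_t lo1, uint64_t lo2)
{
  return lo1 - lo2;
}

uint64_t sub128_hi (uint64_t hi1, uint64_t lo1, uint64_t hi2, uint64_t lo2)
{
  return hi1 - hi2 - (lo1 < lo2);   /* borrow from the low-word compare */
}
```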
[PATCH, AArch64] AND operation should use CC_NZ mode
The mode for AND should really be CC_NZ, so I fixed that up, along with the TST patterns that (erroneously) expected it to be CC mode. It has been tested on linux and bare-metal. OK to commit to trunk (as bug fix)? Thanks, Ian

2013-02-01 Ian Bolton ian.bol...@arm.com

* config/aarch64/aarch64.c (aarch64_select_cc_mode): Return correct CC mode for AND. * config/aarch64/aarch64.md (*and<mode>3nr_compare0): Fixed to use CC_NZ. (*and_<SHIFT:optab><mode>3nr_compare0): Likewise.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 03b1361..2b09669 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3076,7 +3076,7 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   if ((GET_MODE (x) == SImode || GET_MODE (x) == DImode)
       && y == const0_rtx
       && (code == EQ || code == NE || code == LT || code == GE)
-      && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS))
+      && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND))
     return CC_NZmode;
 
   /* A compare with a shifted operand.  Because of canonicalization,
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 36267c9..c4c152f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2470,8 +2470,8 @@
 )
 
 (define_insn "*and<mode>3nr_compare0"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NZ CC_REGNUM)
+	(compare:CC_NZ
 	 (and:GPI (match_operand:GPI 0 "register_operand" "%r,r")
 		  (match_operand:GPI 1 "aarch64_logical_operand" "r,<lconst>"))
 	 (const_int 0)))]
@@ -2481,8 +2481,8 @@
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*and_<SHIFT:optab><mode>3nr_compare0"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NZ CC_REGNUM)
+	(compare:CC_NZ
 	 (and:GPI (SHIFT:GPI
 		   (match_operand:GPI 0 "register_operand" "r")
 		   (match_operand:QI 1 "aarch64_shift_imm_<mode>" "n"))
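An illustrative function (mine, not from the patch) showing the shape this fix enables: with AND compares selecting CC_NZ, the AND-then-test-against-zero collapses into a single flag-setting ANDS instead of an AND followed by a separate compare.

```c
/* "ands w3, w0, w1" computes the AND and sets the N and Z flags, so
   the equality test against zero can read the Z flag directly with no
   separate "tst" or "cmp" instruction. */
int select_nonzero (int a, int b, int fallback)
{
  int d = a & b;
  return d == 0 ? fallback : d;
}
```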
[PATCH, AArch64] Make zero_extends explicit for some SImode patterns
Greetings! I've made zero_extend versions of SI mode patterns that write to W registers in order to make the implicit zero_extend that they do explicit, so GCC can be smarter about when it actually needs to plant a zero_extend (uxtw). If that sounds familiar, it's because this patch continues the work of one already committed. :) This has been regression-tested for linux and bare-metal. OK for trunk and backport to ARM/aarch64-4.7-branch? Cheers, Ian

2013-01-15 Ian Bolton ian.bol...@arm.com

* gcc/config/aarch64/aarch64.md (*cstoresi_neg_uxtw): New pattern. (*cmovsi_insn_uxtw): New pattern. (*<optab>si3_uxtw): New pattern. (*<LOGICAL:optab>_<SHIFT:optab>si3_uxtw): New pattern. (*<optab>si3_insn_uxtw): New pattern. (*bswapsi2_uxtw): New pattern.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index ec65b3c..8dd6c22 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1,5 +1,5 @@ ;; Machine description for AArch64 architecture. -;; Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc. +;; Copyright (C) 2009, 2010, 2011, 2012, 2013 Free Software Foundation, Inc. ;; Contributed by ARM Ltd. ;; ;; This file is part of GCC. 
@@ -2193,6 +2193,18 @@ (set_attr mode MODE)] ) +;; zero_extend version of the above +(define_insn *cstoresi_insn_uxtw + [(set (match_operand:DI 0 register_operand =r) + (zero_extend:DI +(match_operator:SI 1 aarch64_comparison_operator + [(match_operand 2 cc_register ) (const_int 0)])))] + + cset\\t%w0, %m1 + [(set_attr v8type csel) + (set_attr mode SI)] +) + (define_insn *cstoremode_neg [(set (match_operand:ALLI 0 register_operand =r) (neg:ALLI (match_operator:ALLI 1 aarch64_comparison_operator @@ -2203,6 +2215,18 @@ (set_attr mode MODE)] ) +;; zero_extend version of the above +(define_insn *cstoresi_neg_uxtw + [(set (match_operand:DI 0 register_operand =r) + (zero_extend:DI +(neg:SI (match_operator:SI 1 aarch64_comparison_operator + [(match_operand 2 cc_register ) (const_int 0)]] + + csetm\\t%w0, %m1 + [(set_attr v8type csel) + (set_attr mode SI)] +) + (define_expand cmovmode6 [(set (match_operand:GPI 0 register_operand ) (if_then_else:GPI @@ -2257,6 +2281,30 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *cmovsi_insn_uxtw + [(set (match_operand:DI 0 register_operand =r,r,r,r,r,r,r) + (zero_extend:DI +(if_then_else:SI + (match_operator 1 aarch64_comparison_operator + [(match_operand 2 cc_register ) (const_int 0)]) + (match_operand:SI 3 aarch64_reg_zero_or_m1_or_1 rZ,rZ,UsM,rZ,Ui1,UsM,Ui1) + (match_operand:SI 4 aarch64_reg_zero_or_m1_or_1 rZ,UsM,rZ,Ui1,rZ,UsM,Ui1] + !((operands[3] == const1_rtx operands[4] == constm1_rtx) + || (operands[3] == constm1_rtx operands[4] == const1_rtx)) + ;; Final two alternatives should be unreachable, but included for completeness + @ + csel\\t%w0, %w3, %w4, %m1 + csinv\\t%w0, %w3, wzr, %m1 + csinv\\t%w0, %w4, wzr, %M1 + csinc\\t%w0, %w3, wzr, %m1 + csinc\\t%w0, %w4, wzr, %M1 + mov\\t%w0, -1 + mov\\t%w0, 1 + [(set_attr v8type csel) + (set_attr mode SI)] +) + (define_insn *cmovmode_insn [(set (match_operand:GPF 0 register_operand =w) (if_then_else:GPF @@ -2369,6 +2417,17 @@ [(set_attr v8type logic,logic_imm) 
(set_attr mode MODE)]) +;; zero_extend version of above +(define_insn *optabsi3_uxtw + [(set (match_operand:DI 0 register_operand =r,rk) + (zero_extend:DI + (LOGICAL:SI (match_operand:SI 1 register_operand %r,r) +(match_operand:SI 2 aarch64_logical_operand r,K] + + logical\\t%w0, %w1, %w2 + [(set_attr v8type logic,logic_imm) + (set_attr mode SI)]) + (define_insn *LOGICAL:optab_SHIFT:optabmode3 [(set (match_operand:GPI 0 register_operand =r) (LOGICAL:GPI (SHIFT:GPI @@ -2380,6 +2439,19 @@ [(set_attr v8type logic_shift) (set_attr mode MODE)]) +;; zero_extend version of above +(define_insn *LOGICAL:optab_SHIFT:optabsi3_uxtw + [(set (match_operand:DI 0 register_operand =r) + (zero_extend:DI +(LOGICAL:SI (SHIFT:SI + (match_operand:SI 1 register_operand r) + (match_operand:QI 2 aarch64_shift_imm_si n)) +(match_operand:SI 3 register_operand r] + + LOGICAL:logical\\t%w0, %w3, %w1, SHIFT:shift %2 + [(set_attr v8type logic_shift) + (set_attr mode SI)]) + (define_insn one_cmplmode2 [(set (match_operand:GPI 0 register_operand =r) (not:GPI (match_operand:GPI 1 register_operand r)))] @@ -2591,6 +2663,18 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *optabsi3_insn_uxtw + [(set (match_operand:DI 0 register_operand =r) + (zero_extend:DI (SHIFT:SI +(match_operand:SI 1 register_operand r
RE: [PATCH, AArch64] Make zero_extends explicit for common SImode patterns
Hi Richard, + add\\t%w0, %w2, %w, suxtSHORT:size ^^^ %w1 Good spot. I guess that pattern hasn't fired yet then! I'll fix it. This patch significantly reduces the number of redundant uxtw instructions seen in a variety of programs. (There are further patterns that can be done, but I have them in a separate patch that's still in development.) What do you get if you enable flag_ree, as we do for x86_64? In theory this should avoid even more extensions... Cf. common/config/i386/i386-common.c: static const struct default_options ix86_option_optimization_table[] = { /* Enable redundant extension instructions removal at -O2 and higher. */ { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 }, I should have said that I am indeed running with REE enabled. It has some impact (about 70 further UXTW removed from the set of binaries I've been building) and seems to mostly be good across basic blocks within the same function. As far as I can tell, there is no downside to REE, so I think it should be enabled by default for -O2 or higher on AArch64 too. I'll prepare a new patch ...
RE: [PATCH, AArch64] Make zero_extends explicit for common SImode patterns
Hi Richard, + add\\t%w0, %w2, %w, suxtSHORT:size ^^^ %w1 Good spot. I guess that pattern hasn't fired yet then! I'll fix it. Now fixed in v3. I should have said that I am indeed running with REE enabled. It has some impact (about 70 further UXTW removed from the set of binaries I've been building) and seems to mostly be good across basic blocks within the same function. As far as I can tell, there is no downside to REE, so I think it should be enabled by default for -O2 or higher on AArch64 too. I'm going to enable REE in a separate patch. Is this one OK to commit here and backport to ARM/aarch64-4.7-branch? Thanks, Ian diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a9a8b5f..d5c0206 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1271,6 +1273,22 @@ (set_attr mode SI)] ) +;; zero_extend version of above +(define_insn *addsi3_aarch64_uxtw + [(set +(match_operand:DI 0 register_operand =rk,rk,rk) +(zero_extend:DI (plus:SI + (match_operand:SI 1 register_operand %rk,rk,rk) + (match_operand:SI 2 aarch64_plus_operand I,r,J] + + @ + add\\t%w0, %w1, %2 + add\\t%w0, %w1, %w2 + sub\\t%w0, %w1, #%n2 + [(set_attr v8type alu) + (set_attr mode SI)] +) + (define_insn *adddi3_aarch64 [(set (match_operand:DI 0 register_operand =rk,rk,rk,!w) @@ -1304,6 +1322,23 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *addsi3_compare0_uxtw + [(set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ +(plus:SI (match_operand:SI 1 register_operand %r,r) + (match_operand:SI 2 aarch64_plus_operand rI,J)) +(const_int 0))) + (set (match_operand:DI 0 register_operand =r,r) + (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2] + + @ + adds\\t%w0, %w1, %w2 + subs\\t%w0, %w1, #%n2 + [(set_attr v8type alus) + (set_attr mode SI)] +) + (define_insn *addmode3nr_compare0 [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ @@ -1340,6 +1375,19 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *add_shift_si_uxtw + 
[(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI + (ASHIFT:SI (match_operand:SI 1 register_operand r) + (match_operand:QI 2 aarch64_shift_imm_si n)) + (match_operand:SI 3 register_operand r] + + add\\t%w0, %w3, %w1, shift %2 + [(set_attr v8type alu_shift) + (set_attr mode SI)] +) + (define_insn *add_mul_imm_mode [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (mult:GPI (match_operand:GPI 1 register_operand r) @@ -1361,6 +1409,17 @@ (set_attr mode GPI:MODE)] ) +;; zero_extend version of above +(define_insn *add_optabSHORT:mode_si_uxtw + [(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI (ANY_EXTEND:SI (match_operand:SHORT 1 register_operand r)) + (match_operand:GPI 2 register_operand r] + + add\\t%w0, %w2, %w1, suxtSHORT:size + [(set_attr v8type alu_ext) + (set_attr mode SI)] +) + (define_insn *add_optabALLX:mode_shft_GPI:mode [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (ashift:GPI (ANY_EXTEND:GPI @@ -1373,6 +1432,19 @@ (set_attr mode GPI:MODE)] ) +;; zero_extend version of above +(define_insn *add_optabSHORT:mode_shft_si_uxtw + [(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI (ashift:SI (ANY_EXTEND:SI + (match_operand:SHORT 1 register_operand r)) + (match_operand 2 aarch64_imm3 Ui3)) + (match_operand:SI 3 register_operand r] + + add\\t%w0, %w3, %w1, suxtSHORT:size %2 + [(set_attr v8type alu_ext) + (set_attr mode SI)] +) + (define_insn *add_optabALLX:mode_mult_GPI:mode [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (mult:GPI (ANY_EXTEND:GPI @@ -1385,6 +1457,19 @@ (set_attr mode GPI:MODE)] ) +;; zero_extend version of above +(define_insn *add_optabSHORT:mode_mult_si_uxtw + [(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI (mult:SI (ANY_EXTEND:SI +(match_operand:SHORT 1 register_operand r)) + (match_operand 2 aarch64_pwr_imm3 Up3)) + (match_operand:SI 3 register_operand r] + + add\\t%w0, %w3, %w1, suxtSHORT:size %p2 + [(set_attr 
v8type alu_ext) + (set_attr mode SI)] +) + (define_insn *add_optabmode_multp2 [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (ANY_EXTRACT:GPI @@ -1399,6 +1484,21 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *add_optabsi_multp2_uxtw + [(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI (ANY_EXTRACT:SI + (mult:SI (match_operand:SI 1 register_operand r) +
[PATCH, AArch64] Make zero_extends explicit for common SImode patterns
Season's greetings to you! :) I've made zero_extend versions of SI mode patterns that write to W registers in order to make the implicit zero_extend that they do explicit, so GCC can be smarter about when it actually needs to plant a zero_extend (uxtw). This patch significantly reduces the number of redundant uxtw instructions seen in a variety of programs. (There are further patterns that can be done, but I have them in a separate patch that's still in development.) OK for trunk and backport to ARM/aarch64-4.7-branch? Cheers, Ian 2012-12-13 Ian Bolton ian.bol...@arm.com * gcc/config/aarch64/aarch64.md (*addsi3_aarch64_uxtw): New pattern. (*addsi3_compare0_uxtw): New pattern. (*add_shift_si_uxtw): New pattern. (*add_optabSHORT:mode_si_uxtw): New pattern. (*add_optabSHORT:mode_shft_si_uxtw): New pattern. (*add_optabSHORT:mode_mult_si_uxtw): New pattern. (*add_optabsi_multp2_uxtw): New pattern. (*addsi3_carryin_uxtw): New pattern. (*addsi3_carryin_alt1_uxtw): New pattern. (*addsi3_carryin_alt2_uxtw): New pattern. (*addsi3_carryin_alt3_uxtw): New pattern. (*add_uxtsi_multp2_uxtw): New pattern. (*subsi3_uxtw): New pattern. (*subsi3_compare0_uxtw): New pattern. (*sub_shift_si_uxtw): New pattern. (*sub_mul_imm_si_uxtw): New pattern. (*sub_optabSHORT:mode_si_uxtw): New pattern. (*sub_optabSHORT:mode_shft_si_uxtw): New pattern. (*sub_optabsi_multp2_uxtw): New pattern. (*sub_uxtsi_multp2_uxtw): New pattern. (*negsi2_uxtw): New pattern. (*negsi2_compare0_uxtw): New pattern. (*neg_shift_si2_uxtw): New pattern. (*neg_mul_imm_si2_uxtw): New pattern. (*mulsi3_uxtw): New pattern. (*maddsi_uxtw): New pattern. (*msubsi_uxtw): New pattern. (*mulsi_neg_uxtw): New pattern. 
(*su_optabdivsi3_uxtw): New pattern.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a9a8b5f..d5c0206 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1271,6 +1273,22 @@ (set_attr mode SI)] ) +;; zero_extend version of above +(define_insn *addsi3_aarch64_uxtw + [(set +(match_operand:DI 0 register_operand =rk,rk,rk) +(zero_extend:DI (plus:SI + (match_operand:SI 1 register_operand %rk,rk,rk) + (match_operand:SI 2 aarch64_plus_operand I,r,J] + + @ + add\\t%w0, %w1, %2 + add\\t%w0, %w1, %w2 + sub\\t%w0, %w1, #%n2 + [(set_attr v8type alu) + (set_attr mode SI)] +) + (define_insn *adddi3_aarch64 [(set (match_operand:DI 0 register_operand =rk,rk,rk,!w) @@ -1304,6 +1322,23 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *addsi3_compare0_uxtw + [(set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ +(plus:SI (match_operand:SI 1 register_operand %r,r) + (match_operand:SI 2 aarch64_plus_operand rI,J)) +(const_int 0))) + (set (match_operand:DI 0 register_operand =r,r) + (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2] + + @ + adds\\t%w0, %w1, %w2 + subs\\t%w0, %w1, #%n2 + [(set_attr v8type alus) + (set_attr mode SI)] +) + (define_insn *addmode3nr_compare0 [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ @@ -1340,6 +1375,19 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *add_shift_si_uxtw + [(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI + (ASHIFT:SI (match_operand:SI 1 register_operand r) + (match_operand:QI 2 aarch64_shift_imm_si n)) + (match_operand:SI 3 register_operand r] + + add\\t%w0, %w3, %w1, shift %2 + [(set_attr v8type alu_shift) + (set_attr mode SI)] +) + (define_insn *add_mul_imm_mode [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (mult:GPI (match_operand:GPI 1 register_operand r) @@ -1361,6 +1409,17 @@ (set_attr mode GPI:MODE)] ) +;; zero_extend version of above +(define_insn *add_optabSHORT:mode_si_uxtw + [(set 
(match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI (ANY_EXTEND:SI (match_operand:SHORT 1 register_operand r)) + (match_operand:GPI 2 register_operand r] + + add\\t%w0, %w2, %w, suxtSHORT:size + [(set_attr v8type alu_ext) + (set_attr mode SI)] +) + (define_insn *add_optabALLX:mode_shft_GPI:mode [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (ashift:GPI (ANY_EXTEND:GPI @@ -1373,6 +1432,19 @@ (set_attr mode GPI:MODE)] ) +;; zero_extend version of above +(define_insn *add_optabSHORT:mode_shft_si_uxtw + [(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI (ashift:SI (ANY_EXTEND:SI + (match_operand:SHORT 1 register_operand r)) + (match_operand 2 aarch64_imm3 Ui3
RE: [PATCH, AArch64 4.7] Backport of __builtin_bswap16 optimisation
It turned out that this patch depended on another one from earlier, so I have backported that to ARM/aarch64-4.7-branch too. http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00452.html Cheers, Ian -Original Message- From: Ian Bolton [mailto:ian.bol...@arm.com] Sent: 23 November 2012 18:09 To: gcc-patches@gcc.gnu.org Subject: [PATCH, AArch64 4.7] Backport of __builtin_bswap16 optimisation I had already committed my testcase for this for aarch64, but it depends on this patch that doesn't yet exist in 4.7, so I backported to our ARM/aarch64-4.7-branch. Cheers, Ian From: http://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=f811051bf87b1de7804c19 c8192d0d099d157145 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index be34843..ce08fce 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2012-09-26 Christophe Lyon christophe.l...@linaro.org + + * tree-ssa-math-opts.c (bswap_stats): Add found_16bit field. + (execute_optimize_bswap): Add support for builtin_bswap16. + 2012-09-26 Richard Guenther rguent...@suse.de * tree.h (DECL_IS_BUILTIN): Compare LOCATION_LOCUS. diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 3aad841..7c96949 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2012-09-26 Christophe Lyon christophe.l...@linaro.org + + * gcc.target/arm/builtin-bswap16-1.c: New testcase. 
+ 2012-09-25 Segher Boessenkool seg...@kernel.crashing.org PR target/51274 diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c b/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c new file mode 100644 index 000..6920f00 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ +/* { dg-require-effective-target arm_arch_v6_ok } */ +/* { dg-add-options arm_arch_v6 } */ +/* { dg-final { scan-assembler-not orr\[ \t\] } } */ + +unsigned short swapu16_1 (unsigned short x) +{ + return (x 8) | (x 8); +} + +unsigned short swapu16_2 (unsigned short x) +{ + return (x 8) | (x 8); +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 16ff397..d9f4e9e 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -154,6 +154,9 @@ static struct static struct { + /* Number of hand-written 16-bit bswaps found. */ + int found_16bit; + /* Number of hand-written 32-bit bswaps found. */ int found_32bit; @@ -1803,9 +1806,9 @@ static unsigned int execute_optimize_bswap (void) { basic_block bb; - bool bswap32_p, bswap64_p; + bool bswap16_p, bswap32_p, bswap64_p; bool changed = false; - tree bswap32_type = NULL_TREE, bswap64_type = NULL_TREE; + tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE, bswap64_type = NULL_TREE; if (BITS_PER_UNIT != 8) return 0; @@ -1813,17 +1816,25 @@ execute_optimize_bswap (void) if (sizeof (HOST_WIDEST_INT) 8) return 0; + bswap16_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP16) + optab_handler (bswap_optab, HImode) != CODE_FOR_nothing); bswap32_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP32) optab_handler (bswap_optab, SImode) != CODE_FOR_nothing); bswap64_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP64) (optab_handler (bswap_optab, DImode) != CODE_FOR_nothing || (bswap32_p word_mode == SImode))); - if (!bswap32_p !bswap64_p) + if (!bswap16_p !bswap32_p !bswap64_p) return 0; /* Determine the argument type of the builtins. 
The code later on assumes that the return and argument type are the same. */ + if (bswap16_p) +{ + tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16); + bswap16_type = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl))); +} + if (bswap32_p) { tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP32); @@ -1863,6 +1874,13 @@ execute_optimize_bswap (void) switch (type_size) { + case 16: + if (bswap16_p) + { + fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16); + bswap_type = bswap16_type; + } + break; case 32: if (bswap32_p) { @@ -1890,7 +1908,9 @@ execute_optimize_bswap (void) continue; changed = true; - if (type_size == 32) + if (type_size == 16) + bswap_stats.found_16bit++; + else if (type_size == 32) bswap_stats.found_32bit++; else bswap_stats.found_64bit++; @@ -1935,6 +1955,8 @@ execute_optimize_bswap (void) } } + statistics_counter_event (cfun, 16-bit bswap implementations found, + bswap_stats.found_16bit); statistics_counter_event (cfun, 32-bit bswap implementations found
[PATCH AArch64] Fix faulty commit of testsuite/gcc.target/aarch64/csinc-2.c
A commit I did earlier in the week got truncated somehow, leading to a broken testcase for the AArch64 target. I've just committed this fix as obvious on trunk and the arm/aarch64-4.7-branch. Cheers, Ian Index: gcc/testsuite/gcc.target/aarch64/csinc-2.c === --- gcc/testsuite/gcc.target/aarch64/csinc-2.c (revision 193571) +++ gcc/testsuite/gcc.target/aarch64/csinc-2.c (revision 193572) @@ -12,3 +12,7 @@ typedef long long s64; s64 foo2 (s64 a, s64 b) +{ + return (a == b) ? 7 : 1; + /* { dg-final { scan-assembler csinc\tx\[0-9\].*xzr } } */ +}
[PATCH AArch64] Implement bswaphi2 with rev16
This patch implements the standard pattern bswaphi2 for AArch64. Regression tests all pass. OK for trunk and backport to arm/aarch64-4.7-branch? Cheers, Ian 2012-11-16 Ian Bolton ian.bol...@arm.com * gcc/config/aarch64/aarch64.md (bswaphi2): New pattern. * gcc/testsuite/gcc.target/aarch64/builtin-bswap-1.c: New test. * gcc/testsuite/gcc.target/aarch64/builtin-bswap-2.c: New test. - diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e6086a9..22c7103 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2340,6 +2340,15 @@ (set_attr mode MODE)] ) +(define_insn bswaphi2 + [(set (match_operand:HI 0 register_operand =r) +(bswap:HI (match_operand:HI 1 register_operand r)))] + + rev16\\t%w0, %w1 + [(set_attr v8type rev) + (set_attr mode HI)] +) + ;; --- ;; Floating-point intrinsics ;; --- diff --git a/gcc/testsuite/gcc.target/aarch64/builtin-bswap-1.c b/gcc/testsuite/gcc.target/aarch64/builtin-bswap-1.c new file mode 100644 index 000..a6706e6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/builtin-bswap-1.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +/* { dg-final { scan-assembler-times rev16\\t 2 } } */ + +/* rev16 */ +short +swaps16 (short x) +{ + return __builtin_bswap16 (x); +} + +/* rev16 */ +unsigned short +swapu16 (unsigned short x) +{ + return __builtin_bswap16 (x); +} diff --git a/gcc/testsuite/gcc.target/aarch64/builtin-bswap-2.c b/gcc/testsuite/gcc.target/aarch64/builtin-bswap-2.c new file mode 100644 index 000..6018b48 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/builtin-bswap-2.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +/* { dg-final { scan-assembler-times rev16\\t 2 } } */ + +/* rev16 */ +unsigned short +swapu16_1 (unsigned short x) +{ + return (x << 8) | (x >> 8); +} + +/* rev16 */ +unsigned short +swapu16_2 (unsigned short x) +{ + return (x >> 8) | (x << 8); +}
[PATCH arm/aarch64-4.7] Fix up Changelogs
Some changes had been added to gcc/ChangeLog and gcc/testsuite/ChangeLog when they should have been recorded in the gcc/ChangeLog.aarch64 and gcc/testsuite/ChangeLog.aarch64 files instead. Committed as obvious. Cheers, Ian
[PATCH,AArch64] Optimise comparison where intermediate result not used
Hi all, When we perform an addition but only use the result for a comparison, we can save an instruction. Consider this function: int foo (int a, int b) { return ((a + b) == 0) ? 1 : 7; } Here is the original output: foo: add w0, w0, w1 cmp w0, wzr mov w1, 7 mov w0, 1 csel w0, w1, w0, ne ret Now we get this: foo: cmn w0, w1 mov w1, 7 mov w0, 1 csel w0, w1, w0, ne ret :) I added other testcases for this and also some for adds and subs, which were investigated as part of this work. OK for trunk? Cheers, Ian 2012-11-06 Ian Bolton ian.bol...@arm.com * gcc/config/aarch64/aarch64.md (*compare_negmode): New pattern. * gcc/testsuite/gcc.target/aarch64/cmn.c: New test. * gcc/testsuite/gcc.target/aarch64/adds.c: New test. * gcc/testsuite/gcc.target/aarch64/subs.c: New test. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e6086a9..6935192 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1310,6 +1310,17 @@ (set_attr mode MODE)] ) +(define_insn *compare_negmode + [(set (reg:CC CC_REGNUM) + (compare:CC +(match_operand:GPI 0 register_operand r) +(neg:GPI (match_operand:GPI 1 register_operand r] + + cmn\\t%w0, %w1 + [(set_attr v8type alus) + (set_attr mode MODE)] +) + (define_insn *add_shift_mode [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 register_operand r) diff --git a/gcc/testsuite/gcc.target/aarch64/adds.c b/gcc/testsuite/gcc.target/aarch64/adds.c new file mode 100644 index 000..aa42321 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/adds.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +int z; +int +foo (int x, int y) +{ + int l = x + y; + if (l == 0) +return 5; + + /* { dg-final { scan-assembler adds\tw\[0-9\] } } */ + z = l; + return 25; +} + +typedef long long s64; + +s64 zz; +s64 +foo2 (s64 x, s64 y) +{ + s64 l = x + y; + if (l < 0) +return 5; + + /* { dg-final { scan-assembler adds\tx\[0-9\] } } */ + zz = l; + return 25; +} diff 
--git a/gcc/testsuite/gcc.target/aarch64/cmn.c b/gcc/testsuite/gcc.target/aarch64/cmn.c new file mode 100644 index 000..1f06f57 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cmn.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +int +foo (int a, int b) +{ + if (a + b) +return 5; + else +return 2; + /* { dg-final { scan-assembler cmn\tw\[0-9\] } } */ +} + +typedef long long s64; + +s64 +foo2 (s64 a, s64 b) +{ + if (a + b) +return 5; + else +return 2; + /* { dg-final { scan-assembler cmn\tx\[0-9\] } } */ +} diff --git a/gcc/testsuite/gcc.target/aarch64/subs.c b/gcc/testsuite/gcc.target/aarch64/subs.c new file mode 100644 index 000..2bf1975 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/subs.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +int z; +int +foo (int x, int y) +{ + int l = x - y; + if (l == 0) +return 5; + + /* { dg-final { scan-assembler subs\tw\[0-9\] } } */ + z = l; + return 25; +} + +typedef long long s64; + +s64 zz; +s64 +foo2 (s64 x, s64 y) +{ + s64 l = x - y; + if (l < 0) +return 5; + + /* { dg-final { scan-assembler subs\tx\[0-9\] } } */ + zz = l; + return 25; +}
[PATCH,AArch64] Use CSINC instead of CSEL to return 1
Where a CSEL can return the value 1 as one of the alternatives, it is usually more efficient to use a CSINC than a CSEL (and never less efficient), since the value of 1 can be derived from wzr, rather than needing to set it up in a register first. This patch enables this capability. It has been regression tested on trunk. OK for commit? Cheers, Ian 2012-11-06 Ian Bolton ian.bol...@arm.com * gcc/config/aarch64/aarch64.md (cmovmode_insn): Emit CSINC when one of the alternatives is constant 1. * gcc/config/aarch64/constraints.md: New constraint. * gcc/config/aarch64/predicates.md: Rename predicate aarch64_reg_zero_or_m1 to aarch64_reg_zero_or_m1_or_1. * gcc/testsuite/gcc.target/aarch64/csinc-2.c: New test. - diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 6935192..038465e 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1877,19 +1877,23 @@ ) (define_insn *cmovmode_insn - [(set (match_operand:ALLI 0 register_operand =r,r,r,r) + [(set (match_operand:ALLI 0 register_operand =r,r,r,r,r,r,r) (if_then_else:ALLI (match_operator 1 aarch64_comparison_operator [(match_operand 2 cc_register ) (const_int 0)]) -(match_operand:ALLI 3 aarch64_reg_zero_or_m1 rZ,rZ,UsM,UsM) -(match_operand:ALLI 4 aarch64_reg_zero_or_m1 rZ,UsM,rZ,UsM)))] - - ;; Final alternative should be unreachable, but included for completeness +(match_operand:ALLI 3 aarch64_reg_zero_or_m1_or_1 rZ,rZ,UsM,rZ,Ui1,UsM,Ui1) +(match_operand:ALLI 4 aarch64_reg_zero_or_m1_or_1 rZ,UsM,rZ,Ui1,rZ,UsM,Ui1)))] + !((operands[3] == const1_rtx && operands[4] == constm1_rtx) + || (operands[3] == constm1_rtx && operands[4] == const1_rtx)) + ;; Final two alternatives should be unreachable, but included for completeness @ csel\\t%w0, %w3, %w4, %m1 csinv\\t%w0, %w3, wzr, %m1 csinv\\t%w0, %w4, wzr, %M1 - mov\\t%w0, -1 + csinc\\t%w0, %w3, wzr, %m1 + csinc\\t%w0, %w4, wzr, %M1 + mov\\t%w0, -1 + mov\\t%w0, 1 [(set_attr v8type csel) (set_attr mode MODE)] ) diff --git 
a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index da50a47..780faaa 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -102,6 +102,11 @@ A constraint that matches the immediate constant -1. (match_test op == constm1_rtx)) +(define_constraint Ui1 + @internal + A constraint that matches the immediate constant +1. + (match_test op == const1_rtx)) + (define_constraint Ui3 @internal A constraint that matches the integers 0...4. diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 328e5cf..aae71c1 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -31,11 +31,12 @@ (ior (match_operand 0 register_operand) (match_test op == const0_rtx -(define_predicate aarch64_reg_zero_or_m1 +(define_predicate aarch64_reg_zero_or_m1_or_1 (and (match_code reg,subreg,const_int) (ior (match_operand 0 register_operand) (ior (match_test op == const0_rtx) -(match_test op == constm1_rtx) +(ior (match_test op == constm1_rtx) + (match_test op == const1_rtx)) (define_predicate aarch64_fp_compare_operand (ior (match_operand 0 register_operand) diff --git a/gcc/testsuite/gcc.target/aarch64/csinc-2.c b/gcc/testsuite/gcc.target/aarch64/csinc-2.c new file mode 100644 index 000..6ed9080 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/csinc-2.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +int +foo (int a, int b) +{ + return (a < b) ? 1 : 7; + /* { dg-final { scan-assembler csinc\tw\[0-9\].*wzr } } */ +} + +typedef long long s64; + +s64 +foo2 (s64 a, s64 b) +{ + return (a == b) ? 7 : 1; + /* { dg-final { scan-assembler csinc\tx\[0-9\].*xzr } } */ +}
RE: [PATCH][AArch64] Restrict usage of SBFIZ to valid range only
Subject: [PATCH][AArch64] Restrict usage of SBFIZ to valid range only This fixes an issue where we were generating an SBFIZ with operand 3 outside of the valid range (as determined by the size of the destination register and the amount of shift). My patch checks that the range is valid before allowing the pattern to be used. This has now had full regression testing and all is OK. OK for aarch64-trunk and aarch64-4_7-branch? Cheers, Ian 2012-10-15 Ian Bolton ian.bol...@arm.com * gcc/config/aarch64/aarch64.md (optabALLX:mode_shft_GPI:mode): Restrict based on op2. - diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e6086a9..3bfe6e6 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2311,7 +2311,7 @@ (ashift:GPI (ANY_EXTEND:GPI (match_operand:ALLX 1 register_operand r)) (match_operand 2 const_int_operand n)))] - + ALLX:sizen <= (GPI:sizen - UINTVAL (operands[2])) sbfiz\\t%GPI:w0, %GPI:w1, %2, #ALLX:sizen [(set_attr v8type bfm) (set_attr mode GPI:MODE)] New and improved version is at the end of this email. This has had full regression testing and all is OK. OK for aarch64-trunk and aarch64-4_7-branch? Cheers, Ian 2012-10-16 Ian Bolton ian.bol...@arm.com * gcc/config/aarch64/aarch64.md (optabALLX:mode_shft_GPI:mode): Restrict operands. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e6086a9..e77496f 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2311,8 +2311,13 @@ (ashift:GPI (ANY_EXTEND:GPI (match_operand:ALLX 1 register_operand r)) (match_operand 2 const_int_operand n)))] - - sbfiz\\t%GPI:w0, %GPI:w1, %2, #ALLX:sizen + UINTVAL (operands[2]) < GPI:sizen +{ + operands[3] = (ALLX:sizen <= (GPI:sizen - UINTVAL (operands[2]))) + ? GEN_INT (ALLX:sizen) + : GEN_INT (GPI:sizen - UINTVAL (operands[2])); + return sbfiz\t%GPI:w0, %GPI:w1, %2, %3; +} [(set_attr v8type bfm) (set_attr mode GPI:MODE)] )
RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access
Ok. Having dug a bit deeper I think the main problem is that you're working against yourself by not handling this pattern right from the beginning. You have split the address incorrectly to begin and are now trying to recover after the fact. The following patch seems to do the trick for me, producing (insn 6 5 7 (set (reg:DI 81) (high:DI (const:DI (plus:DI (symbol_ref:DI (arr) [flags 0x80] var_decl 0x7f9bae1105f0 arr) (const_int 12 [0xc]) z.c:8 -1 (nil)) (insn 7 6 8 (set (reg:DI 80) (lo_sum:DI (reg:DI 81) (const:DI (plus:DI (symbol_ref:DI (arr) [flags 0x80] var_decl 0x7f9bae1105f0 arr) (const_int 12 [0xc]) z.c:8 -1 (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI (arr) [flags 0x80] var_decl 0x7f9bae1105f0 arr) (const_int 12 [0xc]))) (nil))) right from the .150.expand dump. I'll leave it to you to fully regression test and commit the patch as appropriate. ;-) Thanks so much for this, Richard. I have prepared a new patch heavily based off yours, which really demands its own new email trail, so I shall make a fresh post. Cheers, Ian
[PATCH, AArch64] Handle symbol + offset more effectively
Hi all, This patch corrects what seemed to be a typo in expand_mov_immediate in aarch64.c, where we had || instead of an && in our original code. if (offset != const0_rtx && (targetm.cannot_force_const_mem (mode, imm) || (can_create_pseudo_p ()))) // <-- should have been && At any given time, this code would have treated all input the same and will have caused all non-zero offsets to have been forced to temporaries, and made us never run the code in the remainder of the function. In terms of measurable impact, this patch provides a better fix to the problem I was trying to solve with this patch: http://gcc.gnu.org/ml/gcc-patches/2012-08/msg02072.html Almost all credit should go to Richard Henderson for this patch. It is all his, but for a minor change I made to some predicates which now become relevant when we execute more of the expand_mov_immediate function. My testing showed no regressions for bare-metal or linux. OK for aarch64-branch and aarch64-4.7-branch? Cheers, Ian 2012-09-25 Richard Henderson r...@redhat.com Ian Bolton ian.bol...@arm.com * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Fix a functional typo and refactor code in switch statement. * config/aarch64/aarch64.md (add_losym): Handle symbol + offset. * config/aarch64/predicates.md (aarch64_tls_ie_symref): Match const. (aarch64_tls_le_symref): Likewise. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 2d7eba7..edeee30 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -652,43 +652,57 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) unsigned HOST_WIDE_INT val; bool subtargets; rtx subtarget; - rtx base, offset; int one_match, zero_match; gcc_assert (mode == SImode || mode == DImode); - /* If we have (const (plus symbol offset)), and that expression cannot - be forced into memory, load the symbol first and add in the offset. 
*/ - split_const (imm, base, offset); - if (offset != const0_rtx - (targetm.cannot_force_const_mem (mode, imm) - || (can_create_pseudo_p ( -{ - base = aarch64_force_temporary (dest, base); - aarch64_emit_move (dest, aarch64_add_offset (mode, NULL, base, INTVAL (offset))); - return; -} - /* Check on what type of symbol it is. */ - if (GET_CODE (base) == SYMBOL_REF || GET_CODE (base) == LABEL_REF) + if (GET_CODE (imm) == SYMBOL_REF + || GET_CODE (imm) == LABEL_REF + || GET_CODE (imm) == CONST) { - rtx mem; - switch (aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR)) + rtx mem, base, offset; + enum aarch64_symbol_type sty; + + /* If we have (const (plus symbol offset)), separate out the offset +before we start classifying the symbol. */ + split_const (imm, base, offset); + + sty = aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR); + switch (sty) { case SYMBOL_FORCE_TO_MEM: - mem = force_const_mem (mode, imm); + if (offset != const0_rtx + targetm.cannot_force_const_mem (mode, imm)) + { + gcc_assert(can_create_pseudo_p ()); + base = aarch64_force_temporary (dest, base); + base = aarch64_add_offset (mode, NULL, base, INTVAL (offset)); + aarch64_emit_move (dest, base); + return; + } + mem = force_const_mem (mode, imm); gcc_assert (mem); emit_insn (gen_rtx_SET (VOIDmode, dest, mem)); return; -case SYMBOL_SMALL_TLSGD: -case SYMBOL_SMALL_TLSDESC: -case SYMBOL_SMALL_GOTTPREL: -case SYMBOL_SMALL_TPREL: +case SYMBOL_SMALL_TLSGD: +case SYMBOL_SMALL_TLSDESC: +case SYMBOL_SMALL_GOTTPREL: case SYMBOL_SMALL_GOT: + if (offset != const0_rtx) + { + gcc_assert(can_create_pseudo_p ()); + base = aarch64_force_temporary (dest, base); + base = aarch64_add_offset (mode, NULL, base, INTVAL (offset)); + aarch64_emit_move (dest, base); + return; + } + /* FALLTHRU */ + +case SYMBOL_SMALL_TPREL: case SYMBOL_SMALL_ABSOLUTE: - aarch64_load_symref_appropriately - (dest, imm, aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR)); + aarch64_load_symref_appropriately (dest, imm, sty); return; default: 
@@ -696,7 +710,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) } } - if ((CONST_INT_P (imm) aarch64_move_imm (INTVAL (imm), mode))) + if (CONST_INT_P (imm) aarch64_move_imm (INTVAL (imm), mode)) { emit_insn (gen_rtx_SET (VOIDmode, dest, imm)); return; diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index b399ab4..3834558 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2840,7 +2840,7 @@ (lo_sum:DI
[PATCH, AArch64] Implement ctz and clrsb standard patterns
I've implemented the following standard patterns:

* clrsb
* ctz

Regression runs passed and I have added compilation tests for them, and clz as well. (Execution tests are covered by gcc/testsuite/gcc.c-torture/execute/builtin-bitops-1.c.)

OK for aarch64-branch and aarch64-4.7-branch?

Cheers, Ian

2012-09-18  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.h: Define CTZ_DEFINED_VALUE_AT_ZERO.
	* config/aarch64/aarch64.md (clrsb<mode>2): New pattern.
	* config/aarch64/aarch64.md (rbit<mode>2): New pattern.
	* config/aarch64/aarch64.md (ctz<mode>2): New pattern.

gcc/testsuite/
	* gcc.target/aarch64/clrsb.c: New test.
	* gcc.target/aarch64/clz.c: New test.
	* gcc.target/aarch64/ctz.c: New test.

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 5d121fa..abf96c5 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -703,6 +703,8 @@ do { \
 #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
   ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)

 #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, LR_REGNUM)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 33815ff..5278957 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md.
 (UNSPEC_FMAX	83)	; Used in aarch64-simd.md.
 (UNSPEC_FMIN	84)	; Used in aarch64-simd.md.
+(UNSPEC_CLS	85)	; Used in aarch64-simd.md.
+(UNSPEC_RBIT	86)	; Used in aarch64-simd.md.
  ]
)
@@ -2128,6 +2130,33 @@
   [(set_attr "v8type" "clz")
    (set_attr "mode" "<MODE>")])

+(define_insn "clrsb<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_CLS))]
+  ""
+  "cls\\t%<w>0, %<w>1"
+  [(set_attr "v8type" "clz")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "rbit<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_RBIT))]
+  ""
+  "rbit\\t%<w>0, %<w>1"
+  [(set_attr "v8type" "rbit")
+   (set_attr "mode" "<MODE>")])
+
+(define_expand "ctz<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  ""
+  {
+    emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
+    emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
+    DONE;
+  }
+)
+
 (define_insn "*and<mode>3nr_compare0"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC

diff --git a/gcc/testsuite/gcc.target/aarch64/clrsb.c b/gcc/testsuite/gcc.target/aarch64/clrsb.c
new file mode 100644
index 000..a75dfa0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/clrsb.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_clrsb(x);
+}
+
+/* { dg-final { scan-assembler "cls\tw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/clz.c b/gcc/testsuite/gcc.target/aarch64/clz.c
new file mode 100644
index 000..66e2d29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/clz.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_clz(x);
+}
+
+/* { dg-final { scan-assembler "clz\tw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ctz.c b/gcc/testsuite/gcc.target/aarch64/ctz.c
new file mode 100644
index 000..15a2473
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ctz.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_ctz(x);
+}
+
+/* { dg-final { scan-assembler "rbit\tw" } } */
+/* { dg-final { scan-assembler "clz\tw" } } */
RE: [PATCH, AArch64] Implement ctz and clrsb standard patterns
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 33815ff..5278957 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md.
 (UNSPEC_FMAX	83)	; Used in aarch64-simd.md.
 (UNSPEC_FMIN	84)	; Used in aarch64-simd.md.
+(UNSPEC_CLS	85)	; Used in aarch64-simd.md.
+(UNSPEC_RBIT	86)	; Used in aarch64-simd.md.

The comment doesn't appear to be true.

Fair point! I will fix that.
RE: [PATCH, AArch64] Implement ctz and clrsb standard patterns
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 33815ff..5278957 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md.
 (UNSPEC_FMAX	83)	; Used in aarch64-simd.md.
 (UNSPEC_FMIN	84)	; Used in aarch64-simd.md.
+(UNSPEC_CLS	85)	; Used in aarch64-simd.md.
+(UNSPEC_RBIT	86)	; Used in aarch64-simd.md.

The comment doesn't appear to be true.

Fair point! I will fix that.

New patch with comment fixed is attached. Now good to commit to aarch64-branch and aarch64-4.7-branch?

Cheers, Ian

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 5d121fa..abf96c5 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -703,6 +703,8 @@ do { \
 #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
   ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)

 #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, LR_REGNUM)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 33815ff..5278957 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md.
 (UNSPEC_FMAX	83)	; Used in aarch64-simd.md.
 (UNSPEC_FMIN	84)	; Used in aarch64-simd.md.
+(UNSPEC_CLS	85)	; Used in aarch64.md.
+(UNSPEC_RBIT	86)	; Used in aarch64.md.
  ]
)
@@ -2128,6 +2130,33 @@
   [(set_attr "v8type" "clz")
    (set_attr "mode" "<MODE>")])

+(define_insn "clrsb<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_CLS))]
+  ""
+  "cls\\t%<w>0, %<w>1"
+  [(set_attr "v8type" "clz")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "rbit<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_RBIT))]
+  ""
+  "rbit\\t%<w>0, %<w>1"
+  [(set_attr "v8type" "rbit")
+   (set_attr "mode" "<MODE>")])
+
+(define_expand "ctz<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  ""
+  {
+    emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
+    emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
+    DONE;
+  }
+)
+
 (define_insn "*and<mode>3nr_compare0"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC

diff --git a/gcc/testsuite/gcc.target/aarch64/clrsb.c b/gcc/testsuite/gcc.target/aarch64/clrsb.c
new file mode 100644
index 000..a75dfa0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/clrsb.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_clrsb(x);
+}
+
+/* { dg-final { scan-assembler "cls\tw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/clz.c b/gcc/testsuite/gcc.target/aarch64/clz.c
new file mode 100644
index 000..66e2d29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/clz.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_clz(x);
+}
+
+/* { dg-final { scan-assembler "clz\tw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ctz.c b/gcc/testsuite/gcc.target/aarch64/ctz.c
new file mode 100644
index 000..15a2473
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ctz.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_ctz(x);
+}
+
+/* { dg-final { scan-assembler "rbit\tw" } } */
+/* { dg-final { scan-assembler "clz\tw" } } */
RE: [PATCH, AArch64] Implement ctz and clrsb standard patterns
New version attached with better formatted test cases.

OK for aarch64-branch and aarch64-4.7-branch?

Cheers, Ian

2012-09-18  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.h: Define CTZ_DEFINED_VALUE_AT_ZERO.
	* config/aarch64/aarch64.md (clrsb<mode>2): New pattern.
	* config/aarch64/aarch64.md (rbit<mode>2): New pattern.
	* config/aarch64/aarch64.md (ctz<mode>2): New pattern.

gcc/testsuite/
	* gcc.target/aarch64/clrsb.c: New test.
	* gcc.target/aarch64/clz.c: New test.
	* gcc.target/aarch64/ctz.c: New test.

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 5d121fa..abf96c5 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -703,6 +703,8 @@ do { \
 #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
   ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)

 #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, LR_REGNUM)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 33815ff..5278957 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md.
 (UNSPEC_FMAX	83)	; Used in aarch64-simd.md.
 (UNSPEC_FMIN	84)	; Used in aarch64-simd.md.
+(UNSPEC_CLS	85)	; Used in aarch64.md.
+(UNSPEC_RBIT	86)	; Used in aarch64.md.
  ]
)
@@ -2128,6 +2130,33 @@
   [(set_attr "v8type" "clz")
    (set_attr "mode" "<MODE>")])

+(define_insn "clrsb<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_CLS))]
+  ""
+  "cls\\t%<w>0, %<w>1"
+  [(set_attr "v8type" "clz")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "rbit<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_RBIT))]
+  ""
+  "rbit\\t%<w>0, %<w>1"
+  [(set_attr "v8type" "rbit")
+   (set_attr "mode" "<MODE>")])
+
+(define_expand "ctz<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  ""
+  {
+    emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
+    emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
+    DONE;
+  }
+)
+
 (define_insn "*and<mode>3nr_compare0"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC

diff --git a/gcc/testsuite/gcc.target/aarch64/clrsb.c b/gcc/testsuite/gcc.target/aarch64/clrsb.c
new file mode 100644
index 000..a75dfa0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/clrsb.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest (unsigned int x)
+{
+  return __builtin_clrsb (x);
+}
+
+/* { dg-final { scan-assembler "cls\tw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/clz.c b/gcc/testsuite/gcc.target/aarch64/clz.c
new file mode 100644
index 000..66e2d29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/clz.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest (unsigned int x)
+{
+  return __builtin_clz (x);
+}
+
+/* { dg-final { scan-assembler "clz\tw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ctz.c b/gcc/testsuite/gcc.target/aarch64/ctz.c
new file mode 100644
index 000..15a2473
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ctz.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest (unsigned int x)
+{
+  return __builtin_ctz (x);
+}
+
+/* { dg-final { scan-assembler "rbit\tw" } } */
+/* { dg-final { scan-assembler "clz\tw" } } */
RE: [PATCH, AArch64] Implement fnma, fms and fnms standard patterns
OK for 4.7 as well? -Original Message- From: Richard Earnshaw Sent: 14 September 2012 18:18 To: Ian Bolton Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH, AArch64] Implement fnma, fms and fnms standard patterns On 14/09/12 18:05, Ian Bolton wrote: The following standard pattern names were implemented by simply renaming some existing patterns: * fnma * fms * fnms I have added an extra pattern for when we don't care about signed zero, so we can do -fma (a,b,c) more efficiently. Regression testing all passed. OK to commit? Cheers, Ian 2012-09-14 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.md (fmsubmode4): Renamed to fnmamode4. * config/aarch64/aarch64.md (fnmsubmode4): Renamed to fmsmode4. * config/aarch64/aarch64.md (fnmaddmode4): Renamed to fnmsmode4. * config/aarch64/aarch64.md (*fnmaddmode4): New pattern. testsuite/ * gcc.target/aarch64/fmadd.c: Added extra tests. * gcc.target/aarch64/fnmadd-fastmath.c: New test. OK. R.
RE: [PATCH, AArch64] Implement ffs standard pattern
OK for aarch64-4.7-branch as well? -Original Message- From: Richard Earnshaw Sent: 14 September 2012 18:31 To: Ian Bolton Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH, AArch64] Implement ffs standard pattern On 14/09/12 16:26, Ian Bolton wrote: I've implemented the standard pattern ffs, which leads to __builtin_ffs being generated with 4 instructions instead of 5 instructions. Regression tests and my new test pass. OK to commit? Cheers, Ian 2012-09-14 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.md (csinc3mode): Make it into a named pattern. * config/aarch64/aarch64.md (ffsmode2): New pattern. testsuite/ * gcc.target/aarch64/ffs.c: New test. OK. R.
[PATCH, AArch64] Implement ffs standard pattern
I've implemented the standard pattern ffs, which leads to __builtin_ffs being generated with 4 instructions instead of 5.

Regression tests and my new test pass. OK to commit?

Cheers, Ian

2012-09-14  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.md (csinc3<mode>): Make it into a
	named pattern.
	* config/aarch64/aarch64.md (ffs<mode>2): New pattern.

testsuite/
	* gcc.target/aarch64/ffs.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5278957..dfdba42 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2021,7 +2021,7 @@
   [(set_attr "v8type" "csel")
    (set_attr "mode" "<MODE>")])

-(define_insn "*csinc3<mode>_insn"
+(define_insn "csinc3<mode>_insn"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(if_then_else:GPI
 	  (match_operator:GPI 1 "aarch64_comparison_operator"
@@ -2157,6 +2157,21 @@
   }
 )

+(define_expand "ffs<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  ""
+  {
+    rtx ccreg = aarch64_gen_compare_reg (EQ, operands[1], const0_rtx);
+    rtx x = gen_rtx_NE (VOIDmode, ccreg, const0_rtx);
+
+    emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
+    emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
+    emit_insn (gen_csinc3<mode>_insn (operands[0], x, ccreg, operands[0], const0_rtx));
+    DONE;
+  }
+)
+
 (define_insn "*and<mode>3nr_compare0"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC

diff --git a/gcc/testsuite/gcc.target/aarch64/ffs.c b/gcc/testsuite/gcc.target/aarch64/ffs.c
new file mode 100644
index 000..a344761
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ffs.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_ffs(x);
+}
+
+/* { dg-final { scan-assembler "cmp\tw" } } */
+/* { dg-final { scan-assembler "rbit\tw" } } */
+/* { dg-final { scan-assembler "clz\tw" } } */
+/* { dg-final { scan-assembler "csinc\tw" } } */
[PATCH, AArch64] Implement fnma, fms and fnms standard patterns
The following standard pattern names were implemented by simply renaming some existing patterns:

* fnma
* fms
* fnms

I have added an extra pattern for when we don't care about signed zero, so we can do -fma (a, b, c) more efficiently.

Regression testing all passed. OK to commit?

Cheers, Ian

2012-09-14  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.md (fmsub<mode>4): Renamed to fnma<mode>4.
	* config/aarch64/aarch64.md (fnmsub<mode>4): Renamed to fms<mode>4.
	* config/aarch64/aarch64.md (fnmadd<mode>4): Renamed to fnms<mode>4.
	* config/aarch64/aarch64.md (*fnmadd<mode>4): New pattern.

testsuite/
	* gcc.target/aarch64/fmadd.c: Added extra tests.
	* gcc.target/aarch64/fnmadd-fastmath.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3fbebf7..33815ff 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2506,7 +2506,7 @@
    (set_attr "mode" "<MODE>")]
 )

-(define_insn "*fmsub<mode>4"
+(define_insn "fnma<mode>4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
 	(fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w"))
 		 (match_operand:GPF 2 "register_operand" "w")
@@ -2517,7 +2517,7 @@
    (set_attr "mode" "<MODE>")]
 )

-(define_insn "*fnmsub<mode>4"
+(define_insn "fms<mode>4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
 	(fma:GPF (match_operand:GPF 1 "register_operand" "w")
 		 (match_operand:GPF 2 "register_operand" "w")
@@ -2528,7 +2528,7 @@
    (set_attr "mode" "<MODE>")]
 )

-(define_insn "*fnmadd<mode>4"
+(define_insn "fnms<mode>4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
 	(fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w"))
 		 (match_operand:GPF 2 "register_operand" "w")
@@ -2539,6 +2539,18 @@
    (set_attr "mode" "<MODE>")]
 )

+;; If signed zeros are ignored, -(a * b + c) = -a * b - c.
+(define_insn "*fnmadd<mode>4"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+	(neg:GPF (fma:GPF (match_operand:GPF 1 "register_operand" "w")
+			  (match_operand:GPF 2 "register_operand" "w")
+			  (match_operand:GPF 3 "register_operand" "w"))))]
+  "!HONOR_SIGNED_ZEROS (<MODE>mode) && TARGET_FLOAT"
+  "fnmadd\\t%<s>0, %<s>1, %<s>2, %<s>3"
+  [(set_attr "v8type" "fmadd")
+   (set_attr "mode" "<MODE>")]
+)
+
 ;; -------------------------------------------------------------------
 ;; Floating-point conversions
 ;; -------------------------------------------------------------------

diff --git a/gcc/testsuite/gcc.target/aarch64/fmadd.c b/gcc/testsuite/gcc.target/aarch64/fmadd.c
index 3b4..39975db 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmadd.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmadd.c
@@ -4,15 +4,52 @@
 extern double fma (double, double, double);
 extern float fmaf (float, float, float);

-double test1 (double x, double y, double z)
+double test_fma1 (double x, double y, double z)
 {
   return fma (x, y, z);
 }

-float test2 (float x, float y, float z)
+float test_fma2 (float x, float y, float z)
 {
   return fmaf (x, y, z);
 }

+double test_fnma1 (double x, double y, double z)
+{
+  return fma (-x, y, z);
+}
+
+float test_fnma2 (float x, float y, float z)
+{
+  return fmaf (-x, y, z);
+}
+
+double test_fms1 (double x, double y, double z)
+{
+  return fma (x, y, -z);
+}
+
+float test_fms2 (float x, float y, float z)
+{
+  return fmaf (x, y, -z);
+}
+
+double test_fnms1 (double x, double y, double z)
+{
+  return fma (-x, y, -z);
+}
+
+float test_fnms2 (float x, float y, float z)
+{
+  return fmaf (-x, y, -z);
+}
+
 /* { dg-final { scan-assembler-times "fmadd\td\[0-9\]" 1 } } */
 /* { dg-final { scan-assembler-times "fmadd\ts\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fmsub\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fmsub\ts\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmsub\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmsub\ts\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmadd\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmadd\ts\[0-9\]" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c b/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c
new file mode 100644
index 000..9c115df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math" } */
+
+extern double fma (double, double, double);
+extern float fmaf (float, float, float);
+
+double test_fma1 (double x, double y, double z)
+{
+  return - fma (x, y, z);
+}
+
+float test_fma2 (float x, float y, float z)
+{
+  return - fmaf (x, y, z);
+}
+
+/* { dg-final { scan-assembler-times "fnmadd\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmadd\ts\[0-9\]" 1 } } */
RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access
Can you send me the test case you were looking at for this?

See attached. (Most of it is superfluous, but the point is that we are not using the address to do a memory access.)

Cheers, Ian

[Attachment: constant-test1.c]
RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access
On 2012-08-31 07:49, Ian Bolton wrote:
 +(define_split
 +  [(set (match_operand:DI 0 "register_operand" "=r")
 +	(const:DI (plus:DI (match_operand:DI 1 "aarch64_valid_symref" "S")
 +			   (match_operand:DI 2 "const_int_operand" "i"))))]
 +  ""
 +  [(set (match_dup 0) (high:DI (const:DI (plus:DI (match_dup 1)
 +						   (match_dup 2)))))
 +   (set (match_dup 0) (lo_sum:DI (match_dup 0)
 +				 (const:DI (plus:DI (match_dup 1)
 +						    (match_dup 2)))))]
 +  ""
 +)

You ought not need this as a separate split, since (CONST ...) should be handled exactly like (SYMBOL_REF).

I see in combine.c that it does get done for a MEM (which is how my earlier patch worked), but not when it's being used for other reasons (hence the title of this email). See below for current code from find_split_point:

    case MEM:
#ifdef HAVE_lo_sum
      /* If we have (mem (const ..)) or (mem (symbol_ref ...)), split it
	 using LO_SUM and HIGH.  */
      if (GET_CODE (XEXP (x, 0)) == CONST
	  || GET_CODE (XEXP (x, 0)) == SYMBOL_REF)
	{
	  enum machine_mode address_mode
	    = targetm.addr_space.address_mode (MEM_ADDR_SPACE (x));

	  SUBST (XEXP (x, 0),
		 gen_rtx_LO_SUM (address_mode,
				 gen_rtx_HIGH (address_mode, XEXP (x, 0)),
				 XEXP (x, 0)));
	  return &XEXP (XEXP (x, 0), 0);
	}
#endif

If I don't use my split pattern, I could alter combine to remove the requirement that the parent is a MEM. What do you think?

Also note that constraints ("=r" etc.) aren't used for splits. If I keep the pattern, I will remove the constraints. Thanks for the pointers in this regard.

Cheers, Ian
RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access
From: Richard Henderson [mailto:r...@redhat.com]
On 09/06/2012 08:06 AM, Ian Bolton wrote:
 If I don't use my split pattern, I could alter combine to remove the requirement that the parent is a MEM. What do you think?

I merely question the calling out of CONST as special. Either you've got some pattern that handles SYMBOL_REF the same way, or you're missing something.

Oh, I understand now. Thanks for clarifying.

Some digging has shown me that the transformation keys off the equivalence, as highlighted below. It's always phrased in terms of a const and never a symbol_ref.

after ud_dce:

  6 r82:DI=high(`arr')
  7 r81:DI=r82:DI+low(`arr')
      REG_DEAD: r82:DI
      REG_EQUAL: `arr'
  8 r80:DI=r81:DI+0xc
      REG_DEAD: r81:DI
      REG_EQUAL: const(`arr'+0xc)   <- this equivalence

after combine:

  7 r80:DI=high(const(`arr'+0xc))
  8 r80:DI=r80:DI+low(const(`arr'+0xc))
      REG_EQUAL: const(`arr'+0xc)   <- this equivalence

Based on that, and assuming I remove the constraints on the pattern, would you say the patch is worthy of commit?

Thanks, Ian
[PATCH, AArch64] Allow symbol+offset even if not being used for memory access
Hi,

This patch builds on a previous one that allowed symbol+offset as symbol references for memory accesses. It allows us to have symbol+offset even when no memory access is apparent. It reduces codesize for cases such as this one:

int arr[100];

uint64_t foo (uint64_t a)
{
  uint64_t const z = 1234567ll << 32 + 7;
  uint64_t const y = (uint64_t) &arr[3];
  return y + a + z;
}

Before the patch, the code looked like this:

	adrp	x2, arr
	mov	x1, 74217034874880
	add	x2, x2, :lo12:arr
	add	x2, x2, 12
	movk	x1, 2411, lsl 48
	add	x1, x2, x1
	add	x0, x1, x0
	ret

Now, it looks like this:

	adrp	x1, arr+12
	mov	x2, 74217034874880
	movk	x2, 2411, lsl 48
	add	x1, x1, :lo12:arr+12
	add	x1, x1, x2
	add	x0, x1, x0
	ret

Testing shows no regressions. OK to commit?

2012-08-31  Ian Bolton  ian.bol...@arm.com

	* gcc/config/aarch64/aarch64.md: New pattern.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a00d3f0..de9c927 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2795,7 +2795,7 @@
 	 (lo_sum:DI (match_operand:DI 1 "register_operand" "r")
 		    (match_operand 2 "aarch64_valid_symref" "S")))]
   ""
-  "add\\t%0, %1, :lo12:%2"
+  "add\\t%0, %1, :lo12:%a2"
   [(set_attr "v8type" "alu")
    (set_attr "mode" "DI")]
 )
@@ -2890,6 +2890,20 @@
   [(set_attr "length" "0")]
 )

+(define_split
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(const:DI (plus:DI (match_operand:DI 1 "aarch64_valid_symref" "S")
+			   (match_operand:DI 2 "const_int_operand" "i"))))]
+  ""
+  [(set (match_dup 0) (high:DI (const:DI (plus:DI (match_dup 1)
+						  (match_dup 2)))))
+   (set (match_dup 0) (lo_sum:DI (match_dup 0)
+				 (const:DI (plus:DI (match_dup 1)
+						    (match_dup 2)))))]
+  ""
+)
+
+
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
[PATCH, AArch64] Allow symbol+offset as symbolic constant expression
Hi,

This patch reduces codesize for cases such as this one:

int arr[100];

int foo ()
{
  return arr[10];
}

Before the patch, the code looked like this:

	adrp	x0, arr
	add	x0, x0, :lo12:arr
	ldr	w0, [x0,40]

Now, it looks like this:

	adrp	x0, arr+40
	ldr	w0, [x0,#:lo12:arr+40]

Some workloads have seen up to 1K reduction in code size. OK to commit?

Cheers, Ian

2012-07-06  Ian Bolton  ian.bol...@arm.com

	* gcc/config/aarch64/aarch64.c (aarch64_print_operand): Use
	aarch64_classify_symbolic_expression for classifying operands.
	* gcc/config/aarch64/aarch64.c
	(aarch64_classify_symbolic_expression): New function.
	* gcc/config/aarch64/aarch64.c (aarch64_symbolic_constant_p):
	New function.
	* gcc/config/aarch64/predicates.md (aarch64_valid_symref): Symbol
	with constant offset is a valid symbol reference.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 542c1e0..53c238a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2820,6 +2820,17 @@ aarch64_symbolic_address_p (rtx x)
   return GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF;
 }

+/* Classify the base of symbolic expression X, given that X appears in
+   context CONTEXT.  */
+static enum aarch64_symbol_type
+aarch64_classify_symbolic_expression (rtx x,
+				      enum aarch64_symbol_context context)
+{
+  rtx offset;
+  split_const (x, &x, &offset);
+  return aarch64_classify_symbol (x, context);
+}
+
+
 /* Return TRUE if X is a legitimate address for accessing memory in
    mode MODE.  */
 static bool
@@ -3227,7 +3238,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
       if (GET_CODE (x) == HIGH)
 	x = XEXP (x, 0);

-      switch (aarch64_classify_symbol (x, SYMBOL_CONTEXT_ADR))
+      switch (aarch64_classify_symbolic_expression (x, SYMBOL_CONTEXT_ADR))
 	{
 	case SYMBOL_SMALL_GOT:
 	  asm_fprintf (asm_out_file, ":got:");
@@ -3256,7 +3267,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
       break;

     case 'L':
-      switch (aarch64_classify_symbol (x, SYMBOL_CONTEXT_ADR))
+      switch (aarch64_classify_symbolic_expression (x, SYMBOL_CONTEXT_ADR))
 	{
 	case SYMBOL_SMALL_GOT:
 	  asm_fprintf (asm_out_file, ":lo12:");
@@ -3285,7 +3296,8 @@ aarch64_print_operand (FILE *f, rtx x, char code)
       break;

     case 'G':
+
-      switch (aarch64_classify_symbol (x, SYMBOL_CONTEXT_ADR))
+      switch (aarch64_classify_symbolic_expression (x, SYMBOL_CONTEXT_ADR))
 	{
 	case SYMBOL_SMALL_TPREL:
 	  asm_fprintf (asm_out_file, ":tprel_hi12:");
@@ -4746,6 +4758,8 @@ aarch64_classify_tls_symbol (rtx x)
     }
 }

+/* Return the method that should be used to access SYMBOL_REF or
+   LABEL_REF X in context CONTEXT.  */
 enum aarch64_symbol_type
 aarch64_classify_symbol (rtx x,
 			 enum aarch64_symbol_context context ATTRIBUTE_UNUSED)
@@ -4817,7 +4831,23 @@ aarch64_classify_symbol (rtx x,
   return SYMBOL_FORCE_TO_MEM;
 }

+/* Return true if X is a symbolic constant that can be used in context
+   CONTEXT.  If it is, store the type of the symbol in *SYMBOL_TYPE.  */
+
+bool
+aarch64_symbolic_constant_p (rtx x, enum aarch64_symbol_context context,
+			     enum aarch64_symbol_type *symbol_type)
+{
+  rtx offset;
+
+  split_const (x, &x, &offset);
+  if (GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF)
+    *symbol_type = aarch64_classify_symbol (x, context);
+  else
+    return false;
+
+  /* No checking of offset at this point.  */
+  return true;
+}

 bool
 aarch64_constant_address_p (rtx x)

diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 7089e8b..328e5cf 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -114,8 +114,12 @@
   (match_test "mode == DImode && CONSTANT_ADDRESS_P (op)"))

 (define_predicate "aarch64_valid_symref"
-  (and (match_code "symbol_ref, label_ref")
-       (match_test "aarch64_classify_symbol (op, SYMBOL_CONTEXT_ADR)
-		    != SYMBOL_FORCE_TO_MEM")))
+  (match_code "const, symbol_ref, label_ref")
+{
+  enum aarch64_symbol_type symbol_type;
+  return (aarch64_symbolic_constant_p (op, SYMBOL_CONTEXT_ADR, &symbol_type)
+	  && symbol_type != SYMBOL_FORCE_TO_MEM);
+})

 (define_predicate "aarch64_tls_ie_symref"
   (match_code "symbol_ref, label_ref")
Using -save-temps and @file should also save the intermediate @file used by the driver?
Does anyone have some thoughts they'd like to share on this: When you compile anything using @file support, the driver assumes @file (at_file_supplied is true) is allowed and may pass options to the linker via @file using a *temporary* file. When -save-temps is also used, the temporary @file passed to the linker should also be saved. Saving the temporary @file passed to the linker allows a developer to re-run just the collect2/ld command. On trunk this means that gcc/gcc.c (create_at_file) should honour the save_temps_flag, saving the temporary @file for later analysis or use. From: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44273
RE: Move cgraph_node_set and varpool_node_set out of GGC and make them use pointer_map
Hi,

I always considered cgraph_node_set/varpool_node_set to be overengineered, but they also turned out to be quite ineffective, since we do quite a lot of queries into them during streaming out. This patch moves them to pointer_map, like I did for the streamer cache. While doing so I needed to get the structure out of GGC memory, since pointer_map is not GGC friendly. That is not a problem at all, because the sets can only point to live cgraph/varpool entries anyway; pointing to removed ones would lead to spectacular failures in any case.

Bootstrapped/regtested x86_64-linux, OK?

Honza

	* cgraph.h (cgraph_node_set_def, varpool_node_set_def): Move out of
	GTY; replace hash by pointer map.
	(cgraph_node_set_element_def, cgraph_node_set_element,
	const_cgraph_node_set_element, varpool_node_set_element_def,
	varpool_node_set_element, const_varpool_node_set_element): Remove.
	(free_cgraph_node_set, free_varpool_node_set): New function.
	(cgraph_node_set_size, varpool_node_set_size): Use vector size.
	* tree-emutls.c: Free varpool node set.
	* ipa-utils.c (cgraph_node_set_new, cgraph_node_set_add,
	cgraph_node_set_remove, cgraph_node_set_find, dump_cgraph_node_set,
	debug_cgraph_node_set, free_cgraph_node_set, varpool_node_set_new,
	varpool_node_set_add, varpool_node_set_remove, varpool_node_set_find,
	dump_varpool_node_set, free_varpool_node_set,
	debug_varpool_node_set): Move here from ipa.c; implement using
	pointer_map.
	* ipa.c (cgraph_node_set_new, cgraph_node_set_add,
	cgraph_node_set_remove, cgraph_node_set_find, dump_cgraph_node_set,
	debug_cgraph_node_set, varpool_node_set_new, varpool_node_set_add,
	varpool_node_set_remove, varpool_node_set_find,
	dump_varpool_node_set, debug_varpool_node_set): Move to ipa-utils.c.
	* lto/lto.c (ltrans_partition_def): Remove GTY annotations.
	(ltrans_partitions): Move to heap.
	(new_partition): Update.
	(free_ltrans_partitions): New function.
	(lto_wpa_write_files): Use it.
	* passes.c (ipa_write_summaries): Update.
This causes cross and native build of ARM Linux toolchain to fail: gcc -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -W -Wall -Wwrite- strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wmissing- format-attribute -Wold-style-definition -Wc++-compat -fno-common - DHAVE_CONFIG_H -I. -Ilto - I/work/source/gcc - I/work/source/gcc/lto - I/work/source/gcc/../include - I/work/source/gcc/../libcpp/include - I/work/source/gcc/../libdecnumber - I/work/source/gcc/../libdecnumber/dpd -I../libdecnumber /work/source/gcc/lto/lto.c -o lto/lto.o /work/source/gcc/lto/lto.c:1163: warning: function declaration isn't a prototype /work/source/gcc/lto/lto.c: In function 'free_ltrans_partitions': /work/source/gcc/lto/lto.c:1163: warning: old-style function definition /work/source/gcc/lto/lto.c:1168: error: 'struct ltrans_partition_def' has no member named 'cgraph' /work/source/gcc/lto/lto.c:1168: error: 'set' undeclared (first use in this function) /work/source/gcc/lto/lto.c:1168: error: (Each undeclared identifier is reported only once /work/source/gcc/lto/lto.c:1168: error: for each function it appears in.) /work/source/gcc/lto/lto.c:1171: warning: implicit declaration of function 'VEC_latrans_partition_heap_free' make[2]: *** [lto/lto.o] Error 1 make[2]: *** Waiting for unfinished jobs rm gcov.pod gfdl.pod cpp.pod fsf-funding.pod gcc.pod make[2]: Leaving directory `/work/cross-build/trunk-r173334- thumb/arm-none-linux-gnueabi/obj/gcc1/gcc' make[1]: *** [all-gcc] Error 2 make[1]: Leaving directory `/work/cross-build/trunk-r173334- thumb/arm-none-linux-gnueabi/obj/gcc1' make: *** [all] Error 2 + exit But I see you fixed it up soon after (r173336), so no action is required now, but I figured it was worth letting people know anyway. Cheers, Ian
RE: Link error
Phung Nguyen wrote:
 I am trying to build a cross compiler for xc16x. I built binutils 2.18, gcc 4.0 and newlib 1.18 successfully. Everything is fine when compiling a simple C file without any library call. It is also fine when making a simple call to printf like printf("Hello world"). However, I got an error message from the linker when calling printf("i=%i", i);

I don't know the answer, but I think you are more likely to get one if you post to gcc-h...@gcc.gnu.org. The gcc@gcc.gnu.org list is for people developing gcc, rather than those only building or using it.

I hope you find your answer soon.

Best regards, Ian
Question about tree-switch-conversion.c
I am in the process of fixing PR44328 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44328).

The problem is that gen_inbound_check in tree-switch-conversion.c subtracts info.range_min from info.index_expr, which can cause the MIN and MAX values for info.index_expr to become invalid. For example:

typedef enum { FIRST = 0, SECOND, THIRD, FOURTH } ExampleEnum;

int dummy (const ExampleEnum e)
{
  int mode = 0;
  switch (e)
  {
    case SECOND: mode = 20; break;
    case THIRD: mode = 30; break;
    case FOURTH: mode = 40; break;
  }
  return mode;
}

tree-switch-conversion would like to create a lookup table for this, so that SECOND maps to entry 0, THIRD maps to entry 1 and FOURTH maps to entry 2. It achieves this by subtracting SECOND from index_expr. The problem is that, after the subtraction, the result can have a value outside the type's 0-3 range.

Later, when tree-vrp.c sees the inbound check as being <= 2, with a possible range for the type of 0-3, it converts the <= 2 into a != 3, which is totally wrong. If e == FIRST, we can end up looking for entry 255 in the lookup table!

I think the solution is to update the type of the result of the subtraction to show that it is no longer in the range 0-3, but I have had trouble implementing this. The attached patch (based off the 4.5 branch) shows my current approach, but I ran into LTO issues:

lto1: internal compiler error: in lto_get_pickled_tree, at lto-streamer-in.c

I am guessing this is because the debug info for the type does not match the new range I have set for it. Is there a *right* way to update the range such that LTO doesn't get unhappy? (Maybe a cast with fold_convert_loc would be right?)

[Attachment: pr44328.gcc4.5.fix.patch]
Making BB Reorder Smarter for -Os
Bugzilla 41004 calls for a more -Os-friendly algorithm for BB Reorder, and I'm hoping I can meet this challenge. But I need your help! If you have any existing ideas or thoughts that could help me get closer to a sensible heuristic sooner, then please post them to this list. In the meantime, here's my progress so far.

My first thought was to limit BB Reorder to hot functions, as identified by gcov, so we could get maximum execution-time benefit for minimised code size impact. Based on benchmarking I've done so far, this looks to be a winner, but it only works when profile data is available. Without profile data, we face two main problems:

1) We cannot easily estimate how many times we will execute our more efficient code layout, so we can't do an accurate trade-off versus the code size increase.

2) The traces that BB Reorder constructs will be based on static branch prediction, rather than true dynamic flows of execution, so the new layout may not be the best one in practice.

We can address #1 by tagging functions as hot (using attributes), but that may not always be possible and it does not guarantee that we will get minimal code size increases, which is the main aim of this work. I'm not sure how #2 can be addressed, so I'm planning to sidestep it completely, since the problem isn't really the performance pay-off but the code size increase that usually comes with each new layout of a function that BB Reorder makes.

My current plan is to characterise a function within find_traces() (looking at things like the number of traces, edge probabilities and frequencies, etc.) and only call connect_traces() to effect the reordering change if these characteristics suggest that minimal code disruption and/or maximum performance pay-off will result. Thanks for reading and I look forward to your input! Cheers, Ian
RE: BB reorder forced off for -Os
We're not able to enable BB reordering with -Os. The behaviour is hard-coded via this if statement in rest_of_handle_reorder_blocks():

  if ((flag_reorder_blocks || flag_reorder_blocks_and_partition)
      /* Don't reorder blocks when optimizing for size because extra
         jump insns may be created; also barrier may create extra padding.

         More correctly we should have a block reordering mode that tried
         to minimize the combined size of all the jumps.  This would more
         or less automatically remove extra jumps, but would also try to
         use more short jumps instead of long jumps.  */
      && optimize_function_for_speed_p (cfun))
    {
      reorder_basic_blocks ();

If you comment out the optimize_function_for_speed_p (cfun) condition then BB reordering takes place as desired (although this obviously isn't a solution). In a private message Ian indicated that this had a small code size impact for the ISA he's working with but a significant performance gain. I tried the same thing with the ISA I work on (Ubicom32) and this change typically increased code sizes by between 0.1% and 0.3% but improved performance by anything from 0.8% to 3%, so on balance this is definitely a win for most of our users (this was for a couple of benchmarks, the Linux kernel, busybox and smbd).

It should be noted that commenting out the conditional to do with optimising for speed will make BB reordering come on for all functions, even cold ones, so I think whatever gains have come from making this hacky change could increase further if BB reordering is set to only come on for hot functions when compiling with -Os. (Certainly the code size increases could be minimised, whilst hopefully retaining the performance gains.) Note that I am in no way suggesting this should be the default behaviour for -Os, but that it should be switchable via the flags just like other optimisations are.
But, once it is switchable, I expect choosing to turn it on for -Os should not cause universal enabling of BB reordering for every function (as opposed to the current universal disabling of BB reordering for every function), but a sensible half-way point, based on heat, so that you get the performance wins with minimal code size increases on selected functions. Cheers, Ian
BB reorder forced off for -Os
Is there any reason why BB reorder has been disabled in bb-reorder.c for -Os, such that you can't even turn it on with -freorder-blocks? From what I've heard on this list in recent days, BB reorder gives good performance wins such that most people would still want it on even if it did increase code size a little. Cheers, Ian
RE: Understanding Scheduling
Enabling BB-reorder only if profile info is available is not the right way to go. The compiler really doesn't place blocks in sane places without it -- and it shouldn't have to, either. For example, if you split an edge at some point, the last thing you want to worry about is where the new basic block is going to end up. There are actually a few bugs in Bugzilla about BB-reorder, FWIW.

I've done a few searches in Bugzilla and am not sure if I have found the BB reorder bugs you are referring to. The ones I have found are:

  16797: Opportunity to remove unnecessary load instructions
  41396: missed space optimization related to basic block reorder
  21002: RTL prologue and basic-block reordering pessimizes delay-slot filling

(If you can recall any others, I'd appreciate hearing of them.) Based on 41396, it looks like BB reorder is disabled for -Os. But you said in your post above that the compiler really doesn't place blocks in sane places without it, so does that mean that we could probably increase performance for -Os if BB reorder were improved and enabled for -Os? Cheers, Ian
Understanding Scheduling
Hi folks! I've moved on from register allocation (see the Understanding IRA thread) and onto scheduling. In particular, I am investigating the effectiveness of the sched1 pass on our architecture and the associated interblock-scheduling optimisation.

Let's start with sched1 ... For our architecture at least, it seems like Richard Earnshaw is right that sched1 is generally bad when you are using -Os, because it can increase register pressure and cause extra spill/fill code when you move independent instructions in between dependent instructions. For example:

  LOAD c2,c1[0]
  LOAD c3,c1[1]
  ADD  c2,c2,c3  # depends on LOAD above it (might stall)
  LOAD c3,c1[2]
  ADD  c2,c2,c3  # depends on LOAD above it (might stall)
  LOAD c3,c1[3]
  ADD  c2,c2,c3  # depends on LOAD above it (might stall)
  LOAD c3,c1[4]
  ADD  c2,c2,c3  # depends on LOAD above it (might stall)

might become:

  LOAD c2,c1[0]
  LOAD c3,c1[1]
  LOAD c4,c1[2]  # independent of first two LOADs
  LOAD c5,c1[3]  # independent of first two LOADs
  ADD  c2,c2,c3  # not dependent on preceding two insns (avoids stall)
  LOAD c3,c1[4]
  ADD  c2,c2,c4  # not dependent on preceding three insns (avoids stall)
  ...

This is a nice effect if your LOAD instructions have a latency of 3, so this should lead to performance increases, and indeed this is what I see for some low-reg-pressure Nullstone cases. Turning sched1 off therefore causes a regression on those cases. However, this pipeline-type effect may increase your register pressure such that caller-save regs are required and extra spill/fill code needs to be generated. This happens for some other Nullstone cases, and so it is good to have sched1 turned off for them! It's therefore looking like some kind of clever hybrid is required. I mention all this because I was wondering which other architectures have turned off sched1 for -Os?
More importantly, I was wondering if anyone else had considered creating some kind of clever hybrid that only uses sched1 when it will increase performance without increasing register pressure? Or perhaps I could make a heuristic based on the balanced-ness of the tree? (I see sched1 does a lot better if the tree is balanced, since it has more options to play with.)

Now onto interblock-scheduling ... As we all know, you can't have interblock-scheduling enabled unless you use the sched1 pass, so if sched1 is off then interblock is irrelevant. For now, let's assume we are going to make some clever hybrid that allows sched1 when we think it will increase performance for -Os, and we are going to keep sched1 on for -O2 and -O3.

As I understand it, interblock-scheduling enlarges the scope of sched1, such that you can insert independent insns from a completely different block in between dependent insns in this block. As well as potentially amortizing stalls on high-latency insns, we also get the chance to do meatier work in the destination block and leave less to do in the source block. I don't know if this is a deliberate effect of interblock-scheduling or if it is just a happy side-effect.

Anyway, the reason I mention interblock-scheduling is that I see it doing seemingly intelligent moves, but then the later BB-reorder pass is juggling blocks around such that we end up with extra code inside hot loops! I assume this is because the scheduler and BB-reorderer are largely ignorant of each other, and so good intentions on the part of the former can be scuppered by the latter. I was wondering if anyone else has witnessed this madness on their architecture? Maybe it is a bug with BB-reorder? Or maybe it should only be enabled when function profiling information (e.g. gcov) is available? Or maybe it is not a high-priority thing for anyone to think about because no one uses interblock-scheduling? If anyone can shed some light on the above, I'd greatly appreciate it.
For now, I will continue my experiments with selective enabling of sched1 for -Os. Best regards, Ian
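For concreteness, the LOAD/ADD chain discussed earlier in this thread corresponds to source along these lines (a sketch for illustration; sum5 and the pointer argument are invented names, not code from the thread):

```c
/* A reduction over five loads: a naive expansion produces the serial
   LOAD/ADD chain shown earlier, while sched1 can hoist the independent
   loads above the dependent adds to hide load latency -- at the cost
   of keeping more loaded values live at once (c4, c5 in the example),
   i.e. higher register pressure.  */
int sum5 (const int *c1)
{
  return c1[0] + c1[1] + c1[2] + c1[3] + c1[4];
}
```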
RE: Understanding Scheduling
Let's start with sched1 ... For our architecture at least, it seems like Richard Earnshaw is right that sched1 is generally bad when you are using -Os, because it can increase register pressure and cause extra spill/fill code when you move independent instructions in between dependent instructions. Please note that Vladimir Makarov implemented register pressure-aware sched1 for GCC 4.5, activated with -fsched-pressure. I thought I should mention this because your e-mail omits it completely, so it's hard to tell whether you tested it. Hi Alexander, We are on GCC 4.4 at the moment. I don't see us moving up in the near future, but I could apply the patch and see what it does for us. Cheers, Ian
RE: Understanding Scheduling
I mention all this because I was wondering which other architectures have turned off sched1 for -Os? More importantly, I was wondering if anyone else had considered creating some kind of clever hybrid that only uses sched1 when it will increase performance without increasing register pressure?

http://gcc.gnu.org/ml/gcc-patches/2009-09/msg3.html

Another problem is that sched1 for architectures with few registers can result in reload failure. I tried to fix this in the patch mentioned above, but I am not sure it is done for all targets and all possible programs. The right solution for this would be implementing hard register spills in the reload.

I don't think we have so few registers that reload failure will occur, so it might be worth me trying this.

The code mentioned above does not work for RA based on priority coloring, because register pressure calculation for intersected or nested classes makes little sense.

Hmm. Thanks for mentioning that. As you might recall, we are using priority coloring at the moment because it yielded better performance than Chaitin-Briggs. Well, the real reason CB was rejected was that we were already using Priority, and so moving to CB would cause sufficient disruption that performance increases and decreases would be inevitable. I did get some gains, but I also got regressions that we couldn't absorb at the time I did the work. I might be revisiting CB soon though, as it did tend to yield smaller code, which is becoming more important to us.

If scheduling for the target is very important (as for Itanium or in-order execution POWER6), I'd recommend looking at the selective scheduler.

I don't think scheduling is highly important for us, but I will take a look at the selective scheduler.

Or perhaps I could make a heuristic based on the balanced-ness of the tree? (I see sched1 does a lot better if the tree is balanced, since it has more options to play with.)

The register pressure is already mostly minimized when sched1 starts to work.
I guess this is a factor in the unbalancedness of the tree. The more you balance it, the more likely it will get wider and require more registers. But the wider and more balanced, the more options for sched1 and the more chance of a performance win (assuming the increase in reg pressure does not outweigh the scheduling performance win.) Now onto interblock-scheduling ... As we all know, you can't have interblock-scheduling enabled unless you use the sched1 pass, so if sched1 is off then interblock is irrelevant. For now, let's assume we are going to make some clever hybrid that allows sched1 when we think it will increase performance for Os and we are going to keep sched1 on for O2 and O3. As I understand it, interblock-scheduling enlarges the scope of sched1, such that you can insert independent insns from a completely different block in between dependent insns in this block. As well as potentially amortizing stalls on high latency insns, we also get the chance to do meatier work in the destination block and leave less to do in the source block. I don't know if this is a deliberate effect of interblock-scheduling or if it is just a happy side-effect. Anyway, the reason I mention interblock-scheduling is that I see it doing seemingly intelligent moves, but then the later BB-reorder pass is juggling blocks around such that we end up with extra code inside hot loops! I assume this is because the scheduler and BB-reorderer are largely ignorant of each other, and so good intentions on the part of the former can be scuppered by the latter. That is right. It would be nice if somebody solves the problem. Hmm. If we keep sched1 on, then maybe I will be the man to do it! Best regards, Ian
RE: Coloring problem - Pass 0 for finding allocno costs
The problem I see is that for registers 100,101 I get best register class D instead of R - actually they get the same cost and D is chosen (maybe because it is first). Hi Frank. Do D and R overlap? It would be useful to know which regs are in which class, before trying to understand what is going on. Can you paste an example of your define_insn from your MD file to show how operands from D or R are both valid? I ask this because it is possible to express that D is more expensive than R with operand constraints. For general IRA info, you might like to look over my long thread on here called Understanding IRA. Cheers, Ian
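To illustrate the constraint-based approach mentioned above, here is a made-up define_insn for a hypothetical target (not Frank's actual port): GCC's operand-constraint syntax lets you mark an alternative as disfavoured with the ? modifier, so the allocator prefers class R over class D even though both alternatives are valid:

```
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=R,?D")
        (plus:SI (match_operand:SI 1 "register_operand" "R,D")
                 (match_operand:SI 2 "register_operand" "R,D")))]
  ""
  "add %0,%1,%2")
```

Each ? adds a small cost to its alternative during register-class preferencing, which is one way to make D effectively more expensive than R without touching the target's cost hooks.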