Re: [PATCH][MIPS] Enable load-load/store-store bonding

2015-03-30 Thread sameera

Hi!

Sorry for the delay in sending this patch for review.
Please find attached updated patch.

In the P5600, two consecutive loads/stores of the same type which access contiguous memory locations are bonded together by the instruction issue unit to dispatch a single load/store instruction which accesses both locations. This allows a 2X improvement in memory-intensive code. This optimization can be performed for the LH, SH, LW, SW, LWC, SWC, LDC and SDC instructions.
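For illustration, here is the kind of access pattern the issue unit can bond: two same-type loads off a common base, at offsets that differ by exactly the access size. The function name below is invented; whether bonding actually occurs depends on the assembly GCC emits.

```c
#include <stdint.h>

/* Two halfword loads from contiguous addresses (offsets differing by
   sizeof (int16_t)) with the same base register -- a candidate for the
   P5600 to bond into a single dual-issue memory access (LH + LH).  */
int32_t
sum_adjacent_halfwords (const int16_t *base)
{
  int16_t lo = base[0];   /* lh  rt1, 0(base)  */
  int16_t hi = base[1];   /* lh  rt2, 2(base)  */
  return lo + hi;
}
```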


This patch adds peephole2 patterns to identify such loads/stores and put them in parallel, so that the scheduler will not split them, thereby guaranteeing h/w-level load/store bonding.


The patch is tested with dejagnu for correctness, and tested on hardware for 
performance.
Ok for trunk?

Changelog:
gcc/
* config/mips/mips.md (JOIN_MODE): New mode iterator.
(join2_load_store): New pattern.
(join2_loadhi): Likewise.
(define_peephole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
load-load and store-stores.
* config/mips/mips.opt (mload-store-pairs): New option.
(TARGET_LOAD_STORE_PAIRS): New macro.
* config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
* config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
* config/mips/mips.c (mips_load_store_bonding_p): New function.

- Thanks and regards,
  Sameera D.

On Tuesday 24 June 2014 04:12 PM, Sameera Deshpande wrote:

Hi Richard,

Thanks for the review.
Please find attached updated patch after your review comments.

Changelog:
gcc/
* config/mips/mips.md (JOIN_MODE): New mode iterator.
(join2_load_store): New pattern.
(join2_loadhi): Likewise.
(define_peephole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
load-load and store-stores.
* config/mips/mips.opt (mload-store-pairs): New option.
(TARGET_LOAD_STORE_PAIRS): New macro.
* config/mips/mips.h (ENABLE_P5600_LD_ST_PAIRS): Likewise.
* config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
* config/mips/mips.c (mips_load_store_bonding_p): New function.

The change is tested with dejagnu with additional options -mload-store-pairs 
and -mtune=p5600.
The perf measurement is yet to finish.


We had an offline discussion based on your comment. There is an additional
view on the same.
Only the mips32r2, mips32r3 and mips32r5 ISAs support the P5600; the
remaining ISAs do not.
For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is
implemented separately, and hence, as you suggested, the P5600 LD-ST
bonding optimization should not be enabled for them.
So, is it fine if I emit an error for any ISA other than mips32r2,
mips32r3 and mips32r5 when P5600 is enabled, or should compilation
continue by emitting a warning and disabling P5600?


No, the point is that we have two separate concepts: ISA and optimisation
target.  -mipsN and -march=N control the ISA (which instructions are
available) and -mtune=M controls optimisation decisions within the
constraints of that N, such as scheduling and the cost of things like
multiplication and division.

E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS II-
compatible code, optimise it for p5600, but make sure that 24k workarounds
are used.  The code would run correctly on any MIPS II-compatible processor
without known errata and also on the 24k.

Ok, disabled the peephole pattern for fix-24k and micromips - to allow specific 
patterns to be matched.


+
+mld-st-pairing
+Target Report Var(TARGET_ENABLE_LD_ST_PAIRING) Enable load/store
+pairing


Other options are just "TARGET_" + the capitalised form of the option name,
so I'd prefer TARGET_LD_ST_PAIRING instead.  Although "ld" might be
misleading since it's an abbreviation for "load" rather than the LD instruction.
Maybe -mload-store-pairs, since plurals are more common than "-ing"?
Not sure that's a great suggestion though.

Renamed the option and corresponding macro as suggested.


Performance testing for this patch is not yet done.
If the patch proves beneficial in most of the testcases (which we
believe it will be on P5600), we will enable this optimization by default
for P5600 - in which case this option can be removed.


OK.  Sending the patch for comments before performance testing is fine, but
I think it'd be better to commit the patch only after the testing is done, since
otherwise the patch might need to be tweaked.

I don't see any problem with keeping the option in case people want to
experiment with it.  I just think the patch should only go in once it can be
enabled by default for p5600.  I.e. the option would exist to turn off the
pairing.

Not having the option is fine too of course.

Yes, after the perf analysis I will share the results, and then, depending
on the impact, the decision can be made whether to make the option the
default or not; after that, the patch will be submitted.


We should al

Re: [PATCH][MIPS] Enable load-load/store-store bonding

2015-04-19 Thread sameera

Gentle reminder!

- Thanks and regards,
  Sameera D.

On Monday 30 March 2015 04:58 PM, sameera wrote:

Hi!

Sorry for the delay in sending this patch for review.
Please find attached updated patch.

In the P5600, two consecutive loads/stores of the same type which access contiguous
memory locations are bonded together by the instruction issue unit to dispatch
a single load/store instruction which accesses both locations. This allows a 2X
improvement in memory-intensive code. This optimization can be performed
for the LH, SH, LW, SW, LWC, SWC, LDC and SDC instructions.

This patch adds peephole2 patterns to identify such loads/stores and put them
in parallel, so that the scheduler will not split them, thereby
guaranteeing h/w-level load/store bonding.

The patch is tested with dejagnu for correctness, and tested on hardware for 
performance.
Ok for trunk?

Changelog:
gcc/
 * config/mips/mips.md (JOIN_MODE): New mode iterator.
 (join2_load_store): New pattern.
 (join2_loadhi): Likewise.
 (define_peephole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
 load-load and store-stores.
 * config/mips/mips.opt (mload-store-pairs): New option.
 (TARGET_LOAD_STORE_PAIRS): New macro.
 * config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
 * config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
 * config/mips/mips.c (mips_load_store_bonding_p): New function.

- Thanks and regards,
   Sameera D.

On Tuesday 24 June 2014 04:12 PM, Sameera Deshpande wrote:

Hi Richard,

Thanks for the review.
Please find attached updated patch after your review comments.

Changelog:
gcc/
* config/mips/mips.md (JOIN_MODE): New mode iterator.
(join2_load_store): New pattern.
(join2_loadhi): Likewise.
(define_peephole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
load-load and store-stores.
* config/mips/mips.opt (mload-store-pairs): New option.
(TARGET_LOAD_STORE_PAIRS): New macro.
* config/mips/mips.h (ENABLE_P5600_LD_ST_PAIRS): Likewise.
* config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
* config/mips/mips.c (mips_load_store_bonding_p): New function.

The change is tested with dejagnu with additional options -mload-store-pairs 
and -mtune=p5600.
The perf measurement is yet to finish.


We had an offline discussion based on your comment. There is an additional
view on the same.
Only the mips32r2, mips32r3 and mips32r5 ISAs support the P5600; the
remaining ISAs do not.
For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is
implemented separately, and hence, as you suggested, the P5600 LD-ST
bonding optimization should not be enabled for them.
So, is it fine if I emit an error for any ISA other than mips32r2,
mips32r3 and mips32r5 when P5600 is enabled, or should compilation
continue by emitting a warning and disabling P5600?


No, the point is that we have two separate concepts: ISA and optimisation
target.  -mipsN and -march=N control the ISA (which instructions are
available) and -mtune=M controls optimisation decisions within the
constraints of that N, such as scheduling and the cost of things like
multiplication and division.

E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS II-
compatible code, optimise it for p5600, but make sure that 24k workarounds
are used.  The code would run correctly on any MIPS II-compatible processor
without known errata and also on the 24k.

Ok, disabled the peephole pattern for fix-24k and micromips - to allow specific 
patterns to be matched.


+
+mld-st-pairing
+Target Report Var(TARGET_ENABLE_LD_ST_PAIRING) Enable load/store
+pairing


Other options are just "TARGET_" + the capitalised form of the option name,
so I'd prefer TARGET_LD_ST_PAIRING instead.  Although "ld" might be
misleading since it's an abbreviation for "load" rather than the LD instruction.
Maybe -mload-store-pairs, since plurals are more common than "-ing"?
Not sure that's a great suggestion though.

Renamed the option and corresponding macro as suggested.


Performance testing for this patch is not yet done.
If the patch proves beneficial in most of the testcases (which we
believe it will be on P5600), we will enable this optimization by default
for P5600 - in which case this option can be removed.


OK.  Sending the patch for comments before performance testing is fine, but
I think it'd be better to commit the patch only after the testing is done, since
otherwise the patch might need to be tweaked.

I don't see any problem with keeping the option in case people want to
experiment with it.  I just think the patch should only go in once it can be
enabled by default for p5600.  I.e. the option would exist to turn off the
pairing.

Not having the option is fine too of course.

Yes, after the perf analysis I will share the results, and then, depending
on the impact, the decision can be made whether to make the option the
default or not; after that, the patch will be submitted.

Re: [PATCH][MIPS] Enable load-load/store-store bonding

2015-05-11 Thread sameera

On Tuesday 21 April 2015 12:39 AM, Matthew Fortune wrote:

Sameera Deshpande  writes:

Gentle reminder!


Thanks Sameera. Just a couple of comments inline below and a question
for Catherine at the end.


- Thanks and regards,
Sameera D.

On Monday 30 March 2015 04:58 PM, sameera wrote:

Hi!

Sorry for the delay in sending this patch for review.
Please find attached updated patch.

In the P5600, two consecutive loads/stores of the same type which access
contiguous memory locations are bonded together by the instruction issue
unit to dispatch a single load/store instruction which accesses both
locations. This allows a 2X improvement in memory-intensive code. This
optimization can be performed for the LH, SH, LW, SW, LWC, SWC, LDC and SDC
instructions.


This patch adds peephole2 patterns to identify such loads/stores and
put them in parallel, so that the scheduler will not split them,
thereby guaranteeing h/w-level load/store bonding.


The patch is tested with dejagnu for correctness, and tested on
hardware for performance.

Ok for trunk?

Changelog:
gcc/
  * config/mips/mips.md (JOIN_MODE): New mode iterator.
  (join2_load_Store): New pattern.
  (join2_loadhi): Likewise.
  (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-

mode

  load-load and store-stores.
  * config/mips/mips.opt (mload-store-pairs): New option.
  (TARGET_LOAD_STORE_PAIRS): New macro.
  *config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
  *config/mips/mips-protos.h (mips_load_store_bonding_p): New

prototype.

  *config/mips/mips.c(mips_load_store_bonding_p): New function.


I don't know if this has been corrupted by mail clients but a single
space after '*' and a space before '('.


diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index b48e04f..244eb8d 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -360,6 +360,7 @@ extern bool mips_epilogue_uses (unsigned int);
extern void mips_final_prescan_insn (rtx_insn *, rtx *, int);
extern int mips_trampoline_code_size (void);
extern void mips_function_profiler (FILE *);
+extern bool mips_load_store_bonding_p (rtx *, machine_mode, bool);

typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);
#ifdef RTX_CODE
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 1733457..85f0591 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -18241,6 +18241,64 @@ umips_load_store_pair_p_1 (bool load_p, bool swap_p,
   return true;
}

+bool
+mips_load_store_bonding_p (rtx *operands, enum machine_mode mode, bool load_p)


Remove enum from machine_mode.


+{
+  rtx reg1, reg2, mem1, mem2, base1, base2;
+  enum reg_class rc1, rc2;
+  HOST_WIDE_INT offset1, offset2;
+
+  if (load_p)
+{
+  reg1 = operands[0];
+  reg2 = operands[2];
+  mem1 = operands[1];
+  mem2 = operands[3];
+}
+  else
+{
+  reg1 = operands[1];
+  reg2 = operands[3];
+  mem1 = operands[0];
+  mem2 = operands[2];
+}
+
+  if (mips_address_insns (XEXP (mem1, 0), mode, false) == 0
+  || mips_address_insns (XEXP (mem2, 0), mode, false) == 0)
+return false;
+
+  mips_split_plus (XEXP (mem1, 0), &base1, &offset1);
+  mips_split_plus (XEXP (mem2, 0), &base2, &offset2);
+
+  /* Base regs do not match.  */
+  if (!REG_P (base1) || !rtx_equal_p (base1, base2))
+return false;
+
+  /* Either of the loads is clobbering base register.  */
+  if (load_p
+      && (REGNO (reg1) == REGNO (base1)
+	  || (REGNO (reg2) == REGNO (base1))))
+    return false;


Can you add a comment saying that this case does not get bonded by
any known hardware even though it could be valid to bond them if it
is the second load that clobbers the base.


+  /* Loading in same registers.  */
+  if (load_p
+  && REGNO (reg1) == REGNO (reg2))
+return false;
+
+  /* The loads/stores are not of same type.  */
+  rc1 = REGNO_REG_CLASS (REGNO (reg1));
+  rc2 = REGNO_REG_CLASS (REGNO (reg2));
+  if (rc1 != rc2
+  && !reg_class_subset_p (rc1, rc2)
+  && !reg_class_subset_p (rc2, rc1))
+return false;
+
+  if (abs (offset1 - offset2) != GET_MODE_SIZE (mode))
+return false;
+
+  return true;
+}
+
/* OPERANDS describes the operands to a pair of SETs, in the order
dest1, src1, dest2, src2.  Return true if the operands can be used
in an LWP or SWP instruction; LOAD_P says which.  */
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index ec69ed5..1bd0dae 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3147,3 +3147,7 @@ extern GTY(()) struct target_globals *mips16_globals;
#define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
#define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
#endif
+
+#define ENABLE_LD_ST_PAIRS \
+  (TARGET_LOAD_STORE_PAIRS && TUNE_P5600 \
+   && !TARGET_MICROMIPS && !TARGET_FIX_24K)


I've already forgotten why these e
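The conditions the new function enforces (no base clobber by a load, distinct load destinations, offsets exactly one mode-size apart) can be modeled on plain integers. The sketch below uses an invented register/offset representation, not the actual rtx-based mips.c code:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified model of mips_load_store_bonding_p: register numbers and a
   mode size stand in for the rtx operands.  The same-base-register and
   register-class checks of the real function are assumed to have passed.  */
static bool
bonding_ok (bool load_p, int reg1, int reg2, int base,
            long offset1, long offset2, long mode_size)
{
  /* Either of the loads would clobber the base register.  Hardware does
     not bond such pairs, even though bonding would be legitimate when
     only the second load clobbers the base.  */
  if (load_p && (reg1 == base || reg2 == base))
    return false;

  /* Loading into the same register twice.  */
  if (load_p && reg1 == reg2)
    return false;

  /* The two accesses must be exactly one mode-size apart, either order.  */
  if (labs (offset1 - offset2) != mode_size)
    return false;

  return true;
}
```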

Re: [PATCH][MIPS] Enable load-load/store-store bonding

2015-05-11 Thread sameera

On Monday 11 May 2015 05:43 PM, Matthew Fortune wrote:

Hi Sameera,

Sameera Deshpande  writes:

Changelog:
gcc/
  * config/mips/mips.md (JOIN_MODE): New mode iterator.
  (join2_load_store): New pattern.
  (join2_loadhi): Likewise.
  (define_peephole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
  load-load and store-stores.
  * config/mips/mips.opt (mload-store-pairs): New option.
  (TARGET_LOAD_STORE_PAIRS): New macro.
  * config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
  * config/mips/mips-protos.h (mips_load_store_bonding_p): New
prototype.
  * config/mips/mips.c (mips_load_store_bonding_p): New function.

gcc/testsuite/
  * gcc.target/mips/p5600-bonding.c : New testcase to test
bonding.


Just 'New file.' is fine for the changelog.


diff --git a/gcc/testsuite/gcc.target/mips/p5600-bonding.c 
b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
new file mode 100644
index 000..122b9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-dp -mtune=p5600  -mno-micromips -mno-mips16" } */
+/* { dg-skip-if "Bonding needs peephole optimization." { *-*-* } { "-O0" "-O1" } { 
"" } } */
+typedef int VINT32 __attribute__ ((vector_size((16))));
+
+void memory_operation_fun2_si(void * __restrict src, void * __restrict dest, 
int num)


Code style applies for testcases too, return type on line above, space
after function name, line length.


+{
+VINT32 *vsrc = (VINT32 *)src;


Indentation.


+VINT32 *vdest = (VINT32 *)dest;
+int i;
+
+for (i = 0; i < num - 1; i+=2)
+{


Indentation


+  vdest[i] = (vdest[i] + vsrc[i]);


Unnecessary brackets.


+  vdest[i + 1] = vdest[i + 1] + vsrc[i + 1];
+}
+}
+/* { dg-final { scan-assembler "join2_" } }  */
+


OK with those changes.

Thanks,
Matthew


Hi Matthew,

Thanks for the comments.
Please find attached updated patch.

I do not have commit access to GCC.
Could you please commit the patch for me?

Changelog:
gcc/
* config/mips/mips.md (JOIN_MODE): New mode iterator.
(join2_load_store): New pattern.
(join2_loadhi): Likewise.
(define_peephole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
load-load and store-stores.
* config/mips/mips.opt (mload-store-pairs): New option.
(TARGET_LOAD_STORE_PAIRS): New macro.
* config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
* config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
* config/mips/mips.c (mips_load_store_bonding_p): New function.

gcc/testsuite/
* gcc.target/mips/p5600-bonding.c : New file.
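With the style comments applied (return type on its own line, a space after the function name, fixed indentation, unnecessary brackets removed), the testcase would look roughly like this; the committed file may differ in detail:

```c
/* { dg-do compile } */
/* { dg-options "-dp -mtune=p5600 -mno-micromips -mno-mips16" } */
/* { dg-skip-if "Bonding needs peephole optimization." { *-*-* } { "-O0" "-O1" } { "" } } */

typedef int VINT32 __attribute__ ((vector_size((16))));

void
memory_operation_fun2_si (void *__restrict src, void *__restrict dest,
			  int num)
{
  VINT32 *vsrc = (VINT32 *) src;
  VINT32 *vdest = (VINT32 *) dest;
  int i;

  /* Adjacent vector accesses off the same base give the peephole the
     contiguous load/store pairs it looks for.  */
  for (i = 0; i < num - 1; i += 2)
    {
      vdest[i] = vdest[i] + vsrc[i];
      vdest[i + 1] = vdest[i + 1] + vsrc[i + 1];
    }
}

/* { dg-final { scan-assembler "join2_" } } */
```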
diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index b48e04f..244eb8d 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -360,6 +360,7 @@ extern bool mips_epilogue_uses (unsigned int);
 extern void mips_final_prescan_insn (rtx_insn *, rtx *, int);
 extern int mips_trampoline_code_size (void);
 extern void mips_function_profiler (FILE *);
+extern bool mips_load_store_bonding_p (rtx *, machine_mode, bool);
 
 typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);
 #ifdef RTX_CODE
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index bf69850..4fc15c4 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -18241,6 +18241,66 @@ umips_load_store_pair_p_1 (bool load_p, bool swap_p,
   return true;
 }
 
+bool
+mips_load_store_bonding_p (rtx *operands, machine_mode mode, bool load_p)
+{
+  rtx reg1, reg2, mem1, mem2, base1, base2;
+  enum reg_class rc1, rc2;
+  HOST_WIDE_INT offset1, offset2;
+
+  if (load_p)
+{
+  reg1 = operands[0];
+  reg2 = operands[2];
+  mem1 = operands[1];
+  mem2 = operands[3];
+}
+  else
+{
+  reg1 = operands[1];
+  reg2 = operands[3];
+  mem1 = operands[0];
+  mem2 = operands[2];
+}
+
+  if (mips_address_insns (XEXP (mem1, 0), mode, false) == 0
+  || mips_address_insns (XEXP (mem2, 0), mode, false) == 0)
+return false;
+
+  mips_split_plus (XEXP (mem1, 0), &base1, &offset1);
+  mips_split_plus (XEXP (mem2, 0), &base2, &offset2);
+
+  /* Base regs do not match.  */
+  if (!REG_P (base1) || !rtx_equal_p (base1, base2))
+return false;
+
+  /* Either of the loads is clobbering base register.  It is legitimate to bond
+ loads if second load clobbers base register.  However, hardware does not
+ support such bonding.  */
+  if (load_p
+      && (REGNO (reg1) == REGNO (base1)
+	  || (REGNO (reg2) == REGNO (base1))))
+    return false;
+
+  /* Loading in same registers.  */
+  if (load_p
+  && REGNO (reg1) == REGNO (reg2))
+return false;
+
+  /* The loads/stores are not of same type.  */
+  rc1 = REGNO_REG_CLASS (REGNO (reg1));
+  rc2 = REGNO_REG_CLASS (REGNO (reg2));
+  if (rc1 != rc2
+      && !reg_class_subset_p (rc1, rc2)
+      && !reg_class_subset_p (rc2, rc1))
+    return false;
+
+  if (abs (offset1 - offset2) != GET_MODE_SIZE (mode))
+    return false;
+
+  return true;
+}

Re: [PATCH][MIPS] Enable load-load/store-store bonding

2015-05-12 Thread sameera

Hi Mike,

Thanks for your comments.
Please find my comments inlined.

- Thanks and regards,
  Sameera D.

On Monday 11 May 2015 10:09 PM, Mike Stump wrote:

On May 11, 2015, at 4:05 AM, sameera  wrote:

+(define_insn "*join2_loadhi"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand" "m")))
+   (set (match_operand:SI 2 "register_operand" "=r")
+   (any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand" "m")))]
+  "ENABLE_LD_ST_PAIRS && reload_completed"
+  {
+/* Reg-renaming pass reuses base register if it is dead after bonded loads.
+   Hardware does not bond those loads, even when they are consecutive.
+   However, order of the loads need to be checked for correctness.  */
+if (!reg_overlap_mentioned_p (operands[0], operands[1]))
+  {
+   output_asm_insn ("lh\t%0,%1", operands);
+   output_asm_insn ("lh\t%2,%3", operands);
+  }
+else
+  {
+   output_asm_insn ("lh\t%2,%3", operands);
+   output_asm_insn ("lh\t%0,%1", operands);
+  }
+
+return "";
+  }
+  [(set_attr "move_type" "load")
+   (set_attr "insn_count" "2")])
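The output-ordering logic of the pattern above can be modeled in C: if the first destination register overlaps the base address register, the second load must be emitted first. The register-file representation below is invented for illustration:

```c
/* Model: regs[] is a tiny register file, and a "load" reads memory at
   regs[base] + offset.  If dest0 == base, performing load 0 first would
   corrupt the address needed by load 1, so load 1 is done first --
   mirroring the reg_overlap_mentioned_p check in *join2_loadhi.  */
static void
emit_bonded_loads (int regs[], const int mem[],
                   int dest0, int dest1, int base, int off0, int off1)
{
  if (dest0 != base)
    {
      /* No overlap: natural order.  */
      regs[dest0] = mem[regs[base] + off0];
      regs[dest1] = mem[regs[base] + off1];
    }
  else
    {
      /* First destination clobbers the base: swap the order.  */
      regs[dest1] = mem[regs[base] + off1];
      regs[dest0] = mem[regs[base] + off0];
    }
}
```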



However, unlike other architectures, we do not generate a single instruction for the
bonded pair,


Actually, you do.  The above is 1 instruction pattern.  Doesn’t matter much 
what it prints as or what the CPU thinks of it.

The pattern is a single one; however, multiple asm instructions are generated
for it.



because of which it is difficult to check if bonding is happening or not. 
Hence, an assembly file is generated with debug dumps, and the bonded 
loads/stores are identified by their pattern names.


Nothing wrong with that approach.  Also, in the assembly, one can look for
sequences of instructions if they wish.
Load/store bonding is not just a matter of contiguous load/store instructions: they also need the same base register and offsets with a specific difference.
Hence, the way you suggested might not always be useful, so I am comparing the pattern name instead.

See gcc/testsuite/gcc.target/aarch64/fuse_adrp_add_1.c:

   /* { dg-final { scan-assembler "adrp\tx.*, fixed_regs\n\tadd\tx.*, 
x.*fixed_regs" } } */

in the test suite for example.


I am trying FUSION for MIPS as suggested by Mike, and testing its perf impact
along with other MIPS-specific options.


I think you will discover it is virtually what you have now, and works better.  
The fusion just can peephole over greater distances, that’s the only real 
difference.
Yes, in many cases I see a clear improvement. However, it also tries to bring together loads/stores
which were split intentionally by the msched-weight option, introduced for MIPS. I need to measure performance and do perf tuning (if needed) for that option before sending it for review.




[RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-09-28 Thread Sameera Deshpande
Hi!

This patch generates Thumb2 epilogues in RTL form.

The work involves defining new functions, predicates and patterns along with
few changes in existing code:
* The load_multiple_operation predicate was found to be too restrictive for
integer loads as it required consecutive destination regs, so this
restriction was lifted.
* Variations of load_multiple_operation were required to handle cases 
   - where SP must be the base register 
   - where FP values were being loaded (which do require consecutive
destination registers)
   - where PC can be in register-list (which requires return pattern along
with register loads).
  Hence, the common code was factored out into a new function in arm.c and
parameterised to show 
   - whether consecutive destination regs are needed
   - the data type being loaded 
   - whether the base register has to be SP
   - whether PC is in register-list
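The factored-out check can be sketched as follows; the function name, parameter list and register numbering below are inferred from the description above, not taken from the actual arm.c code:

```c
#include <stdbool.h>

#define SP_REGNUM 13
#define PC_REGNUM 15

/* Hypothetical model of the parameterised predicate: validate a
   multi-register load, optionally requiring consecutive destination
   registers, an SP base, and allowing PC only as the final destination
   (the return variant).  */
static bool
multi_load_ok (const int dests[], int count, int base,
               bool need_consecutive, bool base_must_be_sp, bool pc_ok)
{
  if (base_must_be_sp && base != SP_REGNUM)
    return false;

  for (int i = 0; i < count; i++)
    {
      /* PC is only acceptable as the last register, and only when the
         caller asked for the load-and-return form.  */
      if (dests[i] == PC_REGNUM && !(pc_ok && i == count - 1))
        return false;
      /* FP-style pops need strictly consecutive destinations; integer
         pops merely need increasing register numbers.  */
      if (need_consecutive && i > 0 && dests[i] != dests[i - 1] + 1)
        return false;
      if (!need_consecutive && i > 0 && dests[i] <= dests[i - 1])
        return false;
    }
  return true;
}
```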

The patch is tested with arm-eabi with no regressions.

ChangeLog:

2011-09-28  Ian Bolton 
Sameera Deshpande  
   
   * config/arm/arm-protos.h (load_multiple_operation_p): New
declaration.
 (thumb2_expand_epilogue): Likewise.
 (thumb2_output_return): Likewise
 (thumb2_expand_return): Likewise.
 (thumb_unexpanded_epilogue): Rename to... 
 (thumb1_unexpanded_epilogue): ...this 
   * config/arm/arm.c (load_multiple_operation_p): New function. 
 (thumb2_emit_multi_reg_pop): Likewise.
 (thumb2_emit_vfp_multi_reg_pop): Likewise.
 (thumb2_expand_return): Likewise. 
 (thumb2_expand_epilogue): Likewise. 
 (thumb2_output_return): Likewise
 (thumb_unexpanded_epilogue): Rename to...
 ( thumb1_unexpanded_epilogue): ...this
   * config/arm/arm.md (pop_multiple_with_stack_update): New pattern. 
 (pop_multiple_with_stack_update_and_return): Likewise.
 (thumb2_ldr_with_return): Likewise.
 (floating_point_pop_multiple_with_stack_update): Likewise.
 (return): Update condition and code for pattern.
 (arm_return): Likewise.
 (epilogue_insns): Likewise.
   * config/arm/predicates.md (load_multiple_operation): Update
predicate.
 (load_multiple_operation_stack_and_return): New predicate. 
 (load_multiple_operation_stack): Likewise.
 (load_multiple_operation_stack_fp): Likewise.
   * config/arm/thumb2.md (thumb2_return): Remove.
 (thumb2_rtl_epilogue_return): New pattern.


- Thanks and regards,
  Sameera D.

thumb2_rtl_epilogue_complete-27Sept.patch
Description: Binary data


Ping! Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-10-05 Thread Sameera Deshpande
Ping!

http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01854.html

On Wed, 2011-09-28 at 17:15 +0100, Sameera Deshpande wrote:
> Hi!
> 
> This patch generates Thumb2 epilogues in RTL form.
> 
> The work involves defining new functions, predicates and patterns along with
> few changes in existing code:
> * The load_multiple_operation predicate was found to be too restrictive for
> integer loads as it required consecutive destination regs, so this
> restriction was lifted.
> * Variations of load_multiple_operation were required to handle cases 
>- where SP must be the base register 
>- where FP values were being loaded (which do require consecutive
> destination registers)
>- where PC can be in register-list (which requires return pattern along
> with register loads).
>   Hence, the common code was factored out into a new function in arm.c and
> parameterised to show 
>- whether consecutive destination regs are needed
>- the data type being loaded 
>- whether the base register has to be SP
>- whether PC is in register-list
> 
> The patch is tested with arm-eabi with no regressions.
> 
> ChangeLog:
> 
> 2011-09-28  Ian Bolton 
> Sameera Deshpande  
>
>* config/arm/arm-protos.h (load_multiple_operation_p): New
> declaration.
>  (thumb2_expand_epilogue): Likewise.
>  (thumb2_output_return): Likewise
>  (thumb2_expand_return): Likewise.
>  (thumb_unexpanded_epilogue): Rename to... 
>  (thumb1_unexpanded_epilogue): ...this 
>* config/arm/arm.c (load_multiple_operation_p): New function. 
>  (thumb2_emit_multi_reg_pop): Likewise.
>  (thumb2_emit_vfp_multi_reg_pop): Likewise.
>  (thumb2_expand_return): Likewise. 
>  (thumb2_expand_epilogue): Likewise. 
>  (thumb2_output_return): Likewise
>  (thumb_unexpanded_epilogue): Rename to...
>  ( thumb1_unexpanded_epilogue): ...this
>* config/arm/arm.md (pop_multiple_with_stack_update): New pattern. 
>  (pop_multiple_with_stack_update_and_return): Likewise.
>  (thumb2_ldr_with_return): Likewise.
>  (floating_point_pop_multiple_with_stack_update): Likewise.
>  (return): Update condition and code for pattern.
>  (arm_return): Likewise.
>  (epilogue_insns): Likewise.
>* config/arm/predicates.md (load_multiple_operation): Update
> predicate.
>  (load_multiple_operation_stack_and_return): New predicate. 
>  (load_multiple_operation_stack): Likewise.
>  (load_multiple_operation_stack_fp): Likewise.
>* config/arm/thumb2.md (thumb2_return): Remove.
>  (thumb2_rtl_epilogue_return): New pattern.
> 
> 
> - Thanks and regards,
>   Sameera D.

-- 




Ping! Re: [RFA/ARM][Patch 02/02]: ARM epilogues in RTL

2011-10-05 Thread Sameera Deshpande
Ping!

http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html

On Wed, 2011-09-28 at 17:15 +0100, Sameera Deshpande wrote:
> Hi!
> 
> This patch generates ARM epilogue in RTL form.
> 
> The work defines new functions and reuses most of the static functions and
> patterns defined in the previous patch (Thumb2 epilogues in RTL) with minor
> changes to handle mode specific details. 
> Hence, this patch depends completely on previous patch.
> 
> It is tested with arm-eabi with no regression.
> 
> ChangeLog:
> 
> 2011-09-28  Sameera Deshpande  
> 
> 
>* config/arm/arm-protos.h (arm_expand_epilogue): New declarations. 
>  (arm_expand_return): Likewise.
>  (thumb2_expand_epilogue): Add new boolean argument. 
>* config/arm/arm.c (print_multi_reg): Remove.
>  (vfp_output_fldmd): Likewise.
> 
>  (arm_output_epilogue): Likewise.
>  (output_return_instruction): Update the function. 
>  (thumb2_emit_multi_reg_pop): Rename to...
>  (arm_emit_multi_reg_pop): ...this 
>  (thumb2_emit_vfp_multi_reg_pop): Rename to...
>  (arm_emit_vfp_multi_reg_pop): ...this
>  (arm_emit_vfp_multi_reg_pop): Add new argument base_reg.
>  (arm_expand_return): New function.
> 
>  (arm_expand_epilogue): Likewise.
>  (thumb2_expand_epilogue): Add new argument is_sibling.
>* config/arm/arm.md (pop_multiple_with_stack_update): Update 
>  condition and code for pattern.
>  (arm_return): Likewise.
>  (pop_multiple_with_stack_update_and_return): Likewise.
>  (floating_point_pop_multiple_with_stack_update): Likewise.
>  (thumb2_ldr_with_return): Rename to...
>  (ldr_with_return): ...this
>  (ldr_with_return): Update condition.
>  (cond_return): Remove.
>  (cond_return_inverted): Likewise.
>  (return): Update code.
>  (epilogue): Likewise. 
>  (sibcall_epilogue): Likewise.
>  (epilogue_insns): Update condition and code.
> 
> 
> - Thanks and regards,
>   Sameera D.

-- 




[RFA/ARM][Patch 00/05]: Introduction - Generate LDRD/STRD in prologue/epilogue instead of PUSH/POP.

2011-10-11 Thread Sameera Deshpande
This series of 5 patches generates LDRD/STRD instead of POP/PUSH in the
epilogue/prologue for the ARM and Thumb-2 modes of the A15.

Patch [1/5] introduces new field in tune which can be used to indicate
whether LDRD/STRD are preferred over POP/PUSH by the specific core.

Patches [2-5/5] use this field to determine if LDRD/STRD can be
generated instead of PUSH/POP in ARM and Thumb-2 mode.

Patch [2/5] generates LDRD instead of POP for Thumb-2 epilogue in A15.
This patch depends on patch [1/5].

Patch [3/5] generates STRD instead of PUSH for Thumb-2 prologue in A15.
This patch depends on variables, functions and patterns defined in
[1/5] and [2/5].

Patch [4/5] generates STRD instead of PUSH for ARM prologue in A15. This
patch depends on [1/5].

Patch [5/5] generates LDRD instead of POP for ARM epilogue in A15. This
patch depends on variables, functions and patterns defined in [1/5] and
[4/5].

All these patches depend upon the Thumb2/ARM RTL epilogue patches
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01854.html,
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html submitted for
review.

All these patches are applied in given order and tested with check-gcc,
check-gdb and bootstrap without regression. 

In the case of ARM mode, a significant performance improvement can be seen on
some parts of a popular embedded consumer benchmark (~26%).
However, in most cases, not much effect is seen on performance
(~3% improvement).

In the case of Thumb-2, the performance improvement observed on the same parts
of the benchmark is ~11% (2.5% overall improvement).

-- 






[RFA/ARM][Patch 01/05]: Create tune for Cortex-A15.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch adds a new field in tune_params to indicate whether LDRD/STRD are
preferred over PUSH/POP in the prologue/epilogue of a specific core.
It also creates a new tune for cortex-a15 and updates the tunes for other
cores to set the new field to its default value.

Changelog entry for Patch to create tune for cortex-a15:

2011-10-11  Sameera Deshpande
 

* config/arm/arm-cores.def (cortex_a15): Update.
* config/arm/arm-protos.h (struct tune_params): Add new field...
  (arm_gen_ldrd_strd): ... this.
* config/arm/arm.c (arm_slowmul_tune): Add 
  arm_gen_ldrd_strd field settings.
  (arm_fastmul_tune): Likewise.
  (arm_strongarm_tune): Likewise.
  (arm_xscale_tune): Likewise.
  (arm_9e_tune): Likewise.
  (arm_v6t2_tune): Likewise.
  (arm_cortex_tune): Likewise.
  (arm_cortex_a5_tune): Likewise.
  (arm_cortex_a9_tune): Likewise.
  (arm_fa726te_tune): Likewise. 
  (arm_cortex_a15_tune): New variable.
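A minimal sketch of how such a tune flag gates code generation: the field name comes from the ChangeLog above, while the struct layout and function here are invented for illustration.

```c
#include <stdbool.h>

/* Sketch: each core's tuning record carries a flag saying whether
   LDRD/STRD should replace PUSH/POP in the prologue/epilogue.  */
struct tune_params_sketch
{
  bool arm_gen_ldrd_strd;   /* new field added by this patch */
};

/* Most cores keep the default (false); the A15 tune opts in.  */
static const struct tune_params_sketch arm_cortex_tune = { false };
static const struct tune_params_sketch arm_cortex_a15_tune = { true };

/* Epilogue generation would consult the current tune to pick the
   pop-style instruction.  */
static const char *
epilogue_pop_insn (const struct tune_params_sketch *tune)
{
  return tune->arm_gen_ldrd_strd ? "ldrd" : "pop";
}
```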
-- 


On Tue, 2011-10-11 at 10:08 +0100, Sameera Deshpande wrote:
> This series of 5 patches generate LDRD/STRD instead of POP/PUSH in
> epilogue/prologue for ARM and Thumb-2 mode of A15.
> 
> Patch [1/5] introduces new field in tune which can be used to indicate
> whether LDRD/STRD are preferred over POP/PUSH by the specific core.
> 
> Patches [2-5/5] use this field to determine if LDRD/STRD can be
> generated instead of PUSH/POP in ARM and Thumb-2 mode.
> 
> Patch [2/5] generates LDRD instead of POP for Thumb-2 epilogue in A15.
> This patch depends on patch [1/5].
> 
> Patch [3/5] generates STRD instead of PUSH for Thumb-2 prologue in A15.
> This patch depends for variables, functions and patterns defined in
> [1/5] and [2/5].
> 
> Patch [4/5] generates STRD instead of PUSH for ARM prologue in A15. This
> patch depends on [1/5].
> 
> Patch [5/5] generates LDRD instead of POP for ARM epilogue in A15. This
> patch depends for variables, functions and patterns defined in [1/5] and
> [4/5].
> 
> All these patches depend upon the Thumb2/ARM RTL epilogue patches
> http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01854.html,
> http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html submitted for
> review.
> 
> All these patches are applied in given order and tested with check-gcc,
> check-gdb and bootstrap without regression. 
> 
> In case of ARM mode, significant performance improvement can be seen on
> some parts of a popular embedded consumer benchmark (~26%). 
> However, in most of the cases, not much effect is seen on performance.
> (~ 3% improvement) 
> 
> In case of thumb2, the performance improvement observed on same parts
> the benchmark is ~11% (2.5% improvement). 
> diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 742b5e8..1b42713 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -128,7 +128,7 @@ ARM_CORE("generic-armv7-a", genericv7a,	7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a5",	  cortexa5,	7A, FL_LDSCHED, cortex_a5)
 ARM_CORE("cortex-a8",	  cortexa8,	7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a9",	  cortexa9,	7A, FL_LDSCHED, cortex_a9)
-ARM_CORE("cortex-a15",	  cortexa15,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
+ARM_CORE("cortex-a15",	  cortexa15,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
 ARM_CORE("cortex-r4",	  cortexr4,	7R, FL_LDSCHED, cortex)
 ARM_CORE("cortex-r4f",	  cortexr4f,	7R, FL_LDSCHED, cortex)
 ARM_CORE("cortex-r5",	  cortexr5,	7R, FL_LDSCHED | FL_ARM_DIV, cortex)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f69bc42..c6b8f71 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -243,6 +243,9 @@ struct tune_params
   int l1_cache_line_size;
   bool prefer_constant_pool;
   int (*branch_cost) (bool, bool);
+  /* This flag indicates if STRD/LDRD instructions are preferred
+ over PUSH/POP in epilogue/prologue.  */
+  bool prefer_ldrd_strd;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6c09267..d709375 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -850,7 +850,8 @@ const struct tune_params arm_slowmul_tune =
   5,		/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,		/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -861,7 +862,8 @@ const struct tune_params arm_fastmul_tune =
   5,		/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,		/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };

[RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch generates LDRD instead of POP for the Thumb-2 epilogue on A15. 

When optimizing for size, the original epilogue is still generated for A15.
The work involves defining new functions, predicates and patterns.

As LDRD cannot be generated for PC, if PC is in the register list, LDRD is
generated for all the other registers in the list that can form register
pairs. Then an LDR with return is generated if PC is the only register left
to be popped; otherwise a POP with return is generated.

The patch is tested with check-gcc, check-gdb and bootstrap with no
regression. 

Changelog entry for Patch to emit LDRD for thumb2 epilogue in A15:

2011-10-11  Sameera Deshpande
 

   
* config/arm/arm-protos.h (bad_reg_pair_for_thumb_ldrd_strd): New
declaration.
* config/arm/arm.c (bad_reg_pair_for_thumb_ldrd_strd): New helper
function.
  (thumb2_emit_ldrd_pop): New static function.
  (thumb2_expand_epilogue): Update functions.
* config/arm/constraints.md (Pz): New constraint.
* config/arm/ldmstm.md (thumb2_ldrd_base): New pattern.
  (thumb2_ldrd): Likewise.
* config/arm/predicates.md (ldrd_immediate_operand): New predicate.

-- 


diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index c6b8f71..06a67b5 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -202,6 +202,7 @@ extern void thumb_reload_in_hi (rtx *);
 extern void thumb_set_return_address (rtx, rtx);
 extern const char *thumb1_output_casesi (rtx *);
 extern const char *thumb2_output_casesi (rtx *);
+extern bool bad_reg_pair_for_thumb_ldrd_strd (rtx, rtx);
 #endif
 
 /* Defined in pe.c.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d709375..3eba510 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15410,6 +15410,155 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   par = emit_insn (par);
   add_reg_note (par, REG_FRAME_RELATED_EXPR, dwarf);
 }
+bool
+bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+  || GET_CODE (src2) != REG
+  || (REGNO (src1) == PC_REGNUM)
+  || (REGNO (src1) == SP_REGNUM)
+  || (REGNO (src1) == REGNO (src2))
+  || (REGNO (src2) == PC_REGNUM)
+  || (REGNO (src2) == SP_REGNUM));
+}
+
+/* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
+   number of registers are being popped, multiple LDRD patterns are created for
+   all register pairs.  If odd number of registers are popped, last register is
+   loaded by using LDR pattern.  */
+static bool
+thumb2_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+  gcc_assert (really_return || ((saved_regs_mask & (1 << PC_REGNUM)) == 0));
+
+  if (really_return && (saved_regs_mask & (1 << PC_REGNUM)))
+/* We cannot generate ldrd for PC.  Hence, reduce the count if PC is
+   to be popped.  So, if num_regs is even, now it will become odd,
+   and we can generate pop with PC.  If num_regs is odd, it will be
+   even now, and ldr with return can be generated for PC.  */
+num_regs--;
+
+  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
+/* Var j iterates over all the registers to gather all the registers in
+   saved_regs_mask.  Var i gives index of saved registers in stack frame.
+   A PARALLEL RTX of register-pair is created here, so that pattern for
+   LDRD can be matched.  As PC is always last register to be popped, and
+   we have already decremented num_regs if PC, we don't have to worry
+   about PC in this loop.  */
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+
+/* Create RTX for memory load.  New RTX is created for dwarf as
+   they are not sharable.  */
+reg = gen_rtx_REG (SImode, j);
+tmp = gen_rtx_SET (SImode,
+   reg,
+   gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4 * i)));
+
+tmp1 = gen_rtx_SET (SImode,
+   reg,
+   gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4 * i)));
+RTX_FRAME_RELATED_P (tmp) = 1;
+RTX_FRAME_RELATED_P (tmp1) = 1;
+
+if (i % 2 == 0)
+  {
+/* When saved-register index (i) is even, the RTX to be emitted is
+   yet to be cre

[RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch generates STRD instructions instead of PUSH in Thumb-2 mode for
A15.

When optimizing for size, the original prologue is generated for A15.
The work involves defining new functions, predicates and patterns.

The patch is tested with check-gcc, check-gdb and bootstrap with no
regression. 

Changelog entries for the patch for STRD generation for a15-thumb2:

2011-10-11  Sameera Deshpande
 

   
* config/arm/arm.c (thumb2_emit_strd_push): New static function.
  (arm_expand_prologue): Update.
* config/arm/ldmstm.md (thumb2_strd): New pattern.
  (thumb2_strd_base): Likewise.
-- 


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3eba510..fd8c31d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15095,6 +15095,125 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
 }
 }
 
+/* Generate and emit a pattern that will be recognized as STRD pattern.  If even
+   number of registers are being pushed, multiple STRD patterns are created for
+   all register pairs.  If odd number of registers are pushed, first register is
+   stored by using STR pattern.  */
+static void
+thumb2_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* Pre-decrement the stack pointer, based on there being num_regs 4-byte
+ registers to push.  */
+  tmp = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+ plus_constant (stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  insn = emit_insn (tmp);
+
+  /* Create sequence for DWARF info.  */
+  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
+
+  /* RTLs cannot be shared, hence create new copy for dwarf.  */
+  tmp1 = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+ plus_constant (stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp1) = 1;
+  XVECEXP (dwarf, 0, 0) = tmp1;
+
+  for (i = num_regs - 1, j = LAST_ARM_REGNUM; i >= (num_regs % 2); j--)
+/* Var j iterates over all the registers to gather all the registers in
+   saved_regs_mask.  Var i gives index of register R_j in stack frame.
+   A PARALLEL RTX of register-pair is created here, so that pattern for
+   STRD can be matched.  If num_regs is odd, 1st register will be pushed
+   using STR and remaining registers will be pushed with STRD in pairs.
+   If num_regs is even, all registers are pushed with STRD in pairs.
+   Hence, skip first element for odd num_regs.  */
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+gcc_assert (j != PC_REGNUM);
+
+/* Create RTX for store.  New RTX is created for dwarf as
+   they are not sharable.  */
+reg = gen_rtx_REG (SImode, j);
+tmp = gen_rtx_SET (SImode,
+   gen_frame_mem
+   (SImode,
+plus_constant (stack_pointer_rtx, 4 * i)),
+   reg);
+
+tmp1 = gen_rtx_SET (SImode,
+   gen_frame_mem
+   (SImode,
+plus_constant (stack_pointer_rtx, 4 * i)),
+   reg);
+RTX_FRAME_RELATED_P (tmp) = 1;
+RTX_FRAME_RELATED_P (tmp1) = 1;
+
+if (((i - (num_regs % 2)) % 2) == 1)
+  /* When (i - (num_regs % 2)) is odd, the RTX to be emitted is yet to
+ be created.  Hence create it first.  The STRD pattern we are
+ generating is :
+ [ (SET (MEM (PLUS (SP) (NUM))) (reg_t1))
+   (SET (MEM (PLUS (SP) (NUM + 4))) (reg_t2)) ]
+ were target registers need not be consecutive.  */
+  par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+
+/* Register R_j is added in PARALLEL RTX.  If (i - (num_regs % 2)) is
+   even, the reg_j is added as 0th element and if it is odd, reg_i is
+   added as 1st element of STRD pattern shown above.  */
+XVECEXP (par, 0, ((i - (num_regs % 2)) % 2)) = tmp;
+XVECEXP (dwarf, 0, (i + 1)) = tmp1;
+
+if (((i - (num_regs % 2)) % 2) == 0)
+  /* When (i - (num_regs % 2)) is even, RTXs for both the registers
+ to be loaded are generated in above given STRD pattern, and the
+ pattern can be emitted now.  */
+  emit_insn (par);
+
+i--;
+  }
+
+  if ((num_regs % 2) == 1)
+{
+  /* If odd number of registers are pushed, generate STR pattern to store
+ 

[RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch generates STRD instead of PUSH in the prologue for A15 ARM mode.

When optimizing for size, the original prologue is generated for A15.
The work involves defining new functions, predicates and patterns, along
with minor changes in existing code:
* STRD in ARM mode needs consecutive registers to be stored. The
performance of the compiled code degrades greatly if R3 is pushed for
stack alignment, as a single LDR is generated for R3; having a SUB
instruction do the stack adjustment is more efficient. Hence, the
condition in arm_get_frame_offsets () is changed to disable the R3 push
when prefer_ldrd_strd is set in ARM mode.

In this patch we keep accumulating non-consecutive registers until a
register pair to be pushed is found. We then first PUSH all the
accumulated registers, followed by an STRD with pre-decrement stack update
for the register pair. This is repeated until all the registers in the
register list are PUSHed.

The patch is tested with check-gcc, check-gdb and bootstrap with no
regression. 

Changelog entry for Patch to emit STRD for ARM prologue in A15:

2011-10-11  Sameera Deshpande
 
   
* config/arm/arm-protos.h (bad_reg_pair_for_arm_ldrd_strd): New
declaration.
* config/arm/arm.c (arm_emit_strd_push): New static function.
  (bad_reg_pair_for_arm_ldrd_strd): New helper function.
  (arm_expand_prologue): Update.
  (arm_get_frame_offsets): Update.
* config/arm/ldmstm.md (arm_strd_base): New pattern.
-- 


diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 06a67b5..d5287ad 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -162,6 +162,7 @@ extern const char *arm_output_memory_barrier (rtx *);
 extern const char *arm_output_sync_insn (rtx, rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
 extern int arm_attr_length_push_multi(rtx, rtx);
+extern bool bad_reg_pair_for_arm_ldrd_strd (rtx, rtx);
 
 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fd8c31d..08fa0d5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -93,6 +93,7 @@ static bool arm_assemble_integer (rtx, unsigned int, int);
 static void arm_print_operand (FILE *, rtx, int);
 static void arm_print_operand_address (FILE *, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
+static rtx emit_multi_reg_push (unsigned long);
 static const char *fp_const_from_val (REAL_VALUE_TYPE *);
 static arm_cc get_arm_condition_code (rtx);
 static HOST_WIDE_INT int_log2 (HOST_WIDE_INT);
@@ -15095,6 +15096,116 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
 }
 }
 
+/* STRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg PUSH for all accumulated
+   registers, and then generates STRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are stored on stack.
+   multi-reg PUSH takes care of lone registers as well.  */
+static void
+arm_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_pushed_mask;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i=0, j = LAST_ARM_REGNUM, regs_to_be_pushed_mask = 0; i < num_regs; j--)
+/* Var j iterates over all registers to gather all registers in
+   saved_regs_mask.  Var i is used to count number of registers stored on
+   stack.  regs_to_be_pushed_mask accumulates non-consecutive registers
+   that can be pushed using multi-reg PUSH before STRD is generated.  */
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+gcc_assert (j != PC_REGNUM);
+i++;
+
+if ((j % 2 == 1)
+&& (saved_regs_mask & (1 << (j - 1)))
+&& regs_to_be_pushed_mask)
+  {
+/* Current register and previous register form register pair for
+   which STRD can be generated.  Hence, emit PUSH for accumulated
+   registers and reset regs_to_be_pushed_mask.  */
+insn = emit_multi_reg_push (regs_to_be_pushed_mask);
+regs_to_be_pushed_mask = 0;
+RTX_FRAME_RELATED_P (insn) = 1;
+continue;
+  }
+
+regs_to_be_pushed_mask |= (1 << j);
+
+if ((j % 2) == 0 && (saved_regs_mask & (1 << (j + 1
+  {
+/* We have found 2 consecutive registers, for whi

[RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch generates LDRD instead of POP in the epilogue for A15 ARM mode.

When optimizing for size, the original epilogue is generated for A15.
The work involves defining new functions, predicates and patterns.

In this patch we keep accumulating non-consecutive registers until a
register pair to be popped is found. We then first POP all the accumulated
registers, followed by an LDRD with post-increment stack update for the
register pair. This is repeated until all the registers in the register
list are POPped.

The patch is tested with check-gcc, check-gdb and bootstrap with no
regression.
 
Changelog entry for Patch to emit LDRD for ARM epilogue in A15:

2011-10-11  Sameera Deshpande
 
   
* config/arm/arm.c (arm_emit_ldrd_pop): New static function.
  (arm_expand_epilogue): Update.
* config/arm/ldmstm.md (arm_ldrd_base): New pattern.
  (arm_ldr_with_update): Likewise.
-- 


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 08fa0d5..0b9fd93 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -967,7 +967,7 @@ const struct tune_params arm_cortex_a9_tune =
   ARM_PREFETCH_BENEFICIAL(4,32,32),
   false,	/* Prefer constant pool.  */
   arm_default_branch_cost,
-  false /* Prefer LDRD/STRD.  */
+  true  /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_fa726te_tune =
@@ -15664,6 +15664,145 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
   || (REGNO (src2) == SP_REGNUM));
 }
 
+/* LDRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg POP for all accumulated
+   registers, and then generates LDRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are loaded from
+   stack.  multi-reg POP takes care of lone registers as well.  However, LDRD
+   cannot be generated for PC, as results are unpredictable.  Hence, if PC is
+   in SAVED_REGS_MASK, generate multi-reg POP with RETURN or LDR with RETURN
+   depending upon number of registers in REGS_TO_BE_POPPED_MASK.  */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_popped_mask = 0;
+  bool pc_in_list = false;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = 0; i < num_regs; j++)
+if (saved_regs_mask & (1 << j))
+  {
+i++;
+if ((j % 2) == 0
+&& (saved_regs_mask & (1 << (j + 1)))
+&& (j + 1) != SP_REGNUM
+&& (j + 1) != PC_REGNUM
+&& regs_to_be_popped_mask)
+  {
+/* Current register and next register form register pair for which
+   LDRD can be generated.  Generate POP for accumulated registers
+   and reset regs_to_be_popped_mask.  SP should be handled here as
+   the results are unpredictable if register being stored is same
+   as index register (in this case, SP).  PC is always the last
+   register being popped.  Hence, we don't have to worry about PC
+   here.  */
+arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+pc_in_list = false;
+regs_to_be_popped_mask = 0;
+continue;
+  }
+
+if (j == PC_REGNUM)
+  {
+gcc_assert (really_return);
+pc_in_list = 1;
+  }
+
+regs_to_be_popped_mask |= (1 << j);
+
+if ((j % 2) == 1
+&& (saved_regs_mask & (1 << (j - 1)))
+&& j != SP_REGNUM
+&& j != PC_REGNUM)
+  {
+ /* Generate a LDRD for register pair R_, R_.  The pattern
+generated here is
+[(SET SP, (PLUS SP, 8))
+ (SET R_, (MEM SP))
+ (SET R_, (MEM (PLUS SP, 4)))].  */
+ par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+ dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (3));
+
+ tmp = gen_rtx_SET (VOIDmode,
+stack_pointer_rtx,
+plus_constant (stack_pointer_rtx, 8));
+ tmp1 = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+ plus_constant (stack_pointer_rtx, 8));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+

[wwwdocs] Add branch description for new branch unified-autovect

2016-07-08 Thread Sameera Deshpande
Hi!

I have created a new branch, unified-autovect, based on ToT.

Please find attached the patch adding information about the new branch 
"unified-autovect" to the documentation.
Is it OK to commit?

- Thanks and regards,
  Sameera D.

unified-autovec-doc.patch
Description: unified-autovec-doc.patch


[unified-autovect: Patch 2/N] Implementation of k-arity promotion/reduction

2017-02-05 Thread Sameera Deshpande
Hi Richard,

Sorry for the delayed patch submission. I was on maternity leave, so could not
post earlier.
Here is the previous mail for your reference: 
https://gcc.gnu.org/ml/gcc/2016-06/msg00043.html

Please find attached the patch for stage 2: implementation of k-arity 
promotion/reduction in the series "Improving effectiveness and generality of 
autovectorization using unified representation".

The permute nodes within the primitive reorder tree (PRT) generated from the 
input program can have any arity, depending upon the stride of the accesses. 
However, the target may not have instructions to support all arities. Hence, we 
need to promote or reduce the arity of the PRT to enable successful tree tiling.

In classic autovectorization, if the vectorization stride > 2, arity reduction 
is performed by generating cascaded extract and interleave instructions, as 
described in "Auto-vectorization of Interleaved Data for SIMD" by D. Nuzman, 
I. Rosen and A. Zaks.  

Moreover, to enable SLP across the loop, "Loop-aware SLP in GCC" by D. Nuzman, 
I. Rosen and A. Zaks unrolls the loop until stride = vector size.

The k-arity reduction/promotion algorithm makes use of modulo arithmetic to 
generate a PRT of the desired arity for both of the above-mentioned cases.

A single ILV node of arity k can be reduced into cascaded ILV nodes: a single 
node of arity m whose children have arity k/m, such that the i-th child of the 
original ILV node becomes the floor(i/m)-th child of the (i%m)-th child of the 
new parent.

A single EXTR node with k parts and selector i can be reduced into cascaded 
EXTR nodes such that the parent EXTR node has m parts and selection i/(k/m) 
over a child EXTR node with k/m parts and selection i % (k/m).

Similarly, loop unrolling to get the desired arity m can be represented as 
arity promotion from k to m.

A single ILV node of arity k can be promoted to a single ILV node of arity m 
by adding an extraction with m/k parts and selection i/k on the (i%k)-th child 
of the original tree as the i-th child of the new ILV node.

To enable loop-aware SLP, we first promote the arity of the input PRT to the 
maximum vector size permissible on the architecture. This can have an impact on 
vector code size, though performance will be the same. However, to allow a 
variable vector size like SVE, it is necessary.

Later we apply the arity promotion/reduction algorithm on the output tree to 
get a tree of the desired arity. For now, we support target arity = 2, as most 
architectures have support for that. However, the code can be extended for 
additional arities as well.

I have tested the code with handwritten testcases for correctness.
Do you spot any problem in the logic or arithmetic that I am performing for 
reduction/promotion? If not, I will push this patch to the branch that we have 
created, unified-autovect.

- Thanks and regards,
  Sameera D.
Index: gcc/Makefile.in
===
--- gcc/Makefile.in	(revision 243687)
+++ gcc/Makefile.in	(working copy)
@@ -1529,6 +1529,7 @@
 	tree-vect-slp.o \
 	tree-vectorizer.o \
 	tree-vect-unified.o \
+	tree-vect-unified-opts.o \
 	tree-vrp.o \
 	tree.o \
 	valtrack.o \
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c	(revision 238158)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -136,16 +136,9 @@
   return scalar_type;
 }
 
-
-/* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be
-   tested at run-time.  Return TRUE if DDR was successfully inserted.
-   Return false if versioning is not supported.  */
-
-static bool
-vect_mark_for_runtime_alias_test (ddr_p ddr, loop_vec_info loop_vinfo)
+bool
+vect_mark_for_runtime_alias_test_1 (ddr_p ddr, loop *loop)
 {
-  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-
   if ((unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0)
 return false;
 
@@ -189,11 +182,28 @@
   return false;
 }
 
-  LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).safe_push (ddr);
   return true;
 }
 
 
+
+/* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be
+   tested at run-time.  Return TRUE if DDR was successfully inserted.
+   Return false if versioning is not supported.  */
+
+static bool
+vect_mark_for_runtime_alias_test (ddr_p ddr, loop_vec_info loop_vinfo)
+{
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  bool is_alias;
+
+  is_alias = vect_mark_for_runtime_alias_test_1 (ddr, loop);
+  if (is_alias)
+LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).safe_push (ddr);
+  return is_alias;
+}
+
+
 /* Function vect_analyze_data_ref_dependence.
 
Return TRUE if there (might) exist a dependence between a memory-reference
Index: gcc/tree-vect-unified-opts.c
===
--- gcc/tree-vect-unified-opts.c	(revision 0)
+++ gcc/tree-vect-unified-opts.c	(working copy)
@@ -0,0 +1,391 @@
+/* Loop vectorization using unified representation
+the terms of the GNU General Public

[PATCH, MIPS] Calling convention differs depending on the presence of MSA

2017-02-08 Thread Sameera Deshpande
Hi Matthew,

Please find attached the patch to fix the calling convention issue, where 
the argument and result passing conventions differed between the MSA and 
non-MSA variants.

The implementation of TARGET_RETURN_IN_MEMORY is altered so that V4SF is not 
returned in registers.
Ok for trunk?

- Thanks and regards,
  Sameera D.


Changelog:
gcc/
* config/mips/mips.c (mips_return_in_memory): Do not return V4SFmode 
in registers.

gcc/testsuite/
* gcc.target/mips/msa-fp-cc.c : New testcase.

RE: [PATCH, MIPS] Calling convention differs depending on the presence of MSA

2017-02-11 Thread Sameera Deshpande
Hi Matthew,

Please find attached the updated patch as per our offline discussion.

I have disabled return in registers for all vector float types, and updated the 
test case accordingly.

Ok for trunk?

- Thanks and regards,
  Sameera D.

From: Sameera Deshpande
Sent: 08 February 2017 14:10:52
To: Matthew Fortune
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH, MIPS] Calling convention differs depending on the presence of 
MSA

Hi Matthew,

Please find attached the patch to fix the calling convention issue,
where argument and result passing convention differed for MSA and
non-MSA variants.

The implementation of TARGET_RETURN_IN_MEMORY is altered to block V4SF to be 
returned in registers.

Ok for trunk?

- Thanks and regards,
  Sameera D.


Changelog:
gcc/
* config/mips/mips.c (mips_return_in_memory) : Restrict V4SFmode to be 
returned in registers.

gcc/testsuite/
* gcc.target/mips/msa-fp-cc.c : New testcase.

fix_calling_convention.patch
Description: fix_calling_convention.patch


[unified-autovect: Patch 1b/N] Instruction tile and grammar creation.

2017-04-06 Thread Sameera Deshpande
Please see if this looks correct, or do I need additional 
information to successfully generate the pattern matcher automatically?

Also, can you please comment on the usability or scalability of this approach 
across all the architectures, or point me to the appropriate people in the 
group with whom I can discuss target-specific vectorization issues?

- Thanks and regards,
  Sameera D.
Index: gcc/Makefile.in
===
--- gcc/Makefile.in	(revision 246613)
+++ gcc/Makefile.in	(working copy)
@@ -1067,7 +1067,12 @@
 	build/print-rtl.o build/hash-table.o
 BUILD_MD = build/read-md.o
 BUILD_ERRORS = build/errors.o
+BUILD_UNITED = build/vec.o build/hash-table.o build/errors.o \
+	   build/ggc-none.o \
+	   build/tree-vect-unified-common.o build/tree-vect-unified-opts.o
 
+build/tree-vect-unified-common.o : tree-vect-unified-common.c gtype-desc.h insn-codes.h
+build/tree-vect-unified-opts.o : tree-vect-unified-opts.c gtype-desc.h insn-codes.h
 # Specify the directories to be searched for header files.
 # Both . and srcdir are used, in that order,
 # so that *config.h will be found in the compilation
@@ -2207,7 +2212,7 @@
   insn-emit.c insn-recog.c insn-extract.c insn-output.c insn-peep.c \
   insn-attr.h insn-attr-common.h insn-attrtab.c insn-dfatab.c \
   insn-latencytab.c insn-preds.c gimple-match.c generic-match.c \
-  insn-target-def.h
+  insn-target-def.h insn-vect-inst-tiles.h
 
 # Dependencies for the md file.  The first time through, we just assume
 # the md file itself and the generated dependency file (in order to get
@@ -2234,7 +2239,8 @@
 			  insn-extract.c insn-output.c \
 			  insn-peep.c insn-recog.c
 
-simple_generated_h	= $(simple_rtl_generated_h) insn-constants.h
+simple_generated_h	= $(simple_rtl_generated_h) insn-constants.h \
+			  insn-vect-inst-tiles.h
 
 simple_generated_c	= $(simple_rtl_generated_c) insn-enums.c
 
@@ -2602,6 +2608,8 @@
   $(GENSUPPORT_H)
 build/rtl.o: rtl.c $(BCONFIG_H) coretypes.h $(GTM_H) $(SYSTEM_H)	\
   $(RTL_H) $(GGC_H) errors.h
+build/tree.o: tree.c $(BCONFIG_H) coretypes.h $(GTM_H) $(SYSTEM_H)	\
+  $(RTL_H) $(GGC_H) errors.h
 build/vec.o : vec.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h $(VEC_H)	\
$(GGC_H) toplev.h $(DIAGNOSTIC_CORE_H)
 build/hash-table.o : hash-table.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h  \
@@ -2655,6 +2663,9 @@
   coretypes.h $(GTM_H) $(RTL_BASE_H) errors.h $(READ_MD_H) $(GENSUPPORT_H)	\
   $(HASH_TABLE_H) target-insns.def
 build/gengenrtl.o : gengenrtl.c $(BCONFIG_H) $(SYSTEM_H) rtl.def
+build/genvect-inst-tiles.o : genvect-inst-tiles.c $(RTL_BASE_H) $(BCONFIG_H)\
+  $(SYSTEM_H) coretypes.h $(GTM_H) errors.h tree-vect-unified.h \
+  tree-vect-unified-opts.o tree-vect-unified-common.o
 
 # The gengtype generator program is special: Two versions are built.
 # One is for the build machine, and one is for the host to allow
@@ -2732,8 +2743,11 @@
 genprogerr = $(genprogmd) genrtl modes gtype hooks cfn-macros
 $(genprogerr:%=build/gen%$(build_exeext)): $(BUILD_ERRORS)
 
+genprogunited = vect-inst-tiles
+$(genprogunited:%=build/gen%$(build_exeext)): $(BUILD_UNITED) 
+
 # Remaining build programs.
-genprog = $(genprogerr) check checksum condmd match
+genprog = $(genprogerr) $(genprogunited) check checksum condmd match
 
 # These programs need libs over and above what they get from the above list.
 build/genautomata$(build_exeext) : BUILD_LIBS += -lm
Index: gcc/config/mips/mips.h
===
--- gcc/config/mips/mips.h	(revision 246613)
+++ gcc/config/mips/mips.h	(working copy)
@@ -3468,4 +3468,37 @@
   (TARGET_LOAD_STORE_PAIRS && (TUNE_P5600 || TUNE_I6400) \
&& !TARGET_MICROMIPS && !TARGET_FIX_24K)
 
+#define TARGET_VEC_PERM_CONST_ORDER \
+{ \
+  {2, 2, 2, (int[2]){0,2}, 1, "PCKEV.D", "RRR", NULL, NULL}, \
+  {2, 2, 2, (int[2]){1,3}, 1, "PCKOD.D", "RRR", NULL, NULL}, \
+\
+  {2, 4, 4, (int[4]){0,4,2,6}, 1, "ILVEV.W", "RRR", NULL, NULL}, \
+  {2, 4, 4, (int[4]){1,5,3,7}, 1, "ILVOD.W", "RRR", NULL, NULL}, \
+  {2, 4, 4, (int[4]){0,2,4,6}, 1, "PCKEV.W", "RRR", NULL, NULL}, \
+  {2, 4, 4, (int[4]){1,3,5,7}, 1, "PCKOD.W", "RRR", NULL, NULL}, \
+  {2, 4, 4, (int[4]){2,6,3,7}, 1, "ILVL.W", "RRR", NULL, NULL}, \
+  {2, 4, 4, (int[4]){0,4,1,5}, 1, "ILVR.W", "RRR", NULL, NULL}, \
+\
+  {2, 8, 8, (int[8]){0,8,2,10,4,12,6,14}, 1, "ILVEV.H", "RRR", NULL, NULL}, \
+  {2, 8, 8, (int[8]){1,9,3,11,5,13,7,15}, 1, "ILVOD.H", "RRR", NULL, NULL}, \
+  {2, 8, 8, (int[8]){0,2,4,6,8,10,12,14}, 1, "PCKEV.H", "RRR", NULL, NULL}, \
+  {2, 8, 8, (int[8]){1,3,5,7,9,11,13,15}, 1, "PCKOD.H", "RRR", NULL, NULL}, \
+  {2, 8, 8, (int[8]){0,8,1,9,2,10,3,11}, 1, "ILVR.H", "RRR", NULL, NULL}, \
+  {2

[PATCH][MIPS] Enable load-load/store-store bonding

2014-06-19 Thread Sameera Deshpande
Hi Richard,

Please find attached the patch implementing load-load/store-store bonding 
supported by P5600.

In P5600, 2 consecutive loads/stores of the same type which access contiguous 
memory locations are bonded together by the instruction issue unit to dispatch 
a single load/store instruction which accesses both locations. This allows a 2X 
improvement in memory-intensive code. This optimization can be performed for 
LH, SH, LW, SW, LWC, SWC, LDC, SDC instructions.

This patch adds peephole2 patterns to identify such loads/stores and put them 
in parallel, so that the scheduler will not split them - thereby guaranteeing h/w 
level load/store bonding.

The patch is tested with dejagnu for correctness.
Local testing on hardware for performance is currently going on.
Ok for trunk? 

Changelog:
gcc/
* config/mips/mips.md (JOINLDST1): New mode iterator.
(insn_type): New mode attribute.
(reg): Update mode attribute.
(join2_load_store): New pattern.
(join2_loadhi): Likewise.
(join2_storehi): Likewise.
(define_peephole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
load-load and store-store pairs.
* config/mips/mips.opt (mld-st-pairing): New option.
* config/mips/mips.c (mips_option_override): New exception.
* config/mips/mips.h (ENABLE_LD_ST_PAIRING): New macro.

- Thanks and regards,
   Sameera D.



load-store-pairing.patch
Description: load-store-pairing.patch


RE: [PATCH][MIPS] Enable load-load/store-store bonding

2014-06-23 Thread Sameera Deshpande
Hi Richard,

Thanks for your comments. I am working on the review comments, and will share 
the reworked patch soon.
However, here is clarification on some of the issues raised.

> > +  if (TARGET_FIX_24K && TUNE_P5600)
> > +error ("unsupported combination: %s", "-mtune=p5600 -mfix-24k");
> > +
> >/* Save the base compression state and process flags as though we
> >   were generating uncompressed code.  */
> >mips_base_compression_flags = TARGET_COMPRESSION;
> 
> Although it's a bit of an odd combination, we need to accept -mfix-24k -
> mtune=p5600 and continue to implement the 24k workarounds.
> The idea is that a distributor can build for a common base architecture, add -
> mfix- options for processors that might run the code, and add -mtune= for
> the processor that's most of interest optimisation-wise.
> 
> We should just make the pairing of stores conditional on !TARGET_FIX_24K.
We had offline discussion based on your comment. There is additional view on 
the same.
P5600 is supported only with the mips32r2, mips32r3 and mips32r5 ISAs; the 
remaining ISAs do not support P5600.
For mips32r2 (24K) and mips32r3 (microMIPS), load-store pairing is implemented 
separately, and hence, as you suggested, the P5600 LD-ST bonding optimization 
should not be enabled for them.
So, is it fine if I emit an error for any ISA other than mips32r2, mips32r3 and 
mips32r5 when P5600 is enabled, or should the compilation continue by emitting a 
warning and disabling P5600?
Also, the optimization will be enabled only if !TARGET_FIX_24K && 
!TARGET_MICROMIPS, as suggested by you.

> > +
> > +#define ENABLE_LD_ST_PAIRING \
> > +  (TARGET_ENABLE_LD_ST_PAIRING && TUNE_P5600)
> 
> The patch requires -mld-st-pairing to be passed explicitly even for -
> mtune=p5600.  Is that because it's not a consistent enough win for us to
> enable it by default?  It sounded from the description like it should be an
> improvement more often that not.
> 
> We should allow pairing even without -mtune=p5600.
Performance testing for this patch is not yet done. 
If the patch proves beneficial in most of the testcases (which we believe it 
will on P5600), we will enable this optimization by default for P5600 - in which 
case this option can be removed.

> 
> Are QImodes not paired in the same way?  If so, it'd be worth adding a
> comment above the define_mode_iterator saying that QI is deliberately
> excluded.
The P5600 datasheet mentions bonding of load/stores in HI, SI, SF and DF modes 
only. Hence QI mode is excluded. I will add the comment on the iterator.

- Thanks and regards,
   Sameera D.



RE: [PATCH][MIPS] Enable load-load/store-store bonding

2014-06-24 Thread Sameera Deshpande
dingly.
In order to allow d/f for both register classes, the pattern 
join2_load_store was altered a bit, which eliminated this mode iterator.

> 
> Outer (parallel ...)s are redundant in a define_insn.
Removed.

> 
> It would be better to add the mips_load_store_insns for each operand
> rather than multiplying one of them by 2.  Or see the next bit for an
> alternative.
Using the alternative method as you suggested, so this change is not needed.

> Please instead add HI to the define_mode_iterator so that we can use the
> same peephole and define_insn.
Added HI in the mode iterator to eliminate join2_storehi pattern and 
corresponding peephole2.
As arithmetic operations on HImode are not supported, we generate zero- or 
sign-extended loads in such cases. 
To handle that case, the join2_loadhi pattern is kept.

- Thanks and regards,
   Sameera D.



load-store-pairing.patch
Description: load-store-pairing.patch


Re: [Aarch64] Fix conditional branches with target far away.

2018-07-31 Thread Sameera Deshpande
On Mon 9 Apr, 2018, 2:06 PM Sameera Deshpande, 
wrote:

> Hi Richard,
>
> I do not see the said patch applied in ToT yet. When do you expect it
> to be available in ToT?
>
> - Thanks and regards,
>   Sameera D.
>
> On 30 March 2018 at 17:01, Sameera Deshpande
>  wrote:
> > Hi Richard,
> >
> > The testcase is working with the patch you suggested, thanks for
> > pointing that out.
> >
> > On 30 March 2018 at 16:54, Sameera Deshpande
> >  wrote:
> >> On 30 March 2018 at 16:39, Richard Sandiford
> >>  wrote:
> >>>> Hi Sudakshina,
> >>>>
> >>>> Thanks for pointing that out. Updated the conditions for attribute
> >>>> length to take care of boundary conditions for offset range.
> >>>>
> >>>> Please find attached the updated patch.
> >>>>
> >>>> I have tested it for gcc testsuite and the failing testcase. Ok for
> trunk?
> >>>>
> >>>> On 22 March 2018 at 19:06, Sudakshina Das  wrote:
> >>>>> Hi Sameera
> >>>>>
> >>>>> On 22/03/18 02:07, Sameera Deshpande wrote:
> >>>>>>
> >>>>>> Hi Sudakshina,
> >>>>>>
> >>>>>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
> >>>>>> far branch instruction offset is inclusive of both the offsets.
> Hence,
> >>>>>> I am using (<= || >=) and not (< || >=) as it was in previous
> implementation.
> >>>>>
> >>>>>
> >>>>> I have to admit earlier I was only looking at the patch mechanically
> and
> >>>>> found a difference with the previous implementation in offset
> comparison.
> >>>>> After you pointed out, I looked up the ARMv8 ARM and I have a couple
> of
> >>>>> doubts:
> >>>>>
> >>>>> 1. My understanding is that any offset in [-1048576 ,1048572] both
> inclusive
> >>>>> qualifies as an 'in range' offset. However, the code for both
> attribute
> >>>>> length and far_branch has been using [-1048576 ,1048572), that is, (
> >= && <
> >>>>> ). If the far_branch was incorrectly calculated, then maybe the
> length
> >>>>> calculations with similar magic numbers should also be corrected? Of
> course,
> >>>>> I am not an expert in this and maybe this was a conscience decision
> so I
> >>>>> would ask Ramana to maybe clarify if he remembers.
> >>>>>
> >>>>> 2. Now to come back to your patch, if my understanding is correct, I
> think a
> >>>>> far_branch would be anything outside of this range, that is,
> >>>>> (offset < -1048576 || offset > 1048572), anything that can not be
> >>>>> represented in the 21-bit range.
> >>>>>
> >>>>> Thanks
> >>>>> Sudi
> >>>
> >>> [...]
> >>>
> >>>> @@ -466,14 +459,9 @@
> >>>>[(set_attr "type" "branch")
> >>>> (set (attr "length")
> >>>>   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int
> -1048576))
> >>>> -(lt (minus (match_dup 2) (pc)) (const_int
> 1048572)))
> >>>> +(le (minus (match_dup 2) (pc)) (const_int
> 1048572)))
> >>>> (const_int 4)
> >>>> -   (const_int 8)))
> >>>
> >>> Sorry for not replying earlier, but I think the use of "lt" rather than
> >>> "le" in the current length attribute is deliberate.  Distances measured
> >>> from (pc) in "length" are a bit special in that backward distances are
> >>> measured from the start of the instruction and forward distances are
> >>> measured from the end of the instruction:
> >>>
> >>>   /* The address of the current insn.  We implement this actually
> as the
> >>>  address of the current insn for backward branches, but the
> last
> >>>  address of the next insn for forward branches, and both with
> >>>  adjustments that account for the worst-case possible
> stretching of
> >>>  intervening alignments between this insn and its
> destination.  */
> >>>
> >>> This avoids the c

[AARCH64]Bug in fix for branch offsets over 1 MiB?

2018-01-20 Thread Sameera Deshpande
Hi!

I am seeing multiple assembler errors with error message "Error:
conditional branch out of range" for customer code.

The root cause of the bug is that conditional branches are generated
whose branch target ends up being too far away to be encoded in the
instruction.  It appears that there was an attempt to fix this issue
in the below change:

commit 050af05b9761f1979f11c151519e7244d5becd7c
Author: thopre01 
Date:   Thu Aug 27 10:08:54 2015 +

2015-08-27  Ramana Radhakrishnan
<ramana.radhakrish...@arm.com>
Andre Vieira
<andre.simoesdiasvie...@arm.com>

gcc/
* config/aarch64/aarch64.md (*condjump):
Handle functions > 1 MiB.
(*cb1): Likewise.
(*tb1): Likewise.
(*cb1): Likewise.
* config/aarch64/iterators.md (inv_cb):
New code attribute.
(inv_tb): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
* config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.

gcc/testsuite/
* gcc.target/aarch64/long_branch_1.c: New test.

However, as per the GCC internals documentation, only the special attribute
"length" should use PC and match_dup when defining an attribute. I
verified this by looking at the code in the final pass, and realised that
get_attr_length does not map directly to the functions generated
from the definition of the "length" attribute in RTL patterns; instead, the
lengths are computed in shorten_branches, with insn_current_length as an
intermediate function.

The far_branch attribute, defined similarly to "length", expects the
same values to be returned by (minus (match_dup 2) (pc)), which is
incorrect.

I am looking at the TARGET_MACHINE_DEPENDENT_REORG macro instead, like a few
other architectures do, to emit far branches.

Is that approach acceptable?

PS: I am waiting for customer's approval for attaching the test case.

-- 
- Thanks and regards,
  Sameera D.


Re: [AARCH64]Bug in fix for branch offsets over 1 MiB?

2018-01-29 Thread Sameera Deshpande
On 30-Jan-2018 2:37 AM, "Richard Sandiford" 
wrote:

Sameera Deshpande  writes:
> Hi!
>
> I am seeing multiple assembler errors with error message "Error:
> conditional branch out of range" for customer code.
>
> The root cause of the bug is that conditional branches are generated
> whose branch target ends up being too far away to be encoded in the
> instruction.  It appears that there was an attempt to fix this issue
> in the below change:
>
> commit 050af05b9761f1979f11c151519e7244d5becd7c
> Author: thopre01 
> Date:   Thu Aug 27 10:08:54 2015 +
>
> 2015-08-27  Ramana Radhakrishnan
> <ramana.radhakrish...@arm.com>
> Andre Vieira
> <andre.simoesdiasvie...@arm.com>
>
> gcc/
> * config/aarch64/aarch64.md (*condjump):
> Handle functions > 1 MiB.
> (*cb1): Likewise.
> (*tb1): Likewise.
> (*cb1): Likewise.
> * config/aarch64/iterators.md (inv_cb):
> New code attribute.
> (inv_tb): Likewise.
> * config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
> * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.
>
> gcc/testsuite/
> * gcc.target/aarch64/long_branch_1.c: New test.
>
> However, as per GCC Internal documentation, only special attribute
> "length" should use PC and match_dup while defining an attribute. I
> verified by looking at code in final pass, and realised that
> get_attribute_length does not map directly to the functions generated
> from the definition of attribute length in RTL patterns, but computes
> the lengths in shorten_branches and uses insn_current_length as
> intermediate function.
>
> The far_branch attribute defined similar to attribute length expects
> same values to be returned by (minus (match_dup 2) (pc)) which is
> incorrect.
>
> I am looking at TARGET_MACHINE_DEPENDENT_REORG macro instead like few
> other architectures, to emit far branches.
>
> Is that approach acceptable?

I don't think we need to go that far.  The INSN_ADDRESSES should be
correct when outputting the instructions, so does it work if we use
those instead of get_attr_far_branch?

Thanks,
Richard

> PS: I am waiting for customer's approval for attaching the test case.


Hi Richard,

Thanks for your reply. I will try using INSN_ADDRESSES and will get back to
you.

- Thanks and regards,
  Sameera D.


Re: [AARCH64]Bug in fix for branch offsets over 1 MiB?

2018-01-29 Thread Sameera Deshpande
On 30 January 2018 at 09:28, Sameera Deshpande
 wrote:
> On 30-Jan-2018 2:37 AM, "Richard Sandiford" 
> wrote:
>
> Sameera Deshpande  writes:
>> Hi!
>>
>> I am seeing multiple assembler errors with error message "Error:
>> conditional branch out of range" for customer code.
>>
>> The root cause of the bug is that conditional branches are generated
>> whose branch target ends up being too far away to be encoded in the
>> instruction.  It appears that there was an attempt to fix this issue
>> in the below change:
>>
>> commit 050af05b9761f1979f11c151519e7244d5becd7c
>> Author: thopre01 
>> Date:   Thu Aug 27 10:08:54 2015 +
>>
>> 2015-08-27  Ramana Radhakrishnan
>> <ramana.radhakrish...@arm.com>
>> Andre Vieira
>> <andre.simoesdiasvie...@arm.com>
>>
>> gcc/
>> * config/aarch64/aarch64.md (*condjump):
>> Handle functions > 1 MiB.
>> (*cb1): Likewise.
>> (*tb1): Likewise.
>> (*cb1): Likewise.
>> * config/aarch64/iterators.md (inv_cb):
>> New code attribute.
>> (inv_tb): Likewise.
>> * config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
>> * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.
>>
>> gcc/testsuite/
>> * gcc.target/aarch64/long_branch_1.c: New test.
>>
>> However, as per GCC Internal documentation, only special attribute
>> "length" should use PC and match_dup while defining an attribute. I
>> verified by looking at code in final pass, and realised that
>> get_attribute_length does not map directly to the functions generated
>> from the definition of attribute length in RTL patterns, but computes
>> the lengths in shorten_branches and uses insn_current_length as
>> intermediate function.
>>
>> The far_branch attribute defined similar to attribute length expects
>> same values to be returned by (minus (match_dup 2) (pc)) which is
>> incorrect.
>>
>> I am looking at TARGET_MACHINE_DEPENDENT_REORG macro instead like few
>> other architectures, to emit far branches.
>>
>> Is that approach acceptable?
>
> I don't think we need to go that far.  The INSN_ADDRESSES should be
> correct when outputting the instructions, so does it work if we use
> those instead of get_attr_far_branch?
>
> Thanks,
> Richard
>
>> PS: I am waiting for customer's approval for attaching the test case.
>
>
> Hi Richard,
>
> Thanks for your reply. I will try using INSN_ADDRESSES and will get back to
> you.
>
> - Thanks and regards,
>   Sameera D.
>

Hi Richard,

I verified that it works. Thanks a lot! Will do the testing, and
update the patch.

-- 
- Thanks and regards,
  Sameera D.


[Aarch64] Fix conditional branches with target far away.

2018-02-14 Thread Sameera Deshpande
Hi!

Please find attached the patch to fix bug in branches with offsets over 1MiB.
There has been an attempt to fix this issue in commit
050af05b9761f1979f11c151519e7244d5becd7c

However, the far_branch attribute defined in above patch used
insn_length - which computes incorrect offset. Hence, eliminated the
attribute completely, and computed the offset from insn_addresses
instead.

Ok for trunk?

gcc/Changelog

2018-02-13 Sameera Deshpande 
* config/aarch64/aarch64.md (far_branch): Remove attribute. Eliminate
all the dependencies on the attribute from RTL patterns.

-- 
- Thanks and regards,
  Sameera D.
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md	(revision 257620)
+++ gcc/config/aarch64/aarch64.md	(working copy)
@@ -244,13 +244,6 @@
 	(const_string "no")
 	] (const_string "yes")))
 
-;; Attribute that specifies whether we are dealing with a branch to a
-;; label that is far away, i.e. further away than the maximum/minimum
-;; representable in a signed 21-bits number.
-;; 0 :=: no
-;; 1 :=: yes
-(define_attr "far_branch" "" (const_int 0))
-
 ;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has
 ;; no predicated insns.
 (define_attr "predicated" "yes,no" (const_string "no"))
@@ -448,12 +441,7 @@
 	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
 			   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
 		  (const_int 4)
-		  (const_int 8)))
-   (set (attr "far_branch")
-	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-			   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
-		  (const_int 0)
-		  (const_int 1)))]
+		  (const_int 8)))]
 )
 
 ;; For a 24-bit immediate CST we can optimize the compare for equality
@@ -670,12 +658,7 @@
 	(if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
 			   (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
 		  (const_int 4)
-		  (const_int 8)))
-   (set (attr "far_branch")
-	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-			   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
-		  (const_int 0)
-		  (const_int 1)))]
+		  (const_int 8)))]
 )
 
 (define_insn "*tb1"
@@ -692,7 +675,11 @@
   {
 if (get_attr_length (insn) == 8)
   {
-	if (get_attr_far_branch (insn) == 1)
+	long long int offset;
+	offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
+		  - INSN_ADDRESSES (INSN_UID (insn));
+
+	if (offset <= -1048576 || offset >= 1048572)
 	  return aarch64_gen_far_branch (operands, 2, "Ltb",
 	 "\\t%0, %1, ");
 	else
@@ -709,12 +696,7 @@
 	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -32768))
 			   (lt (minus (match_dup 2) (pc)) (const_int 32764)))
 		  (const_int 4)
-		  (const_int 8)))
-   (set (attr "far_branch")
-	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-			   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
-		  (const_int 0)
-		  (const_int 1)))]
+		  (const_int 8)))]
 
 )
 
@@ -727,8 +709,12 @@
   ""
   {
 if (get_attr_length (insn) == 8)
-  {
-	if (get_attr_far_branch (insn) == 1)
+   {
+long long int offset;
+offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[1], 0)))
+		 - INSN_ADDRESSES (INSN_UID (insn));
+
+	if (offset <= -1048576 || offset >= 1048572)
 	  return aarch64_gen_far_branch (operands, 1, "Ltb",
 	 "\\t%0, , ");
 	else
@@ -740,7 +726,7 @@
 	output_asm_insn (buf, operands);
 	return "\t%l1";
 	  }
-  }
+   }
 else
   return "\t%0, , %l1";
   }
@@ -749,12 +735,7 @@
 	(if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -32768))
 			   (lt (minus (match_dup 1) (pc)) (const_int 32764)))
 		  (const_int 4)
-		  (const_int 8)))
-   (set (attr "far_branch")
-	(if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
-			   (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
-		  (const_int 0)
-		  (const_int 1)))]
+		  (const_int 8)))]
 )
 
 ;; ---


Re: [Aarch64] Fix conditional branches with target far away.

2018-02-27 Thread Sameera Deshpande
On 14 February 2018 at 14:00, Sameera Deshpande
 wrote:
> Hi!
>
> Please find attached the patch to fix bug in branches with offsets over 1MiB.
> There has been an attempt to fix this issue in commit
> 050af05b9761f1979f11c151519e7244d5becd7c
>
> However, the far_branch attribute defined in above patch used
> insn_length - which computes incorrect offset. Hence, eliminated the
> attribute completely, and computed the offset from insn_addresses
> instead.
>
> Ok for trunk?
>
> gcc/Changelog
>
> 2018-02-13 Sameera Deshpande 
> * config/aarch64/aarch64.md (far_branch): Remove attribute. Eliminate
> all the dependencies on the attribute from RTL patterns.
>
> --
> - Thanks and regards,
>   Sameera D.


Gentle reminder!

-- 
- Thanks and regards,
  Sameera D.


Re: [Aarch64] Fix conditional branches with target far away.

2018-02-28 Thread Sameera Deshpande
On 27 February 2018 at 18:25, Ramana Radhakrishnan
 wrote:
> On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
>  wrote:
>> Hi!
>>
>> Please find attached the patch to fix bug in branches with offsets over 1MiB.
>> There has been an attempt to fix this issue in commit
>> 050af05b9761f1979f11c151519e7244d5becd7c
>>
>> However, the far_branch attribute defined in above patch used
>> insn_length - which computes incorrect offset. Hence, eliminated the
>> attribute completely, and computed the offset from insn_addresses
>> instead.
>>
>> Ok for trunk?
>>
>> gcc/Changelog
>>
>> 2018-02-13 Sameera Deshpande 
>> * config/aarch64/aarch64.md (far_branch): Remove attribute. Eliminate
>> all the dependencies on the attribute from RTL patterns.
>>
>
> I'm not a maintainer but this looks good to me modulo notes about how
> this was tested. What would be nice is a testcase for the testsuite as
> well as ensuring that the patch has been bootstrapped and regression
> tested. AFAIR, the original patch was put in because match.pd failed
> when bootstrap in another context.
>
>
> regards
> Ramana
>
>> --
>> - Thanks and regards,
>>   Sameera D.

The patch is tested with the GCC testsuite and bootstraps successfully.
Also tested with the SPEC benchmarks.

-- 
- Thanks and regards,
  Sameera D.


Re: [Aarch64] Fix conditional branches with target far away.

2018-03-15 Thread Sameera Deshpande
Ping!

On 28 February 2018 at 16:18, Sameera Deshpande
 wrote:
> On 27 February 2018 at 18:25, Ramana Radhakrishnan
>  wrote:
>> On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
>>  wrote:
>>> Hi!
>>>
>>> Please find attached the patch to fix bug in branches with offsets over 
>>> 1MiB.
>>> There has been an attempt to fix this issue in commit
>>> 050af05b9761f1979f11c151519e7244d5becd7c
>>>
>>> However, the far_branch attribute defined in above patch used
>>> insn_length - which computes incorrect offset. Hence, eliminated the
>>> attribute completely, and computed the offset from insn_addresses
>>> instead.
>>>
>>> Ok for trunk?
>>>
>>> gcc/Changelog
>>>
>>> 2018-02-13 Sameera Deshpande 
>>> * config/aarch64/aarch64.md (far_branch): Remove attribute. 
>>> Eliminate
>>> all the dependencies on the attribute from RTL patterns.
>>>
>>
>> I'm not a maintainer but this looks good to me modulo notes about how
>> this was tested. What would be nice is a testcase for the testsuite as
>> well as ensuring that the patch has been bootstrapped and regression
>> tested. AFAIR, the original patch was put in because match.pd failed
>> when bootstrap in another context.
>>
>>
>> regards
>> Ramana
>>
>>> --
>>> - Thanks and regards,
>>>   Sameera D.
>
> The patch is tested with GCC testsuite and bootstrapping successfully.
> Also tested for spec benchmark.
>
> --
> - Thanks and regards,
>   Sameera D.



-- 
- Thanks and regards,
  Sameera D.


Re: [Aarch64] Fix conditional branches with target far away.

2018-03-21 Thread Sameera Deshpande
Hi Sudakshina,

As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
far branch instruction offset is inclusive of both the offsets. Hence,
I am using (<= || >=) and not (< || >=) as it was in previous implementation.

On 16 March 2018 at 00:51, Sudakshina Das  wrote:
> On 15/03/18 15:27, Sameera Deshpande wrote:
>>
>> Ping!
>>
>> On 28 February 2018 at 16:18, Sameera Deshpande
>>  wrote:
>>>
>>> On 27 February 2018 at 18:25, Ramana Radhakrishnan
>>>  wrote:
>>>>
>>>> On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
>>>>  wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> Please find attached the patch to fix bug in branches with offsets over
>>>>> 1MiB.
>>>>> There has been an attempt to fix this issue in commit
>>>>> 050af05b9761f1979f11c151519e7244d5becd7c
>>>>>
>>>>> However, the far_branch attribute defined in above patch used
>>>>> insn_length - which computes incorrect offset. Hence, eliminated the
>>>>> attribute completely, and computed the offset from insn_addresses
>>>>> instead.
>>>>>
>>>>> Ok for trunk?
>>>>>
>>>>> gcc/Changelog
>>>>>
>>>>> 2018-02-13 Sameera Deshpande 
>>>>>  * config/aarch64/aarch64.md (far_branch): Remove attribute.
>>>>> Eliminate
>>>>>  all the dependencies on the attribute from RTL patterns.
>>>>>
>>>>
>>>> I'm not a maintainer but this looks good to me modulo notes about how
>>>> this was tested. What would be nice is a testcase for the testsuite as
>>>> well as ensuring that the patch has been bootstrapped and regression
>>>> tested. AFAIR, the original patch was put in because match.pd failed
>>>> when bootstrap in another context.
>>>>
>>>>
>>>> regards
>>>> Ramana
>>>>
>>>>> --
>>>>> - Thanks and regards,
>>>>>Sameera D.
>>>
>>>
>>> The patch is tested with GCC testsuite and bootstrapping successfully.
>>> Also tested for spec benchmark.
>>>
>
> I am not a maintainer either. I noticed that the range check you do for
> the offset has a (<= || >=). The "far_branch" however did (< || >=) for a
> positive value. Was that also part of the incorrect offset calculation?
>
> @@ -692,7 +675,11 @@
> {
>   if (get_attr_length (insn) == 8)
> {
> -   if (get_attr_far_branch (insn) == 1)
> +   long long int offset;
> +   offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
> + - INSN_ADDRESSES (INSN_UID (insn));
> +
> +   if (offset <= -1048576 || offset >= 1048572)
>return aarch64_gen_far_branch (operands, 2, "Ltb",
>   "\\t%0, %1, ");
>  else
> @@ -709,12 +696,7 @@
>  (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int
> -32768))
> (lt (minus (match_dup 2) (pc)) (const_int
> 32764)))
>(const_int 4)
> - (const_int 8)))
> -   (set (attr "far_branch")
> -   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int
> -1048576))
> -  (lt (minus (match_dup 2) (pc)) (const_int
> 1048572)))
> - (const_int 0)
> - (const_int 1)))]
> + (const_int 8)))]
>
>   )
>
> Thanks
> Sudi
>
>>> --
>>> - Thanks and regards,
>>>Sameera D.
>>
>>
>>
>>
>



-- 
- Thanks and regards,
  Sameera D.


Re: [Aarch64] Fix conditional branches with target far away.

2018-03-29 Thread Sameera Deshpande
Hi Sudakshina,

Thanks for pointing that out. Updated the conditions for attribute
length to take care of boundary conditions for offset range.

Please find attached the updated patch.

I have tested it for gcc testsuite and the failing testcase. Ok for trunk?

On 22 March 2018 at 19:06, Sudakshina Das  wrote:
> Hi Sameera
>
> On 22/03/18 02:07, Sameera Deshpande wrote:
>>
>> Hi Sudakshina,
>>
>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
>> far branch instruction offset is inclusive of both the offsets. Hence,
>> I am using (<= || >=) and not (< || >=) as it was in previous implementation.
>
>
> I have to admit earlier I was only looking at the patch mechanically and
> found a difference with the previous implementation in offset comparison.
> After you pointed out, I looked up the ARMv8 ARM and I have a couple of
> doubts:
>
> 1. My understanding is that any offset in [-1048576 ,1048572] both inclusive
> qualifies as an 'in range' offset. However, the code for both attribute
> length and far_branch has been using [-1048576 ,1048572), that is, ( >= && <
> ). If the far_branch was incorrectly calculated, then maybe the length
> calculations with similar magic numbers should also be corrected? Of course,
> I am not an expert in this and maybe this was a conscience decision so I
> would ask Ramana to maybe clarify if he remembers.
>
> 2. Now to come back to your patch, if my understanding is correct, I think a
> far_branch would be anything outside of this range, that is,
> (offset < -1048576 || offset > 1048572), anything that can not be
> represented in the 21-bit range.
>
> Thanks
> Sudi
>
>
>>
>> On 16 March 2018 at 00:51, Sudakshina Das  wrote:
>>>
>>> On 15/03/18 15:27, Sameera Deshpande wrote:
>>>>
>>>>
>>>> Ping!
>>>>
>>>> On 28 February 2018 at 16:18, Sameera Deshpande
>>>>  wrote:
>>>>>
>>>>>
>>>>> On 27 February 2018 at 18:25, Ramana Radhakrishnan
>>>>>  wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
>>>>>>  wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> Please find attached the patch to fix bug in branches with offsets
>>>>>>> over
>>>>>>> 1MiB.
>>>>>>> There has been an attempt to fix this issue in commit
>>>>>>> 050af05b9761f1979f11c151519e7244d5becd7c
>>>>>>>
>>>>>>> However, the far_branch attribute defined in above patch used
>>>>>>> insn_length - which computes incorrect offset. Hence, eliminated the
>>>>>>> attribute completely, and computed the offset from insn_addresses
>>>>>>> instead.
>>>>>>>
>>>>>>> Ok for trunk?
>>>>>>>
>>>>>>> gcc/Changelog
>>>>>>>
>>>>>>> 2018-02-13 Sameera Deshpande 
>>>>>>>   * config/aarch64/aarch64.md (far_branch): Remove attribute.
>>>>>>> Eliminate
>>>>>>>   all the dependencies on the attribute from RTL patterns.
>>>>>>>
>>>>>>
>>>>>> I'm not a maintainer but this looks good to me modulo notes about how
>>>>>> this was tested. What would be nice is a testcase for the testsuite as
>>>>>> well as ensuring that the patch has been bootstrapped and regression
>>>>>> tested. AFAIR, the original patch was put in because match.pd failed
>>>>>> when bootstrap in another context.
>>>>>>
>>>>>>
>>>>>> regards
>>>>>> Ramana
>>>>>>
>>>>>>> --
>>>>>>> - Thanks and regards,
>>>>>>> Sameera D.
>>>>>
>>>>>
>>>>>
>>>>> The patch is tested with GCC testsuite and bootstrapping successfully.
>>>>> Also tested for spec benchmark.
>>>>>
>>>
>>> I am not a maintainer either. I noticed that the range check you do for
>>> the offset has a (<= || >=). The "far_branch" however did (< || >=) for a
>>> positive value. Was that also part of the incorrect offset calculation?
>>>
>>> @@ -692,7 +675,11 @@
>&

Re: [Aarch64] Fix conditional branches with target far away.

2018-03-29 Thread Sameera Deshpande
Hi Sudakshina,

That testcase cannot be added as of now, as it needs approval from the client.

On Thu 29 Mar, 2018, 9:01 PM Sudakshina Das,  wrote:

> Hi Sameera
>
> On 29/03/18 11:44, Sameera Deshpande wrote:
> > Hi Sudakshina,
> >
> > Thanks for pointing that out. Updated the conditions for attribute
> > length to take care of boundary conditions for offset range.
> >
> > Please find attached the updated patch.
> >
> > I have tested it for gcc testsuite and the failing testcase. Ok for
> trunk?
>
> Thank you so much for fixing the length as well along with you patch.
> You mention a failing testcase? Maybe it would be helpful to add that
> to the patch for the gcc testsuite.
>
> Sudi
>
> >
> > On 22 March 2018 at 19:06, Sudakshina Das  wrote:
> >> Hi Sameera
> >>
> >> On 22/03/18 02:07, Sameera Deshpande wrote:
> >>>
> >>> Hi Sudakshina,
> >>>
> >>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
> >>> far branch instruction offset is inclusive of both the offsets. Hence,
> >>> I am using (<= || >=) and not (< || >=) as in the previous implementation.
> >>
> >>
> >> I have to admit earlier I was only looking at the patch mechanically and
> >> found a difference with the previous implementation in offset
> comparison.
> >> After you pointed out, I looked up the ARMv8 ARM and I have a couple of
> >> doubts:
> >>
> >> 1. My understanding is that any offset in [-1048576 ,1048572] both
> inclusive
> >> qualifies as an 'in range' offset. However, the code for both attribute
> >> length and far_branch has been using [-1048576 ,1048572), that is, ( >=
> && <
> >> ). If the far_branch was incorrectly calculated, then maybe the length
> >> calculations with similar magic numbers should also be corrected? Of
> course,
> >> I am not an expert in this and maybe this was a conscious decision so I
> >> would ask Ramana to maybe clarify if he remembers.
> >>
> >> 2. Now to come back to your patch, if my understanding is correct, I
> think a
> >> far_branch would be anything outside of this range, that is,
> >> (offset < -1048576 || offset > 1048572), anything that can not be
> >> represented in the 21-bit range.
> >>
> >> Thanks
> >> Sudi
> >>
> >>
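
[For reference, the bounds debated above fall straight out of the encoding: AArch64 conditional branches such as B.cond and CBZ carry a 19-bit signed immediate scaled by 4, i.e. a 21-bit byte offset. A quick sketch of that arithmetic, plain Python, nothing target-specific:]

```python
# B.cond/CBZ-style conditional branches encode a 19-bit signed
# immediate that is shifted left by 2, giving a 21-bit byte offset.
imm_bits = 19
scale = 4

lo = -(1 << (imm_bits - 1)) * scale        # most negative reachable offset
hi = ((1 << (imm_bits - 1)) - 1) * scale   # most positive reachable offset

print(lo, hi)  # -1048576 1048572

def far_branch_p(offset):
    """Anything outside [lo, hi] inclusive needs the far-branch sequence."""
    return offset < lo or offset > hi

assert not far_branch_p(lo) and not far_branch_p(hi)
assert far_branch_p(lo - 4) and far_branch_p(hi + 4)
```

[This confirms both endpoints are reachable, i.e. the inclusive (<= || >=) check is the right one for final addresses.]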
> >>>
> >>> On 16 March 2018 at 00:51, Sudakshina Das  wrote:
> >>>>
> >>>> On 15/03/18 15:27, Sameera Deshpande wrote:
> >>>>>
> >>>>>
> >>>>> Ping!
> >>>>>
> >>>>> On 28 February 2018 at 16:18, Sameera Deshpande
> >>>>>  wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 27 February 2018 at 18:25, Ramana Radhakrishnan
> >>>>>>  wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi!
> >>>>>>>>
> >>>>>>>> Please find attached the patch to fix bug in branches with offsets
> >>>>>>>> over
> >>>>>>>> 1MiB.
> >>>>>>>> There has been an attempt to fix this issue in commit
> >>>>>>>> 050af05b9761f1979f11c151519e7244d5becd7c
> >>>>>>>>
> >>>>>>>> However, the far_branch attribute defined in above patch used
> >>>>>>>> insn_length - which computes incorrect offset. Hence, eliminated
> the
> >>>>>>>> attribute completely, and computed the offset from insn_addresses
> >>>>>>>> instead.
> >>>>>>>>
> >>>>>>>> Ok for trunk?
> >>>>>>>>
> >>>>>>>> gcc/Changelog
> >>>>>>>>
> >>>>>>>> 2018-02-13 Sameera Deshpande 
> >>>>>>>>* config/aarch64/aarch64.md (far_branch): Remove
> attribute.
> >>>>>>>> Eliminate
> >>>>>>>>all the dependencies on the attribute from RTL
> patterns.
> >>>>>>>>
> >>>>>>>
> >>>&

Re: [Aarch64] Fix conditional branches with target far away.

2018-03-30 Thread Sameera Deshpande
On 30 March 2018 at 16:39, Richard Sandiford
 wrote:
>> Hi Sudakshina,
>>
>> Thanks for pointing that out. Updated the conditions for attribute
>> length to take care of boundary conditions for offset range.
>>
>> Please find attached the updated patch.
>>
>> I have tested it for gcc testsuite and the failing testcase. Ok for trunk?
>>
>> On 22 March 2018 at 19:06, Sudakshina Das  wrote:
>>> Hi Sameera
>>>
>>> On 22/03/18 02:07, Sameera Deshpande wrote:
>>>>
>>>> Hi Sudakshina,
>>>>
>>>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
>>>> far branch instruction offset is inclusive of both the offsets. Hence,
>>>> I am using (<= || >=) and not (< || >=) as in the previous implementation.
>>>
>>>
>>> I have to admit earlier I was only looking at the patch mechanically and
>>> found a difference with the previous implementation in offset comparison.
>>> After you pointed out, I looked up the ARMv8 ARM and I have a couple of
>>> doubts:
>>>
>>> 1. My understanding is that any offset in [-1048576 ,1048572] both inclusive
>>> qualifies as an 'in range' offset. However, the code for both attribute
>>> length and far_branch has been using [-1048576 ,1048572), that is, ( >= && <
>>> ). If the far_branch was incorrectly calculated, then maybe the length
>>> calculations with similar magic numbers should also be corrected? Of course,
>>> I am not an expert in this and maybe this was a conscious decision so I
>>> would ask Ramana to maybe clarify if he remembers.
>>>
>>> 2. Now to come back to your patch, if my understanding is correct, I think a
>>> far_branch would be anything outside of this range, that is,
>>> (offset < -1048576 || offset > 1048572), anything that can not be
>>> represented in the 21-bit range.
>>>
>>> Thanks
>>> Sudi
>
> [...]
>
>> @@ -466,14 +459,9 @@
>>[(set_attr "type" "branch")
>> (set (attr "length")
>>   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
>> -(lt (minus (match_dup 2) (pc)) (const_int 1048572)))
>> +(le (minus (match_dup 2) (pc)) (const_int 1048572)))
>> (const_int 4)
>> -   (const_int 8)))
>
> Sorry for not replying earlier, but I think the use of "lt" rather than
> "le" in the current length attribute is deliberate.  Distances measured
> from (pc) in "length" are a bit special in that backward distances are
> measured from the start of the instruction and forward distances are
> measured from the end of the instruction:
>
>   /* The address of the current insn.  We implement this actually as the
>  address of the current insn for backward branches, but the last
>  address of the next insn for forward branches, and both with
>  adjustments that account for the worst-case possible stretching of
>  intervening alignments between this insn and its destination.  */
>
> This avoids the chicken-and-egg situation of the length of the branch
> depending on the forward distance and the forward distance depending
> on the length of the branch.
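>
> [To illustrate the point for readers outside the backend: GCC's branch
> shortening escapes that circularity by iterating to a fixpoint — start
> from optimistic lengths, recompute addresses, grow any branch whose
> target is out of range, repeat. A toy model of that loop (hypothetical,
> not GCC's actual shorten_branches, and it ignores the
> backward-from-start / forward-from-end subtlety described above):]

```python
# Toy branch-shortening fixpoint. Each insn is (kind, target_index).
# A 'branch' is 4 bytes if its target is in range, else 8 bytes.
SHORT, LONG = 4, 8
IN_RANGE = range(-16, 13)   # tiny stand-in for [-1048576, 1048572]

def shorten(insns):
    lengths = [SHORT] * len(insns)   # optimistic starting lengths
    while True:
        # Addresses implied by the current lengths.
        addr = [0]
        for n in lengths:
            addr.append(addr[-1] + n)
        changed = False
        for i, (kind, tgt) in enumerate(insns):
            if kind != 'branch':
                continue
            offset = addr[tgt] - addr[i]
            need = SHORT if offset in IN_RANGE else LONG
            if need > lengths[i]:    # lengths only ever grow, so we terminate
                lengths[i] = need
                changed = True
        if not changed:
            return lengths

insns = [('branch', 5), ('op', None), ('op', None),
         ('op', None), ('op', None), ('op', None)]
print(shorten(insns))  # → [8, 4, 4, 4, 4, 4]
```

[Because lengths only grow, the conservative lt bound at the forward edge keeps each iteration's estimate safe.]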
>
> In contrast, this code:
>
>> @@ -712,7 +695,11 @@
>>{
>>  if (get_attr_length (insn) == 8)
>>{
>> - if (get_attr_far_branch (insn) == 1)
>> + long long int offset;
>> + offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
>> +   - INSN_ADDRESSES (INSN_UID (insn));
>> +
>> + if (offset < -1048576 || offset > 1048572)
>> return aarch64_gen_far_branch (operands, 2, "Ltb",
>>"\\t%0, %1, ");
>>   else
>
> is reading the final computed addresses, so the code is right to use
> the real instruction range.  (FWIW I agree with Kyrill that using
> IN_RANGE with hex constants would be clearer.)
>
> That said... a possible problem comes from situations like:
>
>address length insn
>..c  8 A
>   ..align to 8 bytes...
>..8B
>..c  4 C
>   ..align to 16 bytes...
>..0D, branch to B
>
> when D is at the maximum extent of the branch range and when GCC's length
> for A is only a conservative estimate.  If the length of A turns out to
> be 4 rather than 8 at assembly time, the align

Re: [Aarch64] Fix conditional branches with target far away.

2018-03-30 Thread Sameera Deshpande
Hi Richard,

The testcase is working with the patch you suggested, thanks for
pointing that out.

On 30 March 2018 at 16:54, Sameera Deshpande
 wrote:
> On 30 March 2018 at 16:39, Richard Sandiford
>  wrote:
>>> Hi Sudakshina,
>>>
>>> Thanks for pointing that out. Updated the conditions for attribute
>>> length to take care of boundary conditions for offset range.
>>>
>>> Please find attached the updated patch.
>>>
>>> I have tested it for gcc testsuite and the failing testcase. Ok for trunk?
>>>
>>> On 22 March 2018 at 19:06, Sudakshina Das  wrote:
>>>> Hi Sameera
>>>>
>>>> On 22/03/18 02:07, Sameera Deshpande wrote:
>>>>>
>>>>> Hi Sudakshina,
>>>>>
>>>>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
>>>>> far branch instruction offset is inclusive of both the offsets. Hence,
>>>>> I am using (<= || >=) and not (< || >=) as in the previous implementation.
>>>>
>>>>
>>>> I have to admit earlier I was only looking at the patch mechanically and
>>>> found a difference with the previous implementation in offset comparison.
>>>> After you pointed out, I looked up the ARMv8 ARM and I have a couple of
>>>> doubts:
>>>>
>>>> 1. My understanding is that any offset in [-1048576 ,1048572] both 
>>>> inclusive
>>>> qualifies as an 'in range' offset. However, the code for both attribute
>>>> length and far_branch has been using [-1048576 ,1048572), that is, ( >= && 
>>>> <
>>>> ). If the far_branch was incorrectly calculated, then maybe the length
>>>> calculations with similar magic numbers should also be corrected? Of 
>>>> course,
>>>> I am not an expert in this and maybe this was a conscious decision so I
>>>> would ask Ramana to maybe clarify if he remembers.
>>>>
>>>> 2. Now to come back to your patch, if my understanding is correct, I think 
>>>> a
>>>> far_branch would be anything outside of this range, that is,
>>>> (offset < -1048576 || offset > 1048572), anything that can not be
>>>> represented in the 21-bit range.
>>>>
>>>> Thanks
>>>> Sudi
>>
>> [...]
>>
>>> @@ -466,14 +459,9 @@
>>>[(set_attr "type" "branch")
>>> (set (attr "length")
>>>   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int 
>>> -1048576))
>>> -(lt (minus (match_dup 2) (pc)) (const_int 
>>> 1048572)))
>>> +(le (minus (match_dup 2) (pc)) (const_int 
>>> 1048572)))
>>> (const_int 4)
>>> -   (const_int 8)))
>>
>> Sorry for not replying earlier, but I think the use of "lt" rather than
>> "le" in the current length attribute is deliberate.  Distances measured
>> from (pc) in "length" are a bit special in that backward distances are
>> measured from the start of the instruction and forward distances are
>> measured from the end of the instruction:
>>
>>   /* The address of the current insn.  We implement this actually as the
>>  address of the current insn for backward branches, but the last
>>  address of the next insn for forward branches, and both with
>>  adjustments that account for the worst-case possible stretching of
>>  intervening alignments between this insn and its destination.  */
>>
>> This avoids the chicken-and-egg situation of the length of the branch
>> depending on the forward distance and the forward distance depending
>> on the length of the branch.
>>
>> In contrast, this code:
>>
>>> @@ -712,7 +695,11 @@
>>>{
>>>  if (get_attr_length (insn) == 8)
>>>{
>>> - if (get_attr_far_branch (insn) == 1)
>>> + long long int offset;
>>> + offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
>>> +   - INSN_ADDRESSES (INSN_UID (insn));
>>> +
>>> + if (offset < -1048576 || offset > 1048572)
>>> return aarch64_gen_far_branch (operands, 2, "Ltb",
>>>"\\t%0, %1, ");
>>>   else
>>
>> is reading the final computed addresses, so the code is right to use
>> the real instruction

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-06 Thread Sameera Deshpande
Hi Christophe,

Please find attached the updated patch with testcases.

Ok for trunk?

- Thanks and regards,
  Sameera D.

2017-12-14 22:17 GMT+05:30 Christophe Lyon :
> 2017-12-14 9:29 GMT+01:00 Sameera Deshpande :
>> Hi!
>>
>> Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
>> vst1_*_x3 intrinsics as defined by Neon document.
>>
>> Ok for trunk?
>>
>> - Thanks and regards,
>>   Sameera D.
>>
>> gcc/Changelog:
>>
>> 2017-11-14  Sameera Deshpande  
>>
>>
>> * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
>> (st1x2): Likewise.
>> (st1x3): Likewise.
>> * config/aarch64/aarch64-simd.md
>> (aarch64_ld1x3): New pattern.
>> (aarch64_ld1_x3_): Likewise
>> (aarch64_st1x2): Likewise
>> (aarch64_st1_x2_): Likewise
>> (aarch64_st1x3): Likewise
>> (aarch64_st1_x3_): Likewise
>> * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
>> (vld1_s8_x3): Likewise.
>> (vld1_u16_x3): Likewise.
>> (vld1_s16_x3): Likewise.
>> (vld1_u32_x3): Likewise.
>> (vld1_s32_x3): Likewise.
>> (vld1_u64_x3): Likewise.
>> (vld1_s64_x3): Likewise.
>> (vld1_fp16_x3): Likewise.
>> (vld1_f32_x3): Likewise.
>> (vld1_f64_x3): Likewise.
>> (vld1_p8_x3): Likewise.
>> (vld1_p16_x3): Likewise.
>> (vld1_p64_x3): Likewise.
>> (vld1q_u8_x3): Likewise.
>> (vld1q_s8_x3): Likewise.
>> (vld1q_u16_x3): Likewise.
>> (vld1q_s16_x3): Likewise.
>> (vld1q_u32_x3): Likewise.
>> (vld1q_s32_x3): Likewise.
>> (vld1q_u64_x3): Likewise.
>> (vld1q_s64_x3): Likewise.
>> (vld1q_f16_x3): Likewise.
>> (vld1q_f32_x3): Likewise.
>> (vld1q_f64_x3): Likewise.
>> (vld1q_p8_x3): Likewise.
>> (vld1q_p16_x3): Likewise.
>> (vld1q_p64_x3): Likewise.
>> (vst1_s64_x2): Likewise.
>> (vst1_u64_x2): Likewise.
>> (vst1_f64_x2): Likewise.
>> (vst1_s8_x2): Likewise.
>> (vst1_p8_x2): Likewise.
>> (vst1_s16_x2): Likewise.
>> (vst1_p16_x2): Likewise.
>> (vst1_s32_x2): Likewise.
>> (vst1_u8_x2): Likewise.
>> (vst1_u16_x2): Likewise.
>> (vst1_u32_x2): Likewise.
>> (vst1_f16_x2): Likewise.
>> (vst1_f32_x2): Likewise.
>> (vst1_p64_x2): Likewise.
>> (vst1q_s8_x2): Likewise.
>> (vst1q_p8_x2): Likewise.
>> (vst1q_s16_x2): Likewise.
>> (vst1q_p16_x2): Likewise.
>> (vst1q_s32_x2): Likewise.
>> (vst1q_s64_x2): Likewise.
>> (vst1q_u8_x2): Likewise.
>> (vst1q_u16_x2): Likewise.
>> (vst1q_u32_x2): Likewise.
>> (vst1q_u64_x2): Likewise.
>> (vst1q_f16_x2): Likewise.
>> (vst1q_f32_x2): Likewise.
>> (vst1q_f64_x2): Likewise.
>> (vst1q_p64_x2): Likewise.
>> (vst1_s64_x3): Likewise.
>> (vst1_u64_x3): Likewise.
>> (vst1_f64_x3): Likewise.
>> (vst1_s8_x3): Likewise.
>> (vst1_p8_x3): Likewise.
>> (vst1_s16_x3): Likewise.
>> (vst1_p16_x3): Likewise.
>> (vst1_s32_x3): Likewise.
>> (vst1_u8_x3): Likewise.
>> (vst1_u16_x3): Likewise.
>> (vst1_u32_x3): Likewise.
>> (vst1_f16_x3): Likewise.
>> (vst1_f32_x3): Likewise.
>> (vst1_p64_x3): Likewise.
>> (vst1q_s8_x3): Likewise.
>> (vst1q_p8_x3): Likewise.
>> (vst1q_s16_x3): Likewise.
>> (vst1q_p16_x3): Likewise.
>> (vst1q_s32_x3): Likewise.
>> (vst1q_s64_x3): Likewise.
>> (vst1q_u8_x3): Likewise.
>> (vst1q_u16_x3): Likewise.
>> (vst1q_u32_x3): Likewise.
>> (vst1q_u64_x3): Likewise.
>> (vst1q_f16_x3): Likewise.
>> (vst1q_f32_x3): Likewise.
>> (vst1q_f64_x3): Likewise.
>> (vst1q_p64_x3): Likewise.
>
> Hi,
> I'm not a maintainer, but I suspect you should add some tests.
>
> Christophe



-- 
- Thanks and regards,
  Sameera D.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index b383f24..2fd072a 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarc

Re: [Aarch64] Fix conditional branches with target far away.

2018-04-09 Thread Sameera Deshpande
Hi Richard,

I do not see the said patch applied in ToT yet. When do you expect it
to be available in ToT?

- Thanks and regards,
  Sameera D.

On 30 March 2018 at 17:01, Sameera Deshpande
 wrote:
> Hi Richard,
>
> The testcase is working with the patch you suggested, thanks for
> pointing that out.
>
> On 30 March 2018 at 16:54, Sameera Deshpande
>  wrote:
>> On 30 March 2018 at 16:39, Richard Sandiford
>>  wrote:
>>>> Hi Sudakshina,
>>>>
>>>> Thanks for pointing that out. Updated the conditions for attribute
>>>> length to take care of boundary conditions for offset range.
>>>>
>>>> Please find attached the updated patch.
>>>>
>>>> I have tested it for gcc testsuite and the failing testcase. Ok for trunk?
>>>>
>>>> On 22 March 2018 at 19:06, Sudakshina Das  wrote:
>>>>> Hi Sameera
>>>>>
>>>>> On 22/03/18 02:07, Sameera Deshpande wrote:
>>>>>>
>>>>>> Hi Sudakshina,
>>>>>>
>>>>>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
>>>>>> far branch instruction offset is inclusive of both the offsets. Hence,
>>>>>> I am using (<= || >=) and not (< || >=) as in the previous implementation.
>>>>>
>>>>>
>>>>> I have to admit earlier I was only looking at the patch mechanically and
>>>>> found a difference with the previous implementation in offset comparison.
>>>>> After you pointed out, I looked up the ARMv8 ARM and I have a couple of
>>>>> doubts:
>>>>>
>>>>> 1. My understanding is that any offset in [-1048576 ,1048572] both 
>>>>> inclusive
>>>>> qualifies as an 'in range' offset. However, the code for both attribute
>>>>> length and far_branch has been using [-1048576 ,1048572), that is, ( >= 
>>>>> && <
>>>>> ). If the far_branch was incorrectly calculated, then maybe the length
>>>>> calculations with similar magic numbers should also be corrected? Of 
>>>>> course,
>>>>> I am not an expert in this and maybe this was a conscious decision so I
>>>>> would ask Ramana to maybe clarify if he remembers.
>>>>>
>>>>> 2. Now to come back to your patch, if my understanding is correct, I 
>>>>> think a
>>>>> far_branch would be anything outside of this range, that is,
>>>>> (offset < -1048576 || offset > 1048572), anything that can not be
>>>>> represented in the 21-bit range.
>>>>>
>>>>> Thanks
>>>>> Sudi
>>>
>>> [...]
>>>
>>>> @@ -466,14 +459,9 @@
>>>>[(set_attr "type" "branch")
>>>> (set (attr "length")
>>>>   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int 
>>>> -1048576))
>>>> -(lt (minus (match_dup 2) (pc)) (const_int 
>>>> 1048572)))
>>>> +(le (minus (match_dup 2) (pc)) (const_int 
>>>> 1048572)))
>>>> (const_int 4)
>>>> -   (const_int 8)))
>>>
>>> Sorry for not replying earlier, but I think the use of "lt" rather than
>>> "le" in the current length attribute is deliberate.  Distances measured
>>> from (pc) in "length" are a bit special in that backward distances are
>>> measured from the start of the instruction and forward distances are
>>> measured from the end of the instruction:
>>>
>>>   /* The address of the current insn.  We implement this actually as the
>>>  address of the current insn for backward branches, but the last
>>>  address of the next insn for forward branches, and both with
>>>  adjustments that account for the worst-case possible stretching of
>>>  intervening alignments between this insn and its destination.  */
>>>
>>> This avoids the chicken-and-egg situation of the length of the branch
>>> depending on the forward distance and the forward distance depending
>>> on the length of the branch.
>>>
>>> In contrast, this code:
>>>
>>>> @@ -712,7 +695,11 @@
>>>>{
>>>>  if (get_attr_length (insn) == 8)
>>>>{
>>>> - if (get_attr_far_branch (insn) =

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-10 Thread Sameera Deshpande
On 7 April 2018 at 01:25, Christophe Lyon  wrote:
> Hi,
>
> 2018-04-06 12:15 GMT+02:00 Sameera Deshpande :
>> Hi Christophe,
>>
>> Please find attached the updated patch with testcases.
>>
>> Ok for trunk?
>
> Thanks for the update.
>
> Since the new intrinsics are only available on aarch64, you want to
> prevent the tests from running on arm.
> Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two 
> targets.
> There are several examples on how to do that in that directory.
>
> I have also noticed that the tests fail at execution on aarch64_be.
>
> I didn't look at the patch in details.
>
> Christophe
>
>
>>
>> - Thanks and regards,
>>   Sameera D.
>>
>> 2017-12-14 22:17 GMT+05:30 Christophe Lyon :
>>> 2017-12-14 9:29 GMT+01:00 Sameera Deshpande :
>>>> Hi!
>>>>
>>>> Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
>>>> vst1_*_x3 intrinsics as defined by Neon document.
>>>>
>>>> Ok for trunk?
>>>>
>>>> - Thanks and regards,
>>>>   Sameera D.
>>>>
>>>> gcc/Changelog:
>>>>
>>>> 2017-11-14  Sameera Deshpande  
>>>>
>>>>
>>>> * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
>>>> (st1x2): Likewise.
>>>> (st1x3): Likewise.
>>>> * config/aarch64/aarch64-simd.md
>>>> (aarch64_ld1x3): New pattern.
>>>> (aarch64_ld1_x3_): Likewise
>>>> (aarch64_st1x2): Likewise
>>>> (aarch64_st1_x2_): Likewise
>>>> (aarch64_st1x3): Likewise
>>>> (aarch64_st1_x3_): Likewise
>>>> * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
>>>> (vld1_s8_x3): Likewise.
>>>> (vld1_u16_x3): Likewise.
>>>> (vld1_s16_x3): Likewise.
>>>> (vld1_u32_x3): Likewise.
>>>> (vld1_s32_x3): Likewise.
>>>> (vld1_u64_x3): Likewise.
>>>> (vld1_s64_x3): Likewise.
>>>> (vld1_fp16_x3): Likewise.
>>>> (vld1_f32_x3): Likewise.
>>>> (vld1_f64_x3): Likewise.
>>>> (vld1_p8_x3): Likewise.
>>>> (vld1_p16_x3): Likewise.
>>>> (vld1_p64_x3): Likewise.
>>>> (vld1q_u8_x3): Likewise.
>>>> (vld1q_s8_x3): Likewise.
>>>> (vld1q_u16_x3): Likewise.
>>>> (vld1q_s16_x3): Likewise.
>>>> (vld1q_u32_x3): Likewise.
>>>> (vld1q_s32_x3): Likewise.
>>>> (vld1q_u64_x3): Likewise.
>>>> (vld1q_s64_x3): Likewise.
>>>> (vld1q_f16_x3): Likewise.
>>>> (vld1q_f32_x3): Likewise.
>>>> (vld1q_f64_x3): Likewise.
>>>> (vld1q_p8_x3): Likewise.
>>>> (vld1q_p16_x3): Likewise.
>>>> (vld1q_p64_x3): Likewise.
>>>> (vst1_s64_x2): Likewise.
>>>> (vst1_u64_x2): Likewise.
>>>> (vst1_f64_x2): Likewise.
>
>>>> (vst1_s8_x2): Likewise.
>>>> (vst1_p8_x2): Likewise.
>>>> (vst1_s16_x2): Likewise.
>>>> (vst1_p16_x2): Likewise.
>>>> (vst1_s32_x2): Likewise.
>>>> (vst1_u8_x2): Likewise.
>>>> (vst1_u16_x2): Likewise.
>>>> (vst1_u32_x2): Likewise.
>>>> (vst1_f16_x2): Likewise.
>>>> (vst1_f32_x2): Likewise.
>>>> (vst1_p64_x2): Likewise.
>>>> (vst1q_s8_x2): Likewise.
>>>> (vst1q_p8_x2): Likewise.
>>>> (vst1q_s16_x2): Likewise.
>>>> (vst1q_p16_x2): Likewise.
>>>> (vst1q_s32_x2): Likewise.
>>>> (vst1q_s64_x2): Likewise.
>>>> (vst1q_u8_x2): Likewise.
>>>> (vst1q_u16_x2): Likewise.
>>>> (vst1q_u32_x2): Likewise.
>>>> (vst1q_u64_x2): Likewise.
>>>> (vst1q_f16_x2): Likewise.
>>>> (vst1q_f32_x2): Likewise.
>>>> (vst1q_f64_x2): Likewise.
>>>> (vst1q_p64_x2): Likewise.
>>>>  

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-11 Thread Sameera Deshpande
On 10 April 2018 at 20:07, Sudakshina Das  wrote:
> Hi Sameera
>
>
> On 10/04/18 11:20, Sameera Deshpande wrote:
>>
>> On 7 April 2018 at 01:25, Christophe Lyon 
>> wrote:
>>>
>>> Hi,
>>>
>>> 2018-04-06 12:15 GMT+02:00 Sameera Deshpande
>>> :
>>>>
>>>> Hi Christophe,
>>>>
>>>> Please find attached the updated patch with testcases.
>>>>
>>>> Ok for trunk?
>>>
>>>
>>> Thanks for the update.
>>>
>>> Since the new intrinsics are only available on aarch64, you want to
>>> prevent the tests from running on arm.
>>> Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
>>> targets.
>>> There are several examples on how to do that in that directory.
>>>
>>> I have also noticed that the tests fail at execution on aarch64_be.
>>>
>>> I didn't look at the patch in details.
>>>
>>> Christophe
>>>
>>>
>>>>
>>>> - Thanks and regards,
>>>>Sameera D.
>>>>
>>>> 2017-12-14 22:17 GMT+05:30 Christophe Lyon :
>>>>>
>>>>> 2017-12-14 9:29 GMT+01:00 Sameera Deshpande
>>>>> :
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
>>>>>> vst1_*_x3 intrinsics as defined by Neon document.
>>>>>>
>>>>>> Ok for trunk?
>>>>>>
>>>>>> - Thanks and regards,
>>>>>>Sameera D.
>>>>>>
>>>>>> gcc/Changelog:
>>>>>>
>>>>>> 2017-11-14  Sameera Deshpande  
>>>>>>
>>>>>>
>>>>>>  * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
>>>>>>  (st1x2): Likewise.
>>>>>>  (st1x3): Likewise.
>>>>>>  * config/aarch64/aarch64-simd.md
>>>>>> (aarch64_ld1x3): New pattern.
>>>>>>  (aarch64_ld1_x3_): Likewise
>>>>>>  (aarch64_st1x2): Likewise
>>>>>>  (aarch64_st1_x2_): Likewise
>>>>>>  (aarch64_st1x3): Likewise
>>>>>>  (aarch64_st1_x3_): Likewise
>>>>>>  * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
>>>>>>  (vld1_s8_x3): Likewise.
>>>>>>  (vld1_u16_x3): Likewise.
>>>>>>  (vld1_s16_x3): Likewise.
>>>>>>  (vld1_u32_x3): Likewise.
>>>>>>  (vld1_s32_x3): Likewise.
>>>>>>  (vld1_u64_x3): Likewise.
>>>>>>  (vld1_s64_x3): Likewise.
>>>>>>  (vld1_fp16_x3): Likewise.
>>>>>>  (vld1_f32_x3): Likewise.
>>>>>>  (vld1_f64_x3): Likewise.
>>>>>>  (vld1_p8_x3): Likewise.
>>>>>>  (vld1_p16_x3): Likewise.
>>>>>>  (vld1_p64_x3): Likewise.
>>>>>>  (vld1q_u8_x3): Likewise.
>>>>>>  (vld1q_s8_x3): Likewise.
>>>>>>  (vld1q_u16_x3): Likewise.
>>>>>>  (vld1q_s16_x3): Likewise.
>>>>>>  (vld1q_u32_x3): Likewise.
>>>>>>  (vld1q_s32_x3): Likewise.
>>>>>>  (vld1q_u64_x3): Likewise.
>>>>>>  (vld1q_s64_x3): Likewise.
>>>>>>  (vld1q_f16_x3): Likewise.
>>>>>>  (vld1q_f32_x3): Likewise.
>>>>>>  (vld1q_f64_x3): Likewise.
>>>>>>  (vld1q_p8_x3): Likewise.
>>>>>>  (vld1q_p16_x3): Likewise.
>>>>>>  (vld1q_p64_x3): Likewise.
>>>>>>  (vst1_s64_x2): Likewise.
>>>>>>  (vst1_u64_x2): Likewise.
>>>>>> (vst1_f64_x2): Likewise.
>>>
>>>
>>>>>>  (vst1_s8_x2): Likewise.
>>>>>>  (vst1_p8_x2): Likewise.
>>>>>>  (vst1_s16_x2): Likewise.
>>>>>>  (vst1_p16_x2): Lik

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-11 Thread Sameera Deshpande
On 11 April 2018 at 15:53, Sudakshina Das  wrote:
> Hi Sameera
>
>
> On 11/04/18 09:04, Sameera Deshpande wrote:
>>
>> On 10 April 2018 at 20:07, Sudakshina Das  wrote:
>>>
>>> Hi Sameera
>>>
>>>
>>> On 10/04/18 11:20, Sameera Deshpande wrote:
>>>>
>>>>
>>>> On 7 April 2018 at 01:25, Christophe Lyon 
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> 2018-04-06 12:15 GMT+02:00 Sameera Deshpande
>>>>> :
>>>>>>
>>>>>>
>>>>>> Hi Christophe,
>>>>>>
>>>>>> Please find attached the updated patch with testcases.
>>>>>>
>>>>>> Ok for trunk?
>>>>>
>>>>>
>>>>>
>>>>> Thanks for the update.
>>>>>
>>>>> Since the new intrinsics are only available on aarch64, you want to
>>>>> prevent the tests from running on arm.
>>>>> Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
>>>>> targets.
>>>>> There are several examples on how to do that in that directory.
>>>>>
>>>>> I have also noticed that the tests fail at execution on aarch64_be.
>>>>>
>>>>> I didn't look at the patch in details.
>>>>>
>>>>> Christophe
>>>>>
>>>>>
>>>>>>
>>>>>> - Thanks and regards,
>>>>>> Sameera D.
>>>>>>
>>>>>> 2017-12-14 22:17 GMT+05:30 Christophe Lyon
>>>>>> :
>>>>>>>
>>>>>>>
>>>>>>> 2017-12-14 9:29 GMT+01:00 Sameera Deshpande
>>>>>>> :
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
>>>>>>>> vst1_*_x3 intrinsics as defined by Neon document.
>>>>>>>>
>>>>>>>> Ok for trunk?
>>>>>>>>
>>>>>>>> - Thanks and regards,
>>>>>>>> Sameera D.
>>>>>>>>
>>>>>>>> gcc/Changelog:
>>>>>>>>
>>>>>>>> 2017-11-14  Sameera Deshpande  
>>>>>>>>
>>>>>>>>
>>>>>>>>   * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
>>>>>>>>   (st1x2): Likewise.
>>>>>>>>   (st1x3): Likewise.
>>>>>>>>   * config/aarch64/aarch64-simd.md
>>>>>>>> (aarch64_ld1x3): New pattern.
>>>>>>>>   (aarch64_ld1_x3_): Likewise
>>>>>>>>   (aarch64_st1x2): Likewise
>>>>>>>>   (aarch64_st1_x2_): Likewise
>>>>>>>>   (aarch64_st1x3): Likewise
>>>>>>>>   (aarch64_st1_x3_): Likewise
>>>>>>>>   * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
>>>>>>>>   (vld1_s8_x3): Likewise.
>>>>>>>>   (vld1_u16_x3): Likewise.
>>>>>>>>   (vld1_s16_x3): Likewise.
>>>>>>>>   (vld1_u32_x3): Likewise.
>>>>>>>>   (vld1_s32_x3): Likewise.
>>>>>>>>   (vld1_u64_x3): Likewise.
>>>>>>>>   (vld1_s64_x3): Likewise.
>>>>>>>>   (vld1_fp16_x3): Likewise.
>>>>>>>>   (vld1_f32_x3): Likewise.
>>>>>>>>   (vld1_f64_x3): Likewise.
>>>>>>>>   (vld1_p8_x3): Likewise.
>>>>>>>>   (vld1_p16_x3): Likewise.
>>>>>>>>   (vld1_p64_x3): Likewise.
>>>>>>>>   (vld1q_u8_x3): Likewise.
>>>>>>>>   (vld1q_s8_x3): Likewise.
>>>>>>>>   (vld1q_u16_x3): Likewise.
>>>>>>>>   (vld1q_s16_x3): Likewise.
>>>>>>>>   (vld1q_u32_x3): Likewise.
>>>>>>>>   (vld1q_s32_x3): Likewise.
>>>>>>>>   (vld1q_u64_x3): Likewis

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-13 Thread Sameera Deshpande
On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, 
wrote:

> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
> > Hi,
> >
> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande <
> sameera.deshpa...@linaro.org>:
> > > Hi Christophe,
> > >
> > > Please find attached the updated patch with testcases.
> > >
> > > Ok for trunk?
> >
> > Thanks for the update.
> >
> > Since the new intrinsics are only available on aarch64, you want to
> > prevent the tests from running on arm.
> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
> targets.
> > There are several examples on how to do that in that directory.
> >
> > I have also noticed that the tests fail at execution on aarch64_be.
>
> I think this is important to fix. We don't want the big-endian target to
> have
> failing implementations of the Neon intrinsics. What is the nature of the
> failure?
>
> From what I can see, nothing in the patch prevents using these intrinsics
> on big-endian, so either the intrinsics behaviour is wrong (we have a wrong
> code bug), or the testcase expected behaviour is wrong.
>
> I don't think disabling the test for big-endian is the right fix. We should
> either fix the intrinsics, or fix the testcase.
>
> Thanks,
> James
>
> Hi James,


As the tests assume a little-endian order of elements while checking the
results, the tests are failing on big-endian targets. So the failures are
caused not by the intrinsic implementations but by the testcase.

- Thanks and regards,
  Sameera D.


Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-30 Thread Sameera Deshpande
On 13 April 2018 at 20:21, James Greenhalgh  wrote:
> On Fri, Apr 13, 2018 at 03:39:32PM +0100, Sameera Deshpande wrote:
>> On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, 
>> mailto:james.greenha...@arm.com>> wrote:
>> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
>> > Hi,
>> >
>> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande 
>> > mailto:sameera.deshpa...@linaro.org>>:
>> > > Hi Christophe,
>> > >
>> > > Please find attached the updated patch with testcases.
>> > >
>> > > Ok for trunk?
>> >
>> > Thanks for the update.
>> >
>> > Since the new intrinsics are only available on aarch64, you want to
>> > prevent the tests from running on arm.
>> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two 
>> > targets.
>> > There are several examples on how to do that in that directory.
>> >
>> > I have also noticed that the tests fail at execution on aarch64_be.
>>
>> I think this is important to fix. We don't want the big-endian target to have
>> failing implementations of the Neon intrinsics. What is the nature of the
>> failure?
>>
>> From what I can see, nothing in the patch prevents using these intrinsics
>> on big-endian, so either the intrinsics behaviour is wrong (we have a wrong
>> code bug), or the testcase expected behaviour is wrong.
>>
>> I don't think disabling the test for big-endian is the right fix. We should
>> either fix the intrinsics, or fix the testcase.
>>
>> Thanks,
>> James
>>
>> Hi James,
>>
>> As the tests assume a little-endian order of elements while checking the
>> results, the tests are failing on big-endian targets. So the failures are
>> caused not by the intrinsic implementations but by the testcase.
>
> The testcase is a little hard to follow through the macros, but why would
> this be the case?
>
> ld1 is deterministic on big and little endian for which elements will be
> loaded from memory, as is st1.
>
> My expectation would be that:
>
>   int __attribute__ ((noinline))
>   test_vld_u16_x3 ()
>   {
> uint16_t data[3 * 3];
> uint16_t temp[3 * 3];
> uint16x4x3_t vectors;
> int i,j;
> for (i = 0; i < 3 * 3; i++)
>   data [i] = (uint16_t) 3*i;
> asm volatile ("" : : : "memory");
> vectors = vld1_u16_x3 (data);
> vst1_u16 (temp, vectors.val[0]);
> vst1_u16 (&temp[3], vectors.val[1]);
> vst1_u16 (&temp[3 * 2], vectors.val[2]);
> asm volatile ("" : : : "memory");
> for (j = 0; j < 3 * 3; j++)
>   if (temp[j] != data[j])
> return 1;
> return 0;
>   }
>
> would work equally well for big- or little-endian.
>
> I think this is more likely to be an intrinsics implementation bug.
>
> Thanks,
> James
>

Hi James,

Please find attached the updated patch, which now passes for little as
well as big endian.
Ok for trunk?

-- 
- Thanks and regards,
  Sameera D.

gcc/Changelog:

2018-05-01  Sameera Deshpande  


* config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
(st1x2): Likewise.
(st1x3): Likewise.
* config/aarch64/aarch64-simd.md
(aarch64_ld1x3<VALLDIF:mode>): New pattern.
(aarch64_ld1_x3_<VALLDIF:mode>): Likewise.
(aarch64_st1x2<VALLDIF:mode>): Likewise.
(aarch64_st1_x2_<VALLDIF:mode>): Likewise.
(aarch64_st1x3<VALLDIF:mode>): Likewise.
(aarch64_st1_x3_<VALLDIF:mode>): Likewise.
* config/aarch64/arm_neon.h (vld1_u8_x3): New function.
(vld1_s8_x3): Likewise.
(vld1_u16_x3): Likewise.
(vld1_s16_x3): Likewise.
(vld1_u32_x3): Likewise.
(vld1_s32_x3): Likewise.
(vld1_u64_x3): Likewise.
(vld1_s64_x3): Likewise.
(vld1_f16_x3): Likewise.
(vld1_f32_x3): Likewise.
(vld1_f64_x3): Likewise.
(vld1_p8_x3): Likewise.
(vld1_p16_x3): Likewise.
(vld1_p64_x3): Likewise.
(vld1q_u8_x3): Likewise.
(vld1q_s8_x3): Likewise.
(vld1q_u16_x3): Likewise.
(vld1q_s16_x3): Likewise.
(vld1q_u32_x3): Likewise.
(vld1q_s32_x3): Likewise.
(vld1q_u64_x3): Likewise.
(vld1q_s64_x3): Likewise.
(vld1q_f16_x3): Likewise.
(vld1q_f32_x3): Likewise.
(vld1q_f64_x3): Likewise.
(vld1q_p8_x3): Likewise.
(vld1q_p16_x3): Likewise.
(vld1q_p64_x3): Likewise.
(vst1_s64_x2): Likewise.
(vst1_u64_x2): Likewise.
(vst1_f64_x2): Likewise.
(vst1_s8_x2): Likewise.
(vst1_p8_x2): Likewise.
(vst1_s16_x2): Likewise.
(vst1_p16_x2): Likewise.
(vst1_s32

[AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2017-12-14 Thread Sameera Deshpande
Hi!

Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
vst1_*_x3 intrinsics as defined by Neon document.

Ok for trunk?

- Thanks and regards,
  Sameera D.

gcc/Changelog:

2017-11-14  Sameera Deshpande  


* config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
(st1x2): Likewise.
(st1x3): Likewise.
* config/aarch64/aarch64-simd.md
(aarch64_ld1x3<VALLDIF:mode>): New pattern.
(aarch64_ld1_x3_<VALLDIF:mode>): Likewise.
(aarch64_st1x2<VALLDIF:mode>): Likewise.
(aarch64_st1_x2_<VALLDIF:mode>): Likewise.
(aarch64_st1x3<VALLDIF:mode>): Likewise.
(aarch64_st1_x3_<VALLDIF:mode>): Likewise.
* config/aarch64/arm_neon.h (vld1_u8_x3): New function.
(vld1_s8_x3): Likewise.
(vld1_u16_x3): Likewise.
(vld1_s16_x3): Likewise.
(vld1_u32_x3): Likewise.
(vld1_s32_x3): Likewise.
(vld1_u64_x3): Likewise.
(vld1_s64_x3): Likewise.
(vld1_fp16_x3): Likewise.
(vld1_f32_x3): Likewise.
(vld1_f64_x3): Likewise.
(vld1_p8_x3): Likewise.
(vld1_p16_x3): Likewise.
(vld1_p64_x3): Likewise.
(vld1q_u8_x3): Likewise.
(vld1q_s8_x3): Likewise.
(vld1q_u16_x3): Likewise.
(vld1q_s16_x3): Likewise.
(vld1q_u32_x3): Likewise.
(vld1q_s32_x3): Likewise.
(vld1q_u64_x3): Likewise.
(vld1q_s64_x3): Likewise.
(vld1q_f16_x3): Likewise.
(vld1q_f32_x3): Likewise.
(vld1q_f64_x3): Likewise.
(vld1q_p8_x3): Likewise.
(vld1q_p16_x3): Likewise.
(vld1q_p64_x3): Likewise.
(vst1_s64_x2): Likewise.
(vst1_u64_x2): Likewise.
(vst1_f64_x2): Likewise.
(vst1_s8_x2): Likewise.
(vst1_p8_x2): Likewise.
(vst1_s16_x2): Likewise.
(vst1_p16_x2): Likewise.
(vst1_s32_x2): Likewise.
(vst1_u8_x2): Likewise.
(vst1_u16_x2): Likewise.
(vst1_u32_x2): Likewise.
(vst1_f16_x2): Likewise.
(vst1_f32_x2): Likewise.
(vst1_p64_x2): Likewise.
(vst1q_s8_x2): Likewise.
(vst1q_p8_x2): Likewise.
(vst1q_s16_x2): Likewise.
(vst1q_p16_x2): Likewise.
(vst1q_s32_x2): Likewise.
(vst1q_s64_x2): Likewise.
(vst1q_u8_x2): Likewise.
(vst1q_u16_x2): Likewise.
(vst1q_u32_x2): Likewise.
(vst1q_u64_x2): Likewise.
(vst1q_f16_x2): Likewise.
(vst1q_f32_x2): Likewise.
(vst1q_f64_x2): Likewise.
(vst1q_p64_x2): Likewise.
(vst1_s64_x3): Likewise.
(vst1_u64_x3): Likewise.
(vst1_f64_x3): Likewise.
(vst1_s8_x3): Likewise.
(vst1_p8_x3): Likewise.
(vst1_s16_x3): Likewise.
(vst1_p16_x3): Likewise.
(vst1_s32_x3): Likewise.
(vst1_u8_x3): Likewise.
(vst1_u16_x3): Likewise.
(vst1_u32_x3): Likewise.
(vst1_f16_x3): Likewise.
(vst1_f32_x3): Likewise.
(vst1_p64_x3): Likewise.
(vst1q_s8_x3): Likewise.
(vst1q_p8_x3): Likewise.
(vst1q_s16_x3): Likewise.
(vst1q_p16_x3): Likewise.
(vst1q_s32_x3): Likewise.
(vst1q_s64_x3): Likewise.
(vst1q_u8_x3): Likewise.
(vst1q_u16_x3): Likewise.
(vst1q_u32_x3): Likewise.
(vst1q_u64_x3): Likewise.
(vst1q_f16_x3): Likewise.
(vst1q_f32_x3): Likewise.
(vst1q_f64_x3): Likewise.
(vst1q_p64_x3): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 52d01342372..fa623e90017 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -441,6 +441,15 @@
   BUILTIN_VALL_F16 (STORE1, st1, 0)
   VAR1(STORE1P, st1, 0, v2di)
 
+  /* Implemented by aarch64_ld1x3.  */
+  BUILTIN_VALLDIF (LOADSTRUCT, ld1x3, 0)
+
+  /* Implemented by aarch64_st1x2.  */
+  BUILTIN_VALLDIF (STORESTRUCT, st1x2, 0)
+
+  /* Implemented by aarch64_st1x3.  */
+  BUILTIN_VALLDIF (STORESTRUCT, st1x3, 0)
+
   /* Implemented by fma4.  */
   BUILTIN_VHSDF (TERNOP, fma, 4)
   VAR1 (TERNOP, fma, 4, hf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 4fd34c18f95..852bcf0c16a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -5038,6 +5038,70 @@
 }
 })
 
+
+(define_expand "aarch64_ld1x3<VALLDIF:mode>"
+  [(match_operand:CI 0 "register_operand" "=w")
+   (match_operand:DI 1 "register_operand" "r")
+   (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_SIMD"
+{
+  rtx mem = gen_rtx_MEM (CImode, operands[1]);
+  emit_insn (gen_aarch64_ld1_x3_<VALLDIF:mode> (operands[0], mem));
+  DONE;
+})
+
+(define_insn "aarch64_ld1_x3_<VALLDIF:mode>"
+  [(set (match_operand:CI 0 "register_operand" "=w")
+(unspec:CI
+	  [(match_operand:CI 1 "aarch64_simd_struct_operand" "Utv")
+	   (unspec:VALLDIF [(const_int 3)] UNSPEC_VS

[Patch, regrename] Fix PR87330 : ICE in scan_rtx_reg, at regrename.c

2018-10-08 Thread Sameera Deshpande
Hi!

Please find attached the patch fixing the issue PR87330 : ICE in
scan_rtx_reg, at regrename.c:1097.
The regrename pass does not rename registers that appear in notes, so the
REG_DEAD notes kept the previous register names, which produced conflicting
liveness information for the tag collision pass.

It is better to do this in regrename_do_replace rather than in
regrename_analyze, because the note information does not really contribute
to the regrename analysis, hence it need not be added to the def-use chains
that are computed. regrename_do_replace is where the decision to finally
rename the register is made, and where the note can be updated with the new
register name.

Other notes need not be changed, as they don't hold renamed register
information.
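
The interaction can be modelled outside GCC with a toy structure (hypothetical
types, not GCC's real rtx/insn representation): renaming only the insn's
operands while leaving the REG_DEAD note untouched is exactly the stale-liveness
situation described above, and the fix is the one-line note update.

```c
/* Toy model of the REG_DEAD problem: an insn with one register operand
   and an optional REG_DEAD note recording which register dies here.
   These structures are illustrative only, not GCC's real rtx types.  */
struct toy_insn
{
  int operand_reg;   /* register used by the insn */
  int dead_note_reg; /* register named in the REG_DEAD note, -1 if none */
};

/* Rename BASE_REGNO to NEW_REGNO in INSN.  The last two lines mirror the
   fix in regrename_do_replace: without them the note would keep the old
   register number and later consumers would see stale liveness info.  */
static void
toy_rename (struct toy_insn *insn, int base_regno, int new_regno)
{
  if (insn->operand_reg == base_regno)
    insn->operand_reg = new_regno;
  if (insn->dead_note_reg == base_regno)
    insn->dead_note_reg = new_regno;
}
```

After toy_rename the operand and the note agree on the new register, which is
what the patch achieves with validate_change on the note.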

Ok for trunk?

Changelog:

2018-10-09  Sameera Deshpande

diff --git a/gcc/regrename.c b/gcc/regrename.c
index 8424093..a3446a2 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -970,6 +970,7 @@ regrename_do_replace (struct du_head *head, int reg)
   unsigned int regno = ORIGINAL_REGNO (*chain->loc);
   struct reg_attrs *attr = REG_ATTRS (*chain->loc);
   int reg_ptr = REG_POINTER (*chain->loc);
+  rtx note;
 
   if (DEBUG_INSN_P (chain->insn) && REGNO (*chain->loc) != base_regno)
 	validate_change (chain->insn, &(INSN_VAR_LOCATION_LOC (chain->insn)),
@@ -986,6 +987,11 @@ regrename_do_replace (struct du_head *head, int reg)
 	  last_reg = *chain->loc;
 	}
 	  validate_change (chain->insn, chain->loc, last_repl, true);
+	  note = find_regno_note (chain->insn, REG_DEAD, base_regno);
+	  if (note != 0)
+	{
+	  validate_change (chain->insn, &XEXP (note, 0), last_repl, true);
+	}
 	}
 }
 


[AArch64] Add Saphira pipeline description.

2018-10-26 Thread Sameera Deshpande
Hi!

Please find attached the patch to add a pipeline description for the
Qualcomm Saphira core.  It is tested with a bootstrap and make check,
with no regressions.

Ok for trunk?

gcc/
Changelog:

2018-10-26 Sameera Deshpande 

* config/aarch64/aarch64-cores.def (saphira): Use saphira pipeline.
* config/aarch64/aarch64.md: Include saphira.md
* config/aarch64/saphira.md: New file for pipeline description.

-- 
- Thanks and regards,
  Sameera D.
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 3d876b8..8e4c646 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -90,7 +90,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2
 /* ARMv8.4-A Architecture Processors.  */
 
 /* Qualcomm ('Q') cores. */
-AARCH64_CORE("saphira", saphira,falkor,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
+AARCH64_CORE("saphira", saphira,saphira,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
 
 /* ARMv8-A big.LITTLE implementations.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..f951354 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -298,6 +298,7 @@
 (include "../arm/cortex-a57.md")
 (include "../arm/exynos-m1.md")
 (include "falkor.md")
+(include "saphira.md")
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
diff --git a/gcc/config/aarch64/saphira.md b/gcc/config/aarch64/saphira.md
new file mode 100644
index 000..bbf1c5c
--- /dev/null
+++ b/gcc/config/aarch64/saphira.md
@@ -0,0 +1,583 @@
+;; Saphira pipeline description
+;; Copyright (C) 2017-2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "saphira")
+
+;; Complex int instructions (e.g. multiply and divide) execute in the X
+;; pipeline.  Simple int instructions execute in the X, Y, Z and B pipelines.
+
+(define_cpu_unit "saphira_x" "saphira")
+(define_cpu_unit "saphira_y" "saphira")
+
+;; Branches execute in the Z or B pipeline or in one of the int pipelines,
+;; depending on how complex they are.  Simple int insns (like movz) can also
+;; execute here.
+
+(define_cpu_unit "saphira_z" "saphira")
+(define_cpu_unit "saphira_b" "saphira")
+
+;; Vector and FP insns execute in the VX and VY pipelines.
+
+(define_automaton "saphira_vfp")
+
+(define_cpu_unit "saphira_vx" "saphira_vfp")
+(define_cpu_unit "saphira_vy" "saphira_vfp")
+
+;; Loads execute in the LD pipeline.
+;; Stores execute in the ST pipeline, for address, data, and
+;; vector data.
+
+(define_automaton "saphira_mem")
+
+(define_cpu_unit "saphira_ld" "saphira_mem")
+(define_cpu_unit "saphira_st" "saphira_mem")
+
+;; The GTOV and VTOG pipelines are for general to vector reg moves, and vice
+;; versa.
+
+(define_cpu_unit "saphira_gtov" "saphira")
+(define_cpu_unit "saphira_vtog" "saphira")
+
+;; Common reservation combinations.
+
+(define_reservation "saphira_vxvy" "saphira_vx|saphira_vy")
+(define_reservation "saphira_zb"   "saphira_z|saphira_b")
+(define_reservation "saphira_xyzb" "saphira_x|saphira_y|saphira_z|saphira_b")
+
+;; SIMD Floating-Point Instructions
+
+(define_insn_reservation "saphira_afp_1_vxvy" 1
+  (and (eq_attr "tune" "saphira")
+   (eq_attr "type" "neon_fp_neg_s,neon_fp_neg_d,neon_fp_abs_s,neon_fp_abs_d,neon_fp_neg_s_q,neon_fp_neg_d_q,neon_fp_abs_s_q,neon_fp_abs_d_q"))
+  "saphira_vxvy")
+
+(define_insn_reservation "saphira_afp_2_vxvy" 2
+  (and (eq_attr "tune" "saphira")
+   (eq_attr "type" "neon_fp_minmax_s,neon_fp_minmax_d,neon_fp_reduc_minmax_s,neon_fp_reduc_minmax_d,neon_fp_compare_s,neon_fp_compare_d,neon_fp_round_s,neon_fp_round_d,neon_fp_minmax_s_q,ne

Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-11-07 Thread Sameera Deshpande
Hi Ramana,

Please find attached reworked patch. The patch is tested with check-gcc,
check-gdb and bootstrap with no regression.

On Fri, 2011-10-21 at 13:43 +0100, Ramana Radhakrishnan wrote: 
> Why are you differentiating on stack_only ? Does it really matter ?
> 
The pop_multi* patterns generate a POP instruction, hence I wanted to be
sure that the base register is the stack pointer.

I can remove the stack_only option by
1. Modifying the pattern to match SP as the base register explicitly, or
2. Generating an ldm%(ia%) instruction for non-SP base registers.

I chose the second option.

> Hmmm isn't this true of only LDM's in Thumb state ? Though it could be argued
> that this patch is only T2 epilogues.
> 
Yes, it's true. But for a single register we want to match the LDR pattern
rather than any of the ldm* or pop_multi* patterns. So, I am barring LDM
for a single register here.

> >+strcpy (pattern, \"fldmfdd\\t\");
> >+strcat (pattern,
> >+reg_names[REGNO (SET_DEST (XVECEXP (operands[0], 0, 
> >0)))]);
> >+strcat (pattern, \"!, {\");
> >+strcat (pattern, table[(REGNO (XEXP (XVECEXP (operands[0], 0, 1), 0))
> >+   - FIRST_VFP_REGNUM) / 2].name);
> 
> Can't you reuse names from arm.h and avoid the table here ?
> 
The array REGISTER_NAMES in aout.h uses S0, S2, ... names for double
registers. Is there any way to use OVERLAPPING_REGISTER_NAMES? If that
can be done, I can eliminate the table here.

Updated ChangeLog entry:

2011-09-28  Ian Bolton
            Sameera Deshpande

* config/arm/arm-protos.h (load_multiple_operation_p): New declaration.
(thumb2_expand_epilogue): Likewise.
(thumb2_output_return): Likewise.
(thumb2_expand_return): Likewise.
(thumb_unexpanded_epilogue): Rename to...
(thumb1_unexpanded_epilogue): ...this.
* config/arm/arm.c (load_multiple_operation_p): New function.
(thumb2_emit_multi_reg_pop): Likewise.
(thumb2_emit_vfp_multi_reg_pop): Likewise.
(thumb2_expand_return): Likewise.
(thumb2_expand_epilogue): Likewise.
(thumb2_output_return): Likewise.
(thumb_unexpanded_epilogue): Rename to...
(thumb1_unexpanded_epilogue): ...this.
* config/arm/arm.md (pop_multiple_with_stack_update): New pattern.
(pop_multiple_with_stack_update_and_return): Likewise.
(thumb2_ldr_with_return): Likewise.
(vfp_point_pop_multiple_with_stack_update): Likewise.
(return): Update condition and code for pattern.
(arm_return): Likewise.
(epilogue_insns): Likewise.
* config/arm/predicates.md (load_multiple_operation): Update predicate.
(load_multiple_operation_return): New predicate.
(load_multiple_operation_fp): Likewise.
* config/arm/thumb2.md (thumb2_return): Remove.
(thumb2_rtl_epilogue_return): New pattern.


- Thanks and regards,
  Sameera D.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 23a29c6..2c38883 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -65,6 +65,7 @@ extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);
+extern bool load_multiple_operation_p (rtx, bool, enum machine_mode, bool);
 extern int neon_immediate_valid_for_move (rtx, enum machine_mode, rtx *, int *);
 extern int neon_immediate_valid_for_logic (rtx, enum machine_mode, int, rtx *,
 	   int *);
@@ -176,10 +177,13 @@ extern int arm_float_words_big_endian (void);
 
 /* Thumb functions.  */
 extern void arm_init_expanders (void);
-extern const char *thumb_unexpanded_epilogue (void);
+extern const char *thumb1_unexpanded_epilogue (void);
 extern void thumb1_expand_prologue (void);
 extern void thumb1_expand_epilogue (void);
 extern const char *thumb1_output_interwork (void);
+extern void thumb2_expand_epilogue (void);
+extern void thumb2_output_return (rtx);
+extern void thumb2_expand_return (void);
 #ifdef TREE_CODE
 extern int is_called_in_ARM_mode (tree);
 #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e07c8c3..ec87892 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8906,6 +8906,137 @@ neon_valid_immediate (rtx op, enum machine_mode mode, int inverse,
 #undef CHECK
 }
 
+/* Return true if OP is a valid load multiple operation for MODE mode.
+   CONSECUTIVE is true if the registers in the operation must form
+   a consecutive sequence in the register bank.  STACK_ONLY is true
+   if the base register must be the stack pointer.  RETURN_PC is true
+   if value is to be loaded in PC.  */
+bool
+load_multiple_operation_p (rtx op, bool consecutive, enum machine_mode mode,
+   bool return_pc)
+{
+  HOST_WIDE_INT count = XVE

Re: [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.

2011-11-07 Thread Sameera Deshpande

> 
> 
> I don't believe REG_FRAME_RELATED_EXPR does the right thing for 
> anything besides prologues.  You need to emit REG_CFA_RESTORE
> for the pop inside an epilogue.

Richard, here is the updated patch, which uses REG_CFA_RESTORE instead of
REG_FRAME_RELATED_EXPR.
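
For reference, the counting logic of thumb2_emit_ldrd_pop in the patch below
can be sketched in isolation (a simplified model, not the GCC code itself; it
assumes PC, when popped with a return, is handled separately by the
LDR-with-return pattern and never goes into an LDRD pair):

```c
/* Simplified model of the pair/remainder arithmetic in
   thumb2_emit_ldrd_pop: PC is excluded from the count when the pop also
   returns, the remaining registers are popped two at a time with LDRD,
   and an odd leftover register is popped with a single LDR.  */
static void
count_ldrd_pop (int num_regs, int really_return, int pc_popped,
                int *ldrd_pairs, int *single_ldr)
{
  /* LDRD cannot load PC; drop it from the count so it is popped
     separately (LDR with return).  */
  if (really_return && pc_popped)
    num_regs--;
  *ldrd_pairs = num_regs / 2;  /* registers popped in LDRD pairs */
  *single_ldr = num_regs % 2;  /* leftover register popped with LDR */
}
```

E.g. popping {r4, r5, r6, r7, pc} with a return gives two LDRD pairs and no
leftover LDR; PC is then loaded through the LDR-with-return pattern.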


The patch is tested with check-gcc, check-gdb and bootstrap with no
regression.

Ok for trunk?

- Thanks and regards,
  Sameera

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 37113f5..e71ead5 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -203,6 +203,7 @@ extern void thumb_reload_in_hi (rtx *);
 extern void thumb_set_return_address (rtx, rtx);
 extern const char *thumb1_output_casesi (rtx *);
 extern const char *thumb2_output_casesi (rtx *);
+extern bool bad_reg_pair_for_thumb_ldrd_strd (rtx, rtx);
 #endif
 
 /* Defined in pe.c.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 429b644..05c9368 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15706,6 +15706,151 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   REG_NOTES (par) = dwarf;
 }
 
+bool
+bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+  || GET_CODE (src2) != REG
+  || (REGNO (src1) == PC_REGNUM)
+  || (REGNO (src1) == SP_REGNUM)
+  || (REGNO (src1) == REGNO (src2))
+  || (REGNO (src2) == PC_REGNUM)
+  || (REGNO (src2) == SP_REGNUM));
+}
+
+/* Generate and emit a pattern that will be recognized as an LDRD pattern.  If
+   an even number of registers is being popped, multiple LDRD patterns are
+   created for all register pairs.  If an odd number of registers is popped,
+   the last register is loaded using an LDR pattern.  */
+static bool
+thumb2_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+  gcc_assert (really_return || ((saved_regs_mask & (1 << PC_REGNUM)) == 0));
+
+  /* We cannot generate ldrd for PC.  Hence, reduce the count if PC is
+ to be popped.  So, if num_regs is even, now it will become odd,
+ and we can generate pop with PC.  If num_regs is odd, it will be
+ even now, and ldr with return can be generated for PC.  */
+  if (really_return && (saved_regs_mask & (1 << PC_REGNUM)))
+num_regs--;
+
+  /* Var j iterates over all the registers to gather all the registers in
+ saved_regs_mask.  Var i gives index of saved registers in stack frame.
+ A PARALLEL RTX of register-pair is created here, so that pattern for
+ LDRD can be matched.  As PC is always last register to be popped, and
+ we have already decremented num_regs if PC, we don't have to worry
+ about PC in this loop.  */
+  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+
+/* Create RTX for memory load.  */
+reg = gen_rtx_REG (SImode, j);
+tmp = gen_rtx_SET (SImode,
+   reg,
+   gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4 * i)));
+RTX_FRAME_RELATED_P (tmp) = 1;
+
+if (i % 2 == 0)
+  {
+/* When saved-register index (i) is even, the RTX to be emitted is
+   yet to be created.  Hence create it first.  The LDRD pattern we
+   are generating is :
+   [ (SET (reg_t0) (MEM (PLUS (SP) (NUM))))
+     (SET (reg_t1) (MEM (PLUS (SP) (NUM + 4)))) ]
+   where target registers need not be consecutive.  */
+par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+dwarf = NULL_RTX;
+  }
+
+/* ith register is added in PARALLEL RTX.  If i is even, the reg_i is
+   added as 0th element and if i is odd, reg_i is added as 1st element
+   of LDRD pattern shown above.  */
+XVECEXP (par, 0, (i % 2)) = tmp;
+dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+if ((i % 2) == 1)
+  {
+/* When saved-register index (i) is odd, RTXs for both the registers
+   to be loaded are generated in above given LDRD pattern, and the
+   pattern can be emitted now.  */
+par = emit_insn (par);
+REG_NOTES (par) = dwarf;
+  }
+
+i++;
+  }
+
+  /* If the number of registers popped is odd and really_return is false, or
+     the number of registers is even and really_return is true, the last
+     register is popped using LDR.  It can be PC as well.  Hence, adjust the
+     stack first and then LDR with post-increment.  */
+
+  /* Increment the stack pointer, based on there being
+ num_regs 4-byte re

Re: [RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue.

2011-11-07 Thread Sameera Deshpande
Hi Ramana,

Please find attached reworked patch. The patch is tested with check-gcc,
check-gdb and bootstrap with no regression.

Ok?

- Thanks and regards,
  Sameera D.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 05c9368..334a25f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15438,6 +15438,125 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
 }
 }
 
+/* Generate and emit a pattern that will be recognized as an STRD pattern.  If
+   an even number of registers is being pushed, multiple STRD patterns are
+   created for all register pairs.  If an odd number of registers is pushed,
+   emit a combination of STRDs and an STR for the prologue saves.  */
+static void
+thumb2_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* Pre-decrement the stack pointer, based on there being num_regs 4-byte
+ registers to push.  */
+  tmp = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+ plus_constant (stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  insn = emit_insn (tmp);
+
+  /* Create sequence for DWARF info.  */
+  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
+
+  /* RTLs cannot be shared, hence create new copy for dwarf.  */
+  tmp1 = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+ plus_constant (stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp1) = 1;
+  XVECEXP (dwarf, 0, 0) = tmp1;
+
+  /* Var j iterates over all the registers to gather all the registers in
+ saved_regs_mask.  Var i gives index of register R_j in stack frame.
+ A PARALLEL RTX of register-pair is created here, so that pattern for
+ STRD can be matched.  If num_regs is odd, 1st register will be pushed
+ using STR and remaining registers will be pushed with STRD in pairs.
+ If num_regs is even, all registers are pushed with STRD in pairs.
+ Hence, skip first element for odd num_regs.  */
+  for (i = num_regs - 1, j = LAST_ARM_REGNUM; i >= (num_regs % 2); j--)
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+gcc_assert (j != PC_REGNUM);
+
+/* Create RTX for store.  New RTX is created for dwarf as
+   they are not sharable.  */
+reg = gen_rtx_REG (SImode, j);
+tmp = gen_rtx_SET (SImode,
+   gen_frame_mem
+   (SImode,
+plus_constant (stack_pointer_rtx, 4 * i)),
+   reg);
+
+tmp1 = gen_rtx_SET (SImode,
+   gen_frame_mem
+   (SImode,
+plus_constant (stack_pointer_rtx, 4 * i)),
+   reg);
+RTX_FRAME_RELATED_P (tmp) = 1;
+RTX_FRAME_RELATED_P (tmp1) = 1;
+
+if (((i - (num_regs % 2)) % 2) == 1)
+  /* When (i - (num_regs % 2)) is odd, the RTX to be emitted is yet to
+ be created.  Hence create it first.  The STRD pattern we are
+ generating is :
+ [ (SET (MEM (PLUS (SP) (NUM))) (reg_t1))
+   (SET (MEM (PLUS (SP) (NUM + 4))) (reg_t2)) ]
+ where target registers need not be consecutive.  */
+  par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+
+/* Register R_j is added in PARALLEL RTX.  If (i - (num_regs % 2)) is
+   even, the reg_j is added as 0th element and if it is odd, reg_i is
+   added as 1st element of STRD pattern shown above.  */
+XVECEXP (par, 0, ((i - (num_regs % 2)) % 2)) = tmp;
+XVECEXP (dwarf, 0, (i + 1)) = tmp1;
+
+if (((i - (num_regs % 2)) % 2) == 0)
+  /* When (i - (num_regs % 2)) is even, RTXs for both the registers
+ to be loaded are generated in above given STRD pattern, and the
+ pattern can be emitted now.  */
+  emit_insn (par);
+
+i--;
+  }
+
+  if ((num_regs % 2) == 1)
+{
+  /* If odd number of registers are pushed, generate STR pattern to store
+ lone register.  */
+  for (; (saved_regs_mask & (1 << j)) == 0; j--);
+
+  tmp1 = gen_frame_mem (SImode, plus_constant (stack_pointer_rtx, 4 * i));
+  reg = gen_rtx_REG (SImode, j);
+  tmp = gen_rtx_SET (SImode, tmp1, reg);
+  RTX_FRAME_RELATED_P (tmp) = 1;
+
+  emit_insn (tmp);
+
+  tmp1 = gen_rtx_SET (SImode,
+ gen_frame_mem
+ (SImode,
+  plus_constant (stack_pointer_rtx, 4 * i)),
+ reg);
+  RTX_FRAME_RELATED_P (tmp1) = 1;
+  XVECEXP (dwarf, 0, 

Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-11-07 Thread Sameera Deshpande
On Mon, 2011-11-07 at 09:56 +, Paul Brook wrote:
> > The array REGISTER_NAMES in aout.h use S0, S2, ...  names for double
> > registers. Is there any way to use OVERLAPPING_REGISTER_NAMES? If that
> > can be done, I can eliminate the table here.
> 
> You should be using %P.
> 

Paul,

Thanks for your comment. Please find attached reworked patch. The patch
is tested with check-gcc without regression.

- Thanks and regards,
  Sameera D.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 23a29c6..2c38883 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -65,6 +65,7 @@ extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);
+extern bool load_multiple_operation_p (rtx, bool, enum machine_mode, bool);
 extern int neon_immediate_valid_for_move (rtx, enum machine_mode, rtx *, int *);
 extern int neon_immediate_valid_for_logic (rtx, enum machine_mode, int, rtx *,
 	   int *);
@@ -176,10 +177,13 @@ extern int arm_float_words_big_endian (void);
 
 /* Thumb functions.  */
 extern void arm_init_expanders (void);
-extern const char *thumb_unexpanded_epilogue (void);
+extern const char *thumb1_unexpanded_epilogue (void);
 extern void thumb1_expand_prologue (void);
 extern void thumb1_expand_epilogue (void);
 extern const char *thumb1_output_interwork (void);
+extern void thumb2_expand_epilogue (void);
+extern void thumb2_output_return (rtx);
+extern void thumb2_expand_return (void);
 #ifdef TREE_CODE
 extern int is_called_in_ARM_mode (tree);
 #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e07c8c3..ec87892 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8906,6 +8906,137 @@ neon_valid_immediate (rtx op, enum machine_mode mode, int inverse,
 #undef CHECK
 }
 
+/* Return true if OP is a valid load multiple operation for MODE mode.
+   CONSECUTIVE is true if the registers in the operation must form
+   a consecutive sequence in the register bank.  STACK_ONLY is true
+   if the base register must be the stack pointer.  RETURN_PC is true
+   if value is to be loaded in PC.  */
+bool
+load_multiple_operation_p (rtx op, bool consecutive, enum machine_mode mode,
+   bool return_pc)
+{
+  HOST_WIDE_INT count = XVECLEN (op, 0);
+  unsigned dest_regno, first_dest_regno;
+  rtx src_addr;
+  HOST_WIDE_INT i = 1, base = 0;
+  HOST_WIDE_INT offset = 0;
+  rtx elt;
+  bool addr_reg_loaded = false;
+  bool update = false;
+  int reg_increment, regs_per_val;
+  int offset_adj;
+
+  /* If DFmode, we must be asking for consecutive,
+ since fldmdd can only do consecutive regs.  */
+  gcc_assert ((mode != DFmode) || consecutive);
+
+  /* Set up the increments and the regs per val based on the mode.  */
+  reg_increment = GET_MODE_SIZE (mode);
+  regs_per_val = mode == DFmode ? 2 : 1;
+  offset_adj = return_pc ? 1 : 0;
+
+  if (count <= 1
+      || GET_CODE (XVECEXP (op, 0, offset_adj)) != SET
+      || !REG_P (SET_DEST (XVECEXP (op, 0, offset_adj))))
+    return false;
+
+  /* Check to see if this might be a write-back.  */
+  if (GET_CODE (SET_SRC (elt = XVECEXP (op, 0, offset_adj))) == PLUS)
+{
+  i++;
+  base = 1;
+  update = true;
+
+  /* The offset adjustment should be same as number of registers being
+ popped * size of single register.  */
+  if (!REG_P (SET_DEST (elt))
+  || !REG_P (XEXP (SET_SRC (elt), 0))
+  || !CONST_INT_P (XEXP (SET_SRC (elt), 1))
+  || INTVAL (XEXP (SET_SRC (elt), 1)) !=
+  ((count - 1 - offset_adj) * reg_increment))
+return false;
+}
+
+  i = i + offset_adj;
+  base = base + offset_adj;
+  /* Perform a quick check so we don't blow up below.  */
+  if (GET_CODE (XVECEXP (op, 0, i - 1)) != SET
+      || !REG_P (SET_DEST (XVECEXP (op, 0, i - 1)))
+      || !MEM_P (SET_SRC (XVECEXP (op, 0, i - 1))))
+    return false;
+
+  /* If only one reg being loaded, success depends on the type:
+ FLDMDD can do just one reg, LDM must do at least two.  */
+  if (count <= i)
+return mode == DFmode ? true : false;
+
+  first_dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, i - 1)));
+  dest_regno = first_dest_regno;
+
+  src_addr = XEXP (SET_SRC (XVECEXP (op, 0, i - 1)), 0);
+
+  if (GET_CODE (src_addr) == PLUS)
+{
+  if (!CONST_INT_P (XEXP (src_addr, 1)))
+return false;
+  offset = INTVAL (XEXP (src_addr, 1));
+  src_addr = XEXP (src_addr, 0);
+}
+
+  if (!REG_P (src_addr))
+return false;
+
+  /* The pattern we are trying to match here is:
+     [(SET (R_d0) (MEM (PLUS (src_addr) (offset))))
+      (SET (R_d1) (MEM (PLUS (src_addr) (offset + 4))))
+       :
+       :
+      (SET (R_dn) (MEM (PLUS (src_addr) (offset + n * 4))))
+     ]
+ Where,
+ 1.  If offset is 0, first insn should be (SET (R_d0) (M

Re: [RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue.

2011-11-08 Thread Sameera Deshpande
On Fri, 2011-10-21 at 13:45 +0100, Ramana Radhakrishnan wrote: 
> >+arm_emit_strd_push (unsigned long saved_regs_mask)
> 
> How different is this from the thumb2 version you sent out in Patch 03/05 ?
> 
Thumb-2 STRD can handle non-consecutive registers; ARM STRD cannot.
Because of this, in ARM mode we accumulate non-consecutive registers and
emit an STM instruction for them, while STRD is generated for consecutive
register pairs.
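
That split can be sketched as follows (an illustration only, not the patch's
code; it assumes the usual ARM-mode STRD constraint that the first register of
the pair is even-numbered and the second is the next consecutive register):

```c
/* Walk a 16-bit core-register mask and classify the saves: an
   even-numbered register whose odd successor is also saved forms an
   ARM-mode STRD pair; anything else falls back to STR/STM.  */
static void
split_strd_pairs (unsigned long mask, int *pairs, int *singles)
{
  *pairs = 0;
  *singles = 0;
  for (int r = 0; r < 16; r++)
    {
      if (!(mask & (1UL << r)))
        continue;
      if ((r % 2) == 0 && (mask & (1UL << (r + 1))))
        {
          (*pairs)++;   /* registers r and r+1: one STRD */
          r++;          /* skip the partner register */
        }
      else
        (*singles)++;   /* saved via STR or accumulated into an STM */
    }
}
```

For saved registers {r4, r5, r6} this yields one STRD pair (r4, r5) and one
single save (r6), matching the mixed STRD/STR sequences discussed in this
thread.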

> >@@ -15958,7 +16081,8 @@ arm_get_frame_offsets (void)
> >  use 32-bit push/pop instructions.  */
> >   if (! any_sibcall_uses_r3 ()
> >   && arm_size_return_regs () <= 12
> >-  && (offsets->saved_regs_mask & (1 << 3)) == 0)
> >+  && (offsets->saved_regs_mask & (1 << 3)) == 0
> >+  && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))
> 
> Not sure I completely follow this change yet.
> 
If the stack is not aligned, we need to adjust the stack in the prologue.
Here, instead of adjusting the stack, we PUSH register R3 onto the stack, so
that no additional ADD instruction is needed for stack adjustment.
This works fine when we generate multi-reg load/store instructions.

However, when we generate STRD in ARM mode, non-consecutive registers
are stored using STR/STM instructions. As the pair register of R3 (reg R2)
is never pushed onto the stack, we always end up generating an STR
instruction to PUSH R3 on the stack. This is more expensive than doing
ADD SP, SP, #4 for the stack adjustment.

e.g. if we are PUSHing registers {R4, R5, R6} and the stack is not aligned,
we PUSH {R3, R4, R5, R6}.
So, the instructions generated are:
STR R6, [sp, #4]
STRD R4, R5, [sp, #12]
STR R3, [sp, #16]

However, if instead of R3, other caller-saved register is PUSHed,
we push {R4, R5, R6, R7}, to generate
STRD R6, R7, [sp, #8]
STRD R4, R5, [sp, #16]

If no caller-saved register is available, we generate an ADD instruction,
which is still better than generating STR.
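To make the heuristic concrete, counting how many ARM-mode STRD pairs a candidate push mask yields shows why R7 is a better pad register than R3 in the example above. An illustrative sketch (not the GCC code; bit `1 << regno` represents a register in the mask):

```c
/* Count even/odd consecutive register pairs in MASK that ARM-mode STRD
   can handle (r0..r12; SP and PC never participate in a pair).  */
static int
count_strd_pairs (unsigned long mask)
{
  int pairs = 0;
  int j;

  for (j = 0; j <= 11; j += 2)
    if ((mask & (3ul << j)) == (3ul << j))   /* both regs j and j+1 set */
      pairs++;
  return pairs;
}
```

With {R4, R5, R6} (mask 0x70) there is one pair; padding with R7 (mask 0xF0) gives two pairs and no leftover STR, while padding with R3 (mask 0x78) still gives only one pair, leaving the expensive lone STR for R3.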
> 
> Hmmm the question remains if we want to put these into ldmstm.md since
> it was theoretically
> auto-generated from ldmstm.ml. If this has to be marked to be separate
> then I'd like
> to regenerate ldmstm.md from ldmstm.ml and differentiate between the
> bits that can be auto-generated
> and the bits that have been added since.
> 
The current patterns are quite different from the patterns generated using
arm-ldmstm.ml. I will submit an updated arm-ldmstm.ml file generating
ldrd/strd patterns as a new patch. Is that fine?

The patch is tested with check-gcc, check-gdb and bootstrap.

I see a regression in gcc:
FAIL: gcc.c-torture/execute/vector-compare-1.c compilation,  -O3
-fomit-frame-pointer -funroll-loops with error message 
/tmp/ccC13odV.s: Assembler messages:
/tmp/ccC13odV.s:544: Error: co-processor offset out of range

This seems to be an uncovered latent bug, and I am looking into it.

- Thanks and regards,
  Sameera D.diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index e71ead5..ccf05c7 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -163,6 +163,7 @@ extern const char *arm_output_memory_barrier (rtx *);
 extern const char *arm_output_sync_insn (rtx, rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
 extern int arm_attr_length_push_multi(rtx, rtx);
+extern bool bad_reg_pair_for_arm_ldrd_strd (rtx, rtx);
 
 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 334a25f..deee78b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -93,6 +93,7 @@ static bool arm_assemble_integer (rtx, unsigned int, int);
 static void arm_print_operand (FILE *, rtx, int);
 static void arm_print_operand_address (FILE *, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
+static rtx emit_multi_reg_push (unsigned long);
 static const char *fp_const_from_val (REAL_VALUE_TYPE *);
 static arm_cc get_arm_condition_code (rtx);
 static HOST_WIDE_INT int_log2 (HOST_WIDE_INT);
@@ -15438,6 +15439,117 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
 }
 }
 
+/* STRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until the first consecutive
+   register pair is found.  It then generates a multi-register PUSH for all
+   accumulated registers, then STRD with write-back for the consecutive pair.
+   This process is repeated until all the registers are stored on the stack.
+   The multi-register PUSH takes care of lone registers as well.  */
+static void
+arm_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_pushed_mask;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+

Re: [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.

2011-11-08 Thread Sameera Deshpande
On Fri, 2011-10-21 at 13:45 +0100, Ramana Radhakrishnan wrote: 
> change that. Other than that this patch looks OK and please watch out
> for stylistic issues from the previous patch.

Ramana, please find attached reworked patch. The patch is tested with
check-gcc, check-gdb and bootstrap with no regression.

- Thanks and regards,
  Sameera D.diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index deee78b..4a86749 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15960,6 +15960,135 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
   || (REGNO (src2) == SP_REGNUM));
 }
 
+/* LDRD in ARM mode needs consecutive registers to be loaded.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg POP for all accumulated
+   registers, and then generates LDRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are loaded from
+   stack.  multi register POP takes care of lone registers as well.  However,
+   LDRD cannot be generated for PC, as results are unpredictable.  Hence, if PC
+   is in SAVED_REGS_MASK, generate multi-reg POP with RETURN or LDR with RETURN
+   depending upon number of registers in REGS_TO_BE_POPPED_MASK.  */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp;
+  unsigned long regs_to_be_popped_mask = 0;
+  bool pc_in_list = false;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = 0; i < num_regs; j++)
+if (saved_regs_mask & (1 << j))
+  {
+i++;
+if ((j % 2) == 0
+&& (saved_regs_mask & (1 << (j + 1)))
+&& (j + 1) != SP_REGNUM
+&& (j + 1) != PC_REGNUM
+&& regs_to_be_popped_mask)
+  {
+/* Current register and next register form register pair for which
+   LDRD can be generated.  Generate POP for accumulated registers
+   and reset regs_to_be_popped_mask.  SP should be handled here as
+   the results are unpredictable if register being stored is same
+   as index register (in this case, SP).  PC is always the last
+   register being popped.  Hence, we don't have to worry about PC
+   here.  */
+arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+pc_in_list = false;
+regs_to_be_popped_mask = 0;
+continue;
+  }
+
+if (j == PC_REGNUM)
+  {
+gcc_assert (really_return);
+pc_in_list = true;
+  }
+
+regs_to_be_popped_mask |= (1 << j);
+
+if ((j % 2) == 1
+&& (saved_regs_mask & (1 << (j - 1)))
+&& j != SP_REGNUM
+&& j != PC_REGNUM)
+  {
+ /* Generate a LDRD for register pair R_, R_.  The pattern
+generated here is
+[(SET SP, (PLUS SP, 8))
+ (SET R_, (MEM SP))
+ (SET R_, (MEM (PLUS SP, 4)))].  */
+ par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+
+ tmp = gen_rtx_SET (VOIDmode,
+stack_pointer_rtx,
+plus_constant (stack_pointer_rtx, 8));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ XVECEXP (par, 0, 0) = tmp;
+
+ tmp = gen_rtx_SET (SImode,
+gen_rtx_REG (SImode, j - 1),
+gen_frame_mem (SImode, stack_pointer_rtx));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ XVECEXP (par, 0, 1) = tmp;
+ dwarf = alloc_reg_note (REG_CFA_RESTORE,
+ gen_rtx_REG (SImode, j - 1),
+ dwarf);
+
+ tmp = gen_rtx_SET (SImode,
+ gen_rtx_REG (SImode, j),
+ gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4)));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ XVECEXP (par, 0, 2) = tmp;
+ dwarf = alloc_reg_note (REG_CFA_RESTORE,
+ gen_rtx_REG (SImode, j),
+ dwarf);
+
+ insn = emit_insn (par);
+ REG_NOTES (insn) = dwarf;
+ pc_in_list = false;
+ regs_to_be_popped_mask = 0;
+ dwarf = NULL_RTX;
+  }
+  }
+
+  if (regs_to_be_popped_mask)
+{
+  /* A single PC pop can happen here.  Take care of that.  */
+  if

Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-11-10 Thread Sameera Deshpande
Hi Richard,

thanks for your comments.

-- 



> +  if (GET_CODE (SET_SRC (elt = XVECEXP (op, 0, offset_adj))) == PLUS)
> 
> It's generally best not to use assignments within conditionals unless
> there is a strong reason otherwise (that normally implies something like
> being deep within a condition test where you only want to update the
> variable if some pre-conditions are true and that can't be easily
> factored out).
> 
> +  != (unsigned int) (first_dest_regno + regs_per_val *
> (i - base
> 
> Line length (split the line just before the '+' operator.
> 
> +  /* now show EVERY reg that will be restored, using a SET for each.  */
> 
> Capital letter at start of sentence.  Why is EVERY in caps?
> 
> +  saved_regs_mask = offsets->saved_regs_mask;
> +  for (i = 0, num_regs = 0; i <= LAST_ARM_REGNUM; i++)
> 
> blank line before the for loop.
> 
> +  /* It's illegal to do a pop for only one reg, so generate an ldr.  */
> 
> GCC coding standards suggest avoiding the use of 'illegal'.  Suggest
> changing that to 'Pop can only be used for more than one reg; so...'
> 
> +reg_names[REGNO (XEXP (XVECEXP (operands[0], 0, 2),
> 0))]);
> +
> +/* Skip over the first two elements and the one we just generated.
>  */
> +for (i = 3; i < (num_saves); i++)
> +  {
> +strcat (pattern, \", %|\");
> 
> +strcat (pattern,
> 
> +reg_names[REGNO (XEXP (XVECEXP (operands[0], 0, i),
> 0))]);
> +  }
> +
> +strcat (pattern, \"}\");
> +output_asm_insn (pattern, operands);
> +
> 
> +return \"\";
> +  }
> +  "
> 
> +  [(set_attr "type" "load4")]
> 
> There's a lot of trailing white space here.  Please remove.

Removed white spaces in reworked patch
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01009.html

> 
> +(define_insn "*thumb2_ldr_with_return"
> +  [(return)
> +   (set (reg:SI PC_REGNUM)
> +(mem:SI (post_inc:SI (match_operand:SI 0 "s_register_operand"
> "k"]
> +  "TARGET_THUMB2"
> +  "ldr%?\t%|pc, [%0], #4"
> +  [(set_attr "type" "load1")
> +   (set_attr "predicable" "yes")]
> +)
> +
> 
> This pattern doesn't seem to be used.  What's its purpose?

This pattern is generated from thumb2_expand_return in 

+  if (num_regs == 1)
+{
+  rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+  rtx reg = gen_rtx_REG (SImode, PC_REGNUM);
+  rtx addr = gen_rtx_MEM (SImode,
+  gen_rtx_POST_INC (SImode,
+
stack_pointer_rtx));
+  set_mem_alias_set (addr, get_frame_alias_set ());
+  XVECEXP (par, 0, 0) = ret_rtx;
+  XVECEXP (par, 0, 1) = gen_rtx_SET (SImode, reg, addr);
+  RTX_FRAME_RELATED_P (par) = 1;
+  emit_jump_insn (par);
+}

> 
> +static const struct { const char *const name; } table[]
> +  = { {\"d0\"}, {\"d1\"}, {\"d2\"}, {\"d3\"},
> 
> I'm not keen on having this table.  Generally the register names should
> be configurable depending on the assembler flavour and this patch
> defeats that.  Is there any way to rewrite this code so that it can use
> the standard operand methods for generating register names?

The updated patch was resent after comments from Ramana and Paul which
eliminates this table.

http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01009.html

I will take care of other formatting issues and will resend the patch.

> 
> In summary, this is mostly OK, apart from the last two items.
> 
> R.

- Thanks and regards,
  Sameera D.




Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-11-10 Thread Sameera Deshpande
On Thu, 2011-11-10 at 13:44 +, Richard Earnshaw wrote:
> On 28/09/11 17:15, Sameera Deshpande wrote:
> > Hi!
> > 
> > This patch generates Thumb2 epilogues in RTL form.
> > 
> > The work involves defining new functions, predicates and patterns along with
> > few changes in existing code:
> > * The load_multiple_operation predicate was found to be too restrictive for
> > integer loads as it required consecutive destination regs, so this
> > restriction was lifted.
> > * Variations of load_multiple_operation were required to handle cases 
> >- where SP must be the base register 
> >- where FP values were being loaded (which do require consecutive
> > destination registers)
> >- where PC can be in register-list (which requires return pattern along
> > with register loads).
> >   Hence, the common code was factored out into a new function in arm.c and
> > parameterised to show 
> >- whether consecutive destination regs are needed
> >- the data type being loaded 
> >- whether the base register has to be SP
> >- whether PC is in register-list
> > 
> > The patch is tested with arm-eabi with no regressions.
> > 
> > ChangeLog:
> > 
> > 2011-09-28  Ian Bolton 
> > Sameera Deshpande  
> >
> >* config/arm/arm-protos.h (load_multiple_operation_p): New
> > declaration.
> >  (thumb2_expand_epilogue): Likewise.
> >  (thumb2_output_return): Likewise
> >  (thumb2_expand_return): Likewise.
> >  (thumb_unexpanded_epilogue): Rename to... 
> >  (thumb1_unexpanded_epilogue): ...this 
> >* config/arm/arm.c (load_multiple_operation_p): New function. 
> >  (thumb2_emit_multi_reg_pop): Likewise.
> >  (thumb2_emit_vfp_multi_reg_pop): Likewise.
> >  (thumb2_expand_return): Likewise. 
> >  (thumb2_expand_epilogue): Likewise. 
> >  (thumb2_output_return): Likewise
> >  (thumb_unexpanded_epilogue): Rename to...
> >  ( thumb1_unexpanded_epilogue): ...this
> >* config/arm/arm.md (pop_multiple_with_stack_update): New pattern. 
> >  (pop_multiple_with_stack_update_and_return): Likewise.
> >  (thumb2_ldr_with_return): Likewise.
> >  (floating_point_pop_multiple_with_stack_update): Likewise.
> >  (return): Update condition and code for pattern.
> >  (arm_return): Likewise.
> >  (epilogue_insns): Likewise.
> >* config/arm/predicates.md (load_multiple_operation): Update
> > predicate.
> >  (load_multiple_operation_stack_and_return): New predicate. 
> >  (load_multiple_operation_stack): Likewise.
> >  (load_multiple_operation_stack_fp): Likewise.
> >* config/arm/thumb2.md (thumb2_return): Remove.
> >  (thumb2_rtl_epilogue_return): New pattern.
> > 
> > 
> > - Thanks and regards,
> >   Sameera D.
> > 
> > 
> > thumb2_rtl_epilogue_complete-27Sept.patch
> > 
> 
> +  if (GET_CODE (SET_SRC (elt = XVECEXP (op, 0, offset_adj))) == PLUS)
> 
> It's generally best not to use assignments within conditionals unless
> there is a strong reason otherwise (that normally implies something like
> being deep within a condition test where you only want to update the
> variable if some pre-conditions are true and that can't be easily
> factored out).
> 
> +  != (unsigned int) (first_dest_regno + regs_per_val *
> (i - base
> 
> Line length (split the line just before the '+' operator.
> 
> +  /* now show EVERY reg that will be restored, using a SET for each.  */
> 
> Capital letter at start of sentence.  Why is EVERY in caps?
> 
> +  saved_regs_mask = offsets->saved_regs_mask;
> +  for (i = 0, num_regs = 0; i <= LAST_ARM_REGNUM; i++)
> 
> blank line before the for loop.
> 
> +  /* It's illegal to do a pop for only one reg, so generate an ldr.  */
> 
> GCC coding standards suggest avoiding the use of 'illegal'.  Suggest
> changing that to 'Pop can only be used for more than one reg; so...'
> 
> +reg_names[REGNO (XEXP (XVECEXP (operands[0], 0, 2),
> 0))]);
> +
> +/* Skip over the first two elements and the one we just generated.
>  */
> +for (i = 3; i < (num_saves); i++)
> +  {
> +strcat (pattern, \", %|\");
> 
> +strcat (pattern,
> 
> +reg_names[REGNO (XEXP (XVECEXP (operands[0], 0, i),
> 0))]);
> +  }
> +
> +strc

[RFA/ARM][Patch]: Fix NEG_POOL_RANGE

2011-11-17 Thread Sameera Deshpande
Hi!

Please find attached the patch updating NEG_POOL_RANGE from 1008 to
1020 - (8 + ).
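The archive has stripped the angle-bracketed terms in the formula above; judging from the values actually used in the patch, it appears to be the maximum pool range minus (8 + the size of the data being loaded). A hypothetical helper reproducing that arithmetic:

```c
/* Sketch of the apparent formula: for a 1020-byte pool range, an 8-byte
   (DImode) load gives 1004 and a 16-byte (quad NEON) load gives 996 --
   the neg_pool_range values the patch installs.  */
static int
neg_pool_range (int max_range, int data_size)
{
  return max_range - (8 + data_size);
}
```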

Tested with check-gcc with no regression.

The test-case failing for patch 'STRD generation instead of PUSH in A15
ARM prologue' (http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01158.html)
passes with this fix.

gcc/ChangeLog entry:

2011-11-17  Sameera Deshpande  

   * config/arm/arm.md (arm_movdi): Update NEG_POOL_RANGE.
 (movdf_soft_insn): Likewise.
   * config/arm/fpa.md (thumb2_movdf_fpa): Likewise.
   * config/arm/neon.md (neon_mov): Likewise.
   * config/arm/vfp.md (movdi_vfp): Likewise.
 (movdi_vfp_cortexa8): Likewise.
 (movdf_vfp): Likewise.

- Thanks and regards,
  Sameera D.*** gcc/config/arm/.svn/text-base/arm.md.svn-base	Mon Oct 31 14:59:55 2011
--- gcc/config/arm/arm.md	Thu Nov 17 11:52:38 2011
*** (define_insn "*arm_movdi"
*** 5223,5229 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "arm_pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1008,*")
 (set_attr "thumb2_pool_range" "*,*,*,4096,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
--- 5223,5229 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "arm_pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1004,*")
 (set_attr "thumb2_pool_range" "*,*,*,4096,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
*** (define_insn "*movdf_soft_insn"
*** 6583,6589 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1008,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
  
--- 6583,6589 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1004,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
  
*** gcc/config/arm/.svn/text-base/neon.md.svn-base	Mon Oct 31 14:59:54 2011
--- gcc/config/arm/neon.md	Thu Nov 17 11:52:38 2011
*** (define_insn "*neon_mov"
*** 198,204 
(set_attr "insn" "*,*,*,*,*,*,mov,*,*")
(set_attr "length" "4,4,4,4,4,4,8,8,8")
(set_attr "pool_range" "*,*,*,1020,*,*,*,1020,*")
!   (set_attr "neg_pool_range" "*,*,*,1008,*,*,*,1008,*")])
  
  (define_insn "*neon_mov"
[(set (match_operand:VQXMOV 0 "nonimmediate_operand"
--- 198,204 
(set_attr "insn" "*,*,*,*,*,*,mov,*,*")
(set_attr "length" "4,4,4,4,4,4,8,8,8")
(set_attr "pool_range" "*,*,*,1020,*,*,*,1020,*")
!   (set_attr "neg_pool_range" "*,*,*,1004,*,*,*,1004,*")])
  
  (define_insn "*neon_mov"
[(set (match_operand:VQXMOV 0 "nonimmediate_operand"
*** (define_insn "*neon_mov"
*** 243,249 
 (set_attr "insn" "*,*,*,*,*,*,mov,*,*")
 (set_attr "length" "4,8,4,8,8,8,16,8,16")
 (set_attr "pool_range" "*,*,*,1020,*,*,*,1020,*")
!(set_attr "neg_pool_range" "*,*,*,1008,*,*,*,1008,*")])
  
  (define_expand "movti"
[(set (match_operand:TI 0 "nonimmediate_operand" "")
--- 243,249 
 (set_attr "insn" "*,*,*,*,*,*,mov,*,*")
 (set_attr "length" "4,8,4,8,8,8,16,8,16")
 (set_attr "pool_range" "*,*,*,1020,*,*,*,1020,*")
!(set_attr "neg_pool_range" "*,*,*,996,*,*,*,996,*")])
  
  (define_expand "movti"
[(set (match_operand:TI 0 "nonimmediate_operand" "")
*** gcc/config/arm/.svn/text-base/vfp.md.svn-base	Thu Sep 29 10:31:13 2011
--- gcc/config/arm/vfp.md	Thu Nov 17 11:52:38 2011
*** (define_insn "*movdi_vfp"
*** 178,184 
   (const_int 4))]
(const_int 4)))
 (set_attr "pool_range" "*,*,*,*,1020,4096,*,*,*,*,1020,*")
!(set_attr "neg_pool_range" "*,*,*,*,1008,0,*,*,*,*,1008,*")
 (set_attr "arch"   "t2,any,any,any,a,t2,any,any,any,any,any,an

Re: Ping! Re: [RFA/ARM][Patch 02/02]: ARM epilogues in RTL

2011-11-22 Thread Sameera Deshpande
On Fri, 2011-11-18 at 21:45 +, Ramana Radhakrishnan wrote:
> On 5 October 2011 17:04, Sameera Deshpande  wrote:
> > Ping!
> >
> > http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html
> 
> 
> This should now be rebased given your other changes to the Thumb2
> epilogues patch or is there a newer version of this patch . This
> doesn't apply cleanly on top of the other patch.

Ramana, I have already sent the rebased patch at
gcc.gnu.org/ml/gcc-patches/2011-11/msg00885.html.
Can you please see if it is working? 

-- 




RE: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-11-22 Thread Sameera Deshpande
On Tue, 2011-11-22 at 01:55 +, Xinyu Qi wrote:
> At 2011-11-19 07:11:17,"Ramana Radhakrishnan" 
>  wrote:
> > On 10 November 2011 18:07, Sameera Deshpande
> >  wrote:
> > > Please find attached the reworked patch.
> > 
> > OK but for a very small bit .
> > 
> > I'll note that we need to add support for the iwMMXt registers but the
> > attached patch (untested) should be what is needed to support the iwMMxt
> > registers and this should be something that should be incorporated into your
> > ARM RTL epilogues patch as well. My understanding is that this is all you
> > should need to do as there is a wldrd as a part of *iwmmx*_movdi instruction
> > and therefore this should just work in this form.
> > 
> > Xinyu can you help in testing this patch once this is applied along with 
> > your
> > other patches to turn on Thumb2 for iwmmxt ?
> > 
> > cheers
> > Ramana
> 
> Hi, Ramana,
> 
> Though I don't fully understand this patch, I think it is unnecessary to 
> consider turn on Thumb2 for iwmmxt,
> because there is no chip that supports WMMX instructions with thumb2 mode.
> 
> Thanks,
> Xinyu
> 

Ramana, in that case, should I add the change you suggested in ARM RTL
epilogue patch only?

-- 




Re: [RFA/ARM][Patch]: Fix NEG_POOL_RANGE

2011-11-24 Thread Sameera Deshpande
On Fri, 2011-11-18 at 23:12 +, Ramana Radhakrishnan wrote:
> On 17 November 2011 15:16, Sameera Deshpande  
> wrote:
> > Hi!
> >
> > Please find attached the patch updating NEG_POOL_RANGE from 1008 to
> > 1020 -(8 + ).
> 
> This is OK - can you add a comment around the neg_pool_range attribute
> in arm.md stating that the limit should essentially be  -
> (8 +  ?.
> 
Hi Ramana,

Thanks for your comment.
Please find attached the updated patch.

-- *** /work/spec-test/local-checkouts/gcc-fsf/gcc/config/arm/arm.md	2011-11-22 17:20:36.0 +
--- gcc/config/arm/arm.md	2011-11-22 17:14:48.0 +
***  (define_attr "enabled" "no,yes"
*** 268,274 
  ; can be placed.  If the distance is zero, then this insn will never
  ; reference the pool.
  ; NEG_POOL_RANGE is nonzero for insns that can reference a constant pool entry
! ; before its address.
  (define_attr "arm_pool_range" "" (const_int 0))
  (define_attr "thumb2_pool_range" "" (const_int 0))
  (define_attr "arm_neg_pool_range" "" (const_int 0))
--- 268,274 
  ; can be placed.  If the distance is zero, then this insn will never
  ; reference the pool.
  ; NEG_POOL_RANGE is nonzero for insns that can reference a constant pool entry
! ; before its address.  It is set to  - (8 + ).
  (define_attr "arm_pool_range" "" (const_int 0))
  (define_attr "thumb2_pool_range" "" (const_int 0))
  (define_attr "arm_neg_pool_range" "" (const_int 0))
*** (define_insn "*arm_movdi"
*** 5223,5229 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "arm_pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1008,*")
 (set_attr "thumb2_pool_range" "*,*,*,4096,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
--- 5223,5229 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "arm_pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1004,*")
 (set_attr "thumb2_pool_range" "*,*,*,4096,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
*** (define_insn "*movdf_soft_insn"
*** 6583,6589 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1008,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
  
--- 6583,6589 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1004,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
  
*** /work/spec-test/local-checkouts/gcc-fsf/gcc/config/arm/fpa.md	2011-11-22 17:18:37.0 +
--- gcc/config/arm/fpa.md	2011-11-22 17:14:48.0 +
*** (define_insn "*thumb2_movdf_fpa"
*** 671,677 
 (set_attr "type"
  "load1,store2,*,store2,load1,ffarith,ffarith,f_fpa_load,f_fpa_store,r_mem_f,f_mem_r")
 (set_attr "pool_range" "*,*,*,*,4092,*,*,1024,*,*,*")
!(set_attr "neg_pool_range" "*,*,*,*,0,*,*,1020,*,*,*")]
  )
  
  ;; Saving and restoring the floating point registers in the prologue should
--- 671,677 
 (set_attr "type"
  "load1,store2,*,store2,load1,ffarith,ffarith,f_fpa_load,f_fpa_store,r_mem_f,f_mem_r")
 (set_attr "pool_range" "*,*,*,*,4092,*,*,1024,*,*,*")
!(set_attr "neg_pool_range" "*,*,*,*,0,*,*,1008,*,*,*")]
  )
  
  ;; Saving and restoring the floating point registers in the prologue should
*** /work/spec-test/local-checkouts/gcc-fsf/gcc/config/arm/neon.md	2011-11-22 17:18:37.0 +
--- gcc/config/arm/neon.md	2011-11-22 17:14:48.0 +
*** (define_insn "*neon_mov"
*** 198,204 
(set_attr "insn" "*,*,*,*,*,*,mov,*,*")
(set_attr "length" "4,4,4,4,4,4,8,8,8")
(set_attr "pool_range" "*,*,*,1020,*,*,*,1020,*")
!   (set_attr "neg_pool_range" "*,*,*,1008,*,*,*,1008,*")])
  
  (define_insn "*neon_mov"
[(set (match_operand:VQXMOV 0 "nonimmediate_operand"
--- 198,204 
(set_attr "

[Patch] Fix Bug 51162

2011-11-24 Thread Sameera Deshpande
Hi,

Please find attached the patch fixing bugzilla issue
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51162.

ARM architecture implements vec_[load|store]_lanes which are
implemented as internal function calls. The function gimple_call_fn ()
returns NULL for internal calls. Hence, this patch guards dereferences
of 'fn' in dump_gimple_call ().

Tests in gcc.dg/vect that were failing with a segmentation fault pass with this
patch.
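The failure mode is a plain NULL dereference: the dump code assumed every call statement had a fn tree to inspect. A minimal stand-alone sketch of the guard pattern, with hypothetical types (not GCC's tree/gimple API):

```c
#include <stddef.h>

/* fn_name is NULL for internal calls, mirroring gimple_call_fn ()
   returning NULL.  */
struct call_stmt
{
  const char *fn_name;
};

static const char *
call_fn_for_dump (const struct call_stmt *c)
{
  if (c->fn_name == NULL)   /* guard before any dereference */
    return "<internal>";
  return c->fn_name;
}
```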

gcc/Changelog entry:
2011-11-24  Sameera Deshpande  

   * gimple-pretty-print.c (dump_gimple_call): Check if fn is NULL
 before dereferencing.

-- diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index f0e7c50..6d96868 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -699,11 +699,12 @@ dump_gimple_call (pretty_printer *buffer, gimple gs, int spc, int flags)
 pp_string (buffer, " [tail call]");
 
   /* Dump the arguments of _ITM_beginTransaction sanely.  */
-  if (TREE_CODE (fn) == ADDR_EXPR)
+  if (fn != NULL && TREE_CODE (fn) == ADDR_EXPR)
 fn = TREE_OPERAND (fn, 0);
-  if (TREE_CODE (fn) == FUNCTION_DECL && decl_is_tm_clone (fn))
+  if (fn != NULL && TREE_CODE (fn) == FUNCTION_DECL && decl_is_tm_clone (fn))
 pp_string (buffer, " [tm-clone]");
-  if (TREE_CODE (fn) == FUNCTION_DECL
+  if (fn != NULL
+  && TREE_CODE (fn) == FUNCTION_DECL
   && DECL_BUILT_IN_CLASS (fn) == BUILT_IN_NORMAL
   && DECL_FUNCTION_CODE (fn) == BUILT_IN_TM_START
   && gimple_call_num_args (gs) > 0)

Added myself to MAINTAINERS: write after approval

2011-11-25 Thread Sameera Deshpande
Committed.

-- Index: MAINTAINERS
===
--- MAINTAINERS	(revision 181721)
+++ MAINTAINERS	(working copy)
@@ -345,6 +345,7 @@
 David Daney	david.da...@caviumnetworks.com
 Bud Davis	jmda...@link.com
 Chris Demetriou	c...@google.com
+Sameera Deshpande	sameera.deshpa...@arm.com
 François Dumont	fdum...@gcc.gnu.org
 Benoit Dupont de Dinechin			benoit.dupont-de-dinec...@st.com
 Michael Eager	ea...@eagercon.com

Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-12-01 Thread Sameera Deshpande
On Tue, 2011-11-22 at 10:37 +, Ramana Radhakrishnan wrote:

> Xinyu: I seem to have mis-remembered that one of your patches was
> turning on Thumb2 for wMMX.
> >
> > Ramana, in that case, should I add the change you suggested in ARM RTL
> > epilogue patch only?
> 
> The comment in Thumb2 epilogues should remain and yes - it should be
> added to the ARM RTL epilogue patch only. I'm also ok with that being
> in with a #if 0 around it but given it's in the epilogue whoever tries
> turning on Thumb2 for iwMMX will surely notice that in the first
> testrun :)

Ramana,

Please find attached updated patch which sets CFA_RESTORE note for
single register pop and fixing new ICEs in check-gcc at trunk.

The patch is tested with check-gcc, bootstrap and check-gdb without
regression.

-- diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 23a29c6..2c38883 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -65,6 +65,7 @@ extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);
+extern bool load_multiple_operation_p (rtx, bool, enum machine_mode, bool);
 extern int neon_immediate_valid_for_move (rtx, enum machine_mode, rtx *, int *);
 extern int neon_immediate_valid_for_logic (rtx, enum machine_mode, int, rtx *,
 	   int *);
@@ -176,10 +177,13 @@ extern int arm_float_words_big_endian (void);
 
 /* Thumb functions.  */
 extern void arm_init_expanders (void);
-extern const char *thumb_unexpanded_epilogue (void);
+extern const char *thumb1_unexpanded_epilogue (void);
 extern void thumb1_expand_prologue (void);
 extern void thumb1_expand_epilogue (void);
 extern const char *thumb1_output_interwork (void);
+extern void thumb2_expand_epilogue (void);
+extern void thumb2_output_return (rtx);
+extern void thumb2_expand_return (void);
 #ifdef TREE_CODE
 extern int is_called_in_ARM_mode (tree);
 #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e3b0b88..40c8b44 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8906,6 +8906,139 @@ neon_valid_immediate (rtx op, enum machine_mode mode, int inverse,
 #undef CHECK
 }
 
+/* Return true if OP is a valid load multiple operation for MODE mode.
+   CONSECUTIVE is true if the registers in the operation must form
+   a consecutive sequence in the register bank.  STACK_ONLY is true
+   if the base register must be the stack pointer.  RETURN_PC is true
+   if value is to be loaded in PC.  */
+bool
+load_multiple_operation_p (rtx op, bool consecutive, enum machine_mode mode,
+   bool return_pc)
+{
+  HOST_WIDE_INT count = XVECLEN (op, 0);
+  unsigned dest_regno, first_dest_regno;
+  rtx src_addr;
+  HOST_WIDE_INT i = 1, base = 0;
+  HOST_WIDE_INT offset = 0;
+  rtx elt;
+  bool addr_reg_loaded = false;
+  bool update = false;
+  int reg_increment, regs_per_val;
+  int offset_adj;
+
+  /* If DFmode, we must be asking for consecutive,
+ since fldmdd can only do consecutive regs.  */
+  gcc_assert ((mode != DFmode) || consecutive);
+
+  /* Set up the increments and the regs per val based on the mode.  */
+  reg_increment = GET_MODE_SIZE (mode);
+  regs_per_val = mode == DFmode ? 2 : 1;
+  offset_adj = return_pc ? 1 : 0;
+
+  if (count <= 1
+  || GET_CODE (XVECEXP (op, 0, offset_adj)) != SET
+  || !REG_P (SET_DEST (XVECEXP (op, 0, offset_adj))))
+return false;
+
+  /* Check to see if this might be a write-back.  */
+  elt = XVECEXP (op, 0, offset_adj);
+  if (GET_CODE (SET_SRC (elt)) == PLUS)
+{
+  i++;
+  base = 1;
+  update = true;
+
+  /* The offset adjustment should be same as number of registers being
+ popped * size of single register.  */
+  if (!REG_P (SET_DEST (elt))
+  || !REG_P (XEXP (SET_SRC (elt), 0))
+  || !CONST_INT_P (XEXP (SET_SRC (elt), 1))
+  || INTVAL (XEXP (SET_SRC (elt), 1)) !=
+  ((count - 1 - offset_adj) * reg_increment))
+return false;
+}
+
+  i = i + offset_adj;
+  base = base + offset_adj;
+  /* Perform a quick check so we don't blow up below.  */
+  if (GET_CODE (XVECEXP (op, 0, i - 1)) != SET
+  || !REG_P (SET_DEST (XVECEXP (op, 0, i - 1)))
+  || !MEM_P (SET_SRC (XVECEXP (op, 0, i - 1))))
+return false;
+
+  /* If only one reg being loaded, success depends on the type:
+ FLDMDD can do just one reg, LDM must do at least two.  */
+  if (count <= i)
+return mode == DFmode ? true : false;
+
+  first_dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, i - 1)));
+  dest_regno = first_dest_regno;
+
+  src_addr = XEXP (SET_SRC (XVECEXP (op, 0, i - 1)), 0);
+
+  if (GET_CODE (src_addr) == PLUS)
+{
+  if (!CONST_INT_P (XEXP (src_addr, 1)))
+return false;
+  offset = INTVAL (XEXP (src_addr, 1));
+  src_addr = XEXP (src_addr, 0);
+}
+
+  if (!REG_P (src_addr))
+return false;
+
+  /* T
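The write-back check earlier in this predicate requires the base-register adjustment to equal the number of registers popped times the size of a single register (the comment "The offset adjustment should be same as number of registers being popped * size of single register"). As a stand-alone sketch:

```c
#include <stdbool.h>

/* Illustrative check (not the GCC code): a multi-register pop that writes
   back the base register must adjust it by exactly
   NREGS * REG_INCREMENT bytes.  */
static bool
writeback_adjust_ok (long adjust, int nregs, int reg_increment)
{
  return adjust == (long) nregs * reg_increment;
}
```

For example, popping three SImode registers must adjust SP by 12, and popping two DFmode values by 16.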

Re: [Patch] Fix Bug 51162

2011-12-02 Thread Sameera Deshpande
On Wed, 2011-11-30 at 19:43 +, Jason Merrill wrote:
> On 11/24/2011 05:42 AM, Sameera Deshpande wrote:
> > -  if (TREE_CODE (fn) == ADDR_EXPR)
> > +  if (fn != NULL && TREE_CODE (fn) == ADDR_EXPR)
> >  fn = TREE_OPERAND (fn, 0);
> > -  if (TREE_CODE (fn) == FUNCTION_DECL && decl_is_tm_clone (fn))
> > +  if (fn != NULL && TREE_CODE (fn) == FUNCTION_DECL && decl_is_tm_clone 
> > (fn))
> >  pp_string (buffer, " [tm-clone]");
> > -  if (TREE_CODE (fn) == FUNCTION_DECL
> > +  if (fn != NULL
> 
> I'd rather not add the null check so many times.  How about just 
> returning if fn is null?
> 
> Jason
> 

Jason,

Thanks for your comment.
Please find attached the reworked patch, which returns early if fn is NULL.

The patch is tested with check-gcc for ARM.

-- diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index f0e7c50..3b5f670 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -698,6 +698,9 @@ dump_gimple_call (pretty_printer *buffer, gimple gs, int spc, int flags)
   if (gimple_call_tail_p (gs))
 pp_string (buffer, " [tail call]");
 
+  if (fn == NULL)
+return;
+
   /* Dump the arguments of _ITM_beginTransaction sanely.  */
   if (TREE_CODE (fn) == ADDR_EXPR)
 fn = TREE_OPERAND (fn, 0);

Re: [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.

2011-12-30 Thread Sameera Deshpande
Hi!

Please find attached revised LDRD generation patch for A15 Thumb-2 mode.

Because of the major rework in ARM and Thumb-2 RTL epilogue patches,
this patch has undergone some changes.

The patch is tested with check-gcc, bootstrap and check-gdb without
regression.

Ok for trunk?

-- diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 64d5993..49aae52 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -201,6 +201,7 @@ extern void thumb_reload_in_hi (rtx *);
 extern void thumb_set_return_address (rtx, rtx);
 extern const char *thumb1_output_casesi (rtx *);
 extern const char *thumb2_output_casesi (rtx *);
+extern bool bad_reg_pair_for_thumb_ldrd_strd (rtx, rtx);
 #endif
 
 /* Defined in pe.c.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d671281..6d008c5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15847,6 +15847,154 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   REG_NOTES (par) = dwarf;
 }
 
+bool
+bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+  || GET_CODE (src2) != REG
+  || (REGNO (src1) == PC_REGNUM)
+  || (REGNO (src1) == SP_REGNUM)
+  || (REGNO (src1) == REGNO (src2))
+  || (REGNO (src2) == PC_REGNUM)
+  || (REGNO (src2) == SP_REGNUM));
+}
+
+/* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
+   number of registers are being popped, multiple LDRD patterns are created for
+   all register pairs.  If odd number of registers are popped, last register is
+   loaded by using LDR pattern.  */
+static void
+thumb2_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+  gcc_assert (really_return || ((saved_regs_mask & (1 << PC_REGNUM)) == 0));
+
+  /* We cannot generate ldrd for PC.  Hence, reduce the count if PC is
+ to be popped.  So, if num_regs is even, now it will become odd,
+ and we can generate pop with PC.  If num_regs is odd, it will be
+ even now, and ldr with return can be generated for PC.  */
+  if (really_return && (saved_regs_mask & (1 << PC_REGNUM)))
+num_regs--;
+
+  /* Var j iterates over all the registers to gather all the registers in
+ saved_regs_mask.  Var i gives index of saved registers in stack frame.
+ A PARALLEL RTX of register-pair is created here, so that pattern for
+ LDRD can be matched.  As PC is always last register to be popped, and
+ we have already decremented num_regs if PC, we don't have to worry
+ about PC in this loop.  */
+  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+
+/* Create RTX for memory load.  */
+reg = gen_rtx_REG (SImode, j);
+tmp = gen_rtx_SET (SImode,
+   reg,
+   gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4 * i)));
+RTX_FRAME_RELATED_P (tmp) = 1;
+
+if (i % 2 == 0)
+  {
+/* When saved-register index (i) is even, the RTX to be emitted is
+   yet to be created.  Hence create it first.  The LDRD pattern we
+   are generating is :
+   [ (SET (reg_t0) (MEM (PLUS (SP) (NUM))))
+ (SET (reg_t1) (MEM (PLUS (SP) (NUM + 4)))) ]
+   where target registers need not be consecutive.  */
+par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+dwarf = NULL_RTX;
+  }
+
+/* ith register is added in PARALLEL RTX.  If i is even, the reg_i is
+   added as 0th element and if i is odd, reg_i is added as 1st element
+   of LDRD pattern shown above.  */
+XVECEXP (par, 0, (i % 2)) = tmp;
+dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+if ((i % 2) == 1)
+  {
+/* When saved-register index (i) is odd, RTXs for both the registers
+   to be loaded are generated in above given LDRD pattern, and the
+   pattern can be emitted now.  */
+par = emit_insn (par);
+REG_NOTES (par) = dwarf;
+  }
+
+i++;
+  }
+
+  /* If the number of registers pushed is odd AND really_return is false OR
+ number of registers are even AND really_return is true, last register is
+ popped using LDR.  It can be PC as well.  Hence, adjust the stack first and
+ then LDR with post increment.  */
+
+  /* Increment the stack pointer, based on there being
+ num_regs 4-byte registers to restore.  */
+  tmp = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+  

Re: [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.

2011-12-30 Thread Sameera Deshpande
Hi Ramana,

Please find attached revised LDRD generation patch for A15 ARM mode.

Because of the major rework in ARM RTL epilogue patch, this patch has
undergone some changes.

The patch is tested with check-gcc, bootstrap and check-gdb without
regression.

Ok for trunk?

-- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d5c651c..46becfb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16101,6 +16101,135 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
   || (REGNO (src2) == SP_REGNUM));
 }
 
+/* LDRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg POP for all accumulated
+   registers, and then generates LDRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are loaded from
+   stack.  multi register POP takes care of lone registers as well.  However,
+   LDRD cannot be generated for PC, as results are unpredictable.  Hence, if PC
+   is in SAVED_REGS_MASK, generate multi-reg POP with RETURN or LDR with RETURN
+   depending upon number of registers in REGS_TO_BE_POPPED_MASK.  */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp;
+  unsigned long regs_to_be_popped_mask = 0;
+  bool pc_in_list = false;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = 0; i < num_regs; j++)
+if (saved_regs_mask & (1 << j))
+  {
+i++;
+if ((j % 2) == 0
+&& (saved_regs_mask & (1 << (j + 1)))
+&& (j + 1) != SP_REGNUM
+&& (j + 1) != PC_REGNUM
+&& regs_to_be_popped_mask)
+  {
+/* Current register and next register form register pair for which
+   LDRD can be generated.  Generate POP for accumulated registers
+   and reset regs_to_be_popped_mask.  SP should be handled here as
+   the results are unpredictable if register being stored is same
+   as index register (in this case, SP).  PC is always the last
+   register being popped.  Hence, we don't have to worry about PC
+   here.  */
+arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+pc_in_list = false;
+regs_to_be_popped_mask = 0;
+continue;
+  }
+
+if (j == PC_REGNUM)
+  {
+gcc_assert (really_return);
+pc_in_list = 1;
+  }
+
+regs_to_be_popped_mask |= (1 << j);
+
+if ((j % 2) == 1
+&& (saved_regs_mask & (1 << (j - 1)))
+&& j != SP_REGNUM
+&& j != PC_REGNUM)
+  {
+ /* Generate a LDRD for register pair R_<j-1>, R_<j>.  The pattern
+generated here is
+[(SET SP, (PLUS SP, 8))
+ (SET R_<j-1>, (MEM SP))
+ (SET R_<j>, (MEM (PLUS SP, 4)))].  */
+ par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+
+ tmp = gen_rtx_SET (VOIDmode,
+stack_pointer_rtx,
+plus_constant (stack_pointer_rtx, 8));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ XVECEXP (par, 0, 0) = tmp;
+
+ tmp = gen_rtx_SET (SImode,
+gen_rtx_REG (SImode, j - 1),
+gen_frame_mem (SImode, stack_pointer_rtx));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ XVECEXP (par, 0, 1) = tmp;
+ dwarf = alloc_reg_note (REG_CFA_RESTORE,
+ gen_rtx_REG (SImode, j - 1),
+ dwarf);
+
+ tmp = gen_rtx_SET (SImode,
+ gen_rtx_REG (SImode, j),
+ gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4)));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ XVECEXP (par, 0, 2) = tmp;
+ dwarf = alloc_reg_note (REG_CFA_RESTORE,
+ gen_rtx_REG (SImode, j),
+ dwarf);
+
+ insn = emit_insn (par);
+ REG_NOTES (insn) = dwarf;
+ pc_in_list = false;
+ regs_to_be_popped_mask = 0;
+ dwarf = NULL_RTX;
+  }
+  }
+
+  if (regs_to_be_popped_mask)
+{
+  /* single PC pop can happen here.  Take care of that.  */
+  if (pc_in_list && (regs_to_be_popped_mask == (1 << PC_REGNUM)))
+{
+  /* Only PC is to be popped.  */
+  par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+  XVECEXP (par

[Patch ARM] Fix PR 49069.

2012-01-24 Thread Sameera Deshpande
Hi,

Please find attached the patch fixing bug 49069.

This patch is tested with check-gcc on trunk and 4.6 without regression.
OK for trunk?
Is it fine to backport to 4.6 branch?

ChangeLog:
2012-01-24  Sameera Deshpande  
PR target/49069
gcc/config/arm/arm.md (cstoredi4): Handle the case when both
operands are const_int.

gcc/testsuite/ChangeLog:
2012-01-24  Sameera Deshpande  
PR target/49069
gcc.target/arm/pr49069.c: New compile-only test.

- Thanks and regards,
  Sameera D.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 751997f..e3dc98f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -7911,8 +7911,9 @@
  enum rtx_code code = GET_CODE (operands[1]);
 
  /* We should not have two constants.  */
- gcc_assert (GET_MODE (operands[2]) == DImode
-		 || GET_MODE (operands[3]) == DImode);
+ if (!(GET_MODE (operands[2]) == DImode || GET_MODE (operands[3]) == DImode)
+ && !(reload_in_progress || reload_completed))
+   operands[3] = force_reg (DImode, operands[3]);
 
 /* Flip unimplemented DImode comparisons to a form that
arm_gen_compare_reg can handle.  */
diff --git a/gcc/testsuite/gcc.target/arm/pr49069.c b/gcc/testsuite/gcc.target/arm/pr49069.c
new file mode 100644
index 000..3cc903e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr49069.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -mfloat-abi=softfp -mfpu=vfpv3-d16" } */
+
+__extension__ typedef unsigned long long int uint64_t;
+
+static int
+func2 (int a, int b)
+{
+  return a == 0 ? a : a / b;
+}
+
+int array1[1];
+const uint64_t array2[1] = { 1 };
+
+void
+foo (void)
+{
+  for (array1[0] = 0; array1[0] == 1; array1[0]++)
+{
+}
+  if (bar (array2[0] == func2 (array1[0], 0)) == 0)
+{
+}
+}

Re: [Patch, regrename] Fix PR87330 : ICE in scan_rtx_reg, at regrename.c

2018-10-30 Thread Sameera Deshpande
On Tue, 9 Oct 2018 at 04:08, Eric Botcazou  wrote:
>
> > Other notes need not be changed, as they don't hold renamed register
> > information.
> >
> > Ok for trunk?
>
> No, REG_DEAD & REG_UNUSED note must be recomputed by passes consuming them.
>
> > 2018-10-09 Sameera Deshpande  >
> > * gcc/regrename.c (regrename_do_replace): Add condition to alter
> > regname if note has same register marked dead in notes.
>
> No gcc/ prefix in gcc/ChangeLog.
>
> --
> Eric Botcazou

Hi Eric,

Thanks for your comments.

Please find attached updated patch invoking data flow for updating the
REG_DEAD and REG_UNUSED notes.

As this change is made in falkor specific file, adding James and
Richard for review.

Ok for trunk?

Changelog:

2018-10-30  Sameera Deshpande

diff --git a/gcc/config/aarch64/falkor-tag-collision-avoidance.c b/gcc/config/aarch64/falkor-tag-collision-avoidance.c
index fb6568f..4ca9d66 100644
--- a/gcc/config/aarch64/falkor-tag-collision-avoidance.c
+++ b/gcc/config/aarch64/falkor-tag-collision-avoidance.c
@@ -805,6 +805,7 @@ execute_tag_collision_avoidance ()
   df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
   df_chain_add_problem (DF_UD_CHAIN);
   df_compute_regs_ever_live (true);
+  df_note_add_problem ();
   df_analyze ();
   df_set_flags (DF_DEFER_INSN_RESCAN);
 


Re: [AArch64] Add Saphira pipeline description.

2018-10-30 Thread Sameera Deshpande
On Fri, 26 Oct 2018 at 13:33, Sameera Deshpande
 wrote:
>
> Hi!
>
> Please find attached the patch to add a pipeline description for the
> Qualcomm Saphira core.  It is tested with a bootstrap and make check,
> with no regressions.
>
> Ok for trunk?
>
> gcc/
> Changelog:
>
> 2018-10-26 Sameera Deshpande 
>
> * config/aarch64/aarch64-cores.def (saphira): Use saphira pipeline.
> * config/aarch64/aarch64.md: Include saphira.md
> * config/aarch64/saphira.md: New file for pipeline description.
>
> --
> - Thanks and regards,
>   Sameera D.

Hi!

Please find attached updated patch.
Bootstrap and make check passed without regression. Ok for trunk?

-- 
- Thanks and regards,
  Sameera D.
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 3d876b8..8e4c646 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -90,7 +90,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2
 /* ARMv8.4-A Architecture Processors.  */
 
 /* Qualcomm ('Q') cores. */
-AARCH64_CORE("saphira", saphira,falkor,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
+AARCH64_CORE("saphira", saphira,saphira,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
 
 /* ARMv8-A big.LITTLE implementations.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..f951354 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -298,6 +298,7 @@
 (include "../arm/cortex-a57.md")
 (include "../arm/exynos-m1.md")
 (include "falkor.md")
+(include "saphira.md")
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
diff --git a/gcc/config/aarch64/saphira.md b/gcc/config/aarch64/saphira.md
new file mode 100644
index 000..bbf1c5c
--- /dev/null
+++ b/gcc/config/aarch64/saphira.md
@@ -0,0 +1,583 @@
+;; Saphira pipeline description
+;; Copyright (C) 2017-2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "saphira")
+
+;; Complex int instructions (e.g. multiply and divide) execute in the X
+;; pipeline.  Simple int instructions execute in the X, Y, Z and B pipelines.
+
+(define_cpu_unit "saphira_x" "saphira")
+(define_cpu_unit "saphira_y" "saphira")
+
+;; Branches execute in the Z or B pipeline or in one of the int pipelines depending
+;; on how complex it is.  Simple int insns (like movz) can also execute here.
+
+(define_cpu_unit "saphira_z" "saphira")
+(define_cpu_unit "saphira_b" "saphira")
+
+;; Vector and FP insns execute in the VX and VY pipelines.
+
+(define_automaton "saphira_vfp")
+
+(define_cpu_unit "saphira_vx" "saphira_vfp")
+(define_cpu_unit "saphira_vy" "saphira_vfp")
+
+;; Loads execute in the LD pipeline.
+;; Stores execute in the ST pipeline, for address, data, and
+;; vector data.
+
+(define_automaton "saphira_mem")
+
+(define_cpu_unit "saphira_ld" "saphira_mem")
+(define_cpu_unit "saphira_st" "saphira_mem")
+
+;; The GTOV and VTOG pipelines are for general to vector reg moves, and vice
+;; versa.
+
+(define_cpu_unit "saphira_gtov" "saphira")
+(define_cpu_unit "saphira_vtog" "saphira")
+
+;; Common reservation combinations.
+
+(define_reservation "saphira_vxvy" "saphira_vx|saphira_vy")
+(define_reservation "saphira_zb"   "saphira_z|saphira_b")
+(define_reservation "saphira_xyzb" "saphira_x|saphira_y|saphira_z|saphira_b")
+
+;; SIMD Floating-Point Instructions
+
+(define_insn_reservation "saphira_afp_1_vxvy" 1
+  (and (eq_attr "tune" "saphira")
+   (eq_attr "type" "neon_fp_neg_s,neon_fp_neg_d,neon_fp_abs_s,neon_fp_abs_d,neon_fp_neg_s_q,neon_fp_neg_d_q,neon_fp_abs_s_q,neon_fp_abs_d_q"))
+  "saphira_vxvy")
+
+(define_insn_reservatio

Re: [Patch, regrename] Fix PR87330 : ICE in scan_rtx_reg, at regrename.c

2018-10-30 Thread Sameera Deshpande
On Tue, 30 Oct 2018 at 16:16, Richard Earnshaw (lists)
 wrote:
>
> On 30/10/2018 10:09, Sameera Deshpande wrote:
> > On Tue, 9 Oct 2018 at 04:08, Eric Botcazou  wrote:
> >>
> >>> Other notes need not be changed, as they don't hold renamed register
> >>> information.
> >>>
> >>> Ok for trunk?
> >>
> >> No, REG_DEAD & REG_UNUSED note must be recomputed by passes consuming them.
> >>
> >>> 2018-10-09 Sameera Deshpande  >>>
> >>> * gcc/regrename.c (regrename_do_replace): Add condition to alter
> >>> regname if note has same register marked dead in notes.
> >>
> >> No gcc/ prefix in gcc/ChangeLog.
> >>
> >> --
> >> Eric Botcazou
> >
> > Hi Eric,
> >
> > Thanks for your comments.
> >
> > Please find attached updated patch invoking data flow for updating the
> > REG_DEAD and REG_UNUSED notes.
> >
> > As this change is made in falkor specific file, adding James and
> > Richard for review.
> >
> > Ok for trunk?
> >
> > Changelog:
> >
> > 2018-10-30 Sameera Deshpande  >
> > * gcc/config/aarch64/falkor-tag-collision-avoidance.c
> > (execute_tag_collision_avoidance): Invoke df_note_add_problem to
> > recompute REG_DEAD and REG_UNUSED notes before analysis.
> >
>
> 'Call df_note_add_problem.' is enough.
>
> OK with that change.
>
> R.
>
> >
> > bug87330.patch
> >
> > diff --git a/gcc/config/aarch64/falkor-tag-collision-avoidance.c 
> > b/gcc/config/aarch64/falkor-tag-collision-avoidance.c
> > index fb6568f..4ca9d66 100644
> > --- a/gcc/config/aarch64/falkor-tag-collision-avoidance.c
> > +++ b/gcc/config/aarch64/falkor-tag-collision-avoidance.c
> > @@ -805,6 +805,7 @@ execute_tag_collision_avoidance ()
> >df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> >df_chain_add_problem (DF_UD_CHAIN);
> >df_compute_regs_ever_live (true);
> > +  df_note_add_problem ();
> >df_analyze ();
> >df_set_flags (DF_DEFER_INSN_RESCAN);
> >
> >
>
Thanks Richard! Patch committed at revision 265618.

-- 
- Thanks and regards,
  Sameera D.


Re: [AArch64] Add Saphira pipeline description.

2018-10-31 Thread Sameera Deshpande
On Wed, 31 Oct 2018 at 00:37, James Greenhalgh  wrote:
>
> On Tue, Oct 30, 2018 at 05:12:58AM -0500, Sameera Deshpande wrote:
> > On Fri, 26 Oct 2018 at 13:33, Sameera Deshpande
> >  wrote:
> > >
> > > Hi!
> > >
> > > Please find attached the patch to add a pipeline description for the
> > > Qualcomm Saphira core.  It is tested with a bootstrap and make check,
> > > with no regressions.
> > >
> > > Ok for trunk?
>
> OK.
>
Thanks James, will commit the change.

> I wonder if there's anything we can do to improve maintainability in these
> cases where two pipeline models have considerable overlaps.
>
I agree that there is a need to have some mechanism to maintain the
architectures which have many commonalities.
However, Saphira and Falkor are too different to allow much sharing,
and with further performance tuning for Saphira, the differences will
become more prominent.
I will commit this patch as is for Saphira, and will look at possible
factoring of the Saphira and Falkor pipelines into their common and
distinct parts once the tuning for Saphira is done.

> Thanks,
> James
>
> > >
> > > gcc/
> > > Changelog:
> > >
> > > 2018-10-26 Sameera Deshpande 
> > >
> > > * config/aarch64/aarch64-cores.def (saphira): Use saphira pipeline.
> > > * config/aarch64/aarch64.md: Include saphira.md
> > > * config/aarch64/saphira.md: New file for pipeline description.
> > >
> > > --
> > > - Thanks and regards,
> > >   Sameera D.
> >
> > Hi!
> >
> > Please find attached updated patch.
> > Bootstrap and make check passed without regression. Ok for trunk?
> >
> > --
> > - Thanks and regards,
> >   Sameera D.
>
> > diff --git a/gcc/config/aarch64/aarch64-cores.def 
> > b/gcc/config/aarch64/aarch64-cores.def
> > index 3d876b8..8e4c646 100644
> > --- a/gcc/config/aarch64/aarch64-cores.def
> > +++ b/gcc/config/aarch64/aarch64-cores.def
> > @@ -90,7 +90,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
> > AARCH64_FL_FOR_ARCH8_2
> >  /* ARMv8.4-A Architecture Processors.  */
> >
> >  /* Qualcomm ('Q') cores. */
> > -AARCH64_CORE("saphira", saphira,falkor,8_4A,  
> > AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   
> > 0x51, 0xC01, -1)
> > +AARCH64_CORE("saphira", saphira,saphira,8_4A,  
> > AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   
> > 0x51, 0xC01, -1)
> >
> >  /* ARMv8-A big.LITTLE implementations.  */
> >
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index a014a01..f951354 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -298,6 +298,7 @@
> >  (include "../arm/cortex-a57.md")
> >  (include "../arm/exynos-m1.md")
> >  (include "falkor.md")
> > +(include "saphira.md")
> >  (include "thunderx.md")
> >  (include "../arm/xgene1.md")
> >  (include "thunderx2t99.md")
> > diff --git a/gcc/config/aarch64/saphira.md b/gcc/config/aarch64/saphira.md
> > new file mode 100644
> > index 000..bbf1c5c
> > --- /dev/null
> > +++ b/gcc/config/aarch64/saphira.md
> > @@ -0,0 +1,583 @@
> > +;; Saphira pipeline description
> > +;; Copyright (C) 2017-2018 Free Software Foundation, Inc.
> > +;;
> > +;; This file is part of GCC.
> > +;;
> > +;; GCC is free software; you can redistribute it and/or modify it
> > +;; under the terms of the GNU General Public License as published by
> > +;; the Free Software Foundation; either version 3, or (at your option)
> > +;; any later version.
> > +;;
> > +;; GCC is distributed in the hope that it will be useful, but
> > +;; WITHOUT ANY WARRANTY; without even the implied warranty of
> > +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +;; General Public License for more details.
> > +;;
> > +;; You should have received a copy of the GNU General Public License
> > +;; along with GCC; see the file COPYING3.  If not see
> > +;; <http://www.gnu.org/licenses/>.
> > +
> > +(define_automaton "saphira")
> > +
> > +;; Complex int instructions (e.g. multiply and divide) execute in the X
> > +;; pipeline.  Simple int instructions execute in the X, Y, Z and B 
> > pipelines.
> > +
> > +(define_cpu_unit "saphira_x" "saphira"

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-05-08 Thread Sameera Deshpande
On 1 May 2018 at 05:05, Sameera Deshpande  wrote:
> On 13 April 2018 at 20:21, James Greenhalgh  wrote:
>> On Fri, Apr 13, 2018 at 03:39:32PM +0100, Sameera Deshpande wrote:
>>> On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, 
>>> mailto:james.greenha...@arm.com>> wrote:
>>> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
>>> > Hi,
>>> >
>>> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande 
>>> > mailto:sameera.deshpa...@linaro.org>>:
>>> > > Hi Christophe,
>>> > >
>>> > > Please find attached the updated patch with testcases.
>>> > >
>>> > > Ok for trunk?
>>> >
>>> > Thanks for the update.
>>> >
>>> > Since the new intrinsics are only available on aarch64, you want to
>>> > prevent the tests from running on arm.
>>> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two 
>>> > targets.
>>> > There are several examples on how to do that in that directory.
>>> >
>>> > I have also noticed that the tests fail at execution on aarch64_be.
>>>
>>> I think this is important to fix. We don't want the big-endian target to 
>>> have
>>> failing implementations of the Neon intrinsics. What is the nature of the
>>> failure?
>>>
>>> From what I can see, nothing in the patch prevents using these intrinsics
>>> on big-endian, so either the intrinsics behaviour is wrong (we have a wrong
>>> code bug), or the testcase expected behaviour is wrong.
>>>
>>> I don't think disabling the test for big-endian is the right fix. We should
>>> either fix the intrinsics, or fix the testcase.
>>>
>>> Thanks,
>>> James
>>>
>>> Hi James,
>>>
>>> As the tests assume the little endian order of elements while checking the
>>> results, the tests are failing for big endian targets. So, the failures are
>>> not because of intrinsic implementations, but because of the testcase.
>>
>> The testcase is a little hard to follow through the macros, but why would
>> this be the case?
>>
>> ld1 is deterministic on big and little endian for which elements will be
>> loaded from memory, as is st1.
>>
>> My expectation would be that:
>>
>>   int __attribute__ ((noinline))
>>   test_vld_u16_x3 ()
>>   {
>> uint16_t data[3 * 3];
>> uint16_t temp[3 * 3];
>> uint16x4x3_t vectors;
>> int i,j;
>> for (i = 0; i < 3 * 3; i++)
>>   data [i] = (uint16_t) 3*i;
>> asm volatile ("" : : : "memory");
>> vectors = vld1_u16_x3 (data);
>> vst1_u16 (temp, vectors.val[0]);
>> vst1_u16 (&temp[3], vectors.val[1]);
>> vst1_u16 (&temp[3 * 2], vectors.val[2]);
>> asm volatile ("" : : : "memory");
>> for (j = 0; j < 3 * 3; j++)
>>   if (temp[j] != data[j])
>> return 1;
>> return 0;
>>   }
>>
>> would work equally well for big- or little-endian.
>>
>> I think this is more likely to be an intrinsics implementation bug.
>>
>> Thanks,
>> James
>>
>
> Hi James,
>
> Please find attached the updated patch, which now passes for little as
> well as big endian.
> Ok for trunk?
>
> --
> - Thanks and regards,
>   Sameera D.
>
> gcc/Changelog:
>
> 2018-05-01  Sameera Deshpande  
>
>
> * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
> (st1x2): Likewise.
> (st1x3): Likewise.
> * config/aarch64/aarch64-simd.md
> (aarch64_ld1x3): New pattern.
> (aarch64_ld1_x3_): Likewise
> (aarch64_st1x2): Likewise
> (aarch64_st1_x2_): Likewise
> (aarch64_st1x3): Likewise
> (aarch64_st1_x3_): Likewise
> * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
> (vld1_s8_x3): Likewise.
> (vld1_u16_x3): Likewise.
> (vld1_s16_x3): Likewise.
> (vld1_u32_x3): Likewise.
> (vld1_s32_x3): Likewise.
> (vld1_u64_x3): Likewise.
> (vld1_s64_x3): Likewise.
> (vld1_f16_x3): Likewise.
> (vld1_f32_x3): Likewise.
> (vld1_f64_x3): Likewise.
> (vld1_p8_x3): Likewise.
> (vld1_p16_x3): Likewise.
> (vld1_p64_x3): Likewise.
> (vld1q_u8_x3): Likewise.
> (vld1q_s8_x3): Likewise.
> (vld1q_u16_x3): Likewise.
> (vld1q_

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-05-22 Thread Sameera Deshpande
On Tue 22 May, 2018, 9:26 PM James Greenhalgh, 
wrote:

> On Mon, Apr 30, 2018 at 06:35:11PM -0500, Sameera Deshpande wrote:
> > On 13 April 2018 at 20:21, James Greenhalgh 
> wrote:
> > > On Fri, Apr 13, 2018 at 03:39:32PM +0100, Sameera Deshpande wrote:
> > >> On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, <
> james.greenha...@arm.com<mailto:james.greenha...@arm.com>> wrote:
> > >> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
> > >> > Hi,
> > >> >
> > >> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande <
> sameera.deshpa...@linaro.org<mailto:sameera.deshpa...@linaro.org>>:
> > >> > > Hi Christophe,
> > >> > >
> > >> > > Please find attached the updated patch with testcases.
> > >> > >
> > >> > > Ok for trunk?
> > >> >
> > >> > Thanks for the update.
> > >> >
> > >> > Since the new intrinsics are only available on aarch64, you want to
> > >> > prevent the tests from running on arm.
> > >> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the
> two targets.
> > >> > There are several examples on how to do that in that directory.
> > >> >
> > >> > I have also noticed that the tests fail at execution on aarch64_be.
> > >>
> > >> I think this is important to fix. We don't want the big-endian target
> to have
> > >> failing implementations of the Neon intrinsics. What is the nature of
> the
> > >> failure?
> > >>
> > >> From what I can see, nothing in the patch prevents using these
> intrinsics
> > >> on big-endian, so either the intrinsics behaviour is wrong (we have a
> wrong
> > >> code bug), or the testcase expected behaviour is wrong.
> > >>
> > >> I don't think disabling the test for big-endian is the right fix. We
> should
> > >> either fix the intrinsics, or fix the testcase.
> > >>
> > >> Thanks,
> > >> James
> > >>
> > >> Hi James,
> > >>
> > >> As the tests assume the little endian order of elements while
> checking the
> > >> results, the tests are failing for big endian targets. So, the
> failures are
> > >> not because of intrinsic implementations, but because of the testcase.
> > >
> > > The testcase is a little hard to follow through the macros, but why
> would
> > > this be the case?
> > >
> > > ld1 is deterministic on big and little endian for which elements will
> be
> > > loaded from memory, as is st1.
> > >
> > > My expectation would be that:
> > >
> > >   int __attribute__ ((noinline))
> > >   test_vld_u16_x3 ()
> > >   {
> > > uint16_t data[3 * 3];
> > > uint16_t temp[3 * 3];
> > > uint16x4x3_t vectors;
> > > int i,j;
> > > for (i = 0; i < 3 * 3; i++)
> > >   data [i] = (uint16_t) 3*i;
> > > asm volatile ("" : : : "memory");
> > > vectors = vld1_u16_x3 (data);
> > > vst1_u16 (temp, vectors.val[0]);
> > > vst1_u16 (&temp[3], vectors.val[1]);
> > > vst1_u16 (&temp[3 * 2], vectors.val[2]);
> > > asm volatile ("" : : : "memory");
> > > for (j = 0; j < 3 * 3; j++)
> > >   if (temp[j] != data[j])
> > > return 1;
> > > return 0;
> > >   }
> > >
> > > would work equally well for big- or little-endian.
> > >
> > > I think this is more likely to be an intrinsics implementation bug.
> > >
> > > Thanks,
> > > James
> > >
> >
> > Hi James,
> >
> > Please find attached the updated patch, which now passes for little as
> > well as big endian.
> > Ok for trunk?
>
>
> OK.
>
> Thanks,
> James
>
> >
> > --
> > - Thanks and regards,
> >   Sameera D.
> >
> > gcc/Changelog:
> >
> > 2018-05-01  Sameera Deshpande  
> >
> >
> > * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
> > (st1x2): Likewise.
> > (st1x3): Likewise.
> > * config/aarch64/aarch64-simd.md
> > (aarch64_ld1x3): New pattern.
> > (aarch64_ld1_x3_): Likewise
> > (aarch64_st1x2): Likewise
> > (aarch64_st1_x2_): Likewise
&

[AARCH64] Add support of ARMv8.4 in saphira for Qualcomm server part

2018-05-29 Thread Sameera Deshpande
Hi!

Please find attached the patch to add support of ARMv8.4 in saphira
for Qualcomm server part. Tested on aarch64, without any regressions.

Ok for trunk?

-- 
- Thanks and regards,
  Sameera D.
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 33b96ca2861..e64d8314fa9 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -86,10 +86,10 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  AARCH64_FL_FOR
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa53, 0x41, 0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, 0xd0a, -1)
 
-/* ARMv8.3-A Architecture Processors.  */
+/* ARMv8.4-A Architecture Processors.  */
 
 /* Qualcomm ('Q') cores. */
-AARCH64_CORE("saphira", saphira,falkor,8_3A,  AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
+AARCH64_CORE("saphira", saphira,falkor,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
 
 /* ARMv8-A big.LITTLE implementations.  */
 


Re: [AARCH64] Add support of ARMv8.4 in saphira for Qualcomm server part

2018-05-29 Thread Sameera Deshpande
On Tue 29 May, 2018, 9:19 PM Siddhesh Poyarekar, <
siddhesh.poyare...@linaro.org> wrote:

> On 29 May 2018 at 21:17, James Greenhalgh 
> wrote:
> > On Tue, May 29, 2018 at 05:01:42AM -0500, Sameera Deshpande wrote:
> >> Hi!
> >>
> >> Please find attached the patch to add support of ARMv8.4 in saphira
> >> for Qualcomm server part. Tested on aarch64, without any regressions.
> >>
> >> Ok for trunk?
> >
> > I'm trusting that this is the right thing to do for this core. As
> Siddhesh
> > contributed the original patch; I'd like him to also sign off on this
> > modification.
> >
> > OK for trunk with Siddhesh's ack.
>
> LGTM too.
>
> Thanks,
> Siddhesh
>

Thanks James and Siddhesh.

- Sameera

>