[RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-09-28 Thread Sameera Deshpande
Hi!

This patch generates Thumb2 epilogues in RTL form.

The work involves defining new functions, predicates and patterns, along with
a few changes to existing code:
* The load_multiple_operation predicate was found to be too restrictive for
integer loads as it required consecutive destination regs, so this
restriction was lifted.
* Variations of load_multiple_operation were required to handle cases 
   - where SP must be the base register 
   - where FP values were being loaded (which do require consecutive
destination registers)
   - where PC can be in register-list (which requires return pattern along
with register loads).
  Hence, the common code was factored out into a new function in arm.c and
parameterised by:
   - whether consecutive destination regs are needed
   - the data type being loaded 
   - whether the base register has to be SP
   - whether PC is in register-list
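
The parameterisation above can be pictured with a minimal sketch. This is not the patch's `load_multiple_operation_p` (which walks a PARALLEL RTX); a plain array of destination register numbers stands in for the register list, and the function name and shape are illustrative only.

```c
#include <stdbool.h>

#define SP_REGNUM 13
#define PC_REGNUM 15

/* Hypothetical simplification of the factored-out check: validate a
   destination register list under the flags described above.  */
static bool
reg_list_ok (const int *regs, int n, bool consecutive, bool return_pc)
{
  for (int i = 0; i < n; i++)
    {
      /* Consecutive destination registers required (e.g. FP loads).  */
      if (consecutive && i > 0 && regs[i] != regs[i - 1] + 1)
        return false;
      /* PC may appear only as the last register, and only when the
         pattern also returns.  */
      if (regs[i] == PC_REGNUM && !(return_pc && i == n - 1))
        return false;
      /* SP must never be a destination.  */
      if (regs[i] == SP_REGNUM)
        return false;
    }
  return true;
}
```

For example, `{r4, r5, r6}` passes the consecutive check, `{r4, r6}` fails it, and `{r4, pc}` is accepted only when the pattern returns.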

The patch is tested with arm-eabi with no regressions.

ChangeLog:

2011-09-28  Ian Bolton 
Sameera Deshpande  
   
   * config/arm/arm-protos.h (load_multiple_operation_p): New
   declaration.
   (thumb2_expand_epilogue): Likewise.
   (thumb2_output_return): Likewise.
   (thumb2_expand_return): Likewise.
   (thumb_unexpanded_epilogue): Rename to...
   (thumb1_unexpanded_epilogue): ...this.
   * config/arm/arm.c (load_multiple_operation_p): New function.
   (thumb2_emit_multi_reg_pop): Likewise.
   (thumb2_emit_vfp_multi_reg_pop): Likewise.
   (thumb2_expand_return): Likewise.
   (thumb2_expand_epilogue): Likewise.
   (thumb2_output_return): Likewise.
   (thumb_unexpanded_epilogue): Rename to...
   (thumb1_unexpanded_epilogue): ...this.
   * config/arm/arm.md (pop_multiple_with_stack_update): New pattern.
   (pop_multiple_with_stack_update_and_return): Likewise.
   (thumb2_ldr_with_return): Likewise.
   (floating_point_pop_multiple_with_stack_update): Likewise.
   (return): Update condition and code for pattern.
   (arm_return): Likewise.
   (epilogue_insns): Likewise.
   * config/arm/predicates.md (load_multiple_operation): Update
   predicate.
   (load_multiple_operation_stack_and_return): New predicate.
   (load_multiple_operation_stack): Likewise.
   (load_multiple_operation_stack_fp): Likewise.
   * config/arm/thumb2.md (thumb2_return): Remove.
   (thumb2_rtl_epilogue_return): New pattern.


- Thanks and regards,
  Sameera D.

thumb2_rtl_epilogue_complete-27Sept.patch
Description: Binary data


Ping! Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-10-05 Thread Sameera Deshpande
Ping!

http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01854.html


Ping! Re: [RFA/ARM][Patch 02/02]: ARM epilogues in RTL

2011-10-05 Thread Sameera Deshpande
Ping!

http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html

On Wed, 2011-09-28 at 17:15 +0100, Sameera Deshpande wrote:
> Hi!
> 
> This patch generates the ARM epilogue in RTL form.
> 
> The work defines new functions and reuses most of the static functions and
> patterns defined in the previous patch (Thumb2 epilogues in RTL), with minor
> changes to handle mode-specific details.
> Hence, this patch depends entirely on the previous patch.
> 
> It is tested with arm-eabi with no regressions.
> 
> ChangeLog:
> 
> 2011-09-28  Sameera Deshpande  
> 
> 
>* config/arm/arm-protos.h (arm_expand_epilogue): New declaration.
>  (arm_expand_return): Likewise.
>  (thumb2_expand_epilogue): Add new boolean argument.
>* config/arm/arm.c (print_multi_reg): Remove.
>  (vfp_output_fldmd): Likewise.
>  (arm_output_epilogue): Likewise.
>  (output_return_instruction): Update the function.
>  (thumb2_emit_multi_reg_pop): Rename to...
>  (arm_emit_multi_reg_pop): ...this.
>  (thumb2_emit_vfp_multi_reg_pop): Rename to...
>  (arm_emit_vfp_multi_reg_pop): ...this.
>  (arm_emit_vfp_multi_reg_pop): Add new argument base_reg.
>  (arm_expand_return): New function.
>  (arm_expand_epilogue): Likewise.
>  (thumb2_expand_epilogue): Add new argument is_sibling.
>* config/arm/arm.md (pop_multiple_with_stack_update): Update
>  condition and code for pattern.
>  (arm_return): Likewise.
>  (pop_multiple_with_stack_update_and_return): Likewise.
>  (floating_point_pop_multiple_with_stack_update): Likewise.
>  (thumb2_ldr_with_return): Rename to...
>  (ldr_with_return): ...this.
>  (ldr_with_return): Update condition.
>  (cond_return): Remove.
>  (cond_return_inverted): Likewise.
>  (return): Update code.
>  (epilogue): Likewise.
>  (sibcall_epilogue): Likewise.
>  (epilogue_insns): Update condition and code.
> 
> 
> - Thanks and regards,
>   Sameera D.

[RFA/ARM][Patch 00/05]: Introduction - Generate LDRD/STRD in prologue/epilogue instead of PUSH/POP.

2011-10-11 Thread Sameera Deshpande
This series of 5 patches generates LDRD/STRD instead of PUSH/POP in the
epilogue/prologue for the ARM and Thumb-2 modes of the Cortex-A15.

Patch [1/5] introduces a new field in the tuning parameters which can be
used to indicate whether a specific core prefers LDRD/STRD over PUSH/POP.

Patches [2-5/5] use this field to determine if LDRD/STRD can be
generated instead of PUSH/POP in ARM and Thumb-2 mode.

Patch [2/5] generates LDRD instead of POP for Thumb-2 epilogue in A15.
This patch depends on patch [1/5].

Patch [3/5] generates STRD instead of PUSH for the Thumb-2 prologue on the A15.
This patch depends on the variables, functions and patterns defined in
[1/5] and [2/5].

Patch [4/5] generates STRD instead of PUSH for ARM prologue in A15. This
patch depends on [1/5].

Patch [5/5] generates LDRD instead of POP for the ARM epilogue on the A15.
This patch depends on the variables, functions and patterns defined in [1/5]
and [4/5].

All these patches depend upon the Thumb2/ARM RTL epilogue patches
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01854.html,
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html submitted for
review.

All these patches are applied in the given order and tested with check-gcc,
check-gdb and bootstrap with no regressions.

In ARM mode, a significant performance improvement (~26%) is seen on some
parts of a popular embedded consumer benchmark, although in most cases the
effect on performance is small (~3% improvement).

For Thumb-2, the improvement observed on the same parts of the benchmark is
~11% (~2.5% improvement elsewhere).

[RFA/ARM][Patch 01/05]: Create tune for Cortex-A15.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch adds a new field to tune_params to indicate whether LDRD/STRD are
preferred over PUSH/POP in the prologue/epilogue of a specific core.
It also creates a new tune for the Cortex-A15 and updates the tunes for
other cores to set the new field to its default value.
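
The way later patches consult this field can be sketched as follows. The struct and field names mirror the patch's diff (`tune_params.prefer_ldrd_strd`); the helper function, its name, and the per-core initializers shown here are illustrative assumptions, not code from the patch.

```c
#include <stdbool.h>

/* Minimal stand-in for the extended struct from arm-protos.h: only the
   field added by this patch is shown.  */
struct tune_params
{
  bool prefer_ldrd_strd;   /* New field: prefer LDRD/STRD over PUSH/POP.  */
};

/* Illustrative per-core settings, per the series description.  */
static const struct tune_params arm_cortex_a15_tune = { true  };
static const struct tune_params arm_slowmul_tune    = { false };

/* Hypothetical helper: the prologue/epilogue expanders would use the new
   sequences only when the core prefers them and we are not optimizing for
   size (for -Os the original PUSH/POP sequences are kept).  */
static bool
use_ldrd_strd (const struct tune_params *current_tune, bool optimize_size)
{
  return current_tune->prefer_ldrd_strd && !optimize_size;
}
```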

Changelog entry for Patch to create tune for cortex-a15:

2011-10-11  Sameera Deshpande
 

* config/arm/arm-cores.def (cortex_a15): Update.
* config/arm/arm-protos.h (struct tune_params): Add new field
  prefer_ldrd_strd.
* config/arm/arm.c (arm_slowmul_tune): Add
  prefer_ldrd_strd field setting.
  (arm_fastmul_tune): Likewise.
  (arm_strongarm_tune): Likewise.
  (arm_xscale_tune): Likewise.
  (arm_9e_tune): Likewise.
  (arm_v6t2_tune): Likewise.
  (arm_cortex_tune): Likewise.
  (arm_cortex_a5_tune): Likewise.
  (arm_cortex_a9_tune): Likewise.
  (arm_fa726te_tune): Likewise.
  (arm_cortex_a15_tune): New variable.
> diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 742b5e8..1b42713 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -128,7 +128,7 @@ ARM_CORE("generic-armv7-a", genericv7a,	7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a5",	  cortexa5,	7A, FL_LDSCHED, cortex_a5)
 ARM_CORE("cortex-a8",	  cortexa8,	7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a9",	  cortexa9,	7A, FL_LDSCHED, cortex_a9)
-ARM_CORE("cortex-a15",	  cortexa15,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
+ARM_CORE("cortex-a15",	  cortexa15,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
 ARM_CORE("cortex-r4",	  cortexr4,	7R, FL_LDSCHED, cortex)
 ARM_CORE("cortex-r4f",	  cortexr4f,	7R, FL_LDSCHED, cortex)
 ARM_CORE("cortex-r5",	  cortexr5,	7R, FL_LDSCHED | FL_ARM_DIV, cortex)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f69bc42..c6b8f71 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -243,6 +243,9 @@ struct tune_params
   int l1_cache_line_size;
   bool prefer_constant_pool;
   int (*branch_cost) (bool, bool);
+  /* This flag indicates if STRD/LDRD instructions are preferred
+ over PUSH/POP in epilogue/prologue.  */
+  bool prefer_ldrd_strd;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6c09267..d709375 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -850,7 +850,8 @@ const struct tune_params arm_slowmul_tune =
   5,		/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,		/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -861,7 +862,8 @@ const struct tune_params arm_fastmul_tune =
   5,		/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,		/* Prefer constant pool.  */
-  arm_default_bran

[RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch generates LDRD instead of POP for the Thumb-2 epilogue on the A15.

For optimize_size, the original epilogue is generated for the A15.
The work involves defining new functions, predicates and patterns.

As LDRD cannot be generated for PC, if PC is in the register list, LDRD is
generated for all the other registers in the list that can form register
pairs. Then an LDR with return is generated if PC is the only register left
to be popped; otherwise a POP with return is generated.
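
The bookkeeping behind that decision can be sketched as a small planner. This is not the patch's `thumb2_emit_ldrd_pop` (which builds and emits RTL); the function and enum names here are illustrative, and only the count/remainder logic from the description above is modeled.

```c
#include <stdbool.h>

enum final_insn { FINAL_NONE, FINAL_LDR_PC_RETURN, FINAL_POP_PC_RETURN };

/* Given the number of registers to pop (including PC, if present),
   compute how many LDRD pairs are emitted and how the epilogue ends.
   LDRD cannot target PC, so PC is excluded from pairing first.  */
static enum final_insn
plan_thumb2_ldrd_pop (int num_regs, bool pc_in_list, int *num_ldrd_pairs)
{
  if (pc_in_list)
    num_regs--;                 /* PC is never part of an LDRD pair.  */
  *num_ldrd_pairs = num_regs / 2;
  if (!pc_in_list)
    return FINAL_NONE;
  /* Odd remainder: one low register is left over, so pop it together
     with PC.  Even remainder: PC is alone, so load it with LDR.  */
  return (num_regs % 2) ? FINAL_POP_PC_RETURN : FINAL_LDR_PC_RETURN;
}
```

So four registers including PC yield one LDRD pair plus a POP of the leftover register and PC, while five yield two pairs plus an LDR-with-return for PC alone.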

The patch is tested with check-gcc, check-gdb and bootstrap with no
regressions.

Changelog entry for the patch to emit LDRD for the Thumb-2 epilogue on the A15:

2011-10-11  Sameera Deshpande

* config/arm/arm-protos.h (bad_reg_pair_for_thumb_ldrd_strd): New
  declaration.
* config/arm/arm.c (bad_reg_pair_for_thumb_ldrd_strd): New helper
  function.
  (thumb2_emit_ldrd_pop): New static function.
  (thumb2_expand_epilogue): Update function.
* config/arm/constraints.md (Pz): New constraint.
* config/arm/ldmstm.md (thumb2_ldrd_base): New pattern.
  (thumb2_ldrd): Likewise.
* config/arm/predicates.md (ldrd_immediate_operand): New
  predicate.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index c6b8f71..06a67b5 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -202,6 +202,7 @@ extern void thumb_reload_in_hi (rtx *);
 extern void thumb_set_return_address (rtx, rtx);
 extern const char *thumb1_output_casesi (rtx *);
 extern const char *thumb2_output_casesi (rtx *);
+extern bool bad_reg_pair_for_thumb_ldrd_strd (rtx, rtx);
 #endif
 
 /* Defined in pe.c.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d709375..3eba510 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15410,6 +15410,155 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   par = emit_insn (par);
   add_reg_note (par, REG_FRAME_RELATED_EXPR, dwarf);
 }
+bool
+bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+  || GET_CODE (src2) != REG
+  || (REGNO (src1) == PC_REGNUM)
+  || (REGNO (src1) == SP_REGNUM)
+  || (REGNO (src1) == REGNO (src2))
+  || (REGNO (src2) == PC_REGNUM)
+  || (REGNO (src2) == SP_REGNUM));
+}
+
+/* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
+   number of registers are being popped, multiple LDRD patterns are created for
+   all register pairs.  If odd number of registers are popped, last register is
+   loaded by using LDR pattern.  */
+static bool
+thumb2_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+  gcc_assert (really_return || ((saved_regs_mask & (1 << PC_REGNUM)) == 0));
+
+  if (really_return && (saved_regs_mask & (1 << PC_REGNUM)))
+/* We cannot generate ldrd for PC.  Hence, reduce the count if PC is
+   to be popped.  So, if num_regs is even, now it will become odd,
+   and we can generate pop with PC.  If num_regs is odd, it will be
+   even now, and ldr with return can be generated for PC.  */
+num_regs--;
+
+  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
+/* Var j iterates over all the registers to gather all the registers in
+   saved_regs_mask.  Var i gives index of saved registers in stack frame.
+   A PARALLEL RTX of register-pair is created here, so that pattern for
+   LDRD can be matched.  As PC is always last register to be popped, and
+   we have already decremented num_regs if PC, we don't have to worry
+   about PC in this loop.  */
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+
+/* Create RTX for memory load.  New RTX is created for dwarf as
+   they are not sharable.  */
+reg = gen_rtx_REG (SImode, j);
+tmp = gen_rtx_SET (SImode,
+   reg,
+   gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4 * i)));
+
+tmp1 = gen_rtx_SET (SImode,
+   reg,
+   gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4 * i)));
+RTX_FRAME_RELATED_P (tmp) = 1;
+RTX_FRAME_RELATED_P (tmp1) = 1;
+
+if (i % 2 == 0)
+  {
+/* When saved-register index (i) is even, the RTX to be emitted is
+   yet to be cre

[RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch generates STRD instructions instead of PUSH in Thumb-2 mode for
the A15.

For optimize_size, the original prologue is generated for the A15.
The work involves defining new functions, predicates and patterns.

The patch is tested with check-gcc, check-gdb and bootstrap with no
regressions.
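
The layout used by this prologue, as described in the patch's comments (one SP pre-decrement for the whole block, then an STR for the first register when the count is odd, and STRD pairs for the rest), can be summarized in a tiny planner. The function name and signature are illustrative, not the patch's.

```c
/* For num_regs 4-byte registers: SP is pre-decremented once by
   4*num_regs; an odd count means the first (lowest) stack slot is
   filled by a lone STR, and the remaining slots by STRD pairs.
   Returns 1 if a lone STR is needed, 0 otherwise.  */
static int
plan_thumb2_strd_push (int num_regs, int *num_strd_pairs, int *sp_adjust)
{
  *sp_adjust = -4 * num_regs;    /* Single SP update for the block.  */
  *num_strd_pairs = num_regs / 2;
  return num_regs % 2;           /* Odd count: STR the first register.  */
}
```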

Changelog entries for the patch for STRD generation for A15 Thumb-2:

2011-10-11  Sameera Deshpande

* config/arm/arm.c (thumb2_emit_strd_push): New static function.
  (arm_expand_prologue): Update.
* config/arm/ldmstm.md (thumb2_strd): New pattern.
  (thumb2_strd_base): Likewise.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3eba510..fd8c31d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15095,6 +15095,125 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
 }
 }
 
+/* Generate and emit a pattern that will be recognized as STRD pattern.  If even
+   number of registers are being pushed, multiple STRD patterns are created for
+   all register pairs.  If odd number of registers are pushed, first register is
+   stored by using STR pattern.  */
+static void
+thumb2_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* Pre-decrement the stack pointer, based on there being num_regs 4-byte
+ registers to push.  */
+  tmp = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+ plus_constant (stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  insn = emit_insn (tmp);
+
+  /* Create sequence for DWARF info.  */
+  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
+
+  /* RTLs cannot be shared, hence create new copy for dwarf.  */
+  tmp1 = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+ plus_constant (stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp1) = 1;
+  XVECEXP (dwarf, 0, 0) = tmp1;
+
+  for (i = num_regs - 1, j = LAST_ARM_REGNUM; i >= (num_regs % 2); j--)
+/* Var j iterates over all the registers to gather all the registers in
+   saved_regs_mask.  Var i gives index of register R_j in stack frame.
+   A PARALLEL RTX of register-pair is created here, so that pattern for
+   STRD can be matched.  If num_regs is odd, 1st register will be pushed
+   using STR and remaining registers will be pushed with STRD in pairs.
+   If num_regs is even, all registers are pushed with STRD in pairs.
+   Hence, skip first element for odd num_regs.  */
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+gcc_assert (j != PC_REGNUM);
+
+/* Create RTX for store.  New RTX is created for dwarf as
+   they are not sharable.  */
+reg = gen_rtx_REG (SImode, j);
+tmp = gen_rtx_SET (SImode,
+   gen_frame_mem
+   (SImode,
+plus_constant (stack_pointer_rtx, 4 * i)),
+   reg);
+
+tmp1 = gen_rtx_SET (SImode,
+   gen_frame_mem
+   (SImode,
+plus_constant (stack_pointer_rtx, 4 * i)),
+   reg);
+RTX_FRAME_RELATED_P (tmp) = 1;
+RTX_FRAME_RELATED_P (tmp1) = 1;
+
+if (((i - (num_regs % 2)) % 2) == 1)
+  /* When (i - (num_regs % 2)) is odd, the RTX to be emitted is yet to
+ be created.  Hence create it first.  The STRD pattern we are
+ generating is :
+ [ (SET (MEM (PLUS (SP) (NUM))) (reg_t1))
+   (SET (MEM (PLUS (SP) (NUM + 4))) (reg_t2)) ]
+ were target registers need not be consecutive.  */
+  par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+
+/* Register R_j is added in PARALLEL RTX.  If (i - (num_regs % 2)) is
+   even, the reg_j is added as 0th element and if it is odd, reg_i is
+   added as 1st element of STRD pattern shown above.  */
+XVECEXP (par, 0, ((i - (num_regs % 2)) % 2)) = tmp;
+XVECEXP (dwarf, 0, (i + 1)) = tmp1;
+
+if (((i - (num_regs % 2)) % 2) == 0)
+  /* When (i - (num_regs % 2)) is even, RTXs for both the registers
+ to be loaded are generated in above given STRD pattern, and the
+ pattern can be emitted now.  */
+  emit_insn (par);
+
+i--;
+  }
+
+  if ((num_regs % 2) == 1)
+{
+  /* If odd number of registers are pushed, generate STR pattern to store
+ 

[RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch generates STRD instead of PUSH in the prologue for A15 ARM mode.

For optimize_size, the original prologue is generated for the A15.
The work involves defining new functions, predicates and patterns, along
with minor changes in existing code:
* STRD in ARM mode needs consecutive registers to be stored. Performance
degrades greatly if R3 is pushed for stack alignment, as that generates a
lone store for R3; using a SUB instruction for the stack adjustment is more
efficient instead. Hence, the condition in arm_get_frame_offsets () is
changed to disable the push of R3 when prefer_ldrd_strd is set in ARM mode.

In this patch we keep accumulating non-consecutive registers until a
register pair that can be pushed together is found. We then PUSH all the
accumulated registers first, followed by an STRD with pre-indexed stack
update for the register pair. This is repeated until all the registers in
the register list are pushed.
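
The accumulate-and-flush scheme can be sketched on a plain register bitmask. This is a simplification of the patch's `arm_emit_strd_push`: in ARM mode STRD needs an (even, even+1) register pair, so the scan collects lone registers into a pending PUSH mask and flushes it whenever such a pair is reached. The `emit_*` stand-ins and counters are illustrative, not the real RTL emitters.

```c
#define LAST_ARM_REGNUM 15

static int pushes, strds;   /* Counters standing in for emitted insns.  */
static void emit_push (unsigned long mask) { if (mask) pushes++; }
static void emit_strd (int lo)             { (void) lo; strds++; }

/* Walk the register mask from high to low.  Whenever registers (j-1, j)
   with odd j are both saved, flush the accumulated lone registers as one
   multi-reg PUSH, then emit an STRD with write-back for the pair.  */
static void
strd_push_plan (unsigned long saved_regs_mask)
{
  unsigned long pending = 0;
  pushes = strds = 0;
  for (int j = LAST_ARM_REGNUM; j >= 0; j--)
    if (saved_regs_mask & (1ul << j))
      {
        if ((j & 1) == 1 && (saved_regs_mask & (1ul << (j - 1))))
          {
            emit_push (pending);   /* Flush accumulated registers.  */
            pending = 0;
            emit_strd (j - 1);     /* STRD for the (j-1, j) pair.  */
            j--;                   /* Skip the pair's low register.  */
          }
        else
          pending |= 1ul << j;     /* Lone register: accumulate.  */
      }
  emit_push (pending);             /* Flush any trailing lone registers.  */
}
```

For a mask of {r4..r7, r10} this yields one PUSH (for r10) and two STRDs (r6/r7 and r4/r5).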

The patch is tested with check-gcc, check-gdb and bootstrap with no
regressions.

Changelog entry for the patch to emit STRD for the ARM prologue on the A15:

2011-10-11  Sameera Deshpande

* config/arm/arm-protos.h (bad_reg_pair_for_arm_ldrd_strd): New
  declaration.
* config/arm/arm.c (arm_emit_strd_push): New static function.
  (bad_reg_pair_for_arm_ldrd_strd): New helper function.
  (arm_expand_prologue): Update.
  (arm_get_frame_offsets): Update.
* config/arm/ldmstm.md (arm_strd_base): New pattern.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 06a67b5..d5287ad 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -162,6 +162,7 @@ extern const char *arm_output_memory_barrier (rtx *);
 extern const char *arm_output_sync_insn (rtx, rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
 extern int arm_attr_length_push_multi(rtx, rtx);
+extern bool bad_reg_pair_for_arm_ldrd_strd (rtx, rtx);
 
 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fd8c31d..08fa0d5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -93,6 +93,7 @@ static bool arm_assemble_integer (rtx, unsigned int, int);
 static void arm_print_operand (FILE *, rtx, int);
 static void arm_print_operand_address (FILE *, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
+static rtx emit_multi_reg_push (unsigned long);
 static const char *fp_const_from_val (REAL_VALUE_TYPE *);
 static arm_cc get_arm_condition_code (rtx);
 static HOST_WIDE_INT int_log2 (HOST_WIDE_INT);
@@ -15095,6 +15096,116 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
 }
 }
 
+/* STRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg PUSH for all accumulated
+   registers, and then generates STRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are stored on stack.
+   multi-reg PUSH takes care of lone registers as well.  */
+static void
+arm_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_pushed_mask;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i=0, j = LAST_ARM_REGNUM, regs_to_be_pushed_mask = 0; i < num_regs; j--)
+/* Var j iterates over all registers to gather all registers in
+   saved_regs_mask.  Var i is used to count number of registers stored on
+   stack.  regs_to_be_pushed_mask accumulates non-consecutive registers
+   that can be pushed using multi-reg PUSH before STRD is generated.  */
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+gcc_assert (j != PC_REGNUM);
+i++;
+
+if ((j % 2 == 1)
+&& (saved_regs_mask & (1 << (j - 1)))
+&& regs_to_be_pushed_mask)
+  {
+/* Current register and previous register form register pair for
+   which STRD can be generated.  Hence, emit PUSH for accumulated
+   registers and reset regs_to_be_pushed_mask.  */
+insn = emit_multi_reg_push (regs_to_be_pushed_mask);
+regs_to_be_pushed_mask = 0;
+RTX_FRAME_RELATED_P (insn) = 1;
+continue;
+  }
+
+regs_to_be_pushed_mask |= (1 << j);
+
+if ((j % 2) == 0 && (saved_regs_mask & (1 << (j + 1
+  {
+/* We have found 2 consecutive registers, for whi

[RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch generates LDRD instead of POP in the epilogue for A15 ARM mode.

For optimize_size, the original epilogue is generated for the A15.
The work involves defining new functions, predicates and patterns.

In this patch we keep accumulating non-consecutive registers until a
register pair that can be popped together is found. We then POP all the
accumulated registers first, followed by an LDRD with post-indexed stack
update for the register pair. This is repeated until all the registers in
the register list are popped.

The patch is tested with check-gcc, check-gdb and bootstrap with no
regressions.
 
Changelog entry for the patch to emit LDRD for the ARM epilogue on the A15:

2011-10-11  Sameera Deshpande

* config/arm/arm.c (arm_emit_ldrd_pop): New static function.
  (arm_expand_epilogue): Update.
* config/arm/ldmstm.md (arm_ldrd_base): New pattern.
  (arm_ldr_with_update): Likewise.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 08fa0d5..0b9fd93 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -967,7 +967,7 @@ const struct tune_params arm_cortex_a9_tune =
   ARM_PREFETCH_BENEFICIAL(4,32,32),
   false,	/* Prefer constant pool.  */
   arm_default_branch_cost,
-  false /* Prefer LDRD/STRD.  */
+  true  /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_fa726te_tune =
@@ -15664,6 +15664,145 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
   || (REGNO (src2) == SP_REGNUM));
 }
 
+/* LDRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg POP for all accumulated
+   registers, and then generates LDRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are loaded from
+   stack.  multi-reg POP takes care of lone registers as well.  However, LDRD
+   cannot be generated for PC, as results are unpredictable.  Hence, if PC is
+   in SAVED_REGS_MASK, generate multi-reg POP with RETURN or LDR with RETURN
+   depending upon number of registers in REGS_TO_BE_POPPED_MASK.  */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_popped_mask = 0;
+  bool pc_in_list = false;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = 0; i < num_regs; j++)
+if (saved_regs_mask & (1 << j))
+  {
+i++;
+if ((j % 2) == 0
+&& (saved_regs_mask & (1 << (j + 1)))
+&& (j + 1) != SP_REGNUM
+&& (j + 1) != PC_REGNUM
+&& regs_to_be_popped_mask)
+  {
+/* Current register and next register form register pair for which
+   LDRD can be generated.  Generate POP for accumulated registers
+   and reset regs_to_be_popped_mask.  SP should be handled here as
+   the results are unpredictable if register being stored is same
+   as index register (in this case, SP).  PC is always the last
+   register being popped.  Hence, we don't have to worry about PC
+   here.  */
+arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+pc_in_list = false;
+regs_to_be_popped_mask = 0;
+continue;
+  }
+
+if (j == PC_REGNUM)
+  {
+gcc_assert (really_return);
+pc_in_list = 1;
+  }
+
+regs_to_be_popped_mask |= (1 << j);
+
+if ((j % 2) == 1
+&& (saved_regs_mask & (1 << (j - 1)))
+&& j != SP_REGNUM
+&& j != PC_REGNUM)
+  {
+ /* Generate a LDRD for register pair R_, R_.  The pattern
+generated here is
+[(SET SP, (PLUS SP, 8))
+ (SET R_, (MEM SP))
+ (SET R_, (MEM (PLUS SP, 4)))].  */
+ par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+ dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (3));
+
+ tmp = gen_rtx_SET (VOIDmode,
+stack_pointer_rtx,
+plus_constant (stack_pointer_rtx, 8));
+ tmp1 = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+ plus_constant (stack_pointer_rtx, 8));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+

[wwwdocs] Add branch description for new branch unified-autovect

2016-07-08 Thread Sameera Deshpande
Hi!

I have created a new branch, unified-autovect, based on ToT.

Please find attached the patch adding information about the new branch
"unified-autovect" to the documentation.
Is it OK to commit?

- Thanks and regards,
  Sameera D.

unified-autovec-doc.patch
Description: unified-autovec-doc.patch


[unified-autovect: Patch 2/N] Implementation of k-arity promotion/reduction

2017-02-05 Thread Sameera Deshpande
oot) == 0)
+return root;
+
+  return NULL;
+}
+
+/* Function k_arity_promotion_reduction.
+
+   Driver function for promoting/reducing arity of the tree rooted at ROOT from
+   FROM_ARITY to TO_ARITY.  */
+
+struct primop_tree *
+k_arity_promotion_reduction (struct primop_tree *root, int to_arity)
+{
+  struct primop_tree *retval = root;
+  int from_arity, i;
+
+  if (dump_enabled_p ())
+{
+  dump_printf_loc (MSG_NOTE, vect_location,
+			"\n k_arity_promotion_reduction: ");
+  dump_primtree_node (MSG_NOTE, root);
+}
+
+  if (PT_NODE_OP (root) == POP_EXTR || PT_NODE_OP (root) == POP_SPLT)
+from_arity = PT_DIVISION (root);
+  else
+from_arity = PT_ARITY (root);
+
+  if (PT_NODE_OP (root) >= MAX_TREE_CODES && PT_NODE_OP (root) < POP_COLLAPSE)
+{
+  if (from_arity > to_arity)
+	{
+	  /* Arity reduction.  */
+	  if (from_arity % to_arity == 0)
+	{
+	  retval = k_arity_reduction (root, from_arity, to_arity);
+	  return retval;
+	}
+	  else
+	return NULL;
+	}
+  else if (from_arity < to_arity)
+	{
+	  /* Arity promotion.  */
+	  if (to_arity % from_arity == 0)
+	{
+	  retval = k_arity_promotion (root, from_arity, to_arity);
+	  return retval;
+	}
+	  else
+	return NULL;
+	}
+  else
+	{
+	  retval = duplicate_prim_node (root);
+	  //return retval;
+	}
+}
+
+  if (retval != NULL)
+{
+  /* The tree node is compute-node.  Hence, no action to be taken for arity
+	 promotion/reduction.  However, the subtrees below this root may need
+	 arity adjustment.  Hence, invoke k_arity_promotion_reduction algorithm
+	 recursively on children of root.  */
+  for (i = 0; i < retval->children.length (); i++)
+	{
+	  struct primop_tree *tmp;
+	  tmp = k_arity_promotion_reduction (PT_CHILD (retval, i), to_arity);
+	  if (tmp == NULL)
+	return NULL;
+
+	  PT_CHILD (retval, i) = tmp;
+	}
+
+  PT_ARITY (retval) = i;
+}
+
+  return retval;
+}
+
+#endif
Index: gcc/tree-vect-unified.c
===
--- gcc/tree-vect-unified.c	(revision 243687)
+++ gcc/tree-vect-unified.c	(working copy)
@@ -1,4 +1,10 @@
-/* lOOP Vectorization using unified representation
+/* Loop Vectorization using unified representation for permute instructions.
+   Copyright (C) 2003-2015 Free Software Foundation, Inc.
+   Contributed by Sameera Deshpande 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
 the terms of the GNU General Public License as published by the Free
 Software Foundation; either version 3, or (at your option) any later
 version.
@@ -102,6 +108,26 @@
 
 } // anon namespace
 
+#define DEFTREECODE(SYM, NAME, TYPE, LEN) NAME,
+#define END_OF_BASE_TREE_CODES "@dummy",
+
+static const char *const tree_code_name[] = {
+#include "all-tree.def"
+"ILV",
+"CONCAT",
+"EXTR",
+"SPLT",
+"COLLAPSE",
+"MEMREF",
+"CONST",
+"INVAR",
+"ITER"
+};
+
+#undef DEFTREECODE
+#undef END_OF_BASE_TREE_CODES
+
+
 gimple_opt_pass *
 make_pass_unified_vectorize (gcc::context *ctxt)
 {
@@ -108,6 +134,53 @@
   return new pass_unified_vectorize (ctxt);
 }
 
+
+vec<struct stmt_attr *> stmt_attr_vec;
+
+void
+init_stmt_attr_vec (void)
+{
+  gcc_assert (!stmt_attr_vec.exists ());
+  stmt_attr_vec.create (50);
+}
+
+void
+free_stmt_attr_vec (void)
+{
+  gcc_assert (stmt_attr_vec.exists ());
+  stmt_attr_vec.release ();
+}
+
+inline void
+set_stmt_attr (gimple *stmt, struct stmt_attr *info)
+{
+  unsigned int uid = gimple_uid (stmt);
+  if (uid == 0)
+{
+  gcc_checking_assert (info);
+  uid = stmt_attr_vec.length () + 1;
+  gimple_set_uid (stmt, uid);
+  stmt_attr_vec.safe_push (info);
+}
+  else
+{
+  gcc_checking_assert (info == NULL);
+  stmt_attr_vec[uid - 1] = info;
+}
+}
+
+inline struct stmt_attr *
+get_stmt_attr (gimple *stmt)
+{
+  unsigned int uid = gimple_uid (stmt);
+  if (uid == 0)
+return NULL;
+
+  return stmt_attr_vec[uid - 1];
+}
+
+
+
 /* Function new_iter_node.
 
Create new ITER_node for the loop LOOP, and return the pointer.  */
@@ -1498,7 +1571,7 @@
 */
 
 struct primop_tree *
-populate_prim_node (enum primop_code pcode, struct ITER_node *inode,
+populate_prim_node (enum primop_code pcode, tree iter_count,
 		struct primop_tree *parent, gimple *stmt)
 {
   struct primop_tree *ptree;
@@ -1506,7 +1579,7 @@
 
   PT_NODE_OP (ptree) = (int) pcode;
   PT_PARENT (ptree) = parent;
-  PT_ITER_COUNT (ptree) = ITER_NODE_NITERS (inode);
+  PT_ITER_COUNT (ptree) = iter_count;
 
   if (stmt)
 {
@@ -1566,11 +1639,11 @@
 
 struct primop_tree *
 create_primTree_memref (tree base, tree step, bool is_read, int num,
-struct ITER_node *inode, struct primop_tree *parent)
+tree iter_count, struct primop_tree *parent)
 {

[PATCH, MIPS] Calling convention differs depending on the presence of MSA

2017-02-08 Thread Sameera Deshpande
Hi Matthew,

Please find attached the patch to fix the calling convention issue, 
where the argument- and result-passing conventions differed between MSA 
and non-MSA variants.

The implementation of TARGET_RETURN_IN_MEMORY is altered to prevent V4SF values 
from being returned in registers.

Ok for trunk?

- Thanks and regards,
  Sameera D.


Changelog:
gcc/
* config/mips/mips.c (mips_return_in_memory): Prevent V4SFmode values from
being returned in registers.

gcc/testsuite/
* gcc.target/mips/msa-fp-cc.c: New testcase.

RE: [PATCH, MIPS] Calling convention differs depending on the presence of MSA

2017-02-11 Thread Sameera Deshpande
Hi Matthew,

Please find attached updated patch as per our offline discussion.

I have disabled return in registers for all vector float types, and updated the 
test case accordingly.

Ok for trunk?

- Thanks and regards,
  Sameera D.

From: Sameera Deshpande
Sent: 08 February 2017 14:10:52
To: Matthew Fortune
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH, MIPS] Calling convention differs depending on the presence of 
MSA

Hi Matthew,

Please find attached the patch to fix the calling convention issue,
where the argument- and result-passing conventions differed between MSA
and non-MSA variants.

The implementation of TARGET_RETURN_IN_MEMORY is altered to prevent V4SF values 
from being returned in registers.

Ok for trunk?

- Thanks and regards,
  Sameera D.


Changelog:
gcc/
* config/mips/mips.c (mips_return_in_memory): Prevent V4SFmode values from
being returned in registers.

gcc/testsuite/
* gcc.target/mips/msa-fp-cc.c: New testcase.

fix_calling_convention.patch
Description: fix_calling_convention.patch


[unified-autovect: Patch 1b/N] Instruction tile and grammar creation.

2017-04-06 Thread Sameera Deshpande
, 8, 8, (int[8]){4,12,5,13,6,14,7,15}, 1, "ILVL.H", "RRR", NULL, NULL}, \
+\
+  {2, 16, 16, (int[16]){0,16,2,18,4,20,6,22,8,24,10,26,12,28,14,30}, 1, \
+	 "ILVEV.Q", "RRR", NULL, NULL}, \
+  {2, 16, 16, (int[16]){1,17,3,19,5,21,7,23,9,25,11,27,13,29,15,31}, 1, \
+	 "ILVOD.Q", "RRR", NULL, NULL}, \
+  {2, 16, 16, (int[16]){0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30}, 1, \
+	 "PCKEV.Q", "RRR", NULL, NULL}, \
+  {2, 16, 16, (int[16]){1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31}, 1, \
+	 "PCKOD.Q", "RRR", NULL, NULL}, \
+  {2, 16, 16, (int[16]){8,24,9,25,10,26,11,27,12,28,13,29,14,30,15,31}, 1, \
+	 "ILVL.Q", "RRR", NULL, NULL}, \
+  {2, 16, 16, (int[16]){0,16,1,17,2,18,3,19,4,20,5,21,6,22,7,23}, 1, \
+	 "ILVR.Q", "RRR", NULL, NULL}, \
+}
+
 #define MAX_VECTOR_SIZE 16
Index: gcc/coretypes.h
===
--- gcc/coretypes.h	(revision 246613)
+++ gcc/coretypes.h	(working copy)
@@ -358,6 +358,8 @@
 typedef unsigned char uchar;
 #endif
 
+struct vec_perm_order_spec;
+
 /* Most host source files will require the following headers.  */
 #if !defined (GENERATOR_FILE) && !defined (USED_FOR_TARGET)
 #include "machmode.h"
Index: gcc/genvect-inst-tiles.c
===
--- gcc/genvect-inst-tiles.c	(revision 0)
+++ gcc/genvect-inst-tiles.c	(working copy)
@@ -0,0 +1,713 @@
+/* Loop Vectorization using unified representation for permute instructions.
+   Copyright (C) 2003-2015 Free Software Foundation, Inc.
+   Contributed by Sameera Deshpande 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define GENERATOR_FILE 1
+#include "bconfig.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "errors.h"
+#ifdef GENERATOR_FILE
+#include "machmode.h"
+#include "signop.h"
+#include "wide-int.h"
+#include "double-int.h"
+#include "real.h"
+#include "fixed-value.h"
+#include "statistics.h"
+#include "vec.h"
+#include "hash-table.h"
+#include "hash-set.h"
+#include "input.h"
+#include "is-a.h"
+#include "target.h"
+#endif
+
+#include "tree-core.h"
+#include "tree-vect-unified.h"
+//#include "tree-vect-unified-common.c"
+//#include "tree-vect-unified-opts.c"
+
+#define DEBUG 0
+int target_flags;
+
+struct vec_perm_order_spec target_spec[] = TARGET_VEC_PERM_CONST_ORDER;
+vec  rules;
+
+/* List of non-terminals used in grammar.  The index is used in the grammar rule
+   to point to appropriate non-terminal in the list.  For now, the non-terminal
+   is just list of strings with NT names.  However if needed, it can be updated
+   to hold additional information in the structure.  */
+vec non_terminals;
+
+/* List of terminals in Grammar.  Currently, we support only 3 categories in
+   terminals -
+   MEM, REG and CONST.  */
+vecterminals;
+
+/* Function create_placeholder.
+
+   Create a placeholder node with index IDX and placeholder type CH, having
+   parent PARENT.  */
+
+struct primop_tree *
+create_placeholder (int idx, char ch, struct primop_tree *parent)
+{
+  struct primop_tree *ptree;
+
+  ptree = populate_prim_node (POP_PH, NULL,
+			  parent, NULL);
+  PT_PH_IDX (ptree) = idx;
+  PT_PH_TYPE (ptree) = ch;
+  return ptree;
+}
+
+/* Function create_perm_order_tree.
+
+   For each element in TARGET_VEC_PERM_CONST_ORDER
+   Do
+ 1. Create ILV node with arity out_vec_size.
+ 2. For ith element in perm_order
+	Do
+	  1. Create EXTR node with parts = in_vec_size and selector = i % parts
+	  2. Create child of EXTR as PLACEHOLDER__, i / parts
+   	 should not exceed num_opd.  For k_arity_promotion_reduction and
+   	 unity_redundancy_elimination, PLACEHOLDER__ is used for
+   	 matching.  Whereas for grammar definition, only PLACEHOLDER_
+   	 is used for generating rules.
+	   Done
+   Done
+*/
+
+struct primop_tree *
+create_perm_order_tree (struct vec_perm_order_spec spec)
+{
+  int i, num;
+  struct primop_tree *ilv_node, *expr_node, *placeholder;
+
+  ilv_node = create_primTree_combine (POP_ILV, NULL,
+		spec.out_vec_size, NULL, NULL);
+
+  for (i = 0; i < spec.out_vec_size; i++)
+{
+  expr_nod

[PATCH][MIPS] Enable load-load/store-store bonding

2014-06-19 Thread Sameera Deshpande
Hi Richard,

Please find attached the patch implementing load-load/store-store bonding 
supported by P5600.

In P5600, two consecutive loads/stores of the same type which access contiguous 
memory locations are bonded together by the instruction issue unit, which 
dispatches a single load/store instruction accessing both locations. This allows 
up to 2X improvement in memory-intensive code. This optimization can be performed 
for LH, SH, LW, SW, LWC, SWC, LDC and SDC instructions.

This patch adds peephole2 patterns to identify such loads/stores and put them 
in parallel, so that the scheduler will not split them - thereby guaranteeing 
h/w-level load/store bonding.

The patch is tested with dejagnu for correctness.
Local performance testing on hardware is currently in progress.
Ok for trunk? 

Changelog:
gcc/
* config/mips/mips.md (JOINLDST1): New mode iterator.
(insn_type): New mode attribute.
(reg): Update mode attribute.
(join2_load_store): New pattern.
(join2_loadhi): Likewise.
(join2_storehi): Likewise.
(define_peephole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
load-load and store-store pairs.
* config/mips/mips.opt (mld-st-pairing): New option.
* config/mips/mips.c (mips_option_override): New exception.
* config/mips/mips.h (ENABLE_LD_ST_PAIRING): New macro.

- Thanks and regards,
   Sameera D.



load-store-pairing.patch
Description: load-store-pairing.patch


RE: [PATCH][MIPS] Enable load-load/store-store bonding

2014-06-23 Thread Sameera Deshpande
Hi Richard,

Thanks for your comments. I am working on the review comments, and will share 
the reworked patch soon.
However, here is a clarification of some of the issues raised.

> > +  if (TARGET_FIX_24K && TUNE_P5600)
> > +error ("unsupported combination: %s", "-mtune=p5600 -mfix-24k");
> > +
> >/* Save the base compression state and process flags as though we
> >   were generating uncompressed code.  */
> >mips_base_compression_flags = TARGET_COMPRESSION;
> 
> Although it's a bit of an odd combination, we need to accept -mfix-24k -
> mtune=p5600 and continue to implement the 24k workarounds.
> The idea is that a distributor can build for a common base architecture, add -
> mfix- options for processors that might run the code, and add -mtune= for
> the processor that's most of interest optimisation-wise.
> 
> We should just make the pairing of stores conditional on !TARGET_FIX_24K.
We had an offline discussion based on your comment. There is an additional view 
on the same.
Only the mips32r2, mips32r3 and mips32r5 ISAs support P5600; the remaining ISAs 
do not support P5600. 
For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is implemented 
separately, and hence, as you suggested, the P5600 LD-ST bonding optimization 
should not be enabled for them.
So, is it fine if I emit an error for any ISA other than mips32r2, mips32r3 and 
mips32r5 when P5600 is enabled, or should compilation continue by emitting a 
warning and disabling P5600?
Also, the optimization will be enabled only if !TARGET_FIX_24K && 
!TARGET_MICROMIPS, as suggested by you.

> > +
> > +#define ENABLE_LD_ST_PAIRING \
> > +  (TARGET_ENABLE_LD_ST_PAIRING && TUNE_P5600)
> 
> The patch requires -mld-st-pairing to be passed explicitly even for -
> mtune=p5600.  Is that because it's not a consistent enough win for us to
> enable it by default?  It sounded from the description like it should be an
> improvement more often that not.
> 
> We should allow pairing even without -mtune=p5600.
Performance testing for this patch is not yet done. 
If the patch proves beneficial in most of the testcases (which we believe it 
will on P5600), we will enable this optimization by default for P5600 - in 
which case this option can be removed.

> 
> Are QImodes not paired in the same way?  If so, it'd be worth adding a
> comment above the define_mode_iterator saying that QI is deliberately
> excluded.
The P5600 datasheet mentions bonding of loads/stores in HI, SI, SF and DF modes 
only. Hence QI mode is excluded. I will add a comment on the iterator.

- Thanks and regards,
   Sameera D.



RE: [PATCH][MIPS] Enable load-load/store-store bonding

2014-06-24 Thread Sameera Deshpande
Hi Richard,

Thanks for the review.
Please find attached updated patch after your review comments.

Changelog:
gcc/
* config/mips/mips.md (JOIN_MODE): New mode iterator.
(join2_load_store): New pattern.
(join2_loadhi): Likewise.
(define_peephole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
load-load and store-store pairs.
* config/mips/mips.opt (mload-store-pairs): New option.
(TARGET_LOAD_STORE_PAIRS): New macro.
* config/mips/mips.h (ENABLE_P5600_LD_ST_PAIRS): Likewise.
* config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
* config/mips/mips.c (mips_load_store_bonding_p): New function.

The change is tested with dejagnu with the additional options -mload-store-pairs 
and -mtune=p5600.
The performance measurement is yet to finish.

> > We had offline discussion based on your comment. There is additional
> > view on the same.
> > Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining
> > ISAs do not support P5600.
> > For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is
> > implemented separately, and hence, as you suggested, P5600 Ld-ST
> > bonding optimization should not be enabled for them.
> > So, is it fine if I emit error for any ISAs other than mips32r2,
> > mips32r3 and mips32r5 when P5600 is enabled, or the compilation should
> > continue by emitting warning and disabling P5600?
> 
> No, the point is that we have two separate concepts: ISA and optimisation
> target.  -mipsN and -march=N control the ISA (which instructions are
> available) and -mtune=M controls optimisation decisions within the
> constraints of that N, such as scheduling and the cost of things like
> multiplication and division.
> 
> E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS II-
> compatible code, optimise it for p5600, but make sure that 24k workarounds
> are used.  The code would run correctly on any MIPS II-compatible processor
> without known errata and also on the 24k.
Ok, I disabled the peephole pattern for fix-24k and micromips, to allow their 
specific patterns to be matched.

> > +
> > +mld-st-pairing
> > +Target Report Var(TARGET_ENABLE_LD_ST_PAIRING) Enable load/store
> > +pairing
> 
> Other options are just "TARGET_" + the captialised form of the option name,
> so I'd prefer TARGET_LD_ST_PAIRING instead.  Although "ld" might be
> misleading since it's an abbreviation for "load" rather than the LD 
> instruction.
> Maybe -mload-store-pairs, since plurals are more common than "-ing"?
> Not sure that's a great suggestion though.
Renamed the option and corresponding macro as suggested.

> > Performance testing for this patch is not yet done.
> > If the patch proves beneficial in most of the testcases (which we
> > believe will do on P5600) we will enable this optimization by default
> > for P5600 - in which case this option can be removed.
> 
> OK.  Sending the patch for comments before performance testing is fine, but
> I think it'd be better to commit the patch only after the testing is done, 
> since
> otherwise the patch might need to be tweaked.
> 
> I don't see any problem with keeping the option in case people want to
> experiment with it.  I just think the patch should only go in once it can be
> enabled by default for p5600.  I.e. the option would exist to turn off the
> pairing.
> 
> Not having the option is fine too of course.
Yes, after the performance analysis I will share the results, and then, 
depending upon the impact, the decision can be made - whether to enable the 
option by default or not - and then the patch will be submitted.

> We should allow pairing even without -mtune=p5600.
Load-store pairing is currently an attribute of P5600, so I have not enabled 
the pairing without -mtune=p5600. If need be, I can enable it without -mtune=p5600.

> 
> (define_mode_iterator JOIN_MODE [
>   SI
>   (DI "TARGET_64BIT")
>   (SF "TARGET_HARD_FLOAT")
>   (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")])
>
Done this change.
 
> and then extend:
> 
> > @@ -883,6 +884,8 @@
> >  (define_mode_attr loadx [(SF "lwxc1") (DF "ldxc1") (V2SF "ldxc1")])
> > (define_mode_attr storex [(SF "swxc1") (DF "sdxc1") (V2SF "sdxc1")])
> >
> > +(define_mode_attr insn_type [(SI "") (SF "fp") (DF "fp")])
> > +
> >  ;; The unextended ranges of the MIPS16 addiu and daddiu instructions
> > ;; are different.  Some forms of unextended addiu have an 8-bit
> > immediate  ;; field but the equivalent daddiu has only a 5-bit field.
> 
> this accordingly.
In order to allow d/f for both register classes, the pattern 
join2_load_store was altered a bit, which eliminated this mode attribute.

> 
> Outer (parallel ...)s are redundant in a define_insn.
Removed.

> 
> It would be better to add the mips_load_store_insns for each operand
> rather than multiplying one of them by 2.  Or see the next bit for an
> alternative.
Using the alternative method as you suggested, so this change is not needed.

> Please instead add HI to the define_mode_ite

Re: [Aarch64] Fix conditional branches with target far away.

2018-07-31 Thread Sameera Deshpande
On Mon 9 Apr, 2018, 2:06 PM Sameera Deshpande, 
wrote:

> Hi Richard,
>
> I do not see the said patch applied in ToT yet. When do you expect it
> to be available in ToT?
>
> - Thanks and regards,
>   Sameera D.
>
> On 30 March 2018 at 17:01, Sameera Deshpande
>  wrote:
> > Hi Richard,
> >
> > The testcase is working with the patch you suggested, thanks for
> > pointing that out.
> >
> > On 30 March 2018 at 16:54, Sameera Deshpande
> >  wrote:
> >> On 30 March 2018 at 16:39, Richard Sandiford
> >>  wrote:
> >>>> Hi Sudakshina,
> >>>>
> >>>> Thanks for pointing that out. Updated the conditions for attribute
> >>>> length to take care of boundary conditions for offset range.
> >>>>
> >>>> Please find attached the updated patch.
> >>>>
> >>>> I have tested it for gcc testsuite and the failing testcase. Ok for
> trunk?
> >>>>
> >>>> On 22 March 2018 at 19:06, Sudakshina Das  wrote:
> >>>>> Hi Sameera
> >>>>>
> >>>>> On 22/03/18 02:07, Sameera Deshpande wrote:
> >>>>>>
> >>>>>> Hi Sudakshina,
> >>>>>>
> >>>>>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
> >>>>>> far branch instruction offset is inclusive of both the offsets.
> Hence,
> >>>>>> I am using "<= || >=" and not "< || >=" as it was in the previous
> implementation.
> >>>>>
> >>>>>
> >>>>> I have to admit earlier I was only looking at the patch mechanically
> and
> >>>>> found a difference with the previous implementation in offset
> comparison.
> >>>>> After you pointed out, I looked up the ARMv8 ARM and I have a couple
> of
> >>>>> doubts:
> >>>>>
> >>>>> 1. My understanding is that any offset in [-1048576 ,1048572] both
> inclusive
> >>>>> qualifies as an 'in range' offset. However, the code for both
> attribute
> >>>>> length and far_branch has been using [-1048576 ,1048572), that is, (
> >= && <
> >>>>> ). If the far_branch was incorrectly calculated, then maybe the
> length
> >>>>> calculations with similar magic numbers should also be corrected? Of
> course,
> >>>>> I am not an expert in this and maybe this was a conscious decision
> so I
> >>>>> would ask Ramana to maybe clarify if he remembers.
> >>>>>
> >>>>> 2. Now to come back to your patch, if my understanding is correct, I
> think a
> >>>>> far_branch would be anything outside of this range, that is,
> >>>>> (offset < -1048576 || offset > 1048572), anything that can not be
> >>>>> represented in the 21-bit range.
> >>>>>
> >>>>> Thanks
> >>>>> Sudi
> >>>
> >>> [...]
> >>>
> >>>> @@ -466,14 +459,9 @@
> >>>>[(set_attr "type" "branch")
> >>>> (set (attr "length")
> >>>>   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int
> -1048576))
> >>>> -(lt (minus (match_dup 2) (pc)) (const_int
> 1048572)))
> >>>> +(le (minus (match_dup 2) (pc)) (const_int
> 1048572)))
> >>>> (const_int 4)
> >>>> -   (const_int 8)))
> >>>
> >>> Sorry for not replying earlier, but I think the use of "lt" rather than
> >>> "le" in the current length attribute is deliberate.  Distances measured
> >>> from (pc) in "length" are a bit special in that backward distances are
> >>> measured from the start of the instruction and forward distances are
> >>> measured from the end of the instruction:
> >>>
> >>>   /* The address of the current insn.  We implement this actually
> as the
> >>>  address of the current insn for backward branches, but the
> last
> >>>  address of the next insn for forward branches, and both with
> >>>  adjustments that account for the worst-case possible
> stretching of
> >>>  intervening alignments between this insn and its
> destination.  */
> >>>
> >>> This avoids the c

[AARCH64]Bug in fix for branch offsets over 1 MiB?

2018-01-20 Thread Sameera Deshpande
Hi!

I am seeing multiple assembler errors with error message "Error:
conditional branch out of range" for customer code.

The root cause of the bug is that conditional branches are generated
whose branch target ends up being too far away to be encoded in the
instruction.  It appears that there was an attempt to fix this issue
in the below change:

commit 050af05b9761f1979f11c151519e7244d5becd7c
Author: thopre01 
Date:   Thu Aug 27 10:08:54 2015 +

2015-08-27  Ramana Radhakrishnan
<[ramana.radhakrish...@arm.com|mailto:ramana.radhakrish...@arm.com]>
Andre Vieira
<[andre.simoesdiasvie...@arm.com|mailto:andre.simoesdiasvie...@arm.com]>

gcc/
* config/aarch64/[aarch64.md|http://aarch64.md/] (*condjump):
Handle functions > 1 MiB.
(*cb1): Likewise.
(*tb1): Likewise.
(*cb1): Likewise.
* config/aarch64/[iterators.md|http://iterators.md/] (inv_cb):
New code attribute.
(inv_tb): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
* config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.

gcc/testsuite/
* gcc.target/aarch64/long_branch_1.c: New test.

However, as per the GCC internals documentation, only the special attribute
"length" should use PC and match_dup while defining an attribute. I
verified this by looking at the code in the final pass, and realised that
get_attr_length does not map directly to the functions generated
from the definition of the length attribute in RTL patterns, but computes
the lengths in shorten_branches and uses insn_current_length as an
intermediate function.

The far_branch attribute, defined similarly to the length attribute, expects
(minus (match_dup 2) (pc)) to return the same values, which is
incorrect.

I am looking at the TARGET_MACHINE_DEPENDENT_REORG macro instead, like a few
other architectures, to emit far branches.

Is that approach acceptable?

PS: I am waiting for customer's approval for attaching the test case.

-- 
- Thanks and regards,
  Sameera D.


Re: [AARCH64]Bug in fix for branch offsets over 1 MiB?

2018-01-29 Thread Sameera Deshpande
On 30-Jan-2018 2:37 AM, "Richard Sandiford" 
wrote:

Sameera Deshpande  writes:
> Hi!
>
> I am seeing multiple assembler errors with error message "Error:
> conditional branch out of range" for customer code.
>
> The root cause of the bug is that conditional branches are generated
> whose branch target ends up being too far away to be encoded in the
> instruction.  It appears that there was an attempt to fix this issue
> in the below change:
>
> commit 050af05b9761f1979f11c151519e7244d5becd7c
> Author: thopre01 
> Date:   Thu Aug 27 10:08:54 2015 +
>
> 2015-08-27  Ramana Radhakrishnan
> <[ramana.radhakrish...@arm.com|mailto:ramana.radhakrish...@arm.com]>
> Andre Vieira
> <[andre.simoesdiasvie...@arm.com|mailto:andre.simoesdiasvie...@arm.com]>
>
> gcc/
> * config/aarch64/[aarch64.md|http://aarch64.md/] (*condjump):
> Handle functions > 1 MiB.
> (*cb1): Likewise.
> (*tb1): Likewise.
> (*cb1): Likewise.
> * config/aarch64/[iterators.md|http://iterators.md/] (inv_cb):
> New code attribute.
> (inv_tb): Likewise.
> * config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
> * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.
>
> gcc/testsuite/
> * gcc.target/aarch64/long_branch_1.c: New test.
>
> However, as per GCC Internal documentation, only special attribute
> "length" should use PC and match_dup while defining an attribute. I
> verified by looking at code in final pass, and realised that
> get_attribute_length does not map directly to the functions generated
> from the definition of attribute length in RTL patterns, but computes
> the lengths in shorten_branches and uses insn_current_length as
> intermediate function.
>
> The far_branch attribute defined similar to attribute length expects
> same values to be returned by (minus (match_dup 2) (pc)) which is
> incorrect.
>
> I am looking at TARGET_MACHINE_DEPENDENT_REORG macro instead like few
> other architectures, to emit far branches.
>
> Is that approach acceptable?

I don't think we need to go that far.  The INSN_ADDRESSES should be
correct when outputting the instructions, so does it work if we use
those instead of get_attr_far_branch?

Thanks,
Richard

> PS: I am waiting for customer's approval for attaching the test case.


Hi Richard,

Thanks for your reply. I will try using INSN_ADDRESSES and will get back to
you.

- Thanks and regards,
  Sameera D.


Re: [AARCH64]Bug in fix for branch offsets over 1 MiB?

2018-01-29 Thread Sameera Deshpande
On 30 January 2018 at 09:28, Sameera Deshpande
 wrote:
> On 30-Jan-2018 2:37 AM, "Richard Sandiford" 
> wrote:
>
> Sameera Deshpande  writes:
>> Hi!
>>
>> I am seeing multiple assembler errors with error message "Error:
>> conditional branch out of range" for customer code.
>>
>> The root cause of the bug is that conditional branches are generated
>> whose branch target ends up being too far away to be encoded in the
>> instruction.  It appears that there was an attempt to fix this issue
>> in the below change:
>>
>> commit 050af05b9761f1979f11c151519e7244d5becd7c
>> Author: thopre01 
>> Date:   Thu Aug 27 10:08:54 2015 +
>>
>> 2015-08-27  Ramana Radhakrishnan
>> <[ramana.radhakrish...@arm.com|mailto:ramana.radhakrish...@arm.com]>
>> Andre Vieira
>> <[andre.simoesdiasvie...@arm.com|mailto:andre.simoesdiasvie...@arm.com]>
>>
>> gcc/
>> * config/aarch64/[aarch64.md|http://aarch64.md/] (*condjump):
>> Handle functions > 1 MiB.
>> (*cb1): Likewise.
>> (*tb1): Likewise.
>> (*cb1): Likewise.
>> * config/aarch64/[iterators.md|http://iterators.md/] (inv_cb):
>> New code attribute.
>> (inv_tb): Likewise.
>> * config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
>> * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.
>>
>> gcc/testsuite/
>> * gcc.target/aarch64/long_branch_1.c: New test.
>>
>> However, as per GCC Internal documentation, only special attribute
>> "length" should use PC and match_dup while defining an attribute. I
>> verified by looking at code in final pass, and realised that
>> get_attribute_length does not map directly to the functions generated
>> from the definition of attribute length in RTL patterns, but computes
>> the lengths in shorten_branches and uses insn_current_length as
>> intermediate function.
>>
>> The far_branch attribute defined similar to attribute length expects
>> same values to be returned by (minus (match_dup 2) (pc)) which is
>> incorrect.
>>
>> I am looking at TARGET_MACHINE_DEPENDENT_REORG macro instead like few
>> other architectures, to emit far branches.
>>
>> Is that approach acceptable?
>
> I don't think we need to go that far.  The INSN_ADDRESSES should be
> correct when outputting the instructions, so does it work if we use
> those instead of get_attr_far_branch?
>
> Thanks,
> Richard
>
>> PS: I am waiting for customer's approval for attaching the test case.
>
>
> Hi Richard,
>
> Thanks for your reply. I will try using INSN_ADDRESSES and will get back to
> you.
>
> - Thanks and regards,
>   Sameera D.
>

Hi Richard,

I verified that it works. Thanks a lot! Will do the testing, and
update the patch.

-- 
- Thanks and regards,
  Sameera D.


[Aarch64] Fix conditional branches with target far away.

2018-02-14 Thread Sameera Deshpande
Hi!

Please find attached the patch to fix the bug in branches with offsets over 1 MiB.
There has been an attempt to fix this issue in commit
050af05b9761f1979f11c151519e7244d5becd7c.

However, the far_branch attribute defined in the above patch used
insn_length, which computes an incorrect offset. Hence, I eliminated the
attribute completely and computed the offset from INSN_ADDRESSES
instead.

Ok for trunk?

gcc/Changelog

2018-02-13 Sameera Deshpande 
* config/aarch64/aarch64.md (far_branch): Remove attribute. Eliminate
all the dependencies on the attribute from RTL patterns.

-- 
- Thanks and regards,
  Sameera D.
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md	(revision 257620)
+++ gcc/config/aarch64/aarch64.md	(working copy)
@@ -244,13 +244,6 @@
 	(const_string "no")
 	] (const_string "yes")))
 
-;; Attribute that specifies whether we are dealing with a branch to a
-;; label that is far away, i.e. further away than the maximum/minimum
-;; representable in a signed 21-bits number.
-;; 0 :=: no
-;; 1 :=: yes
-(define_attr "far_branch" "" (const_int 0))
-
 ;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has
 ;; no predicated insns.
 (define_attr "predicated" "yes,no" (const_string "no"))
@@ -448,12 +441,7 @@
 	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
 			   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
 		  (const_int 4)
-		  (const_int 8)))
-   (set (attr "far_branch")
-	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-			   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
-		  (const_int 0)
-		  (const_int 1)))]
+		  (const_int 8)))]
 )
 
 ;; For a 24-bit immediate CST we can optimize the compare for equality
@@ -670,12 +658,7 @@
 	(if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
 			   (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
 		  (const_int 4)
-		  (const_int 8)))
-   (set (attr "far_branch")
-	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-			   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
-		  (const_int 0)
-		  (const_int 1)))]
+		  (const_int 8)))]
 )
 
 (define_insn "*tb1"
@@ -692,7 +675,11 @@
   {
 if (get_attr_length (insn) == 8)
   {
-	if (get_attr_far_branch (insn) == 1)
+	long long int offset;
+	offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
+		  - INSN_ADDRESSES (INSN_UID (insn));
+
+	if (offset <= -1048576 || offset >= 1048572)
 	  return aarch64_gen_far_branch (operands, 2, "Ltb",
 	 "\\t%0, %1, ");
 	else
@@ -709,12 +696,7 @@
 	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -32768))
 			   (lt (minus (match_dup 2) (pc)) (const_int 32764)))
 		  (const_int 4)
-		  (const_int 8)))
-   (set (attr "far_branch")
-	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-			   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
-		  (const_int 0)
-		  (const_int 1)))]
+		  (const_int 8)))]
 
 )
 
@@ -727,8 +709,12 @@
   ""
   {
 if (get_attr_length (insn) == 8)
-  {
-	if (get_attr_far_branch (insn) == 1)
+   {
+long long int offset;
+offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[1], 0)))
+		 - INSN_ADDRESSES (INSN_UID (insn));
+
+	if (offset <= -1048576 || offset >= 1048572)
 	  return aarch64_gen_far_branch (operands, 1, "Ltb",
 	 "\\t%0, , ");
 	else
@@ -740,7 +726,7 @@
 	output_asm_insn (buf, operands);
 	return "\t%l1";
 	  }
-  }
+   }
 else
   return "\t%0, , %l1";
   }
@@ -749,12 +735,7 @@
 	(if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -32768))
 			   (lt (minus (match_dup 1) (pc)) (const_int 32764)))
 		  (const_int 4)
-		  (const_int 8)))
-   (set (attr "far_branch")
-	(if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
-			   (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
-		  (const_int 0)
-		  (const_int 1)))]
+		  (const_int 8)))]
 )
 
 ;; ---


Re: [Aarch64] Fix conditional branches with target far away.

2018-02-27 Thread Sameera Deshpande
On 14 February 2018 at 14:00, Sameera Deshpande
 wrote:
> Hi!
>
> Please find attached the patch to fix bug in branches with offsets over 1MiB.
> There has been an attempt to fix this issue in commit
> 050af05b9761f1979f11c151519e7244d5becd7c
>
> However, the far_branch attribute defined in above patch used
> insn_length - which computes incorrect offset. Hence, eliminated the
> attribute completely, and computed the offset from insn_addresses
> instead.
>
> Ok for trunk?
>
> gcc/Changelog
>
> 2018-02-13 Sameera Deshpande 
> * config/aarch64/aarch64.md (far_branch): Remove attribute. Eliminate
> all the dependencies on the attribute from RTL patterns.
>
> --
> - Thanks and regards,
>   Sameera D.


Gentle reminder!

-- 
- Thanks and regards,
  Sameera D.


Re: [Aarch64] Fix conditional branches with target far away.

2018-02-28 Thread Sameera Deshpande
On 27 February 2018 at 18:25, Ramana Radhakrishnan
 wrote:
> On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
>  wrote:
>> Hi!
>>
>> Please find attached the patch to fix bug in branches with offsets over 1MiB.
>> There has been an attempt to fix this issue in commit
>> 050af05b9761f1979f11c151519e7244d5becd7c
>>
>> However, the far_branch attribute defined in above patch used
>> insn_length - which computes incorrect offset. Hence, eliminated the
>> attribute completely, and computed the offset from insn_addresses
>> instead.
>>
>> Ok for trunk?
>>
>> gcc/Changelog
>>
>> 2018-02-13 Sameera Deshpande 
>> * config/aarch64/aarch64.md (far_branch): Remove attribute. Eliminate
>> all the dependencies on the attribute from RTL patterns.
>>
>
> I'm not a maintainer but this looks good to me modulo notes about how
> this was tested. What would be nice is a testcase for the testsuite as
> well as ensuring that the patch has been bootstrapped and regression
> tested. AFAIR, the original patch was put in because match.pd failed
> while bootstrapping in another context.
>
>
> regards
> Ramana
>
>> --
>> - Thanks and regards,
>>   Sameera D.

The patch has been tested with the GCC testsuite and bootstraps
successfully. It has also been tested with the SPEC benchmarks.

-- 
- Thanks and regards,
  Sameera D.


Re: [Aarch64] Fix conditional branches with target far away.

2018-03-15 Thread Sameera Deshpande
Ping!

On 28 February 2018 at 16:18, Sameera Deshpande
 wrote:
> On 27 February 2018 at 18:25, Ramana Radhakrishnan
>  wrote:
>> On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
>>  wrote:
>>> Hi!
>>>
>>> Please find attached the patch to fix bug in branches with offsets over 
>>> 1MiB.
>>> There has been an attempt to fix this issue in commit
>>> 050af05b9761f1979f11c151519e7244d5becd7c
>>>
>>> However, the far_branch attribute defined in above patch used
>>> insn_length - which computes incorrect offset. Hence, eliminated the
>>> attribute completely, and computed the offset from insn_addresses
>>> instead.
>>>
>>> Ok for trunk?
>>>
>>> gcc/Changelog
>>>
>>> 2018-02-13 Sameera Deshpande 
>>> * config/aarch64/aarch64.md (far_branch): Remove attribute. 
>>> Eliminate
>>> all the dependencies on the attribute from RTL patterns.
>>>
>>
>> I'm not a maintainer but this looks good to me modulo notes about how
>> this was tested. What would be nice is a testcase for the testsuite as
>> well as ensuring that the patch has been bootstrapped and regression
>> tested. AFAIR, the original patch was put in because match.pd failed
>> when bootstrap in another context.
>>
>>
>> regards
>> Ramana
>>
>>> --
>>> - Thanks and regards,
>>>   Sameera D.
>
> The patch is tested with GCC testsuite and bootstrapping successfully.
> Also tested for spec benchmark.
>
> --
> - Thanks and regards,
>   Sameera D.



-- 
- Thanks and regards,
  Sameera D.


Re: [Aarch64] Fix conditional branches with target far away.

2018-03-21 Thread Sameera Deshpande
Hi Sudakshina,

As per the ARMv8 ARM, the far-branch offset range [-1048576, 1048572]
is inclusive of both bounds. Hence, I am using (<= || >=) and not
(< || >=) as in the previous implementation.
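
The same inclusive bounds can be tested with a single unsigned
comparison, in the style of GCC's IN_RANGE macro (the macro below is a
re-derivation for illustration, not copied from GCC's system.h):

```c
#include <assert.h>

/* Tests lower <= value <= upper with one unsigned comparison: when
   value < lower, the subtraction wraps around to a huge unsigned
   number and the comparison fails.  */
#define IN_RANGE(value, lower, upper)                                 \
  ((unsigned long long) ((long long) (value) - (long long) (lower))   \
   <= (unsigned long long) ((long long) (upper) - (long long) (lower)))
```

With hex constants the branch-range test then reads
`IN_RANGE (offset, -0x100000, 0xffffc)`.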

On 16 March 2018 at 00:51, Sudakshina Das  wrote:
> On 15/03/18 15:27, Sameera Deshpande wrote:
>>
>> Ping!
>>
>> On 28 February 2018 at 16:18, Sameera Deshpande
>>  wrote:
>>>
>>> On 27 February 2018 at 18:25, Ramana Radhakrishnan
>>>  wrote:
>>>>
>>>> On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
>>>>  wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> Please find attached the patch to fix bug in branches with offsets over
>>>>> 1MiB.
>>>>> There has been an attempt to fix this issue in commit
>>>>> 050af05b9761f1979f11c151519e7244d5becd7c
>>>>>
>>>>> However, the far_branch attribute defined in above patch used
>>>>> insn_length - which computes incorrect offset. Hence, eliminated the
>>>>> attribute completely, and computed the offset from insn_addresses
>>>>> instead.
>>>>>
>>>>> Ok for trunk?
>>>>>
>>>>> gcc/Changelog
>>>>>
>>>>> 2018-02-13 Sameera Deshpande 
>>>>>  * config/aarch64/aarch64.md (far_branch): Remove attribute.
>>>>> Eliminate
>>>>>  all the dependencies on the attribute from RTL patterns.
>>>>>
>>>>
>>>> I'm not a maintainer but this looks good to me modulo notes about how
>>>> this was tested. What would be nice is a testcase for the testsuite as
>>>> well as ensuring that the patch has been bootstrapped and regression
>>>> tested. AFAIR, the original patch was put in because match.pd failed
>>>> when bootstrap in another context.
>>>>
>>>>
>>>> regards
>>>> Ramana
>>>>
>>>>> --
>>>>> - Thanks and regards,
>>>>>Sameera D.
>>>
>>>
>>> The patch is tested with GCC testsuite and bootstrapping successfully.
>>> Also tested for spec benchmark.
>>>
>
> I am not a maintainer either. I noticed that the range check you do for
> the offset has a (<= || >=). The "far_branch" however did (< || >=) for a
> positive value. Was that also part of the incorrect offset calculation?
>
> @@ -692,7 +675,11 @@
> {
>   if (get_attr_length (insn) == 8)
> {
> -   if (get_attr_far_branch (insn) == 1)
> +   long long int offset;
> +   offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
> + - INSN_ADDRESSES (INSN_UID (insn));
> +
> +   if (offset <= -1048576 || offset >= 1048572)
>return aarch64_gen_far_branch (operands, 2, "Ltb",
>   "\\t%0, %1, ");
>  else
> @@ -709,12 +696,7 @@
>  (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int
> -32768))
> (lt (minus (match_dup 2) (pc)) (const_int
> 32764)))
>(const_int 4)
> - (const_int 8)))
> -   (set (attr "far_branch")
> -   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int
> -1048576))
> -  (lt (minus (match_dup 2) (pc)) (const_int
> 1048572)))
> - (const_int 0)
> - (const_int 1)))]
> + (const_int 8)))]
>
>   )
>
> Thanks
> Sudi
>
>>> --
>>> - Thanks and regards,
>>>Sameera D.
>>
>>
>>
>>
>



-- 
- Thanks and regards,
  Sameera D.


Re: [Aarch64] Fix conditional branches with target far away.

2018-03-29 Thread Sameera Deshpande
Hi Sudakshina,

Thanks for pointing that out. I have updated the conditions of the
length attribute to handle the boundary conditions of the offset range.

Please find attached the updated patch.

I have tested it for gcc testsuite and the failing testcase. Ok for trunk?

On 22 March 2018 at 19:06, Sudakshina Das  wrote:
> Hi Sameera
>
> On 22/03/18 02:07, Sameera Deshpande wrote:
>>
>> Hi Sudakshina,
>>
>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
>> far branch instruction offset is inclusive of both the offsets. Hence,
>> I am using <=||=> and not <||>= as it was in previous implementation.
>
>
> I have to admit earlier I was only looking at the patch mechanically and
> found a difference with the previous implementation in offset comparison.
> After you pointed out, I looked up the ARMv8 ARM and I have a couple of
> doubts:
>
> 1. My understanding is that any offset in [-1048576 ,1048572] both inclusive
> qualifies as an 'in range' offset. However, the code for both attribute
> length and far_branch has been using [-1048576 ,1048572), that is, ( >= && <
> ). If the far_branch was incorrectly calculated, then maybe the length
> calculations with similar magic numbers should also be corrected? Of course,
> I am not an expert in this and maybe this was a conscious decision so I
> would ask Ramana to maybe clarify if he remembers.
>
> 2. Now to come back to your patch, if my understanding is correct, I think a
> far_branch would be anything outside of this range, that is,
> (offset < -1048576 || offset > 1048572), anything that can not be
> represented in the 21-bit range.
>
> Thanks
> Sudi
>
>
>>
>> On 16 March 2018 at 00:51, Sudakshina Das  wrote:
>>>
>>> On 15/03/18 15:27, Sameera Deshpande wrote:
>>>>
>>>>
>>>> Ping!
>>>>
>>>> On 28 February 2018 at 16:18, Sameera Deshpande
>>>>  wrote:
>>>>>
>>>>>
>>>>> On 27 February 2018 at 18:25, Ramana Radhakrishnan
>>>>>  wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
>>>>>>  wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> Please find attached the patch to fix bug in branches with offsets
>>>>>>> over
>>>>>>> 1MiB.
>>>>>>> There has been an attempt to fix this issue in commit
>>>>>>> 050af05b9761f1979f11c151519e7244d5becd7c
>>>>>>>
>>>>>>> However, the far_branch attribute defined in above patch used
>>>>>>> insn_length - which computes incorrect offset. Hence, eliminated the
>>>>>>> attribute completely, and computed the offset from insn_addresses
>>>>>>> instead.
>>>>>>>
>>>>>>> Ok for trunk?
>>>>>>>
>>>>>>> gcc/Changelog
>>>>>>>
>>>>>>> 2018-02-13 Sameera Deshpande 
>>>>>>>   * config/aarch64/aarch64.md (far_branch): Remove attribute.
>>>>>>> Eliminate
>>>>>>>   all the dependencies on the attribute from RTL patterns.
>>>>>>>
>>>>>>
>>>>>> I'm not a maintainer but this looks good to me modulo notes about how
>>>>>> this was tested. What would be nice is a testcase for the testsuite as
>>>>>> well as ensuring that the patch has been bootstrapped and regression
>>>>>> tested. AFAIR, the original patch was put in because match.pd failed
>>>>>> when bootstrap in another context.
>>>>>>
>>>>>>
>>>>>> regards
>>>>>> Ramana
>>>>>>
>>>>>>> --
>>>>>>> - Thanks and regards,
>>>>>>> Sameera D.
>>>>>
>>>>>
>>>>>
>>>>> The patch is tested with GCC testsuite and bootstrapping successfully.
>>>>> Also tested for spec benchmark.
>>>>>
>>>
>>> I am not a maintainer either. I noticed that the range check you do for
>>> the offset has a (<= || >=). The "far_branch" however did (< || >=) for a
>>> positive value. Was that also part of the incorrect offset calculation?
>>>
>>> @@ -692,7 +675,11 @@
>&

Re: [Aarch64] Fix conditional branches with target far away.

2018-03-29 Thread Sameera Deshpande
Hi Sudakshina,

That testcase cannot be added as of now, as it needs approval from the client.

On Thu 29 Mar, 2018, 9:01 PM Sudakshina Das,  wrote:

> Hi Sameera
>
> On 29/03/18 11:44, Sameera Deshpande wrote:
> > Hi Sudakshina,
> >
> > Thanks for pointing that out. Updated the conditions for attribute
> > length to take care of boundary conditions for offset range.
> >
> > Please find attached the updated patch.
> >
> > I have tested it for gcc testsuite and the failing testcase. Ok for
> trunk?
>
> Thank you so much for fixing the length as well along with you patch.
> You mention a failing testcase? Maybe it would be helpful to add that
> to the patch for the gcc testsuite.
>
> Sudi
>
> >
> > On 22 March 2018 at 19:06, Sudakshina Das  wrote:
> >> Hi Sameera
> >>
> >> On 22/03/18 02:07, Sameera Deshpande wrote:
> >>>
> >>> Hi Sudakshina,
> >>>
> >>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
> >>> far branch instruction offset is inclusive of both the offsets. Hence,
> >>> I am using <=||=> and not <||>= as it was in previous implementation.
> >>
> >>
> >> I have to admit earlier I was only looking at the patch mechanically and
> >> found a difference with the previous implementation in offset
> comparison.
> >> After you pointed out, I looked up the ARMv8 ARM and I have a couple of
> >> doubts:
> >>
> >> 1. My understanding is that any offset in [-1048576 ,1048572] both
> inclusive
> >> qualifies as an 'in range' offset. However, the code for both attribute
> >> length and far_branch has been using [-1048576 ,1048572), that is, ( >=
> && <
> >> ). If the far_branch was incorrectly calculated, then maybe the length
> >> calculations with similar magic numbers should also be corrected? Of
> course,
> >> I am not an expert in this and maybe this was a conscience decision so I
> >> would ask Ramana to maybe clarify if he remembers.
> >>
> >> 2. Now to come back to your patch, if my understanding is correct, I
> think a
> >> far_branch would be anything outside of this range, that is,
> >> (offset < -1048576 || offset > 1048572), anything that can not be
> >> represented in the 21-bit range.
> >>
> >> Thanks
> >> Sudi
> >>
> >>
> >>>
> >>> On 16 March 2018 at 00:51, Sudakshina Das  wrote:
> >>>>
> >>>> On 15/03/18 15:27, Sameera Deshpande wrote:
> >>>>>
> >>>>>
> >>>>> Ping!
> >>>>>
> >>>>> On 28 February 2018 at 16:18, Sameera Deshpande
> >>>>>  wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 27 February 2018 at 18:25, Ramana Radhakrishnan
> >>>>>>  wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi!
> >>>>>>>>
> >>>>>>>> Please find attached the patch to fix bug in branches with offsets
> >>>>>>>> over
> >>>>>>>> 1MiB.
> >>>>>>>> There has been an attempt to fix this issue in commit
> >>>>>>>> 050af05b9761f1979f11c151519e7244d5becd7c
> >>>>>>>>
> >>>>>>>> However, the far_branch attribute defined in above patch used
> >>>>>>>> insn_length - which computes incorrect offset. Hence, eliminated
> the
> >>>>>>>> attribute completely, and computed the offset from insn_addresses
> >>>>>>>> instead.
> >>>>>>>>
> >>>>>>>> Ok for trunk?
> >>>>>>>>
> >>>>>>>> gcc/Changelog
> >>>>>>>>
> >>>>>>>> 2018-02-13 Sameera Deshpande 
> >>>>>>>>* config/aarch64/aarch64.md (far_branch): Remove
> attribute.
> >>>>>>>> Eliminate
> >>>>>>>>all the dependencies on the attribute from RTL
> patterns.
> >>>>>>>>
> >>>>>>>
> >>>&

Re: [Aarch64] Fix conditional branches with target far away.

2018-03-30 Thread Sameera Deshpande
On 30 March 2018 at 16:39, Richard Sandiford
 wrote:
>> Hi Sudakshina,
>>
>> Thanks for pointing that out. Updated the conditions for attribute
>> length to take care of boundary conditions for offset range.
>>
>> Please find attached the updated patch.
>>
>> I have tested it for gcc testsuite and the failing testcase. Ok for trunk?
>>
>> On 22 March 2018 at 19:06, Sudakshina Das  wrote:
>>> Hi Sameera
>>>
>>> On 22/03/18 02:07, Sameera Deshpande wrote:
>>>>
>>>> Hi Sudakshina,
>>>>
>>>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
>>>> far branch instruction offset is inclusive of both the offsets. Hence,
>>>> I am using <=||=> and not <||>= as it was in previous implementation.
>>>
>>>
>>> I have to admit earlier I was only looking at the patch mechanically and
>>> found a difference with the previous implementation in offset comparison.
>>> After you pointed out, I looked up the ARMv8 ARM and I have a couple of
>>> doubts:
>>>
>>> 1. My understanding is that any offset in [-1048576 ,1048572] both inclusive
>>> qualifies as an 'in range' offset. However, the code for both attribute
>>> length and far_branch has been using [-1048576 ,1048572), that is, ( >= && <
>>> ). If the far_branch was incorrectly calculated, then maybe the length
>>> calculations with similar magic numbers should also be corrected? Of course,
>>> I am not an expert in this and maybe this was a conscience decision so I
>>> would ask Ramana to maybe clarify if he remembers.
>>>
>>> 2. Now to come back to your patch, if my understanding is correct, I think a
>>> far_branch would be anything outside of this range, that is,
>>> (offset < -1048576 || offset > 1048572), anything that can not be
>>> represented in the 21-bit range.
>>>
>>> Thanks
>>> Sudi
>
> [...]
>
>> @@ -466,14 +459,9 @@
>>[(set_attr "type" "branch")
>> (set (attr "length")
>>   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
>> -(lt (minus (match_dup 2) (pc)) (const_int 1048572)))
>> +(le (minus (match_dup 2) (pc)) (const_int 1048572)))
>> (const_int 4)
>> -   (const_int 8)))
>
> Sorry for not replying earlier, but I think the use of "lt" rather than
> "le" in the current length attribute is deliberate.  Distances measured
> from (pc) in "length" are a bit special in that backward distances are
> measured from the start of the instruction and forward distances are
> measured from the end of the instruction:
>
>   /* The address of the current insn.  We implement this actually as the
>  address of the current insn for backward branches, but the last
>  address of the next insn for forward branches, and both with
>  adjustments that account for the worst-case possible stretching of
>  intervening alignments between this insn and its destination.  */
>
> This avoids the chicken-and-egg situation of the length of the branch
> depending on the forward distance and the forward distance depending
> on the length of the branch.
>
> In contrast, this code:
>
>> @@ -712,7 +695,11 @@
>>{
>>  if (get_attr_length (insn) == 8)
>>{
>> - if (get_attr_far_branch (insn) == 1)
>> + long long int offset;
>> + offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
>> +   - INSN_ADDRESSES (INSN_UID (insn));
>> +
>> + if (offset < -1048576 || offset > 1048572)
>> return aarch64_gen_far_branch (operands, 2, "Ltb",
>>"\\t%0, %1, ");
>>   else
>
> is reading the final computed addresses, so the code is right to use
> the real instruction range.  (FWIW I agree with Kyrill that using
> IN_RANGE with hex constants would be clearer.)
>
> That said... a possible problem comes from situations like:
>
>address length insn
>..c  8 A
>   ..align to 8 bytes...
>..8B
>..c  4 C
>   ..align to 16 bytes...
>..0D, branch to B
>
> when D is at the maximum extent of the branch range and when GCC's length
> for A is only a conservative estimate.  If the length of A turns out to
> be 4 rather than 8 at assembly time, the align
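
The chicken-and-egg situation described above is what GCC resolves by
iterating addresses and lengths to a fixed point, only ever growing
lengths. A simplified standalone sketch of that idea (invented data
structures and names, not GCC's actual shorten_branches code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct insn
{
  int target;        /* index of the branch target insn, or -1 */
  int length;        /* current length estimate in bytes */
  long long address; /* recomputed on every pass */
};

/* Iterate layout to a fixed point.  Lengths only grow
   (short_len -> long_len), so the loop terminates.  */
static void
relax_branches (struct insn *insns, size_t n,
                long long max_bwd, long long max_fwd,
                int short_len, int long_len)
{
  bool changed = true;
  while (changed)
    {
      changed = false;

      /* Lay out addresses from the current length estimates.  */
      long long addr = 0;
      for (size_t i = 0; i < n; i++)
        {
          insns[i].address = addr;
          addr += insns[i].length;
        }

      /* Grow every short branch whose target is now out of range.  */
      for (size_t i = 0; i < n; i++)
        {
          if (insns[i].target < 0 || insns[i].length != short_len)
            continue;
          long long offset
            = insns[insns[i].target].address - insns[i].address;
          if (offset < max_bwd || offset > max_fwd)
            {
              insns[i].length = long_len;
              changed = true;
            }
        }
    }
}
```

Because lengths never shrink, a conservative over-estimate is safe,
which is why the "length" attribute can afford the stricter `lt` bound
while the final INSN_ADDRESSES check uses the real inclusive range.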

Re: [Aarch64] Fix conditional branches with target far away.

2018-03-30 Thread Sameera Deshpande
Hi Richard,

The testcase is working with the patch you suggested, thanks for
pointing that out.

On 30 March 2018 at 16:54, Sameera Deshpande
 wrote:
> On 30 March 2018 at 16:39, Richard Sandiford
>  wrote:
>>> Hi Sudakshina,
>>>
>>> Thanks for pointing that out. Updated the conditions for attribute
>>> length to take care of boundary conditions for offset range.
>>>
>>> Please find attached the updated patch.
>>>
>>> I have tested it for gcc testsuite and the failing testcase. Ok for trunk?
>>>
>>> On 22 March 2018 at 19:06, Sudakshina Das  wrote:
>>>> Hi Sameera
>>>>
>>>> On 22/03/18 02:07, Sameera Deshpande wrote:
>>>>>
>>>>> Hi Sudakshina,
>>>>>
>>>>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
>>>>> far branch instruction offset is inclusive of both the offsets. Hence,
>>>>> I am using <=||=> and not <||>= as it was in previous implementation.
>>>>
>>>>
>>>> I have to admit earlier I was only looking at the patch mechanically and
>>>> found a difference with the previous implementation in offset comparison.
>>>> After you pointed out, I looked up the ARMv8 ARM and I have a couple of
>>>> doubts:
>>>>
>>>> 1. My understanding is that any offset in [-1048576 ,1048572] both 
>>>> inclusive
>>>> qualifies as an 'in range' offset. However, the code for both attribute
>>>> length and far_branch has been using [-1048576 ,1048572), that is, ( >= && 
>>>> <
>>>> ). If the far_branch was incorrectly calculated, then maybe the length
>>>> calculations with similar magic numbers should also be corrected? Of 
>>>> course,
>>>> I am not an expert in this and maybe this was a conscience decision so I
>>>> would ask Ramana to maybe clarify if he remembers.
>>>>
>>>> 2. Now to come back to your patch, if my understanding is correct, I think 
>>>> a
>>>> far_branch would be anything outside of this range, that is,
>>>> (offset < -1048576 || offset > 1048572), anything that can not be
>>>> represented in the 21-bit range.
>>>>
>>>> Thanks
>>>> Sudi
>>
>> [...]
>>
>>> @@ -466,14 +459,9 @@
>>>[(set_attr "type" "branch")
>>> (set (attr "length")
>>>   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int 
>>> -1048576))
>>> -(lt (minus (match_dup 2) (pc)) (const_int 
>>> 1048572)))
>>> +(le (minus (match_dup 2) (pc)) (const_int 
>>> 1048572)))
>>> (const_int 4)
>>> -   (const_int 8)))
>>
>> Sorry for not replying earlier, but I think the use of "lt" rather than
>> "le" in the current length attribute is deliberate.  Distances measured
>> from (pc) in "length" are a bit special in that backward distances are
>> measured from the start of the instruction and forward distances are
>> measured from the end of the instruction:
>>
>>   /* The address of the current insn.  We implement this actually as the
>>  address of the current insn for backward branches, but the last
>>  address of the next insn for forward branches, and both with
>>  adjustments that account for the worst-case possible stretching of
>>  intervening alignments between this insn and its destination.  */
>>
>> This avoids the chicken-and-egg situation of the length of the branch
>> depending on the forward distance and the forward distance depending
>> on the length of the branch.
>>
>> In contrast, this code:
>>
>>> @@ -712,7 +695,11 @@
>>>{
>>>  if (get_attr_length (insn) == 8)
>>>{
>>> - if (get_attr_far_branch (insn) == 1)
>>> + long long int offset;
>>> + offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
>>> +   - INSN_ADDRESSES (INSN_UID (insn));
>>> +
>>> + if (offset < -1048576 || offset > 1048572)
>>> return aarch64_gen_far_branch (operands, 2, "Ltb",
>>>"\\t%0, %1, ");
>>>   else
>>
>> is reading the final computed addresses, so the code is right to use
>> the real instruction

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-06 Thread Sameera Deshpande
Hi Christophe,

Please find attached the updated patch with testcases.

Ok for trunk?

- Thanks and regards,
  Sameera D.

2017-12-14 22:17 GMT+05:30 Christophe Lyon :
> 2017-12-14 9:29 GMT+01:00 Sameera Deshpande :
>> Hi!
>>
>> Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
>> vst1_*_x3 intrinsics as defined by Neon document.
>>
>> Ok for trunk?
>>
>> - Thanks and regards,
>>   Sameera D.
>>
>> gcc/Changelog:
>>
>> 2017-11-14  Sameera Deshpande  
>>
>>
>> * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
>> (st1x2): Likewise.
>> (st1x3): Likewise.
>> * config/aarch64/aarch64-simd.md
>> (aarch64_ld1x3): New pattern.
>> (aarch64_ld1_x3_): Likewise
>> (aarch64_st1x2): Likewise
>> (aarch64_st1_x2_): Likewise
>> (aarch64_st1x3): Likewise
>> (aarch64_st1_x3_): Likewise
>> * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
>> (vld1_s8_x3): Likewise.
>> (vld1_u16_x3): Likewise.
>> (vld1_s16_x3): Likewise.
>> (vld1_u32_x3): Likewise.
>> (vld1_s32_x3): Likewise.
>> (vld1_u64_x3): Likewise.
>> (vld1_s64_x3): Likewise.
>> (vld1_fp16_x3): Likewise.
>> (vld1_f32_x3): Likewise.
>> (vld1_f64_x3): Likewise.
>> (vld1_p8_x3): Likewise.
>> (vld1_p16_x3): Likewise.
>> (vld1_p64_x3): Likewise.
>> (vld1q_u8_x3): Likewise.
>> (vld1q_s8_x3): Likewise.
>> (vld1q_u16_x3): Likewise.
>> (vld1q_s16_x3): Likewise.
>> (vld1q_u32_x3): Likewise.
>> (vld1q_s32_x3): Likewise.
>> (vld1q_u64_x3): Likewise.
>> (vld1q_s64_x3): Likewise.
>> (vld1q_f16_x3): Likewise.
>> (vld1q_f32_x3): Likewise.
>> (vld1q_f64_x3): Likewise.
>> (vld1q_p8_x3): Likewise.
>> (vld1q_p16_x3): Likewise.
>> (vld1q_p64_x3): Likewise.
>> (vst1_s64_x2): Likewise.
>> (vst1_u64_x2): Likewise.
>> (vst1_f64_x2): Likewise.
>> (vst1_s8_x2): Likewise.
>> (vst1_p8_x2): Likewise.
>> (vst1_s16_x2): Likewise.
>> (vst1_p16_x2): Likewise.
>> (vst1_s32_x2): Likewise.
>> (vst1_u8_x2): Likewise.
>> (vst1_u16_x2): Likewise.
>> (vst1_u32_x2): Likewise.
>> (vst1_f16_x2): Likewise.
>> (vst1_f32_x2): Likewise.
>> (vst1_p64_x2): Likewise.
>> (vst1q_s8_x2): Likewise.
>> (vst1q_p8_x2): Likewise.
>> (vst1q_s16_x2): Likewise.
>> (vst1q_p16_x2): Likewise.
>> (vst1q_s32_x2): Likewise.
>> (vst1q_s64_x2): Likewise.
>> (vst1q_u8_x2): Likewise.
>> (vst1q_u16_x2): Likewise.
>> (vst1q_u32_x2): Likewise.
>> (vst1q_u64_x2): Likewise.
>> (vst1q_f16_x2): Likewise.
>> (vst1q_f32_x2): Likewise.
>> (vst1q_f64_x2): Likewise.
>> (vst1q_p64_x2): Likewise.
>> (vst1_s64_x3): Likewise.
>> (vst1_u64_x3): Likewise.
>> (vst1_f64_x3): Likewise.
>> (vst1_s8_x3): Likewise.
>> (vst1_p8_x3): Likewise.
>> (vst1_s16_x3): Likewise.
>> (vst1_p16_x3): Likewise.
>> (vst1_s32_x3): Likewise.
>> (vst1_u8_x3): Likewise.
>> (vst1_u16_x3): Likewise.
>> (vst1_u32_x3): Likewise.
>> (vst1_f16_x3): Likewise.
>> (vst1_f32_x3): Likewise.
>> (vst1_p64_x3): Likewise.
>> (vst1q_s8_x3): Likewise.
>> (vst1q_p8_x3): Likewise.
>> (vst1q_s16_x3): Likewise.
>> (vst1q_p16_x3): Likewise.
>> (vst1q_s32_x3): Likewise.
>> (vst1q_s64_x3): Likewise.
>> (vst1q_u8_x3): Likewise.
>> (vst1q_u16_x3): Likewise.
>> (vst1q_u32_x3): Likewise.
>> (vst1q_u64_x3): Likewise.
>> (vst1q_f16_x3): Likewise.
>> (vst1q_f32_x3): Likewise.
>> (vst1q_f64_x3): Likewise.
>> (vst1q_p64_x3): Likewise.
>
> Hi,
> I'm not a maintainer, but I suspect you should add some tests.
>
> Christophe



-- 
- Thanks and regards,
  Sameera D.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index b383f24..2fd072a 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarc

Re: [Aarch64] Fix conditional branches with target far away.

2018-04-09 Thread Sameera Deshpande
Hi Richard,

I do not see the said patch applied in ToT yet. When do you expect it
to be available in ToT?

- Thanks and regards,
  Sameera D.

On 30 March 2018 at 17:01, Sameera Deshpande
 wrote:
> Hi Richard,
>
> The testcase is working with the patch you suggested, thanks for
> pointing that out.
>
> On 30 March 2018 at 16:54, Sameera Deshpande
>  wrote:
>> On 30 March 2018 at 16:39, Richard Sandiford
>>  wrote:
>>>> Hi Sudakshina,
>>>>
>>>> Thanks for pointing that out. Updated the conditions for attribute
>>>> length to take care of boundary conditions for offset range.
>>>>
>>>> Please find attached the updated patch.
>>>>
>>>> I have tested it for gcc testsuite and the failing testcase. Ok for trunk?
>>>>
>>>> On 22 March 2018 at 19:06, Sudakshina Das  wrote:
>>>>> Hi Sameera
>>>>>
>>>>> On 22/03/18 02:07, Sameera Deshpande wrote:
>>>>>>
>>>>>> Hi Sudakshina,
>>>>>>
>>>>>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
>>>>>> far branch instruction offset is inclusive of both the offsets. Hence,
>>>>>> I am using <=||=> and not <||>= as it was in previous implementation.
>>>>>
>>>>>
>>>>> I have to admit earlier I was only looking at the patch mechanically and
>>>>> found a difference with the previous implementation in offset comparison.
>>>>> After you pointed out, I looked up the ARMv8 ARM and I have a couple of
>>>>> doubts:
>>>>>
>>>>> 1. My understanding is that any offset in [-1048576 ,1048572] both 
>>>>> inclusive
>>>>> qualifies as an 'in range' offset. However, the code for both attribute
>>>>> length and far_branch has been using [-1048576 ,1048572), that is, ( >= 
>>>>> && <
>>>>> ). If the far_branch was incorrectly calculated, then maybe the length
>>>>> calculations with similar magic numbers should also be corrected? Of 
>>>>> course,
>>>>> I am not an expert in this and maybe this was a conscience decision so I
>>>>> would ask Ramana to maybe clarify if he remembers.
>>>>>
>>>>> 2. Now to come back to your patch, if my understanding is correct, I 
>>>>> think a
>>>>> far_branch would be anything outside of this range, that is,
>>>>> (offset < -1048576 || offset > 1048572), anything that can not be
>>>>> represented in the 21-bit range.
>>>>>
>>>>> Thanks
>>>>> Sudi
>>>
>>> [...]
>>>
>>>> @@ -466,14 +459,9 @@
>>>>[(set_attr "type" "branch")
>>>> (set (attr "length")
>>>>   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int 
>>>> -1048576))
>>>> -(lt (minus (match_dup 2) (pc)) (const_int 
>>>> 1048572)))
>>>> +(le (minus (match_dup 2) (pc)) (const_int 
>>>> 1048572)))
>>>> (const_int 4)
>>>> -   (const_int 8)))
>>>
>>> Sorry for not replying earlier, but I think the use of "lt" rather than
>>> "le" in the current length attribute is deliberate.  Distances measured
>>> from (pc) in "length" are a bit special in that backward distances are
>>> measured from the start of the instruction and forward distances are
>>> measured from the end of the instruction:
>>>
>>>   /* The address of the current insn.  We implement this actually as the
>>>  address of the current insn for backward branches, but the last
>>>  address of the next insn for forward branches, and both with
>>>  adjustments that account for the worst-case possible stretching of
>>>  intervening alignments between this insn and its destination.  */
>>>
>>> This avoids the chicken-and-egg situation of the length of the branch
>>> depending on the forward distance and the forward distance depending
>>> on the length of the branch.
>>>
>>> In contrast, this code:
>>>
>>>> @@ -712,7 +695,11 @@
>>>>{
>>>>  if (get_attr_length (insn) == 8)
>>>>{
>>>> - if (get_attr_far_branch (insn) =

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-10 Thread Sameera Deshpande
On 7 April 2018 at 01:25, Christophe Lyon  wrote:
> Hi,
>
> 2018-04-06 12:15 GMT+02:00 Sameera Deshpande :
>> Hi Christophe,
>>
>> Please find attached the updated patch with testcases.
>>
>> Ok for trunk?
>
> Thanks for the update.
>
> Since the new intrinsics are only available on aarch64, you want to
> prevent the tests from running on arm.
> Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two 
> targets.
> There are several examples on how to do that in that directory.
>
> I have also noticed that the tests fail at execution on aarch64_be.
>
> I didn't look at the patch in details.
>
> Christophe
>
>
>>
>> - Thanks and regards,
>>   Sameera D.
>>
>> 2017-12-14 22:17 GMT+05:30 Christophe Lyon :
>>> 2017-12-14 9:29 GMT+01:00 Sameera Deshpande :
>>>> Hi!
>>>>
>>>> Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
>>>> vst1_*_x3 intrinsics as defined by Neon document.
>>>>
>>>> Ok for trunk?
>>>>
>>>> - Thanks and regards,
>>>>   Sameera D.
>>>>
>>>> gcc/Changelog:
>>>>
>>>> 2017-11-14  Sameera Deshpande  
>>>>
>>>>
>>>> * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
>>>> (st1x2): Likewise.
>>>> (st1x3): Likewise.
>>>> * config/aarch64/aarch64-simd.md
>>>> (aarch64_ld1x3): New pattern.
>>>> (aarch64_ld1_x3_): Likewise
>>>> (aarch64_st1x2): Likewise
>>>> (aarch64_st1_x2_): Likewise
>>>> (aarch64_st1x3): Likewise
>>>> (aarch64_st1_x3_): Likewise
>>>> * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
>>>> (vld1_s8_x3): Likewise.
>>>> (vld1_u16_x3): Likewise.
>>>> (vld1_s16_x3): Likewise.
>>>> (vld1_u32_x3): Likewise.
>>>> (vld1_s32_x3): Likewise.
>>>> (vld1_u64_x3): Likewise.
>>>> (vld1_s64_x3): Likewise.
>>>> (vld1_fp16_x3): Likewise.
>>>> (vld1_f32_x3): Likewise.
>>>> (vld1_f64_x3): Likewise.
>>>> (vld1_p8_x3): Likewise.
>>>> (vld1_p16_x3): Likewise.
>>>> (vld1_p64_x3): Likewise.
>>>> (vld1q_u8_x3): Likewise.
>>>> (vld1q_s8_x3): Likewise.
>>>> (vld1q_u16_x3): Likewise.
>>>> (vld1q_s16_x3): Likewise.
>>>> (vld1q_u32_x3): Likewise.
>>>> (vld1q_s32_x3): Likewise.
>>>> (vld1q_u64_x3): Likewise.
>>>> (vld1q_s64_x3): Likewise.
>>>> (vld1q_f16_x3): Likewise.
>>>> (vld1q_f32_x3): Likewise.
>>>> (vld1q_f64_x3): Likewise.
>>>> (vld1q_p8_x3): Likewise.
>>>> (vld1q_p16_x3): Likewise.
>>>> (vld1q_p64_x3): Likewise.
>>>> (vst1_s64_x2): Likewise.
>>>> (vst1_u64_x2): Likewise.
>>>> (vst1_f64_x2): 
>>>> Likewise.
>>>> (vst1_s8_x2): Likewise.
>>>> (vst1_p8_x2): Likewise.
>>>> (vst1_s16_x2): Likewise.
>>>> (vst1_p16_x2): Likewise.
>>>> (vst1_s32_x2): Likewise.
>>>> (vst1_u8_x2): Likewise.
>>>> (vst1_u16_x2): Likewise.
>>>> (vst1_u32_x2): Likewise.
>>>> (vst1_f16_x2): Likewise.
>>>> (vst1_f32_x2): Likewise.
>>>> (vst1_p64_x2): Likewise.
>>>> (vst1q_s8_x2): Likewise.
>>>> (vst1q_p8_x2): Likewise.
>>>> (vst1q_s16_x2): Likewise.
>>>> (vst1q_p16_x2): Likewise.
>>>> (vst1q_s32_x2): Likewise.
>>>> (vst1q_s64_x2): Likewise.
>>>> (vst1q_u8_x2): Likewise.
>>>> (vst1q_u16_x2): Likewise.
>>>> (vst1q_u32_x2): Likewise.
>>>> (vst1q_u64_x2): Likewise.
>>>> (vst1q_f16_x2): Likewise.
>>>> (vst1q_f32_x2): Likewise.
>>>> (vst1q_f64_x2): Likewise.
>>>> (vst1q_p64_x2): Likewise.
>>>>  

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-11 Thread Sameera Deshpande
On 10 April 2018 at 20:07, Sudakshina Das  wrote:
> Hi Sameera
>
>
> On 10/04/18 11:20, Sameera Deshpande wrote:
>>
>> On 7 April 2018 at 01:25, Christophe Lyon 
>> wrote:
>>>
>>> Hi,
>>>
>>> 2018-04-06 12:15 GMT+02:00 Sameera Deshpande
>>> :
>>>>
>>>> Hi Christophe,
>>>>
>>>> Please find attached the updated patch with testcases.
>>>>
>>>> Ok for trunk?
>>>
>>>
>>> Thanks for the update.
>>>
>>> Since the new intrinsics are only available on aarch64, you want to
>>> prevent the tests from running on arm.
>>> Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
>>> targets.
>>> There are several examples on how to do that in that directory.
>>>
>>> I have also noticed that the tests fail at execution on aarch64_be.
>>>
>>> I didn't look at the patch in details.
>>>
>>> Christophe
>>>
>>>
>>>>
>>>> - Thanks and regards,
>>>>Sameera D.
>>>>
>>>> 2017-12-14 22:17 GMT+05:30 Christophe Lyon :
>>>>>
>>>>> 2017-12-14 9:29 GMT+01:00 Sameera Deshpande
>>>>> :
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
>>>>>> vst1_*_x3 intrinsics as defined by Neon document.
>>>>>>
>>>>>> Ok for trunk?
>>>>>>
>>>>>> - Thanks and regards,
>>>>>>Sameera D.
>>>>>>
>>>>>> gcc/Changelog:
>>>>>>
>>>>>> 2017-11-14  Sameera Deshpande  
>>>>>>
>>>>>>
>>>>>>  * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
>>>>>>  (st1x2): Likewise.
>>>>>>  (st1x3): Likewise.
>>>>>>  * config/aarch64/aarch64-simd.md
>>>>>> (aarch64_ld1x3): New pattern.
>>>>>>  (aarch64_ld1_x3_): Likewise
>>>>>>  (aarch64_st1x2): Likewise
>>>>>>  (aarch64_st1_x2_): Likewise
>>>>>>  (aarch64_st1x3): Likewise
>>>>>>  (aarch64_st1_x3_): Likewise
>>>>>>  * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
>>>>>>  (vld1_s8_x3): Likewise.
>>>>>>  (vld1_u16_x3): Likewise.
>>>>>>  (vld1_s16_x3): Likewise.
>>>>>>  (vld1_u32_x3): Likewise.
>>>>>>  (vld1_s32_x3): Likewise.
>>>>>>  (vld1_u64_x3): Likewise.
>>>>>>  (vld1_s64_x3): Likewise.
>>>>>>  (vld1_fp16_x3): Likewise.
>>>>>>  (vld1_f32_x3): Likewise.
>>>>>>  (vld1_f64_x3): Likewise.
>>>>>>  (vld1_p8_x3): Likewise.
>>>>>>  (vld1_p16_x3): Likewise.
>>>>>>  (vld1_p64_x3): Likewise.
>>>>>>  (vld1q_u8_x3): Likewise.
>>>>>>  (vld1q_s8_x3): Likewise.
>>>>>>  (vld1q_u16_x3): Likewise.
>>>>>>  (vld1q_s16_x3): Likewise.
>>>>>>  (vld1q_u32_x3): Likewise.
>>>>>>  (vld1q_s32_x3): Likewise.
>>>>>>  (vld1q_u64_x3): Likewise.
>>>>>>  (vld1q_s64_x3): Likewise.
>>>>>>  (vld1q_f16_x3): Likewise.
>>>>>>  (vld1q_f32_x3): Likewise.
>>>>>>  (vld1q_f64_x3): Likewise.
>>>>>>  (vld1q_p8_x3): Likewise.
>>>>>>  (vld1q_p16_x3): Likewise.
>>>>>>  (vld1q_p64_x3): Likewise.
>>>>>>  (vst1_s64_x2): Likewise.
>>>>>>  (vst1_u64_x2): Likewise.
>>>>>>  (vst1_f64_x2):
>>>>>> Likewise.
>>>>>>  (vst1_s8_x2): Likewise.
>>>>>>  (vst1_p8_x2): Likewise.
>>>>>>  (vst1_s16_x2): Likewise.
>>>>>>  (vst1_p16_x2): Lik

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-11 Thread Sameera Deshpande
On 11 April 2018 at 15:53, Sudakshina Das  wrote:
> Hi Sameera
>
>
> On 11/04/18 09:04, Sameera Deshpande wrote:
>>
>> On 10 April 2018 at 20:07, Sudakshina Das  wrote:
>>>
>>> Hi Sameera
>>>
>>>
>>> On 10/04/18 11:20, Sameera Deshpande wrote:
>>>>
>>>>
>>>> On 7 April 2018 at 01:25, Christophe Lyon 
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> 2018-04-06 12:15 GMT+02:00 Sameera Deshpande
>>>>> :
>>>>>>
>>>>>>
>>>>>> Hi Christophe,
>>>>>>
>>>>>> Please find attached the updated patch with testcases.
>>>>>>
>>>>>> Ok for trunk?
>>>>>
>>>>>
>>>>>
>>>>> Thanks for the update.
>>>>>
>>>>> Since the new intrinsics are only available on aarch64, you want to
>>>>> prevent the tests from running on arm.
>>>>> Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
>>>>> targets.
>>>>> There are several examples on how to do that in that directory.
>>>>>
>>>>> I have also noticed that the tests fail at execution on aarch64_be.
>>>>>
>>>>> I didn't look at the patch in details.
>>>>>
>>>>> Christophe
>>>>>
>>>>>
>>>>>>
>>>>>> - Thanks and regards,
>>>>>> Sameera D.
>>>>>>
>>>>>> 2017-12-14 22:17 GMT+05:30 Christophe Lyon
>>>>>> :
>>>>>>>
>>>>>>>
>>>>>>> 2017-12-14 9:29 GMT+01:00 Sameera Deshpande
>>>>>>> :
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
>>>>>>>> vst1_*_x3 intrinsics as defined by Neon document.
>>>>>>>>
>>>>>>>> Ok for trunk?
>>>>>>>>
>>>>>>>> - Thanks and regards,
>>>>>>>> Sameera D.
>>>>>>>>
>>>>>>>> gcc/Changelog:
>>>>>>>>
>>>>>>>> 2017-11-14  Sameera Deshpande  
>>>>>>>>
>>>>>>>>
>>>>>>>>   * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
>>>>>>>>   (st1x2): Likewise.
>>>>>>>>   (st1x3): Likewise.
>>>>>>>>   * config/aarch64/aarch64-simd.md
>>>>>>>> (aarch64_ld1x3): New pattern.
>>>>>>>>   (aarch64_ld1_x3_): Likewise
>>>>>>>>   (aarch64_st1x2): Likewise
>>>>>>>>   (aarch64_st1_x2_): Likewise
>>>>>>>>   (aarch64_st1x3): Likewise
>>>>>>>>   (aarch64_st1_x3_): Likewise
>>>>>>>>   * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
>>>>>>>>   (vld1_s8_x3): Likewise.
>>>>>>>>   (vld1_u16_x3): Likewise.
>>>>>>>>   (vld1_s16_x3): Likewise.
>>>>>>>>   (vld1_u32_x3): Likewise.
>>>>>>>>   (vld1_s32_x3): Likewise.
>>>>>>>>   (vld1_u64_x3): Likewise.
>>>>>>>>   (vld1_s64_x3): Likewise.
>>>>>>>>   (vld1_fp16_x3): Likewise.
>>>>>>>>   (vld1_f32_x3): Likewise.
>>>>>>>>   (vld1_f64_x3): Likewise.
>>>>>>>>   (vld1_p8_x3): Likewise.
>>>>>>>>   (vld1_p16_x3): Likewise.
>>>>>>>>   (vld1_p64_x3): Likewise.
>>>>>>>>   (vld1q_u8_x3): Likewise.
>>>>>>>>   (vld1q_s8_x3): Likewise.
>>>>>>>>   (vld1q_u16_x3): Likewise.
>>>>>>>>   (vld1q_s16_x3): Likewise.
>>>>>>>>   (vld1q_u32_x3): Likewise.
>>>>>>>>   (vld1q_s32_x3): Likewise.
>>>>>>>>   (vld1q_u64_x3): Likewis

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-13 Thread Sameera Deshpande
On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh wrote:

> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
> > Hi,
> >
> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande <
> sameera.deshpa...@linaro.org>:
> > > Hi Christophe,
> > >
> > > Please find attached the updated patch with testcases.
> > >
> > > Ok for trunk?
> >
> > Thanks for the update.
> >
> > Since the new intrinsics are only available on aarch64, you want to
> > prevent the tests from running on arm.
> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
> targets.
> > There are several examples on how to do that in that directory.
> >
> > I have also noticed that the tests fail at execution on aarch64_be.
>
> I think this is important to fix. We don't want the big-endian target to
> have
> failing implementations of the Neon intrinsics. What is the nature of the
> failure?
>
> From what I can see, nothing in the patch prevents using these intrinsics
> on big-endian, so either the intrinsics behaviour is wrong (we have a wrong
> code bug), or the testcase expected behaviour is wrong.
>
> I don't think disabling the test for big-endian is the right fix. We should
> either fix the intrinsics, or fix the testcase.
>
> Thanks,
> James
>
Hi James,

The tests assume little-endian element order when checking the results, so
they fail on big-endian targets. The failures are therefore caused by the
testcases, not by the intrinsic implementations.

- Thanks and regards,
  Sameera D.


Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-30 Thread Sameera Deshpande
On 13 April 2018 at 20:21, James Greenhalgh  wrote:
> On Fri, Apr 13, 2018 at 03:39:32PM +0100, Sameera Deshpande wrote:
>> On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, 
>> mailto:james.greenha...@arm.com>> wrote:
>> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
>> > Hi,
>> >
>> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande 
>> > mailto:sameera.deshpa...@linaro.org>>:
>> > > Hi Christophe,
>> > >
>> > > Please find attached the updated patch with testcases.
>> > >
>> > > Ok for trunk?
>> >
>> > Thanks for the update.
>> >
>> > Since the new intrinsics are only available on aarch64, you want to
>> > prevent the tests from running on arm.
>> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two 
>> > targets.
>> > There are several examples on how to do that in that directory.
>> >
>> > I have also noticed that the tests fail at execution on aarch64_be.
>>
>> I think this is important to fix. We don't want the big-endian target to have
>> failing implementations of the Neon intrinsics. What is the nature of the
>> failure?
>>
>> From what I can see, nothing in the patch prevents using these intrinsics
>> on big-endian, so either the intrinsics behaviour is wrong (we have a wrong
>> code bug), or the testcase expected behaviour is wrong.
>>
>> I don't think disabling the test for big-endian is the right fix. We should
>> either fix the intrinsics, or fix the testcase.
>>
>> Thanks,
>> James
>>
>> Hi James,
>>
>> The tests assume little-endian element order when checking the results, so
>> they fail on big-endian targets. The failures are therefore caused by the
>> testcases, not by the intrinsic implementations.
>
> The testcase is a little hard to follow through the macros, but why would
> this be the case?
>
> ld1 is deterministic on big and little endian for which elements will be
> loaded from memory, as is st1.
>
> My expectation would be that:
>
>   int __attribute__ ((noinline))
>   test_vld_u16_x3 ()
>   {
> uint16_t data[3 * 3];
> uint16_t temp[3 * 3];
> uint16x4x3_t vectors;
> int i,j;
> for (i = 0; i < 3 * 3; i++)
>   data [i] = (uint16_t) 3*i;
> asm volatile ("" : : : "memory");
> vectors = vld1_u16_x3 (data);
> vst1_u16 (temp, vectors.val[0]);
> vst1_u16 (&temp[3], vectors.val[1]);
> vst1_u16 (&temp[3 * 2], vectors.val[2]);
> asm volatile ("" : : : "memory");
> for (j = 0; j < 3 * 3; j++)
>   if (temp[j] != data[j])
> return 1;
> return 0;
>   }
>
> would work equally well for big- or little-endian.
>
> I think this is more likely to be an intrinsics implementation bug.
>
> Thanks,
> James
>

Hi James,

Please find attached the updated patch, which now passes for little as
well as big endian.
Ok for trunk?

-- 
- Thanks and regards,
  Sameera D.

gcc/Changelog:

2018-05-01  Sameera Deshpande  


* config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
(st1x2): Likewise.
(st1x3): Likewise.
* config/aarch64/aarch64-simd.md
(aarch64_ld1x3): New pattern.
(aarch64_ld1_x3_): Likewise
(aarch64_st1x2): Likewise
(aarch64_st1_x2_): Likewise
(aarch64_st1x3): Likewise
(aarch64_st1_x3_): Likewise
* config/aarch64/arm_neon.h (vld1_u8_x3): New function.
(vld1_s8_x3): Likewise.
(vld1_u16_x3): Likewise.
(vld1_s16_x3): Likewise.
(vld1_u32_x3): Likewise.
(vld1_s32_x3): Likewise.
(vld1_u64_x3): Likewise.
(vld1_s64_x3): Likewise.
(vld1_f16_x3): Likewise.
(vld1_f32_x3): Likewise.
(vld1_f64_x3): Likewise.
(vld1_p8_x3): Likewise.
(vld1_p16_x3): Likewise.
(vld1_p64_x3): Likewise.
(vld1q_u8_x3): Likewise.
(vld1q_s8_x3): Likewise.
(vld1q_u16_x3): Likewise.
(vld1q_s16_x3): Likewise.
(vld1q_u32_x3): Likewise.
(vld1q_s32_x3): Likewise.
(vld1q_u64_x3): Likewise.
(vld1q_s64_x3): Likewise.
(vld1q_f16_x3): Likewise.
(vld1q_f32_x3): Likewise.
(vld1q_f64_x3): Likewise.
(vld1q_p8_x3): Likewise.
(vld1q_p16_x3): Likewise.
(vld1q_p64_x3): Likewise.
(vst1_s64_x2): Likewise.
(vst1_u64_x2): Likewise.
(vst1_f64_x2): Likewise.
(vst1_s8_x2): Likewise.
(vst1_p8_x2): Likewise.
(vst1_s16_x2): Likewise.
(vst1_p16_x2): Likewise.
(vst1_s32

[AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2017-12-14 Thread Sameera Deshpande
Hi!

Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
vst1_*_x3 intrinsics as defined by Neon document.

Ok for trunk?

- Thanks and regards,
  Sameera D.

gcc/Changelog:

2017-11-14  Sameera Deshpande  


* config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
(st1x2): Likewise.
(st1x3): Likewise.
* config/aarch64/aarch64-simd.md
(aarch64_ld1x3): New pattern.
(aarch64_ld1_x3_): Likewise
(aarch64_st1x2): Likewise
(aarch64_st1_x2_): Likewise
(aarch64_st1x3): Likewise
(aarch64_st1_x3_): Likewise
* config/aarch64/arm_neon.h (vld1_u8_x3): New function.
(vld1_s8_x3): Likewise.
(vld1_u16_x3): Likewise.
(vld1_s16_x3): Likewise.
(vld1_u32_x3): Likewise.
(vld1_s32_x3): Likewise.
(vld1_u64_x3): Likewise.
(vld1_s64_x3): Likewise.
(vld1_fp16_x3): Likewise.
(vld1_f32_x3): Likewise.
(vld1_f64_x3): Likewise.
(vld1_p8_x3): Likewise.
(vld1_p16_x3): Likewise.
(vld1_p64_x3): Likewise.
(vld1q_u8_x3): Likewise.
(vld1q_s8_x3): Likewise.
(vld1q_u16_x3): Likewise.
(vld1q_s16_x3): Likewise.
(vld1q_u32_x3): Likewise.
(vld1q_s32_x3): Likewise.
(vld1q_u64_x3): Likewise.
(vld1q_s64_x3): Likewise.
(vld1q_f16_x3): Likewise.
(vld1q_f32_x3): Likewise.
(vld1q_f64_x3): Likewise.
(vld1q_p8_x3): Likewise.
(vld1q_p16_x3): Likewise.
(vld1q_p64_x3): Likewise.
(vst1_s64_x2): Likewise.
(vst1_u64_x2): Likewise.
(vst1_f64_x2): Likewise.
(vst1_s8_x2): Likewise.
(vst1_p8_x2): Likewise.
(vst1_s16_x2): Likewise.
(vst1_p16_x2): Likewise.
(vst1_s32_x2): Likewise.
(vst1_u8_x2): Likewise.
(vst1_u16_x2): Likewise.
(vst1_u32_x2): Likewise.
(vst1_f16_x2): Likewise.
(vst1_f32_x2): Likewise.
(vst1_p64_x2): Likewise.
(vst1q_s8_x2): Likewise.
(vst1q_p8_x2): Likewise.
(vst1q_s16_x2): Likewise.
(vst1q_p16_x2): Likewise.
(vst1q_s32_x2): Likewise.
(vst1q_s64_x2): Likewise.
(vst1q_u8_x2): Likewise.
(vst1q_u16_x2): Likewise.
(vst1q_u32_x2): Likewise.
(vst1q_u64_x2): Likewise.
(vst1q_f16_x2): Likewise.
(vst1q_f32_x2): Likewise.
(vst1q_f64_x2): Likewise.
(vst1q_p64_x2): Likewise.
(vst1_s64_x3): Likewise.
(vst1_u64_x3): Likewise.
(vst1_f64_x3): Likewise.
(vst1_s8_x3): Likewise.
(vst1_p8_x3): Likewise.
(vst1_s16_x3): Likewise.
(vst1_p16_x3): Likewise.
(vst1_s32_x3): Likewise.
(vst1_u8_x3): Likewise.
(vst1_u16_x3): Likewise.
(vst1_u32_x3): Likewise.
(vst1_f16_x3): Likewise.
(vst1_f32_x3): Likewise.
(vst1_p64_x3): Likewise.
(vst1q_s8_x3): Likewise.
(vst1q_p8_x3): Likewise.
(vst1q_s16_x3): Likewise.
(vst1q_p16_x3): Likewise.
(vst1q_s32_x3): Likewise.
(vst1q_s64_x3): Likewise.
(vst1q_u8_x3): Likewise.
(vst1q_u16_x3): Likewise.
(vst1q_u32_x3): Likewise.
(vst1q_u64_x3): Likewise.
(vst1q_f16_x3): Likewise.
(vst1q_f32_x3): Likewise.
(vst1q_f64_x3): Likewise.
(vst1q_p64_x3): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 52d01342372..fa623e90017 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -441,6 +441,15 @@
   BUILTIN_VALL_F16 (STORE1, st1, 0)
   VAR1(STORE1P, st1, 0, v2di)
 
+  /* Implemented by aarch64_ld1x3.  */
+  BUILTIN_VALLDIF (LOADSTRUCT, ld1x3, 0)
+
+  /* Implemented by aarch64_st1x2.  */
+  BUILTIN_VALLDIF (STORESTRUCT, st1x2, 0)
+
+  /* Implemented by aarch64_st1x3.  */
+  BUILTIN_VALLDIF (STORESTRUCT, st1x3, 0)
+
   /* Implemented by fma4.  */
   BUILTIN_VHSDF (TERNOP, fma, 4)
   VAR1 (TERNOP, fma, 4, hf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 4fd34c18f95..852bcf0c16a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -5038,6 +5038,70 @@
 }
 })
 
+
+(define_expand "aarch64_ld1x3"
+  [(match_operand:CI 0 "register_operand" "=w")
+   (match_operand:DI 1 "register_operand" "r")
+   (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_SIMD"
+{
+  rtx mem = gen_rtx_MEM (CImode, operands[1]);
+  emit_insn (gen_aarch64_ld1_x3_ (operands[0], mem));
+  DONE;
+})
+
+(define_insn "aarch64_ld1_x3_"
+  [(set (match_operand:CI 0 "register_operand" "=w")
+(unspec:CI
+	  [(match_operand:CI 1 "aarch64_simd_struct_operand" "Utv")
+	   (unspec:VALLDIF [(const_int 3)] UNSPEC_VS

[Patch, regrename] Fix PR87330 : ICE in scan_rtx_reg, at regrename.c

2018-10-08 Thread Sameera Deshpande
Hi!

Please find attached a patch fixing PR87330: ICE in scan_rtx_reg, at
regrename.c:1097.
The regrename pass does not rename registers that appear in notes, so a
REG_DEAD note could still refer to the old register name, which caused
conflicting liveness information to be generated for the tag-collision
pass.

It is better to do this in regrename_do_replace rather than in
regrename_analyze, because the note information does not really
contribute to the regrename analysis, and hence need not be added to the
def-use chains that are computed. regrename_do_replace is where the
decision to finally rename the register is made, so that is where the
note can be updated with the new register name.

Other notes need not be changed, as they do not hold renamed register
information.

Ok for trunk?

Changelog:

2018-10-09  Sameera Deshpande

diff --git a/gcc/regrename.c b/gcc/regrename.c
index 8424093..a3446a2 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -970,6 +970,7 @@ regrename_do_replace (struct du_head *head, int reg)
   unsigned int regno = ORIGINAL_REGNO (*chain->loc);
   struct reg_attrs *attr = REG_ATTRS (*chain->loc);
   int reg_ptr = REG_POINTER (*chain->loc);
+  rtx note;
 
   if (DEBUG_INSN_P (chain->insn) && REGNO (*chain->loc) != base_regno)
 	validate_change (chain->insn, &(INSN_VAR_LOCATION_LOC (chain->insn)),
@@ -986,6 +987,11 @@ regrename_do_replace (struct du_head *head, int reg)
 	  last_reg = *chain->loc;
 	}
 	  validate_change (chain->insn, chain->loc, last_repl, true);
+	  note = find_regno_note (chain->insn, REG_DEAD, base_regno);
+	  if (note != 0)
+	{
+	  validate_change (chain->insn, &XEXP (note, 0), last_repl, true);
+	}
 	}
 }
 


[AArch64] Add Saphira pipeline description.

2018-10-26 Thread Sameera Deshpande
Hi!

Please find attached the patch to add a pipeline description for the
Qualcomm Saphira core.  It is tested with a bootstrap and make check,
with no regressions.

Ok for trunk?

gcc/
Changelog:

2018-10-26 Sameera Deshpande 

* config/aarch64/aarch64-cores.def (saphira): Use saphira pipeline.
* config/aarch64/aarch64.md: Include saphira.md
* config/aarch64/saphira.md: New file for pipeline description.

-- 
- Thanks and regards,
  Sameera D.
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 3d876b8..8e4c646 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -90,7 +90,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2
 /* ARMv8.4-A Architecture Processors.  */
 
 /* Qualcomm ('Q') cores. */
-AARCH64_CORE("saphira", saphira,falkor,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
+AARCH64_CORE("saphira", saphira,saphira,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
 
 /* ARMv8-A big.LITTLE implementations.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..f951354 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -298,6 +298,7 @@
 (include "../arm/cortex-a57.md")
 (include "../arm/exynos-m1.md")
 (include "falkor.md")
+(include "saphira.md")
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
diff --git a/gcc/config/aarch64/saphira.md b/gcc/config/aarch64/saphira.md
new file mode 100644
index 000..bbf1c5c
--- /dev/null
+++ b/gcc/config/aarch64/saphira.md
@@ -0,0 +1,583 @@
+;; Saphira pipeline description
+;; Copyright (C) 2017-2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "saphira")
+
+;; Complex int instructions (e.g. multiply and divide) execute in the X
+;; pipeline.  Simple int instructions execute in the X, Y, Z and B pipelines.
+
+(define_cpu_unit "saphira_x" "saphira")
+(define_cpu_unit "saphira_y" "saphira")
+
+;; Branches execute in the Z or B pipeline or in one of the int pipelines depending
+;; on how complex it is.  Simple int insns (like movz) can also execute here.
+
+(define_cpu_unit "saphira_z" "saphira")
+(define_cpu_unit "saphira_b" "saphira")
+
+;; Vector and FP insns execute in the VX and VY pipelines.
+
+(define_automaton "saphira_vfp")
+
+(define_cpu_unit "saphira_vx" "saphira_vfp")
+(define_cpu_unit "saphira_vy" "saphira_vfp")
+
+;; Loads execute in the LD pipeline.
+;; Stores execute in the ST pipeline, for address, data, and
+;; vector data.
+
+(define_automaton "saphira_mem")
+
+(define_cpu_unit "saphira_ld" "saphira_mem")
+(define_cpu_unit "saphira_st" "saphira_mem")
+
+;; The GTOV and VTOG pipelines are for general to vector reg moves, and vice
+;; versa.
+
+(define_cpu_unit "saphira_gtov" "saphira")
+(define_cpu_unit "saphira_vtog" "saphira")
+
+;; Common reservation combinations.
+
+(define_reservation "saphira_vxvy" "saphira_vx|saphira_vy")
+(define_reservation "saphira_zb"   "saphira_z|saphira_b")
+(define_reservation "saphira_xyzb" "saphira_x|saphira_y|saphira_z|saphira_b")
+
+;; SIMD Floating-Point Instructions
+
+(define_insn_reservation "saphira_afp_1_vxvy" 1
+  (and (eq_attr "tune" "saphira")
+   (eq_attr "type" "neon_fp_neg_s,neon_fp_neg_d,neon_fp_abs_s,neon_fp_abs_d,neon_fp_neg_s_q,neon_fp_neg_d_q,neon_fp_abs_s_q,neon_fp_abs_d_q"))
+  "saphira_vxvy")
+
+(define_insn_reservation "saphira_afp_2_vxvy" 2
+  (and (eq_attr "tune" "saphira")
+   (eq_attr "type" "neon_fp_minmax_s,neon_fp_minmax_d,neon_fp_reduc_minmax_s,neon_fp_reduc_minmax_d,neon_fp_compare_s,neon_fp_compare_d,neon_fp_round_s,neon_fp_round_d,neon_fp_minmax_s_q,ne

Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-11-07 Thread Sameera Deshpande
Hi Ramana,

Please find attached the reworked patch. It is tested with check-gcc,
check-gdb and bootstrap with no regressions.

On Fri, 2011-10-21 at 13:43 +0100, Ramana Radhakrishnan wrote: 
> Why are you differentiating on stack_only ? Does it really matter ?
> 
The pop_multi* patterns generate a pop instruction, hence I wanted to be
sure that the base register is the stack pointer.

I can remove the stack_only option by
1. modifying the pattern to match SP as the base register explicitly, or
2. generating an ldm%(ia%) instruction for non-SP base registers.

I chose the second option.

> Hmmm isn't this true of only LDM's in Thumb state ? Though it could be argued
> that this patch is only T2 epilogues.
> 
Yes, it's true. But for a single register we want to match the LDR pattern
and not any of the ldm* or pop_multi* patterns. So I am barring LDM for a
single register here.

> >+strcpy (pattern, \"fldmfdd\\t\");
> >+strcat (pattern,
> >+reg_names[REGNO (SET_DEST (XVECEXP (operands[0], 0, 
> >0)))]);
> >+strcat (pattern, \"!, {\");
> >+strcat (pattern, table[(REGNO (XEXP (XVECEXP (operands[0], 0, 1), 0))
> >+   - FIRST_VFP_REGNUM) / 2].name);
> 
> Can't you reuse names from arm.h and avoid the table here ?
> 
The REGISTER_NAMES array in aout.h uses S0, S2, ... names for double
registers. Is there any way to use OVERLAPPING_REGISTER_NAMES? If that
can be done, I can eliminate the table here.

Updated ChangeLog entry:

2011-09-28  Ian Bolton 
Sameera Deshpande  
   
   * config/arm/arm-protos.h (load_multiple_operation_p): New
declaration.
 (thumb2_expand_epilogue): Likewise.
 (thumb2_output_return): Likewise.
 (thumb2_expand_return): Likewise.
 (thumb_unexpanded_epilogue): Rename to...
 (thumb1_unexpanded_epilogue): ...this.
   * config/arm/arm.c (load_multiple_operation_p): New function. 
 (thumb2_emit_multi_reg_pop): Likewise.
 (thumb2_emit_vfp_multi_reg_pop): Likewise.
 (thumb2_expand_return): Likewise. 
 (thumb2_expand_epilogue): Likewise. 
 (thumb2_output_return): Likewise.
 (thumb_unexpanded_epilogue): Rename to...
 (thumb1_unexpanded_epilogue): ...this.
   * config/arm/arm.md (pop_multiple_with_stack_update): New
pattern. 
 (pop_multiple_with_stack_update_and_return): Likewise.
 (thumb2_ldr_with_return): Likewise.
 (vfp_point_pop_multiple_with_stack_update): Likewise.
 (return): Update condition and code for pattern.
 (arm_return): Likewise.
 (epilogue_insns): Likewise.
   * config/arm/predicates.md (load_multiple_operation): Update
predicate.
 (load_multiple_operation_return): New predicate. 
 (load_multiple_operation_fp): Likewise.
   * config/arm/thumb2.md (thumb2_return): Remove.
 (thumb2_rtl_epilogue_return): New pattern.


- Thanks and regards,
  Sameera D.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 23a29c6..2c38883 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -65,6 +65,7 @@ extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);
+extern bool load_multiple_operation_p (rtx, bool, enum machine_mode, bool);
 extern int neon_immediate_valid_for_move (rtx, enum machine_mode, rtx *, int *);
 extern int neon_immediate_valid_for_logic (rtx, enum machine_mode, int, rtx *,
 	   int *);
@@ -176,10 +177,13 @@ extern int arm_float_words_big_endian (void);
 
 /* Thumb functions.  */
 extern void arm_init_expanders (void);
-extern const char *thumb_unexpanded_epilogue (void);
+extern const char *thumb1_unexpanded_epilogue (void);
 extern void thumb1_expand_prologue (void);
 extern void thumb1_expand_epilogue (void);
 extern const char *thumb1_output_interwork (void);
+extern void thumb2_expand_epilogue (void);
+extern void thumb2_output_return (rtx);
+extern void thumb2_expand_return (void);
 #ifdef TREE_CODE
 extern int is_called_in_ARM_mode (tree);
 #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e07c8c3..ec87892 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8906,6 +8906,137 @@ neon_valid_immediate (rtx op, enum machine_mode mode, int inverse,
 #undef CHECK
 }
 
+/* Return true if OP is a valid load multiple operation for MODE mode.
+   CONSECUTIVE is true if the registers in the operation must form
+   a consecutive sequence in the register bank.  STACK_ONLY is true
+   if the base register must be the stack pointer.  RETURN_PC is true
+   if value is to be loaded in PC.  */
+bool
+load_multiple_operation_p (rtx op, bool consecutive, enum machine_mode mode,
+   bool return_pc)
+{
+  HOST_WIDE_INT count = XVE

Re: [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.

2011-11-07 Thread Sameera Deshpande

> 
> 
> I don't believe REG_FRAME_RELATED_EXPR does the right thing for 
> anything besides prologues.  You need to emit REG_CFA_RESTORE
> for the pop inside an epilogue.

Richard, here is an updated patch that uses REG_CFA_RESTORE instead of
REG_FRAME_RELATED_EXPR.


The patch is tested with check-gcc, check-gdb and bootstrap with no
regressions.

Ok for trunk?

- Thanks and regards,
  Sameera

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 37113f5..e71ead5 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -203,6 +203,7 @@ extern void thumb_reload_in_hi (rtx *);
 extern void thumb_set_return_address (rtx, rtx);
 extern const char *thumb1_output_casesi (rtx *);
 extern const char *thumb2_output_casesi (rtx *);
+extern bool bad_reg_pair_for_thumb_ldrd_strd (rtx, rtx);
 #endif
 
 /* Defined in pe.c.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 429b644..05c9368 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15706,6 +15706,151 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   REG_NOTES (par) = dwarf;
 }
 
+bool
+bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+  || GET_CODE (src2) != REG
+  || (REGNO (src1) == PC_REGNUM)
+  || (REGNO (src1) == SP_REGNUM)
+  || (REGNO (src1) == REGNO (src2))
+  || (REGNO (src2) == PC_REGNUM)
+  || (REGNO (src2) == SP_REGNUM));
+}
+
+/* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
+   number of registers are being popped, multiple LDRD patterns are created for
+   all register pairs.  If odd number of registers are popped, last register is
+   loaded by using LDR pattern.  */
+static bool
+thumb2_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+  gcc_assert (really_return || ((saved_regs_mask & (1 << PC_REGNUM)) == 0));
+
+  /* We cannot generate ldrd for PC.  Hence, reduce the count if PC is
+ to be popped.  So, if num_regs is even, now it will become odd,
+ and we can generate pop with PC.  If num_regs is odd, it will be
+ even now, and ldr with return can be generated for PC.  */
+  if (really_return && (saved_regs_mask & (1 << PC_REGNUM)))
+num_regs--;
+
+  /* Var j iterates over all the registers to gather all the registers in
+ saved_regs_mask.  Var i gives index of saved registers in stack frame.
+ A PARALLEL RTX of register-pair is created here, so that pattern for
+ LDRD can be matched.  As PC is always last register to be popped, and
+ we have already decremented num_regs if PC, we don't have to worry
+ about PC in this loop.  */
+  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+
+/* Create RTX for memory load.  */
+reg = gen_rtx_REG (SImode, j);
+tmp = gen_rtx_SET (SImode,
+   reg,
+   gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4 * i)));
+RTX_FRAME_RELATED_P (tmp) = 1;
+
+if (i % 2 == 0)
+  {
+/* When saved-register index (i) is even, the RTX to be emitted is
+   yet to be created.  Hence create it first.  The LDRD pattern we
+   are generating is :
+   [ (SET (reg_t0) (MEM (PLUS (SP) (NUM
+ (SET (reg_t1) (MEM (PLUS (SP) (NUM + 4 ]
+   where target registers need not be consecutive.  */
+par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+dwarf = NULL_RTX;
+  }
+
+/* ith register is added in PARALLEL RTX.  If i is even, the reg_i is
+   added as 0th element and if i is odd, reg_i is added as 1st element
+   of LDRD pattern shown above.  */
+XVECEXP (par, 0, (i % 2)) = tmp;
+dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+if ((i % 2) == 1)
+  {
+/* When saved-register index (i) is odd, RTXs for both the registers
+   to be loaded are generated in above given LDRD pattern, and the
+   pattern can be emitted now.  */
+par = emit_insn (par);
+REG_NOTES (par) = dwarf;
+  }
+
+i++;
+  }
+
+  /* If the number of registers popped is odd AND really_return is false, OR
+     the number of registers is even AND really_return is true, the last
+     register is popped using LDR.  It can be PC as well.  Hence, adjust the
+     stack first and then LDR with post increment.  */
+
+  /* Increment the stack pointer, based on there being
+ num_regs 4-byte re
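For reference, the parity bookkeeping described in thumb2_emit_ldrd_pop's comments can be sketched as a small helper. This is a hypothetical illustration (the struct and function names below are not in the patch): PC is excluded from pairing, the remaining registers are popped two at a time with LDRD, and whatever is left over (at most one core register, plus PC if present) goes through the final LDR/POP path.

```c
/* Hypothetical helper mirroring the comments in thumb2_emit_ldrd_pop:
   count the LDRD pairs and the registers left for the final LDR/POP.  */
struct ldrd_pop_plan
{
  int ldrd_pairs;   /* registers popped two at a time with LDRD */
  int leftover;     /* regs for the trailing LDR/POP (PC included) */
};

static struct ldrd_pop_plan
plan_ldrd_pop (int num_regs, int pc_popped)
{
  struct ldrd_pop_plan p;
  /* LDRD cannot target PC, so PC never takes part in pairing.  */
  int pairable = pc_popped ? num_regs - 1 : num_regs;

  p.ldrd_pairs = pairable / 2;
  p.leftover = (pairable % 2) + pc_popped;
  return p;
}
```

For example, popping {r4, r5, r6, pc} gives one LDRD pair and two leftover registers (r6 and PC) for the final pop-with-return path.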

Re: [RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue.

2011-11-07 Thread Sameera Deshpande
Hi Ramana,

Please find attached reworked patch. The patch is tested with check-gcc,
check-gdb and bootstrap with no regression.

Ok?

- Thanks and regards,
  Sameera D.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 05c9368..334a25f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15438,6 +15438,125 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
 }
 }
 
+/* Generate and emit a pattern that will be recognized as an STRD pattern.  If
+   an even number of registers is being pushed, multiple STRD patterns are
+   created for all register pairs.  If an odd number of registers is pushed,
+   emit a combination of STRDs and an STR for the prologue saves.  */
+static void
+thumb2_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* Pre-decrement the stack pointer, based on there being num_regs 4-byte
+ registers to push.  */
+  tmp = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+ plus_constant (stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  insn = emit_insn (tmp);
+
+  /* Create sequence for DWARF info.  */
+  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
+
+  /* RTLs cannot be shared, hence create new copy for dwarf.  */
+  tmp1 = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+ plus_constant (stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp1) = 1;
+  XVECEXP (dwarf, 0, 0) = tmp1;
+
+  /* Var j iterates over all the registers to gather all the registers in
+ saved_regs_mask.  Var i gives index of register R_j in stack frame.
+ A PARALLEL RTX of register-pair is created here, so that pattern for
+ STRD can be matched.  If num_regs is odd, 1st register will be pushed
+ using STR and remaining registers will be pushed with STRD in pairs.
+ If num_regs is even, all registers are pushed with STRD in pairs.
+ Hence, skip first element for odd num_regs.  */
+  for (i = num_regs - 1, j = LAST_ARM_REGNUM; i >= (num_regs % 2); j--)
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+gcc_assert (j != PC_REGNUM);
+
+/* Create RTX for store.  New RTX is created for dwarf as
+   they are not sharable.  */
+reg = gen_rtx_REG (SImode, j);
+tmp = gen_rtx_SET (SImode,
+   gen_frame_mem
+   (SImode,
+plus_constant (stack_pointer_rtx, 4 * i)),
+   reg);
+
+tmp1 = gen_rtx_SET (SImode,
+   gen_frame_mem
+   (SImode,
+plus_constant (stack_pointer_rtx, 4 * i)),
+   reg);
+RTX_FRAME_RELATED_P (tmp) = 1;
+RTX_FRAME_RELATED_P (tmp1) = 1;
+
+if (((i - (num_regs % 2)) % 2) == 1)
+  /* When (i - (num_regs % 2)) is odd, the RTX to be emitted is yet to
+ be created.  Hence create it first.  The STRD pattern we are
+ generating is :
+ [ (SET (MEM (PLUS (SP) (NUM))) (reg_t1))
+   (SET (MEM (PLUS (SP) (NUM + 4))) (reg_t2)) ]
+ where target registers need not be consecutive.  */
+  par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+
+/* Register R_j is added in the PARALLEL RTX.  If (i - (num_regs % 2)) is
+   even, reg_j is added as the 0th element and if it is odd, reg_j is
+   added as the 1st element of the STRD pattern shown above.  */
+XVECEXP (par, 0, ((i - (num_regs % 2)) % 2)) = tmp;
+XVECEXP (dwarf, 0, (i + 1)) = tmp1;
+
+if (((i - (num_regs % 2)) % 2) == 0)
+  /* When (i - (num_regs % 2)) is even, RTXs for both the registers
+ to be loaded are generated in above given STRD pattern, and the
+ pattern can be emitted now.  */
+  emit_insn (par);
+
+i--;
+  }
+
+  if ((num_regs % 2) == 1)
+{
+  /* If odd number of registers are pushed, generate STR pattern to store
+ lone register.  */
+  for (; (saved_regs_mask & (1 << j)) == 0; j--);
+
+  tmp1 = gen_frame_mem (SImode, plus_constant (stack_pointer_rtx, 4 * i));
+  reg = gen_rtx_REG (SImode, j);
+  tmp = gen_rtx_SET (SImode, tmp1, reg);
+  RTX_FRAME_RELATED_P (tmp) = 1;
+
+  emit_insn (tmp);
+
+  tmp1 = gen_rtx_SET (SImode,
+ gen_frame_mem
+ (SImode,
+  plus_constant (stack_pointer_rtx, 4 * i)),
+ reg);
+  RTX_FRAME_RELATED_P (tmp1) = 1;
+  XVECEXP (dwarf, 0, (i + 1)) = tmp1;
+}
+
+  add_reg_note (insn, R

Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-11-07 Thread Sameera Deshpande
On Mon, 2011-11-07 at 09:56 +, Paul Brook wrote:
> > The array REGISTER_NAMES in aout.h use S0, S2, ...  names for double
> > registers. Is there any way to use OVERLAPPING_REGISTER_NAMES? If that
> > can be done, I can eliminate the table here.
> 
> You should be using %P.
> 

Paul,

Thanks for your comment. Please find attached reworked patch. The patch
is tested with check-gcc without regression.

- Thanks and regards,
  Sameera D.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 23a29c6..2c38883 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -65,6 +65,7 @@ extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);
+extern bool load_multiple_operation_p (rtx, bool, enum machine_mode, bool);
 extern int neon_immediate_valid_for_move (rtx, enum machine_mode, rtx *, int *);
 extern int neon_immediate_valid_for_logic (rtx, enum machine_mode, int, rtx *,
 	   int *);
@@ -176,10 +177,13 @@ extern int arm_float_words_big_endian (void);
 
 /* Thumb functions.  */
 extern void arm_init_expanders (void);
-extern const char *thumb_unexpanded_epilogue (void);
+extern const char *thumb1_unexpanded_epilogue (void);
 extern void thumb1_expand_prologue (void);
 extern void thumb1_expand_epilogue (void);
 extern const char *thumb1_output_interwork (void);
+extern void thumb2_expand_epilogue (void);
+extern void thumb2_output_return (rtx);
+extern void thumb2_expand_return (void);
 #ifdef TREE_CODE
 extern int is_called_in_ARM_mode (tree);
 #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e07c8c3..ec87892 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8906,6 +8906,137 @@ neon_valid_immediate (rtx op, enum machine_mode mode, int inverse,
 #undef CHECK
 }
 
+/* Return true if OP is a valid load multiple operation for MODE mode.
+   CONSECUTIVE is true if the registers in the operation must form
+   a consecutive sequence in the register bank.  STACK_ONLY is true
+   if the base register must be the stack pointer.  RETURN_PC is true
+   if a value is to be loaded into PC.  */
+bool
+load_multiple_operation_p (rtx op, bool consecutive, enum machine_mode mode,
+   bool return_pc)
+{
+  HOST_WIDE_INT count = XVECLEN (op, 0);
+  unsigned dest_regno, first_dest_regno;
+  rtx src_addr;
+  HOST_WIDE_INT i = 1, base = 0;
+  HOST_WIDE_INT offset = 0;
+  rtx elt;
+  bool addr_reg_loaded = false;
+  bool update = false;
+  int reg_increment, regs_per_val;
+  int offset_adj;
+
+  /* If DFmode, we must be asking for consecutive,
+ since fldmdd can only do consecutive regs.  */
+  gcc_assert ((mode != DFmode) || consecutive);
+
+  /* Set up the increments and the regs per val based on the mode.  */
+  reg_increment = GET_MODE_SIZE (mode);
+  regs_per_val = mode == DFmode ? 2 : 1;
+  offset_adj = return_pc ? 1 : 0;
+
+  if (count <= 1
+  || GET_CODE (XVECEXP (op, 0, offset_adj)) != SET
+  || !REG_P (SET_DEST (XVECEXP (op, 0, offset_adj
+return false;
+
+  /* Check to see if this might be a write-back.  */
+  if (GET_CODE (SET_SRC (elt = XVECEXP (op, 0, offset_adj))) == PLUS)
+{
+  i++;
+  base = 1;
+  update = true;
+
+  /* The offset adjustment should be same as number of registers being
+ popped * size of single register.  */
+  if (!REG_P (SET_DEST (elt))
+  || !REG_P (XEXP (SET_SRC (elt), 0))
+  || !CONST_INT_P (XEXP (SET_SRC (elt), 1))
+  || INTVAL (XEXP (SET_SRC (elt), 1)) !=
+  ((count - 1 - offset_adj) * reg_increment))
+return false;
+}
+
+  i = i + offset_adj;
+  base = base + offset_adj;
+  /* Perform a quick check so we don't blow up below.  */
+  if (GET_CODE (XVECEXP (op, 0, i - 1)) != SET
+  || !REG_P (SET_DEST (XVECEXP (op, 0, i - 1)))
+  || !MEM_P (SET_SRC (XVECEXP (op, 0, i - 1
+return false;
+
+  /* If only one reg is being loaded, success depends on the type:
+     FLDMDD can do just one reg, LDM must do at least two.  */
+  if (count <= i)
+return mode == DFmode ? true : false;
+
+  first_dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, i - 1)));
+  dest_regno = first_dest_regno;
+
+  src_addr = XEXP (SET_SRC (XVECEXP (op, 0, i - 1)), 0);
+
+  if (GET_CODE (src_addr) == PLUS)
+{
+  if (!CONST_INT_P (XEXP (src_addr, 1)))
+return false;
+  offset = INTVAL (XEXP (src_addr, 1));
+  src_addr = XEXP (src_addr, 0);
+}
+
+  if (!REG_P (src_addr))
+return false;
+
+  /* The pattern we are trying to match here is:
+ [(SET (R_d0) (MEM (PLUS (src_addr) (offset
+  (SET (R_d1) (MEM (PLUS (src_addr) (offset + 
+   :
+   :
+  (SET (R_dn) (MEM (PLUS (src_addr) (offset + n * 
+ ]
+ Where,
+ 1.  If offset is 0, first insn should be (SET (R_d0) (MEM (src_addr))).
+ 2.  REGNO (R_d0) <

Re: [RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue.

2011-11-08 Thread Sameera Deshpande
On Fri, 2011-10-21 at 13:45 +0100, Ramana Radhakrishnan wrote: 
> >+arm_emit_strd_push (unsigned long saved_regs_mask)
> 
> How different is this from the thumb2 version you sent out in Patch 03/05 ?
> 
Thumb-2 STRD can handle non-consecutive registers; ARM STRD cannot.
Because of this, in ARM mode we accumulate non-consecutive registers and
emit an STM instruction; for consecutive registers, STRD is generated.
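The difference can be captured in a pair of small predicates. This is an illustrative sketch, not code from the patch (the macro values and function names are assumptions): ARM-state LDRD/STRD requires the first register to be even-numbered and the second to be its consecutive partner, while Thumb-2 accepts any two distinct core registers other than SP and PC.

```c
#include <stdbool.h>

#define SP_REGNUM 13
#define PC_REGNUM 15

/* ARM state: LDRD/STRD needs an even-numbered first register and the
   consecutive odd partner, so only pairs like (r4, r5) qualify.  */
static bool
arm_pair_ok_for_ldrd_strd (int r1, int r2)
{
  return (r1 % 2) == 0 && r2 == r1 + 1
         && r2 != SP_REGNUM && r2 != PC_REGNUM;
}

/* Thumb-2: any two distinct core registers can form the LDRD/STRD
   pair, as long as neither is SP or PC.  */
static bool
thumb2_pair_ok_for_ldrd_strd (int r1, int r2)
{
  return r1 != r2
         && r1 != SP_REGNUM && r1 != PC_REGNUM
         && r2 != SP_REGNUM && r2 != PC_REGNUM;
}
```

So (r4, r6) is a valid Thumb-2 pair but not a valid ARM pair, which is why the ARM-mode patch has to fall back to STM for non-consecutive registers.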

> >@@ -15958,7 +16081,8 @@ arm_get_frame_offsets (void)
> >  use 32-bit push/pop instructions.  */
> >   if (! any_sibcall_uses_r3 ()
> >   && arm_size_return_regs () <= 12
> >-  && (offsets->saved_regs_mask & (1 << 3)) == 0)
> >+  && (offsets->saved_regs_mask & (1 << 3)) == 0
> >+  && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))
> 
> Not sure I completely follow this change yet.
> 
If the stack is not aligned, we need to adjust it in the prologue.
Here, instead of adjusting the stack, we PUSH register R3, so that no
additional ADD instruction is needed for the stack adjustment. This
works fine when we generate multi-reg load/store instructions.

However, when we generate STRD in ARM mode, non-consecutive registers
are stored using STR/STM instructions. As R3's pair register (R2) is
never pushed on the stack, we always end up generating an STR
instruction to push R3. This is more expensive than doing
ADD SP, SP, #4 for the stack adjustment.

e.g. if we are PUSHing {R4, R5, R6} and the stack is not aligned, we
PUSH {R3, R4, R5, R6} instead. The instructions generated are:
STR R6, [sp, #4]
STRD R4, R5, [sp, #12]
STR R3, [sp, #16]

However, if another caller-saved register is PUSHed instead of R3, we
push {R4, R5, R6, R7} to generate
STRD R6, R7, [sp, #8]
STRD R4, R5, [sp, #16]

If no caller-saved register is available, we generate an ADD
instruction, which is still better than generating an STR.
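A minimal sketch of the padding decision described above, under the assumption that the prologue may freely pick any unsaved low register (the function and enum names are invented for illustration): prefer an extra register that completes an even/odd pair, so the pad slot folds into an STRD; otherwise fall back to an ADD adjustment.

```c
enum pad_choice { PAD_PUSH_EXTRA_REG, PAD_ADD_SP };

/* If some register j is free while its even/odd partner (j ^ 1) is
   already being saved, pushing j turns a lone STR into an STRD pair.
   Illustrative only: real code must also check the register is dead.  */
static enum pad_choice
pick_alignment_padding (unsigned long saved_mask, int *extra_reg)
{
  for (int j = 0; j <= 11; j++)
    if (!(saved_mask & (1UL << j)) && (saved_mask & (1UL << (j ^ 1))))
      {
        *extra_reg = j;
        return PAD_PUSH_EXTRA_REG;
      }
  return PAD_ADD_SP;
}
```

For the {R4, R5, R6} example above, the search picks r7 (partner of the already-saved r6), matching the preferred {R4, R5, R6, R7} push.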
> 
> Hmmm the question remains if we want to put these into ldmstm.md since
> it was theoretically
> auto-generated from ldmstm.ml. If this has to be marked to be separate
> then I'd like
> to regenerate ldmstm.md from ldmstm.ml and differentiate between the
> bits that can be auto-generated
> and the bits that have been added since.
> 
The current patterns are quite different from patterns generated using
arm-ldmstm.ml. I will submit updated arm-ldmstm.ml file generating
ldrd/strd patterns as a new patch. Is that fine?

The patch is tested with check-gcc, check-gdb and bootstrap.

I see a regression in gcc:
FAIL: gcc.c-torture/execute/vector-compare-1.c compilation,  -O3
-fomit-frame-pointer -funroll-loops with error message 
/tmp/ccC13odV.s: Assembler messages:
/tmp/ccC13odV.s:544: Error: co-processor offset out of range

This seems to be uncovered latent bug, and I am looking into it.

- Thanks and regards,
  Sameera D.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index e71ead5..ccf05c7 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -163,6 +163,7 @@ extern const char *arm_output_memory_barrier (rtx *);
 extern const char *arm_output_sync_insn (rtx, rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
 extern int arm_attr_length_push_multi(rtx, rtx);
+extern bool bad_reg_pair_for_arm_ldrd_strd (rtx, rtx);
 
 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 334a25f..deee78b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -93,6 +93,7 @@ static bool arm_assemble_integer (rtx, unsigned int, int);
 static void arm_print_operand (FILE *, rtx, int);
 static void arm_print_operand_address (FILE *, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
+static rtx emit_multi_reg_push (unsigned long);
 static const char *fp_const_from_val (REAL_VALUE_TYPE *);
 static arm_cc get_arm_condition_code (rtx);
 static HOST_WIDE_INT int_log2 (HOST_WIDE_INT);
@@ -15438,6 +15439,117 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
 }
 }
 
+/* STRD in ARM mode requires consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until the first consecutive
+   register pair is found.  It then generates a multi-register PUSH for all
+   accumulated registers, and then generates STRD with write-back for the
+   consecutive register pair.  This process is repeated until all the registers
+   are stored on the stack.  The multi-register PUSH takes care of lone
+   registers as well.  */
+static void
+arm_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_pushed_mask;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* Var j iterates over all registers to gather all registers in
+ save
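The accumulate-and-flush grouping that arm_emit_strd_push describes can be visualized with a mask-walking sketch. This is a hypothetical illustration, not the patch's RTL generation: strings stand in for the emitted instructions, and ascending register order is used purely for readability.

```c
#include <stdio.h>
#include <string.h>

/* Walk a core-register save mask: each even/odd consecutive pair
   becomes an STRD; any registers accumulated between pairs are
   flushed as a single multi-register store ("STM").  */
static void
plan_strd_push (unsigned long mask, char *out, size_t len)
{
  unsigned long pending = 0;    /* non-pairable regs awaiting an STM */
  size_t pos = 0;

  out[0] = '\0';
  for (int j = 0; j <= 12; j++)
    {
      if (!(mask & (1UL << j)))
        continue;
      if ((j % 2) == 0 && (mask & (1UL << (j + 1))))
        {
          if (pending)
            {
              pos += snprintf (out + pos, len - pos, "STM;");
              pending = 0;
            }
          pos += snprintf (out + pos, len - pos, "STRD r%d,r%d;", j, j + 1);
          j++;                  /* the odd partner is consumed too */
        }
      else
        pending |= 1UL << j;
    }
  if (pending)
    snprintf (out + pos, len - pos, "STM;");
}
```

For {r3, r4, r5, r7} this yields an STM (for r3), then STRD r4,r5, then another STM (for r7), matching the "accumulate until a pair, then flush" description.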

Re: [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.

2011-11-08 Thread Sameera Deshpande
On Fri, 2011-10-21 at 13:45 +0100, Ramana Radhakrishnan wrote: 
> change that. Other than that this patch looks OK and please watch out
> for stylistic issues from the previous patch.

Ramana, please find attached reworked patch. The patch is tested with
check-gcc, check-gdb and bootstrap with no regression.

- Thanks and regards,
  Sameera D.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index deee78b..4a86749 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15960,6 +15960,135 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
   || (REGNO (src2) == SP_REGNUM));
 }
 
+/* LDRD in ARM mode requires consecutive registers to be loaded.  This function
+   keeps accumulating non-consecutive registers until the first consecutive
+   register pair is found.  It then generates a multi-reg POP for all
+   accumulated registers, and then generates LDRD with write-back for the
+   consecutive register pair.  This process is repeated until all the registers
+   are loaded from the stack.  The multi-register POP takes care of lone
+   registers as well.  However, LDRD cannot be generated for PC, as the results
+   are unpredictable.  Hence, if PC is in SAVED_REGS_MASK, generate a multi-reg
+   POP with RETURN or an LDR with RETURN, depending upon the number of
+   registers in REGS_TO_BE_POPPED_MASK.  */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp;
+  unsigned long regs_to_be_popped_mask = 0;
+  bool pc_in_list = false;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = 0; i < num_regs; j++)
+if (saved_regs_mask & (1 << j))
+  {
+i++;
+if ((j % 2) == 0
+&& (saved_regs_mask & (1 << (j + 1)))
+&& (j + 1) != SP_REGNUM
+&& (j + 1) != PC_REGNUM
+&& regs_to_be_popped_mask)
+  {
+/* Current register and next register form register pair for which
+   LDRD can be generated.  Generate POP for accumulated registers
+   and reset regs_to_be_popped_mask.  SP should be handled here as
+   the results are unpredictable if the register being loaded is the same
+   as index register (in this case, SP).  PC is always the last
+   register being popped.  Hence, we don't have to worry about PC
+   here.  */
+arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+pc_in_list = false;
+regs_to_be_popped_mask = 0;
+continue;
+  }
+
+if (j == PC_REGNUM)
+  {
+gcc_assert (really_return);
+pc_in_list = 1;
+  }
+
+regs_to_be_popped_mask |= (1 << j);
+
+if ((j % 2) == 1
+&& (saved_regs_mask & (1 << (j - 1)))
+&& j != SP_REGNUM
+&& j != PC_REGNUM)
+  {
+ /* Generate a LDRD for register pair R_, R_.  The pattern
+generated here is
+[(SET SP, (PLUS SP, 8))
+ (SET R_, (MEM SP))
+ (SET R_, (MEM (PLUS SP, 4)))].  */
+ par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+
+ tmp = gen_rtx_SET (VOIDmode,
+stack_pointer_rtx,
+plus_constant (stack_pointer_rtx, 8));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ XVECEXP (par, 0, 0) = tmp;
+
+ tmp = gen_rtx_SET (SImode,
+gen_rtx_REG (SImode, j - 1),
+gen_frame_mem (SImode, stack_pointer_rtx));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ XVECEXP (par, 0, 1) = tmp;
+ dwarf = alloc_reg_note (REG_CFA_RESTORE,
+ gen_rtx_REG (SImode, j - 1),
+ dwarf);
+
+ tmp = gen_rtx_SET (SImode,
+ gen_rtx_REG (SImode, j),
+ gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4)));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ XVECEXP (par, 0, 2) = tmp;
+ dwarf = alloc_reg_note (REG_CFA_RESTORE,
+ gen_rtx_REG (SImode, j),
+ dwarf);
+
+ insn = emit_insn (par);
+ REG_NOTES (insn) = dwarf;
+ pc_in_list = false;
+ regs_to_be_popped_mask = 0;
+ dwarf = NULL_RTX;
+  }
+  }
+
+  if (regs_to_be_popped_mask)
+{
+  /* A single PC pop can happen here.  Take care of that.  */
+  if (pc_in_list && (regs_to_be_popped_mask == (1 << PC_REGNUM)))
+{
+  /* Only PC is to be popped.  */
+  par = 

Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-11-10 Thread Sameera Deshpande
Hi Richard,

thanks for your comments.


> +  if (GET_CODE (SET_SRC (elt = XVECEXP (op, 0, offset_adj))) == PLUS)
> 
> It's generally best not to use assignments within conditionals unless
> there is a strong reason otherwise (that normally implies something like
> being deep within a condition test where you only want to update the
> variable if some pre-conditions are true and that can't be easily
> factored out).
> 
> +  != (unsigned int) (first_dest_regno + regs_per_val *
> (i - base
> 
> Line length (split the line just before the '+' operator.
> 
> +  /* now show EVERY reg that will be restored, using a SET for each.  */
> 
> Capital letter at start of sentence.  Why is EVERY in caps?
> 
> +  saved_regs_mask = offsets->saved_regs_mask;
> +  for (i = 0, num_regs = 0; i <= LAST_ARM_REGNUM; i++)
> 
> blank line before the for loop.
> 
> +  /* It's illegal to do a pop for only one reg, so generate an ldr.  */
> 
> GCC coding standards suggest avoiding the use of 'illegal'.  Suggest
> changing that to 'Pop can only be used for more than one reg; so...'
> 
> +reg_names[REGNO (XEXP (XVECEXP (operands[0], 0, 2),
> 0))]);
> +
> +/* Skip over the first two elements and the one we just generated.
>  */
> +for (i = 3; i < (num_saves); i++)
> +  {
> +strcat (pattern, \", %|\");
> 
> +strcat (pattern,
> 
> +reg_names[REGNO (XEXP (XVECEXP (operands[0], 0, i),
> 0))]);
> +  }
> +
> +strcat (pattern, \"}\");
> +output_asm_insn (pattern, operands);
> +
> 
> +return \"\";
> +  }
> +  "
> 
> +  [(set_attr "type" "load4")]
> 
> There's a lot of trailing white space here.  Please remove.

Removed the trailing white space in the reworked patch:
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01009.html

> 
> +(define_insn "*thumb2_ldr_with_return"
> +  [(return)
> +   (set (reg:SI PC_REGNUM)
> +(mem:SI (post_inc:SI (match_operand:SI 0 "s_register_operand"
> "k"]
> +  "TARGET_THUMB2"
> +  "ldr%?\t%|pc, [%0], #4"
> +  [(set_attr "type" "load1")
> +   (set_attr "predicable" "yes")]
> +)
> +
> 
> This pattern doesn't seem to be used.  What's its purpose?

This pattern is generated from thumb2_expand_return, in:

+  if (num_regs == 1)
+{
+  rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+  rtx reg = gen_rtx_REG (SImode, PC_REGNUM);
+  rtx addr = gen_rtx_MEM (SImode,
+  gen_rtx_POST_INC (SImode, stack_pointer_rtx));
+  set_mem_alias_set (addr, get_frame_alias_set ());
+  XVECEXP (par, 0, 0) = ret_rtx;
+  XVECEXP (par, 0, 1) = gen_rtx_SET (SImode, reg, addr);
+  RTX_FRAME_RELATED_P (par) = 1;
+  emit_jump_insn (par);
+}

> 
> +static const struct { const char *const name; } table[]
> +  = { {\"d0\"}, {\"d1\"}, {\"d2\"}, {\"d3\"},
> 
> I'm not keen on having this table.  Generally the register names should
> be configurable depending on the assembler flavour and this patch
> defeats that.  Is there any way to rewrite this code so that it can use
> the standard operand methods for generating register names?

The updated patch was resent after comments from Ramana and Paul which
eliminates this table.

http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01009.html

I will take care of other formatting issues and will resend the patch.

> 
> In summary, this is mostly OK, apart from the last two items.
> 
> R.

- Thanks and regards,
  Sameera D.




Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-11-10 Thread Sameera Deshpande
On Thu, 2011-11-10 at 13:44 +, Richard Earnshaw wrote:
> On 28/09/11 17:15, Sameera Deshpande wrote:
> > Hi!
> > 
> > This patch generates Thumb2 epilogues in RTL form.
> > 
> > The work involves defining new functions, predicates and patterns along with
> > few changes in existing code:
> > * The load_multiple_operation predicate was found to be too restrictive for
> > integer loads as it required consecutive destination regs, so this
> > restriction was lifted.
> > * Variations of load_multiple_operation were required to handle cases 
> >- where SP must be the base register 
> >- where FP values were being loaded (which do require consecutive
> > destination registers)
> >- where PC can be in register-list (which requires return pattern along
> > with register loads).
> >   Hence, the common code was factored out into a new function in arm.c and
> > parameterised to show 
> >- whether consecutive destination regs are needed
> >- the data type being loaded 
> >- whether the base register has to be SP
> >- whether PC is in register-list
> > 
> > The patch is tested with arm-eabi with no regressions.
> > 
> > ChangeLog:
> > 
> > 2011-09-28  Ian Bolton 
> > Sameera Deshpande  
> >
> >* config/arm/arm-protos.h (load_multiple_operation_p): New
> > declaration.
> >  (thumb2_expand_epilogue): Likewise.
> >  (thumb2_output_return): Likewise
> >  (thumb2_expand_return): Likewise.
> >  (thumb_unexpanded_epilogue): Rename to... 
> >  (thumb1_unexpanded_epilogue): ...this 
> >* config/arm/arm.c (load_multiple_operation_p): New function. 
> >  (thumb2_emit_multi_reg_pop): Likewise.
> >  (thumb2_emit_vfp_multi_reg_pop): Likewise.
> >  (thumb2_expand_return): Likewise. 
> >  (thumb2_expand_epilogue): Likewise. 
> >  (thumb2_output_return): Likewise
> >  (thumb_unexpanded_epilogue): Rename to...
> >  ( thumb1_unexpanded_epilogue): ...this
> >* config/arm/arm.md (pop_multiple_with_stack_update): New pattern. 
> >  (pop_multiple_with_stack_update_and_return): Likewise.
> >  (thumb2_ldr_with_return): Likewise.
> >  (floating_point_pop_multiple_with_stack_update): Likewise.
> >  (return): Update condition and code for pattern.
> >  (arm_return): Likewise.
> >  (epilogue_insns): Likewise.
> >* config/arm/predicates.md (load_multiple_operation): Update
> > predicate.
> >  (load_multiple_operation_stack_and_return): New predicate. 
> >  (load_multiple_operation_stack): Likewise.
> >  (load_multiple_operation_stack_fp): Likewise.
> >* config/arm/thumb2.md (thumb2_return): Remove.
> >  (thumb2_rtl_epilogue_return): New pattern.
> > 
> > 
> > - Thanks and regards,
> >   Sameera D.
> > 
> > 
> > thumb2_rtl_epilogue_complete-27Sept.patch
> > 
> 
> +  if (GET_CODE (SET_SRC (elt = XVECEXP (op, 0, offset_adj))) == PLUS)
> 
> It's generally best not to use assignments within conditionals unless
> there is a strong reason otherwise (that normally implies something like
> being deep within a condition test where you only want to update the
> variable if some pre-conditions are true and that can't be easily
> factored out).
> 
> +  != (unsigned int) (first_dest_regno + regs_per_val *
> (i - base
> 
> Line length (split the line just before the '+' operator.
> 
> +  /* now show EVERY reg that will be restored, using a SET for each.  */
> 
> Capital letter at start of sentence.  Why is EVERY in caps?
> 
> +  saved_regs_mask = offsets->saved_regs_mask;
> +  for (i = 0, num_regs = 0; i <= LAST_ARM_REGNUM; i++)
> 
> blank line before the for loop.
> 
> +  /* It's illegal to do a pop for only one reg, so generate an ldr.  */
> 
> GCC coding standards suggest avoiding the use of 'illegal'.  Suggest
> changing that to 'Pop can only be used for more than one reg; so...'
> 
> +reg_names[REGNO (XEXP (XVECEXP (operands[0], 0, 2),
> 0))]);
> +
> +/* Skip over the first two elements and the one we just generated.
>  */
> +for (i = 3; i < (num_saves); i++)
> +  {
> +strcat (pattern, \", %|\");
> 
> +strcat (pattern,
> 
> +reg_names[REGNO (XEXP (XVECEXP (operands[0], 0, i),
> 0))]);
> +  }
> +
> +strc

[RFA/ARM][Patch]: Fix NEG_POOL_RANGE

2011-11-17 Thread Sameera Deshpande
Hi!

Please find attached the patch updating NEG_POOL_RANGE from 1008 to
1020 -(8 + ).

Tested with check-gcc with no regression.

The test-case failing for patch 'STRD generation instead of PUSH in A15
ARM prologue' (http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01158.html)
passes with this fix.

gcc/ChangeLog entry:

2011-11-17  Sameera Deshpande  

   * config/arm/arm.md (arm_movdi): Update NEG_POOL_RANGE.
 (movdf_soft_insn): Likewise.
   * config/arm/fpa.md (thumb2_movdf_fpa): Likewise.
   * config/arm/neon.md (neon_mov): Likewise.
   * config/arm/vfp.md (movdi_vfp): Likewise.
 (movdi_vfp_cortexa8): Likewise.
 (movdf_vfp): Likewise.

- Thanks and regards,
  Sameera D.
*** gcc/config/arm/.svn/text-base/arm.md.svn-base	Mon Oct 31 14:59:55 2011
--- gcc/config/arm/arm.md	Thu Nov 17 11:52:38 2011
*** (define_insn "*arm_movdi"
*** 5223,5229 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "arm_pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1008,*")
 (set_attr "thumb2_pool_range" "*,*,*,4096,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
--- 5223,5229 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "arm_pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1004,*")
 (set_attr "thumb2_pool_range" "*,*,*,4096,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
*** (define_insn "*movdf_soft_insn"
*** 6583,6589 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1008,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
  
--- 6583,6589 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1004,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
  
*** gcc/config/arm/.svn/text-base/neon.md.svn-base	Mon Oct 31 14:59:54 2011
--- gcc/config/arm/neon.md	Thu Nov 17 11:52:38 2011
*** (define_insn "*neon_mov"
*** 198,204 
(set_attr "insn" "*,*,*,*,*,*,mov,*,*")
(set_attr "length" "4,4,4,4,4,4,8,8,8")
(set_attr "pool_range" "*,*,*,1020,*,*,*,1020,*")
!   (set_attr "neg_pool_range" "*,*,*,1008,*,*,*,1008,*")])
  
  (define_insn "*neon_mov"
[(set (match_operand:VQXMOV 0 "nonimmediate_operand"
--- 198,204 
(set_attr "insn" "*,*,*,*,*,*,mov,*,*")
(set_attr "length" "4,4,4,4,4,4,8,8,8")
(set_attr "pool_range" "*,*,*,1020,*,*,*,1020,*")
!   (set_attr "neg_pool_range" "*,*,*,1004,*,*,*,1004,*")])
  
  (define_insn "*neon_mov"
[(set (match_operand:VQXMOV 0 "nonimmediate_operand"
*** (define_insn "*neon_mov"
*** 243,249 
 (set_attr "insn" "*,*,*,*,*,*,mov,*,*")
 (set_attr "length" "4,8,4,8,8,8,16,8,16")
 (set_attr "pool_range" "*,*,*,1020,*,*,*,1020,*")
!(set_attr "neg_pool_range" "*,*,*,1008,*,*,*,1008,*")])
  
  (define_expand "movti"
[(set (match_operand:TI 0 "nonimmediate_operand" "")
--- 243,249 
 (set_attr "insn" "*,*,*,*,*,*,mov,*,*")
 (set_attr "length" "4,8,4,8,8,8,16,8,16")
 (set_attr "pool_range" "*,*,*,1020,*,*,*,1020,*")
!(set_attr "neg_pool_range" "*,*,*,996,*,*,*,996,*")])
  
  (define_expand "movti"
[(set (match_operand:TI 0 "nonimmediate_operand" "")
*** gcc/config/arm/.svn/text-base/vfp.md.svn-base	Thu Sep 29 10:31:13 2011
--- gcc/config/arm/vfp.md	Thu Nov 17 11:52:38 2011
*** (define_insn "*movdi_vfp"
*** 178,184 
   (const_int 4))]
(const_int 4)))
 (set_attr "pool_range" "*,*,*,*,1020,4096,*,*,*,*,1020,*")
!(set_attr "neg_pool_range" "*,*,*,*,1008,0,*,*,*,*,1008,*")
 (set_attr "arch"   "t2,any,any,any,a,t2,any,any,any,any,any,an

Re: Ping! Re: [RFA/ARM][Patch 02/02]: ARM epilogues in RTL

2011-11-22 Thread Sameera Deshpande
On Fri, 2011-11-18 at 21:45 +, Ramana Radhakrishnan wrote:
> On 5 October 2011 17:04, Sameera Deshpande  wrote:
> > Ping!
> >
> > http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html
> 
> 
> This should now be rebased given your other changes to the Thumb2
> epilogues patch or is there a newer version of this patch . This
> doesn't apply cleanly on top of the other patch.

Ramana, I have already sent the rebased patch at
gcc.gnu.org/ml/gcc-patches/2011-11/msg00885.html.
Can you please check whether it works?

-- 




RE: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-11-22 Thread Sameera Deshpande
On Tue, 2011-11-22 at 01:55 +, Xinyu Qi wrote:
> At 2011-11-19 07:11:17,"Ramana Radhakrishnan" 
>  wrote:
> > On 10 November 2011 18:07, Sameera Deshpande
> >  wrote:
> > > Please find attached the reworked patch.
> > 
> > OK but for a very small bit .
> > 
> > I'll note that we need to add support for the iwMMXt registers but the
> > attached patch (untested) should be what is needed to support the iwMMxt
> > registers and this should be something that should be incorporated into your
> > ARM RTL epilogues patch as well. My understanding is that this is all you
> > should need to do as there is a wldrd as a part of *iwmmx*_movdi instruction
> > and therefore this should just work in this form.
> > 
> > Xinyu can you help in testing this patch once this is applied along with 
> > your
> > other patches to turn on Thumb2 for iwmmxt ?
> > 
> > cheers
> > Ramana
> 
> Hi, Ramana,
> 
> Though I don't fully understand this patch, I think it is unnecessary to 
> consider turn on Thumb2 for iwmmxt,
> because there is no chip that supports WMMX instructions with thumb2 mode.
> 
> Thanks,
> Xinyu
> 

Ramana, in that case, should I add the change you suggested in the ARM
RTL epilogue patch only?

-- 




Re: [RFA/ARM][Patch]: Fix NEG_POOL_RANGE

2011-11-24 Thread Sameera Deshpande
On Fri, 2011-11-18 at 23:12 +, Ramana Radhakrishnan wrote:
> On 17 November 2011 15:16, Sameera Deshpande  
> wrote:
> > Hi!
> >
> > Please find attached the patch updating NEG_POOL_RANGE from 1008 to
> > 1020 -(8 + ).
> 
> This is OK - can you add a comment around the neg_pool_range attribute
> in arm.md stating that the limit should essentially be  -
> (8 +  ?.
> 
Hi Ramana,

Thanks for your comment.
Please find attached the updated patch.

-- *** /work/spec-test/local-checkouts/gcc-fsf/gcc/config/arm/arm.md	2011-11-22 17:20:36.0 +
--- gcc/config/arm/arm.md	2011-11-22 17:14:48.0 +
***  (define_attr "enabled" "no,yes"
*** 268,274 
  ; can be placed.  If the distance is zero, then this insn will never
  ; reference the pool.
  ; NEG_POOL_RANGE is nonzero for insns that can reference a constant pool entry
! ; before its address.
  (define_attr "arm_pool_range" "" (const_int 0))
  (define_attr "thumb2_pool_range" "" (const_int 0))
  (define_attr "arm_neg_pool_range" "" (const_int 0))
--- 268,274 
  ; can be placed.  If the distance is zero, then this insn will never
  ; reference the pool.
  ; NEG_POOL_RANGE is nonzero for insns that can reference a constant pool entry
! ; before its address.  It is set to  - (8 + ).
  (define_attr "arm_pool_range" "" (const_int 0))
  (define_attr "thumb2_pool_range" "" (const_int 0))
  (define_attr "arm_neg_pool_range" "" (const_int 0))
*** (define_insn "*arm_movdi"
*** 5223,5229 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "arm_pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1008,*")
 (set_attr "thumb2_pool_range" "*,*,*,4096,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
--- 5223,5229 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "arm_pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1004,*")
 (set_attr "thumb2_pool_range" "*,*,*,4096,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
*** (define_insn "*movdf_soft_insn"
*** 6583,6589 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1008,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
  
--- 6583,6589 
[(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "pool_range" "*,*,*,1020,*")
!(set_attr "arm_neg_pool_range" "*,*,*,1004,*")
 (set_attr "thumb2_neg_pool_range" "*,*,*,0,*")]
  )
  
*** /work/spec-test/local-checkouts/gcc-fsf/gcc/config/arm/fpa.md	2011-11-22 17:18:37.0 +
--- gcc/config/arm/fpa.md	2011-11-22 17:14:48.0 +
*** (define_insn "*thumb2_movdf_fpa"
*** 671,677 
 (set_attr "type"
  "load1,store2,*,store2,load1,ffarith,ffarith,f_fpa_load,f_fpa_store,r_mem_f,f_mem_r")
 (set_attr "pool_range" "*,*,*,*,4092,*,*,1024,*,*,*")
!(set_attr "neg_pool_range" "*,*,*,*,0,*,*,1020,*,*,*")]
  )
  
  ;; Saving and restoring the floating point registers in the prologue should
--- 671,677 
 (set_attr "type"
  "load1,store2,*,store2,load1,ffarith,ffarith,f_fpa_load,f_fpa_store,r_mem_f,f_mem_r")
 (set_attr "pool_range" "*,*,*,*,4092,*,*,1024,*,*,*")
!(set_attr "neg_pool_range" "*,*,*,*,0,*,*,1008,*,*,*")]
  )
  
  ;; Saving and restoring the floating point registers in the prologue should
*** /work/spec-test/local-checkouts/gcc-fsf/gcc/config/arm/neon.md	2011-11-22 17:18:37.0 +
--- gcc/config/arm/neon.md	2011-11-22 17:14:48.0 +
*** (define_insn "*neon_mov"
*** 198,204 
(set_attr "insn" "*,*,*,*,*,*,mov,*,*")
(set_attr "length" "4,4,4,4,4,4,8,8,8")
(set_attr "pool_range" "*,*,*,1020,*,*,*,1020,*")
!   (set_attr "neg_pool_range" "*,*,*,1008,*,*,*,1008,*")])
  
  (define_insn "*neon_mov"
[(set (match_operand:VQXMOV 0 "nonimmediate_operand"
--- 198,204 
(set_attr "

[Patch] Fix Bug 51162

2011-11-24 Thread Sameera Deshpande
Hi,

Please find attached the patch fixing bugzilla issue
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51162.

The ARM back end implements vec_[load|store]_lanes as internal
function calls, and the function gimple_call_fn () returns NULL for
internal calls. Hence, this patch guards dereferences of 'fn' in
dump_gimple_call ().

Tests in gcc.dg/vect that were failing with a segmentation fault pass
with this patch.

gcc/Changelog entry:
2011-11-24  Sameera Deshpande  

   * gimple-pretty-print.c (dump_gimple_call): Check if fn is NULL
 before dereferencing.

-- diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index f0e7c50..6d96868 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -699,11 +699,12 @@ dump_gimple_call (pretty_printer *buffer, gimple gs, int spc, int flags)
 pp_string (buffer, " [tail call]");
 
   /* Dump the arguments of _ITM_beginTransaction sanely.  */
-  if (TREE_CODE (fn) == ADDR_EXPR)
+  if (fn != NULL && TREE_CODE (fn) == ADDR_EXPR)
 fn = TREE_OPERAND (fn, 0);
-  if (TREE_CODE (fn) == FUNCTION_DECL && decl_is_tm_clone (fn))
+  if (fn != NULL && TREE_CODE (fn) == FUNCTION_DECL && decl_is_tm_clone (fn))
 pp_string (buffer, " [tm-clone]");
-  if (TREE_CODE (fn) == FUNCTION_DECL
+  if (fn != NULL
+  && TREE_CODE (fn) == FUNCTION_DECL
   && DECL_BUILT_IN_CLASS (fn) == BUILT_IN_NORMAL
   && DECL_FUNCTION_CODE (fn) == BUILT_IN_TM_START
   && gimple_call_num_args (gs) > 0)

Added myself to MAINTAINERS: write after approval

2011-11-25 Thread Sameera Deshpande
Committed.

-- Index: MAINTAINERS
===
--- MAINTAINERS	(revision 181721)
+++ MAINTAINERS	(working copy)
@@ -345,6 +345,7 @@
 David Daney	david.da...@caviumnetworks.com
 Bud Davis	jmda...@link.com
 Chris Demetriou	c...@google.com
+Sameera Deshpande	sameera.deshpa...@arm.com
François Dumont	fdum...@gcc.gnu.org
 Benoit Dupont de Dinechin			benoit.dupont-de-dinec...@st.com
 Michael Eager	ea...@eagercon.com

Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-12-01 Thread Sameera Deshpande
On Tue, 2011-11-22 at 10:37 +, Ramana Radhakrishnan wrote:

> Xinyu: I seem to have mis-remembered that one of your patches was
> turning on Thumb2 for wMMX.
> >
> > Ramana, in that case, should I add the change you suggested in ARM RTL
> > epilogue patch only?
> 
> The comment in Thumb2 epilogues should remain and yes - it should be
> added to the ARM RTL epilogue patch only. I'm also ok with that being
> in with a #if 0 around it but given it's in the epilogue whoever tries
> turning on Thumb2 for iwMMX will surely notice that in the first
> testrun :)

Ramana,

Please find attached the updated patch, which sets the CFA_RESTORE note
for single-register pops and fixes new ICEs in check-gcc on trunk.

The patch is tested with check-gcc, bootstrap and check-gdb without
regression.

-- diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 23a29c6..2c38883 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -65,6 +65,7 @@ extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);
+extern bool load_multiple_operation_p (rtx, bool, enum machine_mode, bool);
 extern int neon_immediate_valid_for_move (rtx, enum machine_mode, rtx *, int *);
 extern int neon_immediate_valid_for_logic (rtx, enum machine_mode, int, rtx *,
 	   int *);
@@ -176,10 +177,13 @@ extern int arm_float_words_big_endian (void);
 
 /* Thumb functions.  */
 extern void arm_init_expanders (void);
-extern const char *thumb_unexpanded_epilogue (void);
+extern const char *thumb1_unexpanded_epilogue (void);
 extern void thumb1_expand_prologue (void);
 extern void thumb1_expand_epilogue (void);
 extern const char *thumb1_output_interwork (void);
+extern void thumb2_expand_epilogue (void);
+extern void thumb2_output_return (rtx);
+extern void thumb2_expand_return (void);
 #ifdef TREE_CODE
 extern int is_called_in_ARM_mode (tree);
 #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e3b0b88..40c8b44 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8906,6 +8906,139 @@ neon_valid_immediate (rtx op, enum machine_mode mode, int inverse,
 #undef CHECK
 }
 
+/* Return true if OP is a valid load multiple operation for MODE mode.
+   CONSECUTIVE is true if the registers in the operation must form
+   a consecutive sequence in the register bank.  STACK_ONLY is true
+   if the base register must be the stack pointer.  RETURN_PC is true
+   if value is to be loaded in PC.  */
+bool
+load_multiple_operation_p (rtx op, bool consecutive, enum machine_mode mode,
+   bool return_pc)
+{
+  HOST_WIDE_INT count = XVECLEN (op, 0);
+  unsigned dest_regno, first_dest_regno;
+  rtx src_addr;
+  HOST_WIDE_INT i = 1, base = 0;
+  HOST_WIDE_INT offset = 0;
+  rtx elt;
+  bool addr_reg_loaded = false;
+  bool update = false;
+  int reg_increment, regs_per_val;
+  int offset_adj;
+
+  /* If DFmode, we must be asking for consecutive,
+ since fldmdd can only do consecutive regs.  */
+  gcc_assert ((mode != DFmode) || consecutive);
+
+  /* Set up the increments and the regs per val based on the mode.  */
+  reg_increment = GET_MODE_SIZE (mode);
+  regs_per_val = mode == DFmode ? 2 : 1;
+  offset_adj = return_pc ? 1 : 0;
+
+  if (count <= 1
+  || GET_CODE (XVECEXP (op, 0, offset_adj)) != SET
+  || !REG_P (SET_DEST (XVECEXP (op, 0, offset_adj
+return false;
+
+  /* Check to see if this might be a write-back.  */
+  elt = XVECEXP (op, 0, offset_adj);
+  if (GET_CODE (SET_SRC (elt)) == PLUS)
+{
+  i++;
+  base = 1;
+  update = true;
+
+  /* The offset adjustment should be same as number of registers being
+ popped * size of single register.  */
+  if (!REG_P (SET_DEST (elt))
+  || !REG_P (XEXP (SET_SRC (elt), 0))
+  || !CONST_INT_P (XEXP (SET_SRC (elt), 1))
+  || INTVAL (XEXP (SET_SRC (elt), 1)) !=
+  ((count - 1 - offset_adj) * reg_increment))
+return false;
+}
+
+  i = i + offset_adj;
+  base = base + offset_adj;
+  /* Perform a quick check so we don't blow up below.  */
+  if (GET_CODE (XVECEXP (op, 0, i - 1)) != SET
+  || !REG_P (SET_DEST (XVECEXP (op, 0, i - 1)))
+  || !MEM_P (SET_SRC (XVECEXP (op, 0, i - 1
+return false;
+
+  /* If only one reg being loaded, success depends on the type:
+ FLDMDD can do just one reg, LDM must do at least two.  */
+  if (count <= i)
+return mode == DFmode ? true : false;
+
+  first_dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, i - 1)));
+  dest_regno = first_dest_regno;
+
+  src_addr = XEXP (SET_SRC (XVECEXP (op, 0, i - 1)), 0);
+
+  if (GET_CODE (src_addr) == PLUS)
+{
+  if (!CONST_INT_P (XEXP (src_addr, 1)))
+return false;
+  offset = INTVAL (XEXP (src_addr, 1));
+  src_addr = XEXP (src_addr, 0);
+}
+
+  if (!REG_P (src_addr))
+return false;
+
+  /* T

Re: [Patch] Fix Bug 51162

2011-12-02 Thread Sameera Deshpande
On Wed, 2011-11-30 at 19:43 +, Jason Merrill wrote:
> On 11/24/2011 05:42 AM, Sameera Deshpande wrote:
> > -  if (TREE_CODE (fn) == ADDR_EXPR)
> > +  if (fn != NULL && TREE_CODE (fn) == ADDR_EXPR)
> >  fn = TREE_OPERAND (fn, 0);
> > -  if (TREE_CODE (fn) == FUNCTION_DECL && decl_is_tm_clone (fn))
> > +  if (fn != NULL && TREE_CODE (fn) == FUNCTION_DECL && decl_is_tm_clone 
> > (fn))
> >  pp_string (buffer, " [tm-clone]");
> > -  if (TREE_CODE (fn) == FUNCTION_DECL
> > +  if (fn != NULL
> 
> I'd rather not add the null check so many times.  How about just 
> returning if fn is null?
> 
> Jason
> 

Jason,

Thanks for your comment.
Please find attached the reworked patch, which returns early if fn is NULL.

The patch is tested with check-gcc for ARM.

-- diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index f0e7c50..3b5f670 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -698,6 +698,9 @@ dump_gimple_call (pretty_printer *buffer, gimple gs, int spc, int flags)
   if (gimple_call_tail_p (gs))
 pp_string (buffer, " [tail call]");
 
+  if (fn == NULL)
+return;
+
   /* Dump the arguments of _ITM_beginTransaction sanely.  */
   if (TREE_CODE (fn) == ADDR_EXPR)
 fn = TREE_OPERAND (fn, 0);

Re: [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.

2011-12-30 Thread Sameera Deshpande
Hi!

Please find attached revised LDRD generation patch for A15 Thumb-2 mode.

Because of the major rework in ARM and Thumb-2 RTL epilogue patches,
this patch has undergone some changes.

The patch is tested with check-gcc, bootstrap and check-gdb without
regression.

Ok for trunk?

-- diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 64d5993..49aae52 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -201,6 +201,7 @@ extern void thumb_reload_in_hi (rtx *);
 extern void thumb_set_return_address (rtx, rtx);
 extern const char *thumb1_output_casesi (rtx *);
 extern const char *thumb2_output_casesi (rtx *);
+extern bool bad_reg_pair_for_thumb_ldrd_strd (rtx, rtx);
 #endif
 
 /* Defined in pe.c.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d671281..6d008c5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15847,6 +15847,154 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   REG_NOTES (par) = dwarf;
 }
 
+bool
+bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+  || GET_CODE (src2) != REG
+  || (REGNO (src1) == PC_REGNUM)
+  || (REGNO (src1) == SP_REGNUM)
+  || (REGNO (src1) == REGNO (src2))
+  || (REGNO (src2) == PC_REGNUM)
+  || (REGNO (src2) == SP_REGNUM));
+}
+
+/* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
+   number of registers are being popped, multiple LDRD patterns are created for
+   all register pairs.  If odd number of registers are popped, last register is
+   loaded by using LDR pattern.  */
+static void
+thumb2_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+  gcc_assert (really_return || ((saved_regs_mask & (1 << PC_REGNUM)) == 0));
+
+  /* We cannot generate ldrd for PC.  Hence, reduce the count if PC is
+ to be popped.  So, if num_regs is even, now it will become odd,
+ and we can generate pop with PC.  If num_regs is odd, it will be
+ even now, and ldr with return can be generated for PC.  */
+  if (really_return && (saved_regs_mask & (1 << PC_REGNUM)))
+num_regs--;
+
+  /* Var j iterates over all the registers to gather all the registers in
+ saved_regs_mask.  Var i gives index of saved registers in stack frame.
+ A PARALLEL RTX of register-pair is created here, so that pattern for
+ LDRD can be matched.  As PC is always last register to be popped, and
+ we have already decremented num_regs if PC, we don't have to worry
+ about PC in this loop.  */
+  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+
+/* Create RTX for memory load.  */
+reg = gen_rtx_REG (SImode, j);
+tmp = gen_rtx_SET (SImode,
+   reg,
+   gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4 * i)));
+RTX_FRAME_RELATED_P (tmp) = 1;
+
+if (i % 2 == 0)
+  {
+/* When saved-register index (i) is even, the RTX to be emitted is
+   yet to be created.  Hence create it first.  The LDRD pattern we
+   are generating is :
+   [ (SET (reg_t0) (MEM (PLUS (SP) (NUM
+ (SET (reg_t1) (MEM (PLUS (SP) (NUM + 4 ]
+   where target registers need not be consecutive.  */
+par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+dwarf = NULL_RTX;
+  }
+
+/* ith register is added in PARALLEL RTX.  If i is even, the reg_i is
+   added as 0th element and if i is odd, reg_i is added as 1st element
+   of LDRD pattern shown above.  */
+XVECEXP (par, 0, (i % 2)) = tmp;
+dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+if ((i % 2) == 1)
+  {
+/* When saved-register index (i) is odd, RTXs for both the registers
+   to be loaded are generated in above given LDRD pattern, and the
+   pattern can be emitted now.  */
+par = emit_insn (par);
+REG_NOTES (par) = dwarf;
+  }
+
+i++;
+  }
+
+  /* If the number of registers pushed is odd AND really_return is false OR
+ number of registers are even AND really_return is true, last register is
+ popped using LDR.  It can be PC as well.  Hence, adjust the stack first and
+ then LDR with post increment.  */
+
+  /* Increment the stack pointer, based on there being
+ num_regs 4-byte registers to restore.  */
+  tmp = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+  

Re: [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.

2011-12-30 Thread Sameera Deshpande
Hi Ramana,

Please find attached revised LDRD generation patch for A15 ARM mode.

Because of the major rework in ARM RTL epilogue patch, this patch has
undergone some changes.

The patch is tested with check-gcc, bootstrap and check-gdb without
regression.

Ok for trunk?

-- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d5c651c..46becfb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16101,6 +16101,135 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
   || (REGNO (src2) == SP_REGNUM));
 }
 
+/* LDRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg POP for all accumulated
+   registers, and then generates LDRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are loaded from
+   stack.  multi register POP takes care of lone registers as well.  However,
+   LDRD cannot be generated for PC, as results are unpredictable.  Hence, if PC
+   is in SAVED_REGS_MASK, generate multi-reg POP with RETURN or LDR with RETURN
+   depending upon number of registers in REGS_TO_BE_POPPED_MASK.  */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp;
+  unsigned long regs_to_be_popped_mask = 0;
+  bool pc_in_list = false;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = 0; i < num_regs; j++)
+if (saved_regs_mask & (1 << j))
+  {
+i++;
+if ((j % 2) == 0
+&& (saved_regs_mask & (1 << (j + 1)))
+&& (j + 1) != SP_REGNUM
+&& (j + 1) != PC_REGNUM
+&& regs_to_be_popped_mask)
+  {
+/* Current register and next register form register pair for which
+   LDRD can be generated.  Generate POP for accumulated registers
+   and reset regs_to_be_popped_mask.  SP should be handled here as
+   the results are unpredictable if register being stored is same
+   as index register (in this case, SP).  PC is always the last
+   register being popped.  Hence, we don't have to worry about PC
+   here.  */
+arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+pc_in_list = false;
+regs_to_be_popped_mask = 0;
+continue;
+  }
+
+if (j == PC_REGNUM)
+  {
+gcc_assert (really_return);
+pc_in_list = 1;
+  }
+
+regs_to_be_popped_mask |= (1 << j);
+
+if ((j % 2) == 1
+&& (saved_regs_mask & (1 << (j - 1)))
+&& j != SP_REGNUM
+&& j != PC_REGNUM)
+  {
+ /* Generate a LDRD for register pair R_, R_.  The pattern
+generated here is
+[(SET SP, (PLUS SP, 8))
+ (SET R_, (MEM SP))
+ (SET R_, (MEM (PLUS SP, 4)))].  */
+ par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+
+ tmp = gen_rtx_SET (VOIDmode,
+stack_pointer_rtx,
+plus_constant (stack_pointer_rtx, 8));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ XVECEXP (par, 0, 0) = tmp;
+
+ tmp = gen_rtx_SET (SImode,
+gen_rtx_REG (SImode, j - 1),
+gen_frame_mem (SImode, stack_pointer_rtx));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ XVECEXP (par, 0, 1) = tmp;
+ dwarf = alloc_reg_note (REG_CFA_RESTORE,
+ gen_rtx_REG (SImode, j - 1),
+ dwarf);
+
+ tmp = gen_rtx_SET (SImode,
+ gen_rtx_REG (SImode, j),
+ gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4)));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ XVECEXP (par, 0, 2) = tmp;
+ dwarf = alloc_reg_note (REG_CFA_RESTORE,
+ gen_rtx_REG (SImode, j),
+ dwarf);
+
+ insn = emit_insn (par);
+ REG_NOTES (insn) = dwarf;
+ pc_in_list = false;
+ regs_to_be_popped_mask = 0;
+ dwarf = NULL_RTX;
+  }
+  }
+
+  if (regs_to_be_popped_mask)
+{
+  /* single PC pop can happen here.  Take care of that.  */
+  if (pc_in_list && (regs_to_be_popped_mask == (1 << PC_REGNUM)))
+{
+  /* Only PC is to be popped.  */
+  par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+  XVECEXP (par

[Patch ARM] Fix PR 49069.

2012-01-24 Thread Sameera Deshpande
Hi,

Please find attached the patch fixing bug 49069.

This patch is tested with check-gcc on trunk and 4.6 without regression.
OK for trunk?
Is it fine to backport to 4.6 branch?

ChangeLog:
2012-01-24  Sameera Deshpande  
PR target/49069
* config/arm/arm.md (cstoredi4): Handle the case when both
operands are const_int.

gcc/testsuite/ChangeLog:
2012-01-24  Sameera Deshpande  
PR target/49069
* gcc.target/arm/pr49069.c: New compile-only test.

- Thanks and regards,
  Sameera D.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 751997f..e3dc98f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -7911,8 +7911,9 @@
  enum rtx_code code = GET_CODE (operands[1]);
 
  /* We should not have two constants.  */
- gcc_assert (GET_MODE (operands[2]) == DImode
-		 || GET_MODE (operands[3]) == DImode);
+ if (!(GET_MODE (operands[2]) == DImode || GET_MODE (operands[3]) == DImode)
+ && !(reload_in_progress || reload_completed))
+   operands[3] = force_reg (DImode, operands[3]);
 
 /* Flip unimplemented DImode comparisons to a form that
arm_gen_compare_reg can handle.  */
diff --git a/gcc/testsuite/gcc.target/arm/pr49069.c b/gcc/testsuite/gcc.target/arm/pr49069.c
new file mode 100644
index 000..3cc903e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr49069.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -mfloat-abi=softfp -mfpu=vfpv3-d16" } */
+
+__extension__ typedef unsigned long long int uint64_t;
+
+static int
+func2 (int a, int b)
+{
+  return a == 0 ? a : a / b;
+}
+
+int array1[1];
+const uint64_t array2[1] = { 1 };
+
+void
+foo (void)
+{
+  for (array1[0] = 0; array1[0] == 1; array1[0]++)
+{
+}
+  if (bar (array2[0] == func2 (array1[0], 0)) == 0)
+{
+}
+}

Re: [Patch, regrename] Fix PR87330 : ICE in scan_rtx_reg, at regrename.c

2018-10-30 Thread Sameera Deshpande
On Tue, 9 Oct 2018 at 04:08, Eric Botcazou  wrote:
>
> > Other notes need not be changed, as they don't hold renamed register
> > information.
> >
> > Ok for trunk?
>
> No, REG_DEAD & REG_UNUSED note must be recomputed by passes consuming them.
>
> > 2018-10-09 Sameera Deshpande  >
> > * gcc/regrename.c (regrename_do_replace): Add condition to alter
> > regname if note has same register marked dead in notes.
>
> No gcc/ prefix in gcc/ChangeLog.
>
> --
> Eric Botcazou

Hi Eric,

Thanks for your comments.

Please find attached the updated patch, which adds the data-flow note
problem so that the REG_DEAD and REG_UNUSED notes are recomputed.

As this change is made in falkor specific file, adding James and
Richard for review.

Ok for trunk?

Changelog:

2018-10-30 Sameera Deshpande

-- diff --git a/gcc/config/aarch64/falkor-tag-collision-avoidance.c b/gcc/config/aarch64/falkor-tag-collision-avoidance.c
index fb6568f..4ca9d66 100644
--- a/gcc/config/aarch64/falkor-tag-collision-avoidance.c
+++ b/gcc/config/aarch64/falkor-tag-collision-avoidance.c
@@ -805,6 +805,7 @@ execute_tag_collision_avoidance ()
   df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
   df_chain_add_problem (DF_UD_CHAIN);
   df_compute_regs_ever_live (true);
+  df_note_add_problem ();
   df_analyze ();
   df_set_flags (DF_DEFER_INSN_RESCAN);
 


Re: [AArch64] Add Saphira pipeline description.

2018-10-30 Thread Sameera Deshpande
On Fri, 26 Oct 2018 at 13:33, Sameera Deshpande
 wrote:
>
> Hi!
>
> Please find attached the patch to add a pipeline description for the
> Qualcomm Saphira core.  It is tested with a bootstrap and make check,
> with no regressions.
>
> Ok for trunk?
>
> gcc/
> Changelog:
>
> 2018-10-26 Sameera Deshpande 
>
> * config/aarch64/aarch64-cores.def (saphira): Use saphira pipeline.
> * config/aarch64/aarch64.md: Include saphira.md
> * config/aarch64/saphira.md: New file for pipeline description.
>
> --
> - Thanks and regards,
>   Sameera D.

Hi!

Please find attached updated patch.
Bootstrap and make check passed without regression. Ok for trunk?

-- 
- Thanks and regards,
  Sameera D.
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 3d876b8..8e4c646 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -90,7 +90,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2
 /* ARMv8.4-A Architecture Processors.  */
 
 /* Qualcomm ('Q') cores. */
-AARCH64_CORE("saphira", saphira,falkor,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
+AARCH64_CORE("saphira", saphira,saphira,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
 
 /* ARMv8-A big.LITTLE implementations.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..f951354 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -298,6 +298,7 @@
 (include "../arm/cortex-a57.md")
 (include "../arm/exynos-m1.md")
 (include "falkor.md")
+(include "saphira.md")
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
diff --git a/gcc/config/aarch64/saphira.md b/gcc/config/aarch64/saphira.md
new file mode 100644
index 000..bbf1c5c
--- /dev/null
+++ b/gcc/config/aarch64/saphira.md
@@ -0,0 +1,583 @@
+;; Saphira pipeline description
+;; Copyright (C) 2017-2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "saphira")
+
+;; Complex int instructions (e.g. multiply and divide) execute in the X
+;; pipeline.  Simple int instructions execute in the X, Y, Z and B pipelines.
+
+(define_cpu_unit "saphira_x" "saphira")
+(define_cpu_unit "saphira_y" "saphira")
+
+;; Branches execute in the Z or B pipeline or in one of the int pipelines depending
+;; on how complex it is.  Simple int insns (like movz) can also execute here.
+
+(define_cpu_unit "saphira_z" "saphira")
+(define_cpu_unit "saphira_b" "saphira")
+
+;; Vector and FP insns execute in the VX and VY pipelines.
+
+(define_automaton "saphira_vfp")
+
+(define_cpu_unit "saphira_vx" "saphira_vfp")
+(define_cpu_unit "saphira_vy" "saphira_vfp")
+
+;; Loads execute in the LD pipeline.
+;; Stores execute in the ST pipeline, for address, data, and
+;; vector data.
+
+(define_automaton "saphira_mem")
+
+(define_cpu_unit "saphira_ld" "saphira_mem")
+(define_cpu_unit "saphira_st" "saphira_mem")
+
+;; The GTOV and VTOG pipelines are for general to vector reg moves, and vice
+;; versa.
+
+(define_cpu_unit "saphira_gtov" "saphira")
+(define_cpu_unit "saphira_vtog" "saphira")
+
+;; Common reservation combinations.
+
+(define_reservation "saphira_vxvy" "saphira_vx|saphira_vy")
+(define_reservation "saphira_zb"   "saphira_z|saphira_b")
+(define_reservation "saphira_xyzb" "saphira_x|saphira_y|saphira_z|saphira_b")
+
+;; SIMD Floating-Point Instructions
+
+(define_insn_reservation "saphira_afp_1_vxvy" 1
+  (and (eq_attr "tune" "saphira")
+   (eq_attr "type" "neon_fp_neg_s,neon_fp_neg_d,neon_fp_abs_s,neon_fp_abs_d,neon_fp_neg_s_q,neon_fp_neg_d_q,neon_fp_abs_s_q,neon_fp_abs_d_q"))
+  "saphira_vxvy")
+
+(define_insn_reservatio

Re: [Patch, regrename] Fix PR87330 : ICE in scan_rtx_reg, at regrename.c

2018-10-30 Thread Sameera Deshpande
On Tue, 30 Oct 2018 at 16:16, Richard Earnshaw (lists)
 wrote:
>
> On 30/10/2018 10:09, Sameera Deshpande wrote:
> > On Tue, 9 Oct 2018 at 04:08, Eric Botcazou  wrote:
> >>
> >>> Other notes need not be changed, as they don't hold renamed register
> >>> information.
> >>>
> >>> Ok for trunk?
> >>
> >> No, REG_DEAD & REG_UNUSED note must be recomputed by passes consuming them.
> >>
> >>> 2018-10-09 Sameera Deshpande  >>>
> >>> * gcc/regrename.c (regrename_do_replace): Add condition to alter
> >>> regname if note has same register marked dead in notes.
> >>
> >> No gcc/ prefix in gcc/ChangeLog.
> >>
> >> --
> >> Eric Botcazou
> >
> > Hi Eric,
> >
> > Thanks for your comments.
> >
> > Please find attached updated patch invoking data flow for updating the
> > REG_DEAD and REG_UNUSED notes.
> >
> > As this change is made in falkor specific file, adding James and
> > Richard for review.
> >
> > Ok for trunk?
> >
> > Changelog:
> >
> > 2018-10-30 Sameera Deshpande  >
> > * gcc/config/aarch64/falkor-tag-collision-avoidance.c
> > (execute_tag_collision_avoidance): Invoke df_note_add_problem to
> > recompute REG_DEAD and REG_UNUSED notes before analysis.
> >
>
> 'Call df_note_add_problem.' is enough.
>
> OK with that change.
>
> R.
>
> >
> > bug87330.patch
> >
> > diff --git a/gcc/config/aarch64/falkor-tag-collision-avoidance.c 
> > b/gcc/config/aarch64/falkor-tag-collision-avoidance.c
> > index fb6568f..4ca9d66 100644
> > --- a/gcc/config/aarch64/falkor-tag-collision-avoidance.c
> > +++ b/gcc/config/aarch64/falkor-tag-collision-avoidance.c
> > @@ -805,6 +805,7 @@ execute_tag_collision_avoidance ()
> >df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> >df_chain_add_problem (DF_UD_CHAIN);
> >df_compute_regs_ever_live (true);
> > +  df_note_add_problem ();
> >df_analyze ();
> >df_set_flags (DF_DEFER_INSN_RESCAN);
> >
> >
>
Thanks Richard! Patch committed at revision 265618.
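
[Editorial note: the fix above can be illustrated with a toy model. This is plain Python, not GCC code; the instruction representation (a list of (defs, uses) register-name sets) is invented for illustration. REG_DEAD notes record the last use of a register, so a pass that renames registers without recomputing the notes leaves them pointing at the old names.]

```python
# Toy model of REG_DEAD bookkeeping -- not GCC code; the insn
# representation (a list of (defs, uses) register-name sets) is invented.

def compute_dead_notes(insns):
    """Mark a used register dead at insn i if no later insn uses it."""
    notes = []
    for i, (_defs, uses) in enumerate(insns):
        dead = {r for r in uses
                if not any(r in later_uses
                           for _d, later_uses in insns[i + 1:])}
        notes.append(dead)
    return notes

# def x1; ...; last use of x1
insns = [({"x1"}, set()), (set(), {"x1"})]
notes = compute_dead_notes(insns)
assert notes[1] == {"x1"}

# A renaming pass rewrites x1 -> x2 in the insns but not in the notes:
renamed = [({"x2"}, set()), (set(), {"x2"})]
assert notes[1] == {"x1"}  # stale: names a register that no longer appears
assert compute_dead_notes(renamed)[1] == {"x2"}  # recomputation fixes it
```

Recomputing the notes from the renamed instruction stream, which is what adding the df notes problem before df_analyze arranges, is what keeps later consumers of REG_DEAD/REG_UNUSED consistent.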

-- 
- Thanks and regards,
  Sameera D.


Re: [AArch64] Add Saphira pipeline description.

2018-10-31 Thread Sameera Deshpande
On Wed, 31 Oct 2018 at 00:37, James Greenhalgh  wrote:
>
> On Tue, Oct 30, 2018 at 05:12:58AM -0500, Sameera Deshpande wrote:
> > On Fri, 26 Oct 2018 at 13:33, Sameera Deshpande wrote:
> > >
> > > Hi!
> > >
> > > Please find attached the patch to add a pipeline description for the
> > > Qualcomm Saphira core.  It is tested with a bootstrap and make check,
> > > with no regressions.
> > >
> > > Ok for trunk?
>
> OK.
>
Thanks James, will commit the change.

> I wonder if there's anything we can do to improve maintainability in these
> cases where two pipeline models have considerable overlaps.
>
I agree that there is a need for some mechanism to maintain
architectures which have many commonalities.
However, Saphira and Falkor are too different to share much, and with
further performance tuning for Saphira the differences will become
more prominent.
I will commit this patch as is for Saphira, and will look at possible
factoring of the Saphira and Falkor pipelines into their commonalities
and differences once the tuning for Saphira is done.

> Thanks,
> James
>
> > >
> > > gcc/
> > > Changelog:
> > >
> > > 2018-10-26 Sameera Deshpande 
> > >
> > > * config/aarch64/aarch64-cores.def (saphira): Use saphira pipeline.
> > > * config/aarch64/aarch64.md: Include saphira.md
> > > * config/aarch64/saphira.md: New file for pipeline description.
> > >
> > > --
> > > - Thanks and regards,
> > >   Sameera D.
> >
> > Hi!
> >
> > Please find attached updated patch.
> > Bootstrap and make check passed without regression. Ok for trunk?
> >
> > --
> > - Thanks and regards,
> >   Sameera D.
>
> > diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> > index 3d876b8..8e4c646 100644
> > --- a/gcc/config/aarch64/aarch64-cores.def
> > +++ b/gcc/config/aarch64/aarch64-cores.def
> > @@ -90,7 +90,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2
> >  /* ARMv8.4-A Architecture Processors.  */
> >
> >  /* Qualcomm ('Q') cores. */
> > -AARCH64_CORE("saphira", saphira,falkor,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
> > +AARCH64_CORE("saphira", saphira,saphira,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
> >
> >  /* ARMv8-A big.LITTLE implementations.  */
> >
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index a014a01..f951354 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -298,6 +298,7 @@
> >  (include "../arm/cortex-a57.md")
> >  (include "../arm/exynos-m1.md")
> >  (include "falkor.md")
> > +(include "saphira.md")
> >  (include "thunderx.md")
> >  (include "../arm/xgene1.md")
> >  (include "thunderx2t99.md")
> > diff --git a/gcc/config/aarch64/saphira.md b/gcc/config/aarch64/saphira.md
> > new file mode 100644
> > index 000..bbf1c5c
> > --- /dev/null
> > +++ b/gcc/config/aarch64/saphira.md
> > @@ -0,0 +1,583 @@
> > +;; Saphira pipeline description
> > +;; Copyright (C) 2017-2018 Free Software Foundation, Inc.
> > +;;
> > +;; This file is part of GCC.
> > +;;
> > +;; GCC is free software; you can redistribute it and/or modify it
> > +;; under the terms of the GNU General Public License as published by
> > +;; the Free Software Foundation; either version 3, or (at your option)
> > +;; any later version.
> > +;;
> > +;; GCC is distributed in the hope that it will be useful, but
> > +;; WITHOUT ANY WARRANTY; without even the implied warranty of
> > +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +;; General Public License for more details.
> > +;;
> > +;; You should have received a copy of the GNU General Public License
> > +;; along with GCC; see the file COPYING3.  If not see
> > +;; <http://www.gnu.org/licenses/>.
> > +
> > +(define_automaton "saphira")
> > +
> > +;; Complex int instructions (e.g. multiply and divide) execute in the X
> > +;; pipeline.  Simple int instructions execute in the X, Y, Z and B 
> > pipelines.
> > +
> > +(define_cpu_unit "saphira_x" "saphira"

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-05-08 Thread Sameera Deshpande
On 1 May 2018 at 05:05, Sameera Deshpande  wrote:
> On 13 April 2018 at 20:21, James Greenhalgh  wrote:
>> On Fri, Apr 13, 2018 at 03:39:32PM +0100, Sameera Deshpande wrote:
>>> On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, <james.greenha...@arm.com> wrote:
>>> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
>>> > Hi,
>>> >
>>> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande <sameera.deshpa...@linaro.org>:
>>> > > Hi Christophe,
>>> > >
>>> > > Please find attached the updated patch with testcases.
>>> > >
>>> > > Ok for trunk?
>>> >
>>> > Thanks for the update.
>>> >
>>> > Since the new intrinsics are only available on aarch64, you want to
>>> > prevent the tests from running on arm.
>>> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two 
>>> > targets.
>>> > There are several examples on how to do that in that directory.
>>> >
>>> > I have also noticed that the tests fail at execution on aarch64_be.
>>>
>>> I think this is important to fix. We don't want the big-endian target to 
>>> have
>>> failing implementations of the Neon intrinsics. What is the nature of the
>>> failure?
>>>
>>> From what I can see, nothing in the patch prevents using these intrinsics
>>> on big-endian, so either the intrinsics behaviour is wrong (we have a wrong
>>> code bug), or the testcase expected behaviour is wrong.
>>>
>>> I don't think disabling the test for big-endian is the right fix. We should
>>> either fix the intrinsics, or fix the testcase.
>>>
>>> Thanks,
>>> James
>>>
>>> Hi James,
>>>
>>> As the tests assume the little endian order of elements while checking the
>>> results, the tests are failing for big endian targets. So, the failures are
>>> not because of intrinsic implementations, but because of the testcase.
>>
>> The testcase is a little hard to follow through the macros, but why would
>> this be the case?
>>
>> ld1 is deterministic on big and little endian for which elements will be
>> loaded from memory, as is st1.
>>
>> My expectation would be that:
>>
>>   int __attribute__ ((noinline))
>>   test_vld_u16_x3 ()
>>   {
>> uint16_t data[3 * 3];
>> uint16_t temp[3 * 3];
>> uint16x4x3_t vectors;
>> int i,j;
>> for (i = 0; i < 3 * 3; i++)
>>   data [i] = (uint16_t) 3*i;
>> asm volatile ("" : : : "memory");
>> vectors = vld1_u16_x3 (data);
>> vst1_u16 (temp, vectors.val[0]);
>> vst1_u16 (&temp[3], vectors.val[1]);
>> vst1_u16 (&temp[3 * 2], vectors.val[2]);
>> asm volatile ("" : : : "memory");
>> for (j = 0; j < 3 * 3; j++)
>>   if (temp[j] != data[j])
>> return 1;
>> return 0;
>>   }
>>
>> would work equally well for big- or little-endian.
>>
>> I think this is more likely to be an intrinsics implementation bug.
>>
>> Thanks,
>> James
>>
>
> Hi James,
>
> Please find attached the updated patch, which now passes for little as
> well as big endian.
> Ok for trunk?
>
> --
> - Thanks and regards,
>   Sameera D.
>
> gcc/Changelog:
>
> 2018-05-01  Sameera Deshpande  
>
>
> * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
> (st1x2): Likewise.
> (st1x3): Likewise.
> * config/aarch64/aarch64-simd.md
> (aarch64_ld1x3): New pattern.
> (aarch64_ld1_x3_): Likewise
> (aarch64_st1x2): Likewise
> (aarch64_st1_x2_): Likewise
> (aarch64_st1x3): Likewise
> (aarch64_st1_x3_): Likewise
> * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
> (vld1_s8_x3): Likewise.
> (vld1_u16_x3): Likewise.
> (vld1_s16_x3): Likewise.
> (vld1_u32_x3): Likewise.
> (vld1_s32_x3): Likewise.
> (vld1_u64_x3): Likewise.
> (vld1_s64_x3): Likewise.
> (vld1_f16_x3): Likewise.
> (vld1_f32_x3): Likewise.
> (vld1_f64_x3): Likewise.
> (vld1_p8_x3): Likewise.
> (vld1_p16_x3): Likewise.
> (vld1_p64_x3): Likewise.
> (vld1q_u8_x3): Likewise.
> (vld1q_s8_x3): Likewise.
> (vld1q_u16_x3): Likewise.
> (vld1q_
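
[Editorial note: the endianness argument in this thread can be sketched with a small model. This is plain Python, not the real intrinsics; the lane count is chosen to match uint16x4x3_t. ld1 and st1 transfer lanes in memory order on both endiannesses, so a load followed by per-vector stores must round-trip.]

```python
# Simplified lane-order model of vld1_u16_x3 / vst1_u16 -- plain Python,
# not the real intrinsics.  A "vector" is just a list of LANES elements.
LANES = 4  # uint16x4

def vld1_x3(mem, base):
    """Model of ld1 {vA.4h, vB.4h, vC.4h}, [base]: three consecutive
    vectors, lanes taken in memory order on big and little endian."""
    return [mem[base + i * LANES: base + (i + 1) * LANES] for i in range(3)]

def vst1(mem, base, vec):
    """Model of st1 {v.4h}, [base]: lanes written back in memory order."""
    mem[base: base + LANES] = vec

data = [3 * i for i in range(3 * LANES)]
temp = [0] * (3 * LANES)
vectors = vld1_x3(data, 0)
for i, v in enumerate(vectors):
    vst1(temp, i * LANES, v)
assert temp == data  # round-trips regardless of endianness
```

Because neither operation reorders lanes, a memory-compare check like the one in James's sketch is endian-independent, whereas a testcase that assumes a particular in-register lane numbering can see different results on aarch64_be.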

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-05-22 Thread Sameera Deshpande
On Tue 22 May, 2018, 9:26 PM James Greenhalgh wrote:

> On Mon, Apr 30, 2018 at 06:35:11PM -0500, Sameera Deshpande wrote:
> > On 13 April 2018 at 20:21, James Greenhalgh wrote:
> > > On Fri, Apr 13, 2018 at 03:39:32PM +0100, Sameera Deshpande wrote:
> > >> On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, <james.greenha...@arm.com> wrote:
> > >> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
> > >> > Hi,
> > >> >
> > >> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande <sameera.deshpa...@linaro.org>:
> > >> > > Hi Christophe,
> > >> > >
> > >> > > Please find attached the updated patch with testcases.
> > >> > >
> > >> > > Ok for trunk?
> > >> >
> > >> > Thanks for the update.
> > >> >
> > >> > Since the new intrinsics are only available on aarch64, you want to
> > >> > prevent the tests from running on arm.
> > >> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the
> two targets.
> > >> > There are several examples on how to do that in that directory.
> > >> >
> > >> > I have also noticed that the tests fail at execution on aarch64_be.
> > >>
> > >> I think this is important to fix. We don't want the big-endian target
> to have
> > >> failing implementations of the Neon intrinsics. What is the nature of
> the
> > >> failure?
> > >>
> > >> From what I can see, nothing in the patch prevents using these
> intrinsics
> > >> on big-endian, so either the intrinsics behaviour is wrong (we have a
> wrong
> > >> code bug), or the testcase expected behaviour is wrong.
> > >>
> > >> I don't think disabling the test for big-endian is the right fix. We
> should
> > >> either fix the intrinsics, or fix the testcase.
> > >>
> > >> Thanks,
> > >> James
> > >>
> > >> Hi James,
> > >>
> > >> As the tests assume the little endian order of elements while
> checking the
> > >> results, the tests are failing for big endian targets. So, the
> failures are
> > >> not because of intrinsic implementations, but because of the testcase.
> > >
> > > The testcase is a little hard to follow through the macros, but why
> would
> > > this be the case?
> > >
> > > ld1 is deterministic on big and little endian for which elements will
> be
> > > loaded from memory, as is st1.
> > >
> > > My expectation would be that:
> > >
> > >   int __attribute__ ((noinline))
> > >   test_vld_u16_x3 ()
> > >   {
> > > uint16_t data[3 * 3];
> > > uint16_t temp[3 * 3];
> > > uint16x4x3_t vectors;
> > > int i,j;
> > > for (i = 0; i < 3 * 3; i++)
> > >   data [i] = (uint16_t) 3*i;
> > > asm volatile ("" : : : "memory");
> > > vectors = vld1_u16_x3 (data);
> > > vst1_u16 (temp, vectors.val[0]);
> > > vst1_u16 (&temp[3], vectors.val[1]);
> > > vst1_u16 (&temp[3 * 2], vectors.val[2]);
> > > asm volatile ("" : : : "memory");
> > > for (j = 0; j < 3 * 3; j++)
> > >   if (temp[j] != data[j])
> > > return 1;
> > > return 0;
> > >   }
> > >
> > > would work equally well for big- or little-endian.
> > >
> > > I think this is more likely to be an intrinsics implementation bug.
> > >
> > > Thanks,
> > > James
> > >
> >
> > Hi James,
> >
> > Please find attached the updated patch, which now passes for little as
> > well as big endian.
> > Ok for trunk?
>
>
> OK.
>
> Thanks,
> James
>
> >
> > --
> > - Thanks and regards,
> >   Sameera D.
> >
> > gcc/Changelog:
> >
> > 2018-05-01  Sameera Deshpande  
> >
> >
> > * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
> > (st1x2): Likewise.
> > (st1x3): Likewise.
> > * config/aarch64/aarch64-simd.md
> > (aarch64_ld1x3): New pattern.
> > (aarch64_ld1_x3_): Likewise
> > (aarch64_st1x2): Likewise
> > (aarch64_st1_x2_): Likewise
&

[AARCH64] Add support of ARMv8.4 in saphira for Qualcomm server part

2018-05-29 Thread Sameera Deshpande
Hi!

Please find attached the patch to add support of ARMv8.4 in saphira
for Qualcomm server part. Tested on aarch64, without any regressions.

Ok for trunk?

-- 
- Thanks and regards,
  Sameera D.
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 33b96ca2861..e64d8314fa9 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -86,10 +86,10 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  AARCH64_FL_FOR
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa53, 0x41, 0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, 0xd0a, -1)
 
-/* ARMv8.3-A Architecture Processors.  */
+/* ARMv8.4-A Architecture Processors.  */
 
 /* Qualcomm ('Q') cores. */
-AARCH64_CORE("saphira", saphira,falkor,8_3A,  AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
+AARCH64_CORE("saphira", saphira,falkor,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
 
 /* ARMv8-A big.LITTLE implementations.  */
 


Re: [AARCH64] Add support of ARMv8.4 in saphira for Qualcomm server part

2018-05-29 Thread Sameera Deshpande
On Tue 29 May, 2018, 9:19 PM Siddhesh Poyarekar, <siddhesh.poyare...@linaro.org> wrote:

> On 29 May 2018 at 21:17, James Greenhalgh wrote:
> > On Tue, May 29, 2018 at 05:01:42AM -0500, Sameera Deshpande wrote:
> >> Hi!
> >>
> >> Please find attached the patch to add support of ARMv8.4 in saphira
> >> for Qualcomm server part. Tested on aarch64, without any regressions.
> >>
> >> Ok for trunk?
> >
> > I'm trusting that this is the right thing to do for this core. As
> Siddhesh
> > contributed the original patch; I'd like him to also sign off on this
> > modification.
> >
> > OK for trunk with Siddhesh's ack.
>
> LGTM too.
>
> Thanks,
> Siddhesh
>

Thanks James and Siddhesh.

- Sameera

>