Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-09-04 Thread Steve Ellcey
On Tue, 2018-09-04 at 17:20 +, Wilco Dijkstra wrote:
> External Email
> 
> Hi Steve,
> 
> The latest version compiles the examples I used correctly, so it looks fine
> from that perspective (but see comments below). However the key point of
> the ABI is to enable better code generation when calling a vector function,
> and that will likely require further changes that may conflict with this 
> patch.
> Do you have patches for that work outstanding? It seems best to do this in
> one go.
> 
> Also did you check there is no regression in code generation for non-vector
> functions?

Yes, I checked for regressions in non-vector functions.  They way this
is written, there should be zero changes to any generated code that
does not involve vector functions.

I have some changes to improve code generation for SIMD functions but
it is not ready to go yet.  The main change I made to improve code
generation is to modify TARGET_HARD_REGNO_CALL_PART_CLOBBERED to accept
an instruction pointer so we could check to see if the call being made
is to a 'normal' function or a 'simd' function and return the correct
value.  Unfortunately, there are many places where we don't have the
call insn available and have to make the pessimistic guess.

I also changed get_call_reg_set_usage so that it could return different
default sets of what registers changed based on whether or not the
function being called was normal or simd.  Again, there are some places
where we have to make a pessimistic guess.

> 
> 
> +/* Return 1 if the register is used by the epilogue.  We need to say the
> +   return register is used, but only after epilogue generation is complete.
> +   Note that in the case of sibcalls, the values "used by the epilogue" are
> +   considered live at the start of the called function.
> +
> +   For SIMD functions we need to return 1 for FP registers that are saved and
> +   restored by a function but not zero in call_used_regs.  If we do not do
> +   this optimizations may remove the restore of the register.  */
> +
> +int
> +aarch64_epilogue_uses (int regno)
> +{
> +  if (epilogue_completed && (regno) == LR_REGNUM)
> +return 1;
> +  if (aarch64_simd_decl_p (cfun->decl) && FP_SIMD_SAVED_REGNUM_P
> (regno))
> +return 1;
> +  return 0;
> +}
> 
> I'm not convinced this is a good idea. It suggests GCC doesn't have the 
> correct set
> of caller/callee-save registers for vector functions (I don't see a change to 
> update
> CALL_USED_REGISTERS or aarch64_hard_regno_call_part_clobbered), which could
> lead to all kinds of interesting issues.

The problem is that there is no single correct set of caller/callee-
saved registers anymore.  There is one set for normal functions and a
different set for simd functions.  GCC isn't set up to have two
different caller/callee saved ABIs on one target, there is only
one CALL_USED_REGISTERS macro used to set one call_used_regs
array.  Should V16-V23 be set in CALL_USED_REGISTERS?  For 'normal'
functions, yes.  For SIMD functions, no.

> 
> +/* Return false for non-leaf SIMD functions in order to avoid
> +   shrink-wrapping them.  Doing this will lose the necessary
> +   save/restore of FP registers.  */
> +
> +bool
> +aarch64_use_simple_return_insn_p (void)
> +{
> +  if (aarch64_simd_decl_p (cfun->decl) && !crtl->is_leaf)
> +return false;
> +
> +  return true;
> +}
> 
> Such as this...
> 
> Wilco


Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-09-04 Thread Wilco Dijkstra
Hi Steve,

The latest version compiles the examples I used correctly, so it looks fine
from that perspective (but see comments below). However the key point of 
the ABI is to enable better code generation when calling a vector function,
and that will likely require further changes that may conflict with this patch.
Do you have patches for that work outstanding? It seems best to do this in
one go.

Also did you check there is no regression in code generation for non-vector
functions? 


+/* Return 1 if the register is used by the epilogue.  We need to say the
+   return register is used, but only after epilogue generation is complete.
+   Note that in the case of sibcalls, the values "used by the epilogue" are
+   considered live at the start of the called function.
+
+   For SIMD functions we need to return 1 for FP registers that are saved and
+   restored by a function but not zero in call_used_regs.  If we do not do 
+   this optimizations may remove the restore of the register.  */
+
+int
+aarch64_epilogue_uses (int regno)
+{
+  if (epilogue_completed && (regno) == LR_REGNUM)
+    return 1;
+  if (aarch64_simd_decl_p (cfun->decl) && FP_SIMD_SAVED_REGNUM_P (regno))
+    return 1;
+  return 0;
+}

I'm not convinced this is a good idea. It suggests GCC doesn't have the correct 
set
of caller/callee-save registers for vector functions (I don't see a change to 
update
CALL_USED_REGISTERS or aarch64_hard_regno_call_part_clobbered), which could
lead to all kinds of interesting issues.

+/* Return false for non-leaf SIMD functions in order to avoid
+   shrink-wrapping them.  Doing this will lose the necessary
+   save/restore of FP registers.  */
+
+bool
+aarch64_use_simple_return_insn_p (void)
+{
+  if (aarch64_simd_decl_p (cfun->decl) && !crtl->is_leaf)
+    return false;
+
+  return true;
+}

Such as this...

Wilco


Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-09-04 Thread Kyrill Tkachov

Hi Steve,

On 20/08/18 18:37, Steve Ellcey wrote:

On Tue, 2018-08-07 at 12:15 -0500, Segher Boessenkool wrote:

> > +/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[01234567\]" } }
> > */
> That's [0-7] but maybe you find [01234567] more readable here.

Segher,  I fixed all the issues you pointed out except this one.  Since
there are some uses of non consecutive numbers in one of the tests I
decided to leave [01234567] instead of using [0-7].  Here is the
latest version of the patch, there are no semantic changes, just
syntactical ones to address the issues that you pointed out.

Steve Ellcey
sell...@cavium.com



One more comment below.
It looks sensible enough to me otherwise, but I haven't done a deep review of 
the logic.

Thanks,
Kyrill



2018-08-20  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Check for simd function.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (V23_REGNUM) New constant.
(simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
("storewb_pair_): Ditto.

2018-08-20  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.


@@ -4469,6 +4536,9 @@ aarch64_restore_callee_saves (machine_mode mode,
   unsigned regno;
   unsigned regno2;
   poly_int64 offset;
+  HOST_WIDE_INT mode_size;
+
+  gcc_assert (GET_MODE_SIZE (mode).is_constant(&mode_size));
 
I think you want to use the gcc_unreachable approach here:


  if (!GET_MODE_SIZE (mode).is_constant(&mode_size))
gcc_unreachable ();

just in case the gcc_assert is compiled out.





Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-31 Thread Steve Ellcey


Ping.  Any feedback from the Aarch64 maintainers?

Steve Ellcey
sell...@cavium.com



On Mon, 2018-08-20 at 10:37 -0700, Steve Ellcey wrote:
> On Tue, 2018-08-07 at 12:15 -0500, Segher Boessenkool wrote:
> 
> > 
> > > 
> > > +/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[01234567\]" }
> > > }
> > > */
> > That's [0-7] but maybe you find [01234567] more readable here.
> Segher,  I fixed all the issues you pointed out except this one.
>  Since
> there are some uses of non consecutive numbers in one of the tests I 
> decided to leave [01234567] instead of using [0-7].  Here is the 
> latest version of the patch, there are no semantic changes, just
> syntactical ones to address the issues that you pointed out.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 2018-08-20  Steve Ellcey  
> 
>   * config/aarch64/aarch64-protos.h
> (aarch64_use_simple_return_insn_p):
>   New prototype.
>   (aarch64_epilogue_uses): Ditto.
>   * config/aarch64/aarch64.c (aarch64_attribute_table): New
> array.
>   (aarch64_simd_decl_p): New function.
>   (aarch64_reg_save_mode): New function.
>   (aarch64_is_simd_call_p): New function.
>   (aarch64_function_ok_for_sibcall): Check for simd calls.
>   (aarch64_layout_frame): Check for simd function.
>   (aarch64_gen_storewb_pair): Handle E_TFmode.
>   (aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
>   (aarch64_gen_loadwb_pair): Handle E_TFmode.
>   (aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
>   (aarch64_gen_store_pair): Handle E_TFmode.
>   (aarch64_gen_load_pair): Ditto.
>   (aarch64_save_callee_saves): Handle different mode sizes.
>   (aarch64_restore_callee_saves): Ditto.
>   (aarch64_components_for_bb): Check for simd function.
>   (aarch64_epilogue_uses): New function.
>   (aarch64_process_components): Check for simd function.
>   (aarch64_expand_prologue): Ditto.
>   (aarch64_expand_epilogue): Ditto.
>   (aarch64_expand_call): Ditto.
>   (TARGET_ATTRIBUTE_TABLE): New define.
>   * config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
>   (FP_SIMD_SAVED_REGNUM_P): New macro.
>   * config/aarch64/aarch64.md (V23_REGNUM) New constant.
>   (simple_return): New define_expand.
>   (load_pair_dw_tftf): New instruction.
>   (store_pair_dw_tftf): Ditto.
>   (loadwb_pair_): Ditto.
>   ("storewb_pair_): Ditto.
> 
> 2018-08-20  Steve Ellcey  
> 
>   * gcc.target/aarch64/torture/aarch64-torture.exp: New file.
>   * gcc.target/aarch64/torture/simd-abi-1.c: New test.
>   * gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
>   * gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
>   * gcc.target/aarch64/torture/simd-abi-4.c: Ditto.


Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-20 Thread Steve Ellcey
On Tue, 2018-08-07 at 12:15 -0500, Segher Boessenkool wrote:

> > +/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[01234567\]" } }
> > */
> That's [0-7] but maybe you find [01234567] more readable here.

Segher,  I fixed all the issues you pointed out except this one.  Since
there are some uses of non consecutive numbers in one of the tests I 
decided to leave [01234567] instead of using [0-7].  Here is the 
latest version of the patch, there are no semantic changes, just
syntactical ones to address the issues that you pointed out.

Steve Ellcey
sell...@cavium.com


2018-08-20  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Check for simd function.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (V23_REGNUM) New constant.
(simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
("storewb_pair_): Ditto.

2018-08-20  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index af5db9c..99c962f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -423,6 +423,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
@@ -507,6 +508,8 @@ void aarch64_split_simd_move (rtx, rtx);
 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx);
 
+extern int aarch64_epilogue_uses (int);
+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
    rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..9cffec8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1027,6 +1027,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1405,6 +1414,31 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_decl_p (tree fndecl)
+{
+  if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL)
+return true;
+  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL)
+return false;
+  return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl;
+}
+
+/* Return the mode a register save/restore should use.  DImode for integer
+   registers, DFmode for FP registers in non-SIMD functions (they only save
+   the bottom half of a 128 bit register), or TF

Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-07 Thread Segher Boessenkool
Hi!

Some very trivial comments...

On Mon, Aug 06, 2018 at 03:13:52PM -0700, Steve Ellcey wrote:
>   (aarch64_components_for_bb): Check for simd function.
>   (aarch64_epilogue_uses): New function.
>   (aarch64_process_components): Ditto.
>   (aarch64_expand_prologue): Ditto.
>   (aarch64_expand_epilogue): Ditto.
>   (aarch64_expand_call): Ditto.

Those "Ditto"s are meant to read "Check for simd function", not "New
function".

> +int
> +aarch64_epilogue_uses (int regno)
> +{
> +  if (epilogue_completed && (regno) == LR_REGNUM)

Those parens around "regno" are a bit superfluous, makes the reader think
there is more going on than there is ;-)

> +(define_insn "store_pair_dw_tftf"
> +  [(set (match_operand:TF 0 "aarch64_mem_pair_operand" "=Ump")
> + (match_operand:TF 1 "register_operand" "w"))
> +   (set (match_operand:TF 2 "memory_operand" "=m")
> + (match_operand:TF 3 "register_operand" "w"))]
> +   "TARGET_SIMD &&
> +rtx_equal_p (XEXP (operands[2], 0),

The && should be on the continuation line.

> +(define_insn "loadwb_pair_"
> +  [(parallel
> +[(set (match_operand:P 0 "register_operand" "=k")
> +  (plus:P (match_operand:P 1 "register_operand" "0")
> +  (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> + (set (match_operand:TX 2 "register_operand" "=w")
> +  (mem:TX (match_dup 1)))
> + (set (match_operand:TX 3 "register_operand" "=w")
> +  (mem:TX (plus:P (match_dup 1)
> +  (match_operand:P 5 "const_int_operand" "n"])]
> +  "TARGET_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (mode)"
> +  "ldp\\t%q2, %q3, [%1], %4"
> +  [(set_attr "type" "neon_ldp_q")]
> +)

Here you don't indent with tabs.

> +/* { dg-final { scan-assembler "\[ \t\]stp\tq10, q11" } } */

If you use {} instead of "" you don't need to backtick everything.
Also, instead of [ \t] you can probably use [[:space:]] which has
shorthand \s .

> +/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[01234567\]" } } */

That's [0-7] but maybe you find [01234567] more readable here.

> +/* { dg-final { scan-assembler-not "\[ \t\]str\t" } } */

You can also use \m and \M for start resp. end of word:

/* { dg-final { scan-assembler-not {\mstr\M} } } */

(or if you like double quotes better that is:

/* { dg-final { scan-assembler-not "\\mstr\\M" } } */

but why would you want that ;-) )


Segher


Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-07 Thread Steve Ellcey
I have a question about my own patch.  In doing testing I realized
that if I use the aarch64_vector_pcs attribute on a platform without
SIMD (i.e. -march=armv8.1-a+nosimd) I get an ICE.  That is obviously
not what we want but I was wondering what the right behaviour is.

We certainly don't want to generate code for a SIMD function on a
target that does not support SIMD, but is it OK to declare something
with the simd or aarch64_vector_pcs attribute if it is never used or called?
Or should even the declaration of a function with the simd/aarch64_vector_pcs
attribute result in a compilation error?  I don't think we want to
complain about simd declarations because those could show up in the system
headers like mathcalls.h even when not used.  Or should those be ifdef'ed
in some way based on the __ARM_NEON macro so they don't show up when not
compiling for SIMD?

Any ideas on where should this check be done, I thought the
TARGET_OPTION_VALID_ATTRIBUTE_P hook might be the right place, but that
seems to be specific to the 'target' attribute only, not attributes
in general.

Steve Ellcey
sell...@cavium.com


Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-06 Thread Steve Ellcey
Thanks for the feedback Kyrill.  I have updated my patch and attached
the new version to this email.  The one change I did not make was to
remove load_pair_dw_tftf and store_pair_dw_tftf and use the
load_pair and vec_store_pair
patterns.  Having to add two new iterators to remove two instructions
didn't seem like much of an advantage and I liked having the names
for tf to match the other load_pair_dw/store_pair_dw instructions.
If you feel strongly about this I can go ahead and make that change
though.

With respect to the new tests iterating over the various options, I am
not sure how that works but it does.  I copied the aarch64-torture.exp
file from one of the other targets and verified that it ran the
tests with -O0, -O1, -O2, '-O3 -g', -Os, '-O2 -flto -fno-use-linker-
plugin -flto-partition=none', and '-O2 -flto -fuse-linker-plugin -fno-
fat-lto-objects'.

Steve Ellcey
sell...@cavium.com


2018-08-06  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (V23_REGNUM) New constant.
(simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
("storewb_pair_): Ditto.


2018-08-06  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index af5db9c..99c962f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -423,6 +423,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
@@ -507,6 +508,8 @@ void aarch64_split_simd_move (rtx, rtx);
 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx);
 
+extern int aarch64_epilogue_uses (int);
+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
    rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..7ab2a60 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1027,6 +1027,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1405,6 +1414,31 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_decl_p (tree fndecl)
+{
+  if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL)
+return true;
+  if (lookup_attribute ("simd", DECL_A

Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-01 Thread Kyrill Tkachov

Hi Steve,

On 31/07/18 23:24, Steve Ellcey wrote:

Here is a new version of my patch to support the Aarch64 SIMD ABI [1]
in GCC.  I think this is complete enought to be considered for check
in.  I wrote a few new tests and put them in a new gcc.target/torture
directory so they would be run with multiple optimization options.  I
also verified that there are no regressions in the GCC testsuite.

The significant difference between the standard ARM ABI and the SIMD
ABI is that in the normal ABI a callee saves only the lower 64 bits of
registers V8-V15, in the SIMD ABI the callee must save all 128 bits of
registers V8-V23.

As I mentioned in my RFC, I intend to (eventually) follow this patch
with two more, one to define the TARGET_SIMD_CLONE* macros and one to
improve the GCC register allocation/usage when calling SIMD
functions.  Right now, a caller calling a SIMD function will save more
registers than it needs to because some of those registers will also be
saved by the callee.



Thanks for working on this!
A few comments on the patch inline.

Thanks,
Kyrill


Steve Ellcey
sell...@cavium.com

[1] https://developer.arm.com/products/software-development-tools/hpc/a
rm-compiler-for-hpc/vector-function-abi

Compiler ChangeLog:

2018-07-31  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (V23_REGNUM) New constant.
(simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
("storewb_pair_): Ditto.

Testsuite ChangeLog:

2018-07-31  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.


gcc-vect-abi.patch


diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index af5db9c..99c962f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -423,6 +423,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
@@ -507,6 +508,8 @@ void aarch64_split_simd_move (rtx, rtx);

 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx);
 
+extern int aarch64_epilogue_uses (int);

+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
   rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..9e6827a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1027,6 +1027,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */

+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor 

[Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-07-31 Thread Steve Ellcey
Here is a new version of my patch to support the Aarch64 SIMD ABI [1]
in GCC.  I think this is complete enought to be considered for check
in.  I wrote a few new tests and put them in a new gcc.target/torture
directory so they would be run with multiple optimization options.  I
also verified that there are no regressions in the GCC testsuite.

The significant difference between the standard ARM ABI and the SIMD
ABI is that in the normal ABI a callee saves only the lower 64 bits of 
registers V8-V15, in the SIMD ABI the callee must save all 128 bits of
registers V8-V23.

As I mentioned in my RFC, I intend to (eventually) follow this patch
with two more, one to define the TARGET_SIMD_CLONE* macros and one to
improve the GCC register allocation/usage when calling SIMD
functions.  Right now, a caller calling a SIMD function will save more
registers than it needs to because some of those registers will also be
saved by the callee.

Steve Ellcey
sell...@cavium.com

[1] https://developer.arm.com/products/software-development-tools/hpc/a
rm-compiler-for-hpc/vector-function-abi

Compiler ChangeLog:

2018-07-31  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (V23_REGNUM) New constant.
(simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
("storewb_pair_): Ditto.

Testsuite ChangeLog:

2018-07-31  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index af5db9c..99c962f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -423,6 +423,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
@@ -507,6 +508,8 @@ void aarch64_split_simd_move (rtx, rtx);
 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx);
 
+extern int aarch64_epilogue_uses (int);
+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
    rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..9e6827a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1027,6 +1027,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1405,6 +1414,26 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a v