Re: [Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-10-21 Thread Christophe LYON via Gcc-patches



On 20/10/2021 12:16, Richard Biener via Gcc-patches wrote:

On Wed, 20 Oct 2021, Andre Vieira (lists) wrote:


On 27/09/2021 12:54, Richard Biener via Gcc-patches wrote:

On Mon, 27 Sep 2021, Jirui Wu wrote:


Hi all,

I now use the type based on the specification of the intrinsic
instead of the type based on the formal argument.

I use signed integer vector types because the outputs of the neon builtins
that I am lowering are always signed. In addition, fcode and stmt
do not carry information on whether the result is signed.

Because I am replacing the stmt with new_stmt,
a VIEW_CONVERT_EXPR cast is already in the code where needed.
As a result, the resulting assembly code is correct.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? If OK can it be committed for me, I have no commit rights.

+   tree temp_lhs = gimple_call_lhs (stmt);
+   aarch64_simd_type_info simd_type
+     = aarch64_simd_types[mem_type];
+   tree elt_ptr_type = build_pointer_type (simd_type.eltype);
+   tree zero = build_zero_cst (elt_ptr_type);
+   gimple_seq stmts = NULL;
+   tree base = gimple_convert (&stmts, elt_ptr_type,
+                               args[0]);
+   new_stmt = gimple_build_assign (temp_lhs,
+                                   fold_build2 (MEM_REF,
+                                                TREE_TYPE (temp_lhs),
+                                                base,
+                                                zero));

this now uses the alignment info as on the LHS of the call by using
TREE_TYPE (temp_lhs) as type of the MEM_REF.  So for example

   typedef int foo __attribute__((vector_size(N),aligned(256)));

   foo tem = ld1 (ptr);

will now access *ptr as if it were aligned to 256 bytes.  But I'm sure
the ld1 intrinsic documents the required alignment (either it's the
natural alignment of the vector type loaded or element alignment?).

For element alignment you'd do sth like

tree access_type = build_aligned_type (vector_type, TYPE_ALIGN
(TREE_TYPE (vector_type)));

for example.

Richard.
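
(A minimal sketch of the element-alignment variant Richard suggests, for reference only: it reuses names from the quoted patch (mem_type, base, zero, stmt), and the itype field of aarch64_simd_type_info is assumed here to hold the internal vector type. This is not the committed code.)

  /* Build an access type that carries only the element alignment, so the
     MEM_REF does not inherit any over-alignment from the user's LHS type.  */
  tree vector_type = aarch64_simd_types[mem_type].itype;   /* assumed field */
  tree access_type
    = build_aligned_type (vector_type,
                          TYPE_ALIGN (TREE_TYPE (vector_type)));
  new_stmt
    = gimple_build_assign (gimple_call_lhs (stmt),
                           fold_build2 (MEM_REF, access_type, base, zero));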

Hi,

I'm taking over this patch from Jirui.

I've decided to use the vector type stored in aarch64_simd_type_info, since
that should always have the correct alignment.

To be fair though, I do wonder whether this is actually needed right now,
since the way we cast the inputs and outputs of these __builtins in
arm_neon.h should prevent these issues; but it is more future-proof. You
could also argue that people might use the __builtins directly, though I'd
think that would be at their own risk.
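
(For context, this is roughly what the arm_neon.h wrappers look like; the snippet below is reconstructed from memory to illustrate the casting being referred to, and may not match the header exactly.)

__extension__ extern __inline int32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vld1q_s32 (const int32_t *__a)
{
  /* The argument is cast to the builtin's expected pointer type, which is
     why callers of the intrinsic rarely hit type or alignment mismatches;
     calling the __builtin directly skips this cast.  */
  return __builtin_aarch64_ld1v4si ((const __builtin_aarch64_simd_si *) __a);
}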

Is this OK?

Yes, this variant looks OK.



Hi Andre,

These new tests fail on aarch64_be:

gcc.target/aarch64/fmla_intrinsic_1.c scan-assembler-times fmadd\\td[0-9]+, d[0-9]+, d[0-9]+, d[0-9]+ 2
gcc.target/aarch64/fmla_intrinsic_1.c scan-assembler-times fmla\\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.d\\[[0-9]+\\] 2
gcc.target/aarch64/fmls_intrinsic_1.c scan-assembler-times fmls\\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.d\\[[0-9]+\\] 2
gcc.target/aarch64/fmls_intrinsic_1.c scan-assembler-times fmsub\\td[0-9]+, d[0-9]+, d[0-9]+, d[0-9]+ 2
gcc.target/aarch64/fmul_intrinsic_1.c scan-assembler-times fmul\\td[0-9]+, d[0-9]+, d[0-9]+ 2
gcc.target/aarch64/fmul_intrinsic_1.c scan-assembler-times fmul\\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.d\\[[0-9]+\\] 2


I've also noticed that:

FAIL: gcc.target/aarch64/vect-vca.c execution test
on aarch64 with -mabi=ilp32

Christophe

Kind regards,
Andre



Re: [Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-10-20 Thread Richard Biener via Gcc-patches
On Wed, 20 Oct 2021, Andre Vieira (lists) wrote:

> On 27/09/2021 12:54, Richard Biener via Gcc-patches wrote:
> > On Mon, 27 Sep 2021, Jirui Wu wrote:
> >
> >> Hi all,
> >>
> >> I now use the type based on the specification of the intrinsic
> >> instead of the type based on the formal argument.
> >>
> >> I use signed integer vector types because the outputs of the neon builtins
> >> that I am lowering are always signed. In addition, fcode and stmt
> >> do not carry information on whether the result is signed.
> >>
> >> Because I am replacing the stmt with new_stmt,
> >> a VIEW_CONVERT_EXPR cast is already in the code where needed.
> >> As a result, the resulting assembly code is correct.
> >>
> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >>
> >> Ok for master? If OK can it be committed for me, I have no commit rights.
> > +   tree temp_lhs = gimple_call_lhs (stmt);
> > +   aarch64_simd_type_info simd_type
> > +     = aarch64_simd_types[mem_type];
> > +   tree elt_ptr_type = build_pointer_type (simd_type.eltype);
> > +   tree zero = build_zero_cst (elt_ptr_type);
> > +   gimple_seq stmts = NULL;
> > +   tree base = gimple_convert (&stmts, elt_ptr_type,
> > +                               args[0]);
> > +   new_stmt = gimple_build_assign (temp_lhs,
> > +                                   fold_build2 (MEM_REF,
> > +                                                TREE_TYPE (temp_lhs),
> > +                                                base,
> > +                                                zero));
> >
> > this now uses the alignment info as on the LHS of the call by using
> > TREE_TYPE (temp_lhs) as type of the MEM_REF.  So for example
> >
> >   typedef int foo __attribute__((vector_size(N),aligned(256)));
> >
> >   foo tem = ld1 (ptr);
> >
> > will now access *ptr as if it were aligned to 256 bytes.  But I'm sure
> > the ld1 intrinsic documents the required alignment (either it's the
> > natural alignment of the vector type loaded or element alignment?).
> >
> > For element alignment you'd do sth like
> >
> >tree access_type = build_aligned_type (vector_type, TYPE_ALIGN
> > (TREE_TYPE (vector_type)));
> >
> > for example.
> >
> > Richard.
> Hi,
> 
> I'm taking over this patch from Jirui.
> 
> I've decided to use the vector type stored in aarch64_simd_type_info, since
> that should always have the correct alignment.
> 
> To be fair though, I do wonder whether this is actually needed right now,
> since the way we cast the inputs and outputs of these __builtins in
> arm_neon.h should prevent these issues; but it is more future-proof. You
> could also argue that people might use the __builtins directly, though I'd
> think that would be at their own risk.
> 
> Is this OK?

Yes, this variant looks OK.

> Kind regards,
> Andre
> 


Re: [Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-10-20 Thread Andre Vieira (lists) via Gcc-patches

On 27/09/2021 12:54, Richard Biener via Gcc-patches wrote:

On Mon, 27 Sep 2021, Jirui Wu wrote:


Hi all,

I now use the type based on the specification of the intrinsic
instead of the type based on the formal argument.

I use signed integer vector types because the outputs of the neon builtins
that I am lowering are always signed. In addition, fcode and stmt
do not carry information on whether the result is signed.

Because I am replacing the stmt with new_stmt,
a VIEW_CONVERT_EXPR cast is already in the code where needed.
As a result, the resulting assembly code is correct.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? If OK can it be committed for me, I have no commit rights.

+   tree temp_lhs = gimple_call_lhs (stmt);
+   aarch64_simd_type_info simd_type
+     = aarch64_simd_types[mem_type];
+   tree elt_ptr_type = build_pointer_type (simd_type.eltype);
+   tree zero = build_zero_cst (elt_ptr_type);
+   gimple_seq stmts = NULL;
+   tree base = gimple_convert (&stmts, elt_ptr_type,
+                               args[0]);
+   new_stmt = gimple_build_assign (temp_lhs,
+                                   fold_build2 (MEM_REF,
+                                                TREE_TYPE (temp_lhs),
+                                                base,
+                                                zero));

this now uses the alignment info as on the LHS of the call by using
TREE_TYPE (temp_lhs) as type of the MEM_REF.  So for example

  typedef int foo __attribute__((vector_size(N),aligned(256)));

  foo tem = ld1 (ptr);

will now access *ptr as if it were aligned to 256 bytes.  But I'm sure
the ld1 intrinsic documents the required alignment (either it's the
natural alignment of the vector type loaded or element alignment?).

For element alignment you'd do sth like

   tree access_type = build_aligned_type (vector_type, TYPE_ALIGN
(TREE_TYPE (vector_type)));

for example.

Richard.

Hi,

I'm taking over this patch from Jirui.

I've decided to use the vector type stored in aarch64_simd_type_info, 
since that should always have the correct alignment.
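
(A rough sketch of that approach, under the assumption that the lowering uses the itype field of aarch64_simd_type_info as the MEM_REF access type and that the fold now has a gimple_stmt_iterator gsi available for inserting the conversion; the real hunk is in the attached diff, which is truncated below, and may differ in detail.)

  aarch64_simd_type_info simd_type = aarch64_simd_types[mem_type];
  tree elt_ptr_type = build_pointer_type (simd_type.eltype);
  tree zero = build_zero_cst (elt_ptr_type);
  gimple_seq stmts = NULL;
  /* Convert the incoming pointer to a pointer to the element type so the
     MEM_REF carries the intrinsic's TBAA information.  */
  tree base = gimple_convert (&stmts, elt_ptr_type, args[0]);
  if (stmts)
    gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
  /* Use the canonical vector type, whose alignment is the natural one for
     the mode, as the access type.  */
  new_stmt
    = gimple_build_assign (gimple_call_lhs (stmt),
                           fold_build2 (MEM_REF, simd_type.itype, base, zero));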


To be fair though, I do wonder whether this is actually needed right now,
since the way we cast the inputs and outputs of these __builtins in
arm_neon.h should prevent these issues; but it is more future-proof. You
could also argue that people might use the __builtins directly, though I'd
think that would be at their own risk.


Is this OK?

Kind regards,
Andre

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 1a507ea59142d0b5977b0167abfe9a58a567adf7..a815e4cfbccab692ca688ba87c71b06c304abbfb 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -46,6 +46,7 @@
 #include "emit-rtl.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "gimple-fold.h"
 
 #define v8qi_UP  E_V8QImode
 #define v4hi_UP  E_V4HImode
@@ -2399,11 +2400,65 @@ aarch64_general_fold_builtin (unsigned int fcode, tree type,
   return NULL_TREE;
 }
 
+enum aarch64_simd_type
+get_mem_type_for_load_store (unsigned int fcode)
+{
+  switch (fcode)
+  {
+VAR1 (LOAD1, ld1 , 0, LOAD, v8qi)
+VAR1 (STORE1, st1 , 0, STORE, v8qi)
+  return Int8x8_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v16qi)
+VAR1 (STORE1, st1 , 0, STORE, v16qi)
+  return Int8x16_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4hi)
+VAR1 (STORE1, st1 , 0, STORE, v4hi)
+  return Int16x4_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v8hi)
+VAR1 (STORE1, st1 , 0, STORE, v8hi)
+  return Int16x8_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v2si)
+VAR1 (STORE1, st1 , 0, STORE, v2si)
+  return Int32x2_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4si)
+VAR1 (STORE1, st1 , 0, STORE, v4si)
+  return Int32x4_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v2di)
+VAR1 (STORE1, st1 , 0, STORE, v2di)
+  return Int64x2_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4hf)
+VAR1 (STORE1, st1 , 0, STORE, v4hf)
+  return Float16x4_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v8hf)
+VAR1 (STORE1, st1 , 0, STORE, v8hf)
+  return Float16x8_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4bf)
+VAR1 (STORE1, st1 , 0, STORE, v4bf)
+  return Bfloat16x4_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v8bf)
+VAR1 (STORE1, st1 , 0, STORE, v8bf)
+  return Bfloat16x8_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v2sf)
+VAR1 (STORE1, st1 , 0, STORE, v2sf)
+  return Float32x2_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4sf)
+VAR1 (STORE1, st1 , 0, STORE, v4sf)
+  return Float32x4_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v2df)
+VAR1 (STORE1, st1 , 0, STORE, v2df)
+  return Float64x2_t;
+default:
+  gcc_unreachable ();
+  break;
+  }
+}
+
 /* Try to fold STMT, given that it's a call to the built-in function with
subcode FCODE.  Return the new statement on success and null on
failure.  */
 gimple *
-aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt)

RE: [Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-09-27 Thread Richard Biener via Gcc-patches
On Mon, 27 Sep 2021, Jirui Wu wrote:

> Hi all,
> 
> I now use the type based on the specification of the intrinsic
> instead of the type based on the formal argument.
> 
> I use signed integer vector types because the outputs of the neon builtins
> that I am lowering are always signed. In addition, fcode and stmt
> do not carry information on whether the result is signed.
> 
> Because I am replacing the stmt with new_stmt,
> a VIEW_CONVERT_EXPR cast is already in the code where needed.
> As a result, the resulting assembly code is correct.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master? If OK can it be committed for me, I have no commit rights.

+   tree temp_lhs = gimple_call_lhs (stmt);
+   aarch64_simd_type_info simd_type
+     = aarch64_simd_types[mem_type];
+   tree elt_ptr_type = build_pointer_type (simd_type.eltype);
+   tree zero = build_zero_cst (elt_ptr_type);
+   gimple_seq stmts = NULL;
+   tree base = gimple_convert (&stmts, elt_ptr_type,
+                               args[0]);
+   new_stmt = gimple_build_assign (temp_lhs,
+                                   fold_build2 (MEM_REF,
+                                                TREE_TYPE (temp_lhs),
+                                                base,
+                                                zero));

this now uses the alignment info as on the LHS of the call by using
TREE_TYPE (temp_lhs) as type of the MEM_REF.  So for example

 typedef int foo __attribute__((vector_size(N),aligned(256)));

 foo tem = ld1 (ptr);

will now access *ptr as if it were aligned to 256 bytes.  But I'm sure
the ld1 intrinsic documents the required alignment (either it's the
natural alignment of the vector type loaded or element alignment?).

For element alignment you'd do sth like

  tree access_type = build_aligned_type (vector_type, TYPE_ALIGN 
(TREE_TYPE (vector_type)));

for example.

Richard.


> Thanks,
> Jirui
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, September 16, 2021 2:59 PM
> > To: Jirui Wu 
> > Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com; i...@airs.com; Richard
> > Sandiford 
> > Subject: Re: [Patch][GCC][middle-end] - Lower store and load neon builtins 
> > to
> > gimple
> > 
> > On Thu, 16 Sep 2021, Jirui Wu wrote:
> > 
> > > Hi all,
> > >
> > > This patch lowers the vld1 and vst1 variants of the store and load
> > > neon builtins functions to gimple.
> > >
> > > The changes in this patch covers:
> > > * Replaces calls to the vld1 and vst1 variants of the builtins
> > > * Uses MEM_REF gimple assignments to generate better code
> > > * Updates test cases to prevent over optimization
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master? If OK can it be committed for me, I have no commit rights.
> > 
> > +   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
> > +   fold_build2 (MEM_REF,
> > +   TREE_TYPE
> > +   (gimple_call_lhs (stmt)),
> > +   args[0], build_int_cst
> > +   (TREE_TYPE (args[0]), 0)));
> > 
> > you are using TBAA info based on the formal argument type that might have
> > pointer conversions stripped.  Instead you should use a type based on the
> > specification of the intrinsics (or the builtins).
> > 
> > Likewise for the type of the access (mind alignment info there!).
> > 
> > Richard.
> > 
> > > Thanks,
> > > Jirui
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/aarch64/aarch64-builtins.c
> > (aarch64_general_gimple_fold_builtin):
> > > lower vld1 and vst1 variants of the neon builtins
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/aarch64/fmla_intrinsic_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/fmls_intrinsic_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/fmul_intrinsic_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/mla_intrinsic_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/mls_intrinsic_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/mul_intrinsic_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/simd/vmul_elem_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/vclz.c:
> > > replace macro with function to prevent over optimization
> > > * gcc.target/aarch64/vneg_s.c:
> > > replace macro with function to prevent over optimization
> > >
> > 
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


RE: [Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-09-27 Thread Jirui Wu via Gcc-patches
Hi all,

I now use the type based on the specification of the intrinsic
instead of the type based on the formal argument.

I use signed integer vector types because the outputs of the neon builtins
that I am lowering are always signed. In addition, fcode and stmt
do not carry information on whether the result is signed.

Because I am replacing the stmt with new_stmt,
a VIEW_CONVERT_EXPR cast is already in the code where needed.
As a result, the resulting assembly code is correct.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? If OK can it be committed for me, I have no commit rights.

Thanks,
Jirui

> -Original Message-
> From: Richard Biener 
> Sent: Thursday, September 16, 2021 2:59 PM
> To: Jirui Wu 
> Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com; i...@airs.com; Richard
> Sandiford 
> Subject: Re: [Patch][GCC][middle-end] - Lower store and load neon builtins to
> gimple
> 
> On Thu, 16 Sep 2021, Jirui Wu wrote:
> 
> > Hi all,
> >
> > This patch lowers the vld1 and vst1 variants of the store and load
> > neon builtins functions to gimple.
> >
> > The changes in this patch covers:
> > * Replaces calls to the vld1 and vst1 variants of the builtins
> > * Uses MEM_REF gimple assignments to generate better code
> > * Updates test cases to prevent over optimization
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? If OK can it be committed for me, I have no commit rights.
> 
> +   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
> +   fold_build2 (MEM_REF,
> +   TREE_TYPE
> +   (gimple_call_lhs (stmt)),
> +   args[0], build_int_cst
> +   (TREE_TYPE (args[0]), 0)));
> 
> you are using TBAA info based on the formal argument type that might have
> pointer conversions stripped.  Instead you should use a type based on the
> specification of the intrinsics (or the builtins).
> 
> Likewise for the type of the access (mind alignment info there!).
> 
> Richard.
> 
> > Thanks,
> > Jirui
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-builtins.c
> (aarch64_general_gimple_fold_builtin):
> > lower vld1 and vst1 variants of the neon builtins
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/fmla_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/fmls_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/fmul_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/mla_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/mls_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/mul_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/simd/vmul_elem_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/vclz.c:
> > replace macro with function to prevent over optimization
> > * gcc.target/aarch64/vneg_s.c:
> > replace macro with function to prevent over optimization
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 119f67d4e4c9e70e9ab1de773b42a171fbdf423e..124fd35caa01ef4a83dae0626f83efb62c053bd1 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -46,6 +46,7 @@
 #include "emit-rtl.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "gimple-fold.h"
 
 #define v8qi_UP  E_V8QImode
 #define v4hi_UP  E_V4HImode
@@ -2387,6 +2388,59 @@ aarch64_general_fold_builtin (unsigned int fcode, tree type,
   return NULL_TREE;
 }
 
+enum aarch64_simd_type
+get_mem_type_for_load_store (unsigned int fcode)
+{
+  switch (fcode)
+  {
+VAR1 (LOAD1, ld1 , 0, LOAD, v8qi)
+VAR1 (STORE1, st1 , 0, STORE, v8qi)
+  return Int8x8_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v16qi)
+VAR1 (STORE1, st1 , 0, STORE, v16qi)
+  return Int8x16_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4hi)
+VAR1 (STORE1, st1 , 0, STORE, v4hi)
+  return Int16x4_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v8hi)
+VAR1 (STORE1, st1 , 0, STORE, v8hi)
+  return Int16x8_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v2si)
+VAR1 (STORE1, st1 , 0, STORE, v2si)
+  return Int32x2_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4si)
+VAR1 (STORE1, st1 , 

Re: [Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-09-16 Thread Richard Biener via Gcc-patches
On Thu, 16 Sep 2021, Jirui Wu wrote:

> Hi all,
> 
> This patch lowers the vld1 and vst1 variants of the
> store and load neon builtins functions to gimple.
> 
> The changes in this patch covers:
> * Replaces calls to the vld1 and vst1 variants of the builtins
> * Uses MEM_REF gimple assignments to generate better code
> * Updates test cases to prevent over optimization
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master? If OK can it be committed for me, I have no commit rights.

+   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+   fold_build2 (MEM_REF,
+   TREE_TYPE
+   (gimple_call_lhs (stmt)),
+   args[0], build_int_cst
+   (TREE_TYPE (args[0]), 0)));

you are using TBAA info based on the formal argument type that might
have pointer conversions stripped.  Instead you should use a type
based on the specification of the intrinsics (or the builtins).

Likewise for the type of the access (mind alignment info there!).

Richard.
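
(To make the intent of the lowering concrete: a hand-written sketch of the GIMPLE before and after such a fold, with approximate dump syntax; this is not compiler output.)

  /* before:  _1 = __builtin_aarch64_ld1v4si (a_2(D));
              __builtin_aarch64_st1v4si (b_3(D), _1);

     after:   _1 = MEM[(int32_t *)a_2(D)];
              MEM[(int32_t *)b_3(D)] = _1;

     The loads and stores become ordinary vector accesses that later GIMPLE
     passes (CSE, DSE, alias analysis) can reason about, instead of opaque
     target builtin calls.  */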

> Thanks,
> Jirui
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-builtins.c 
> (aarch64_general_gimple_fold_builtin):
> lower vld1 and vst1 variants of the neon builtins
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/fmla_intrinsic_1.c:
> prevent over optimization
> * gcc.target/aarch64/fmls_intrinsic_1.c:
> prevent over optimization
> * gcc.target/aarch64/fmul_intrinsic_1.c:
> prevent over optimization
> * gcc.target/aarch64/mla_intrinsic_1.c:
> prevent over optimization
> * gcc.target/aarch64/mls_intrinsic_1.c:
> prevent over optimization
> * gcc.target/aarch64/mul_intrinsic_1.c:
> prevent over optimization
> * gcc.target/aarch64/simd/vmul_elem_1.c:
> prevent over optimization
> * gcc.target/aarch64/vclz.c:
> replace macro with function to prevent over optimization
> * gcc.target/aarch64/vneg_s.c:
> replace macro with function to prevent over optimization
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)