Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics

2013-04-17 Thread Richard Earnshaw

On 17/04/13 12:06, Kyrylo Tkachov wrote:

Hi Julian,


From: Julian Brown [mailto:jul...@codesourcery.com]
Sent: 13 April 2013 15:04
To: Julian Brown
Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana
Radhakrishnan
Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and
vcvt_f32_f16 NEON intrinsics

On Fri, 12 Apr 2013 20:09:39 +0100
Julian Brown  wrote:


On Fri, 12 Apr 2013 15:19:18 +0100
Kyrylo Tkachov  wrote:


Hi all,

This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
to arm_neon.h through the generator ML scripts and also adds the
built-ins to which the intrinsics will map to. The generator ML
scripts are updated and used to generate the relevant .texi
documentation, arm_neon.h and the tests in gcc.target/arm/neon .


FWIW, some of the changes to neon*.ml can be simplified somewhat --

my

attempt at an improved version of those bits is attached. I'm still
not too happy with mode_suffix, but these new instructions require
adding semantics to parts of the generator program which weren't
really very well-defined to start with :-). I appreciate that it's a
bit of a tangle...


I thought of an improvement to the mode_suffix part from the last
version of the patch, so here it is. I'm done fiddling with this now,
so back to you!


Thanks for looking at it! My Ocaml-fu is rather limited.
It does look cleaner now.
Here it is together with all the other parts of the patch, plus some
minor formatting changes.

Ok for trunk now?

gcc/ChangeLog
2013-04-17  Kyrylo Tkachov  
 Julian Brown  

* config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF.
(TB_DREG): Add T_V4HF.
(v4hf_UP): New macro.
(neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW.
(arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN,
NEON_FLOAT_NARROW.
Handle initialisation of V4HF. Adjust initialisation of reinterpret
built-ins.
(arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN,
NEON_FLOAT_NARROW.
(arm_vector_mode_supported_p): Handle V4HF.
(arm_mangle_map): Handle V4HFmode.
* config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF.
* config/arm/arm_neon_builtins.def: Add entries for
vcvtv4hfv4sf, vcvtv4sfv4hf.
* config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern.
(neon_vcvtv4hfv4sf): Likewise.
* config/arm/neon-gen.ml: Handle half-precision floating point
features.
* config/arm/neon-testgen.ml: Handle Requires_FP_bit feature.
* config/arm/arm_neon.h: Regenerate.
* config/arm/neon.ml (type elts): Add F16.
(type vectype): Add T_float16x4, T_floatHF.
(type vecmode): Add V4HF.
(type features): Add Requires_FP_bit feature.
(elt_width): Handle F16.
(elt_class): Likewise.
(elt_of_class_width): Likewise.
(mode_of_elt): Refactor.
(type_for_elt): Handle F16, fix error messages.
(vectype_size): Handle T_float16x4.
(vcvt_sh): New function.
(ops): Add entries for vcvt_f16_f32, vcvt_f32_f16.
(string_of_vectype): Handle T_floatHF, T_float16, T_float16x4.
(string_of_mode): Handle V4HF.
* doc/arm-neon-intrinsics.texi: Regenerate.


gcc/testsuite/ChangeLog
2013-04-17  Kyrylo Tkachov  
 Julian Brown  

* gcc.target/arm/neon/vcvtf16_f32.c: New test. Generated.
* gcc.target/arm/neon/vcvtf32_f16.c: Likewise.


neon-vcvt-intrinsics.patch



Please give Julian 24 hours for one final review of the Ocaml bits. 
Otherwise OK.


R.




RE: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics

2013-04-17 Thread Kyrylo Tkachov
Hi Julian,

> From: Julian Brown [mailto:jul...@codesourcery.com]
> Sent: 13 April 2013 15:04
> To: Julian Brown
> Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana
> Radhakrishnan
> Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and
> vcvt_f32_f16 NEON intrinsics
> 
> On Fri, 12 Apr 2013 20:09:39 +0100
> Julian Brown  wrote:
> 
> > On Fri, 12 Apr 2013 15:19:18 +0100
> > Kyrylo Tkachov  wrote:
> >
> > > Hi all,
> > >
> > > This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
> > > to arm_neon.h through the generator ML scripts and also adds the
> > > built-ins to which the intrinsics will map to. The generator ML
> > > scripts are updated and used to generate the relevant .texi
> > > documentation, arm_neon.h and the tests in gcc.target/arm/neon .
> >
> > FWIW, some of the changes to neon*.ml can be simplified somewhat --
> my
> > attempt at an improved version of those bits is attached. I'm still
> > not too happy with mode_suffix, but these new instructions require
> > adding semantics to parts of the generator program which weren't
> > really very well-defined to start with :-). I appreciate that it's a
> > bit of a tangle...
> 
> I thought of an improvement to the mode_suffix part from the last
> version of the patch, so here it is. I'm done fiddling with this now,
> so back to you!

Thanks for looking at it! My Ocaml-fu is rather limited.
It does look cleaner now.
Here it is together with all the other parts of the patch, plus some
minor formatting changes.

Ok for trunk now?

gcc/ChangeLog
2013-04-17  Kyrylo Tkachov  
Julian Brown  

* config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF.
(TB_DREG): Add T_V4HF.
(v4hf_UP): New macro.
(neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW.
(arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN,
NEON_FLOAT_NARROW.
Handle initialisation of V4HF. Adjust initialisation of reinterpret
built-ins.
(arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN,
NEON_FLOAT_NARROW.
(arm_vector_mode_supported_p): Handle V4HF.
(arm_mangle_map): Handle V4HFmode.
* config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF.
* config/arm/arm_neon_builtins.def: Add entries for
vcvtv4hfv4sf, vcvtv4sfv4hf.
* config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern.
(neon_vcvtv4hfv4sf): Likewise.
* config/arm/neon-gen.ml: Handle half-precision floating point
features.
* config/arm/neon-testgen.ml: Handle Requires_FP_bit feature.
* config/arm/arm_neon.h: Regenerate.
* config/arm/neon.ml (type elts): Add F16.
(type vectype): Add T_float16x4, T_floatHF.
(type vecmode): Add V4HF.
(type features): Add Requires_FP_bit feature.
(elt_width): Handle F16.
(elt_class): Likewise.
(elt_of_class_width): Likewise.
(mode_of_elt): Refactor.
(type_for_elt): Handle F16, fix error messages.
(vectype_size): Handle T_float16x4.
(vcvt_sh): New function.
(ops): Add entries for vcvt_f16_f32, vcvt_f32_f16.
(string_of_vectype): Handle T_floatHF, T_float16, T_float16x4.
(string_of_mode): Handle V4HF.
* doc/arm-neon-intrinsics.texi: Regenerate.


gcc/testsuite/ChangeLog
2013-04-17  Kyrylo Tkachov  
Julian Brown  

* gcc.target/arm/neon/vcvtf16_f32.c: New test. Generated.
* gcc.target/arm/neon/vcvtf32_f16.c: Likewise.


neon-vcvt-intrinsics.patch
Description: Binary data


Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics

2013-04-13 Thread Julian Brown
On Fri, 12 Apr 2013 20:09:39 +0100
Julian Brown  wrote:

> On Fri, 12 Apr 2013 15:19:18 +0100
> Kyrylo Tkachov  wrote:
> 
> > Hi all,
> > 
> > This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
> > to arm_neon.h through the generator ML scripts and also adds the
> > built-ins to which the intrinsics will map to. The generator ML
> > scripts are updated and used to generate the relevant .texi
> > documentation, arm_neon.h and the tests in gcc.target/arm/neon .
> 
> FWIW, some of the changes to neon*.ml can be simplified somewhat -- my
> attempt at an improved version of those bits is attached. I'm still
> not too happy with mode_suffix, but these new instructions require
> adding semantics to parts of the generator program which weren't
> really very well-defined to start with :-). I appreciate that it's a
> bit of a tangle...

I thought of an improvement to the mode_suffix part from the last
version of the patch, so here it is. I'm done fiddling with this now,
so back to you!

Cheers,

JulianIndex: neon-gen.ml
===
--- neon-gen.ml	(revision 197804)
+++ neon-gen.ml	(working copy)
@@ -121,6 +121,7 @@ let rec signed_ctype = function
   | T_uint16 | T_int16 -> T_intHI
   | T_uint32 | T_int32 -> T_intSI
   | T_uint64 | T_int64 -> T_intDI
+  | T_float16 -> T_floatHF
   | T_float32 -> T_floatSF
   | T_poly8 -> T_intQI
   | T_poly16 -> T_intHI
@@ -275,8 +276,8 @@ let rec mode_suffix elttype shape =
 let mode = mode_of_elt elttype shape in
 string_of_mode mode
   with MixedMode (dst, src) ->
-let dstmode = mode_of_elt dst shape
-and srcmode = mode_of_elt src shape in
+let dstmode = mode_of_elt ~argpos:0 dst shape
+and srcmode = mode_of_elt ~argpos:1 src shape in
 string_of_mode dstmode ^ string_of_mode srcmode
 
 let get_shuffle features =
@@ -291,19 +292,24 @@ let print_feature_test_start features =
 match List.find (fun feature ->
match feature with Requires_feature _ -> true
 | Requires_arch _ -> true
+| Requires_FP_bit _ -> true
 | _ -> false)
  features with
-  Requires_feature feature -> 
+  Requires_feature feature ->
 Format.printf "#ifdef __ARM_FEATURE_%s@\n" feature
 | Requires_arch arch ->
 Format.printf "#if __ARM_ARCH >= %d@\n" arch
+| Requires_FP_bit bit ->
+Format.printf "#if ((__ARM_FP & 0x%X) != 0)@\n"
+  (1 lsl bit)
 | _ -> assert false
   with Not_found -> assert true
 
 let print_feature_test_end features =
   let feature =
-List.exists (function Requires_feature x -> true
-  | Requires_arch x -> true
+List.exists (function Requires_feature _ -> true
+  | Requires_arch _ -> true
+  | Requires_FP_bit _ -> true
   |  _ -> false) features in
   if feature then Format.printf "#endif@\n"
 
@@ -365,6 +371,7 @@ let deftypes () =
 "__builtin_neon_hi", "int", 16, 4;
 "__builtin_neon_si", "int", 32, 2;
 "__builtin_neon_di", "int", 64, 1;
+"__builtin_neon_hf", "float", 16, 4;
 "__builtin_neon_sf", "float", 32, 2;
 "__builtin_neon_poly8", "poly", 8, 8;
 "__builtin_neon_poly16", "poly", 16, 4;
Index: neon.ml
===
--- neon.ml	(revision 197804)
+++ neon.ml	(working copy)
@@ -21,7 +21,7 @@
.  *)
 
 (* Shorthand types for vector elements.  *)
-type elts = S8 | S16 | S32 | S64 | F32 | U8 | U16 | U32 | U64 | P8 | P16
+type elts = S8 | S16 | S32 | S64 | F16 | F32 | U8 | U16 | U32 | U64 | P8 | P16
   | I8 | I16 | I32 | I64 | B8 | B16 | B32 | B64 | Conv of elts * elts
   | Cast of elts * elts | NoElts
 
@@ -37,6 +37,7 @@ type vectype = T_int8x8| T_int8x16
 	 | T_uint16x4  | T_uint16x8
 	 | T_uint32x2  | T_uint32x4
 	 | T_uint64x1  | T_uint64x2
+	 | T_float16x4
 	 | T_float32x2 | T_float32x4
 	 | T_poly8x8   | T_poly8x16
 	 | T_poly16x4  | T_poly16x8
@@ -46,11 +47,13 @@ type vectype = T_int8x8| T_int8x16
  | T_uint8 | T_uint16
  | T_uint32| T_uint64
  | T_poly8 | T_poly16
- | T_float32   | T_arrayof of int * vectype
+ | T_float16   | T_float32
+ | T_arrayof of int * vectype
  | T_ptrto of vectype | T_const of vectype
  | T_void  | T_intQI
  | T_intHI | T_intSI
- | T_intDI | T_floatSF
+ | T_intDI | T_floatHF
+ | T_floatSF
 
 (* The meanings of the following are:
  TImode : "Tetra", two registers (four words).
@@ -93,7 +96,7 @@ type arity = Arity0 of vectype
| Arity4 of vectype * vectype * vectype * vectype * vectype
 
 type vecm

Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics

2013-04-12 Thread Julian Brown
On Fri, 12 Apr 2013 15:19:18 +0100
Kyrylo Tkachov  wrote:

> Hi all,
> 
> This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
> to arm_neon.h through the generator ML scripts and also adds the
> built-ins to which the intrinsics will map to. The generator ML
> scripts are updated and used to generate the relevant .texi
> documentation, arm_neon.h and the tests in gcc.target/arm/neon .

FWIW, some of the changes to neon*.ml can be simplified somewhat -- my
attempt at an improved version of those bits is attached. I'm still not
too happy with mode_suffix, but these new instructions require adding
semantics to parts of the generator program which weren't really very
well-defined to start with :-). I appreciate that it's a bit of a
tangle...

Output from this version remains the same as yours.

HTH,

JulianIndex: neon-gen.ml
===
--- neon-gen.ml	(revision 197804)
+++ neon-gen.ml	(working copy)
@@ -121,6 +121,7 @@ let rec signed_ctype = function
   | T_uint16 | T_int16 -> T_intHI
   | T_uint32 | T_int32 -> T_intSI
   | T_uint64 | T_int64 -> T_intDI
+  | T_float16 -> T_floatHF
   | T_float32 -> T_floatSF
   | T_poly8 -> T_intQI
   | T_poly16 -> T_intHI
@@ -275,8 +276,14 @@ let rec mode_suffix elttype shape =
 let mode = mode_of_elt elttype shape in
 string_of_mode mode
   with MixedMode (dst, src) ->
-let dstmode = mode_of_elt dst shape
-and srcmode = mode_of_elt src shape in
+let dstmode, srcmode =
+  match shape with
+	Use_operands [| d; s |] ->
+	  mode_of_elt dst (All (0, d)),
+	  mode_of_elt src (All (0, s))
+  | _ ->
+	  mode_of_elt dst shape,
+	  mode_of_elt src shape in
 string_of_mode dstmode ^ string_of_mode srcmode
 
 let get_shuffle features =
@@ -291,19 +298,24 @@ let print_feature_test_start features =
 match List.find (fun feature ->
match feature with Requires_feature _ -> true
 | Requires_arch _ -> true
+| Requires_FP_bit _ -> true
 | _ -> false)
  features with
-  Requires_feature feature -> 
+  Requires_feature feature ->
 Format.printf "#ifdef __ARM_FEATURE_%s@\n" feature
 | Requires_arch arch ->
 Format.printf "#if __ARM_ARCH >= %d@\n" arch
+| Requires_FP_bit bit ->
+Format.printf "#if ((__ARM_FP & 0x%X) != 0)@\n"
+  (1 lsl bit)
 | _ -> assert false
   with Not_found -> assert true
 
 let print_feature_test_end features =
   let feature =
-List.exists (function Requires_feature x -> true
-  | Requires_arch x -> true
+List.exists (function Requires_feature _ -> true
+  | Requires_arch _ -> true
+  | Requires_FP_bit _ -> true
   |  _ -> false) features in
   if feature then Format.printf "#endif@\n"
 
@@ -365,6 +377,7 @@ let deftypes () =
 "__builtin_neon_hi", "int", 16, 4;
 "__builtin_neon_si", "int", 32, 2;
 "__builtin_neon_di", "int", 64, 1;
+"__builtin_neon_hf", "float", 16, 4;
 "__builtin_neon_sf", "float", 32, 2;
 "__builtin_neon_poly8", "poly", 8, 8;
 "__builtin_neon_poly16", "poly", 16, 4;
Index: neon.ml
===
--- neon.ml	(revision 197804)
+++ neon.ml	(working copy)
@@ -21,7 +21,7 @@
.  *)
 
 (* Shorthand types for vector elements.  *)
-type elts = S8 | S16 | S32 | S64 | F32 | U8 | U16 | U32 | U64 | P8 | P16
+type elts = S8 | S16 | S32 | S64 | F16 | F32 | U8 | U16 | U32 | U64 | P8 | P16
   | I8 | I16 | I32 | I64 | B8 | B16 | B32 | B64 | Conv of elts * elts
   | Cast of elts * elts | NoElts
 
@@ -37,6 +37,7 @@ type vectype = T_int8x8| T_int8x16
 	 | T_uint16x4  | T_uint16x8
 	 | T_uint32x2  | T_uint32x4
 	 | T_uint64x1  | T_uint64x2
+	 | T_float16x4
 	 | T_float32x2 | T_float32x4
 	 | T_poly8x8   | T_poly8x16
 	 | T_poly16x4  | T_poly16x8
@@ -46,11 +47,13 @@ type vectype = T_int8x8| T_int8x16
  | T_uint8 | T_uint16
  | T_uint32| T_uint64
  | T_poly8 | T_poly16
- | T_float32   | T_arrayof of int * vectype
+ | T_float16   | T_float32
+ | T_arrayof of int * vectype
  | T_ptrto of vectype | T_const of vectype
  | T_void  | T_intQI
  | T_intHI | T_intSI
- | T_intDI | T_floatSF
+ | T_intDI | T_floatHF
+ | T_floatSF
 
 (* The meanings of the following are:
  TImode : "Tetra", two registers (four words).
@@ -93,7 +96,7 @@ type arity = Arity0 of vectype
| Arity4 of vectype * vectype * vectype * vectype * vectype
 
 type vecmode = V8QI | V4HI | V2SI | V2SF | DI
- | V16QI | V8HI | V4SI | V4SF 

[PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics

2013-04-12 Thread Kyrylo Tkachov
Hi all,

This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
to arm_neon.h through the generator ML scripts and also adds the
built-ins to which the intrinsics will map to. The generator ML scripts
are updated and used to generate the relevant .texi documentation,
arm_neon.h and the tests in gcc.target/arm/neon .

The new intrinsics are guarded by checking the __ARM_FP predefine
as described in ACLE. The second bit of the macro defines
half-precision floating point support, so the intrinsics are guarded by:
#if ((__ARM_FP & 0x2) != 0)

In arm.c I had to add handling of half-precision floats
(and their vector forms) in quite a few places.
I hope I didn't miss any part out.

Testing arm-none-eabi on qemu showed no regressions.

Ok for trunk?

Thanks,
Kyrill

gcc/ChangeLog
2013-04-12  Kyrylo Tkachov  

* config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF.
(TB_DREG): Add T_V4HF.
(v4hf_UP): New macro.
(neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW.
(arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN,
NEON_FLOAT_NARROW.
Handle initialisation of V4HF. Adjust initialisation of reinterpret
built-ins.
(arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN,
NEON_FLOAT_NARROW.
(arm_vector_mode_supported_p): Handle V4HF.
(arm_mangle_map): Handle V4HFmode.
* config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF.
* config/arm/arm_neon_builtins.def: Add entries for
vcvtv4hfv4sf, vcvtv4sfv4hf.
* config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern.
(neon_vcvtv4hfv4sf): Likewise.
* config/arm/neon-gen.ml: Handle half-precision floating point
features.
* config/arm/neon-testgen.ml: Handle Requires_FP_bit feature.
* config/arm/arm_neon.h: Regenerate.
* config/arm/neon.ml (type elts): Add F16.
(type vectype): Add T_float16x4, T_floatHF.
(type vecmode): Add V4HF.
(string_of_mode): Move earlier in the file.
(type features): Add Requires_FP_bit feature.
(elt_width): Handle F16.
(elt_class): Likewise.
(elt_of_class_width): Likewise.
(mode_of_elt_str): New function.
(type_for_elt): Handle F16, fix error messages.
(vectype_size): Handle T_float16x4.
(vcvt_sh): New function.
(ops): Add entries for vcvt_f16_f32, vcvt_f32_f16.
(string_of_vectype): Handle T_floatHF, T_float16, T_float16x4.
* doc/arm-neon-intrinsics.texi: Regenerate.


gcc/testsuite/ChangeLog
2013-04-12  Kyrylo Tkachov  

* gcc.target/arm/neon/vcvtf16_f32.c: New test. Generated.
* gcc.target/arm/neon/vcvtf32_f16.c: Likewise.

neon-vcvt-intrinsics-temp.patch
Description: Binary data