Re: [RFC 1/2] RISC-V: Add support for _Bfloat16.

2023-11-10 Thread Jeff Law




On 10/25/23 04:15, Jin Ma wrote:

+;; The conversion of DF to BF needs to be done with SF if there is a
+;; chance to generate at least one instruction, otherwise just using
+;; libfunc __truncdfbf2.
+(define_expand "truncdfbf2"
+  [(set (match_operand:BF 0 "register_operand" "=f")
+   (float_truncate:BF
+   (match_operand:DF 1 "register_operand" " f")))]
+  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, DFmode, operands[1], 0), 0);
+DONE;
+  })

So for conversions to/from BFmode, doesn't generic code take care of
this for us?  Search for convert_mode_scalar in expr.cc. That code will
utilize SFmode as an intermediate step just like your expander.   Is
there some reason that generic code is insufficient?

Similarly for the the other conversions.


As far as I can see, the function 'convert_mode_scalar' doesn't seem to be 
perfect for
dealing with the conversions to/from BFmode. It can only handle BF to HF, SF, 
DF and
SF to BF well, but the rest of the conversion without any processing, directly 
using
the libcall.

Maybe I should choose to enhance its functionality? This seems to be a
good choice, I'm not sure.My recollection was that BF could be converted 
to/from SF trivially and

if we wanted BF->DF we'd first convert to SF, then to DF.

Direct BF<->DF conversions aren't actually important from a performance
standpoint.  So it's OK if they have an extra step IMHO.


Thank you very much for your review and detailed reply. Maybe there are some 
problems with my expression
and I am a little confused about your guidance. My understanding is that you 
also think that it is reasonable to
convert through SF, right? In fact, this is what I did.
My point was that I would expect the generic code to handle the 
conversion and that we didn't need to handle it explicitly in the RISC-V 
backend.


Meaning that I don't think we need a define_expand for truncdfbf2, 
fix_truncbf2, fixuns_truncbf2, floatbf2, or 
floatunsbf2.





In this patch, my thoughts are as follows:

The general principle is to use the real instructions instead of libcall as 
much as possible for conversions,
while minimizing the definition of libcall(only reusing which has been defined 
by other architectures such
as aarch64). If SF can be used as a transit, it is preferred to convert to SF, 
otherwise libcall is directly used.

1. For the conversions between floating points

For BF->DF, as you said, the function 'convert_mode_scalar' in the general code 
has been well implemented,
which will be expressed as BF->SF->DF. And the generated instruction list may 
be as follows:
   'call __extendbfsf2' + 'call __extendsfdf2' (when only soft floating point 
support);
   'call __extendbfsf2' + 'fcvt.d.s'   (when (TARGET_DOUBLE_FLOAT || 
TARGET_ZDINX) is true);
   'fcvt.s.bf16'+ 'fcvt.d.s'   (when ((TARGET_DOUBLE_FLOAT || 
TARGET_ZDINX) && TARGET_ZFBFMIN) is true)

For DF->BF, if any of fcvt.s.d and fcvt.bf16.s cannot be generated, the 'call 
__truncdfbf2' is directly generated
by the function 'convert_mode_scalar'. Otherwise the new pattern(define_expand 
"truncdfbf2") is used. This
makes it possible to implement DF->BF by 'fcvt.s.d' + 'fcvt.bf16.s', which 
cannot be generated by the function
'convert_mode_scala'.
But I would have expected convert_mode_scalar to generate DF->BF by 
first truncating to SF, then to BF.   If that is missing for truncation, 
then we should add it to convert_mode_scalar rather than expressing it 
as a backend expander.








2. For the conversions between integer and BF, it seems that gcc only uses 
libcall to implement it, but this is
obviously wrong. For example, the conversion BF->SI directly calls the 
unimplemented libcall __fixunsbfsi.
So I added some new pattern to handle these transformations with SF.
I would suggest these move into target independent code as well. 
There's no reason I'm aware of that they should be implemented entirely 
in a target machine description.  We're not really doing anything target 
specific in here.


jeff


Re: [RFC 1/2] RISC-V: Add support for _Bfloat16.

2023-10-25 Thread Jin Ma
> >>> +;; The conversion of DF to BF needs to be done with SF if there is a
> >>> +;; chance to generate at least one instruction, otherwise just using
> >>> +;; libfunc __truncdfbf2.
> >>> +(define_expand "truncdfbf2"
> >>> +  [(set (match_operand:BF 0 "register_operand" "=f")
> >>> +   (float_truncate:BF
> >>> +   (match_operand:DF 1 "register_operand" " f")))]
> >>> +  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
> >>> +  {
> >>> +convert_move (operands[0],
> >>> +   convert_modes (SFmode, DFmode, operands[1], 0), 0);
> >>> +DONE;
> >>> +  })
> >> So for conversions to/from BFmode, doesn't generic code take care of
> >> this for us?  Search for convert_mode_scalar in expr.cc. That code will
> >> utilize SFmode as an intermediate step just like your expander.   Is
> >> there some reason that generic code is insufficient?
> >>
> >> Similarly for the the other conversions.
> > 
> > As far as I can see, the function 'convert_mode_scalar' doesn't seem to be 
> > perfect for
> > dealing with the conversions to/from BFmode. It can only handle BF to HF, 
> > SF, DF and
> > SF to BF well, but the rest of the conversion without any processing, 
> > directly using
> > the libcall.
> > 
> > Maybe I should choose to enhance its functionality? This seems to be a
> > good choice, I'm not sure.My recollection was that BF could be converted 
> > to/from SF trivially and 
> if we wanted BF->DF we'd first convert to SF, then to DF.
> 
> Direct BF<->DF conversions aren't actually important from a performance 
> standpoint.  So it's OK if they have an extra step IMHO.

Thank you very much for your review and detailed reply. Maybe there are some 
problems with my expression
and I am a little confused about your guidance. My understanding is that you 
also think that it is reasonable to
convert through SF, right? In fact, this is what I did.

In this patch, my thoughts are as follows:

The general principle is to use the real instructions instead of libcall as 
much as possible for conversions,
while minimizing the definition of libcall(only reusing which has been defined 
by other architectures such
as aarch64). If SF can be used as a transit, it is preferred to convert to SF, 
otherwise libcall is directly used.

1. For the conversions between floating points

For BF->DF, as you said, the function 'convert_mode_scalar' in the general code 
has been well implemented,
which will be expressed as BF->SF->DF. And the generated instruction list may 
be as follows:
  'call __extendbfsf2' + 'call __extendsfdf2' (when only soft floating point 
support);
  'call __extendbfsf2' + 'fcvt.d.s'   (when (TARGET_DOUBLE_FLOAT || 
TARGET_ZDINX) is true);
  'fcvt.s.bf16'+ 'fcvt.d.s'   (when ((TARGET_DOUBLE_FLOAT || 
TARGET_ZDINX) && TARGET_ZFBFMIN) is true)

For DF->BF, if any of fcvt.s.d and fcvt.bf16.s cannot be generated, the 'call 
__truncdfbf2' is directly generated
by the function 'convert_mode_scalar'. Otherwise the new pattern(define_expand 
"truncdfbf2") is used. This
makes it possible to implement DF->BF by 'fcvt.s.d' + 'fcvt.bf16.s', which 
cannot be generated by the function
'convert_mode_scala'.

2. For the conversions between integer and BF, it seems that gcc only uses 
libcall to implement it, but this is
obviously wrong. For example, the conversion BF->SI directly calls the 
unimplemented libcall __fixunsbfsi.
So I added some new pattern to handle these transformations with SF.

Thanks,

Jin

> 
> jeff

Re: [RFC 1/2] RISC-V: Add support for _Bfloat16.

2023-10-09 Thread Jeff Law




On 10/9/23 00:18, Jin Ma wrote:


+;; The conversion of DF to BF needs to be done with SF if there is a
+;; chance to generate at least one instruction, otherwise just using
+;; libfunc __truncdfbf2.
+(define_expand "truncdfbf2"
+  [(set (match_operand:BF 0 "register_operand" "=f")
+   (float_truncate:BF
+   (match_operand:DF 1 "register_operand" " f")))]
+  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, DFmode, operands[1], 0), 0);
+DONE;
+  })

So for conversions to/from BFmode, doesn't generic code take care of
this for us?  Search for convert_mode_scalar in expr.cc. That code will
utilize SFmode as an intermediate step just like your expander.   Is
there some reason that generic code is insufficient?

Similarly for the the other conversions.


As far as I can see, the function 'convert_mode_scalar' doesn't seem to be 
perfect for
dealing with the conversions to/from BFmode. It can only handle BF to HF, SF, 
DF and
SF to BF well, but the rest of the conversion without any processing, directly 
using
the libcall.

Maybe I should choose to enhance its functionality? This seems to be a
good choice, I'm not sure.My recollection was that BF could be converted to/from SF trivially and 

if we wanted BF->DF we'd first convert to SF, then to DF.

Direct BF<->DF conversions aren't actually important from a performance 
standpoint.  So it's OK if they have an extra step IMHO.


jeff


Re: [RFC 1/2] RISC-V: Add support for _Bfloat16.

2023-10-09 Thread Jin Ma
> On 9/19/23 02:44, Jin Ma wrote:
> > gcc/ChangeLog:
> > 
> > * config/riscv/iterators.md (HFBF): New.
> > * config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
> > Initialize data type_Bfloat16.
> > * config/riscv/riscv-modes.def (FLOAT_MODE): New.
> > (ADJUST_FLOAT_FORMAT): New.
> > * config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
> > (riscv_scalar_mode_supported_p): Ditto.
> > (riscv_libgcc_floating_mode_supported_p): Ditto.
> > (riscv_block_arith_comp_libfuncs_for_mode): New.
> > (riscv_init_libfuncs): Opening and closing some libfuncs for BFmode.
> > * config/riscv/riscv.md (mode" ): Add BF.
> > (truncdfbf2): New.
> > (movhf): Support for BFmode.
> > (mov): Ditto.
> > (*mov_softfloat):  Ditto.
> > (fix_truncbf2): New.
> > (fixuns_truncbf2): New.
> > (floatbf2): New.
> > (floatunsbf2): New.
> > 
> > libgcc/ChangeLog:
> > 
> > * config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
> > (_FP_NANSIGN_B): New.
> > * config/riscv/t-softfp32: Add support for BF libfuncs.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/riscv/bf16_arithmetic.c: New test.
> > * gcc.target/riscv/bf16_call.c: New test.
> > * gcc.target/riscv/bf16_comparisons.c: New test.
> > * gcc.target/riscv/bf16_convert-1.c: New test.
> > * gcc.target/riscv/bf16_convert-2.c: New test.
> > * gcc.target/riscv/bf16_convert_run.c: New test.
> So this can't go in the tree until the extension has moved into a frozen 
> state.  Hopefully that'll happen before we close stage1 development in Nov.

Ok, this is very reasonable.

> 
> 
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index e00b8ee3579..5048628c784 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -1631,6 +1631,20 @@ (define_insn "truncdfhf2"
> > [(set_attr "type" "fcvt")
> >  (set_attr "mode" "HF")])
> >   
> > +;; The conversion of DF to BF needs to be done with SF if there is a
> > +;; chance to generate at least one instruction, otherwise just using
> > +;; libfunc __truncdfbf2.
> > +(define_expand "truncdfbf2"
> > +  [(set (match_operand:BF 0 "register_operand" "=f")
> > +   (float_truncate:BF
> > +   (match_operand:DF 1 "register_operand" " f")))]
> > +  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
> > +  {
> > +convert_move (operands[0],
> > + convert_modes (SFmode, DFmode, operands[1], 0), 0);
> > +DONE;
> > +  })
> So for conversions to/from BFmode, doesn't generic code take care of 
> this for us?  Search for convert_mode_scalar in expr.cc. That code will 
> utilize SFmode as an intermediate step just like your expander.   Is 
> there some reason that generic code is insufficient?
>
> Similarly for the the other conversions.

As far as I can see, the function 'convert_mode_scalar' doesn't seem to be 
perfect for
dealing with the conversions to/from BFmode. It can only handle BF to HF, SF, 
DF and
SF to BF well, but the rest of the conversion without any processing, directly 
using
the libcall.

Maybe I should choose to enhance its functionality? This seems to be a
good choice, I'm not sure.

Jin

> 
> Otherwise it looks pretty good.
> 
> Jeff

Re: [RFC 1/2] RISC-V: Add support for _Bfloat16.

2023-09-29 Thread Jeff Law




On 9/19/23 02:44, Jin Ma wrote:

gcc/ChangeLog:

* config/riscv/iterators.md (HFBF): New.
* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
Initialize data type_Bfloat16.
* config/riscv/riscv-modes.def (FLOAT_MODE): New.
(ADJUST_FLOAT_FORMAT): New.
* config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
(riscv_scalar_mode_supported_p): Ditto.
(riscv_libgcc_floating_mode_supported_p): Ditto.
(riscv_block_arith_comp_libfuncs_for_mode): New.
(riscv_init_libfuncs): Opening and closing some libfuncs for BFmode.
* config/riscv/riscv.md (mode" ): Add BF.
(truncdfbf2): New.
(movhf): Support for BFmode.
(mov): Ditto.
(*mov_softfloat):  Ditto.
(fix_truncbf2): New.
(fixuns_truncbf2): New.
(floatbf2): New.
(floatunsbf2): New.

libgcc/ChangeLog:

* config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
(_FP_NANSIGN_B): New.
* config/riscv/t-softfp32: Add support for BF libfuncs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/bf16_arithmetic.c: New test.
* gcc.target/riscv/bf16_call.c: New test.
* gcc.target/riscv/bf16_comparisons.c: New test.
* gcc.target/riscv/bf16_convert-1.c: New test.
* gcc.target/riscv/bf16_convert-2.c: New test.
* gcc.target/riscv/bf16_convert_run.c: New test.
So this can't go in the tree until the extension has moved into a frozen 
state.  Hopefully that'll happen before we close stage1 development in Nov.





diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index e00b8ee3579..5048628c784 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1631,6 +1631,20 @@ (define_insn "truncdfhf2"
[(set_attr "type" "fcvt")
 (set_attr "mode" "HF")])
  
+;; The conversion of DF to BF needs to be done with SF if there is a

+;; chance to generate at least one instruction, otherwise just using
+;; libfunc __truncdfbf2.
+(define_expand "truncdfbf2"
+  [(set (match_operand:BF 0 "register_operand" "=f")
+   (float_truncate:BF
+   (match_operand:DF 1 "register_operand" " f")))]
+  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, DFmode, operands[1], 0), 0);
+DONE;
+  })
So for conversions to/from BFmode, doesn't generic code take care of 
this for us?  Search for convert_mode_scalar in expr.cc. That code will 
utilize SFmode as an intermediate step just like your expander.   Is 
there some reason that generic code is insufficient?


Similarly for the the other conversions.

Otherwise it looks pretty good.

Jeff


[RFC 1/2] RISC-V: Add support for _Bfloat16.

2023-09-19 Thread Jin Ma
gcc/ChangeLog:

* config/riscv/iterators.md (HFBF): New.
* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
Initialize data type_Bfloat16.
* config/riscv/riscv-modes.def (FLOAT_MODE): New.
(ADJUST_FLOAT_FORMAT): New.
* config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
(riscv_scalar_mode_supported_p): Ditto.
(riscv_libgcc_floating_mode_supported_p): Ditto.
(riscv_block_arith_comp_libfuncs_for_mode): New.
(riscv_init_libfuncs): Opening and closing some libfuncs for BFmode.
* config/riscv/riscv.md (mode" ): Add BF.
(truncdfbf2): New.
(movhf): Support for BFmode.
(mov): Ditto.
(*mov_softfloat):  Ditto.
(fix_truncbf2): New.
(fixuns_truncbf2): New.
(floatbf2): New.
(floatunsbf2): New.

libgcc/ChangeLog:

* config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
(_FP_NANSIGN_B): New.
* config/riscv/t-softfp32: Add support for BF libfuncs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/bf16_arithmetic.c: New test.
* gcc.target/riscv/bf16_call.c: New test.
* gcc.target/riscv/bf16_comparisons.c: New test.
* gcc.target/riscv/bf16_convert-1.c: New test.
* gcc.target/riscv/bf16_convert-2.c: New test.
* gcc.target/riscv/bf16_convert_run.c: New test.
---
 gcc/config/riscv/iterators.md |   2 +
 gcc/config/riscv/riscv-builtins.cc|  16 ++
 gcc/config/riscv/riscv-modes.def  |   4 +
 gcc/config/riscv/riscv.cc |  93 --
 gcc/config/riscv/riscv.md |  94 --
 .../gcc.target/riscv/bf16_arithmetic.c|  36 
 gcc/testsuite/gcc.target/riscv/bf16_call.c|  17 ++
 .../gcc.target/riscv/bf16_comparisons.c   |  25 +++
 .../gcc.target/riscv/bf16_convert-1.c |  39 +
 .../gcc.target/riscv/bf16_convert-2.c |  38 
 .../gcc.target/riscv/bf16_convert_run.c   | 163 ++
 libgcc/config/riscv/sfp-machine.h |   3 +
 libgcc/config/riscv/t-softfp32|   7 +-
 13 files changed, 503 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_call.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_comparisons.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_convert-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_convert-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_convert_run.c

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index ecf033f2fa7..73523b73fdd 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -84,6 +84,8 @@ (define_mode_iterator SOFTF [SF (DF "TARGET_64BIT") (HF 
"TARGET_ZFHMIN")])
 ;; instruction.
 (define_mode_attr size [(QI "b") (HI "h")])
 
+(define_mode_iterator HFBF [HF BF])
+
 ;; Mode attributes for loads.
 (define_mode_attr load [(QI "lb") (HI "lh") (SI "lw") (DI "ld") (HF "flh") (SF 
"flw") (DF "fld")])
 
diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 3fe3a89dcc2..b7bb89794f7 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -192,6 +192,7 @@ static GTY(()) int riscv_builtin_decl_index[NUM_INSN_CODES];
   riscv_builtin_decls[riscv_builtin_decl_index[(CODE)]]
 
 tree riscv_float16_type_node = NULL_TREE;
+tree riscv_bfloat16_type_node = NULL_TREE;
 
 /* Return the function type associated with function prototype TYPE.  */
 
@@ -235,6 +236,21 @@ riscv_init_builtin_types (void)
   if (!maybe_get_identifier ("_Float16"))
 lang_hooks.types.register_builtin_type (riscv_float16_type_node,
"_Float16");
+
+  /* Provide the _Bfloat16 type and bfloat16_type_node if needed.  */
+  if (!bfloat16_type_node)
+{
+  riscv_bfloat16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (riscv_bfloat16_type_node) = 16;
+  SET_TYPE_MODE (riscv_bfloat16_type_node, BFmode);
+  layout_type (riscv_bfloat16_type_node);
+}
+  else
+riscv_bfloat16_type_node = bfloat16_type_node;
+
+  if (!maybe_get_identifier ("_Bfloat16"))
+lang_hooks.types.register_builtin_type (riscv_bfloat16_type_node,
+   "_Bfloat16");
 }
 
 /* Implement TARGET_INIT_BUILTINS.  */
diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index e3c6ccb2809..723bfaee42d 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -22,6 +22,10 @@ along with GCC; see the file COPYING3.  If not see
 FLOAT_MODE (HF, 2, ieee_half_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
 
+FLOAT_MODE (BF, 2, 0);
+/* Reuse definition from arm.  */
+ADJUST_FLOAT_FORMAT (BF, _bfloat_half_format);
+
 /* Vector modes.  */
 
 /* Encode the ratio of