Re: [PATCH] [ARC] Add single/double IEEE precission FPU support.

Joern Wolfgang Rennecke Tue, 02 Feb 2016 14:52:37 -0800


On 01/02/16 13:57, Claudiu Zissulescu wrote:

In this patch, we add support for the new FPU instructions available with
ARC V2 processors.  The new FPU instructions covers both single and
double precision IEEE formats. While the single precision is available
for both ARC EM and ARC HS processors, the double precision is only
available for ARC HS. ARC EM will make use of the double precision assist
instructions which are in fact FPX double instructions.  The double
floating point precision instructions are making use of the odd-even
register pairs to hold 64-bit datums, exactly like in the case of ldd/std
instructions.

Additional to the mods required by FPU instructions to be supported by
GCC, we forced all the 64 bit datum to use odd-even register pairs (HS
only), as we observed a better usage of the ldd/std, and less generated
move instructions.  A byproduct of this optimization, is a new ABI, which
places the 64-bit arguments into odd-even register pairs.  This behavior
can be selected using -mabi option.

Feedback is welcomed,

  VECTOR_MODES (INT, 16);       /* V16QI V8HI V4SI V2DI */
+
+/* FPU conditon flags. */

Typo

+ error ("FPU double precission options are available for ARC HSonly.");


There should be no period at the end of the error message string.

+      if (TARGET_HS && (arc_fpu_build & FPX_DP))
+       error ("FPU double precission assist "

Typo.  And Ditto.


+      case EQ:
+      case NE:
+      case UNORDERED:
+      case UNLT:
+      case UNLE:
+      case UNGT:
+      case UNGE:
+       return CC_FPUmode;
+
+      case LT:
+      case LE:
+      case GT:
+      case GE:
+      case ORDERED:
+       return CC_FPUEmode;

cse and other code transformations are likely to do better if you use
just one mode for these.  It is also very odd to have comparisons and their
inverse use different modes.  Have you done any benchmarking for this?

@@ -1282,6 +1363,16 @@ arc_conditional_register_usage (void)
        arc_hard_regno_mode_ok[60] = 1 << (int) S_MODE;
     }

+  /* ARCHS has 64-bit data-path which makes use of the even-odd paired
+     registers.  */
+  if (TARGET_HS)
+    {
+      for (regno = 1; regno < 32; regno +=2)
+       {
+         arc_hard_regno_mode_ok[regno] = S_MODES;
+       }
+    }
+

Does TARGET_HS with -mabi=default allow for passing DFmode / DImodearguments

in odd registers?  I fear you might run into reload trouble when trying to
access the values.

+arc_hard_regno_nregs (int regno,
...
+  if ((regno < FIRST_PSEUDO_REGISTER)
+      && (HARD_REGNO_MODE_OK (regno, mode)
+         || (mode == BLKmode)))
+    return words;
+  return 0;

This prima facie contradicts HARD_REGNO_NREGS, which considers the
larger sizes of simd vector and dma config registers.
I see that there is no actual conflict as the vector registers are not
used for argument passing, but the comment in the function only states
what the function does - not quite correctly, as detailed before - and
not what it is for.

So, either the mxp support has to be removed before this patch goes in,
or arc_hard_regno_nregs has to handle simd registers properly, or the
comment at the top should state the limited applicability of this
function, and there should be an assert to check that the register
number passed is suitable - e.g.:
gcc_assert (regno < ARC_FIRST_SIMD_VR_REG)

+/* Given an CUMULATIVE_ARGS, this function returns an RTX if the

Typo: C is not a vowel.

+  if (!named && TARGET_HS)
+    {
+      /* For unamed args don't try fill up the reg-holes.  */
+      reg_idx = cum->last_reg;
+      /* Only interested in the number of regs.  */

You should make up your mind what the priorities for stdarg are.
Traditionally, lots of gcc ports have supported broken code that lacks
declarations of variadic functions, and furthermore have placed
emphasis on simplicity of varargs/stdarg callee code, at the expense
of normal code.  Often for compatibility with a pre-existing
compiler, sometimes by just copying from existing ports without
stopping to consider the ramifications.
If you make argument passing different for stdarg declared functions,
the broken code that lacks declarations won't work any more.
Ignoring registers for argument passing is not helping the callers
code density.  So the only objective that might be furthered here
is stdarg callee simplicity.  But if you really want that, and ignore
compatibility with broken code, the logical thing to do is not to
pass any unnamed arguments in registers.

If stdarg caller's code size is considered important, and stdarg
callees mostly irrelevant (as mostly associated with I/O, and
linked in just once per function), this aligns well with supporting
broken code: it shouldn't matter if the argument is anonymous or
not, it's the same effort for the caller to pass it.

One further thing to consider when forging new ABIs is that
partial argument passing is there solely for the convenience of
stdarg callees, and/or the programmer who wrote that part of
the target port.  Code size an speed of common code is generally
better when partial argument passing is eliminated.  There is
nothing (apart from backwards compatibility issues for
established ABIs) that prevents a port for a processor with a
single class of argument passing registers to make the stdarg
handling code in the callee (TARGET_GIMPLIFY_VA_ARG_EXPR) keep
track of when the register save zone is exhausted.  It's just
shifting some burden from non-variadic code and variadic caller
side code to the callee side of variadic code.

+      /* We cannot partialy pass some modes for HS (i.e. DImode,
Typo
+        non-compatible mode).  As the DI regs needs to be in even-odd

If you can't always do partial argument passing, then there is no
point in doing it at all.  Well, at least for any given mode, but
really, if you need to keep track of of many saved registers
you have left in order to process va_arg invocations for some
modes, it's really easier to just ditch the partial argument
passing altogether.
The only way to salvage a simplistic va_arg implementation is to
force alignment of stack-passed arguments to match register-passed
arguments.

+        register pair, when comming to partial passing of an

Typo.

+        argument, the code bellow will not find a suitable register

Typo.

+  if (advance && named && (arc_abi != ARC_ABI_PACK))
+    {
+       /* MAX out any other free register if a named arguments goes on

Typo: make up your mind about singular/plural.  You could use 'any' instead
of 'a' to remain ambiguous, but the noun has to agree with the verb.

+             if (!link_insn
+                 /* Avoid FPU instructions.  */
+                 || (GET_MODE (SET_DEST
+                               (PATTERN (link_insn))) == CC_FPUmode)
+                 || (GET_MODE (SET_DEST
+                               (PATTERN (link_insn))) == CC_FPU_UNEQmode)
+                 || (GET_MODE (SET_DEST
+                               (PATTERN (link_insn))) == CC_FPUEmode))

It's pointless to search for the CC setter and then bail out this late.
The mode is also accessible in the CC user, so after we have computed
pc_target, we can check the condition code register in the comparison
XEXP (pc_target, 1) for its mode.

+ emit_insn(gen_adddf3_insn(operands[0], operands[1],operands[2],tmp,const0_rtx));


Four missing spaces.

+ emit_insn(gen_adddf3_insn(operands[0], operands[1],operands[2],const1_rtx,const1_rtx));


Ditto.

+ int const_index = ((GET_CODE (operands[1]) == CONST_DOUBLE) ?1: 2);


Missing space.

+(define_expand "subdf3"
..
+   if (TARGET_DPFP)
..
+        if (TARGET_EM && GET_CODE (operands[1]) == CONST_DOUBLE)

+ emit_insn(gen_subdf3_insn(operands[0], operands[2],operands[1],tmp,const0_rtx));


Four missing spaces.  Besides, subtraction is not commutative.

+        else

+ emit_insn(gen_subdf3_insn(operands[0], operands[1],operands[2],tmp,const0_rtx));


Four missing spaces.

+ emit_insn(gen_subdf3_insn(operands[0], operands[1],operands[2],const1_rtx,const1_rtx));


Ditto.

still in "subdf3":
+  else if (TARGET_FP_DOUBLE)

So this implies that both (TARGET_DPFP) and (TARGET_FP_DOUBLE) might betrue atthe same time. In that case, so we really want to prefer the(TARGET_DPFP) expansion?

+ emit_insn(gen_muldf3_insn(operands[0], operands[1],operands[2],tmp,const0_rtx));

Four missing spaces.

+ emit_insn(gen_muldf3_insn(operands[0], operands[1],operands[2],const1_rtx,const1_rtx));

Ditto.


+mabi=

+Target RejectNegative Joined Var(arc_abi) Enum(arc_abi_type)Init(ARC_ABI_DEFAULT)+Specify the desired ABI used for ARC HS processors. Variants can bedefault or mwabi.


According to the rest of the patch, this should be 'optimized' instead of
'mwabi'
Although it could make sense to use option names that are more likely to
stand the test of time if you change the default later on, or if other
changes ask for yet another abi change that is optimized in a different way.
I also note that there is a variable value ARC_ABI_PACK with no matching
string, but some code to handle it.

May I also suggest that it makes sense to tie the default abi to hardware
variants that are relevant for multilib purposes.
E.g., if you have a multilib with full hardware floating point support, it
makes sense to make that for the double-word register optimized abi,
and make that abi the default for that hardware configuration.


+(define_insn "*addsf3_fpu"
+  [(set (match_operand:SF 0 "register_operand" "=r,r,r,r,r")
+       (plus:SF (match_operand:SF 1 "nonmemory_operand" "0,r,0,r,F")
+                (match_operand:SF 2 "nonmemory_operand" "r,r,F,F,r")))]

Addition is commutative (even floating point addition - even though it'snot associative).

We are missing out on post-reload conditionalization here.

+;; Multiplication
+(define_insn "*mulsf3_fpu"
+  [(set (match_operand:SF 0 "register_operand" "=r,r,r,r,r")
+       (mult:SF (match_operand:SF 1 "nonmemory_operand" "0,r,0,r,F")
+                (match_operand:SF 2 "nonmemory_operand" "r,r,F,F,r")))]

Ditto for multiplication.

Also applies to the multiply operands of fmasf4_fpu, fnmasf4_fpu,fmadf4_fpu, fnmadf4_fpu.


+(define_insn "*cmpsf_trap_fpu"

That name makes as little sense to me as having two separate modesCC_FPU and CC_FPUEfor positive / negated usage and having two comparison patterns preprecision that

do the same but pretend to be dissimilar.

+(define_insn "*cmpsf_fpu_uneq"
+  [(set (reg:CC_FPU_UNEQ CC_REG)
+       (compare:CC_FPU_UNEQ
+        (match_operand:SF 0 "register_operand"  "r,r")
+        (match_operand:SF 1 "nonmemory_operand" "r,F")))]
+  "TARGET_FP_SINGLE"
+  "fscmp %0, %1\\n\\tmov.v.f 0,0\\t;set Z flag"
+  [(set_attr "length" "8,16")

That should be "8,12" .

Likewise for double precision.


+/* Single precissionf floating point support with fused operation. */
+#define TARGET_FP_SFUZED  ((arc_fpu_build & FPU_SF) != 0)
+/* Double precissionf floating point support with fused operation. */

2*2 typo : precissionf
And is there a particular reason for 'FUZED' (a self-destruct mechanism?),
or is that another typo / malapropism?
Also, the agglomeration of S/D with FU{S,Z}ED is confusing.  Could you
spare another underscore?  If you are too skint for horizontal space, even
camel case would be better than suggesting SF used / DF used.

I couldn't help but to think of this: https://xkcd.com/739/

There are likely a number of costs that should be tweaked in arc_rtx_costs.

Re: [PATCH] [ARC] Add single/double IEEE precission FPU support.

Reply via email to