Not many targets define this: only msp430, pdp11, xtensa, and arm, compared to those that appear to unconditionally have a hardware division instruction. (Also, pdp11 and msp430 seem confused: their definitions should be empty instead of "1" and "(! TARGET_HWMULT)", and having hardware multiplication has no bearing anyway even if that worked; see numbers below for an example.)
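
For readers who have not looked at what the define selects: as far as I can tell, the TARGET_HAS_NO_HW_DIVIDE variant of the double-word division code works with shifts, compares and subtractions only, so it never has to call out for a narrower division. The following is a minimal sketch of that general idea, not the libgcc code; the name udivmod64_sketch is hypothetical and the loop is the plain one-bit-per-iteration form, without the operand alignment a tuned implementation would do.

```c
#include <stdint.h>

/* Hypothetical illustration, not taken from libgcc: unsigned 64-bit
   division using only shifts, compares and subtractions.  D must be
   nonzero.  Returns the quotient; stores the remainder in *REM when
   REM is non-null.  */
static uint64_t
udivmod64_sketch (uint64_t n, uint64_t d, uint64_t *rem)
{
  uint64_t q = 0, r = 0;

  for (int i = 63; i >= 0; i--)
    {
      /* Remember the bit that is about to be shifted out of R.  */
      uint64_t carry = r >> 63;

      /* Bring down the next dividend bit, most significant first.  */
      r = (r << 1) | ((n >> i) & 1);

      /* If the partial remainder is at least D (including the case
         where it overflowed 64 bits), subtract and set quotient bit I.  */
      if (carry || r >= d)
        {
          r -= d;
          q |= (uint64_t) 1 << i;
        }
    }

  if (rem)
    *rem = r;
  return q;
}
```

A function of this shape makes no calls, which matches the "leaf function" observation in the commit message below.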
Heads-up maintainers of ports without hardware division (including conditionally, for multilibbed configurations)!

-- >8 --
With this, execution time for e.g. __moddi3 goes from 59 to 40 cycles in
the "fast" case or from 290 to 200 cycles in the "slow" case (when the
!TARGET_HAS_NO_HW_DIVIDE variant calls division and modulus functions
for 32-bit SImode), as exposed by
gcc.c-torture/execute/arith-rand-ll.c compiled for -march=v10.
Unfortunately, it just puts a performance improvement "dent" of 0.07%
in an arith-rand-ll.c-based performance test, in which all loop counts
are also reduced to 1/10.

The size of every affected libgcc function is reduced to less than
half and they are all now leaf functions.

	* config/cris/t-cris (HOST_LIBGCC2_CFLAGS): Add
	-DTARGET_HAS_NO_HW_DIVIDE.
---
 libgcc/config/cris/t-cris | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libgcc/config/cris/t-cris b/libgcc/config/cris/t-cris
index b582974a42ee..e0020294be96 100644
--- a/libgcc/config/cris/t-cris
+++ b/libgcc/config/cris/t-cris
@@ -8,3 +8,6 @@ $(LIB2ADD): $(srcdir)/config/cris/arit.c
 	echo "#define L$$name" > tmp-$@ \
 	&& echo '#include "$<"' >> tmp-$@ \
 	&& mv -f tmp-$@ $@
+
+# Use an appropriate implementation when implementing DImode division.
+HOST_LIBGCC2_CFLAGS += -DTARGET_HAS_NO_HW_DIVIDE
-- 
2.30.2