On Mon, 16 Jan 2017, Toma Tabacu wrote: > After searching through the archives, I have found an interesting bit of > information about DIV.G/MOD.G in the original submission thread: > > > > Ruan Beihong 23 July 2008: > > > > > > I've seen the Loongson 2F manual carefully. The (d)div(u) is > > > internally splited into one (d)div(u).g and one (d)mod(u).g. So I said > > > before was wrong. The truth is that, (d)div(u).g and (d)mod(u).g are > > > always faster than (d)div(u), at least the time spend on mflo/mfhi is > > > saved. > > > > > > James Ruan > > > > Richard Sandiford 24 July 2008: > > > > OK, great. In that case, it should simply be a case of disabling > > the divmod-related insns for Loongson, in addition to your patch. > > (Probably stating the obvious there, sorry.) > > > > Richard > > Here's the link for part 1 of the submission thread (has the quotes from > above): > https://gcc.gnu.org/ml/gcc-patches/2008-07/msg01529.html > and here's part 2: > https://gcc.gnu.org/ml/gcc-patches/2008-11/msg00273.html
Thanks for digging this out! > If DIV.G/MOD.G are faster, according to Ruan Beihong, and also smaller than > DIV > (or the same size [1]), as pointed out by Maciej, then I am led to the same > conclusion as Richard Sandiford: that only DIV.G/MOD.G should be generated for > Loongson. > > I think it would still be a good idea to add a test for separated DIV.G/MOD.G, > though. Possibly, though the combined tests need to stay then, to make sure generic DIV/DIVU is not ever produced. > What are your thoughts on this ? > Have I misunderstood something in the context of the submission thread ? > > Regards, > Toma > > [1] I've noticed that GCC generates the same TEQ instruction twice if both > DIV.G and MOD.G are needed, which makes the sequence just as big as > DIV + TEQ + MFHI + MFLO; this seems unnecessary to me. This ought to be handled then, likely by adding Loongson-specific RTL insns matching the `divmod<mode>4' and `udivmod<mode>4' expanders. It may be as simple as say (conceptually, untested): (define_insn "<u>divmod<GPR:mode>4_loongson" [(set (match_operand:GPR 0 "register_operand" "=d") (any_div:GPR (match_operand:GPR 1 "register_operand" "d") (match_operand:GPR 2 "register_operand" "d"))) (set (match_operand:GPR 3 "register_operand" "=d") (any_mod:GPR (match_dup 1) (match_dup 2)))] "TARGET_LOONGSON_2EF" { return mips_output_division ("<GPR:d>div<u>.g\t%0,%1,%2\;<GPR:d>mod<u>.g\t%3,%1,%2", operands); } [(set_attr "type" "idiv") (set_attr "mode" "<GPR:MODE>")]) although any final fix will have to take an instruction count adjustment into account too, as `mips_idiv_insns' won't as it stands handle the new case. Maciej