Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation

2021-11-17 Thread gaosong

Hi Richard,

On 2021/11/17 5:55 PM, Richard Henderson wrote:


@fmt_rr_i12 and @fmt_rr_ui12 are two 'Formats',  but they use the 
same 'Argument sets'(rr_i).


What I meant is that there would be a single gen_rr_i function handling 
the argument set rr_i; no need for two gen_rr_i* functions. 


Got it.

Thanks.
Song Gao



Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation

2021-11-17 Thread Richard Henderson

On 11/17/21 10:29 AM, gaosong wrote:

gen_rr_i ?


The code is not written completely,  like this:

gen_rr_i12:

@fmt_rr_i12    .. imm:s12 rj:5 rd:5 _i
slti  001000  . . @fmt_rr_i12
sltui 001001  . . @fmt_rr_i12
...

gen_rr_ui12:

@fmt_rr_ui12    .. imm:12 rj:5 rd:5 _i
andi  001101  . . @fmt_rr_ui12
ori   001110  . . @fmt_rr_ui12
xori  00  . . @fmt_rr_ui12
...

@fmt_rr_i12 and @fmt_rr_ui12 are two 'Formats',  but they use the same 
'Argument sets'(rr_i).


What I meant is that there would be a single gen_rr_i function handling the argument set 
rr_i; no need for two gen_rr_i* functions.
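
To illustrate the point, here is a minimal sketch in plain C rather than code from the patch. The names arg_rr_i, ToyCPU, rr_i_op, op_slt and op_and are hypothetical stand-ins (the real helper would work on TCG values). It assumes only that both formats fill the same argument struct, so a single helper can serve slti and andi alike.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the decodetree-generated argument struct. */
typedef struct {
    int rd;
    int rj;
    int imm;   /* already sign- or zero-extended by the format */
} arg_rr_i;

/* Toy register file standing in for TCG state (illustration only). */
typedef struct {
    int64_t gpr[32];
} ToyCPU;

typedef int64_t (*rr_i_op)(int64_t src, int64_t imm);

static int64_t op_slt(int64_t src, int64_t imm) { return src < imm; }
static int64_t op_and(int64_t src, int64_t imm) { return src & imm; }

/* One helper for the whole argument set, whichever format produced it. */
static bool gen_rr_i(ToyCPU *cpu, const arg_rr_i *a, rr_i_op op)
{
    int64_t result = op(cpu->gpr[a->rj], a->imm);

    if (a->rd != 0) {           /* r0 is hard-wired to zero on LoongArch */
        cpu->gpr[a->rd] = result;
    }
    return true;
}
```

The helper never looks at the immediate's width; that has been resolved by the time the struct is filled in.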



gen_rrr_sa2p1:

@fmt_rrr_sa2p1  ... .. rk:5 rj:5 rd:5   _rr_sa  sa=%sa2p1
alsl_w     010 .. . . .   @fmt_rrr_sa2p1
alsl_wu    011 .. . . .   @fmt_rrr_sa2p1
alsl_d    0010 110 .. . . .   @fmt_rrr_sa2p1
...

gen_rrr_sa2:
@fmt_rrr_sa2  ... sa:2 rk:5 rj:5 rd:5   _rr_sa
bytepick_w     100 .. . . .   @fmt_rrr_sa2
...

gen_rrr_sa3:
@fmt_rrr_sa3   .. sa:3 rk:5 rj:5 rd:5   _rr_sa
bytepick_d     11 ... . . .   @fmt_rrr_sa3
...


Likewise a single gen_rrr_sa function.


r~



Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation

2021-11-17 Thread gaosong

Hi Richard,

On 2021/11/17 4:28 PM, Richard Henderson wrote:

On 11/17/21 8:57 AM, gaosong wrote:
I see that the insns.decode format is not very consistent with other 
architectures, such as ARM/RISCV


No.  I don't like how riscv has done it, though they have quite a few 
split fields, so perhaps they thought it looked weird.




#
# Argument sets
#
_i  rd imm
  rd rj rk
_i rd rj imm
_sa rd rj rk sa

#
# Formats
#
@fmt_rrr   . rk:5 rj:5 rd:5 
@fmt_r_i20     ... imm:s20 rd:5 _i
@fmt_rr_i12    .. imm:s12 rj:5 rd:5 _i
@fmt_rr_ui12    .. imm:12 rj:5 rd:5 _i
@fmt_rr_i16    .. imm:s16 rj:5 rd:5 _i
@fmt_rrr_sa2p1    ... .. rk:5 rj:5 rd:5 _sa  sa=%sa2p1


#
# Fixed point arithmetic operation instruction
#
add_w     0001 0 . . . @fmt_rrr
add_d     0001 1 . . . @fmt_rrr
sub_w     0001 00010 . . . @fmt_rrr
sub_d     0001 00011 . . . @fmt_rrr
slt   0001 00100 . . . @fmt_rrr
sltu  0001 00101 . . . @fmt_rrr
slti  001000  . .   @fmt_rr_i12



and trans_xxx.c.inc

static bool gen_rrr(DisasContext *ctx, arg_rrr *a, ...) {}
static bool gen_rr_i12(DisasContext *ctx, arg_rr_i *a, ...) {}


gen_rr_i ?


The code is not written completely,  like this:

gen_rr_i12:

@fmt_rr_i12    .. imm:s12 rj:5 rd:5 _i
slti  001000  . . @fmt_rr_i12
sltui 001001  . . @fmt_rr_i12
...

gen_rr_ui12:

@fmt_rr_ui12    .. imm:12 rj:5 rd:5 _i
andi  001101  . . @fmt_rr_ui12
ori   001110  . . @fmt_rr_ui12
xori  00  . . @fmt_rr_ui12
...

@fmt_rr_i12 and @fmt_rr_ui12 are two 'Formats',  but they use the same 
'Argument sets'(rr_i).




static bool gen_rrr_sa2p1(DisasContext *ctx, arg_rrr_sa *a, ...) {}


gen_rrr_sa ?


Likewise.

gen_rrr_sa2p1:

@fmt_rrr_sa2p1  ... .. rk:5 rj:5 rd:5   _rr_sa  sa=%sa2p1
alsl_w     010 .. . . .   @fmt_rrr_sa2p1
alsl_wu    011 .. . . .   @fmt_rrr_sa2p1
alsl_d    0010 110 .. . . .   @fmt_rrr_sa2p1
...

gen_rrr_sa2:
@fmt_rrr_sa2  ... sa:2 rk:5 rj:5 rd:5   _rr_sa
bytepick_w     100 .. . . .   @fmt_rrr_sa2
...

gen_rrr_sa3:
@fmt_rrr_sa3   .. sa:3 rk:5 rj:5 rd:5   _rr_sa
bytepick_d     11 ... . . .   @fmt_rrr_sa3
...


Richard, is that OK?


Other than those two nits, this looks very clean.  Thanks,


OK, I'll correct it in v11.

Thanks.
Song Gao



Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation

2021-11-17 Thread Richard Henderson

On 11/17/21 8:57 AM, gaosong wrote:
I see that the insns.decode format is not very consistent with other architectures, such as 
ARM/RISCV


No.  I don't like how riscv has done it, though they have quite a few split fields, so 
perhaps they thought it looked weird.




#
# Argument sets
#
_i  rd imm
  rd rj rk
_i rd rj imm
_sa rd rj rk sa

#
# Formats
#
@fmt_rrr   . rk:5 rj:5 rd:5 
@fmt_r_i20     ... imm:s20 rd:5 _i
@fmt_rr_i12    .. imm:s12 rj:5 rd:5 _i
@fmt_rr_ui12    .. imm:12 rj:5 rd:5 _i
@fmt_rr_i16    .. imm:s16 rj:5 rd:5 _i
@fmt_rrr_sa2p1    ... .. rk:5 rj:5 rd:5 _sa  sa=%sa2p1

#
# Fixed point arithmetic operation instruction
#
add_w     0001 0 . . .    @fmt_rrr
add_d     0001 1 . . .    @fmt_rrr
sub_w     0001 00010 . . .    @fmt_rrr
sub_d     0001 00011 . . .    @fmt_rrr
slt   0001 00100 . . . @fmt_rrr
sltu  0001 00101 . . . @fmt_rrr
slti  001000  . .   @fmt_rr_i12


and trans_xxx.c.inc

static bool gen_rrr(DisasContext *ctx, arg_rrr *a, ...) {}
static bool gen_rr_i12(DisasContext *ctx, arg_rr_i *a, ...) {}


gen_rr_i ?


static bool gen_rrr_sa2p1(DisasContext *ctx, arg_rrr_sa *a, ...) {}


gen_rrr_sa ?


Richard, is that OK?


Other than those two nits, this looks very clean.  Thanks,


r~



Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation

2021-11-17 Thread gaosong

Hi Richard,

On 2021/11/15 4:42 PM, Richard Henderson wrote:

On 11/15/21 4:59 AM, gaosong wrote:

'The width of the immediate is a detail of the format'  means:

_rdrjimm rd  rj imm

@fmt_rdrjimm  .. imm:12  rj:5 rd:5 _rdrjimm
@fmt_rdrjimm14   imm:14  rj:5 rd:5 _rdrjimm
@fmt_rdrjimm16    .. imm:16  rj:5 rd:5 _rdrjimm

and we print in the disassembly, like this

output_rdrjimm(DisasContext *ctx, arg_fmt_rdrjimm *a, const char *mnemonic)
{
    output(ctx, mnemonic, "%s, %s, 0x%x", regnames[a->rd], regnames[a->rj], a->imm);
}

is that right?


Yes.

I'll note that regnames[] is defined in target/loongarch/cpu.c, which 
is not available when we want to use this disassembler for 
tcg/loongarch64/.  I think it would be easier to print this as


    "r%d", a->rd

so that you do not need to rely on the external strings.

I also think you should print signed numbers, "%d", because 0xfffffff8 
(truncated to 32 bits) is not really the correct representation of -8 
for a 64-bit operand.




1. We print sa in disassembly...
2. We use sa on gen_alsl_* not (sa2+1).
3. bytepick_w uses the same print functions.
Is my understanding right?


Yes, that is the issue I am describing.

I see that the insns.decode format is not very consistent with other 
architectures, such as ARM/RISCV


I'll correct it, like this:

# Fields
#
%sa2p1 15:2 !function=plus_1

#
# Argument sets
#
_i  rd imm
  rd rj rk
_i rd rj imm
_sa rd rj rk sa

#
# Formats
#
@fmt_rrr   . rk:5 rj:5 rd:5 
@fmt_r_i20     ... imm:s20 rd:5 _i
@fmt_rr_i12    .. imm:s12 rj:5 rd:5 _i
@fmt_rr_ui12    .. imm:12 rj:5 rd:5 _i
@fmt_rr_i16    .. imm:s16 rj:5 rd:5 _i
@fmt_rrr_sa2p1    ... .. rk:5 rj:5 rd:5 _sa  sa=%sa2p1

#
# Fixed point arithmetic operation instruction
#
add_w     0001 0 . . .    @fmt_rrr
add_d     0001 1 . . .    @fmt_rrr
sub_w     0001 00010 . . .    @fmt_rrr
sub_d     0001 00011 . . .    @fmt_rrr
slt   0001 00100 . . . @fmt_rrr
sltu  0001 00101 . . . @fmt_rrr
slti  001000  . .   @fmt_rr_i12



and trans_xxx.c.inc

static bool gen_rrr(DisasContext *ctx, arg_rrr *a, ...) {}
static bool gen_rr_i12(DisasContext *ctx, arg_rr_i *a, ...) {}
static bool gen_rrr_sa2p1(DisasContext *ctx, arg_rrr_sa *a, ...) {}
...

Richard, is that OK?

Thanks,
Song Gao



Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation

2021-11-15 Thread Richard Henderson

On 11/15/21 4:59 AM, gaosong wrote:

'The width of the immediate is a detail of the format'  means:

_rdrjimm rd  rj imm

@fmt_rdrjimm  .. imm:12  rj:5 rd:5 _rdrjimm
@fmt_rdrjimm14   imm:14  rj:5 rd:5 _rdrjimm
@fmt_rdrjimm16    .. imm:16  rj:5 rd:5 _rdrjimm

and we print in the disassembly, like this

output_rdrjimm(DisasContext *ctx, arg_fmt_rdrjimm *a, const char *mnemonic)
{
    output(ctx, mnemonic, "%s, %s, 0x%x", regnames[a->rd], regnames[a->rj], a->imm);
}

is that right?


Yes.

I'll note that regnames[] is defined in target/loongarch/cpu.c, which is not available 
when we want to use this disassembler for tcg/loongarch64/.  I think it would be easier to 
print this as


"r%d", a->rd

so that you do not need to rely on the external strings.

I also think you should print signed numbers, "%d", because 0xfffffff8 (truncated to 32 
bits) is not really the correct representation of -8 for a 64-bit operand.
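
A small sketch of the difference, in standalone C. The function names format_rr_i and format_rr_i_hex are made up for illustration and are not part of the patch; the example assumes a 32-bit int, as on the hosts QEMU supports.

```c
#include <stdio.h>

/* Hypothetical formatter: registers as "r%d", immediate as signed decimal. */
static int format_rr_i(char *buf, size_t len, const char *mnemonic,
                       int rd, int rj, int imm)
{
    return snprintf(buf, len, "%s r%d, r%d, %d", mnemonic, rd, rj, imm);
}

/* The hex variant, shown for contrast only. */
static int format_rr_i_hex(char *buf, size_t len, const char *mnemonic,
                           int rd, int rj, int imm)
{
    /* %x expects unsigned; a negative imm shows up as a 32-bit pattern. */
    return snprintf(buf, len, "%s r%d, r%d, 0x%x", mnemonic, rd, rj,
                    (unsigned)imm);
}
```

For an immediate of -8, the first variant prints -8, while the second prints 0xfffffff8, which a reader could take for a large positive 64-bit value. Neither variant needs the regnames[] table.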




1. We print sa in disassembly...
2. We use sa on gen_alsl_* not (sa2+1).
3. bytepick_w uses the same print functions.
Is my understanding right?


Yes, that is the issue I am describing.


r~



Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation

2021-11-14 Thread gaosong


Hi Richard,

On 2021/11/12 10:05 PM, Richard Henderson wrote:

On 11/12/21 7:53 AM, Song Gao wrote:

+#
+# Fields
+#
+%rd  0:5
+%rj  5:5
+%rk  10:5
+%sa2 15:2
+%si12    10:s12
+%ui12    10:12
+%si16    10:s16
+%si20    5:s20


You should only create separate field definitions like this when they 
are complex: e.g. the logical field is disjoint or there's a need for 
!function.



+
+#
+# Argument sets
+#
+_rdrjrk rd rj rk
+_rdrjsi12   rd rj si12
+_rdrjrksa2  rd rj rk sa2
+_rdrjsi16   rd rj si16
+_rdrjui12   rd rj ui12
+_rdsi20 rd si20


Some of these should be combined.  The width of the immediate is a 
detail of the format, not the decoded argument set.  Thus you should have


_rdimm rd imm
_rdrjimm   rd rj imm
_rdrjrk    rd rj rk
_rdrjrksa  rd rj rk sa


'The width of the immediate is a detail of the format'  means:

_rdrjimm rd  rj imm

@fmt_rdrjimm  .. imm:12  rj:5 rd:5 _rdrjimm
@fmt_rdrjimm14   imm:14  rj:5 rd:5 _rdrjimm
@fmt_rdrjimm16    .. imm:16  rj:5 rd:5 _rdrjimm

and we print in the disassembly, like this

output_rdrjimm(DisasContext *ctx, arg_fmt_rdrjimm *a, const char *mnemonic)
{
    output(ctx, mnemonic, "%s, %s, 0x%x", regnames[a->rd], regnames[a->rj], a->imm);
}

is that right?

+alsl_w     010 .. . . .   @fmt_rdrjrksa2
+alsl_wu    011 .. . . .   @fmt_rdrjrksa2
+alsl_d    0010 110 .. . . .   @fmt_rdrjrksa2


The encoding of these insns is that the shift is sa+1.

While you compensate for this in gen_alsl_*, we print the "wrong" 
number in the disassembly.  I think it would be better to do


%sa2p1 15:2 !function=plus_1
@fmt_rdrjrksa2p1    ... .. rk:5 rj:5 rd:5 \
  _rdrjrksa sa=%sa2p1

1. We print sa in disassembly:

output_rdrjrksa(DisasContext *ctx, arg_fmt_rdrjsa *a, const char *mnemonic)
{
    output(ctx, mnemonic, "%s, %s, %s, 0x%x", regnames[a->rd], regnames[a->rj], a->sa);
}

2. We use sa in gen_alsl_*, not (sa2+1).
3. bytepick_w uses the same print functions, but the Field/Argument/Format is:

%sa2 15:2
_rdrjrksa rd rj sa
@fmt_rdrjrk sa2   ... sa:2 rk:5 rj:5 rd:5 _rdrjrksa

Is my understanding right?

Thanks.
Song Gao




Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation

2021-11-12 Thread WANG Xuerui

On 11/12/21 22:05, Richard Henderson wrote:

On 11/12/21 7:53 AM, Song Gao wrote:

+#
+# Fields
+#
+%rd  0:5
+%rj  5:5
+%rk  10:5
+%sa2 15:2
+%si12    10:s12
+%ui12    10:12
+%si16    10:s16
+%si20    5:s20


You should only create separate field definitions like this when they 
are complex: e.g. the logical field is disjoint or there's a need for 
!function.



+
+#
+# Argument sets
+#
+_rdrjrk rd rj rk
+_rdrjsi12   rd rj si12
+_rdrjrksa2  rd rj rk sa2
+_rdrjsi16   rd rj si16
+_rdrjui12   rd rj ui12
+_rdsi20 rd si20


Some of these should be combined.  The width of the immediate is a 
detail of the format, not the decoded argument set.  Thus you should have


_rdimm rd imm
_rdrjimm   rd rj imm
_rdrjrk    rd rj rk
_rdrjrksa  rd rj rk sa


I'd like to add, that the organization of the whole decodetree file 
closely resembles that of the ISA manual, most likely on purpose (while 
not stated anywhere in the patch). However the manual itself is not 
without errors or inconsistencies; for example, the 9 "base instruction 
formats" classification is nowhere near accurate, and here we can see 
the author is forced to create ad-hoc names (repeating the operand 
slots). I suggest just generating the descriptions from the 
loongarch-opcodes project [1]; no need to duplicate work. I'll happily 
help if you decide to do that.


[1]: https://github.com/loongson-community/loongarch-opcodes



+alsl_w     010 .. . . . @fmt_rdrjrksa2
+alsl_wu    011 .. . . . @fmt_rdrjrksa2
+alsl_d    0010 110 .. . . . @fmt_rdrjrksa2


The encoding of these insns is that the shift is sa+1.

While you compensate for this in gen_alsl_*, we print the "wrong" 
number in the disassembly.  I think it would be better to do


%sa2p1 15:2 !function=plus_1
@fmt_rdrjrksa2p1    ... .. rk:5 rj:5 rd:5 \
  _rdrjrksa sa=%sa2p1


Here again, the manual was inconsistent with the binutils 
implementation; the manual says (for ALSL.W, it's SLADD in 
loongarch-opcodes project's revised mnemonics):


"ALSL.W logically left-shifts rj[31:0] by (sa2+1) bits, [snip]" 
(translation mine, not copied from the official translation)


Clearly the "+1" part is not meant to show up in disassembly. Yet the 
binutils implementation acts as if the operand should be pre-added 1 in 
source code, and disassembles and prints as such, obvious mismatch here. 
I'd suggest fixing the disassembly code to remove this inconsistency. 
And the "+1" "feature" is not used anywhere else AFAIK, so it wouldn't 
hurt to just delete everything about it.
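
For reference, a small C model of the behaviour described in the quoted sentence. This is a sketch only: the function name alsl_w_model is invented here, and the add and sign-extension steps follow my reading of the ISA manual rather than the snipped quote above. The sa2 parameter is the raw 2-bit encoded field (0..3).

```c
#include <stdint.h>

/*
 * Reference model of ALSL.W: shift rj[31:0] left by (sa2 + 1), add
 * rk[31:0], then sign-extend the low 32 bits of the sum to 64 bits.
 */
static int64_t alsl_w_model(uint64_t rj, uint64_t rk, unsigned sa2)
{
    uint32_t sum = ((uint32_t)rj << (sa2 + 1)) + (uint32_t)rk;

    return (int64_t)(int32_t)sum;
}
```

Whether the assembler operand is written as sa2 or as sa2 + 1 does not change this computation; it only changes which number the disassembler should show.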





r~





Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation

2021-11-12 Thread Richard Henderson

On 11/12/21 7:53 AM, Song Gao wrote:

+#
+# Fields
+#
+%rd  0:5
+%rj  5:5
+%rk  10:5
+%sa2 15:2
+%si12    10:s12
+%ui12    10:12
+%si16    10:s16
+%si20    5:s20


You should only create separate field definitions like this when they are complex: e.g. 
the logical field is disjoint or there's a need for !function.



+
+#
+# Argument sets
+#
+_rdrjrk rd rj rk
+_rdrjsi12   rd rj si12
+_rdrjrksa2  rd rj rk sa2
+_rdrjsi16   rd rj si16
+_rdrjui12   rd rj ui12
+_rdsi20 rd si20


Some of these should be combined.  The width of the immediate is a detail of the format, 
not the decoded argument set.  Thus you should have


_rdimm rd imm
_rdrjimm   rd rj imm
_rdrjrk    rd rj rk
_rdrjrksa  rd rj rk sa
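
A rough C sketch of what this means in practice. It is hand-written for illustration, not output of the decodetree script; the names arg_rr_i, field_u, field_s and decode_rr_* are hypothetical. The bit positions (rd at 0, rj at 5, immediates at 10) are taken from the field definitions quoted above. Three formats with different immediate widths all fill the same struct.

```c
#include <stdint.h>

/* One argument set shared by every "rd, rj, immediate" format. */
typedef struct {
    int rd;
    int rj;
    int imm;
} arg_rr_i;

/* Extract len bits starting at bit pos, zero-extended (len < 32). */
static uint32_t field_u(uint32_t insn, int pos, int len)
{
    return (insn >> pos) & ((1u << len) - 1);
}

/* Extract len bits starting at bit pos, sign-extended (len < 32). */
static int32_t field_s(uint32_t insn, int pos, int len)
{
    uint32_t v = field_u(insn, pos, len);
    uint32_t sign = 1u << (len - 1);

    return (int32_t)((v ^ sign) - sign);
}

/* Format with imm:s12: signed 12-bit immediate. */
static void decode_rr_i12(uint32_t insn, arg_rr_i *a)
{
    a->rd = (int)field_u(insn, 0, 5);
    a->rj = (int)field_u(insn, 5, 5);
    a->imm = field_s(insn, 10, 12);
}

/* Format with imm:12: unsigned 12-bit immediate. */
static void decode_rr_ui12(uint32_t insn, arg_rr_i *a)
{
    a->rd = (int)field_u(insn, 0, 5);
    a->rj = (int)field_u(insn, 5, 5);
    a->imm = (int)field_u(insn, 10, 12);
}

/* Format with imm:s16: signed 16-bit immediate. */
static void decode_rr_i16(uint32_t insn, arg_rr_i *a)
{
    a->rd = (int)field_u(insn, 0, 5);
    a->rj = (int)field_u(insn, 5, 5);
    a->imm = field_s(insn, 10, 16);
}
```

Code that consumes arg_rr_i only sees a ready-to-use integer in imm; width and signedness stay inside the format.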


+alsl_w     010 .. . . .   @fmt_rdrjrksa2
+alsl_wu    011 .. . . .   @fmt_rdrjrksa2
+alsl_d    0010 110 .. . . .   @fmt_rdrjrksa2


The encoding of these insns is that the shift is sa+1.

While you compensate for this in gen_alsl_*, we print the "wrong" number in the 
disassembly.  I think it would be better to do


%sa2p1 15:2 !function=plus_1
@fmt_rdrjrksa2p1    ... .. rk:5 rj:5 rd:5 \
  _rdrjrksa sa=%sa2p1
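
As a sketch of how this could look on the C side: a decodetree !function helper receives the DisasContext and the raw field value and returns the adjusted value. The extract_sa2p1 wrapper below is hypothetical, written only to mimic what the generated decoder would do for %sa2p1; DisasContext is left opaque so the fragment compiles on its own.

```c
#include <stdint.h>

typedef struct DisasContext DisasContext;   /* opaque in this sketch */

/* Helper named by "!function=plus_1": adjust the raw encoded value. */
static int plus_1(DisasContext *ctx, int x)
{
    (void)ctx;      /* unused here; part of the expected signature */
    return x + 1;
}

/* Hypothetical hand-written equivalent of decoding "%sa2p1 15:2". */
static int extract_sa2p1(DisasContext *ctx, uint32_t insn)
{
    int raw = (int)((insn >> 15) & 0x3);

    return plus_1(ctx, raw);
}
```

With this in place, both the translator and the disassembler would see sa in the range 1..4, so gen_alsl_* would no longer need to add 1 itself.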


r~



[PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation

2021-11-11 Thread Song Gao
This includes:
- ADD.{W/D}, SUB.{W/D}
- ADDI.{W/D}, ADDU16ID
- ALSL.{W[U]/D}
- LU12I.W, LU32I.D LU52I.D
- SLT[U], SLT[U]I
- PCADDI, PCADDU12I, PCADDU18I, PCALAU12I
- AND, OR, NOR, XOR, ANDN, ORN
- MUL.{W/D}, MULH.{W[U]/D[U]}
- MULW.D.W[U]
- DIV.{W[U]/D[U]}, MOD.{W[U]/D[U]}
- ANDI, ORI, XORI

Signed-off-by: Song Gao 
Signed-off-by: Xiaojuan Yang 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insn_trans/trans_arith.c.inc | 319 ++
 target/loongarch/insns.decode |  88 +++
 target/loongarch/translate.c  |  78 +++
 target/loongarch/translate.h  |  19 ++
 4 files changed, 504 insertions(+)
 create mode 100644 target/loongarch/insn_trans/trans_arith.c.inc
 create mode 100644 target/loongarch/insns.decode

diff --git a/target/loongarch/insn_trans/trans_arith.c.inc b/target/loongarch/insn_trans/trans_arith.c.inc
new file mode 100644
index 000..384a158
--- /dev/null
+++ b/target/loongarch/insn_trans/trans_arith.c.inc
@@ -0,0 +1,319 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ */
+
+static bool gen_r3(DisasContext *ctx, arg_fmt_rdrjrk *a,
+                   DisasExtend src1_ext, DisasExtend src2_ext,
+                   DisasExtend dst_ext, void (*func)(TCGv, TCGv, TCGv))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, dst_ext);
+    TCGv src1 = gpr_src(ctx, a->rj, src1_ext);
+    TCGv src2 = gpr_src(ctx, a->rk, src2_ext);
+
+    func(dest, src1, src2);
+
+    /* dst_ext is EXT_NONE and input is dest, We don't run gen_set_gpr. */
+    if (dst_ext) {
+        gen_set_gpr(a->rd, dest, dst_ext);
+    }
+    return true;
+}
+
+static bool gen_r2_si12(DisasContext *ctx, arg_fmt_rdrjsi12 *a,
+                        DisasExtend src_ext, DisasExtend dst_ext,
+                        void (*func)(TCGv, TCGv, TCGv))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, dst_ext);
+    TCGv src1 = gpr_src(ctx, a->rj, src_ext);
+    TCGv src2 = tcg_constant_tl(a->si12);
+
+    func(dest, src1, src2);
+
+    if (dst_ext) {
+        gen_set_gpr(a->rd, dest, dst_ext);
+    }
+    return true;
+}
+
+static bool gen_r3_sa2(DisasContext *ctx, arg_fmt_rdrjrksa2 *a,
+                       DisasExtend src_ext, DisasExtend dst_ext,
+                       void (*func)(TCGv, TCGv, TCGv, TCGv, target_long))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, dst_ext);
+    TCGv src1 = gpr_src(ctx, a->rj, src_ext);
+    TCGv src2 = gpr_src(ctx, a->rk, src_ext);
+    TCGv temp = tcg_temp_new();
+
+    func(dest, src1, src2, temp, a->sa2);
+
+    if (dst_ext) {
+        gen_set_gpr(a->rd, dest, dst_ext);
+    }
+    tcg_temp_free(temp);
+    return true;
+}
+
+static bool trans_lu12i_w(DisasContext *ctx, arg_lu12i_w *a)
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+
+    tcg_gen_movi_tl(dest, a->si20 << 12);
+    return true;
+}
+
+static bool gen_pc(DisasContext *ctx, arg_fmt_rdsi20 *a,
+                   target_ulong (*func)(target_ulong, int))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    target_ulong addr = func(ctx->base.pc_next, a->si20);
+
+    tcg_gen_movi_tl(dest, addr);
+    return true;
+}
+
+static bool gen_r2_ui12(DisasContext *ctx, arg_fmt_rdrjui12 *a,
+                        void (*func)(TCGv, TCGv, target_long))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
+
+    func(dest, src1, a->ui12);
+    return true;
+}
+
+static void gen_slt(TCGv dest, TCGv src1, TCGv src2)
+{
+    tcg_gen_setcond_tl(TCG_COND_LT, dest, src1, src2);
+}
+
+static void gen_sltu(TCGv dest, TCGv src1, TCGv src2)
+{
+    tcg_gen_setcond_tl(TCG_COND_LTU, dest, src1, src2);
+}
+
+static void gen_mulh_w(TCGv dest, TCGv src1, TCGv src2)
+{
+    tcg_gen_mul_i64(dest, src1, src2);
+    tcg_gen_sari_i64(dest, dest, 32);
+}
+
+static void gen_mulh_wu(TCGv dest, TCGv src1, TCGv src2)
+{
+    tcg_gen_mul_i64(dest, src1, src2);
+    tcg_gen_sari_i64(dest, dest, 32);
+}
+
+static void gen_mulh_d(TCGv dest, TCGv src1, TCGv src2)
+{
+    TCGv discard = tcg_temp_new();
+    tcg_gen_muls2_tl(discard, dest, src1, src2);
+    tcg_temp_free(discard);
+}
+
+static void gen_mulh_du(TCGv dest, TCGv src1, TCGv src2)
+{
+    TCGv discard = tcg_temp_new();
+    tcg_gen_mulu2_tl(discard, dest, src1, src2);
+    tcg_temp_free(discard);
+}
+
+static void prep_divisor_d(TCGv ret, TCGv src1, TCGv src2)
+{
+    TCGv t0 = tcg_temp_new();
+    TCGv t1 = tcg_temp_new();
+    TCGv zero = tcg_constant_tl(0);
+
+    /*
+     * If min / -1, set the divisor to 1.
+     * This avoids potential host overflow trap and produces min.
+     * If x / 0, set the divisor to 1.
+     * This avoids potential host overflow trap;
+     * the required result is undefined.
+     */
+    tcg_gen_setcondi_tl(TCG_COND_EQ, ret, src1, INT64_MIN);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t0, src2, -1);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t1, src2, 0);
+    tcg_gen_and_tl(ret, ret, t0);
+    tcg_gen_or_tl(ret, ret,