Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation
Hi Richard,

On 2021/11/17 5:55 PM, Richard Henderson wrote:
>> @fmt_rr_i12 and @fmt_rr_ui12 are two 'Formats', but they use the same 'Argument sets' (rr_i).
>
> What I meant is that there would be a single gen_rr_i function handling the argument set rr_i; no need for two gen_rr_i* functions.

Got it. Thanks.

Song Gao
Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation
On 11/17/21 10:29 AM, gaosong wrote:
>> gen_rr_i ?
>
> The code is not written completely, like this:
>
> gen_rr_i12:
> @fmt_rr_i12 .. imm:s12 rj:5 rd:5 &rr_i
> slti 001000 . . @fmt_rr_i12
> sltui 001001 . . @fmt_rr_i12
> ...
>
> gen_rr_ui12:
> @fmt_rr_ui12 .. imm:12 rj:5 rd:5 &rr_i
> andi 001101 . . @fmt_rr_ui12
> ori 001110 . . @fmt_rr_ui12
> xori 00 . . @fmt_rr_ui12
> ...
>
> @fmt_rr_i12 and @fmt_rr_ui12 are two 'Formats', but they use the same 'Argument sets' (rr_i).

What I meant is that there would be a single gen_rr_i function handling the argument set rr_i; no need for two gen_rr_i* functions.

> gen_rrr_sa2p1:
> @fmt_rrr_sa2p1 ... .. rk:5 rj:5 rd:5 _rr_sa sa=%sa2p1
> alsl_w 010 .. . . . @fmt_rrr_sa2p1
> alsl_wu 011 .. . . . @fmt_rrr_sa2p1
> alsl_d 0010 110 .. . . . @fmt_rrr_sa2p1
> ...
>
> gen_rrr_sa2:
> @fmt_rrr_sa2 ... sa:2 rk:5 rj:5 rd:5 _rr_sa
> bytepick_w 100 .. . . . @fmt_rrr_sa2
> ...
>
> gen_rrr_sa3:
> @fmt_rrr_sa3 .. sa:3 rk:5 rj:5 rd:5 _rr_sa
> bytepick_d 11 ... . . . @fmt_rrr_sa3
> ...

Likewise a single gen_rrr_sa function.

r~
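The point about one generator per argument set can be illustrated outside of TCG. The sketch below is a toy model, not QEMU code: the two decode helpers stand in for what decodetree would generate for @fmt_rr_i12 and @fmt_rr_ui12, and gen_rr_i here operates on a plain register array instead of emitting TCG ops. The helper names decode_common, op_slt and op_and, and the rr_i_op callback type, are hypothetical. Field positions (rd at 0:5, rj at 5:5, imm at 10:12) are taken from the field definitions quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* One argument set shared by both formats, mirroring rr_i (rd, rj, imm). */
typedef struct {
    int rd;
    int rj;
    int imm;
} arg_rr_i;

/* Extract the raw fields; the immediate is still unsigned at this point. */
static arg_rr_i decode_common(uint32_t insn)
{
    arg_rr_i a;
    a.rd = (int)(insn & 0x1f);
    a.rj = (int)((insn >> 5) & 0x1f);
    a.imm = (int)((insn >> 10) & 0xfff);
    return a;
}

/* Stand-in for @fmt_rr_ui12: imm:12 is zero-extended. */
static arg_rr_i decode_fmt_rr_ui12(uint32_t insn)
{
    return decode_common(insn);
}

/* Stand-in for @fmt_rr_i12: imm:s12 is sign-extended. */
static arg_rr_i decode_fmt_rr_i12(uint32_t insn)
{
    arg_rr_i a = decode_common(insn);
    a.imm = (a.imm ^ 0x800) - 0x800;
    return a;
}

/* Hypothetical operation callbacks; a real translator would emit TCG ops. */
typedef int64_t (*rr_i_op)(int64_t src, int64_t imm);

static int64_t op_slt(int64_t src, int64_t imm) { return src < imm; }
static int64_t op_and(int64_t src, int64_t imm) { return src & imm; }

/*
 * A single helper serves every instruction whose format decodes into
 * arg_rr_i. The width and signedness of the immediate were already
 * resolved by the format, so nothing here depends on them.
 */
static void gen_rr_i(int64_t *gpr, const arg_rr_i *a, rr_i_op op)
{
    assert(a->rd >= 0 && a->rd < 32 && a->rj >= 0 && a->rj < 32);
    gpr[a->rd] = op(gpr[a->rj], a->imm);
}
```

Both slti-style and andi-style instructions can then go through gen_rr_i; only the decode step differs, which is the reason a separate gen_rr_i12 and gen_rr_ui12 would be redundant.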
Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation
Hi Richard,

On 2021/11/17 4:28 PM, Richard Henderson wrote:
> On 11/17/21 8:57 AM, gaosong wrote:
>> I see that insns.decode format is not very consistent with other architectures, such as ARM/RISCV
>
> No. I don't like how riscv has done it, though they have quite a few split fields, so perhaps they thought it looked weird.
>
>> #
>> # Argument sets
>> #
>> &r_i rd imm
>> &rrr rd rj rk
>> &rr_i rd rj imm
>> &rrr_sa rd rj rk sa
>>
>> #
>> # Formats
>> #
>> @fmt_rrr . rk:5 rj:5 rd:5 &rrr
>> @fmt_r_i20 ... imm:s20 rd:5 &r_i
>> @fmt_rr_i12 .. imm:s12 rj:5 rd:5 &rr_i
>> @fmt_rr_ui12 .. imm:12 rj:5 rd:5 &rr_i
>> @fmt_rr_i16 .. imm:s16 rj:5 rd:5 &rr_i
>> @fmt_rrr_sa2p1 ... .. rk:5 rj:5 rd:5 &rrr_sa sa=%sa2p1
>>
>> #
>> # Fixed point arithmetic operation instruction
>> #
>> add_w 0001 0 . . . @fmt_rrr
>> add_d 0001 1 . . . @fmt_rrr
>> sub_w 0001 00010 . . . @fmt_rrr
>> sub_d 0001 00011 . . . @fmt_rrr
>> slt 0001 00100 . . . @fmt_rrr
>> sltu 0001 00101 . . . @fmt_rrr
>> slti 001000 . . @fmt_rr_i12
>>
>> and trans_xxx.c.inc
>>
>> static bool gen_rrr(DisasContext *ctx, arg_rrr *a, ...) {}
>> static bool gen_rr_i12(DisasContext *ctx, arg_rr_i *a, ...) {}
>
> gen_rr_i ?

The code is not written completely, like this:

gen_rr_i12:
@fmt_rr_i12 .. imm:s12 rj:5 rd:5 &rr_i
slti 001000 . . @fmt_rr_i12
sltui 001001 . . @fmt_rr_i12
...

gen_rr_ui12:
@fmt_rr_ui12 .. imm:12 rj:5 rd:5 &rr_i
andi 001101 . . @fmt_rr_ui12
ori 001110 . . @fmt_rr_ui12
xori 00 . . @fmt_rr_ui12
...

@fmt_rr_i12 and @fmt_rr_ui12 are two 'Formats', but they use the same 'Argument sets' (rr_i).

>> static bool gen_rrr_sa2p1(DisasContext *ctx, arg_rrr_sa *a, ...) {}
>
> gen_rrr_sa ?

Likewise.

gen_rrr_sa2p1:
@fmt_rrr_sa2p1 ... .. rk:5 rj:5 rd:5 _rr_sa sa=%sa2p1
alsl_w 010 .. . . . @fmt_rrr_sa2p1
alsl_wu 011 .. . . . @fmt_rrr_sa2p1
alsl_d 0010 110 .. . . . @fmt_rrr_sa2p1
...

gen_rrr_sa2:
@fmt_rrr_sa2 ... sa:2 rk:5 rj:5 rd:5 _rr_sa
bytepick_w 100 .. . . . @fmt_rrr_sa2
...

gen_rrr_sa3:
@fmt_rrr_sa3 .. sa:3 rk:5 rj:5 rd:5 _rr_sa
bytepick_d 11 ... . . . @fmt_rrr_sa3
...

>> Richard, is that OK?
>
> Other than those two nits, this looks very clean.
>
> Thanks,

OK, I'll correct it in v11.

Thanks.
Song Gao
Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation
On 11/17/21 8:57 AM, gaosong wrote:
> I see that insns.decode format is not very consistent with other architectures, such as ARM/RISCV

No. I don't like how riscv has done it, though they have quite a few split fields, so perhaps they thought it looked weird.

> #
> # Argument sets
> #
> &r_i rd imm
> &rrr rd rj rk
> &rr_i rd rj imm
> &rrr_sa rd rj rk sa
>
> #
> # Formats
> #
> @fmt_rrr . rk:5 rj:5 rd:5 &rrr
> @fmt_r_i20 ... imm:s20 rd:5 &r_i
> @fmt_rr_i12 .. imm:s12 rj:5 rd:5 &rr_i
> @fmt_rr_ui12 .. imm:12 rj:5 rd:5 &rr_i
> @fmt_rr_i16 .. imm:s16 rj:5 rd:5 &rr_i
> @fmt_rrr_sa2p1 ... .. rk:5 rj:5 rd:5 &rrr_sa sa=%sa2p1
>
> #
> # Fixed point arithmetic operation instruction
> #
> add_w 0001 0 . . . @fmt_rrr
> add_d 0001 1 . . . @fmt_rrr
> sub_w 0001 00010 . . . @fmt_rrr
> sub_d 0001 00011 . . . @fmt_rrr
> slt 0001 00100 . . . @fmt_rrr
> sltu 0001 00101 . . . @fmt_rrr
> slti 001000 . . @fmt_rr_i12
>
> and trans_xxx.c.inc
>
> static bool gen_rrr(DisasContext *ctx, arg_rrr *a, ...) {}
> static bool gen_rr_i12(DisasContext *ctx, arg_rr_i *a, ...) {}

gen_rr_i ?

> static bool gen_rrr_sa2p1(DisasContext *ctx, arg_rrr_sa *a, ...) {}

gen_rrr_sa ?

> Richard, is that OK?

Other than those two nits, this looks very clean.

Thanks,

r~
Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation
Hi Richard,

On 2021/11/15 4:42 PM, Richard Henderson wrote:
> On 11/15/21 4:59 AM, gaosong wrote:
>> 'The width of the immediate is a detail of the format' means:
>>
>> &fmt_rdrjimm rd rj imm
>> @fmt_rdrjimm .. imm:12 rj:5 rd:5 &fmt_rdrjimm
>> @fmt_rdrjimm14 imm:14 rj:5 rd:5 &fmt_rdrjimm
>> @fmt_rdrjimm16 .. imm:16 rj:5 rd:5 &fmt_rdrjimm
>>
>> and we print in the disassembly, like this
>>
>> output_rdrjimm(DisasContext *ctx, arg_fmt_rdrjimm * a, const char *mnemonic)
>> {
>>     output(ctx, mnemonic, "%s, %s, 0x%x", regnames[a->rd], regnames[a->rj], a->imm);
>> }
>>
>> is that right?
>
> Yes.
>
> I'll note that regnames[] is defined in target/loongarch/cpu.c, which is not available when we want to use this disassembler for tcg/loongarch64/. I think it would be easier to print this as "r%d", a->rd so that you do not need to rely on the external strings.
>
> I also think you should print signed numbers, "%d", because 0xfffffff8 (truncated to 32 bits) is not really the correct representation of -8 for a 64-bit operand.
>
>> 1. We print sa in disassembly...
>> 2. We use sa on gen_alsl_* not (sa2+1).
>> 3. bytepick_w use the same print functions.
>>
>> Is my understanding right?
>
> Yes, that is the issue I am describing.

I see that the insns.decode format is not very consistent with other architectures, such as ARM/RISCV. I'll correct it, like this:

#
# Fields
#
%sa2p1 15:2 !function=plus_1

#
# Argument sets
#
&r_i rd imm
&rrr rd rj rk
&rr_i rd rj imm
&rrr_sa rd rj rk sa

#
# Formats
#
@fmt_rrr . rk:5 rj:5 rd:5 &rrr
@fmt_r_i20 ... imm:s20 rd:5 &r_i
@fmt_rr_i12 .. imm:s12 rj:5 rd:5 &rr_i
@fmt_rr_ui12 .. imm:12 rj:5 rd:5 &rr_i
@fmt_rr_i16 .. imm:s16 rj:5 rd:5 &rr_i
@fmt_rrr_sa2p1 ... .. rk:5 rj:5 rd:5 &rrr_sa sa=%sa2p1

#
# Fixed point arithmetic operation instruction
#
add_w 0001 0 . . . @fmt_rrr
add_d 0001 1 . . . @fmt_rrr
sub_w 0001 00010 . . . @fmt_rrr
sub_d 0001 00011 . . . @fmt_rrr
slt 0001 00100 . . . @fmt_rrr
sltu 0001 00101 . . . @fmt_rrr
slti 001000 . . @fmt_rr_i12

and trans_xxx.c.inc

static bool gen_rrr(DisasContext *ctx, arg_rrr *a, ...) {}
static bool gen_rr_i12(DisasContext *ctx, arg_rr_i *a, ...) {}
static bool gen_rrr_sa2p1(DisasContext *ctx, arg_rrr_sa *a, ...) {}
...

Richard, is that OK?

Thanks,
Song Gao
Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation
On 11/15/21 4:59 AM, gaosong wrote:
> 'The width of the immediate is a detail of the format' means:
>
> &fmt_rdrjimm rd rj imm
> @fmt_rdrjimm .. imm:12 rj:5 rd:5 &fmt_rdrjimm
> @fmt_rdrjimm14 imm:14 rj:5 rd:5 &fmt_rdrjimm
> @fmt_rdrjimm16 .. imm:16 rj:5 rd:5 &fmt_rdrjimm
>
> and we print in the disassembly, like this
>
> output_rdrjimm(DisasContext *ctx, arg_fmt_rdrjimm * a, const char *mnemonic)
> {
>     output(ctx, mnemonic, "%s, %s, 0x%x", regnames[a->rd], regnames[a->rj], a->imm);
> }
>
> is that right?

Yes.

I'll note that regnames[] is defined in target/loongarch/cpu.c, which is not available when we want to use this disassembler for tcg/loongarch64/. I think it would be easier to print this as "r%d", a->rd so that you do not need to rely on the external strings.

I also think you should print signed numbers, "%d", because 0xfffffff8 (truncated to 32 bits) is not really the correct representation of -8 for a 64-bit operand.

> 1. We print sa in disassembly...
> 2. We use sa on gen_alsl_* not (sa2+1).
> 3. bytepick_w use the same print functions.
>
> Is my understanding right?

Yes, that is the issue I am describing.

r~
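The two printing suggestions above (register numbers instead of regnames[], and signed "%d" instead of "0x%x") can be seen side by side in a small standalone sketch. This is an illustration only: sextract_field, format_rr_i and format_rr_i_hex are hypothetical helpers, not functions from the patch, and the exact "mnemonic rD, rJ, imm" layout is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Sign-extend the low `len` bits of `value`, as an "s12"-style field would. */
static int32_t sextract_field(uint32_t value, unsigned len)
{
    assert(len >= 1 && len <= 32);
    uint32_t sign = 1u << (len - 1);
    uint32_t mask = sign | (sign - 1u);
    uint32_t low = value & mask;
    return (int32_t)((low ^ sign) - sign);
}

/*
 * Print registers as "r%d" so no external name table is needed, and the
 * immediate as a signed decimal.
 */
static int format_rr_i(char *buf, size_t size, const char *mnemonic,
                       int rd, int rj, int imm)
{
    return snprintf(buf, size, "%s r%d, r%d, %d", mnemonic, rd, rj, imm);
}

/*
 * The hex variant, for comparison: a negative immediate is shown as its
 * 32-bit two's-complement pattern, which is misleading for a 64-bit operand.
 */
static int format_rr_i_hex(char *buf, size_t size, const char *mnemonic,
                           int rd, int rj, int imm)
{
    return snprintf(buf, size, "%s r%d, r%d, 0x%x", mnemonic, rd, rj,
                    (unsigned)imm);
}
```

With a 12-bit field value of 0xff8, the signed variant renders the immediate as -8, while the hex variant renders it as 0xfffffff8.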
Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation
Hi Richard,

On 2021/11/12 10:05 PM, Richard Henderson wrote:
> On 11/12/21 7:53 AM, Song Gao wrote:
>> +#
>> +# Fields
>> +#
>> +%rd 0:5
>> +%rj 5:5
>> +%rk 10:5
>> +%sa2 15:2
>> +%si12 10:s12
>> +%ui12 10:12
>> +%si16 10:s16
>> +%si20 5:s20
>
> You should only create separate field definitions like this when they are complex: e.g. the logical field is disjoint or there's a need for !function.
>
>> +
>> +#
>> +# Argument sets
>> +#
>> +&fmt_rdrjrk rd rj rk
>> +&fmt_rdrjsi12 rd rj si12
>> +&fmt_rdrjrksa2 rd rj rk sa2
>> +&fmt_rdrjsi16 rd rj si16
>> +&fmt_rdrjui12 rd rj ui12
>> +&fmt_rdsi20 rd si20
>
> Some of these should be combined. The width of the immediate is a detail of the format, not the decoded argument set. Thus you should have
>
> &fmt_rdimm rd imm
> &fmt_rdrjimm rd rj imm
> &fmt_rdrjrk rd rj rk
> &fmt_rdrjrksa rd rj rk sa

'The width of the immediate is a detail of the format' means:

&fmt_rdrjimm rd rj imm
@fmt_rdrjimm .. imm:12 rj:5 rd:5 &fmt_rdrjimm
@fmt_rdrjimm14 imm:14 rj:5 rd:5 &fmt_rdrjimm
@fmt_rdrjimm16 .. imm:16 rj:5 rd:5 &fmt_rdrjimm

and we print in the disassembly, like this

output_rdrjimm(DisasContext *ctx, arg_fmt_rdrjimm * a, const char *mnemonic)
{
    output(ctx, mnemonic, "%s, %s, 0x%x", regnames[a->rd], regnames[a->rj], a->imm);
}

is that right?

>> +alsl_w 010 .. . . . @fmt_rdrjrksa2
>> +alsl_wu 011 .. . . . @fmt_rdrjrksa2
>> +alsl_d 0010 110 .. . . . @fmt_rdrjrksa2
>
> The encoding of these insns is that the shift is sa+1.
>
> While you compensate for this in gen_alsl_*, we print the "wrong" number in the disassembly. I think it would be better to do
>
> %sa2p1 15:2 !function=plus_1
> @fmt_rdrjrksa2p1 ... .. rk:5 rj:5 rd:5 \
>     &fmt_rdrjrksa sa=%sa2p1

1. We print sa in disassembly

output_rdrjrksa(DisasContext *ctx, arg_fmt_rdrjrksa *a, const char *mnemonic)
{
    output(ctx, mnemonic, "%s, %s, %s, 0x%x", regnames[a->rd], regnames[a->rj], regnames[a->rk], a->sa);
}

2. We use sa on gen_alsl_* not (sa2+1).

3. bytepick_w uses the same print function, but the Field/Argument/Format is

%sa2 15:2
&fmt_rdrjrksa rd rj rk sa
@fmt_rdrjrksa2 ... sa:2 rk:5 rj:5 rd:5 &fmt_rdrjrksa

Is my understanding right?

Thanks.
Song Gao
Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation
On 11/12/21 22:05, Richard Henderson wrote:
> On 11/12/21 7:53 AM, Song Gao wrote:
>> +#
>> +# Fields
>> +#
>> +%rd 0:5
>> +%rj 5:5
>> +%rk 10:5
>> +%sa2 15:2
>> +%si12 10:s12
>> +%ui12 10:12
>> +%si16 10:s16
>> +%si20 5:s20
>
> You should only create separate field definitions like this when they are complex: e.g. the logical field is disjoint or there's a need for !function.
>
>> +
>> +#
>> +# Argument sets
>> +#
>> +&fmt_rdrjrk rd rj rk
>> +&fmt_rdrjsi12 rd rj si12
>> +&fmt_rdrjrksa2 rd rj rk sa2
>> +&fmt_rdrjsi16 rd rj si16
>> +&fmt_rdrjui12 rd rj ui12
>> +&fmt_rdsi20 rd si20
>
> Some of these should be combined. The width of the immediate is a detail of the format, not the decoded argument set. Thus you should have
>
> &fmt_rdimm rd imm
> &fmt_rdrjimm rd rj imm
> &fmt_rdrjrk rd rj rk
> &fmt_rdrjrksa rd rj rk sa

I'd like to add that the organization of the whole decodetree file closely resembles that of the ISA manual, most likely on purpose (although this is not stated anywhere in the patch). However, the manual itself is not without errors or inconsistencies; for example, the 9 "base instruction formats" classification is nowhere near accurate, and here we can see the author is forced to create ad-hoc names (repeating the operand slots). I suggest just generating the descriptions from the loongarch-opcodes project [1]; no need to duplicate work. I'll happily help if you decide to do that.

[1]: https://github.com/loongson-community/loongarch-opcodes

>> +alsl_w 010 .. . . . @fmt_rdrjrksa2
>> +alsl_wu 011 .. . . . @fmt_rdrjrksa2
>> +alsl_d 0010 110 .. . . . @fmt_rdrjrksa2
>
> The encoding of these insns is that the shift is sa+1.
>
> While you compensate for this in gen_alsl_*, we print the "wrong" number in the disassembly. I think it would be better to do
>
> %sa2p1 15:2 !function=plus_1
> @fmt_rdrjrksa2p1 ... .. rk:5 rj:5 rd:5 \
>     &fmt_rdrjrksa sa=%sa2p1

Here again, the manual is inconsistent with the binutils implementation; the manual says (for ALSL.W, which is SLADD in the loongarch-opcodes project's revised mnemonics):

"ALSL.W logically left-shifts rj[31:0] by (sa2+1) bits, [snip]"

(translation mine, not copied from the official translation)

Clearly the "+1" part is not meant to show up in disassembly. Yet the binutils implementation acts as if the operand should already have 1 added in the source code, and disassembles and prints it as such, which is an obvious mismatch. I'd suggest fixing the disassembly code to remove this inconsistency. And the "+1" "feature" is not used anywhere else AFAIK, so it wouldn't hurt to just delete everything about it.

> r~
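To make the "(sa2+1)" discussion concrete, here is a small reference model of ALSL.W, written as a sketch rather than taken from the patch. It assumes the usual reading of the instruction: shift the low 32 bits of rj left by sa2+1, add the low 32 bits of rk, and sign-extend the 32-bit sum into the 64-bit destination. The function name alsl_w_model is hypothetical, and sa2 is the raw 2-bit encoded field.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reference model of ALSL.W on a 64-bit register file.
 * sa2 is the raw encoded field (0..3); the effective shift is sa2 + 1.
 */
static int64_t alsl_w_model(int64_t rj, int64_t rk, unsigned sa2)
{
    assert(sa2 <= 3);
    /* Work in uint32_t so the shift and add wrap modulo 2^32. */
    uint32_t sum = ((uint32_t)rj << (sa2 + 1)) + (uint32_t)rk;
    /* Sign-extend the 32-bit result to the full register width. */
    return (int64_t)(int32_t)sum;
}
```

Whether a disassembler shows the raw sa2 or the effective shift sa2+1 is exactly the presentation question raised in this message; the arithmetic itself is the same either way.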
Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation
On 11/12/21 7:53 AM, Song Gao wrote:
> +#
> +# Fields
> +#
> +%rd 0:5
> +%rj 5:5
> +%rk 10:5
> +%sa2 15:2
> +%si12 10:s12
> +%ui12 10:12
> +%si16 10:s16
> +%si20 5:s20

You should only create separate field definitions like this when they are complex: e.g. the logical field is disjoint or there's a need for !function.

> +
> +#
> +# Argument sets
> +#
> +&fmt_rdrjrk rd rj rk
> +&fmt_rdrjsi12 rd rj si12
> +&fmt_rdrjrksa2 rd rj rk sa2
> +&fmt_rdrjsi16 rd rj si16
> +&fmt_rdrjui12 rd rj ui12
> +&fmt_rdsi20 rd si20

Some of these should be combined. The width of the immediate is a detail of the format, not the decoded argument set. Thus you should have

&fmt_rdimm rd imm
&fmt_rdrjimm rd rj imm
&fmt_rdrjrk rd rj rk
&fmt_rdrjrksa rd rj rk sa

> +alsl_w 010 .. . . . @fmt_rdrjrksa2
> +alsl_wu 011 .. . . . @fmt_rdrjrksa2
> +alsl_d 0010 110 .. . . . @fmt_rdrjrksa2

The encoding of these insns is that the shift is sa+1.

While you compensate for this in gen_alsl_*, we print the "wrong" number in the disassembly. I think it would be better to do

%sa2p1 15:2 !function=plus_1
@fmt_rdrjrksa2p1 ... .. rk:5 rj:5 rd:5 \
    &fmt_rdrjrksa sa=%sa2p1

r~
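As a rough picture of what "%sa2p1 15:2 !function=plus_1" asks the decoder to do, the sketch below extracts the 2-bit field at bit 15 and passes it through a post-processing hook before it reaches the argument set. This is a simplified standalone model, not generated decodetree output: extract_bits and decode_sa2p1 are hypothetical names, and plus_1 here takes only the value, whereas in QEMU such hooks are, as far as I know, also passed the translation context.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `len` bits starting at bit `pos`, like a field written "pos:len". */
static uint32_t extract_bits(uint32_t insn, unsigned pos, unsigned len)
{
    assert(len >= 1 && len < 32 && pos + len <= 32);
    return (insn >> pos) & ((1u << len) - 1u);
}

/* Simplified stand-in for the !function hook: adjust the decoded value. */
static int plus_1(int x)
{
    return x + 1;
}

/* Models "%sa2p1 15:2 !function=plus_1": the stored sa is already sa2 + 1. */
static int decode_sa2p1(uint32_t insn)
{
    return plus_1((int)extract_bits(insn, 15, 2));
}
```

With this arrangement both the translator and the disassembler see the same adjusted value, so gen_alsl_* no longer needs its own "+1" and the printed shift amount matches what is executed.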
[PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation
This includes:
- ADD.{W/D}, SUB.{W/D}
- ADDI.{W/D}, ADDU16I.D
- ALSL.{W[U]/D}
- LU12I.W, LU32I.D, LU52I.D
- SLT[U], SLT[U]I
- PCADDI, PCADDU12I, PCADDU18I, PCALAU12I
- AND, OR, NOR, XOR, ANDN, ORN
- MUL.{W/D}, MULH.{W[U]/D[U]}
- MULW.D.W[U]
- DIV.{W[U]/D[U]}, MOD.{W[U]/D[U]}
- ANDI, ORI, XORI

Signed-off-by: Song Gao
Signed-off-by: Xiaojuan Yang
Reviewed-by: Richard Henderson
---
 target/loongarch/insn_trans/trans_arith.c.inc | 319 ++
 target/loongarch/insns.decode                 |  88 +++
 target/loongarch/translate.c                  |  78 +++
 target/loongarch/translate.h                  |  19 ++
 4 files changed, 504 insertions(+)
 create mode 100644 target/loongarch/insn_trans/trans_arith.c.inc
 create mode 100644 target/loongarch/insns.decode

diff --git a/target/loongarch/insn_trans/trans_arith.c.inc b/target/loongarch/insn_trans/trans_arith.c.inc
new file mode 100644
index 000..384a158
--- /dev/null
+++ b/target/loongarch/insn_trans/trans_arith.c.inc
@@ -0,0 +1,319 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ */
+
+static bool gen_r3(DisasContext *ctx, arg_fmt_rdrjrk *a,
+                   DisasExtend src1_ext, DisasExtend src2_ext,
+                   DisasExtend dst_ext, void (*func)(TCGv, TCGv, TCGv))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, dst_ext);
+    TCGv src1 = gpr_src(ctx, a->rj, src1_ext);
+    TCGv src2 = gpr_src(ctx, a->rk, src2_ext);
+
+    func(dest, src1, src2);
+
+    /* dst_ext is EXT_NONE and input is dest, We don't run gen_set_gpr. */
+    if (dst_ext) {
+        gen_set_gpr(a->rd, dest, dst_ext);
+    }
+    return true;
+}
+
+static bool gen_r2_si12(DisasContext *ctx, arg_fmt_rdrjsi12 *a,
+                        DisasExtend src_ext, DisasExtend dst_ext,
+                        void (*func)(TCGv, TCGv, TCGv))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, dst_ext);
+    TCGv src1 = gpr_src(ctx, a->rj, src_ext);
+    TCGv src2 = tcg_constant_tl(a->si12);
+
+    func(dest, src1, src2);
+
+    if (dst_ext) {
+        gen_set_gpr(a->rd, dest, dst_ext);
+    }
+    return true;
+}
+
+static bool gen_r3_sa2(DisasContext *ctx, arg_fmt_rdrjrksa2 *a,
+                       DisasExtend src_ext, DisasExtend dst_ext,
+                       void (*func)(TCGv, TCGv, TCGv, TCGv, target_long))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, dst_ext);
+    TCGv src1 = gpr_src(ctx, a->rj, src_ext);
+    TCGv src2 = gpr_src(ctx, a->rk, src_ext);
+    TCGv temp = tcg_temp_new();
+
+    func(dest, src1, src2, temp, a->sa2);
+
+    if (dst_ext) {
+        gen_set_gpr(a->rd, dest, dst_ext);
+    }
+    tcg_temp_free(temp);
+    return true;
+}
+
+static bool trans_lu12i_w(DisasContext *ctx, arg_lu12i_w *a)
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+
+    tcg_gen_movi_tl(dest, a->si20 << 12);
+    return true;
+}
+
+static bool gen_pc(DisasContext *ctx, arg_fmt_rdsi20 *a,
+                   target_ulong (*func)(target_ulong, int))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    target_ulong addr = func(ctx->base.pc_next, a->si20);
+
+    tcg_gen_movi_tl(dest, addr);
+    return true;
+}
+
+static bool gen_r2_ui12(DisasContext *ctx, arg_fmt_rdrjui12 *a,
+                        void (*func)(TCGv, TCGv, target_long))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
+
+    func(dest, src1, a->ui12);
+    return true;
+}
+
+static void gen_slt(TCGv dest, TCGv src1, TCGv src2)
+{
+    tcg_gen_setcond_tl(TCG_COND_LT, dest, src1, src2);
+}
+
+static void gen_sltu(TCGv dest, TCGv src1, TCGv src2)
+{
+    tcg_gen_setcond_tl(TCG_COND_LTU, dest, src1, src2);
+}
+
+static void gen_mulh_w(TCGv dest, TCGv src1, TCGv src2)
+{
+    tcg_gen_mul_i64(dest, src1, src2);
+    tcg_gen_sari_i64(dest, dest, 32);
+}
+
+static void gen_mulh_wu(TCGv dest, TCGv src1, TCGv src2)
+{
+    tcg_gen_mul_i64(dest, src1, src2);
+    tcg_gen_sari_i64(dest, dest, 32);
+}
+
+static void gen_mulh_d(TCGv dest, TCGv src1, TCGv src2)
+{
+    TCGv discard = tcg_temp_new();
+    tcg_gen_muls2_tl(discard, dest, src1, src2);
+    tcg_temp_free(discard);
+}
+
+static void gen_mulh_du(TCGv dest, TCGv src1, TCGv src2)
+{
+    TCGv discard = tcg_temp_new();
+    tcg_gen_mulu2_tl(discard, dest, src1, src2);
+    tcg_temp_free(discard);
+}
+
+static void prep_divisor_d(TCGv ret, TCGv src1, TCGv src2)
+{
+    TCGv t0 = tcg_temp_new();
+    TCGv t1 = tcg_temp_new();
+    TCGv zero = tcg_constant_tl(0);
+
+    /*
+     * If min / -1, set the divisor to 1.
+     * This avoids potential host overflow trap and produces min.
+     * If x / 0, set the divisor to 1.
+     * This avoids potential host overflow trap;
+     * the required result is undefined.
+     */
+    tcg_gen_setcondi_tl(TCG_COND_EQ, ret, src1, INT64_MIN);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t0, src2, -1);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t1, src2, 0);
+    tcg_gen_and_tl(ret, ret, t0);
+    tcg_gen_or_tl(ret, ret,