Re: [PATCH v5 09/30] tcg/loongarch64: Implement tcg_out_mov and tcg_out_movi

2021-09-25 Thread Philippe Mathieu-Daudé

On 9/25/21 16:04, Richard Henderson wrote:

On 9/25/21 5:54 AM, Philippe Mathieu-Daudé wrote:

+    /* High bits must be set; load with lu12i.w + optional ori.  */
+    tcg_target_long hi12 = sextreg(val, 12, 20);



+    tcg_out_opc_lu12i_w(s, rd, hi12);
+    if (lo != 0) {
+    tcg_out_opc_ori(s, rd, rd, lo & 0xfff);


Isn't lo already 12-bit? Why the mask?


lo was extracted signed, for addi; ori wants the value unsigned.
The value range asserts in tcg_out_opc_* will notice the difference.


I indeed missed tcg_target_long, thanks.



Re: [PATCH v5 09/30] tcg/loongarch64: Implement tcg_out_mov and tcg_out_movi

2021-09-25 Thread WANG Xuerui

Hi Philippe,

On 9/25/21 17:54, Philippe Mathieu-Daudé wrote:

On 9/24/21 19:25, WANG Xuerui wrote:

Signed-off-by: WANG Xuerui 
Reviewed-by: Richard Henderson 
---
  tcg/loongarch64/tcg-target.c.inc | 109 +++
  1 file changed, 109 insertions(+)



+/* Loads a 32-bit immediate into rd, sign-extended.  */
+static void tcg_out_movi_i32(TCGContext *s, TCGReg rd, int32_t val)
+{
+    /* Single-instruction cases.  */
+    tcg_target_long lo = sextreg(val, 0, 12);
+    if (lo == val) {
+    /* val fits in simm12: addi.w rd, zero, val */
+    tcg_out_opc_addi_w(s, rd, TCG_REG_ZERO, val);
+    return;
+    }
+    if (0x800 <= val && val <= 0xfff) {
+    /* val fits in uimm12: ori rd, zero, val */
+    tcg_out_opc_ori(s, rd, TCG_REG_ZERO, val);
+    return;
+    }
+
+    /* High bits must be set; load with lu12i.w + optional ori.  */
+    tcg_target_long hi12 = sextreg(val, 12, 20);


Please declare variables in function prologue.

Sure; will fix in v6.


Maybe name lo12 and hi20?
I added ASCII art to hopefully clarify the namings; originally I used 
MIPS R6 terms (low, upper, higher, top) but all the MIPS R6 instructions 
take 16-bit imm, so I figured just naming by bitfield LSB index would be 
best. The Loongson documentation people didn't invent any dedicated name 
for the 4 parts or 3 load-upper instructions, either.



+    tcg_out_opc_lu12i_w(s, rd, hi12);
+    if (lo != 0) {
+    tcg_out_opc_ori(s, rd, rd, lo & 0xfff);


Isn't lo already 12-bit? Why the mask?
As Richard explained, lo is signed while ori takes unsigned imm, so this 
is necessary to not trip up the debug assert and overwrite the opcode.



+    }
+}
+
+static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
+ tcg_target_long val)
+{
+    if (type == TCG_TYPE_I32 || val == (int32_t)val) {
+    tcg_out_movi_i32(s, rd, val);
+    return;
+    }
+
+    /* PC-relative cases.  */
+    intptr_t pc_offset = tcg_pcrel_diff(s, (void *)val);


Declare in prologue.

Ack; will fix.


+    if (pc_offset == sextreg(pc_offset, 0, 22) && (pc_offset & 3) == 
0) {

+    /* Single pcaddu2i.  */
+    tcg_out_opc_pcaddu2i(s, rd, pc_offset >> 2);
+    return;
+    }
+
+    if (pc_offset == (int32_t)pc_offset) {
+    /* Offset within 32 bits; load with pcalau12i + ori. */
+    tcg_target_long lo = sextreg(val, 0, 12);


Name this 'val_lo' similarly to val_hi?
Nice catch; that was leftover from v4 where the tcg_out_movi_i32 logic 
was not factored out. Fixed in v6.



+    tcg_target_long pc_hi = (val - pc_offset) >> 12;
+    tcg_target_long val_hi = val >> 12;
+    tcg_target_long offset_hi = val_hi - pc_hi;
+    tcg_debug_assert(offset_hi == sextreg(offset_hi, 0, 20));
+    tcg_out_opc_pcalau12i(s, rd, offset_hi);
+    if (lo != 0) {
+    tcg_out_opc_ori(s, rd, rd, lo & 0xfff);


Again, lo is already 12-bit.

Same as above.



+    }
+    return;
+    }
+
+    /* Single cu52i.d case.  */
+    if (ctz64(val) >= 52) {
+    tcg_out_opc_cu52i_d(s, rd, TCG_REG_ZERO, val >> 52);
+    return;
+    }
+
+    /* Slow path.  Initialize the low 32 bits, then concat high 
bits.  */

+    tcg_out_movi_i32(s, rd, val);
+
+    bool rd_high_bits_are_ones = (int32_t)val < 0;


Declare in prologue, however this is hard to read. KISS:

   rd_high_bits_are_ones = (int32_t)val < 0 ? true : false;
Hmm, comparison operators return boolean results already; and I thought 
expressions like `foo ? true : false` are typically considered to have 
"bad smell"? I think I'd want to keep the current way of saying 
things... But I'll of course move the declaration to function prologue.



+    tcg_target_long hi32 = sextreg(val, 32, 20);


By 'hi32' I expect the 32 high bits. Maybe explicit as hi32_20?


+    tcg_target_long hi52 = sextreg(val, 52, 12);


And hi52_12?
The names are getting long with addition of bitfield lengths; I hope the 
ASCII art in v6 could alleviate the need for longer names.



+
+    if (imm_part_needs_loading(rd_high_bits_are_ones, hi32)) {
+    tcg_out_opc_cu32i_d(s, rd, hi32);
+    rd_high_bits_are_ones = hi32 < 0;


Again KISS:

   if (hi32 < 0) {
   rd_high_bits_are_ones = true;
   }

As explained by Richard, this is indeed meant to be an unconditional 
overwrite. After concatenating the 51-to-32 bits, the topmost 12 bits is 
changed to be sign-extension of hi32, so the flag must be updated to 
reflect that.

+    }
+
+    if (imm_part_needs_loading(rd_high_bits_are_ones, hi52)) {
+    tcg_out_opc_cu52i_d(s, rd, rd, hi52);
+    }
+}


With comment addressed:
Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v5 09/30] tcg/loongarch64: Implement tcg_out_mov and tcg_out_movi

2021-09-25 Thread Richard Henderson

On 9/25/21 5:54 AM, Philippe Mathieu-Daudé wrote:

+    /* High bits must be set; load with lu12i.w + optional ori.  */
+    tcg_target_long hi12 = sextreg(val, 12, 20);


Please declare variables in function prologue.


Ah, true.  Officially, that's qemu coding style.  I tend to overlook it because I don't 
like the rule.




+    tcg_out_opc_lu12i_w(s, rd, hi12);
+    if (lo != 0) {
+    tcg_out_opc_ori(s, rd, rd, lo & 0xfff);


Isn't lo already 12-bit? Why the mask?


lo was extracted signed, for addi; ori wants the value unsigned.
The value range asserts in tcg_out_opc_* will notice the difference.


+    /* Slow path.  Initialize the low 32 bits, then concat high bits.  */
+    tcg_out_movi_i32(s, rd, val);
+
+    bool rd_high_bits_are_ones = (int32_t)val < 0;


Declare in prologue, however this is hard to read. KISS:


If anything was to change here, I think I'd simply re-use "lo":

lo = (int32_t)val;

and then check lo < 0 rather than rd_high_bits_are_ones as a boolean.


+    rd_high_bits_are_ones = hi32 < 0;


Again KISS:

    if (hi32 < 0) {
    rd_high_bits_are_ones = true;
    }


This is certainly not KISS, and also wrong.  We would want an unconditional write to 
rd_high_bits_are_ones.  Although if there were any change,


lo = hi32.

so that, again, we test lo < 0 for hi52.


r~



Re: [PATCH v5 09/30] tcg/loongarch64: Implement tcg_out_mov and tcg_out_movi

2021-09-25 Thread Philippe Mathieu-Daudé

On 9/24/21 19:25, WANG Xuerui wrote:

Signed-off-by: WANG Xuerui 
Reviewed-by: Richard Henderson 
---
  tcg/loongarch64/tcg-target.c.inc | 109 +++
  1 file changed, 109 insertions(+)



+/* Loads a 32-bit immediate into rd, sign-extended.  */
+static void tcg_out_movi_i32(TCGContext *s, TCGReg rd, int32_t val)
+{
+/* Single-instruction cases.  */
+tcg_target_long lo = sextreg(val, 0, 12);
+if (lo == val) {
+/* val fits in simm12: addi.w rd, zero, val */
+tcg_out_opc_addi_w(s, rd, TCG_REG_ZERO, val);
+return;
+}
+if (0x800 <= val && val <= 0xfff) {
+/* val fits in uimm12: ori rd, zero, val */
+tcg_out_opc_ori(s, rd, TCG_REG_ZERO, val);
+return;
+}
+
+/* High bits must be set; load with lu12i.w + optional ori.  */
+tcg_target_long hi12 = sextreg(val, 12, 20);


Please declare variables in function prologue.

Maybe name lo12 and hi20?


+tcg_out_opc_lu12i_w(s, rd, hi12);
+if (lo != 0) {
+tcg_out_opc_ori(s, rd, rd, lo & 0xfff);


Isn't lo already 12-bit? Why the mask?


+}
+}
+
+static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
+ tcg_target_long val)
+{
+if (type == TCG_TYPE_I32 || val == (int32_t)val) {
+tcg_out_movi_i32(s, rd, val);
+return;
+}
+
+/* PC-relative cases.  */
+intptr_t pc_offset = tcg_pcrel_diff(s, (void *)val);


Declare in prologue.


+if (pc_offset == sextreg(pc_offset, 0, 22) && (pc_offset & 3) == 0) {
+/* Single pcaddu2i.  */
+tcg_out_opc_pcaddu2i(s, rd, pc_offset >> 2);
+return;
+}
+
+if (pc_offset == (int32_t)pc_offset) {
+/* Offset within 32 bits; load with pcalau12i + ori.  */
+tcg_target_long lo = sextreg(val, 0, 12);


Name this 'val_lo' similarly to val_hi?


+tcg_target_long pc_hi = (val - pc_offset) >> 12;
+tcg_target_long val_hi = val >> 12;
+tcg_target_long offset_hi = val_hi - pc_hi;
+tcg_debug_assert(offset_hi == sextreg(offset_hi, 0, 20));
+tcg_out_opc_pcalau12i(s, rd, offset_hi);
+if (lo != 0) {
+tcg_out_opc_ori(s, rd, rd, lo & 0xfff);


Again, lo is already 12-bit.


+}
+return;
+}
+
+/* Single cu52i.d case.  */
+if (ctz64(val) >= 52) {
+tcg_out_opc_cu52i_d(s, rd, TCG_REG_ZERO, val >> 52);
+return;
+}
+
+/* Slow path.  Initialize the low 32 bits, then concat high bits.  */
+tcg_out_movi_i32(s, rd, val);
+
+bool rd_high_bits_are_ones = (int32_t)val < 0;


Declare in prologue, however this is hard to read. KISS:

   rd_high_bits_are_ones = (int32_t)val < 0 ? true : false;


+tcg_target_long hi32 = sextreg(val, 32, 20);


By 'hi32' I expect the 32 high bits. Maybe explicit as hi32_20?


+tcg_target_long hi52 = sextreg(val, 52, 12);


And hi52_12?


+
+if (imm_part_needs_loading(rd_high_bits_are_ones, hi32)) {
+tcg_out_opc_cu32i_d(s, rd, hi32);
+rd_high_bits_are_ones = hi32 < 0;


Again KISS:

   if (hi32 < 0) {
   rd_high_bits_are_ones = true;
   }


+}
+
+if (imm_part_needs_loading(rd_high_bits_are_ones, hi52)) {
+tcg_out_opc_cu52i_d(s, rd, rd, hi52);
+}
+}


With comment addressed:
Reviewed-by: Philippe Mathieu-Daudé 



[PATCH v5 09/30] tcg/loongarch64: Implement tcg_out_mov and tcg_out_movi

2021-09-24 Thread WANG Xuerui
Signed-off-by: WANG Xuerui 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 109 +++
 1 file changed, 109 insertions(+)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 8f7c556c37..f1d0047f5b 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -247,6 +247,113 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
 tcg_out_opc_dbar(s, 0);
 }
 
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+{
+if (ret == arg) {
+return true;
+}
+switch (type) {
+case TCG_TYPE_I32:
+case TCG_TYPE_I64:
+/*
+ * Conventional register-register move used in LoongArch is
+ * `or dst, src, zero`.
+ */
+tcg_out_opc_or(s, ret, arg, TCG_REG_ZERO);
+break;
+default:
+g_assert_not_reached();
+}
+return true;
+}
+
+static bool imm_part_needs_loading(bool high_bits_are_ones,
+   tcg_target_long part)
+{
+if (high_bits_are_ones) {
+return part != -1;
+} else {
+return part != 0;
+}
+}
+
+/* Loads a 32-bit immediate into rd, sign-extended.  */
+static void tcg_out_movi_i32(TCGContext *s, TCGReg rd, int32_t val)
+{
+/* Single-instruction cases.  */
+tcg_target_long lo = sextreg(val, 0, 12);
+if (lo == val) {
+/* val fits in simm12: addi.w rd, zero, val */
+tcg_out_opc_addi_w(s, rd, TCG_REG_ZERO, val);
+return;
+}
+if (0x800 <= val && val <= 0xfff) {
+/* val fits in uimm12: ori rd, zero, val */
+tcg_out_opc_ori(s, rd, TCG_REG_ZERO, val);
+return;
+}
+
+/* High bits must be set; load with lu12i.w + optional ori.  */
+tcg_target_long hi12 = sextreg(val, 12, 20);
+tcg_out_opc_lu12i_w(s, rd, hi12);
+if (lo != 0) {
+tcg_out_opc_ori(s, rd, rd, lo & 0xfff);
+}
+}
+
+static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
+ tcg_target_long val)
+{
+if (type == TCG_TYPE_I32 || val == (int32_t)val) {
+tcg_out_movi_i32(s, rd, val);
+return;
+}
+
+/* PC-relative cases.  */
+intptr_t pc_offset = tcg_pcrel_diff(s, (void *)val);
+if (pc_offset == sextreg(pc_offset, 0, 22) && (pc_offset & 3) == 0) {
+/* Single pcaddu2i.  */
+tcg_out_opc_pcaddu2i(s, rd, pc_offset >> 2);
+return;
+}
+
+if (pc_offset == (int32_t)pc_offset) {
+/* Offset within 32 bits; load with pcalau12i + ori.  */
+tcg_target_long lo = sextreg(val, 0, 12);
+tcg_target_long pc_hi = (val - pc_offset) >> 12;
+tcg_target_long val_hi = val >> 12;
+tcg_target_long offset_hi = val_hi - pc_hi;
+tcg_debug_assert(offset_hi == sextreg(offset_hi, 0, 20));
+tcg_out_opc_pcalau12i(s, rd, offset_hi);
+if (lo != 0) {
+tcg_out_opc_ori(s, rd, rd, lo & 0xfff);
+}
+return;
+}
+
+/* Single cu52i.d case.  */
+if (ctz64(val) >= 52) {
+tcg_out_opc_cu52i_d(s, rd, TCG_REG_ZERO, val >> 52);
+return;
+}
+
+/* Slow path.  Initialize the low 32 bits, then concat high bits.  */
+tcg_out_movi_i32(s, rd, val);
+
+bool rd_high_bits_are_ones = (int32_t)val < 0;
+tcg_target_long hi32 = sextreg(val, 32, 20);
+tcg_target_long hi52 = sextreg(val, 52, 12);
+
+if (imm_part_needs_loading(rd_high_bits_are_ones, hi32)) {
+tcg_out_opc_cu32i_d(s, rd, hi32);
+rd_high_bits_are_ones = hi32 < 0;
+}
+
+if (imm_part_needs_loading(rd_high_bits_are_ones, hi52)) {
+tcg_out_opc_cu52i_d(s, rd, rd, hi52);
+}
+}
+
 /*
  * Entry-points
  */
@@ -262,6 +369,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_mb(s, a0);
 break;
 
+case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
+case INDEX_op_mov_i64:
 default:
 g_assert_not_reached();
 }
-- 
2.33.0