Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers
Ok, enter and leave will officially get to be TCG code. To be honest initially we thought that helper code would be preferable to TCG one. Apparently we were wrong. :-) Thanks for your quick feedback. On 1/15/21 9:53 PM, Richard Henderson wrote: > On 1/15/21 11:48 AM, Cupertino Miranda wrote: >>> In the case of enter or leave, this is one load/store plus one addition, >>> followed by a branch. All of which is encoded as fields in the instruction. >>> Extremely simple. >> >> So your recommendation is leave the conditional exception triggering of >> enter and leave in a helper and move the loads/stores to tcg ? > > What? No. > > > r~ >
Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers
On 1/15/21 11:48 AM, Cupertino Miranda wrote: >> In the case of enter or leave, this is one load/store plus one addition, >> followed by a branch. All of which is encoded as fields in the instruction. >> Extremely simple. > > So your recommendation is leave the conditional exception triggering of > enter and leave in a helper and move the loads/stores to tcg ? What? No. r~
Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers
On 1/15/21 11:28 AM, Shahab Vahedi wrote: >>> +cpu_stl_data(env, tmp_sp, CPU_FP(env)); >>> +} >> >> And what if these stores raise an exception? I doubt you're going to get an >> exception at the correct pc. > > I've added a few bad-weather test cases [1] and they work as expected. Indeed, > none of those tests trigger an exception during the "cpu_stl_data()". Could > you > elaborate why you think the PC might be incorrect? Then I can add the > corresponding > tests and fix the behavior. Because you're using cpu_stl_data_ra, with GETPC, if the store faults (e.g. SIGSEGV) then the exception unwind will not be done. This will happen to work ok if and only if "enter" is the first insn of the TB. >> In the case of enter or leave, this is one load/store plus one addition, >> followed by a branch. All of which is encoded as fields in the instruction. >> Extremely simple. > > You're suggesting that "enter/leave" should use TCG opcodes instead of > helpers? If yes, do you really think it is possible to implement each with ~10 > opcodes? More or less. Two per registers stored, plus two moves. It looks like the limit of registers is either 3 or 14, depending on the cpu configuration. Certainly this is no different from other "push multiple" type of instructions of other architectures, which do exactly this. r~
Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers
On 1/15/21 8:31 PM, Richard Henderson wrote: > On 1/15/21 7:11 AM, Cupertino Miranda wrote: >>> Similarly. I think that both of these could be implemented entirely in >>> translate, which is what >>> +bool restore_fp= u7 & 0x10; /* u[4] indicates if fp must be saved */ +bool restore_blink = u7 & 0x20; /* u[5] indicates saving of blink */ +bool jump_to_blink = u7 & 0x40; /* u[6] should we jump to blink? */ >>> >>> these bits strongly imply. >>> >> >> For lack of knowing better, it is unclear to me where to draw the line >> when choosing between a translate time (tcg) or helper implementation. >> Your suggestions for carry/overflow computation are sharp and we should >> have never used an helper, however I wonder what would be the benefit of >> implementing enter and leave through TCG. >> >> We have dealt with those exception issues by just changing SP in the end >> of the instruction implementation, when no exceptions can happen. > > 5-10 tcg opcodes is the rule of thumb. A conditional exception (requiring a > branch) is a good reason to put the whole thing out of line. > > In the case of enter or leave, this is one load/store plus one addition, > followed by a branch. All of which is encoded as fields in the instruction. > Extremely simple. So your recommendation is leave the conditional exception triggering of enter and leave in a helper and move the loads/stores to tcg ? > >> As far as I understand when an exception happens in the middle of the >> helper or even on a TCG implementation, it jumps out of that TB >> execution to deal with the exception. On rtie instead of it returning to >> the same tcg_ld or tcg_st where it actually triggered the exception it >> will re-decode the same instruction which triggered the exception, and >> re-attempts to execute it. >> Is that the case in current TCG implementation, or did it improved and >> it is now able to return to previous execution flow (i.e translation >> block) ? > > I think I don't understand your question. > > An exception leaves the TB, via longjmp. Before the longjmp, there is > normally > an "unwind" or "restore" operation, to sync the cpu state with the middle of > the TB. This happens in restore_state_to_opc(). > > When processing of the exception is complete, execution will continue with the > appropriate cpu state. Which will probably be a new TB that (logically) > partially overlaps the previous TB. > > I.e. everything will work as you'd expect. > > So... what's the question? > You answered the question. That is exactly how I understand it works. > > r~ >
Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers
Hi Richard, On 12/1/20 10:35 PM, Richard Henderson wrote: > On 11/11/20 10:17 AM, cupertinomira...@gmail.com wrote: >> From: Cupertino Miranda >> +void helper_enter(CPUARCState *env, uint32_t u6) >> +{ >> +/* nothing to do? then bye-bye! */ >> +if (!u6) { >> +return; >> +} >> + >> +uint8_t regs = u6 & 0x0f; /* u[3:0] determines registers to save >> */ >> +boolsave_fp= u6 & 0x10; /* u[4] indicates if fp must be saved >> */ >> +boolsave_blink = u6 & 0x20; /* u[5] indicates saving of blink >> */ >> +uint8_t stack_size = 4 * (regs + save_fp + save_blink); >> + >> +/* number of regs to be saved must be sane */ >> +check_enter_leave_nr_regs(env, regs, GETPC()); > > Both of these checks could be translate time. > >> +/* this cannot be executed in a delay/execution slot */ >> +check_delay_or_execution_slot(env, GETPC()); > > As could this. > >> +/* stack must be a multiple of 4 (32 bit aligned) */ >> +check_addr_is_word_aligned(env, CPU_SP(env) - stack_size, GETPC()); >> + >> +uint32_t tmp_sp = CPU_SP(env); >> + >> +if (save_fp) { >> +tmp_sp -= 4; >> +cpu_stl_data(env, tmp_sp, CPU_FP(env)); >> +} > > And what if these stores raise an exception? I doubt you're going to get an > exception at the correct pc. I've added a few bad-weather test cases [1] and they work as expected. Indeed, none of those tests trigger an exception during the "cpu_stl_data()". Could you elaborate why you think the PC might be incorrect? Then I can add the corresponding tests and fix the behavior. [1] https://github.com/foss-for-synopsys-dwc-arc-processors/qemu/blob/master/tests/tcg/arc/check_enter_leave.S#L227 > 5-10 tcg opcodes is the rule of thumb. A conditional exception (requiring a > branch) is a good reason to put the whole thing out of line. > > In the case of enter or leave, this is one load/store plus one addition, > followed by a branch. All of which is encoded as fields in the instruction. > Extremely simple. You're suggesting that "enter/leave" should use TCG opcodes instead of helpers? If yes, do you really think it is possible to implement each with ~10 opcodes? -- Shahab
Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers
On 1/15/21 7:11 AM, Cupertino Miranda wrote: >> Similarly. I think that both of these could be implemented entirely in >> translate, which is what >> >>> +bool restore_fp= u7 & 0x10; /* u[4] indicates if fp must be saved >>> */ >>> +bool restore_blink = u7 & 0x20; /* u[5] indicates saving of blink >>> */ >>> +bool jump_to_blink = u7 & 0x40; /* u[6] should we jump to blink? >>> */ >> >> these bits strongly imply. >> > > For lack of knowing better, it is unclear to me where to draw the line > when choosing between a translate time (tcg) or helper implementation. > Your suggestions for carry/overflow computation are sharp and we should > have never used an helper, however I wonder what would be the benefit of > implementing enter and leave through TCG. > > We have dealt with those exception issues by just changing SP in the end > of the instruction implementation, when no exceptions can happen. 5-10 tcg opcodes is the rule of thumb. A conditional exception (requiring a branch) is a good reason to put the whole thing out of line. In the case of enter or leave, this is one load/store plus one addition, followed by a branch. All of which is encoded as fields in the instruction. Extremely simple. > As far as I understand when an exception happens in the middle of the > helper or even on a TCG implementation, it jumps out of that TB > execution to deal with the exception. On rtie instead of it returning to > the same tcg_ld or tcg_st where it actually triggered the exception it > will re-decode the same instruction which triggered the exception, and > re-attempts to execute it. > Is that the case in current TCG implementation, or did it improved and > it is now able to return to previous execution flow (i.e translation > block) ? I think I don't understand your question. An exception leaves the TB, via longjmp. Before the longjmp, there is normally an "unwind" or "restore" operation, to sync the cpu state with the middle of the TB. This happens in restore_state_to_opc(). When processing of the exception is complete, execution will continue with the appropriate cpu state. Which will probably be a new TB that (logically) partially overlaps the previous TB. I.e. everything will work as you'd expect. So... what's the question? r~
Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers
>> +void QEMU_NORETURN helper_halt(CPUARCState *env, uint32_t npc) >> +{ >> +CPUState *cs = env_cpu(env); >> +if (env->stat.Uf) { >> +cs->exception_index = EXCP_PRIVILEGEV; >> +env->causecode = 0; >> +env->param = 0; >> + /* Restore PC such that we point at the faulty instruction. */ >> +env->eret = env->pc; > > Any reason not to handle Uf at translate time? Or at least create a single > helper function for that here. But it seems like translate will have to do a > lot of priv checking anyway and will already have that handy. Since we needed a helper anyway to deal with causecode and param, we thought it would be reasonable to do all in the helper. We did not made a TCG access for causecode and param enviroment values. > >> +void helper_enter(CPUARCState *env, uint32_t u6) >> +{ >> +/* nothing to do? then bye-bye! */ >> +if (!u6) { >> +return; >> +} >> + >> +uint8_t regs = u6 & 0x0f; /* u[3:0] determines registers to save >> */ >> +boolsave_fp= u6 & 0x10; /* u[4] indicates if fp must be saved >> */ >> +boolsave_blink = u6 & 0x20; /* u[5] indicates saving of blink >> */ >> +uint8_t stack_size = 4 * (regs + save_fp + save_blink); >> + >> +/* number of regs to be saved must be sane */ >> +check_enter_leave_nr_regs(env, regs, GETPC()); > > Both of these checks could be translate time. > >> +/* this cannot be executed in a delay/execution slot */ >> +check_delay_or_execution_slot(env, GETPC()); > > As could this. > >> +/* stack must be a multiple of 4 (32 bit aligned) */ >> +check_addr_is_word_aligned(env, CPU_SP(env) - stack_size, GETPC()); >> + >> +uint32_t tmp_sp = CPU_SP(env); >> + >> +if (save_fp) { >> +tmp_sp -= 4; >> +cpu_stl_data(env, tmp_sp, CPU_FP(env)); >> +} > > And what if these stores raise an exception? I doubt you're going to get an > exception at the correct pc. > >> +void helper_leave(CPUARCState *env, uint32_t u7) > > Similarly. I think that both of these could be implemented entirely in > translate, which is what > >> +bool restore_fp= u7 & 0x10; /* u[4] indicates if fp must be saved >> */ >> +bool restore_blink = u7 & 0x20; /* u[5] indicates saving of blink >> */ >> +bool jump_to_blink = u7 & 0x40; /* u[6] should we jump to blink? >> */ > > these bits strongly imply. > For lack of knowing better, it is unclear to me where to draw the line when choosing between a translate time (tcg) or helper implementation. Your suggestions for carry/overflow computation are sharp and we should have never used an helper, however I wonder what would be the benefit of implementing enter and leave through TCG. We have dealt with those exception issues by just changing SP in the end of the instruction implementation, when no exceptions can happen. As far as I understand when an exception happens in the middle of the helper or even on a TCG implementation, it jumps out of that TB execution to deal with the exception. On rtie instead of it returning to the same tcg_ld or tcg_st where it actually triggered the exception it will re-decode the same instruction which triggered the exception, and re-attempts to execute it. Is that the case in current TCG implementation, or did it improved and it is now able to return to previous execution flow (i.e translation block) ?
Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers
On 11/11/20 10:17 AM, cupertinomira...@gmail.com wrote: > From: Cupertino Miranda > > Signed-off-by: Cupertino Miranda > --- > target/arc/extra_mapping.def | 40 ++ > target/arc/helper.c| 293 + > target/arc/helper.h| 46 ++ > target/arc/op_helper.c | 749 + > target/arc/semfunc_mapping.def | 329 +++ > 5 files changed, 1457 insertions(+) > create mode 100644 target/arc/extra_mapping.def > create mode 100644 target/arc/helper.c > create mode 100644 target/arc/helper.h > create mode 100644 target/arc/op_helper.c > create mode 100644 target/arc/semfunc_mapping.def Not an ideal patch split, since nothing uses the def files at this point. But looking forward to the end product, they seem to be exactly what I was talking about re string manipulation vs patch 3. In particular, arc_map_opcode should be done at qemu build/compile time. And not for nothing, a linear search through strings during translation is really beyond the pale. > +#if defined(CONFIG_USER_ONLY) > + > +void arc_cpu_do_interrupt(CPUState *cs) > +{ > +ARCCPU *cpu = ARC_CPU(cs); > +CPUARCState *env = &cpu->env; > + > +cs->exception_index = -1; > +CPU_ILINK(env) = env->pc; > +} There are no user-only interrupts. > +/* > + * NOTE: Special LP_END exception. Immediatelly return code execution to Immediately. > +/* 15. The PC is set with the appropriate exception vector. */ > +env->pc = cpu_ldl_code(env, env->intvec + offset); > +CPU_PCL(env) = env->pc & 0xfffe; You should be using address_space_ldl, and handling any error. As it is, if this load fails you'll get another interrupt, bringing you right back here, etc. > +DEF_HELPER_1(debug, void, env) > +DEF_HELPER_2(norm, i32, env, i32) > +DEF_HELPER_2(normh, i32, env, i32) > +DEF_HELPER_2(ffs, i32, env, i32) > +DEF_HELPER_2(fls, i32, env, i32) > +DEF_HELPER_2(lr, tl, env, i32) > +DEF_HELPER_3(sr, void, env, i32, i32) > +DEF_HELPER_2(halt, noreturn, env, i32) > +DEF_HELPER_1(rtie, void, env) > +DEF_HELPER_1(flush, void, env) > +DEF_HELPER_4(raise_exception, noreturn, env, i32, i32, i32) > +DEF_HELPER_2(zol_verify, void, env, i32) > +DEF_HELPER_2(fake_exception, void, env, i32) > +DEF_HELPER_2(set_status32, void, env, i32) > +DEF_HELPER_1(get_status32, i32, env) > +DEF_HELPER_3(carry_add_flag, i32, i32, i32, i32) > +DEF_HELPER_3(overflow_add_flag, i32, i32, i32, i32) > +DEF_HELPER_3(overflow_sub_flag, i32, i32, i32, i32) > + > +DEF_HELPER_2(enter, void, env, i32) > +DEF_HELPER_2(leave, void, env, i32) > + > +DEF_HELPER_3(mpymu, i32, env, i32, i32) > +DEF_HELPER_3(mpym, i32, env, i32, i32) > + > +DEF_HELPER_3(repl_mask, i32, i32, i32, i32) Use DEF_HELPER_FLAGS_* when possible. > +target_ulong helper_norm(CPUARCState *env, uint32_t src1) > +{ > +int i; > +int32_t tmp = (int32_t) src1; > +if (tmp == 0 || tmp == -1) { > +return 0; > +} > +for (i = 0; i <= 31; i++) { > +if ((tmp >> i) == 0) { > +break; > +} > +if ((tmp >> i) == -1) { > +break; > +} > +} > +return i; > +} The spec says 0/-1 -> 0x1f, not 0. That said, this appears to be clrsb32(src1), which could also be expanded inline with two uses of tcg_gen_clz_i32. > +target_ulong helper_normh(CPUARCState *env, uint32_t src1) > +{ > +int i; > +for (i = 0; i <= 15; i++) { > +if (src1 >> i == 0) { > +break; > +} > +if (src1 >> i == -1) { > +break; > +} > +} > +return i; > +} Similarly. > + > +target_ulong helper_ffs(CPUARCState *env, uint32_t src) > +{ > +int i; > +if (src == 0) { > +return 31; > +} > +for (i = 0; i <= 31; i++) { > +if (((src >> i) & 1) != 0) { > +break; > +} > +} > +return i; > +} This should use tcg_gen_ctz_i32. > +target_ulong helper_fls(CPUARCState *env, uint32_t src) > +{ > +int i; > +if (src == 0) { > +return 0; > +} > + > +for (i = 31; i >= 0; i--) { > +if (((src >> i) & 1) != 0) { > +break; > +} > +} > +return i; > +} This should use tcg_gen_clz_i32. > + > +static void report_aux_reg_error(uint32_t aux) > +{ > +if (((aux >= ARC_BCR1_START) && (aux <= ARC_BCR1_END)) || > +((aux >= ARC_BCR2_START) && (aux <= ARC_BCR2_END))) { > +qemu_log_mask(LOG_UNIMP, "Undefined BCR 0x%03x\n", aux); > +} > + > +error_report("Undefined AUX register 0x%03x, aborting", aux); > +exit(EXIT_FAILURE); > +} Do not exit on failure, or error_report for that matter -- raise an exception. Or... > +void helper_sr(CPUARCState *env, uint32_t val, uint32_t aux) > +{ > +struct arc_aux_reg_detail *aux_reg_detail = > +arc_aux_reg_struct_for_address(aux, ARC_OPCODE_ARCv2HS); > + > +if (aux_reg_detail == NULL) { > +report_aux_reg_error(aux); > +} ... on the off-chance
[PATCH 04/15] arc: TCG and decoder glue code and helpers
From: Cupertino Miranda Signed-off-by: Cupertino Miranda --- target/arc/extra_mapping.def | 40 ++ target/arc/helper.c| 293 + target/arc/helper.h| 46 ++ target/arc/op_helper.c | 749 + target/arc/semfunc_mapping.def | 329 +++ 5 files changed, 1457 insertions(+) create mode 100644 target/arc/extra_mapping.def create mode 100644 target/arc/helper.c create mode 100644 target/arc/helper.h create mode 100644 target/arc/op_helper.c create mode 100644 target/arc/semfunc_mapping.def diff --git a/target/arc/extra_mapping.def b/target/arc/extra_mapping.def new file mode 100644 index 00..1387d7d483 --- /dev/null +++ b/target/arc/extra_mapping.def @@ -0,0 +1,40 @@ +/* + * QEMU ARC EXTRA MAPPING + * + * Copyright (c) 2020 Synopsys Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see + * http://www.gnu.org/licenses/lgpl-2.1.html + */ + +SEMANTIC_FUNCTION(SWI, 1) +SEMANTIC_FUNCTION(SWI, 1) +SEMANTIC_FUNCTION(UNIMP, 0) +SEMANTIC_FUNCTION(RTIE, 0) +SEMANTIC_FUNCTION(SLEEP, 1) + +MAPPING(swi, SWI, 0) +CONSTANT(SWI, swi_s, 0, 0) +MAPPING(swi_s, SWI, 1, 0) +MAPPING(trap_s, TRAP, 1, 0) +MAPPING(rtie, RTIE, 0) +MAPPING(sleep, SLEEP, 1, 0) +MAPPING(vadd2, VADD, 3, 0, 1, 2) +MAPPING(vadd2h, VADD, 3, 0, 1, 2) +MAPPING(vadd4h, VADD, 3, 0, 1, 2) +MAPPING(vsub2, VSUB, 3, 0, 1, 2) +MAPPING(vsub2h, VSUB, 3, 0, 1, 2) +MAPPING(vsub4h, VSUB, 3, 0, 1, 2) +MAPPING(mpyd, MPYD, 3, 0, 1, 2) +MAPPING(mpydu, MPYD, 3, 0, 1, 2) diff --git a/target/arc/helper.c b/target/arc/helper.c new file mode 100644 index 00..aca7152ef8 --- /dev/null +++ b/target/arc/helper.c @@ -0,0 +1,293 @@ +/* + * QEMU ARC CPU + * + * Copyright (c) 2020 Synppsys Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see + * http://www.gnu.org/licenses/lgpl-2.1.html + */ + +#include "qemu/osdep.h" + +#include "cpu.h" +#include "hw/irq.h" +#include "include/hw/sysbus.h" +#include "include/sysemu/sysemu.h" +#include "qemu/qemu-print.h" +#include "exec/exec-all.h" +#include "exec/cpu_ldst.h" +#include "qemu/host-utils.h" +#include "exec/helper-proto.h" +#include "irq.h" + +#if defined(CONFIG_USER_ONLY) + +void arc_cpu_do_interrupt(CPUState *cs) +{ +ARCCPU *cpu = ARC_CPU(cs); +CPUARCState *env = &cpu->env; + +cs->exception_index = -1; +CPU_ILINK(env) = env->pc; +} + +#else /* !CONFIG_USER_ONLY */ + +void arc_cpu_do_interrupt(CPUState *cs) +{ +ARCCPU *cpu= ARC_CPU(cs); +CPUARCState *env= &cpu->env; +uint32_t offset = 0; +uint32_t vectno; +const char *name; + +/* + * NOTE: Special LP_END exception. Immediatelly return code execution to + * lp_start. + * Now also used for delayslot MissI cases. + * This special exception should not execute any of the exception + * handling code. Instead it returns immediately after setting PC to the + * address passed as exception parameter. + */ +if (cs->exception_index == EXCP_LPEND_REACHED +|| cs->exception_index == EXCP_FAKE) { +env->pc = env->param; +CPU_PCL(env) = env->pc & 0xfffe; +return; +} + +/* If we take an exception within an exception => fatal Machine Check. */ +if (env->stat.AEf == 1) { +cs->exception_index = EXCP_MACHINE_CHECK; +env->causecode = 0; +env->param = 0; +env->mmu.enabled = false; /* no more MMU */ +env->mpu.enabled = false; /* no more MPU */ +} +vectno = cs->exception_index & 0x0F; +offset = vectno << 2; + +/* Generic computation for exceptions. */ +switch (cs->exception_index) { +case EXCP_RESET: +name = "Reset"; +break; +case EXCP_MEMORY_ERROR: +name = "Memory Error"; +break; +