On Wed, Apr 27, 2011 at 12:07 AM, Blue Swirl <blauwir...@gmail.com> wrote: > On Tue, Apr 26, 2011 at 10:07 PM, Igor Kovalenko > <igor.v.kovale...@gmail.com> wrote: >> On Tue, Apr 26, 2011 at 10:36 PM, Blue Swirl <blauwir...@gmail.com> wrote: >>> On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4q...@gmail.com> >>> wrote: >>>> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurel...@aurel32.net> >>>> wrote: >>>>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote: >>>>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues >>>>>> <laurent.desnog...@gmail.com> wrote: >>>>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko >>>>>> > <igor.v.kovale...@gmail.com> wrote: >>>>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues >>>>>> >> <laurent.desnog...@gmail.com> wrote: >>>>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko >>>>>> >>> <atar4q...@gmail.com> wrote: >>>>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko >>>>>> >>>> <igor.v.kovale...@gmail.com> wrote: >>>>>> >>>>>>> Do you have public test case? >>>>>> >>>>>>> It is possible to code this delay slot write test but real issue >>>>>> >>>>>>> may >>>>>> >>>>>>> be corruption elsewhere. >>>>>> >>>> >>>>>> >>>> The test case is trivial: it's just the two instructions, branch >>>>>> >>>> and wrpr. >>>>>> >>>> >>>>>> >>>>> In theory there could be multiple issues including compiler >>>>>> >>>>> induced ones. >>>>>> >>>>> I'd prefer to see some kind of reproducible testcase. >>>>>> >>>> >>>>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used >>>>>> >>>> and >>>>>> >>>> needed only because the bios entry point is 0x20). >>>>>> >>>> >>>>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios >>>>>> >>>> test-wrpr.bin -nographic >>>>>> >>>> Already up-to-date. >>>>>> >>>> make[1]: Nothing to be done for `all'. >>>>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error >>>>>> >>>> Aborted >>>>>> >>> >>>>>> >>> The problem seems to be that wrpr is using a non-local >>>>>> >>> TCG tmp (cpu_tmp0). >>>>>> >> >>>>>> >> Just tried the test case with write to %pil - seems like write itself >>>>>> >> is OK. >>>>>> >> The issue appears to be with save_state() call since adding save_state >>>>>> >> to %pil case provokes the same tcg abort. >>>>>> > >>>>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't >>>>>> > need to be saved across helper calls. This results in the >>>>>> > TCG "optimizer" getting rid of it even though it's later used. >>>>>> > Look at the log and you'll see what I mean :-) >>>>>> >>>>>> I'm not very comfortable with tcg yet. Would it be possible to teach >>>>>> optimizer working with delay slots? Or do I look in the wrong place. >>>>>> >>>>> >>>>> The problem is not on the TCG side, but on the target-sparc/translate.c >>>>> side: >>>>> >>>>> | case 0x32: /* wrwim, V9 wrpr */ >>>>> | { >>>>> | if (!supervisor(dc)) >>>>> | goto priv_insn; >>>>> | tcg_gen_xor_tl(cpu_tmp0, cpu_src1, >>>>> cpu_src2); >>>>> | #ifdef TARGET_SPARC64 >>>>> >>>>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not >>>>> saved across TCG branches. >>>>> >>>>> [...] >>>>> >>>>> | case 6: // pstate >>>>> | save_state(dc, cpu_cond); >>>>> | gen_helper_wrpstate(cpu_tmp0); >>>>> | dc->npc = DYNAMIC_PC; >>>>> | break; >>>>> >>>>> save_state() calls save_npc(), which in turns might call >>>>> gen_generic_branch(): >>>> >>>> Hmm. This is not the only instruction using save_state() and cpu_tmp0. >>>> At least ldd is another example. >>>> >>>>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong >>>>> npc2, >>>>> | TCGv r_cond) >>>>> | { >>>>> | int l1, l2; >>>>> | >>>>> | l1 = gen_new_label(); >>>>> | l2 = gen_new_label(); >>>>> | >>>>> | tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1); >>>>> | >>>>> | tcg_gen_movi_tl(cpu_npc, npc1); >>>>> | tcg_gen_br(l2); >>>>> | >>>>> | gen_set_label(l1); >>>>> | tcg_gen_movi_tl(cpu_npc, npc2); >>>>> | gen_set_label(l2); >>>>> | } >>>>> >>>>> And here is the TCG branch, which drop the TCG temp cpu_temp0. >>>>> >>>>> The solution is either to rewrite gen_generic_branch() without TCG >>>>> branches, or to use a TCG temp local instead of a TCG temp. >>>> >>>> You mean something like >>>> >>>> case 6: // pstate >>>> { >>>> TCGv r_temp; >>>> >>>> r_temp = tcg_temp_new(); >>>> tcg_gen_mov_tl(r_temp, cpu_tmp0); >>>> save_state(dc, cpu_cond); >>>> gen_helper_wrpstate(r_temp); >>>> tcg_temp_free(r_temp); >>>> dc->npc = DYNAMIC_PC; >>>> } >>>> break; >>>> >>>> ? >>>> This fails with the same error message. >>> >>> Close, but you need to use tcg_temp_local_new(). Does this work? >>> >>> diff --git a/target-sparc/translate.c b/target-sparc/translate.c >>> index 3c958b2..52fa2f1 100644 >>> --- a/target-sparc/translate.c >>> +++ b/target-sparc/translate.c >>> @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc) >>> tcg_gen_mov_tl(cpu_tbr, cpu_tmp0); >>> break; >>> case 6: // pstate >>> - save_state(dc, cpu_cond); >>> - gen_helper_wrpstate(cpu_tmp0); >>> - dc->npc = DYNAMIC_PC; >>> + { >>> + TCGv r_tmp = tcg_temp_local_new(); >>> + >>> + tcg_gen_mov_tl(r_tmp, cpu_tmp0); >>> + save_state(dc, cpu_cond); >>> + gen_helper_wrpstate(r_tmp); >>> + tcg_temp_free_ptr(r_tmp); >>> + dc->npc = DYNAMIC_PC; >>> + } >>> break; >>> case 7: // tl >>> save_state(dc, cpu_cond); >>> >> >> This one works (I have tested similar change but tcg_temp_free() would >> be OK as well) > > Thanks for spotting, that was not intentional. > >> Could anyone please explain why it still works without being in delay >> slot? Simply pointing at optimizer does not work :( > > save_npc() will call gen_generic_branch() only if dc->npc == JUMP_PC, > which is used for delay slot handling. Otherwise gen_generic_branch() > is not called and thus brcond does not get used, so the temporary > will not be flushed for non-delay slot cases.
Great that explains it all, thanks! I think audit over translate.c would be needed, alternatively all main switch temps for sparc64 could be made local as hinted by Laurent - provided there is no significant speed impact. -- Kind regards, Igor V. Kovalenko