On Tue May 28, 2024 at 8:23 AM AEST, BALATON Zoltan wrote:
> On Wed, 3 Apr 2024, Nicholas Piggin wrote:
> > On Tue Apr 2, 2024 at 9:32 PM AEST, BALATON Zoltan wrote:
> >> On Thu, 21 Mar 2024, BALATON Zoltan wrote:
> >>> On 27/2/24 17:47, BALATON Zoltan wrote:
> >>>> Hello,
> >>>>
> >>>> Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting
> >>>> MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified it
> >>>> before that release but apparently missed it back then). It can be
> >>>> reproduced with https://www.morphos-team.net/morphos-3.18.iso and 
> >>>> following
> >>>> command:
> >>>>
> >>>> qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
> >>>>    -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
> >>>>    -device ide-cd,drive=cd,bus=ide.1
> >>
> >> Any idea on this one? While MorphOS boots on other machines and other OSes
> >> seem to boot on this machine it may still suggest there's some problem
> >> somewhere as this worked before. So it may worth investigating it to make
> >> sure there's no bug that could affect other OSes too even if they boot. I
> >> don't know how to debug this so some help would be needed.
> >
> > In the bad case it crashes after running this TB:
> >
> > ----------------
> > IN:
> > 0x00c01354:  38c00040  li       r6, 0x40
> > 0x00c01358:  38e10204  addi     r7, r1, 0x204
> > 0x00c0135c:  39010104  addi     r8, r1, 0x104
> > 0x00c01360:  39410004  addi     r10, r1, 4
> > 0x00c01364:  39200000  li       r9, 0
> > 0x00c01368:  7cc903a6  mtctr    r6
> > 0x00c0136c:  84c70004  lwzu     r6, 4(r7)
> > 0x00c01370:  7cc907a4  tlbwehi  r6, r9
> > 0x00c01374:  84c80004  lwzu     r6, 4(r8)
> > 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
> > 0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
> > 0x00c01380:  7cc917a4  tlbwehi  r6, r9
> > 0x00c01384:  39290001  addi     r9, r9, 1
> > 0x00c01388:  4200ffe4  bdnz     0xc0136c
> > ----------------
> > IN:
> > 0x00c01374: unable to read memory
> > ----------------
> >
> > "unable to read memory" is the tracer, it does actually translate
> > the address, but it points to a wayward real address which returns
> > 0 to TCG, which is an invalid instruction.
> >
> > The good case instead doesn't exit the TB after 0x00c01370 but after
> > the complete loop at the bdnz. That look like this after the same
> > first TB:
> >
> > ----------------
> > IN:
> > 0x00c0136c:  84c70004  lwzu     r6, 4(r7)
> > 0x00c01370:  7cc907a4  tlbwehi  r6, r9
> > 0x00c01374:  84c80004  lwzu     r6, 4(r8)
> > 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
> > 0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
> > 0x00c01380:  7cc917a4  tlbwehi  r6, r9
> > 0x00c01384:  39290001  addi     r9, r9, 1
> > 0x00c01388:  4200ffe4  bdnz     0xc0136c
> > ----------------
> > IN:
> > 0x00c0138c:  4c00012c  isync
> >
> > All the tlbwe are executed in the same TB. MMU tracing shows the
> > first tlbwehi creates a new valid(!) TLB for 0x00000000-0x100000000
> > that has a garbage RPN because the tlbwelo did not run yet.
> >
> > What's happening in the bad case is that the translator breaks
> > and "re-fetches" instructions in the middle of that sequence, and
> > that's where the bogus translation causes 0 to be returned. The
> > good case the whole block is executed in the same fetch which
> > creates correct translations.
> >
> > So it looks like a morphos bug, the can-do-io change just happens
> > to cause it to re-fetch in that place, but that could happen for
> > a number of reasons, so you can't rely on TLB *only* changing or
> > ifetch *only* re-fetching at a sync point like isync.
> >
> > I would expect code like this to write an invalid entry with tlbwehi,
> > then tlbwelo to set the correct RPN, then make the entry valid with
> > the second tlbwehi. It would probably fix the bug if you just did the
> > first tlbwehi with r6=0 (or at least without the 0x200 bit set).
>
> Revisiting this, I've found in the docs that PPC440 has shadow TLBs so 
> this code can rely upon the TLB not being invalidated until isync and 
> works on real machine but breaks on QEMU.

I never programmed for 440 but it's unclear to me from the docs how
much you can rely on this programatically (you would have to ensure
no page crossings, disable interrupts, hope for no machine check,
etc).

But it does break real software so whether or not it is following
exact letter of the law, it would be good to fix.

> We would either need to make 
> sure the TB runs until the sync or somehow emulate the shadow TLB. I've 
> experimented with the latter but I could not make it work (and 
> unexpectedly keeping a cache of the most recently used entries is slower 
> than always searching through all TLB entries as done now so I've 
> abandoned that idea). The problem is that an entry is modified by multiple 
> tlbwe instructions but these can come in any order (and sometimes only one 
> of them is done like invalidating an entry seems to only do one write) so 
> I don't know when to copy the new entry to the TLB and when to wait for 
> more parts and keep the old one. Any idea how to fix this?

Depends what the important common cases are and exactly how faithfully
you want to model the hardware behaviour I guess.

How are you trying to emulate the shadow TLB? I attached a really quick
hack to see what that would look like... That is modeling the hardware
filled TLB structure ahead of the machine's software TLB. It's not a
perfect model but might be enough, one downside is that it flushes
entire QEMU TLB for any BookE TLB entry change.

The other way to go might be to keep a structure containing the list of
outstanding BookE TLB modifications, and replay that into the TLB on
sync events. That way your QEMU TLB refill path has no extra data
structure to look up or maintain, and you could do more precise
flushing of the QEMU TLB when you apply the changes.

Difficulty would be that TLB instructions would become more complicated
and expensive (reads can't just go to the TLB they would have to
find the most recent change, etc). But maybe that is the better tradeoff
if your lookups are relatively much more common than software-tlb
instructions are not.

Thanks,
Nick

---
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 2015e603d4..afbc766fd1 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -377,6 +377,12 @@ union ppc_tlb_t {
     ppc6xx_tlb_t *tlb6;
     ppcemb_tlb_t *tlbe;
     ppcmas_tlb_t *tlbm;
+
+    /* 440 shadow TLB */
+    ppcemb_tlb_t ishadow[4];
+    int ishadow_idx;
+    ppcemb_tlb_t dshadow[8];
+    int dshadow_idx;
 };
 
 /* possible TLB variants */
diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 02076e96fb..3207b594e1 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -363,10 +363,22 @@ void store_40x_sler(CPUPPCState *env, uint32_t val)
     env->spr[SPR_405_SLER] = val;
 }
 
+void ppc4xx_tlb_invalidate_shadow(CPUPPCState *env);
 void check_tlb_flush(CPUPPCState *env, bool global)
 {
     CPUState *cs = env_cpu(env);
 
+    if (env->mmu_model == POWERPC_MMU_SOFT_4xx) {
+        assert(!(env->tlb_need_flush & TLB_NEED_GLOBAL_FLUSH));
+
+        if (env->tlb_need_flush & TLB_NEED_LOCAL_FLUSH) {
+            env->tlb_need_flush &= ~TLB_NEED_LOCAL_FLUSH;
+            ppc4xx_tlb_invalidate_shadow(env);
+            tlb_flush(cs);
+        }
+        return;
+    }
+
     /* Handle global flushes first */
     if (global && (env->tlb_need_flush & TLB_NEED_GLOBAL_FLUSH)) {
         env->tlb_need_flush &= ~TLB_NEED_GLOBAL_FLUSH;
diff --git a/target/ppc/mmu-booke.c b/target/ppc/mmu-booke.c
index 55e5dd7c6b..31509be39b 100644
--- a/target/ppc/mmu-booke.c
+++ b/target/ppc/mmu-booke.c
@@ -74,55 +74,91 @@ int mmu40x_get_physical_address(CPUPPCState *env, hwaddr 
*raddr, int *prot,
 {
     ppcemb_tlb_t *tlb;
     int i, ret, zsel, zpr, pr;
+    uint32_t pid = env->spr[SPR_40x_PID];
 
     ret = -1;
-    pr = FIELD_EX64(env->msr, MSR, PR);
+
+    /* Check "shadow TLBs" first */
+    if (access_type == MMU_INST_FETCH) {
+        for (i = 0; i < 4; i++) {
+            tlb = &env->tlb.ishadow[i];
+            if (ppcemb_tlb_check(env, tlb, raddr, address, pid, -i)) {
+                goto found;
+            }
+        }
+    } else {
+        for (i = 0; i < 8; i++) {
+            tlb = &env->tlb.dshadow[i];
+            if (ppcemb_tlb_check(env, tlb, raddr, address, pid, -i)) {
+                goto found;
+            }
+        }
+    }
+    /* Then check main (software visible) TLB */
     for (i = 0; i < env->nb_tlb; i++) {
         tlb = &env->tlb.tlbe[i];
-        if (!ppcemb_tlb_check(env, tlb, raddr, address,
-                              env->spr[SPR_40x_PID], i)) {
-            continue;
+        if (ppcemb_tlb_check(env, tlb, raddr, address, pid, i)) {
+            goto found_main;
         }
-        zsel = (tlb->attr >> 4) & 0xF;
-        zpr = (env->spr[SPR_40x_ZPR] >> (30 - (2 * zsel))) & 0x3;
-        qemu_log_mask(CPU_LOG_MMU,
-                      "%s: TLB %d zsel %d zpr %d ty %d attr %08x\n",
-                      __func__, i, zsel, zpr, access_type, tlb->attr);
-        /* Check execute enable bit */
-        switch (zpr) {
-        case 0x2:
-            if (pr != 0) {
-                goto check_perms;
-            }
-            /* fall through */
-        case 0x3:
-            /* All accesses granted */
-            *prot = PAGE_RWX;
-            ret = 0;
-            break;
+    }
+    goto out;
 
-        case 0x0:
-            if (pr != 0) {
-                /* Raise Zone protection fault.  */
-                env->spr[SPR_40x_ESR] = 1 << 22;
-                *prot = 0;
-                ret = -2;
-                break;
-            }
-            /* fall through */
-        case 0x1:
-check_perms:
-            /* Check from TLB entry */
-            *prot = tlb->prot;
-            if (check_prot_access_type(*prot, access_type)) {
-                ret = 0;
-            } else {
-                env->spr[SPR_40x_ESR] = 0;
-                ret = -2;
-            }
+found_main:
+    /* Shadow must be reloaded, FIFO replacement */
+    if (access_type == MMU_INST_FETCH) {
+        env->tlb.ishadow[env->tlb.ishadow_idx] = *tlb;
+        env->tlb.ishadow_idx++;
+        env->tlb.ishadow_idx %= 4;
+    } else {
+        env->tlb.dshadow[env->tlb.dshadow_idx] = *tlb;
+        env->tlb.dshadow_idx++;
+        env->tlb.dshadow_idx %= 8;
+    }
+
+found:
+    pr = FIELD_EX64(env->msr, MSR, PR);
+
+    zsel = (tlb->attr >> 4) & 0xF;
+    zpr = (env->spr[SPR_40x_ZPR] >> (30 - (2 * zsel))) & 0x3;
+    qemu_log_mask(CPU_LOG_MMU,
+                  "%s: TLB %d zsel %d zpr %d ty %d attr %08x\n",
+                  __func__, i, zsel, zpr, access_type, tlb->attr);
+    /* Check execute enable bit */
+    switch (zpr) {
+    case 0x2:
+        if (pr != 0) {
+            goto check_perms;
+        }
+        /* fall through */
+    case 0x3:
+        /* All accesses granted */
+        *prot = PAGE_RWX;
+        ret = 0;
+        break;
+
+    case 0x0:
+        if (pr != 0) {
+            /* Raise Zone protection fault.  */
+            env->spr[SPR_40x_ESR] = 1 << 22;
+            *prot = 0;
+            ret = -2;
             break;
         }
+        /* fall through */
+    case 0x1:
+check_perms:
+        /* Check from TLB entry */
+        *prot = tlb->prot;
+        if (check_prot_access_type(*prot, access_type)) {
+            ret = 0;
+        } else {
+            env->spr[SPR_40x_ESR] = 0;
+            ret = -2;
+        }
+        break;
     }
+
+out:
     qemu_log_mask(CPU_LOG_MMU, "%s: access %s " TARGET_FMT_lx " => "
                   HWADDR_FMT_plx " %d %d\n",  __func__,
                   ret < 0 ? "refused" : "granted", address,
diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
index b0a0676beb..502ddf65b6 100644
--- a/target/ppc/mmu_helper.c
+++ b/target/ppc/mmu_helper.c
@@ -108,6 +108,22 @@ static void ppc6xx_tlb_store(CPUPPCState *env, 
target_ulong EPN, int way,
 }
 
 /* Helpers specific to PowerPC 40x implementations */
+void ppc4xx_tlb_invalidate_shadow(CPUPPCState *env);
+void ppc4xx_tlb_invalidate_shadow(CPUPPCState *env)
+{
+    ppcemb_tlb_t *tlb;
+    int i;
+
+    for (i = 0; i < 4; i++) {
+        tlb = &env->tlb.ishadow[i];
+        tlb->prot &= ~PAGE_VALID;
+    }
+    for (i = 0; i < 8; i++) {
+        tlb = &env->tlb.dshadow[i];
+        tlb->prot &= ~PAGE_VALID;
+    }
+}
+
 static inline void ppc4xx_tlb_invalidate_all(CPUPPCState *env)
 {
     ppcemb_tlb_t *tlb;
@@ -117,6 +133,7 @@ static inline void ppc4xx_tlb_invalidate_all(CPUPPCState 
*env)
         tlb = &env->tlb.tlbe[i];
         tlb->prot &= ~PAGE_VALID;
     }
+    ppc4xx_tlb_invalidate_shadow(env);
     tlb_flush(env_cpu(env));
 }
 
@@ -719,6 +736,7 @@ target_ulong helper_4xx_tlbre_lo(CPUPPCState *env, 
target_ulong entry)
     return ret;
 }
 
+#if 0
 static void ppcemb_tlb_flush(CPUState *cs, ppcemb_tlb_t *tlb)
 {
     unsigned mmu_idx = 0;
@@ -736,6 +754,7 @@ static void ppcemb_tlb_flush(CPUState *cs, ppcemb_tlb_t 
*tlb)
     tlb_flush_range_by_mmuidx(cs, tlb->EPN, tlb->size, mmu_idx,
                               TARGET_LONG_BITS);
 }
+#endif
 
 void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong entry,
                          target_ulong val)
@@ -753,7 +772,7 @@ void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong 
entry,
         qemu_log_mask(CPU_LOG_MMU, "%s: invalidate old TLB %d start "
                       TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
                       (int)entry, tlb->EPN, tlb->EPN + tlb->size);
-        ppcemb_tlb_flush(cs, tlb);
+        env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
     }
     tlb->size = booke_tlb_to_page_size((val >> PPC4XX_TLBHI_SIZE_SHIFT)
                                        & PPC4XX_TLBHI_SIZE_MASK);
@@ -792,7 +811,7 @@ void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong 
entry,
 void helper_4xx_tlbwe_lo(CPUPPCState *env, target_ulong entry,
                          target_ulong val)
 {
-    CPUState *cs = env_cpu(env);
+//    CPUState *cs = env_cpu(env);
     ppcemb_tlb_t *tlb;
 
     qemu_log_mask(CPU_LOG_MMU, "%s entry %i val " TARGET_FMT_lx "\n",
@@ -804,7 +823,7 @@ void helper_4xx_tlbwe_lo(CPUPPCState *env, target_ulong 
entry,
         qemu_log_mask(CPU_LOG_MMU, "%s: invalidate old TLB %d start "
                       TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
                       (int)entry, tlb->EPN, tlb->EPN + tlb->size);
-        ppcemb_tlb_flush(cs, tlb);
+        env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
     }
     tlb->attr = val & PPC4XX_TLBLO_ATTR_MASK;
     tlb->RPN = val & PPC4XX_TLBLO_RPN_MASK;
@@ -865,7 +884,7 @@ void helper_440_tlbwe(CPUPPCState *env, uint32_t word, 
target_ulong entry,
         qemu_log_mask(CPU_LOG_MMU, "%s: invalidate old TLB %d start "
                       TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
                       (int)entry, tlb->EPN, tlb->EPN + tlb->size);
-        ppcemb_tlb_flush(env_cpu(env), tlb);
+        env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
     }
 
     switch (word) {

Reply via email to