On Tue Apr 2, 2024 at 9:32 PM AEST, BALATON Zoltan wrote:
> On Thu, 21 Mar 2024, BALATON Zoltan wrote:
> > On 27/2/24 17:47, BALATON Zoltan wrote:
> >> Hello,
> >>
> >> Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting
> >> MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified it
> >> before that release but apparently missed it back then). It can be
> >> reproduced with https://www.morphos-team.net/morphos-3.18.iso and
> >> following command:
> >>
> >> qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
> >> -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
> >> -device ide-cd,drive=cd,bus=ide.1
>
> Any idea on this one? While MorphOS boots on other machines and other OSes
> seem to boot on this machine it may still suggest there's some problem
> somewhere as this worked before. So it may worth investigating it to make
> sure there's no bug that could affect other OSes too even if they boot. I
> don't know how to debug this so some help would be needed.
In the bad case it crashes after running this TB:

----------------
IN:
0x00c01354:  38c00040  li       r6, 0x40
0x00c01358:  38e10204  addi     r7, r1, 0x204
0x00c0135c:  39010104  addi     r8, r1, 0x104
0x00c01360:  39410004  addi     r10, r1, 4
0x00c01364:  39200000  li       r9, 0
0x00c01368:  7cc903a6  mtctr    r6
0x00c0136c:  84c70004  lwzu     r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu     r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi     r9, r9, 1
0x00c01388:  4200ffe4  bdnz     0xc0136c

----------------
IN:
0x00c01374:  unable to read memory

----------------

"unable to read memory" comes from the tracer. The address does actually get translated, but it points to a wayward real address that returns 0 to TCG, which is an invalid instruction.

The good case instead doesn't exit the TB after 0x00c01370, but only after the complete loop at the bdnz. After the same first TB, it looks like this:

----------------
IN:
0x00c0136c:  84c70004  lwzu     r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu     r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi     r9, r9, 1
0x00c01388:  4200ffe4  bdnz     0xc0136c

----------------
IN:
0x00c0138c:  4c00012c  isync

All the tlbwe instructions are executed in the same TB.

MMU tracing shows that the first tlbwehi creates a new valid(!) TLB entry for 0x00000000-0x100000000 with a garbage RPN, because the tlbwelo has not run yet. What's happening in the bad case is that the translator breaks out and "re-fetches" instructions in the middle of that sequence, and that is where the bogus translation causes 0 to be returned. In the good case, the whole block is executed from the same fetch, which creates correct translations.
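A toy model may make the ordering hazard concrete. The sketch below is a hypothetical, heavily simplified single TLB entry; it is not QEMU's ppc440 code and not the real 440 entry layout. It assumes word 0 holds the EPN plus the 0x200 valid bit, word 1 holds the RPN, and pages are a fixed 256 MiB. It applies a sequence of writes one at a time and, after each write, pretends the translator re-fetches at 0x00c01374, counting how many intermediate states would send that fetch somewhere other than the intended real address (an identity mapping is assumed to be intended). A miss is treated as harmless here, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Valid bit in (toy) TLB word 0: the "0x200 bit" from the analysis. */
#define TLB_V 0x200u
/* Hypothetical fixed 256 MiB page: the top 4 address bits are the page. */
#define PAGE_MASK 0xF0000000u

/* Hypothetical simplified entry: word[0] = EPN | V, word[1] = RPN. */
typedef struct {
    uint32_t word[2];
} toy_tlb_entry;

/* One tlbwe-like write: which word (ws) and the value stored there. */
typedef struct {
    unsigned ws;
    uint32_t value;
} toy_write;

/* Translate ea through the entry; returns false on a miss. */
static bool toy_translate(const toy_tlb_entry *e, uint32_t ea, uint32_t *ra)
{
    if (!(e->word[0] & TLB_V)) {
        return false;                 /* entry not valid: miss */
    }
    if ((ea & PAGE_MASK) != (e->word[0] & PAGE_MASK)) {
        return false;                 /* EPN does not match: miss */
    }
    /* Real address = RPN from word 1 plus the page offset of ea. */
    *ra = (e->word[1] & PAGE_MASK) | (ea & ~PAGE_MASK);
    return true;
}

/*
 * Apply the n writes to a copy of the entry in order. After each write,
 * simulate a re-fetch at fetch_ea and count the states in which the
 * entry hits but yields a real address other than want_ra.
 */
static int count_bad_fetches(toy_tlb_entry e, const toy_write *w, int n,
                             uint32_t fetch_ea, uint32_t want_ra)
{
    int bad = 0;

    for (int i = 0; i < n; i++) {
        uint32_t ra;

        assert(w[i].ws < 2);          /* toy entry has only two words */
        e.word[w[i].ws] = w[i].value;
        if (toy_translate(&e, fetch_ea, &ra) && ra != want_ra) {
            bad++;                    /* fetch went through a stale RPN */
        }
    }
    return bad;
}
```

Starting from a stale entry {0x00000000, 0xF0000000}, writing word 0 with V set first and the RPN second leaves one intermediate state that fetches from 0xF0c01374, while writing word 0 invalid, then the RPN, then word 0 with V set leaves no such state.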
So it looks like a MorphOS bug. The can_do_io change just happens to cause a re-fetch at that point, but that could happen for a number of reasons, so you can't rely on the TLB *only* changing, or ifetch *only* re-fetching, at a sync point like isync.

I would expect code like this to write an invalid entry with the first tlbwehi, then use tlbwelo to set the correct RPN, then make the entry valid with the second tlbwehi. Doing the first tlbwehi with r6=0 (or at least without the 0x200 valid bit set) would probably fix the bug.

Thanks,
Nick