On Thu, 1 Feb 2024 17:21:49 +0000
Peter Maydell <peter.mayd...@linaro.org> wrote:

> On Thu, 1 Feb 2024 at 17:08, Jonathan Cameron
> <jonathan.came...@huawei.com> wrote:
> >
> > On Thu, 01 Feb 2024 16:45:30 +0000
> > Alex Bennée <alex.ben...@linaro.org> wrote:
> >  
> > > Jonathan Cameron <jonathan.came...@huawei.com> writes:
> > >  
> > > > On Thu, 1 Feb 2024 16:00:56 +0000
> > > > Peter Maydell <peter.mayd...@linaro.org> wrote:
> > > >  
> > > >> On Thu, 1 Feb 2024 at 15:17, Alex Bennée <alex.ben...@linaro.org> 
> > > >> wrote:  
> > > >> >
> > > >> > Peter Maydell <peter.mayd...@linaro.org> writes:  
> > > >> > > So, that looks like:
> > > >> > >  * we call cpu_tb_exec(), which executes some generated code
> > > >> > >  * that generated code calls the lookup_tb_ptr helper to see
> > > >> > >    if we have a generated TB already for the address we're going
> > > >> > >    to execute next
> > > >> > >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
> > > >> > >    address for the guest address
> > > >> > >  * this results in a TLB walk for an instruction fetch
> > > >> > >  * the page table descriptor load is to IO memory
> > > >> > >  * io_prepare assumes it needs to do a TLB recompile, because
> > > >> > >    can_do_io is clear
> > > >> > >
> > > >> > > I am not surprised that the corner case of "the guest put its
> > > >> > > page tables in an MMIO device" has not yet come up :-)
> > > >> > >
> > > >> > > I'm really not sure how the icount handling should interact
> > > >> > > with that...  
> > > >> >
> > > >> > Its not just icount - we need to handle it for all modes now. That 
> > > >> > said
> > > >> > seeing as we are at the end of a block shouldn't can_do_io be set?  
> > > >>
> > > >> The lookup_tb_ptr helper gets called from tcg_gen_goto_tb(),
> > > >> which happens earlier than the tb_stop callback (it can
> > > >> happen in the trans function for branch etc insns, for
> > > >> example).
> > > >>
> > > >> I think it should be OK to clear can_do_io at the start
> > > >> of the lookup_tb_ptr helper, something like:
> > > >> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > > >> index 977576ca143..7818537f318 100644
> > > >> --- a/accel/tcg/cpu-exec.c
> > > >> +++ b/accel/tcg/cpu-exec.c
> > > >> @@ -396,6 +396,15 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState 
> > > >> *env)
> > > >>      uint64_t cs_base;
> > > >>      uint32_t flags, cflags;
> > > >>
> > > >> +    /*
> > > >> +     * By definition we've just finished a TB, so I/O is OK.
> > > >> +     * Avoid the possibility of calling cpu_io_recompile() if
> > > >> +     * a page table walk triggered by tb_lookup() calling
> > > >> +     * probe_access_internal() happens to touch an MMIO device.
> > > >> +     * The next TB, if we chain to it, will clear the flag again.
> > > >> +     */
> > > >> +    cpu->neg.can_do_io = true;
> > > >> +
> > > >>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> > > >>
> > > >>      cflags = curr_cflags(cpu);
> > > >>
> > > >> -- PMM  
> > > >
> > > > No joy.  Seems like a very similar backtrace.
> > > >
> > > > Thread 5 "qemu-system-x86" received signal SIGABRT, Aborted.
> > > > [Switching to Thread 0x7ffff4efe6c0 (LWP 23937)]
> > > > __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized 
> > > > out>) at ./nptl/pthread_kill.c:44
> > > > 44      ./nptl/pthread_kill.c: No such file or directory.
> > > > (gdb) bt
> > > > #0  __pthread_kill_implementation (no_tid=0, signo=6, 
> > > > threadid=<optimized out>) at ./nptl/pthread_kill.c:44
> > > > #1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at 
> > > > ./nptl/pthread_kill.c:78
> > > > #2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) 
> > > > at ./nptl/pthread_kill.c:89
> > > > #3  0x00007ffff77c43b6 in __GI_raise (sig=sig@entry=6) at 
> > > > ../sysdeps/posix/raise.c:26
> > > > #4  0x00007ffff77aa87c in __GI_abort () at ./stdlib/abort.c:79
> > > > #5  0x0000555555c4d19e in cpu_abort (cpu=cpu@entry=0x5555578e0cb0, 
> > > > fmt=fmt@entry=0x555556048ee8 "cpu_io_recompile: could not find TB for 
> > > > pc=%p") at ../../cpu-target.c:373
> > > > #6  0x0000555555c9cb25 in cpu_io_recompile 
> > > > (cpu=cpu@entry=0x5555578e0cb0, retaddr=retaddr@entry=0) at 
> > > > ../../accel/tcg/translate-all.c:611
> > > > #7  0x0000555555c9f744 in io_prepare (retaddr=0, addr=19595790664, 
> > > > attrs=..., xlat=<optimized out>, cpu=0x5555578e0cb0, 
> > > > out_offset=<synthetic pointer>) at ../../accel/tcg/cputlb.c:1339
> > > > #8  do_ld_mmio_beN (cpu=0x5555578e0cb0, full=0x7ffe88012890, 
> > > > ret_be=ret_be@entry=0, addr=19595790664, size=size@entry=8, mmu_idx=4, 
> > > > type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
> > > > #9  0x0000555555ca0ecd in do_ld_8 (cpu=cpu@entry=0x5555578e0cb0, 
> > > > p=p@entry=0x7ffff4efcdd0, mmu_idx=<optimized out>, 
> > > > type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at 
> > > > ../../accel/tcg/cputlb.c:2356
> > > > #10 0x0000555555ca332f in do_ld8_mmu (cpu=cpu@entry=0x5555578e0cb0, 
> > > > addr=addr@entry=19595790664, oi=oi@entry=52, ra=ra@entry=0, 
> > > > access_type=access_type@entry=MMU_DATA_LOAD) at 
> > > > ../../accel/tcg/cputlb.c:2439
> > > > #11 0x0000555555ca5e69 in cpu_ldq_mmu (ra=0, oi=52, addr=19595790664, 
> > > > env=0x5555578e3470) at ../../accel/tcg/ldst_common.c.inc:169
> > > > #12 cpu_ldq_le_mmuidx_ra (env=0x5555578e3470, addr=19595790664, 
> > > > mmu_idx=<optimized out>, ra=ra@entry=0) at 
> > > > ../../accel/tcg/ldst_common.c.inc:301
> > > > #13 0x0000555555b4b5de in ptw_ldq (in=0x7ffff4efcf10) at 
> > > > ../../target/i386/tcg/sysemu/excp_helper.c:98
> > > > #14 ptw_ldq (in=0x7ffff4efcf10) at 
> > > > ../../target/i386/tcg/sysemu/excp_helper.c:93
> > > > #15 mmu_translate (env=env@entry=0x5555578e3470, in=0x7ffff4efcfd0, 
> > > > out=0x7ffff4efcfa0, err=err@entry=0x7ffff4efcfb0) at 
> > > > ../../target/i386/tcg/sysemu/excp_helper.c:173
> > > > #16 0x0000555555b4c3f3 in get_physical_address (err=0x7ffff4efcfb0, 
> > > > out=0x7ffff4efcfa0, mmu_idx=0, access_type=MMU_DATA_STORE, 
> > > > addr=18386491786698339392, env=0x5555578e3470) at 
> > > > ../../target/i386/tcg/sysemu/excp_helper.c:578
> > > > #17 x86_cpu_tlb_fill (cs=0x5555578e0cb0, addr=18386491786698339392, 
> > > > size=<optimized out>, access_type=MMU_DATA_STORE, mmu_idx=0, 
> > > > probe=<optimized out>, retaddr=140736029817822) at 
> > > > ../../target/i386/tcg/sysemu/excp_helper.c:604
> > > > #18 0x0000555555ca0df9 in tlb_fill (retaddr=140736029817822, mmu_idx=0, 
> > > > access_type=MMU_DATA_STORE, size=<optimized out>, 
> > > > addr=18386491786698339392, cpu=0x7ffff4efd120) at 
> > > > ../../accel/tcg/cputlb.c:1315
> > > > #19 mmu_lookup1 (cpu=cpu@entry=0x5555578e0cb0, 
> > > > data=data@entry=0x7ffff4efd120, mmu_idx=0, 
> > > > access_type=access_type@entry=MMU_DATA_STORE, 
> > > > ra=ra@entry=140736029817822) at ../../accel/tcg/cputlb.c:1713
> > > > #20 0x0000555555ca2b71 in mmu_lookup (cpu=0x5555578e0cb0, 
> > > > addr=18386491786698339392, oi=<optimized out>, ra=140736029817822, 
> > > > type=MMU_DATA_STORE, l=0x7ffff4efd120) at ../../accel/tcg/cputlb.c:1803
> > > > #21 0x0000555555ca3e5d in do_st8_mmu (cpu=0x5555578e0cb0, addr=23937, 
> > > > val=18386491784638059520, oi=6, ra=140736029817822) at 
> > > > ../../accel/tcg/cputlb.c:2853
> > > > #22 0x00007fffa9107c63 in code_gen_buffer ()  
> > >
> > > No thats different - we are actually writing to the MMIO region here.
> > > But the fact we hit cpu_abort because we can't find the TB we are
> > > executing is a little problematic.
> > >
> > > Does ra properly point to the code buffer here?  
> >
> > Err.  How would I tell?  
> 
> Well, a NULL pointer for the return address is definitely not in
> the codegen buffer :-)
> 
> This is again a TLB fill case, but this time it's a data
> access from a guest store insn. We had the correct ra when we
> did the do_st8_mmu() down in frame 21: ra=140736029817822,
> but as we go through the page table walk, we leave the ra
> behind in x86_cpu_tlb_fill(), and so the ptw_ldq()
> passes a zero ra down to the cpu_ldq_mmuidx_ra() (which
> is generally meant to mean "I am not being called from
> translated code and so can_do_io should be false").
> 
> I think that none of the page-table-walking handling
> (either in target code or in general) has been designed
> with the idea in mind that it might need to do something
> for icount if the ptw touches an MMIO device. This is probably
> not as simple as merely "plumb the ra value down through the
> ptw code" -- somebody needs to think about whether doing
> an io_recompile() is the right response to that situation.
> And any "do an address translation for me" system insns
> might or might not need to be dealt with differently.
> 
> If you can at all arrange for your workload not to put
> page tables into MMIO device regions then your life will
> be a lot simpler.

The only way we can do that is abandon supporting emulation
of fine grained interleave (or at least documented that
if you are using this 'mode', don't use it as normal memory)

Whilst I do plan to add a performance option that just
locks out those settings so that we can avoid using MMIO
regions we are going to keep getting people assuming this
case will work  :( 

We need the high perf restricted option for virtualization
usecases where it's reasonable to fake it that there is
no interleave going on (it will be happening in hardware).

I guess doing that goes up in priority :(

Jonathan

> 
> thanks
> -- PMM


Reply via email to