On Wed, 4 Sep 2019 at 03:41, Peter Xu <pet...@redhat.com> wrote: > On Tue, Sep 03, 2019 at 05:50:56PM +0100, Peter Maydell wrote: > > Do you have a backtrace of QEMU from the segfault? I'm having trouble > > thinking of what the situation is when we'd try to invoke the > > read handler on io_mem_notdirty... > > I've no good understanding of how PHYS_SECTION_NOTDIRTY is used > yet... though from what I understand that's the thing this patch wants > to fix. Because after the broken commit, this line will be > overwritten: > > .valid.accepts = notdirty_mem_accepts, > > and accept() will be reset to NULL. > > With that, memory_region_access_valid(is_write=false) could return > valid now (so a read could happen), while it should never, logically?
Having looked into this a bit further, I think that the problem here is that in commit 30d7e098d5c38644359 we accidentally removed the code for TLB_RECHECK-type TLB entries that handled the "actually this is a RAM access" case after repeating the tlb_fill(). So instead of read accesses to notdirty-mem going through that code and never getting into the io_readx() function, now they do. That combined with the bug where we made the .valid.accepts assignment stop working means you can get into this segfault. This is quite rare because I think only Arm M-profile CPUs and Sparc when using the 'invert endian' page table bit (ie Solaris guests doing PCI stuff) will do this. If we apply this patch to reinstate .valid.accepts then instead of a segfault we'll get a guest exception; which is still not the right behaviour. So I think we need to: (1) fix the cputlb code to reinstate the "handle RAM immediately" codepath (2) either allow reads to notdirty-mem MRs (ie make them just read from the host backing RAM), or define them to be a QEMU bug and make them assert immediately the read is attempted thanks -- PMM