On 15 December 2014 at 13:16, Paolo Bonzini <pbonz...@redhat.com> wrote:
> If not, it should not need any change to the memory API; you can do it
> entirely within cputlb.c, roughly the same as the handling of
> TLB_NOTDIRTY. It also marks pages as I/O, but only internally within TCG.
Speaking of TLB_NOTDIRTY, I just wrote up a summary of how that works
for a private email, so I figured I might as well send it here too so
it's in the qemu-devel mail archives; it's probably not new information
to anybody involved in this immediate conversation.

How we arrange to throw away cached translations when the guest writes
to that part of memory:

* we have two data structures effectively tracking dirty status:
  (1) there are a set of bitmaps which track different kinds of
      dirtiness (the DIRTY_MEMORY_*); the functions for manipulating
      these are mostly in ram_addr.h. One of the bitmaps is for
      DIRTY_MEMORY_CODE.
  (2) where we have an entry in the QEMU TLB for a page which is backed
      by host RAM, we may set the TLB_NOTDIRTY bit in the addr_write
      TLB entry field (TLB_NOTDIRTY is one of several low order bits
      that can be set in what is otherwise a page-aligned virtual
      address in the TLB structure. TLB_MMIO is another, indicating
      that the entry is not RAM at all.)
  TLB entries come and go, but the bitmaps cover all of physical RAM.
  When a TLB entry is present then the NOTDIRTY flag should be just a
  cache for "at least one of the dirty bitmaps says this page is not
  dirty".

* when we generate code we call tlb_protect_code() (from
  tb_alloc_page()): this calls cpu_physical_memory_reset_dirty(), which
  both updates the dirty bitmap data structure (marking the region as
  clean in the DIRTY_MEMORY_CODE bitmap) and also calls
  cpu_tlb_reset_dirty_all() to OR in the TLB_NOTDIRTY flag for any
  present TLB entries in the range

* when we add an entry to the TLB, tlb_set_page() will OR in the
  TLB_NOTDIRTY bit if the bitmap says this is clean memory, so the two
  structures stay in sync

* tlb_set_page() also calls memory_region_section_get_iotlb() to get an
  iotlb entry for this RAM, which is what will be used on the slow
  path. For RAM this will be io_mem_notdirty.

* if the guest attempts a read, we don't do anything special because
  this uses addr_read, not addr_write

* for a guest write, the generated code will look at addr_write; it
  takes the fast path if the low order bits are clear (indicating dirty
  host RAM). Otherwise we take the slow path (clean RAM, MMIO, nothing
  present, etc etc).

* we then follow the slow path without special casing RAM, which means
  we'll use the iotlb entry set up when the TLB entry was populated,
  which is io_mem_notdirty.

* notdirty_mem_write() will invalidate the cached TBs if the
  DIRTY_MEMORY_CODE bitmap says this memory is clean, and do the access
  the slow way. We then mark the TLB entry as dirty by calling
  tlb_set_dirty, so next time we'll take the fast path. (There's an
  optimisation wrinkle here: tb_invalidate_phys_page_fast() is
  complicated because it tries to avoid simply nuking every TB in the
  page. So it might need to keep accesses on the slow path. It only
  calls tlb_unprotect_code_phys() to update the DIRTY_MEMORY_CODE
  bitmap if every TB on the page has been invalidated. This is why
  notdirty_mem_write()'s call to tlb_set_dirty() is conditional.)

* writes to already-dirty memory can take the fast path, which just
  writes to the host RAM without calling out or checking any dirty
  bits.

Note that for linux-user mode the mechanism is totally different,
because we don't have a softmmu TLB data structure; instead we use
mprotect to write-protect the page, and then in the SIGSEGV handler we
may throw away cached TBs before un-write-protecting it.

-- PMM
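
As a rough illustration of the fast-path/slow-path decision described
above, here is a minimal, self-contained sketch. It is not QEMU code:
the type and helper names (toy_tlb_entry, toy_store_byte,
page_is_clean_for_code, invalidate_tbs_on_page) and the flag bit values
are invented for the example, and the real notdirty_mem_write() /
tb_invalidate_phys_page_fast() logic is considerably more involved.

/* Toy model only, NOT actual QEMU code: how a softmmu store might
 * consult low-order flag bits cached in addr_write.  Bit values are
 * placeholders, not QEMU's real definitions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TLB_NOTDIRTY  (1 << 0)   /* page still has translated code (clean) */
#define TLB_MMIO      (1 << 1)   /* page is not RAM at all                 */
#define TLB_FLAG_MASK (TLB_NOTDIRTY | TLB_MMIO)

typedef struct {
    uintptr_t addr_write;   /* page-aligned address plus low-order flags */
    uint8_t  *host_ram;     /* direct pointer to backing host RAM        */
} toy_tlb_entry;

/* Stand-in for the DIRTY_MEMORY_CODE bitmap entry for this page. */
static bool page_is_clean_for_code = true;

static void invalidate_tbs_on_page(void)
{
    /* In QEMU this is tb_invalidate_phys_page_fast(); here we pretend
     * every TB on the page gets thrown away. */
    printf("throwing away cached translations for this page\n");
}

static void toy_store_byte(toy_tlb_entry *e, uintptr_t vaddr, uint8_t val)
{
    if ((e->addr_write & TLB_FLAG_MASK) == 0) {
        /* Fast path: plain dirty host RAM, write directly. */
        e->host_ram[vaddr & 0xfff] = val;
        return;
    }
    if (e->addr_write & TLB_MMIO) {
        /* Would dispatch to the device's write callback here. */
        return;
    }
    /* TLB_NOTDIRTY slow path, loosely modelled on notdirty_mem_write(). */
    if (page_is_clean_for_code) {
        invalidate_tbs_on_page();
        page_is_clean_for_code = false;   /* bitmap now says "dirty"      */
    }
    e->host_ram[vaddr & 0xfff] = val;     /* do the access the slow way   */
    /* Since this toy invalidated every TB on the page, future writes may
     * take the fast path (the real code only does this conditionally). */
    e->addr_write &= ~(uintptr_t)TLB_NOTDIRTY;
}

int main(void)
{
    static uint8_t ram[4096];
    toy_tlb_entry e = { .addr_write = TLB_NOTDIRTY, .host_ram = ram };

    toy_store_byte(&e, 0x10, 0xab);   /* slow path: invalidates TBs       */
    toy_store_byte(&e, 0x11, 0xcd);   /* fast path: page already dirty    */
    return 0;
}

The only point of the sketch is the shape of the check: a store is
cheap exactly when none of the low-order flag bits are set in
addr_write, and any set flag forces it through the I/O-style slow path.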