On Sun, Nov 16, 2014 at 06:11:48PM +0000, Peter Maydell wrote: > I'm trying to track down a bug in ARM TCG where we: > * boot a guest > * run 'shutdown -r now' to trigger a reboot > * on reboot, crash when running userspace because the contents > of physical RAM have changed but the translated code from > before the shutdown was never invalidated > > This is with a virtio-mmio block device as the disk. > > Debugging indicates that when the post-reboot guest reloads > binaries from disk into ram we fail to invalidate the cached > translations. For the specific case I looked at, we have a > translation of code at ramaddr_t 0x806e000. The disk load > pulls 0x16000 bytes of data off disk to address 0x806a000. > Virtio correctly calls address_space_unmap(), which is supposed > to be what marks the ram range as dirty. It in turn calls > invalidate_and_set_clean(). However invalidate_and_set_clean() > just does this: > > if (cpu_physical_memory_is_clean(addr)) { > /* invalidate code */ > tb_invalidate_phys_page_range(addr, addr + length, 0); > /* set dirty bit */ > cpu_physical_memory_set_dirty_range_nocode(addr, length); > } > > So if the first page in the range (here 0x806a000) happens > to be dirty then we won't do anything, even if later pages > in the range do need to be invalidated. Also, we'll call > tb_invalidate_phys_page_range() with a start/end which may > be in different physical pages, which is forbidden by that > function's API. > > I guess invalidate_and_set_clean() really needs to be > fixed to loop through each page in the range; does anybody > know how this is supposed to work (or why nobody's noticed > this bug before :-)) ?
Not directly but I don't like this code because it's not atomic. I'll send patches soon for atomic test-and-set and test-and-clear. Hopefully it won't impact performance too much. What you've discovered seems like a plain old bug. It needs a loop. Stefan
pgpG50EJ2bRrO.pgp
Description: PGP signature