On Mon, Mar 19, 2018 at 2:41 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 16/03/2018 20:41, Michael Clark wrote:
> > From reading other code that accesses memory regions directly,
> > it appears that the rcu_read_lock needs to be held. Note: the
> > original code for accessing RAM directly was added because
> > there is no other way to use atomic_cmpxchg on guest physical
> > address space.
> >
> > Cc: Sagar Karandikar <sag...@eecs.berkeley.edu>
> > Cc: Bastian Koppelmann <kbast...@mail.uni-paderborn.de>
> > Signed-off-by: Michael Clark <m...@sifive.com>
> > Signed-off-by: Palmer Dabbelt <pal...@sifive.com>
> > ---
> >  target/riscv/helper.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
>
> I think the bug here is that rcu_read_lock/unlock is missing in
> cpu_memory_rw_debug. Are there any other callers you had in mind?

So this is not a bug in our patch per se, rather a bug in
cpu_memory_rw_debug?

It seems that most other code that uses the address_space_* interfaces
(e.g. hw/virtio/virtio.c) holds the rcu_read_lock, presumably to handle
cases where the memory map might change at runtime, e.g. during
ballooning. This would be exhibited if a page walk collided with a
balloon.

Ideally we could add a cpu_physical_memory_atomic_cmpxchg API to
generic code, so that target/riscv would not need to hold the
rcu_read_lock itself, i.e. the guest physical memory atomic_cmpxchg
would be encapsulated. The atomic_cmpxchg wrapper handles multiple word
widths, so we would need cpu_physical_memory_atomic_cmpxchg32 and
cpu_physical_memory_atomic_cmpxchg64 (a rough sketch of what such an
API might look like is at the end of this mail).

We need to use atomic_cmpxchg in the PTE update to detect the case
where the PTE has changed between reading it and updating the
accessed/dirty bits. As far as I can tell, the code is correct. In fact
we are reading a very strong interpretation of the RISC-V
specification. This is what The RISC-V Instruction Set Manual Volume
II: Privileged Architecture says about accessed and dirty updates,
which is the reason for the code in question:

  When a virtual page is accessed and the A bit is clear, or is written
  and the D bit is clear, the implementation sets the corresponding bit
  in the PTE. The PTE update must be atomic with respect to other
  accesses to the PTE, and must atomically check that the PTE is valid
  and grants sufficient permissions. The PTE update must be exact
  (i.e., not speculative), and observed in program order by the local
  hart. The ordering on loads and stores provided by FENCE instructions
  and the acquire/release bits on atomic instructions also orders the
  PTE updates associated with those loads and stores as observed by
  remote harts.

This is an interesting constraint. I believe the intent is that we just
AMOOR the A+D bits into the PTE (although we don't have AMOOR, we have
atomic_cmpxchg). However, we have taken a stronger interpretation
(while stronger, it is not incorrect): that the atomicity guarantee
extends from the page walker reading the PTE entry, through checking
its permissions, to updating it. In hardware this would require the
page walker to take load reservations, and I don't think it does.
Apparently the hardware just AMOORs the bits, so the PTE could actually
be pointing somewhere else by the time we go to update the bits,
although the same virtual address has been accessed or dirtied (albeit
with a different physical translation). In any case, we have a very
strong interpretation of the specification wording, potentially
stronger than hardware may provide. A sketch contrasting the two
readings follows.
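To make the difference concrete, here is a minimal sketch (illustrative
only, not the actual helper.c code). It assumes pte and pte_pa are the
PTE value and host pointer obtained by the walk, PTE_A/PTE_D are the
target/riscv bit definitions, and atomic_or/atomic_cmpxchg are QEMU's
wrappers from qemu/atomic.h; both bits are set unconditionally here for
brevity:

    /* Weak reading: atomically OR in the bits, like a hardware AMOOR.
     * The PTE may have been replaced since the walk read it; the bits
     * land on whatever value is stored there now. */
    atomic_or(pte_pa, PTE_A | PTE_D);

    /* Strong reading (what our code does): only update if the PTE is
     * still exactly the value the walk used for the translation;
     * otherwise the caller restarts the walk from scratch. */
    target_ulong updated_pte = pte | PTE_A | PTE_D;
    if (atomic_cmpxchg(pte_pa, pte, updated_pte) != pte) {
        /* PTE changed under us: goto restart */
    }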
We had a long discussion about this on the RISC-V QEMU issue tracker.
Stefan O'Rear mentioned he might make a test that hammers a PTE from
another thread while one thread is doing a page walk, to try to cause
the page walker to set the A+D bits on a PTE that is different to the
one it used to create the virtual to physical mapping. That's some
background on why the code exists in the first place: it provides the
strongest possible guarantee on PTE atomicity.

- https://github.com/riscv/riscv-qemu/pull/103
- https://github.com/riscv/riscv-qemu/pull/105

The get_physical_address bug that this patch fixes does not show up in
practice, i.e. we aren't actually hitting cases where page walks
collide with address space changes. However, holding the rcu_read_lock
seems to be correct, and this bug may show up in the future.

> > diff --git a/target/riscv/helper.c b/target/riscv/helper.c
> > index 02cbcea..e71633a 100644
> > --- a/target/riscv/helper.c
> > +++ b/target/riscv/helper.c
> > @@ -209,6 +209,9 @@ restart:
> >                 as the PTE is no longer valid */
> >              MemoryRegion *mr;
> >              hwaddr l = sizeof(target_ulong), addr1;
> > +            enum { success, translate_fail, restart_walk} action = success;
> > +
> > +            rcu_read_lock();
> >              mr = address_space_translate(cs->as, pte_addr,
> >                  &addr1, &l, false);
> >              if (memory_access_is_direct(mr, true)) {
> > @@ -222,7 +225,7 @@ restart:
> >                  target_ulong old_pte =
> >                      atomic_cmpxchg(pte_pa, pte, updated_pte);
> >                  if (old_pte != pte) {
> > -                    goto restart;
> > +                    action = restart_walk;
> >                  } else {
> >                      pte = updated_pte;
> >                  }
> > @@ -230,7 +233,14 @@ restart:
> >              } else {
> >                  /* misconfigured PTE in ROM (AD bits are not preset) or
> >                   * PTE is in IO space and can't be updated atomically */
> > -                return TRANSLATE_FAIL;
> > +                action = translate_fail;
> > +            }
> > +            rcu_read_unlock();
> > +
> > +            switch (action) {
> > +            case success: break;
> > +            case translate_fail: return TRANSLATE_FAIL;
> > +            case restart_walk: goto restart;
> >              }
> >          }
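For reference, here is roughly what the generic helper proposed above
might look like. This is purely illustrative: neither function exists
in QEMU today, the name and signature are invented here, and a 64-bit
variant would be identical modulo the type. It simply encapsulates the
translate + direct-RAM check + RCU critical section that this patch
open-codes in target/riscv:

    /* Atomically cmpxchg a 32-bit word in guest physical memory.
     * Returns 0 and stores the previous value in *old on success, or
     * -1 if addr is not backed by direct RAM (e.g. ROM or IO space). */
    static int cpu_physical_memory_atomic_cmpxchg32(AddressSpace *as,
        hwaddr addr, uint32_t expected, uint32_t desired, uint32_t *old)
    {
        hwaddr l = sizeof(uint32_t), addr1;
        MemoryRegion *mr;
        int ret = -1;

        rcu_read_lock();
        mr = address_space_translate(as, addr, &addr1, &l, false);
        if (l == sizeof(uint32_t) && memory_access_is_direct(mr, true)) {
            uint32_t *p = (uint32_t *)
                ((uint8_t *)memory_region_get_ram_ptr(mr) + addr1);
            *old = atomic_cmpxchg(p, expected, desired);
            ret = 0;
        }
        rcu_read_unlock();
        return ret;
    }

The page walk in target/riscv could then perform the PTE cmpxchg
without touching the memory API or RCU directly.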