I was looking at a reported bug about semihosting when the MPU is being used on Arm CPUs: https://gitlab.com/qemu-project/qemu/-/issues/3292 which turns out to be because in cpu_memory_rw_debug() instead of passing in the vaddr we're given to get_phys_page_debug() we first round it down to a target-page, then translate it, then add the page offset back in. This doesn't work if the target is dealing with memory protection/validity at a granularity smaller than a target-page, as the Arm MPU does.
I thought this was a bit odd, because the Arm page-table walk code will happily translate a non-page-aligned address for you, so I had a bit of a dig into our implementations and callers, and we seem to be inconsistent about whether you're supposed to handle non-page-aligned addresses here. (By "handle" I mean both "can take a non-page-aligned input" and also "the resulting physaddr includes the bits for the offset within the page and has not been rounded down to a page-base address".) Implementations that don't handle non-aligned addresses: * alpha, microblaze, sparc, x86, riscv, s390x (but riscv only because it explicitly masks out the low bits just before returning the physaddr!) Implementations that do handle non-aligned addresses: * arm, avr, hppa, loongarch, m68k, mips, openrisc, rx, sh4, tricore, xtensa Implementations that were too complex for me to determine the answer with a quick look: * ppc And at the callsites we have: Callers that naturally always use page aligned addresses: * hw/i386/vapic.c : always uses page-aligned addresses * target/sparc/mmu_helper.c: uses in dump_mmu() are always page aligned Callers that assume non-aligned addresses are handled: * hw/xtensa/sim.c,xtfpga.c : we translate ELF program header p_addrs * target/xtensa/xtensa-semi.c (these two only work because xtensa is in the "handles" this group) Callers that do some kind of "work around getting the page base", and mostly-work with both: * monitor/hmp-cmds.c: in hmp_gva2gpa we round to a page base then add back in the offset within the page * plugins/api.c: ORs in the low bits, so can cope with both "get exact answer" and "gives page base" * target/s390x/helper.c: s390_cpu_get_phys_addr_debug is a wrapper implementing "give actual page" in terms of cpu_get_phys_page_debug by the usual "mask and put the page offset back in" approach * physmem.c: cpu_memory_rw_debug rounds to a page base then adds back in the offset within the page afterwards Looking at this, my feeling is that we should: * define that these APIs must handle non-page-aligned addresses * fix the six or seven implementations that don't (either "properly" or by adding in the "mask to page base then OR back in the page offset" code to their get_phys_page_debug functions) * clean up the callsites to not work around the lack of handling of non-page-aligned addresses * add/extend the get_phys_page_debug API so that a caller can get the lg_page_size * fix the cpu_memory_rw_debug vs Arm MPU bug by having it use that lg_page_size to determine how much memory to access at once rather than assuming it can read to the end of the page Opinions ? thanks -- PMM
