On Sun, Nov 03, 2013 at 09:26:06PM +0000, Peter Maydell wrote:
> On 3 November 2013 20:48, Marcel Apfelbaum <marce...@redhat.com> wrote:
> > The problem appears when a root memory region within an
> > address space with size < UINT64_MAX has overlapping children
> > with the same size. If the size of the root memory region is UINT64_MAX
> > everyting is ok.
> >
> > Solved the regression by making the system-memory region
> > of size UINT64_MAX instead of INT64_MAX.
> >
> > Signed-off-by: Marcel Apfelbaum <marce...@redhat.com>
> > ---
> > In the mean time I am investigating why the
> > root memory region has to be UINT64_MAX size in order
> > to have overlapping children
> >
> >      system_memory = g_malloc(sizeof(*system_memory));
> > -    memory_region_init(system_memory, NULL, "system", INT64_MAX);
> > +    memory_region_init(system_memory, NULL, "system", UINT64_MAX);
> >      address_space_init(&address_space_memory, system_memory, "memory");
>
> As you say above we should investigate why this caused a
> problem, but I was surprised the system memory space isn't
> already maximum size. It turns out that that change was
> introduced in commit 8417cebf in an attempt to avoid overflow
> issues by sticking to signed 64 bit arithmetic. This approach was
> subsequently ditched in favour of using proper 128 bit arithmetic
> in commit 08dafab4, but we never changed the init call for
> the system memory back to UINT64_MAX. So I think this is
> a good change in itself.
>
> -- PMM

I think I debugged it.

This patch seems to help simply because we only have sanity-checking
asserts in the subpage path. UINT64_MAX makes the region a whole
number of pages, so the checks are never hit.

I think I see what the issue is: exec.c assumes that
TARGET_PHYS_ADDR_SPACE_BITS is enough to render any section in
system memory. The number of page table levels is calculated from that:

    #define P_L2_LEVELS \
        (((TARGET_PHYS_ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / L2_BITS) + 1)

and any other bits are simply ignored:

    for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
        if (lp.ptr == PHYS_MAP_NODE_NIL) {
            return &sections[PHYS_SECTION_UNASSIGNED];
        }
        p = nodes[lp.ptr];
        lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
    }

The mask by L2_SIZE - 1 means that each round looks at L2_BITS bits,
and there are at most P_L2_LEVELS rounds, so index bits above
P_L2_LEVELS * L2_BITS never take part in the lookup.

This is very wrong and can break in a number of other ways. For
example, I think we will also hit this assert if we have a
non-aligned 64 bit BAR of a PCI device.

I think the fastest solution is to just limit the system memory size
to TARGET_PHYS_ADDR_SPACE_BITS. I sent a patch like this.

-- MST
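
To make the ignored-bits point concrete, here is a minimal standalone sketch, not the exec.c code itself. The values chosen for TARGET_PHYS_ADDR_SPACE_BITS, TARGET_PAGE_BITS and L2_BITS are assumptions picked for illustration, and level_slot() and same_walk() are hypothetical helpers. It reuses only the shift-and-mask from the loop above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only (assumptions); real targets define their own. */
#define TARGET_PHYS_ADDR_SPACE_BITS 36
#define TARGET_PAGE_BITS 12
#define L2_BITS 10
#define L2_SIZE (1 << L2_BITS)

#define P_L2_LEVELS \
    (((TARGET_PHYS_ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / L2_BITS) + 1)

/* Hypothetical helper: slot picked at one level, same shift-and-mask as the walk. */
static unsigned level_slot(uint64_t index, int level)
{
    assert(level >= 0 && level < P_L2_LEVELS);
    return (unsigned)((index >> (level * L2_BITS)) & (L2_SIZE - 1));
}

/*
 * Hypothetical helper: true if two page indexes pick the same slot at
 * every level, i.e. the walk has no way to tell them apart.
 */
static bool same_walk(uint64_t a, uint64_t b)
{
    for (int i = P_L2_LEVELS - 1; i >= 0; i--) {
        if (level_slot(a, i) != level_slot(b, i)) {
            return false;
        }
    }
    return true;
}
```

With these assumed values P_L2_LEVELS is 3, so the walk consumes 30 bits of page index. Under that assumption, same_walk(0x12345, 0x12345 | (1ULL << 30)) holds: the two indexes differ only in a bit the walk never looks at, so a section placed at the higher one would alias the lower one.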