Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
On 06/12/2012 09:08 AM, Richard Henderson wrote:
> I think this is one of those cases where the -B or -R options (or QEMU_GUEST_BASE and QEMU_RESERVED_VA env variables) are the best way forward for whatever cpu you're emulating. That or a change to the target's default ld script, not to link real executables quite so low in the address space.

Per Richard's recommendation I experimented with -R for my use cases. It seems to mostly work, but for ARM GNU/Linux there is an issue that makes it awkward to work with. In particular, this commit [1] added validation for the guest base as a way to ensure that the kernel-provided user mode helper functions on ARM can be mapped. The validation function is invoked by 'probe_guest_base', but also in main.c:3456 whenever -R or -B is used:

    if (reserved_va || have_guest_base) {
        if (!guest_validate_base(guest_base)) {
            fprintf(stderr, "Guest base/Reserved VA rejected by guest code\n");
            exit(1);
        }
    }

Thus we might be able to allocate the reserved VA region, but it might fail the validation and exit. I had this actually happen on many test cases when testing '-R 128M' with portions of the GCC testsuite.

To solve this issue I experimented with performing a similar probing in 'main' as in 'probe_guest_base' so that we can find a reserved VA region that also passes validation. If a region isn't found that can be validated, then QEMU gives up. Does this approach seem reasonable?

[1] http://git.qemu.org/?p=qemu.git;a=commit;h=97cc75606aef406e90a243cdb25347039003e7f0

--
Meador Inge
CodeSourcery / Mentor Embedded
http://www.mentor.com/embedded-software
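A minimal sketch of the "probe until validation passes" idea, assuming a Linux host. `reserve_validated` and the `base_validator` callback are hypothetical names; the callback stands in for `guest_validate_base()`. Because `MAP_FIXED` is not used, the kernel treats each hint as advisory, so the validator is applied to the address actually returned rather than to the hint.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for guest_validate_base(): nonzero if acceptable. */
typedef int (*base_validator)(unsigned long base);

/* Try up to max_tries hints spaced 'step' bytes apart. Return the first
 * PROT_NONE reservation whose base passes 'ok', or NULL if none does. */
static void *reserve_validated(size_t size, unsigned long hint,
                               unsigned long step, int max_tries,
                               base_validator ok)
{
    for (int i = 0; i < max_tries; i++, hint += step) {
        void *p = mmap((void *)hint, size, PROT_NONE,
                       MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) {
            return NULL;        /* no address space left: give up */
        }
        if (ok((unsigned long)p)) {
            return p;           /* caller now owns the reservation */
        }
        munmap(p, size);        /* rejected: release it, try the next hint */
    }
    return NULL;
}

/* Trivial validators for exercising the loop. */
static int accept_all(unsigned long base) { (void)base; return 1; }
static int reject_all(unsigned long base) { (void)base; return 0; }
```

How the hint advances (here a fixed `step`) is exactly the open question raised in the reply below; a page-sized step mirrors what `probe_guest_base` does, but other strategies are possible.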
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
On 06/27/2012 08:51 AM, Meador Inge wrote:
> To solve this issue I experimented with performing a similar probing in 'main' as in 'probe_guest_base' so that we can find a reserved VA region that also passes validation. If a region isn't found that can be validated, then QEMU gives up. Does this approach seem reasonable?

I guess so, depending on how you adjust the hint each time.

I do wonder if it wouldn't be better to rearrange things such that for 64-bit hosts and 32-bit guests we *always* reserve 4G so that there's zero possibility of the guest stomping on host memory. That would also solve your problem.

r~
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
On 27 June 2012 18:32, Richard Henderson <r...@twiddle.net> wrote:
> I do wonder if it wouldn't be better to rearrange things such that for 64-bit hosts and 32-bit guests we *always* reserve 4G so that there's zero possibility of the guest stomping on host memory. That would also solve your problem.

We already almost do that:

    #if (TARGET_LONG_BITS == 32) && (HOST_LONG_BITS == 64)
    /*
     * When running 32-on-64 we should make sure we can fit all of the possible
     * guest address space into a contiguous chunk of virtual host memory.
     *
     * This way we will never overlap with our own libraries or binaries or stack
     * or anything else that QEMU maps.
     */
    unsigned long reserved_va = 0xf7000000;
    #else
    unsigned long reserved_va;
    #endif
    #endif

The only reason this isn't asking for the full 4GB is that pesky ARM commpage, and (as you hint) the right way to fix this is to make the commpage cope OK with being inside the reserved region as well as outside it, and then we could make that reserved_va value actually be 4GB.

-- PMM
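To make the arithmetic concrete, here is a small sketch using the 32-on-64 `reserved_va` default of 0xf7000000. The helper names are hypothetical, and the commpage address assumed here is the start of the ARM high-vectors page (0xffff0000), which holds the kernel user helpers. It shows that the reservation stops 144 MiB short of 4 GiB and that the commpage falls in that gap, so it sits outside the reserved region.

```c
#include <stdint.h>

/* 32-on-64 reserved_va default. */
#define RESERVED_VA_32_ON_64 UINT64_C(0xf7000000)
/* Assumed: start of the ARM high-vectors page holding the user helpers. */
#define ARM_COMMPAGE_START   UINT64_C(0xffff0000)
/* Size of a full 32-bit guest address space. */
#define GUEST_SPACE_32       (UINT64_C(1) << 32)

/* Bytes of the 4 GiB guest space that lie above the reservation. */
static uint64_t unreserved_tail(uint64_t reserved_va)
{
    return GUEST_SPACE_32 - reserved_va;
}

/* Nonzero if [addr, addr + len) lies entirely inside [0, reserved_va). */
static int inside_reservation(uint64_t addr, uint64_t len, uint64_t reserved_va)
{
    return addr + len <= reserved_va;
}
```

With a full 4 GiB reservation, `inside_reservation(ARM_COMMPAGE_START, 0x1000, GUEST_SPACE_32)` becomes true, which is the case the commpage code would then have to handle.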
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
On 27.06.2012 19:32, Richard Henderson wrote:
> On 06/27/2012 08:51 AM, Meador Inge wrote:
>> To solve this issue I experimented with performing a similar probing in 'main' as in 'probe_guest_base' so that we can find a reserved VA region that also passes validation. If a region isn't found that can be validated, then QEMU gives up. Does this approach seem reasonable?
>
> I guess so, depending on how you adjust the hint each time.
>
> I do wonder if it wouldn't be better to rearrange things such that for 64-bit hosts and 32-bit guests we *always* reserve 4G so that there's zero possibility of the guest stomping on host memory. That would also solve your problem.

openSUSE uses a version patched so that IIUC 3G are reserved. Just today this failed on a system where swap got disabled and the mmap() thus failed.

Alex suggested an algorithm that starts at 3G (4G, whatever) and, when that fails, probes lower limits until it succeeds.

Either way, it's not a new problem, and each solution so far has had other drawbacks... cc'ing the relevant folks.

Andreas

--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
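A minimal sketch of the descending-size strategy Alex suggested, assuming a Linux host; `reserve_descending` is a hypothetical name, and a real caller would start at 3 or 4 GiB (which requires a 64-bit `size_t`). It tries the largest reservation first and steps down by a fixed amount after each failed `mmap()`, stopping before it would drop below a minimum.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Reserve the largest PROT_NONE region in [min, max], stepping down by
 * 'step' after each failure. Returns the size reserved (0 on failure)
 * and stores the reservation's address in *out (NULL on failure). */
static size_t reserve_descending(size_t max, size_t min, size_t step, void **out)
{
    *out = NULL;
    if (step == 0 || min == 0 || max < min) {
        return 0;
    }
    for (size_t size = max; ; size -= step) {
        void *p = mmap(NULL, size, PROT_NONE,
                       MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
        if (p != MAP_FAILED) {
            *out = p;
            return size;
        }
        if (size - min < step) {
            return 0;           /* the next step would go below min */
        }
    }
}
```

One drawback, in the spirit of the "other drawbacks" above: the guest then sees a reserved VA whose size depends on the host's state at startup, so the same binary may behave differently from run to run.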
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
On 06/27/2012 10:53 AM, Andreas Färber wrote:
> openSUSE uses a version patched so that IIUC 3G are reserved. Just today this failed on a system where swap got disabled and the mmap() thus failed.

Err... why? We map with MAP_NORESERVE, so swap shouldn't matter...

r~
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
On 27.06.2012 20:36, Richard Henderson wrote:
> On 06/27/2012 10:53 AM, Andreas Färber wrote:
>> openSUSE uses a version patched so that IIUC 3G are reserved. Just today this failed on a system where swap got disabled and the mmap() thus failed.
>
> Err... why? We map with MAP_NORESERVE, so swap shouldn't matter...

Wasn't my system... Adrian?

/-F
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
>> openSUSE uses a version patched so that IIUC 3G are reserved. Just today this failed on a system where swap got disabled and the mmap() thus failed.
>
> Err... why? We map with MAP_NORESERVE, so swap shouldn't matter...

I can't say if it's the same cause, but we fail with ulimit -v 4046848.

Incidentally, it seems strange that we only reserve 0xf7000000 bytes, not the full 4G.

Paul
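One plausible link between the two failures: `ulimit -v` sets `RLIMIT_AS` in KiB, and 4046848 KiB is exactly 0xf7000000 bytes, the 32-on-64 `reserved_va` default. `MAP_NORESERVE` only exempts a mapping from swap/commit accounting; the address-space limit still counts it, so the reservation alone would exhaust the limit before QEMU's own mappings are counted. This is an inference, not something confirmed in the thread. The sketch below (hypothetical helper name, Linux assumed) temporarily lowers the soft limit and shows a `PROT_NONE | MAP_NORESERVE` reservation being refused with `ENOMEM`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Temporarily lower the RLIMIT_AS soft limit to 'limit' bytes (clamped to
 * the hard limit), try to reserve 'request' bytes, then restore the old
 * soft limit. Returns 0 if mmap() succeeded, mmap()'s errno if it failed,
 * or -1 if the limit could not be read or changed. */
static int try_reserve_under_as_limit(size_t limit, size_t request)
{
    struct rlimit old, tmp;
    if (getrlimit(RLIMIT_AS, &old) != 0) {
        return -1;
    }
    tmp = old;
    tmp.rlim_cur = limit;
    if (old.rlim_max != RLIM_INFINITY && tmp.rlim_cur > old.rlim_max) {
        tmp.rlim_cur = old.rlim_max;
    }
    if (setrlimit(RLIMIT_AS, &tmp) != 0) {
        return -1;
    }
    void *p = mmap(NULL, request, PROT_NONE,
                   MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
    int err = (p == MAP_FAILED) ? errno : 0;
    if (p != MAP_FAILED) {
        munmap(p, request);
    }
    /* Raising the soft limit back up to its old value is always allowed. */
    setrlimit(RLIMIT_AS, &old);
    return err;
}
```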
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
On 28.06.2012, at 02:06, Paul Brook wrote:
>>> openSUSE uses a version patched so that IIUC 3G are reserved. Just today this failed on a system where swap got disabled and the mmap() thus failed.
>>
>> Err... why? We map with MAP_NORESERVE, so swap shouldn't matter...
>
> I can't say if it's the same cause, but we fail with ulimit -v 4046848.
>
> Incidentally, it seems a strange that we only reserve 0xf7000000 bytes, not the full 4G.

Uh, I think that was because of the vdso shared page that is allocated on top of -R.

Either way, this whole approach only works for 32-on-64. For 64-on-64, we can't reserve enough virtual memory on the host to satisfy the guest process for all archs.

Alex
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
On 06/27/2012 12:32 PM, Richard Henderson wrote:
> On 06/27/2012 08:51 AM, Meador Inge wrote:
>> To solve this issue I experimented with performing a similar probing in 'main' as in 'probe_guest_base' so that we can find a reserved VA region that also passes validation. If a region isn't found that can be validated, then QEMU gives up. Does this approach seem reasonable?
>
> I guess so, depending on how you adjust the hint each time.

What I am currently experimenting with is essentially the same as what is in 'probe_guest_base'. So something like this (not an actual patch submission, just listing it here for discussion):

Index: linux-user/main.c
===
--- linux-user/main.c	(revision 376549)
+++ linux-user/main.c	(working copy)
@@ -3486,35 +3486,53 @@ int main(int argc, char **argv, char **e
     guest_base = HOST_PAGE_ALIGN(guest_base);
 
     if (reserved_va) {
-        void *p;
+        unsigned long host_start, real_start, first_start, host_size;
         int flags;
 
         flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE;
         if (have_guest_base) {
             flags |= MAP_FIXED;
         }
-        p = mmap((void *)guest_base, reserved_va, PROT_NONE, flags, -1, 0);
-        if (p == MAP_FAILED) {
-            fprintf(stderr, "Unable to reserve guest address space\n");
-            exit(1);
-        }
-        guest_base = (unsigned long)p;
-        /* Make sure the address is properly aligned. */
-        if (guest_base & ~qemu_host_page_mask) {
-            munmap(p, reserved_va);
-            p = mmap((void *)guest_base, reserved_va + qemu_host_page_size,
-                     PROT_NONE, flags, -1, 0);
-            if (p == MAP_FAILED) {
+
+        first_start = host_start = HOST_PAGE_ALIGN(guest_base);
+        while (1) {
+            host_size = reserved_va;
+            real_start = (unsigned long) mmap((void *)host_start, host_size,
+                                              PROT_NONE, flags, -1, 0);
+            if (real_start == (unsigned long)-1) {
+                fprintf(stderr, "Unable to reserve guest address space\n");
+                exit(1);
+            }
+            guest_base = host_start;
+            /* Make sure the address is properly aligned. */
+            if (guest_base & ~qemu_host_page_mask) {
+                munmap((void*)real_start, host_size);
+                host_size += qemu_host_page_size;
+                real_start = (unsigned long) mmap((void *)guest_base,
+                                                  host_size,
+                                                  PROT_NONE, flags, -1, 0);
+                if (real_start == (unsigned long)-1) {
+                    fprintf(stderr, "Unable to reserve guest address space\n");
+                    exit(1);
+                }
+                guest_base = HOST_PAGE_ALIGN(real_start);
+            }
+
+            if (guest_validate_base(guest_base))
+                break;
+
+            munmap((void *)real_start, host_size);
+            host_start += qemu_host_page_size;
+            if (host_start == first_start) {
                 fprintf(stderr, "Unable to reserve guest address space\n");
                 exit(1);
             }
-            guest_base = HOST_PAGE_ALIGN((unsigned long)p);
         }
         qemu_log("Reserved 0x%lx bytes of guest address space\n", reserved_va);
         mmap_next_start = reserved_va;
     }
 
-    if (reserved_va || have_guest_base) {
+    if (have_guest_base) {
         if (!guest_validate_base(guest_base)) {
             fprintf(stderr, "Guest base/Reserved VA rejected by guest code\n");
             exit(1);

> I do wonder if it wouldn't be better to rearrange things such that for 64-bit hosts and 32-bit guests we *always* reserve 4G so that there's zero possibility of the guest stomping on host memory. That would also solve your problem.

I am seeing problems with 32-on-32 where the ARM commpage check wraps around (incidentally, I also ran into problems with -B because for some values of guest_base it is easy for both guest_base >= min_mmap_addr and guest_base + kernel_helper_addr < min_mmap_addr to hold).

--
Meador Inge
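The wrap-around described above can be shown with plain 32-bit unsigned arithmetic, which C defines to wrap modulo 2^32. This is a sketch with hypothetical function names; the helper address used in the checks below is assumed to be the ARM high-vectors page at 0xffff0000.

```c
#include <stdint.h>

/* Nonzero if guest_base + helper_addr overflows a 32-bit host address. */
static int helper_addr_wraps(uint32_t guest_base, uint32_t helper_addr)
{
    return (uint32_t)(guest_base + helper_addr) < guest_base;
}

/* The combination described above: the base itself is at or above the
 * minimum mmap address, but the translated helper address wraps to a
 * value below it. */
static int helper_lands_below_min(uint32_t guest_base, uint32_t helper_addr,
                                  uint32_t min_mmap_addr)
{
    return guest_base >= min_mmap_addr &&
           (uint32_t)(guest_base + helper_addr) < min_mmap_addr;
}
```

For example, with guest_base = 0x10000 and a helper at 0xffff0000, the sum is exactly 2^32, which truncates to 0 on a 32-bit host, below any non-zero minimum mmap address.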
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
> On 28.06.2012, at 02:06, Paul Brook wrote:
>>>> openSUSE uses a version patched so that IIUC 3G are reserved. Just today this failed on a system where swap got disabled and the mmap() thus failed.
>>>
>>> Err... why? We map with MAP_NORESERVE, so swap shouldn't matter...
>>
>> I can't say if it's the same cause, but we fail with ulimit -v 4046848.
>>
>> Incidentally, it seems a strange that we only reserve 0xf7000000 bytes, not the full 4G.
>
> Uh, I think that was because of the vdso shared page that is allocated on top of -R.

That can't be right. The whole point of -R is that it defines all the guest-accessible virtual address space. The surrounding space is liable to be used by something else, and we must not make any assumptions about it.

Further inspection shows that guest_validate_base contains some extremely bogus code. If the guest needs something at the top of its address space, then we need to offset address zero within the block and ensure accesses wrap appropriately.

Paul
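One possible reading of "offset address zero within the block and ensure accesses wrap", sketched under the assumption of a 64-bit host that reserves a full 4 GiB block for a 32-bit guest. `guest_to_host_wrapped` and `zero_offset` are hypothetical names, not QEMU's actual translation macros. Guest address 0 sits `zero_offset` bytes into the block, and the 32-bit sum wraps, so addresses near the top of the guest space (such as a commpage) land at the start of the block instead of past its end.

```c
#include <stdint.h>

/* Map a 32-bit guest address into a 4 GiB host block starting at
 * block_base. Guest address 0 lives 'zero_offset' bytes into the block;
 * the 32-bit addition wraps modulo 2^32, so the result always stays
 * within [block_base, block_base + 4 GiB). */
static uint64_t guest_to_host_wrapped(uint64_t block_base, uint32_t zero_offset,
                                      uint32_t guest_addr)
{
    return block_base + (uint32_t)(zero_offset + guest_addr);
}
```

With zero_offset = 0x10000, guest 0xffff0000 maps to the first byte of the block and guest 0 maps to block_base + 0x10000, so the commpage fits inside the reservation without extending it.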
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
On 06/27/2012 07:47 PM, Paul Brook wrote:
>> On 28.06.2012, at 02:06, Paul Brook wrote:
>>>>> openSUSE uses a version patched so that IIUC 3G are reserved. Just today this failed on a system where swap got disabled and the mmap() thus failed.
>>>>
>>>> Err... why? We map with MAP_NORESERVE, so swap shouldn't matter...
>>>
>>> I can't say if it's the same cause, but we fail with ulimit -v 4046848.
>>>
>>> Incidentally, it seems a strange that we only reserve 0xf7000000 bytes, not the full 4G.
>>
>> Uh, I think that was because of the vdso shared page that is allocated on top of -R.
>
> That can't be right. The whole point of -R is that it defines all the guest accessible virtual address space. The surrounding space is liable to be used by something else, and we must not make any assumptions about it.
>
> Further inspection shows that guest_validate_base contains some extremely bogus code. If the guest needs something at the top of its address space then we need to offset address zero within the block, and ensure accesses wrap appropriately.

'guest_validate_base' is currently called for three reasons:

  (1) in main.c when using -B,
  (2) in main.c when using -R, after mapping the reserved VA region, and
  (3) when probing for a guest base in probe_guest_base.

For case (1) I suppose things are pretty much the same -- we just need to map the extra region when needed (e.g. for the ARM kernel helpers).

For case (2) maybe we can do a probe similar to what I mentioned here [1], but taking into account what you stated above and ensuring that the probe finds a single region for the requested VA region size and any needed extra stuff.

Case (3) is mostly the same as (2), but we are probing for a guest base with a region size deduced from looking at the image we are loading. I suppose it is still OK to map two regions here. The single region only applies to -R?

Thoughts?

[1] http://lists.nongnu.org/archive/html/qemu-devel/2012-06/msg04589.html

--
Meador Inge
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
> 'guest_validate_base' is currently called for three reasons: (1) in main.c when using -B, (2) in main.c when using -R after mapping the reserved va region, and (3) and when probing for a guest base in probe_guest_base.
>
> For case (1) I suppose things are pretty much the same -- we just need to map the extra region when needed (e.g. for the ARM kernel helpers).

Yes.

> For case (2) maybe we can do a probing similar to what I mentioned here [1], but taking into account what you stated above and ensuring that the probing finds a single region for the request va region size and any needed extra stuff.

Something like that, yes. I suspect there are better ways to implement it, though.

In principle your patch is making (2) a variant of (3). Instead of probing for the segments covered by the image, we probe for the reserved regions (e.g. for ARM [0-reserved_va, 0x - 0x]). A good implementation should automagically DTRT for both 32-bit and 64-bit hosts.

> Case (3) is mostly the same as (2) but we are probing for a guest base with a region size deduced from looking at the image we are loading. I suppose it is still OK to map two regions here. The single region only applies to -R?

I'd say (3) is more similar to (1). There's no fundamental reason why -R has to allocate a single block. In all cases we should be checking the same thing: are the addresses we need available on the host?

Having different code paths calling guest_validate_base, etc. for different reasons makes me think we're doing it wrong :-)

Paul
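A sketch of the single question Paul proposes every path should ask ("are the addresses we need available on the host?"), written as pure range arithmetic so that -B, -R and image probing could share one check. All names are hypothetical. The list of busy host ranges is an input; a real implementation might derive it from /proc/self/maps, or simply attempt the mappings and see whether they land where requested.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end). */
typedef struct {
    uint64_t start, end;
} addr_range;

static int ranges_overlap(addr_range a, addr_range b)
{
    return a.start < b.end && b.start < a.end;
}

/* Nonzero if every needed guest range, shifted by 'base', avoids every
 * busy host range. */
static int base_is_usable(uint64_t base, const addr_range *need, size_t n_need,
                          const addr_range *busy, size_t n_busy)
{
    for (size_t i = 0; i < n_need; i++) {
        addr_range h = { base + need[i].start, base + need[i].end };
        for (size_t j = 0; j < n_busy; j++) {
            if (ranges_overlap(h, busy[j])) {
                return 0;
            }
        }
    }
    return 1;
}

/* Scan bases lo, lo + step, ... up to hi; store the first usable one. */
static int find_guest_base(uint64_t lo, uint64_t hi, uint64_t step,
                           const addr_range *need, size_t n_need,
                           const addr_range *busy, size_t n_busy,
                           uint64_t *out)
{
    if (step == 0) {
        return 0;
    }
    for (uint64_t b = lo; b <= hi; b += step) {
        if (base_is_usable(b, need, n_need, busy, n_busy)) {
            *out = b;
            return 1;
        }
        if (hi - b < step) {
            break;              /* the next candidate would pass hi */
        }
    }
    return 0;
}
```

The needed ranges need not form one block. For an ARM guest, they could be the low reserved region plus the page at the top of the guest space, which matches the observation that -R has no fundamental reason to allocate a single block.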
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
On 2012-06-07 13:59, Meador Inge wrote:
>      load_addr = loaddr;
>      if (ehdr->e_type == ET_DYN) {
> +        if (loaddr < mmap_min_addr)
> +            probe_guest_base(image_name, loaddr, hiaddr);

This doesn't make any sense. loaddr is almost certainly 0, unless you've pre-linked the ld.so image. But the next statement is letting the system pick the address at which the image will be loaded.

What you're actually wanting is to probe the address ranges of the real program, which, since this is essentially a program running a program, is not visible to us at all.

I think this is one of those cases where the -B or -R options (or QEMU_GUEST_BASE and QEMU_RESERVED_VA env variables) are the best way forward for whatever cpu you're emulating. That or a change to the target's default ld script, not to link real executables quite so low in the address space.

r~
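To see why loaddr is almost always 0 for ld.so, here is a sketch of how a loader can derive the image bounds from the program headers: the lowest `p_vaddr` of any `PT_LOAD` segment gives the low address, and the highest `p_vaddr + p_memsz` gives the exclusive high address. For an unprelinked `ET_DYN` object, the first segment is normally linked at 0. The function name is hypothetical, and this is not QEMU's exact code (`load_elf_image` has its own handling of the upper bound).

```c
#include <elf.h>
#include <stddef.h>

/* Compute the lowest address and the highest (exclusive) address spanned
 * by the PT_LOAD segments of a 32-bit ELF image. Returns 0 if there are
 * no PT_LOAD segments. */
static int elf32_load_bounds(const Elf32_Phdr *ph, size_t n,
                             Elf32_Addr *loaddr, Elf32_Addr *hiaddr)
{
    int found = 0;
    *loaddr = (Elf32_Addr)-1;
    *hiaddr = 0;
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD) {
            continue;           /* only loadable segments occupy memory */
        }
        Elf32_Addr end = ph[i].p_vaddr + ph[i].p_memsz;
        if (ph[i].p_vaddr < *loaddr) {
            *loaddr = ph[i].p_vaddr;
        }
        if (end > *hiaddr) {
            *hiaddr = end;
        }
        found = 1;
    }
    return found;
}
```

Since these bounds describe only the interpreter, they say nothing about where the program that ld.so later maps will want to live, which is Richard's point.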
Re: [Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
On 06/12/2012 09:08 AM, Richard Henderson wrote:
> On 2012-06-07 13:59, Meador Inge wrote:
>>      load_addr = loaddr;
>>      if (ehdr->e_type == ET_DYN) {
>> +        if (loaddr < mmap_min_addr)
>> +            probe_guest_base(image_name, loaddr, hiaddr);
>
> This doesn't make any sense. loaddr is almost certainly 0, unless you've pre-linked the ld.so image. But the next statement is letting the system pick the address at which the image will be loaded.

It usually is. I just want guest_base to be computed to something that will work for cases where a fixed-address image is loaded later (at which point it is too late to compute guest_base). Always probing is one way I found to do that, but as I originally said, I don't know this code very well, so maybe that is not a good method.

> I think this is one of those cases where the -B or -R options (or QEMU_GUEST_BASE and QEMU_RESERVED_VA env variables) are the best way forward for whatever cpu you're emulating. That or a change to the target's default ld script, not to link real executables quite so low in the address space.

Hmmm, OK. I was really hoping to have something more automatic. Perhaps I will have to use the options.

Thanks for the review.

--
Meador Inge
[Qemu-devel] [RFC PATCH 1/1] linux-user: Probe the guest base for shared objects when needed
In some cases when running a shared library directly from QEMU (e.g. ld.so), the guest base should still be probed so that any images loaded later at fixed addresses by the target code can still be mapped.

Signed-off-by: Meador Inge <mead...@codesourcery.com>
---
 linux-user/elfload.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index f3b1552..c71c287 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1443,6 +1443,7 @@ static void probe_guest_base(const char *image_name,
                 goto exit_errmsg;
             }
         }
+        have_guest_base = 1;
         qemu_log("Relocating guest address space from 0x"
                  TARGET_ABI_FMT_lx " to 0x%lx\n",
                  loaddr, real_start);
@@ -1528,6 +1529,8 @@ static void load_elf_image(const char *image_name, int image_fd,
 
     load_addr = loaddr;
     if (ehdr->e_type == ET_DYN) {
+        if (loaddr < mmap_min_addr)
+            probe_guest_base(image_name, loaddr, hiaddr);
         /* The image indicates that it can be loaded anywhere. Find a
            location that can hold the memory space required. If the
            image is pre-linked, LOADDR will be non-zero. Since we do
-- 
1.7.7.6
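The patch's trigger condition, restated as a small standalone sketch: probe only when the image is position-independent (`ET_DYN`) and its link-time base lies below the lowest address the host lets an unprivileged process map. On Linux that minimum is exposed as /proc/sys/vm/mmap_min_addr. The function names and the fallback value are assumptions for illustration, not QEMU code.

```c
#include <elf.h>
#include <stdio.h>

/* Read the host's minimum mmap address from 'path' (on Linux,
 * /proc/sys/vm/mmap_min_addr); return 'fallback' if the file is
 * missing or does not parse. */
static unsigned long read_mmap_min_addr(const char *path, unsigned long fallback)
{
    unsigned long v;
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return fallback;
    }
    if (fscanf(f, "%lu", &v) != 1) {
        v = fallback;
    }
    fclose(f);
    return v;
}

/* Condition added by the patch: an ET_DYN image whose link-time base is
 * below the host's minimum mmap address. */
static int should_probe_guest_base(unsigned e_type, unsigned long loaddr,
                                   unsigned long mmap_min)
{
    return e_type == ET_DYN && loaddr < mmap_min;
}
```

Because an unprelinked ld.so has loaddr = 0, this condition holds for essentially every such interpreter, which is why the review above questions whether it targets the right addresses.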