[PATCH] linux-user/mmap.c: check range of mremap result in target address space
If mremap succeeds, an additional check is performed to ensure that the
new address range fits into the target address space. This check was
previously performed in host address space, with the upper bound fixed
to abi_ulong. This patch replaces the static check with a call to
`guest_range_valid`, performing the range check against the actual size
of the target address space. It also moves the corresponding block to
prevent it from being executed when the mapping itself fails.

Signed-off-by: Tobias Koch

---
 linux-user/mmap.c | 21 +++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index f261563420..101bd013a1 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -751,20 +751,23 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
         }
         if (prot == 0) {
             host_addr = mremap(g2h(old_addr), old_size, new_size, flags);
-            if (host_addr != MAP_FAILED && reserved_va && old_size > new_size) {
-                mmap_reserve(old_addr + old_size, old_size - new_size);
+
+            if (host_addr != MAP_FAILED) {
+                /* Check if address fits target address space */
+                if (!guest_range_valid(h2g(host_addr), new_size)) {
+                    /* Revert mremap() changes */
+                    host_addr = mremap(g2h(old_addr), new_size, old_size,
+                                       flags);
+                    errno = ENOMEM;
+                    host_addr = MAP_FAILED;
+                } else if (reserved_va && old_size > new_size) {
+                    mmap_reserve(old_addr + old_size, old_size - new_size);
+                }
             }
         } else {
             errno = ENOMEM;
             host_addr = MAP_FAILED;
         }
-        /* Check if address fits target address space */
-        if ((unsigned long)host_addr + new_size > (abi_ulong)-1) {
-            /* Revert mremap() changes */
-            host_addr = mremap(g2h(old_addr), new_size, old_size, flags);
-            errno = ENOMEM;
-            host_addr = MAP_FAILED;
-        }
     }
 
     if (host_addr == MAP_FAILED) {
-- 
2.20.1
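The bounds check the patch introduces can be illustrated in isolation. Below is a hedged sketch, not QEMU's actual `guest_range_valid` implementation: a hypothetical stand-in showing an overflow-safe range check against a guest address-space limit, which is what the old host-side comparison against `(abi_ulong)-1` could get wrong.

```c
#include <stdint.h>

/* Hypothetical stand-in for QEMU's guest_range_valid(): does
 * [start, start + len) fit into a guest address space whose highest
 * valid address is guest_addr_max?  Written so that the start + len
 * sum can never wrap around. */
static int range_valid(uint64_t start, uint64_t len, uint64_t guest_addr_max)
{
    if (len == 0) {
        return 1;
    }
    /* len - 1 keeps both comparisons free of overflow */
    return start <= guest_addr_max && len - 1 <= guest_addr_max - start;
}
```

For a 32-bit guest (`guest_addr_max` = 0xffffffff), a host pointer above 4 GiB fails this check even though it fits comfortably in a host `unsigned long`.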
Re: [PATCH] linux-user: mremap fails with EFAULT if address range overlaps with stack guard
Ping

On 6/16/20 12:10 AM, Tobias Koch wrote:
> Ok, so according to the manpage, mremap generates EFAULT when "the
> range old_address to old_address+old_size is an invalid virtual memory
> address for this process". This is what the kernel does for the stack
> guard. However, the mappings in setup_arg_pages() will only ever
> provoke an ENOMEM, because there is no artificial way to turn a page
> into an invalid address. So as long as target bits >= host bits, this
> works as expected and EFAULT is generated, because then mremap is
> basically passed through and the kernel responds directly. But when
> reserved_va is set, this needs to be special-cased to fake kernel
> behavior.
>
> I'm open to other suggestions. I also understand that the code
> duplication in elfload.c and mmap.c to handle this is undesirable, but
> the most viable alternative seems to be introducing more globals.
>
> On 6/15/20 11:28 PM, Tobias Koch wrote:
>> Hm, I see I need to have another look at this :)
>>
>> On 6/15/20 10:17 AM, Tobias Koch wrote:
>>> Hi Laurent,
>>>
>>> the code in musl libc probing the stack is in
>>>
>>> https://git.musl-libc.org/cgit/musl/plain/src/thread/pthread_getattr_np.c
>>>
>>> The setup in elfload.c does work, but only when reserved_va is not
>>> set. In that case, any stack guard violation is handled by the host
>>> kernel and thus results in the expected EFAULT.
>>>
>>> However, in case of e.g. a 32-bit target being emulated on a 64-bit
>>> host, reserved_va is set and the current code in mmap.c will only
>>> produce a more generic ENOMEM, deviating from the kernel's behavior.
>>>
>>> On 5/7/20 5:35 PM, Laurent Vivier wrote:
>>>> On 05/03/2020 at 22:05, Tobias Koch wrote:
>>>>> If the address range starting at old_address overlaps with the
>>>>> stack guard it is invalid and mremap must fail with EFAULT. The
>>>>> musl C library relies on this behavior to detect the stack size,
>>>>> which it does by doing consecutive mremaps until it hits the stack
>>>>> guard. Without this patch, software (such as the Ruby interpreter)
>>>>> that calls pthread_getattr_np under musl will crash on 32-bit
>>>>> targets emulated on a 64-bit host.
>>>> Could you share some pointers to the code that is doing this?
>>>>
>>>> We have already this kind of code in linux-user/elfload.c,
>>>> setup_arg_pages(): could you check why it doesn't work?
Re: [PATCH] linux-user: mremap fails with EFAULT if address range overlaps with stack guard
Ok, so according to the manpage, mremap generates EFAULT when "the range
old_address to old_address+old_size is an invalid virtual memory address
for this process". This is what the kernel does for the stack guard.
However, the mappings in setup_arg_pages() will only ever provoke an
ENOMEM, because there is no artificial way to turn a page into an
invalid address. So as long as target bits >= host bits, this works as
expected and EFAULT is generated, because then mremap is basically
passed through and the kernel responds directly. But when reserved_va is
set, this needs to be special-cased to fake kernel behavior.

I'm open to other suggestions. I also understand that the code
duplication in elfload.c and mmap.c to handle this is undesirable, but
the most viable alternative seems to be introducing more globals.

On 6/15/20 11:28 PM, Tobias Koch wrote:
> Hm, I see I need to have another look at this :)
>
> On 6/15/20 10:17 AM, Tobias Koch wrote:
>> Hi Laurent,
>>
>> the code in musl libc probing the stack is in
>>
>> https://git.musl-libc.org/cgit/musl/plain/src/thread/pthread_getattr_np.c
>>
>> The setup in elfload.c does work, but only when reserved_va is not
>> set. In that case, any stack guard violation is handled by the host
>> kernel and thus results in the expected EFAULT.
>>
>> However, in case of e.g. a 32-bit target being emulated on a 64-bit
>> host, reserved_va is set and the current code in mmap.c will only
>> produce a more generic ENOMEM, deviating from the kernel's behavior.
>>
>> On 5/7/20 5:35 PM, Laurent Vivier wrote:
>>> On 05/03/2020 at 22:05, Tobias Koch wrote:
>>>> If the address range starting at old_address overlaps with the
>>>> stack guard it is invalid and mremap must fail with EFAULT. The
>>>> musl C library relies on this behavior to detect the stack size,
>>>> which it does by doing consecutive mremaps until it hits the stack
>>>> guard. Without this patch, software (such as the Ruby interpreter)
>>>> that calls pthread_getattr_np under musl will crash on 32-bit
>>>> targets emulated on a 64-bit host.
>>> Could you share some pointers to the code that is doing this?
>>>
>>> We have already this kind of code in linux-user/elfload.c,
>>> setup_arg_pages(): could you check why it doesn't work?
Re: [PATCH] linux-user: mremap fails with EFAULT if address range overlaps with stack guard
Hm, I see I need to have another look at this :)

On 6/15/20 10:17 AM, Tobias Koch wrote:
> Hi Laurent,
>
> the code in musl libc probing the stack is in
>
> https://git.musl-libc.org/cgit/musl/plain/src/thread/pthread_getattr_np.c
>
> The setup in elfload.c does work, but only when reserved_va is not set.
> In that case, any stack guard violation is handled by the host kernel
> and thus results in the expected EFAULT.
>
> However, in case of e.g. a 32-bit target being emulated on a 64-bit
> host, reserved_va is set and the current code in mmap.c will only
> produce a more generic ENOMEM, deviating from the kernel's behavior.
>
> On 5/7/20 5:35 PM, Laurent Vivier wrote:
>> On 05/03/2020 at 22:05, Tobias Koch wrote:
>>> If the address range starting at old_address overlaps with the stack
>>> guard it is invalid and mremap must fail with EFAULT. The musl C
>>> library relies on this behavior to detect the stack size, which it
>>> does by doing consecutive mremaps until it hits the stack guard.
>>> Without this patch, software (such as the Ruby interpreter) that
>>> calls pthread_getattr_np under musl will crash on 32-bit targets
>>> emulated on a 64-bit host.
>> Could you share some pointers to the code that is doing this?
>>
>> We have already this kind of code in linux-user/elfload.c,
>> setup_arg_pages(): could you check why it doesn't work?
>>
>> Thanks,
>> Laurent
Re: [PATCH] linux-user: mremap fails with EFAULT if address range overlaps with stack guard
Hi Laurent,

the code in musl libc probing the stack is in

https://git.musl-libc.org/cgit/musl/plain/src/thread/pthread_getattr_np.c

The setup in elfload.c does work, but only when reserved_va is not set.
In that case, any stack guard violation is handled by the host kernel
and thus results in the expected EFAULT.

However, in case of e.g. a 32-bit target being emulated on a 64-bit
host, reserved_va is set and the current code in mmap.c will only
produce a more generic ENOMEM, deviating from the kernel's behavior.

On 5/7/20 5:35 PM, Laurent Vivier wrote:
> On 05/03/2020 at 22:05, Tobias Koch wrote:
>> If the address range starting at old_address overlaps with the stack
>> guard it is invalid and mremap must fail with EFAULT. The musl C
>> library relies on this behavior to detect the stack size, which it
>> does by doing consecutive mremaps until it hits the stack guard.
>> Without this patch, software (such as the Ruby interpreter) that
>> calls pthread_getattr_np under musl will crash on 32-bit targets
>> emulated on a 64-bit host.
> Could you share some pointers to the code that is doing this?
>
> We have already this kind of code in linux-user/elfload.c,
> setup_arg_pages(): could you check why it doesn't work?
>
> Thanks,
> Laurent
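The behavior under discussion is visible from the application side via pthread_getattr_np, which is exactly the call where musl does its mremap-based stack probing. A small, hedged sketch (assuming a Linux host; the helper name `current_stack_size` is mine, not part of any library):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>

/* Return the calling thread's stack size as reported by
 * pthread_getattr_np() (a GNU extension also provided by musl), or 0
 * on failure.  Under musl, this is where the stack is probed with
 * consecutive mremaps until the guard page produces EFAULT. */
static size_t current_stack_size(void)
{
    pthread_attr_t attr;
    void *addr;
    size_t size = 0;

    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        pthread_attr_getstack(&attr, &addr, &size);
        pthread_attr_destroy(&attr);
    }
    return size;
}
```

Under the buggy emulation described above, the probing never sees EFAULT and callers such as the Ruby interpreter crash instead of getting a stack size back.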
[PATCH] linux-user: mremap fails with EFAULT if address range overlaps with stack guard
If the address range starting at old_address overlaps with the stack
guard it is invalid and mremap must fail with EFAULT. The musl C library
relies on this behavior to detect the stack size, which it does by doing
consecutive mremaps until it hits the stack guard. Without this patch,
software (such as the Ruby interpreter) that calls pthread_getattr_np
under musl will crash on 32-bit targets emulated on a 64-bit host.

Signed-off-by: Tobias Koch

---
 linux-user/mmap.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 8685f02e7e..62cddbd072 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -651,6 +651,12 @@ int target_munmap(abi_ulong start, abi_ulong len)
     return ret;
 }
 
+#ifdef TARGET_HPPA
+#define STACK_GROWS_DOWN 0
+#else
+#define STACK_GROWS_DOWN 1
+#endif
+
 abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
                        abi_ulong new_size, unsigned long flags,
                        abi_ulong new_addr)
@@ -665,6 +671,26 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
         return -1;
     }
 
+    /* Check that we're not overlapping with the stack guard. */
+    if (reserved_va) {
+        abi_ulong guard_size, guard_start;
+
+        guard_size = TARGET_PAGE_SIZE < qemu_real_host_page_size ?
+                     qemu_real_host_page_size : TARGET_PAGE_SIZE;
+
+        if (STACK_GROWS_DOWN) {
+            guard_start = reserved_va - guest_stack_size - guard_size;
+        } else {
+            guard_start = reserved_va - guard_size;
+        }
+
+        if (guard_start < old_addr + old_size &&
+            old_addr < guard_start + guard_size) {
+            errno = EFAULT;
+            return -1;
+        }
+    }
+
     mmap_lock();
 
     if (flags & MREMAP_FIXED) {
-- 
2.20.1
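The collision test in the patch is a standard half-open interval overlap check. A hedged, self-contained sketch of just that predicate (the names are mine; QEMU inlines the comparison):

```c
#include <stdint.h>

/* Do the half-open ranges [a_start, a_start + a_size) and
 * [b_start, b_start + b_size) intersect?  This mirrors the patch's
 * comparison of [old_addr, old_addr + old_size) against the guard
 * range [guard_start, guard_start + guard_size). */
static int ranges_overlap(uint64_t a_start, uint64_t a_size,
                          uint64_t b_start, uint64_t b_size)
{
    return a_start < b_start + b_size && b_start < a_start + a_size;
}
```

Touching the guard range even partially is enough to trigger the EFAULT path; a range that ends exactly where the guard begins does not overlap, since the intervals are half-open.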
[PATCH v2] linux-user: do prlimit selectively
Analogous to what commit 5dfa88f7 did for setrlimit, this commit
selectively ignores limits for memory-related resources in prlimit64
calls. This is to prevent overly restrictive limits from causing QEMU
itself to malfunction.

Signed-off-by: Tobias Koch

---
 linux-user/syscall.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8d27d10807..4f2f9eb12b 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -11871,7 +11871,10 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
             struct target_rlimit64 *target_rnew, *target_rold;
             struct host_rlimit64 rnew, rold, *rnewp = 0;
             int resource = target_to_host_resource(arg2);
-            if (arg3) {
+
+            if (arg3 && (resource != RLIMIT_AS &&
+                         resource != RLIMIT_DATA &&
+                         resource != RLIMIT_STACK)) {
                 if (!lock_user_struct(VERIFY_READ, target_rnew, arg3, 1)) {
                     return -TARGET_EFAULT;
                 }
-- 
2.20.1
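The filter the patch applies can be factored into a tiny predicate. A hedged sketch (the helper name is mine; QEMU inlines the condition directly into the syscall handler):

```c
#include <sys/resource.h>

/* Decide whether a guest prlimit64() request should be forwarded to
 * the host.  Memory-related limits are swallowed, mirroring what
 * commit 5dfa88f7 did for setrlimit: applying them to the QEMU
 * process itself could starve the emulator of memory, since QEMU
 * needs far more address space than the 32-bit guest it emulates. */
static int should_forward_rlimit(int resource)
{
    return resource != RLIMIT_AS &&
           resource != RLIMIT_DATA &&
           resource != RLIMIT_STACK;
}
```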
[PATCH] linux-user: do prlimit selectively
Analogous to what commit 5dfa88f7 did for setrlimit, this commit
selectively ignores limits for memory-related resources in prlimit64
calls. This is to prevent overly restrictive limits from causing QEMU
itself to malfunction.

Signed-off-by: Tobias Koch

---
 linux-user/syscall.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8d27d10807..8554c77a38 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -11872,13 +11872,17 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
             struct host_rlimit64 rnew, rold, *rnewp = 0;
             int resource = target_to_host_resource(arg2);
             if (arg3) {
-                if (!lock_user_struct(VERIFY_READ, target_rnew, arg3, 1)) {
-                    return -TARGET_EFAULT;
+                if (resource != RLIMIT_AS &&
+                    resource != RLIMIT_DATA &&
+                    resource != RLIMIT_STACK) {
+                    if (!lock_user_struct(VERIFY_READ, target_rnew, arg3, 1)) {
+                        return -TARGET_EFAULT;
+                    }
+                    rnew.rlim_cur = tswap64(target_rnew->rlim_cur);
+                    rnew.rlim_max = tswap64(target_rnew->rlim_max);
+                    unlock_user_struct(target_rnew, arg3, 0);
+                    rnewp = &rnew;
                 }
-                rnew.rlim_cur = tswap64(target_rnew->rlim_cur);
-                rnew.rlim_max = tswap64(target_rnew->rlim_max);
-                unlock_user_struct(target_rnew, arg3, 0);
-                rnewp = &rnew;
             }
             ret = get_errno(sys_prlimit64(arg1, resource, rnewp,
                                           arg4 ? &rold : 0));
-- 
2.20.1
[Bug 1823790] Re: QEMU mishandling of SO_PEERSEC forces systemd into tight loop
I carried out the following test:

* fetched the QEMU coming with 18.04,
* added this patch,
* built an LXD container with arch arm64 and the patched
  qemu-aarch64-static inside,
* launched it on amd64

Previously various systemd services would not come up properly; now they
are running like a charm. The only grief I have is that network
configuration does not work, but that is due to

  # ip addr
  Unsupported setsockopt level=270 optname=11

which is a different story.

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1823790

Title:
  QEMU mishandling of SO_PEERSEC forces systemd into tight loop

Status in QEMU:
  Confirmed

Bug description:
  While building Debian images for embedded ARM target systems I
  detected that QEMU seems to force newer systemd daemons into a tight
  loop.

  My setup is the following:

  Host machine: Ubuntu 18.04, amd64
  LXD container: Debian Buster, arm64, systemd 241
  QEMU: qemu-aarch64-static, 4.0.0-rc2 (custom build) and 3.1.0 (Debian 1:3.1+dfsg-7)

  To easily reproduce the issue I have created the following repository:
  https://github.com/lueschem/edi-qemu

  The call where systemd gets looping is the following:
  2837 getsockopt(3,1,31,274891889456,274887218756,274888927920) = -1 errno=34 (Numerical result out of range)

  Furthermore I also verified that the issue is not related to LXD. The
  same behavior can be reproduced using systemd-nspawn.

  This issue reported against systemd seems to be related:
  https://github.com/systemd/systemd/issues/11557

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1823790/+subscriptions
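For context on the errno 34 (ERANGE) visible in the strace above: SO_PEERSEC returns a variable-length security label, and the kernel reports ERANGE when the caller's buffer is too small, writing the required size back into optlen so the caller can retry. A hedged sketch of that retry pattern (Linux-specific; illustrative, not QEMU code):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Fetch the peer's security label via SO_PEERSEC, growing the buffer
 * while the kernel reports ERANGE.  Returns a malloc'd buffer, or
 * NULL if the option is unsupported (e.g. no LSM loaded) or on any
 * other error. */
static char *get_peersec(int fd)
{
    socklen_t len = 64;
    char *buf = malloc(len);

    while (buf && getsockopt(fd, SOL_SOCKET, SO_PEERSEC, buf, &len) < 0) {
        char *tmp;

        if (errno != ERANGE) {
            free(buf);
            return NULL;
        }
        /* the kernel wrote the required size back into len */
        tmp = realloc(buf, len);
        if (!tmp) {
            free(buf);
            return NULL;
        }
        buf = tmp;
    }
    return buf;
}
```

systemd's loop in the bug report happens when the emulated getsockopt keeps answering ERANGE without ever updating the length, so this retry loop can never terminate.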
[Bug 1823790] Re: QEMU mishandling of SO_PEERSEC forces systemd into tight loop
Thanks, Laurent! I'll get back to you, asap.
[Bug 1823790] Re: QEMU mishandling of SO_PEERSEC forces systemd into tight loop
** Changed in: qemu
   Status: New => Confirmed
[Bug 1823790] Re: QEMU mishandling of SO_PEERSEC forces systemd into tight loop
I'm a bit surprised that this bug doesn't get more attention, as it
makes it very hard to run qemu-emulated containers of Bionic hosted on
Bionic. Running such containers is a common way to cross-compile
packages for foreign architectures in the absence of sufficiently
powerful target HW.

The documentation on SO_PEERSEC is indeed sparse, but I do want to
second Fritz in his approach. I don't see a reason why treating the
payload as incorrect and throwing it back at the application is better
than handling it and saying it is not implemented (which is the case).
Arguably, applications should be fixed to handle the error correctly,
but I'm afraid that is a can of worms.

I have encountered the same problem with systemd, apt and getent.

Would the maintainers be open to an SRU request on QEMU for this?