[PATCH] linux-user/mmap.c: check range of mremap result in target address space

2020-10-28 Thread Tobias Koch
If mremap succeeds, an additional check is performed to ensure that the
new address range fits into the target address space. This check was
previously performed in host address space, with the upper bound fixed to
the maximum value of abi_ulong.

This patch replaces the static check with a call to `guest_range_valid`,
performing the range check against the actual size of the target address
space. It also moves the corresponding block to prevent it from being
called incorrectly when the mapping itself fails.

Signed-off-by: Tobias Koch 
---
 linux-user/mmap.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index f261563420..101bd013a1 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -751,20 +751,23 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
         }
         if (prot == 0) {
             host_addr = mremap(g2h(old_addr), old_size, new_size, flags);
-            if (host_addr != MAP_FAILED && reserved_va && old_size > new_size) {
-                mmap_reserve(old_addr + old_size, old_size - new_size);
+
+            if (host_addr != MAP_FAILED) {
+                /* Check if address fits target address space */
+                if (!guest_range_valid(h2g(host_addr), new_size)) {
+                    /* Revert mremap() changes */
+                    host_addr = mremap(g2h(old_addr), new_size, old_size,
+                                       flags);
+                    errno = ENOMEM;
+                    host_addr = MAP_FAILED;
+                } else if (reserved_va && old_size > new_size) {
+                    mmap_reserve(old_addr + old_size, old_size - new_size);
+                }
             }
         } else {
             errno = ENOMEM;
             host_addr = MAP_FAILED;
         }
-        /* Check if address fits target address space */
-        if ((unsigned long)host_addr + new_size > (abi_ulong)-1) {
-            /* Revert mremap() changes */
-            host_addr = mremap(g2h(old_addr), new_size, old_size, flags);
-            errno = ENOMEM;
-            host_addr = MAP_FAILED;
-        }
     }
 
     if (host_addr == MAP_FAILED) {
-- 
2.20.1




Re: [PATCH] linux-user: mremap fails with EFAULT if address range overlaps with stack guard

2020-07-04 Thread Tobias Koch
Ping

On 6/16/20 12:10 AM, Tobias Koch wrote:
> Ok, so according to the manpage, mremap generates EFAULT when "the range 
> old_address to old_address+old_size is an
> invalid virtual memory address for this process". This is what the kernel 
> does for the stack guard. However, the
> mappings in setup_arg_pages() will only ever provoke an ENOMEM, because there 
> is no artificial way to turn a page into an
> invalid address. So as long as target bits >= host bits, this works as 
> expected and EFAULT is generated, because then
> mremap is basically passed through and the kernel responds directly. But when 
> reserved_va is set, this needs to be
> special-cased to fake kernel behavior.
> 
> I'm open to other suggestions. I also understand that the code duplication in 
> elfload.c and mmap.c to handle this is
> undesirable, but the most viable alternative seems to be introducing more 
> globals.
> 
> On 6/15/20 11:28 PM, Tobias Koch wrote:
>> Hm, I see I need to have another look at this :)
>>
>> On 6/15/20 10:17 AM, Tobias Koch wrote:
>>> Hi Laurent,
>>>
>>> the code in musl libc probing the stack is in
>>>
>>>     
>>> https://git.musl-libc.org/cgit/musl/plain/src/thread/pthread_getattr_np.c
>>>
>>> The setup in elfload.c does work, but only when reserved_va is not set. In 
>>> that case, any stack guard violation is
>>> handled by the host kernel and thus results in the expected EFAULT.
>>>
>>> However, in case of e.g. a 32bit target being emulated on a 64bit host, 
>>> reserved_va is set and the current code in
>>> mmap.c will only produce a more generic ENOMEM, deviating from the kernel's 
>>> behavior.
>>>
>>>
>>> On 5/7/20 5:35 PM, Laurent Vivier wrote:
>>>> Le 05/03/2020 à 22:05, Tobias Koch a écrit :
>>>>> If the address range starting at old_address overlaps with the stack 
>>>>> guard it
>>>>> is invalid and mremap must fail with EFAULT. The musl c library relies on 
>>>>> this
>>>>> behavior to detect the stack size, which it does by doing consecutive 
>>>>> mremaps
>>>>> until it hits the stack guard. Without this patch, software (such as the 
>>>>> Ruby
>>>>> interpreter) that calls pthread_getattr_np under musl will crash on 32 bit
>>>>> targets emulated on a 64 bit host.
>>>> Could you share some pointers to the code that is doing this?
>>>>
>>>> We have already this kind of code in linux-user/elfload.c,
>>>> setup_arg_pages(): could you check why it doesn't work?
> 





[PATCH] linux-user: mremap fails with EFAULT if address range overlaps with stack guard

2020-03-05 Thread Tobias Koch
If the address range starting at old_address overlaps with the stack guard it
is invalid and mremap must fail with EFAULT. The musl c library relies on this
behavior to detect the stack size, which it does by doing consecutive mremaps
until it hits the stack guard. Without this patch, software (such as the Ruby
interpreter) that calls pthread_getattr_np under musl will crash on 32 bit
targets emulated on a 64 bit host.

Signed-off-by: Tobias Koch 
---
 linux-user/mmap.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 8685f02e7e..62cddbd072 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -651,6 +651,12 @@ int target_munmap(abi_ulong start, abi_ulong len)
     return ret;
 }
 
+#ifdef TARGET_HPPA
+#define STACK_GROWS_DOWN 0
+#else
+#define STACK_GROWS_DOWN 1
+#endif
+
 abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
                        abi_ulong new_size, unsigned long flags,
                        abi_ulong new_addr)
@@ -665,6 +671,26 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
         return -1;
     }
 
+    /* Check that we're not overlapping with the stack guard. */
+    if (reserved_va) {
+        abi_ulong guard_size, guard_start;
+
+        guard_size = TARGET_PAGE_SIZE < qemu_real_host_page_size ?
+                     qemu_real_host_page_size : TARGET_PAGE_SIZE;
+
+        if (STACK_GROWS_DOWN) {
+            guard_start = reserved_va - guest_stack_size - guard_size;
+        } else {
+            guard_start = reserved_va - guard_size;
+        }
+
+        if (guard_start < old_addr + old_size &&
+            old_addr < guard_start + guard_size) {
+            errno = EFAULT;
+            return -1;
+        }
+    }
+
     mmap_lock();
 
     if (flags & MREMAP_FIXED) {
-- 
2.20.1




[PATCH v2] linux-user: do prlimit selectively

2020-03-05 Thread Tobias Koch
Analogous to what commit 5dfa88f7 did for setrlimit, this commit
selectively ignores limits for memory-related resources in prlimit64
calls. This is to prevent too restrictive limits from causing QEMU
itself to malfunction.

Signed-off-by: Tobias Koch 
---
 linux-user/syscall.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8d27d10807..4f2f9eb12b 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -11871,7 +11871,10 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
         struct target_rlimit64 *target_rnew, *target_rold;
         struct host_rlimit64 rnew, rold, *rnewp = 0;
         int resource = target_to_host_resource(arg2);
-        if (arg3) {
+
+        if (arg3 && (resource != RLIMIT_AS &&
+                     resource != RLIMIT_DATA &&
+                     resource != RLIMIT_STACK)) {
             if (!lock_user_struct(VERIFY_READ, target_rnew, arg3, 1)) {
                 return -TARGET_EFAULT;
             }
-- 
2.20.1




[PATCH] linux-user: do prlimit selectively

2020-03-04 Thread Tobias Koch
Analogous to what commit 5dfa88f7 did for setrlimit, this commit
selectively ignores limits for memory-related resources in prlimit64
calls. This is to prevent too restrictive limits from causing QEMU
itself to malfunction.

Signed-off-by: Tobias Koch 
---
 linux-user/syscall.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8d27d10807..8554c77a38 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -11872,13 +11872,17 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
         struct host_rlimit64 rnew, rold, *rnewp = 0;
         int resource = target_to_host_resource(arg2);
         if (arg3) {
-            if (!lock_user_struct(VERIFY_READ, target_rnew, arg3, 1)) {
-                return -TARGET_EFAULT;
+            if (resource != RLIMIT_AS &&
+                resource != RLIMIT_DATA &&
+                resource != RLIMIT_STACK) {
+                if (!lock_user_struct(VERIFY_READ, target_rnew, arg3, 1)) {
+                    return -TARGET_EFAULT;
+                }
+                rnew.rlim_cur = tswap64(target_rnew->rlim_cur);
+                rnew.rlim_max = tswap64(target_rnew->rlim_max);
+                unlock_user_struct(target_rnew, arg3, 0);
+                rnewp = &rnew;
             }
-            rnew.rlim_cur = tswap64(target_rnew->rlim_cur);
-            rnew.rlim_max = tswap64(target_rnew->rlim_max);
-            unlock_user_struct(target_rnew, arg3, 0);
-            rnewp = &rnew;
         }
 
         ret = get_errno(sys_prlimit64(arg1, resource, rnewp, arg4 ? &rold : 0));
-- 
2.20.1




[Bug 1823790] Re: QEMU mishandling of SO_PEERSEC forces systemd into tight loop

2020-02-04 Thread Tobias Koch
I carried out the following test:

* fetched the QEMU coming with 18.04,
* added this patch,
* built an LXD container with arch arm64 and the patched qemu-aarch64-static 
inside,
* launched it on amd64

Previously, various systemd services would not come up properly; now they
are running like a charm. The only grief I have is that network
configuration does not work, but that is due to

# ip addr
Unsupported setsockopt level=270 optname=11

which is a different story.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1823790

Title:
  QEMU mishandling of SO_PEERSEC forces systemd into tight loop

Status in QEMU:
  Confirmed

Bug description:
  While building Debian images for embedded ARM target systems I
  detected that QEMU seems to force newer systemd daemons into a tight
  loop.

  My setup is the following:

  Host machine: Ubuntu 18.04, amd64
  LXD container: Debian Buster, arm64, systemd 241
  QEMU: qemu-aarch64-static, 4.0.0-rc2 (custom build) and 3.1.0 (Debian 
1:3.1+dfsg-7)

  To easily reproduce the issue I have created the following repository:
  https://github.com/lueschem/edi-qemu

  The call where systemd gets looping is the following:
  2837 getsockopt(3,1,31,274891889456,274887218756,274888927920) = -1 errno=34 
(Numerical result out of range)

  Furthermore I also verified that the issue is not related to LXD.
  The same behavior can be reproduced using systemd-nspawn.

  This issue reported against systemd seems to be related:
  https://github.com/systemd/systemd/issues/11557

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1823790/+subscriptions



[Bug 1823790] Re: QEMU mishandling of SO_PEERSEC forces systemd into tight loop

2020-02-03 Thread Tobias Koch
Thanks, Laurent! I'll get back to you, asap.




[Bug 1823790] Re: QEMU mishandling of SO_PEERSEC forces systemd into tight loop

2020-01-28 Thread Tobias Koch
** Changed in: qemu
   Status: New => Confirmed




[Bug 1823790] Re: QEMU mishandling of SO_PEERSEC forces systemd into tight loop

2020-01-28 Thread Tobias Koch
I'm a bit surprised that this bug doesn't get more attention, as it
makes it very hard to run qemu-emulated containers of Bionic hosted on
Bionic. Running such containers is a common way to cross-compile
packages for foreign architectures in the absence of sufficiently
powerful target HW.

The documentation on SO_PEERSEC is indeed sparse, but I do want to
second Fritz in his approach. I don't see a reason why treating the
payload as incorrect and throwing it back at the application is better
than handling it and saying it is not implemented (which is the case).

Arguably, applications should be fixed to handle the error correctly,
but I'm afraid that is a can of worms. I have encountered the same
problem with systemd, apt and getent. Would the maintainers be open to
an SRU request on QEMU for this?
