Re: [PATCH 0/5] linux-user: Support extended clone(CLONE_VM)

2020-06-16 Thread Josh Kunz
On Tue, Jun 16, 2020 at 9:32 AM Peter Maydell 
wrote:
>
> On Tue, 16 Jun 2020 at 17:08, Alex Bennée  wrote:
> > Apart from "a more perfect emulation" is there a particular use case
> > served by the extra functionality? AIUI up until this point we've
> > basically supported glibc's use of clone() which has generally been
> > enough. I'm assuming you've come across stuff that needs this more fine
> > grained support?
>
> There are definitely cases we don't handle that cause problems;
> notably https://bugs.launchpad.net/qemu/+bug/1673976 reports
> that newer glibc implement posix_spawn() using CLONE_VM|CLONE_VFORK
> which we don't handle correctly (though it is now just "we don't
> report failures correctly" rather than "guest asserts").

This originally came up for us at Google in multi-threaded guest binaries
using TCMalloc (https://github.com/google/tcmalloc). TCMalloc does not have
any special `at_fork` handling, so guest processes using TCMalloc spawn
subprocesses using a custom bit of code based on `clone(CLONE_VM)` (notably
*not* vfork()).

We've also been using this patch to work around similar issues in QEMU
itself when creating subprocesses with fork()/vfork(). Since QEMU (and
GLib) rely on locks to emulate multi-threaded guests that share virtual
memory, QEMU itself is also vulnerable to deadlocks when a guest forks.
Without this patch we've run into deadlocks with TCG region trees, and
GLib's `g_malloc`, there are likely others as well, since we could still
reproduce the deadlocks after fixing these specific problems.

The issues caused by using fork() in multi-threaded guests are pretty
tricky to debug. They are fundamentally data races (was another thread in
the critical section or not?), and they usually manifest as deadlocks,
which show up as timeouts or hangs to users. I suspect this issue happens
frequently in the wild, but at a low enough rate/user that nobody bothered
fixing it/reporting it yet. Use of `vfork()` with `CLONE_VM` is common as
mentioned by Alex. For example it is the only way to spawn subprocesses in
Go on most platforms:
https://github.com/golang/go/blob/master/src/syscall/exec_linux.go#L218

I tried to come up with a good reproducer for these issues, but I haven't
been able to make one. The cases we have at Google that trigger this issue
reliably are big and they contain lots of code I can't share. When
compiling QEMU itself with TCMalloc, I can pretty reliably reproduce a
deadlock with a program that just calls vfork(), but that's somewhat
synthetic.

> The problem has always been that glibc implicitly assumes it
> knows what the process's threads are like, ie that it is the
> only thing doing any clone()s. (The comment in syscall.c mentions
> it "breaking mutexes" though I forget what I had in mind when
> I wrote that comment.) I haven't looked at these patches,
> but the risk of being clever is that we end up implicitly
> depending on details of glibc's internal implementation in a
> potentially fragile way.
>
>
> I forget whether QEMU can build against musl libc, but if we do
> then that might be an interesting test of whether we have
> accidental dependencies on the libc internals.

I agree it would be interesting to test against musl. I'm pretty sure it
would work (this patch only relies on POSIX APIs + Platform ABIs for TLS),
but it would be interesting to confirm.

--
Josh Kunz


Re: [PATCH 0/5] linux-user: Support extended clone(CLONE_VM)

2020-06-16 Thread Peter Maydell
On Tue, 16 Jun 2020 at 17:08, Alex Bennée  wrote:
> Apart from "a more perfect emulation" is there a particular use case
> served by the extra functionality? AIUI up until this point we've
> basically supported glibc's use of clone() which has generally been
> enough. I'm assuming you've come across stuff that needs this more fine
> grained support?

There are definitely cases we don't handle that cause problems;
notably https://bugs.launchpad.net/qemu/+bug/1673976 reports
that newer glibc implement posix_spawn() using CLONE_VM|CLONE_VFORK
which we don't handle correctly (though it is now just "we don't
report failures correctly" rather than "guest asserts").
The problem has always been that glibc implicitly assumes it
knows what the process's threads are like, ie that it is the
only thing doing any clone()s. (The comment in syscall.c mentions
it "breaking mutexes" though I forget what I had in mind when
I wrote that comment.) I haven't looked at these patches,
but the risk of being clever is that we end up implicitly
depending on details of glibc's internal implementation in a
potentially fragile way.

I forget whether QEMU can build against musl libc, but if we do
then that might be an interesting test of whether we have
accidental dependencies on the libc internals.

thanks
-- PMM



Re: [PATCH 0/5] linux-user: Support extended clone(CLONE_VM)

2020-06-16 Thread Alex Bennée


Josh Kunz  writes:

> This patch series implements extended support for the `clone` system
> call. As best I can tell, any option combination including `CLONE_VM`
> should be supported with the addition of this patch series. The
> implementation is described in greater detail in the patches themselves.
>
> Testing:
>
>   * All targets built on x86_64.
>   * `make check` and `make check-tcg` are passing. Additional tests have
> been added to `linux-test.c` to validate clone behavior.

Not for me - which probably means you don't have docker/podman setup or
cross compilers for all the various guests we have. See
tests/tcg/configure.sh

>
> Caveats:
>
>   * This series touches, but does not fix, several bits of code that are
> racey (namely the sigact table and the fd trans table).
>   * `exit_group` does not perform the appropriate cleanup for non-thread
> children created with `CLONE_VM`. CPUs for such children are never
> cleaned up. The correct implementation of exit-group is non-trivial
> (since it also needs to track/handle cleanup for threads in the
> clone'd child process). Also, I don't fully understand the
> interaction between QOM<->linux-user. My naive implementation based
> on the current implementation `exit(2)` was regularly crashing. If
> maintainers have suggestions for better ways to handle exit_group,
> they would be greatly appreciated. 
>   * execve does not clean up the CPUs of clone'd children, for the same
> reasons as `exit_group`.

Apart from "a more perfect emulation" is there a particular use case
served by the extra functionality? AIUI up until this point we've
basically supported glibc's use of clone() which has generally been
enough. I'm assuming you've come across stuff that needs this more fine
grained support?

>
> Josh Kunz (5):
>   linux-user: Refactor do_fork to use new `qemu_clone`
>   linux-user: Make fd_trans task-specific.
>   linux-user: Make sigact_table part of the task state.
>   linux-user: Support CLONE_VM and extended clone options
>   linux-user: Add PDEATHSIG test for clone process hierarchy.
>
>  linux-user/Makefile.objs|   2 +-
>  linux-user/clone.c  | 565 
>  linux-user/clone.h  |  27 ++
>  linux-user/fd-trans-tbl.c   |  13 +
>  linux-user/fd-trans-type.h  |  17 +
>  linux-user/fd-trans.c   |   3 -
>  linux-user/fd-trans.h   |  75 ++--
>  linux-user/main.c   |   1 +
>  linux-user/qemu.h   |  49 +++
>  linux-user/signal.c |  84 -
>  linux-user/syscall.c| 452 --
>  tests/tcg/multiarch/Makefile.target |   3 +
>  tests/tcg/multiarch/linux-test.c| 227 ++-
>  13 files changed, 1264 insertions(+), 254 deletions(-)
>  create mode 100644 linux-user/clone.c
>  create mode 100644 linux-user/clone.h
>  create mode 100644 linux-user/fd-trans-tbl.c
>  create mode 100644 linux-user/fd-trans-type.h


-- 
Alex Bennée



Re: [PATCH 0/5] linux-user: Support extended clone(CLONE_VM)

2020-06-13 Thread Alex Bennée


Josh Kunz  writes:

> This patch series implements extended support for the `clone` system
> call. As best I can tell, any option combination including `CLONE_VM`
> should be supported with the addition of this patch series. The
> implementation is described in greater detail in the patches themselves.
>
> Testing:
>
>   * All targets built on x86_64.
>   * `make check` and `make check-tcg` are passing. Additional tests have
> been added to `linux-test.c` to validate clone behavior.
>
> Caveats:
>
>   * This series touches, but does not fix, several bits of code that are
> racey (namely the sigact table and the fd trans table).
>   * `exit_group` does not perform the appropriate cleanup for non-thread
> children created with `CLONE_VM`. CPUs for such children are never
> cleaned up. The correct implementation of exit-group is non-trivial
> (since it also needs to track/handle cleanup for threads in the
> clone'd child process). Also, I don't fully understand the
> interaction between QOM<->linux-user.

When the QOM object gets unrefed for the final time it should cause a
bunch of clean-up in the common vCPU management code where things like
plugin cleanup are done.

This was recently touched in 1f81ce90e31ef338ee53a0cea02344237bc470cc
where I removed linux-user messing around with the active cpu list and
left it to the core code to deal with. Previously it wasn't being
properly unrealized.

> My naive implementation based
> on the current implementation `exit(2)` was regularly crashing. If
> maintainers have suggestions for better ways to handle exit_group,
> they would be greatly appreciated. 
>   * execve does not clean up the CPUs of clone'd children, for the same
> reasons as `exit_group`.
>
> Josh Kunz (5):
>   linux-user: Refactor do_fork to use new `qemu_clone`
>   linux-user: Make fd_trans task-specific.
>   linux-user: Make sigact_table part of the task state.
>   linux-user: Support CLONE_VM and extended clone options
>   linux-user: Add PDEATHSIG test for clone process hierarchy.
>
>  linux-user/Makefile.objs|   2 +-
>  linux-user/clone.c  | 565 
>  linux-user/clone.h  |  27 ++
>  linux-user/fd-trans-tbl.c   |  13 +
>  linux-user/fd-trans-type.h  |  17 +
>  linux-user/fd-trans.c   |   3 -
>  linux-user/fd-trans.h   |  75 ++--
>  linux-user/main.c   |   1 +
>  linux-user/qemu.h   |  49 +++
>  linux-user/signal.c |  84 -
>  linux-user/syscall.c| 452 --
>  tests/tcg/multiarch/Makefile.target |   3 +
>  tests/tcg/multiarch/linux-test.c| 227 ++-
>  13 files changed, 1264 insertions(+), 254 deletions(-)
>  create mode 100644 linux-user/clone.c
>  create mode 100644 linux-user/clone.h
>  create mode 100644 linux-user/fd-trans-tbl.c
>  create mode 100644 linux-user/fd-trans-type.h


-- 
Alex Bennée



Re: [PATCH 0/5] linux-user: Support extended clone(CLONE_VM)

2020-06-11 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200612014606.147691-1-...@google.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

+++ /tmp/qemu-test/build/tests/qemu-iotests/041.out.bad 2020-06-12 
03:41:14.015076859 +
@@ -1,5 +1,30 @@
-
+WARNING:qemu.machine:qemu received signal 9: 
/tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 
-display none -vga none -chardev 
socket,id=mon,path=/tmp/tmp.YHsXXgjzyL/qemu-22509-monitor.sock -mon 
chardev=mon,mode=control -qtest 
unix:path=/tmp/tmp.YHsXXgjzyL/qemu-22509-qtest.sock -accel qtest -nodefaults 
-display none -accel qtest -drive 
if=virtio,id=drive0,file=/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads,node-name=top,backing.node-name=base
 -drive if=ide,id=drive1,media=cdrom
+..E.
+==
+ERROR: test_pause (__main__.TestSingleBlockdev)
+--
+Traceback (most recent call last):
+  File "041", line 108, in test_pause
---
 Ran 104 tests
 
-OK
+FAILED (errors=1)
  TESTiotest-qcow2: 042
  TESTiotest-qcow2: 043
  TESTiotest-qcow2: 046
---
Not run: 259
Failures: 041
Failed 1 of 119 iotests
make: *** [check-tests/check-block.sh] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 665, in 
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=0b953bb4c5534a2f9f54bbb077c4a060', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-fy1xwr_z/src/docker-src.2020-06-11-23.31.14.29203:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=0b953bb4c5534a2f9f54bbb077c4a060
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-fy1xwr_z/src'
make: *** [docker-run-test-quick@centos7] Error 2

real17m15.344s
user0m8.554s


The full log is available at
http://patchew.org/logs/20200612014606.147691-1-...@google.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH 0/5] linux-user: Support extended clone(CLONE_VM)

2020-06-11 Thread Josh Kunz
This patch series implements extended support for the `clone` system
call. As best I can tell, any option combination including `CLONE_VM`
should be supported with the addition of this patch series. The
implementation is described in greater detail in the patches themselves.

Testing:

  * All targets built on x86_64.
  * `make check` and `make check-tcg` are passing. Additional tests have
been added to `linux-test.c` to validate clone behavior.

Caveats:

  * This series touches, but does not fix, several bits of code that are
racey (namely the sigact table and the fd trans table).
  * `exit_group` does not perform the appropriate cleanup for non-thread
children created with `CLONE_VM`. CPUs for such children are never
cleaned up. The correct implementation of exit-group is non-trivial
(since it also needs to track/handle cleanup for threads in the
clone'd child process). Also, I don't fully understand the
interaction between QOM<->linux-user. My naive implementation based
on the current implementation `exit(2)` was regularly crashing. If
maintainers have suggestions for better ways to handle exit_group,
they would be greatly appreciated. 
  * execve does not clean up the CPUs of clone'd children, for the same
reasons as `exit_group`.

Josh Kunz (5):
  linux-user: Refactor do_fork to use new `qemu_clone`
  linux-user: Make fd_trans task-specific.
  linux-user: Make sigact_table part of the task state.
  linux-user: Support CLONE_VM and extended clone options
  linux-user: Add PDEATHSIG test for clone process hierarchy.

 linux-user/Makefile.objs|   2 +-
 linux-user/clone.c  | 565 
 linux-user/clone.h  |  27 ++
 linux-user/fd-trans-tbl.c   |  13 +
 linux-user/fd-trans-type.h  |  17 +
 linux-user/fd-trans.c   |   3 -
 linux-user/fd-trans.h   |  75 ++--
 linux-user/main.c   |   1 +
 linux-user/qemu.h   |  49 +++
 linux-user/signal.c |  84 -
 linux-user/syscall.c| 452 --
 tests/tcg/multiarch/Makefile.target |   3 +
 tests/tcg/multiarch/linux-test.c| 227 ++-
 13 files changed, 1264 insertions(+), 254 deletions(-)
 create mode 100644 linux-user/clone.c
 create mode 100644 linux-user/clone.h
 create mode 100644 linux-user/fd-trans-tbl.c
 create mode 100644 linux-user/fd-trans-type.h

-- 
2.27.0.290.gba653c62da-goog